Multiple Comparisons of Abundance and Spatial Scores
In many biological analyses, researchers focus on comparing a single condition or cell type to a specific reference group. For example, they might compare B cells to CD4 T cells or examine treated versus control monocytes. However, there are scenarios where investigating and comparing multiple experimental conditions, cell types, or time points against a single reference group can yield valuable insights. Traditionally, one approach to achieve this has been to manually loop over samples or conditions in the metadata slot of a PXL file, running multiple tests in succession. While effective, this method can be time-consuming and prone to error.
The recently added RunDAA function, together with the updated pixelatorR functions RunDPA and RunDCA, now offers a more streamlined approach to such multi-group comparisons for abundance and spatial score testing. These functions allow multiple groups to be compared against a single reference without the need to set up complex looping code. Moreover, they automatically adjust p-values to account for multiple comparisons, ensuring statistical rigor.
By the end of this tutorial, you will be able to:
- Perform differential abundance, polarization, and colocalization tests comparing multiple groups to a single reference group.
- Utilize the enhanced capabilities of RunDAA, RunDPA, and RunDCA for efficient multi-group analyses.
- Interpret results from these multiple comparisons, taking into account the adjusted p-values.
Setup
library(pixelatorR)
library(dplyr)
library(tibble)
library(stringr)
library(tidyr)
library(SeuratObject)
library(Seurat)
library(ggplot2)
library(here)
library(ggrepel)
Loading data
In this tutorial, we’ll begin by downloading and loading the necessary PXL files. To demonstrate the multiple comparison functionality, we’ll revisit the uropod data first introduced in the spatial analysis tutorial. As a refresher, this experiment involves T cells immobilized with CD54 and stimulated with either RANTES or MCP1 to induce uropod formation. A uropod is a protrusion on the cell surface that aids in cellular movement. During uropod formation, certain surface proteins migrate to one pole of the cell, creating a bulge. This process can be captured by MPX and is evident in the data as increased polarization and colocalization of the proteins involved in uropod formation.
DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)
Open a terminal and download the data using the following commands:
BASEURL="https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/0.19.x/uropod-t-cells-v1.0-immunology-I"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed.layout.dataset.pxl"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed_MCP1_stimulated.layout.dataset.pxl"
And then proceed in R.
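Optionally, you can verify that the three PXL files are in place before loading them. This is just a quick check using the DATA_DIR variable defined above:
# List the downloaded PXL files (should return three file names)
list.files(DATA_DIR, pattern = "\\.pxl$")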
# The folders the data is located in:
data_files <-
c(Control = file.path(DATA_DIR, "Uropod_CD54_fixed.layout.dataset.pxl"),
Rantes = file.path(DATA_DIR, "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"),
Mcpi = file.path(DATA_DIR, "Uropod_CD54_fixed_MCP1_stimulated.layout.dataset.pxl"))
# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)
# Combine data to Seurat object
obj <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data)
)
# Add sample column to meta data
obj <-
AddMetaData(obj,
metadata = str_remove(colnames(obj), "_.*"),
col.name = "sample")
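As a quick sanity check, you can tabulate the newly added sample column to confirm that every component received one of the three expected labels:
# Number of components per sample label
table(obj$sample)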
Since quality control and filtering are not the primary focus of this tutorial, we will only perform minimal steps in these areas. However, it’s important to note that thorough quality control and appropriate filtering are crucial for ensuring the reliability and accuracy of your analysis results. For this demonstration, we’ll implement basic quality control measures and apply some essential filtering criteria to prepare our data for further analysis.
QC & Filter Data
edgerank_plot <- MoleculeRankPlot(obj, group_by = "sample")
edgerank_plot
Prior to downstream analysis, we will remove components with fewer than 3000 molecules, as well as antibody count outliers (components with a tau_type other than "normal").
# Molecule cutoff
cutoff <- 3000
# Define filters
cells_to_keep <- obj[[]] %>%
as_tibble(rownames = "component") %>%
group_by(sample) %>%
mutate(rank = rank(-molecules, ties.method = "random")) %>%
filter(molecules >= cutoff,
tau_type == "normal")
# Filter data
obj <- obj %>%
subset(cells = cells_to_keep$component)
Differential Abundance
Before running the differential abundance test, we normalize the data.
obj <- NormalizeMPX(obj, method = "dsb")
To perform differential abundance testing across multiple conditions, we use the RunDAA function from pixelatorR, which is a convenient wrapper around the FindMarkers function from Seurat. While FindMarkers is designed for comparing one experimental group to a reference, RunDAA extends its use to multiple comparisons. This allows us to efficiently compare several conditions ('contrasts') to a single reference group without modifying the original metadata.
The output of RunDAA is a data frame containing the differential abundance results for all specified contrasts. In addition, several common methods are available to adjust the p-values for the increased number of comparisons resulting from this new functionality. Overall, this approach streamlines the analysis process and facilitates easier interpretation of results across multiple experimental conditions.
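For comparison, the manual approach mentioned earlier would mean looping over the conditions yourself, calling FindMarkers once per contrast and adjusting the p-values afterwards. A rough sketch of that workflow (for illustration only, assuming the merged and normalized object from above) could look like this:
# Manual alternative: one FindMarkers call per contrast, followed by a joint
# BH adjustment across all comparisons
manual_res <- lapply(c("Rantes", "Mcpi"), function(tgt) {
  FindMarkers(obj, ident.1 = tgt, ident.2 = "Control", group.by = "sample") %>%
    rownames_to_column("marker") %>%
    mutate(target = tgt, reference = "Control")
}) %>%
  bind_rows() %>%
  mutate(p_adj = p.adjust(p_val, method = "BH"))
With RunDAA, the equivalent analysis is a single call: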
targets <- c("Rantes", "Mcpi")
reference <- "Control"
DAA_res <- RunDAA(
obj,
contrast_column = "sample",
targets = targets,
reference = reference,
p_adjust_method = "BH",
mean_fxn = rowMeans,
fc_name = "difference"
)
DAA_res %>%
arrange(p_adj) %>%
head()
# A tibble: 6 × 8
marker p p_adj difference pct_1 pct_2 target reference
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 CD43 2.34e-101 3.74e-99 -1.75 0.264 0.921 Rantes Control
2 CD50 8.71e- 62 6.97e-60 0.854 0.837 0.441 Rantes Control
3 CD162 2.70e- 54 1.44e-52 0.785 0.784 0.371 Rantes Control
4 CD18 2.45e- 24 9.82e-23 0.324 0.861 0.711 Rantes Control
5 CD53 5.36e- 21 1.72e-19 0.395 0.82 0.609 Rantes Control
6 CD29 1.48e- 19 3.94e-18 0.410 0.789 0.671 Rantes Control
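Since the result table contains both contrasts, a quick way to summarize it is to count, for each target, how many markers pass a given significance and effect-size cutoff (the thresholds below are arbitrary and only meant for illustration):
# Number of markers per contrast with adjusted p-value < 0.01 and |difference| > 0.25
DAA_res %>%
  filter(p_adj < 0.01, abs(difference) > 0.25) %>%
  count(target)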
We can visualize the results of this differential analysis as a volcano plot using functions presented in the Abundance Normalization tutorial.
ggplot(DAA_res, aes(x = difference, y = -log10(p_adj), color = difference)) +
geom_point() +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 0) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = c(-0.25,0.25), col = "grey") +
coord_cartesian(xlim = c(-1.5,1.5)) +
theme_bw() +
geom_text(aes(label = ifelse(abs(difference) > 0.25, as.character(marker), '')), hjust = -0.2, vjust = 1) +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank()) +
facet_grid(vars(target), scales = "free")
Differential Polarization
In this section, we’ll explore how to perform differential polarization testing, comparing both the RANTES and MCP1 treatments against a control reference. While typical workflows often begin with filtering steps to remove low-abundance markers, we’ll omit that step for this demonstration.
We’ll highlight a new feature of the RunDPA function from the pixelatorR package. As mentioned before, comparing multiple conditions to a single reference group used to require iterative approaches and/or data subsetting. The updated RunDPA function streamlines this process and accepts multiple experimental conditions to be tested against a single reference group in one operation. This functionality not only saves time but also automatically handles p-value adjustment for multiple comparisons. By leveraging it, we can simultaneously assess the differential polarization effects of both the RANTES and MCP1 treatments compared to the control condition.
targets <- c("Rantes", "Mcpi")
reference <- "Control"
DPA_res <- RunDPA(
obj,
contrast_column = "sample",
targets = targets,
reference = reference,
alternative = "two.sided",
polarity_metric = "morans_z",
p_adjust_method = "BH"
)
- Rantes vs Control
- Mcpi vs Control
DPA_res %>%
arrange(p_adj) %>%
head()
# A tibble: 6 × 14
estimate data_type target reference n1 n2 statistic p p_adj
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 3.49 morans_z Rantes Control 407 548 173494 6.20e-49 9.92e-47
2 2.65 morans_z Rantes Control 413 496 142245 5.34e-24 4.27e-22
3 1.48 morans_z Mcpi Control 409 548 147577 4.67e-17 2.49e-15
4 1.09 morans_z Rantes Control 415 581 152673 7.22e-13 2.89e-11
5 0.986 morans_z Rantes Control 417 598 155618 1.67e-11 5.34e-10
6 0.792 morans_z Rantes Control 413 591 150928 1.67e-10 4.45e- 9
# ℹ 5 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
# alternative <chr>, marker <chr>
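Because both contrasts share the same reference, it can also be useful to reshape the table so that the polarization estimates for the two treatments appear side by side for each marker. A small example using tidyr (assuming the columns shown above):
# Compare the RANTES and MCP1 estimates per marker, side by side
DPA_res %>%
  select(marker, target, estimate) %>%
  pivot_wider(names_from = target, values_from = estimate) %>%
  arrange(desc(Rantes))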
To help with interpretation of the results, the function prints informative messages when certain tests have not been performed (for example, due to a lack of observations in a condition).
We can visualize the output of these tests next to each other in a faceted volcano plot.
ggplot(DPA_res, aes(x = estimate, y = -log10(p_adj), color = estimate)) +
geom_point() +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 0) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = c(-0.25,0.25), col = "grey") +
coord_cartesian(xlim = c(-2,3.2)) +
theme_bw() +
geom_text(aes(label = ifelse(abs(estimate) > 1, as.character(marker), '')), hjust = -0.2, vjust = 1) +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank()) +
facet_grid(vars(target), scales = "free")
Differential Colocalization
Similarly, we can run multiple differential colocalization tests with RunDCA.
targets <- c("Rantes", "Mcpi")
reference <- "Control"
DCA_res <- RunDCA(
obj,
contrast_column = "sample",
targets = c("Mcpi", "Rantes"),
reference = "Control",
alternative = "two.sided",
p_adjust_method = "BH",
cl = 4
)
- Mcpi vs Control
- Rantes vs Control
DCA_res %>%
arrange(p_adj) %>%
head()
# A tibble: 6 × 15
estimate data_type target reference n1 n2 statistic p p_adj
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 7.32 pearson_z Rantes Control 406 472 138420 5.82e-30 3.68e-26
2 5.16 pearson_z Rantes Control 417 598 168134 3.19e-21 1.01e-17
3 3.92 pearson_z Rantes Control 416 597 163662 6.73e-18 1.42e-14
4 -3.93 pearson_z Rantes Control 407 548 75944 3.20e-17 5.06e-14
5 5.45 pearson_z Rantes Control 383 496 125037 8.16e-16 1.03e-12
6 -3.45 pearson_z Rantes Control 230 584 43994 1.72e-14 1.81e-11
# ℹ 6 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
# alternative <chr>, marker_1 <chr>, marker_2 <chr>
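To pull out the strongest hits for each contrast, you can group the results by target and keep, for example, the marker pairs with the smallest adjusted p-values:
# Top 5 most significant colocalization changes per contrast
DCA_res %>%
  group_by(target) %>%
  slice_min(p_adj, n = 5) %>%
  select(target, marker_1, marker_2, estimate, p_adj)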
Side-by-side visualization of the volcano plots is easily achieved by faceting on the targets.
# Volcano plot
ColocalizationScores(obj, meta_data_columns = "sample") %>%
group_by(marker_1, marker_2) %>%
summarize(coloc_mean = mean(pearson_z), .groups = "drop") %>%
left_join(DCA_res, by = c("marker_1", "marker_2")) %>%
ggplot(aes(estimate, -log10(p_adj), label = paste(marker_1, marker_2, sep = "/"), color = coloc_mean)) +
geom_point() +
geom_text_repel(max.overlaps = 5) +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = -10) +
scale_x_continuous(expand = expansion(0.1)) +
scale_y_continuous(expand = expansion(0.1)) +
theme_minimal() +
guides(color = guide_colorbar(title.position = "top")) +
labs(
x = "Median difference",
y = expression(-log[10] ~ (adj. ~ p - value)),
color = "Mean colocalization score"
) +
theme(
legend.position = "bottom",
legend.justification = "right"
) +
facet_wrap(vars(target), scales = "free")
This concludes this short tutorial on performing multiple comparisons of abundance, polarization, and colocalization. You can now run several comparisons in a single call, which significantly speeds up the analysis of more complex experimental designs, while the built-in p-value adjustment helps keep your conclusions statistically sound.
Remember, while this method is powerful, it’s always important to consider the biological context of your experiments and interpret the results in light of your specific research questions and hypotheses.