Skip to main content
Version: 0.19.x

Multiple Comparisons of Abundance and Spatial Scores

In many biological analyses, researchers focus on comparing a single condition or cell type to a specific reference group. For example, they might compare B cells to CD4 T cells or examine treated versus control monocytes. However, there are scenarios where investigating and comparing multiple experimental conditions, cell types, or time points against a single reference group can yield valuable insights. Traditionally, one approach to achieve this has been to manually loop over samples or conditions in the metadata slot of a PXL file, running multiple tests in succession. While effective, this method can be time-consuming and prone to error.

The recently added RunDAA function, as well as updated pixelatorR functions RunDPA and RunDCA now offer a more streamlined approach to do this for the abundance and spatial score testing. These functions now allow for multi-group comparisons without the need to set up complex looping code. Moreover, they automatically adjust p-values to account for multiple comparisons, ensuring statistical rigor.

By the end of this tutorial, you will be able to:

  • Perform differential abundance, polarization, and colocalization tests comparing multiple groups to a single reference group.
  • Utilize the enhanced capabilities of RunDAA, RunDPA and RunDCA for efficient multi-group analyses.
  • Interpret results from these multiple comparisons, taking into account the adjusted p-values.

Setup

library(pixelatorR)
library(dplyr)
library(tibble)
library(stringr)
library(tidyr)
library(SeuratObject)
library(Seurat)
library(ggplot2)
library(here)
library(ggrepel)

Loading data

In this tutorial, we’ll begin by downloading and loading the necessary PXL files. To demonstrate the multiple comparison functionality, we’ll revisit the uropod data first introduced in the spatial analysis tutorial. As a refresher, this experiment involves T cells immobilized with CD54 and stimulated with either RANTES or MCP1 to induce uropod formation. A uropod is a protrusion on the cell surface that aids in cellular movement. During uropod formation, certain surface proteins migrate to one pole of the cell, creating a bulge. This process can be captured by MPX and is evident in the data as increased polarization and colocalization of the proteins involved in uropod formation.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)

Open a terminal and download the data using the following commands:

BASEURL="https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/0.19.x/uropod-t-cells-v1.0-immunology-I"

curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed.layout.dataset.pxl"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed_MCP1_stimulated.layout.dataset.pxl"

And then proceed in R.

# The folders the data is located in:
data_files <-
c(Control = file.path(DATA_DIR, "Uropod_CD54_fixed.layout.dataset.pxl"),
Rantes = file.path(DATA_DIR, "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"),
Mcpi = file.path(DATA_DIR, "Uropod_CD54_fixed_MCP1_stimulated.layout.dataset.pxl"))

# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)

# Combine data to Seurat object
obj <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data)
)

# Add sample column to meta data
obj <-
AddMetaData(obj,
metadata = str_remove(colnames(obj), "_.*"),
col.name = "sample")

Since quality control and filtering are not the primary focus of this tutorial, we will only perform minimal steps in these areas. However, it’s important to note that thorough quality control and appropriate filtering are crucial for ensuring the reliability and accuracy of your analysis results. For this demonstration, we’ll implement basic quality control measures and apply some essential filtering criteria to prepare our data for further analysis.

QC & Filter Data

edgerank_plot <- MoleculeRankPlot(obj, group_by = "sample")
edgerank_plot

Prior to downstream analysis, we will remove components with less than 3000 molecules as well as the antibody count outliers.

# Molecule cutoff
cutoff <- 3000

# Define filters
cells_to_keep <- obj[[]] %>%
as_tibble(rownames = "component") %>%
group_by(sample) %>%
mutate(rank = rank(-molecules, ties.method = "random")) %>%
filter(molecules >= cutoff,
tau_type == "normal")

# Filter data
obj <- obj %>%
subset(cells = cells_to_keep$component)

Differential Abundance

Before running the differential abundance test, we normalize the data.

obj <- NormalizeMPX(obj, method = "dsb")

To perform differential abundance testing across multiple conditions, we use the RunDAA function from pixelatorR, which is a convenient wrapper around the FindMarkers function from Seurat. While FindMarkers is designed for comparing one experimental group to a reference, we extended its use for multiple comparisons. This allows us to efficiently compare various conditions to a single reference group (‘contrasts’) without modifying the original metadata.

The output of RunDAA is a dataframe containing differential abundance results for all specified contrasts. In addition, several common methods are available to adjust p-values to account for the increased number of comparisons resulting from this new functionality. Overall, this approach streamlines the analysis process and facilitates easier interpretation of results across multiple experimental conditions.

targets <- c("Rantes", "Mcpi")
reference <- "Control"

DAA_res <- RunDAA(
obj,
contrast_column = "sample",
targets = targets,
reference = reference,
p_adjust_method = "BH",
mean_fxn = rowMeans,
fc_name = "difference"
)

DAA_res %>%
arrange(p_adj) %>%
head()
# A tibble: 6 × 8
marker p p_adj difference pct_1 pct_2 target reference
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 CD43 2.34e-101 3.74e-99 -1.75 0.264 0.921 Rantes Control
2 CD50 8.71e- 62 6.97e-60 0.854 0.837 0.441 Rantes Control
3 CD162 2.70e- 54 1.44e-52 0.785 0.784 0.371 Rantes Control
4 CD18 2.45e- 24 9.82e-23 0.324 0.861 0.711 Rantes Control
5 CD53 5.36e- 21 1.72e-19 0.395 0.82 0.609 Rantes Control
6 CD29 1.48e- 19 3.94e-18 0.410 0.789 0.671 Rantes Control

We can visualize the results of this differential analysis as a volcano plot using functions presented in the Abundance Normalization tutorial.

ggplot(DAA_res, aes(x = difference, y = -log10(p_adj), color = difference)) +
geom_point() +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 0) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = c(-0.25,0.25), col = "grey") +
coord_cartesian(xlim = c(-1.5,1.5)) +
theme_bw() +
geom_text(aes(label=ifelse(abs(difference)>0.25,as.character(marker),'')),hjust=-0.2,vjust=1) +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank()) +
facet_grid(vars(target), scales = "free")

Differential Polarization

In this section, we’ll explore how to perform differential polarization testing, comparing both RANTES and MCP1 treatments against a control reference. While typical workflows often begin with filtering steps to remove lowly abundant markers, we’ll omit that for this demonstration.

We’ll highlight a new feature of the RunDPA function from the pixelatorR package. As mentioned before, comparing multiple conditions to a single reference group used to require iterative approaches and/or data subsetting. However, the updated RunDPA function streamlines this process and now accepts multiple experimental conditions to be tested against a single reference group in one operation. This new functionality not only saves time but also automatically handles p-value adjustments for multiple comparisons. By leveraging this feature, we can simultaneously assess the differential polarization effects of both RANTES and MCP1 treatments compared to the control condition.

targets <- c("Rantes", "Mcpi")
reference <- "Control"

DPA_res <- RunDPA(
obj,
contrast_column = "sample",
targets = targets,
reference = reference,
alternative = "two.sided",
polarity_metric = "morans_z",
p_adjust_method = "BH"
)
  - Rantes vs Control
- Mcpi vs Control
DPA_res %>%  
arrange(p_adj) %>%
head()
# A tibble: 6 × 14
estimate data_type target reference n1 n2 statistic p p_adj
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 3.49 morans_z Rantes Control 407 548 173494 6.20e-49 9.92e-47
2 2.65 morans_z Rantes Control 413 496 142245 5.34e-24 4.27e-22
3 1.48 morans_z Mcpi Control 409 548 147577 4.67e-17 2.49e-15
4 1.09 morans_z Rantes Control 415 581 152673 7.22e-13 2.89e-11
5 0.986 morans_z Rantes Control 417 598 155618 1.67e-11 5.34e-10
6 0.792 morans_z Rantes Control 413 591 150928 1.67e-10 4.45e- 9
# ℹ 5 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
# alternative <chr>, marker <chr>

To help with the interpretation of the results, this function will output informative messages informing the user when certain tests have not been performed (for example due to a lack of observations in a condition).

We can visualize the output of these tests next to each other in a facetted volcano plot.

ggplot(DPA_res, aes(x = estimate, y = -log10(p_adj), color = estimate)) +
geom_point() +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 0) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = c(-0.25,0.25), col = "grey") +
coord_cartesian(xlim = c(-2,3.2)) +
theme_bw() +
geom_text(aes(label=ifelse(abs(estimate)>1,as.character(marker),'')),hjust=-0.2,vjust=1) +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank()) +
facet_grid(vars(target), scales = "free")

Differential Colocalization

Similarly, we can run multiple differential colocalization tests with RunDCA.

targets <- c("Rantes", "Mcpi")
reference <- "Control"

DCA_res <- RunDCA(
obj,
contrast_column = "sample",
targets = c("Mcpi", "Rantes"),
reference = "Control",
alternative = "two.sided",
polarity_metric = "morans_z",
p_adjust_method = "BH",
cl = 4
)
  - Mcpi vs Control
- Rantes vs Control
DCA_res %>%  
arrange(p_adj) %>%
head()
# A tibble: 6 × 15
estimate data_type target reference n1 n2 statistic p p_adj
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 7.32 pearson_z Rantes Control 406 472 138420 5.82e-30 3.68e-26
2 5.16 pearson_z Rantes Control 417 598 168134 3.19e-21 1.01e-17
3 3.92 pearson_z Rantes Control 416 597 163662 6.73e-18 1.42e-14
4 -3.93 pearson_z Rantes Control 407 548 75944 3.20e-17 5.06e-14
5 5.45 pearson_z Rantes Control 383 496 125037 8.16e-16 1.03e-12
6 -3.45 pearson_z Rantes Control 230 584 43994 1.72e-14 1.81e-11
# ℹ 6 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
# alternative <chr>, marker_1 <chr>, marker_2 <chr>

Side-by-side visualization of the volcano plots is easily achieved by facetting on the targets.

# Volcano plot
ColocalizationScores(obj, meta_data_columns = "sample") %>%
group_by(marker_1, marker_2) %>%
summarize(coloc_mean = mean(pearson_z), .groups = "drop") %>%
left_join(DCA_res, by = c("marker_1", "marker_2")) %>%
ggplot(aes(estimate, -log10(p_adj), label = paste(marker_1, marker_2, sep = "/"), color = coloc_mean)) +
geom_point() +
geom_text_repel(max.overlaps = 5) +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = -10) +
scale_x_continuous(expand = expansion(0.1)) +
scale_y_continuous(expand = expansion(0.1)) +
theme_minimal() +
guides(color = guide_colorbar(title.position = "top")) +
labs(
x = "Median difference",
y = expression(-log[10] ~ (adj. ~ p - value)),
color = "Mean colocalization score"
) +
theme(
legend.position = "bottom",
legend.justification = "right"
) +
facet_wrap(vars(target), scales = "free")

This concludes this short tutorial on performing multiple comparisons of abundance, polarization, and colocalization. Now, you’ll be able to perform multiple comparisons simultaneously, which significantly accelerates the analysis process for more complex experimental designs. The combination of speed, accuracy, and statistical robustness provided by this approach will help ensure that your results are not only more reliable but also more likely to stand up to peer review and replication efforts.

Remember, while this method is powerful, it’s always important to consider the biological context of your experiments and interpret the results in light of your specific research questions and hypotheses.