Skip to main content
Version: 0.18.x

Differential Abundance

Now we have merged, normalized, integrated and annotated the cell types in our dataset, we can proceed to the biological interpretation of our experiment. MPX data contains a wealth of spatial data that we will discuss in the following tutorials. Here, we will focus on the differential abundance of markers across conditions.

After completing this tutorial, you should be able to:

  • Identify abundance changes across conditions within a cell type
  • Visualize these changes in a clear way

Setup

library(pixelatorR)
library(dplyr)
library(tibble)
library(stringr)
library(tidyr)
library(SeuratObject)
library(Seurat)
library(ggplot2)
library(here)
library(pheatmap)

Load data

To illustrate the differential abundance testing we continue with the annotated object we created in the previous tutorial. As described before, this object contains four samples: a resting and PHA stimulated PBMC sample, both in duplicate.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)
# Load the object that you saved in the previous tutorial. This is not needed if it is still in your workspace.
pg_data_combined <- readRDS(file.path(DATA_DIR, "combined_data_annotated.rds"))
pg_data_combined
An object of class Seurat 
168 features across 3844 samples within 2 assays
Active assay: mpxCells (84 features, 84 variable features)
3 layers present: counts, data, scale.data
1 other assay present: log_assay
4 dimensional reductions calculated: pca, umap, harmony, harmony_umap

Differential Abundance in CD8 T cells

As PHA is a T cell mitogen, one might be interested in how PHA treatment affects protein abundance in T cells. Therefore, in this tutorial, we will subset the data to only include the CD8 T cells and then perform a differential abundance analysis of the PHA treated cells versus the resting cells. As mentioned in the Abundance Normalization tutorial, we set the mean.fxn parameter to rowMeans and the fc.name parameter to “difference”.

# Caculate the difference in dsb value between stimulated and resting CD8 T cells
DAA <- pg_data_combined %>%
subset(subset = cell_type == "CD8 T cells") %>%
SetIdent(value = "condition") %>%
FindMarkers(
test.use = "wilcox",
ident.1 = "stimulated",
ident.2 = "resting",
mean.fxn = rowMeans,
fc.name = "difference"
) %>%
arrange(-difference) %>%
rownames_to_column("marker") %>%
as_tibble()

DAA
# A tibble: 59 × 6
marker p_val difference pct.1 pct.2 p_val_adj
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 CD71 2.28e-185 3.05 0.767 0.007 1.92e-183
2 CD25 1.43e-202 2.31 0.966 0.014 1.20e-200
3 CD38 1.47e-153 1.78 0.994 0.486 1.24e-151
4 CD27 8.82e- 93 1.51 0.82 0.446 7.41e- 91
5 CD82 5.49e-164 1.40 0.997 0.635 4.61e-162
6 CD7 1.92e-160 1.29 0.997 0.852 1.61e-158
7 CD278 6.32e-176 1.00 0.778 0.012 5.31e-174
8 CD8 1.28e- 4 0.965 0.987 0.79 1.08e- 2
9 CD26 1.77e- 52 0.942 0.701 0.295 1.49e- 50
10 ACTB 1.45e-165 0.920 1 0.772 1.22e-163
# ℹ 49 more rows

Let’s visualize these results as a volcano plot and annotate some T cell-relevant markers

ggplot(DAA, aes(x = difference, y = -log10(p_val_adj), color = difference)) +
geom_point() +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 0) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = c(-1,1), col = "grey") +
coord_cartesian(xlim = c(1.1*min(DAA$difference), 1.1*max(DAA$difference))) +
theme_bw() +
geom_text(aes(label=ifelse(abs(difference)>1,as.character(marker),'')),hjust=-0.2,vjust=0) +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank())

Alternatively, in a heatmap we can visualize the average normalized values in each of the four samples for the 7 most upregulated and the 7 most downregulated markers in PHA versus resting conditions.

# Select 14 most different markers
DA_markers <- DAA %>%
arrange(difference) %>%
{
rbind( head(., 7), tail(., 7) )
} %>%
pull(marker)

# Visualize them in a hetmap of dsb values for all four samples
FetchData(pg_data_combined, vars=c("sample", DA_markers)) %>%
group_by(sample) %>%
summarise_all(mean) %>%
column_to_rownames("sample") %>%
t() %>%
pheatmap(cellwidth = 10,
cellheight = 10,
angle_col = 90)

In this example, it seems that many of the T cell relevant markers (such as CD71, CD38 and CD25) show a significant increase in abundance upon PHA stimulation.

However, abundance is only one facet of the rich multidimensional data generated by MPX experiments. We are now ready to transition into the spatial realm, exploring the arrangement and interactions of proteins on the cell surface. In the upcoming tutorials, we will leverage the graph nature and spatial metrics within MPX data to unravel the spatial biology of cells, surveying protein polarization and colocalization patterns and how these contribute to cell identity, signaling, and function. Abundance has illuminated cell types; spatiality will illuminate the molecular geography that defines them.