Skip to main content

Differential Abundance

Having merged, normalized, integrated, and annotated the cell types in our dataset, we can now delve into more insightful analyses focused on the differences between our two conditions. While PNA data offers rich spatial information (to be explored in subsequent tutorials), here we will concentrate on the differential abundance of markers across these conditions.

After completing this tutorial, you should be able to:

  • Identify changes in protein abundance levels across conditions within a specific cell type population.
  • Visualize these changes effectively.

Setup

library(pixelatorR)
library(dplyr)
library(tibble)
library(stringr)
library(tidyr)
library(SeuratObject)
library(Seurat)
library(ggplot2)
library(here)
library(pheatmap)

Load data

To illustrate the differential abundance testing we continue with the annotated object that we created in the previous tutorial. As described before, this object contains two samples: resting PBMCs and PHA-stimulated PBMCs.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)
# Load the object that you saved in the previous tutorial. This is not needed if it is still in your workspace.
pg_data_combined <- readRDS(file.path(DATA_DIR, "combined_data_annotated.rds"))
pg_data_combined
An object of class Seurat 
474 features across 2066 samples within 3 assays
Active assay: PNA (158 features, 158 variable features)
3 layers present: counts, data, scale.data
2 other assays present: clr_assay, log_assay
4 dimensional reductions calculated: pca, umap, harmony, harmony_umap

Differential Abundance in CD8 T cells

Given that PHA is a T cell mitogen, we will focus on its impact on protein abundance within T cells. Therefore, this tutorial will guide you through subsetting the data to isolate CD8 T cells and then performing a differential abundance test to compare protein levels between PHA-treated and resting cells.

Here we use the RunDAA function from the pixelatorR package to perform differential abundance analysis. This function requires a contrast_column that specifies the column in the metadata to be used for the comparison, a targets character vector that indicates the condition(s) of interest, and a reference column that indicates the baseline condition.

# Caculate the difference in dsb value between stimulated and resting CD8 T cells
DAA <- pg_data_combined %>%
subset(cell_type == "CD8 T cells") %>%
RunDAA(
contrast_column = "condition",
targets = "stimulated",
reference = "resting",
mean_fxn = Matrix::rowMeans
) %>%
arrange(-difference)

DAA %>% head() %>% knitr::kable()
marker p p_adj difference pct_1 pct_2 target reference
CD38 0 0 1.8350568 0.945 0.689 stimulated resting
CD103 0 0 1.5356460 1.000 0.977 stimulated resting
CD366 0 0 1.4204115 0.989 0.826 stimulated resting
CD39 0 0 1.0978315 0.732 0.500 stimulated resting
CD71 0 0 1.0968033 1.000 0.803 stimulated resting
CD279 0 0 0.9309085 1.000 0.902 stimulated resting

Let’s visualize these results as a volcano plot and label the most differentially abundant proteins.

ggplot(DAA, aes(x = difference, y = -log10(p_adj), color = difference)) +
geom_point() +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 0) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
geom_vline(xintercept = c(-1,1), col = "grey") +
coord_cartesian(xlim = c(1.1*min(DAA$difference), 1.1*max(DAA$difference))) +
theme_bw() +
geom_text(aes(label=ifelse(abs(difference)>1,as.character(marker),'')),hjust=-0.2,vjust=0) +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank())

Alternatively, in a heatmap we can visualize the average normalized values in each of the four samples for the 10 most up-regulated and the 10 most down-regulated markers in PHA versus resting conditions.

# Select 14 most different markers
DA_markers <- DAA %>%
arrange(-difference) %>%
{
rbind(head(., 10), tail(., 10))
} %>%
pull(marker)

# Visualize them in a hetmap of dsb values for the two conditions
FetchData(pg_data_combined, vars = c("condition", DA_markers)) %>%
group_by(condition) %>%
summarise_all(mean) %>%
column_to_rownames("condition") %>%
t() %>%
pheatmap(cellwidth = 10,
cellheight = 10,
angle_col = 90)

This example reveals a significant increase in the abundance of CD38, CD25, CD278, and CD279 upon PHA stimulation, while KLRG1 and CD127 show a significant decrease.

However, protein abundance is just one dimension of the rich data from PNA experiments. We are now poised to explore the spatial organization and interactions of proteins on the cell surface. The following tutorials will leverage the graph structure and spatial metrics inherent in PNA data to uncover the spatial biology of cells, examining protein clustering and colocalization patterns and their contributions to cell identity, signaling, and function.