Spatial Analyses
In this tutorial we will show you how you can analyze the spatial configuration of proteins on the cell surface of single cells, using MPX data.
Pixelator will generate a number of metrics from the graph structure inherent to the MPX data. In this tutorial we will show you how you can approach spatial analysis using the MPX polarity scores, and MPX colocalization scores to get a deeper understanding of how proteins are organized on the cell surface.
The polarity score measures the degree of non-random spatial clustering of a protein on the cell surface of a single cell, using Moran’s I. Colocalization, on the other hand, measures the co-expression of two proteins in small regions on the surface of a single cell. Here we will use both these measures to look at the spatial changes that occurs when T cells take on a migratory phenotype. In this tutorial we will use a dataset of T cells that have been stimulated to take on such a migratory phenotype (Karlsson et al, Nature Methods 2024). In this experiment T cells have been immobilized with CD54 and stimulated with a chemokine to induce them to form a uropod; a bulge on the cell to help propel the cell forward.
Upon uropod formation, some proteins on the surface of the cell migrate to one pole to form the bulge, the uropod, which can be captured by MPX and manifests itself in the data as an increase in polarization and colocalization of the proteins participating in the uropod.
We will load two samples, one that has undergone the treatment outlined above, and a control sample.
After completing this tutorial, you will be able to:
-
Extract key features from MPX data: Analyze the inherent graph structure of MPX data to understand protein clustering (polarity) and co-expression (colocalization) on cell surfaces.
-
Quantify spatial changes in protein organization: Utilize MPX polarity scores and Wilcoxon Rank Sum tests to measure and compare protein clustering between different conditions, revealing spatial rearrangements upon cellular stimulation.
-
Identify changes in protein colocalization: Conduct differential colocalization analysis to pinpoint protein pairs with significant changes in co-occurance across different samples, uncovering potential protein interactions under specific conditions.
-
Visualize and interpret data: Effectively communicate your findings through volcano plots and heatmaps for differential analyses, and leverage UMAPs to explore relationships between colocalization scores, identifying distinct protein groups based on their spatial dynamics.
-
Connect spatial data to biological functions: Correlate observed changes in protein organization with cellular phenotypes or functions to gaining deeper insights into protein interactions and their roles in biological processes like cellular migration.
Setup
library(dplyr)
library(tibble)
library(stringr)
library(ggrepel)
library(Seurat)
library(here)
library(pheatmap)
library(tidygraph)
library(ggraph)
library(ggplot2)
library(pixelatorR)
Load data
We begin by loading and merging the two samples, so that they can easily be analyzed together.
# The folders the data is located in:
data_files <-
c(
Control = here("data/Uropod_control.dataset.pxl"),
Stimulated = here("data/Uropod_CD54_fixed_RANTES_stimulated.dataset.pxl")
)
# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)
# Combine data to Seurat object
pg_data_combined <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data)
)
# Add sample column to meta data
pg_data_combined <-
AddMetaData(pg_data_combined,
metadata = str_remove(colnames(pg_data_combined), "_.*"),
col.name = "sample"
)
rm(pg_data)
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4077507 217.8 7340994 392.1 NA 6562643 350.5
Vcells 62496126 476.9 205680125 1569.3 36864 213824382 1631.4
Process data
First we need to clean up this data with a few simple quality control steps, by removing cells with fewer than 8000 unique antibody molecules detected (edges), or those marked as possible antibody aggregates by Pixelator. Additionally, we are removing proteins with a median of less than 5 counts. See the Quality Control tutorial for more details.
# Filter cells
pg_data_combined <-
pg_data_combined %>%
subset(edges >= 8000 & tau_type == "normal")
# Calculate median of protein and keep proteins with median >= 5
filtered_proteins <- pg_data_combined %>%
JoinLayers() %>%
LayerData(layer = "counts") %>%
apply(MARGIN = 1, median) %>%
enframe("marker", "median") %>%
arrange(-median) %>%
filter(median >= 5) %>%
filter(!marker %in% c("ACTB", "mIgG1", "mIgG2a", "mIgG2b"))
# Filter proteins
pg_data_combined <-
pg_data_combined %>%
subset(features = filtered_proteins$marker)
Differential polarity analysis
Our first question is whether we can observe an increase in polarity of any proteins upon stimulation, which could indicate an increase in uropod formation. To start with let’s take a look at the polarity scores of all proteins across the samples.
plot_data <- PolarizationScores(pg_data_combined, meta_data_columns = "sample")
# Calculate variance for each marker
marker_variance <- plot_data %>%
group_by(marker) %>%
summarize(variance = var(morans_z)) %>%
arrange(desc(variance))
# Create the plot
ggplot(
plot_data,
aes(
x = morans_z,
y = reorder(marker, morans_z, function(x) var(x)),
color = sample
)
) +
geom_point(
position = position_jitterdodge(dodge.width = 0.9, jitter.width = 0.3),
size = 0.5,
alpha = 0.6
) +
labs(
x = "Moran's Z",
y = "",
title = ""
) +
theme_minimal() +
theme(legend.position = "right")
Here we already spot some proteins that have different distributions in the control and stimulated conditions. To formally compare the samples we will perform a differential polarity analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX polarity scores are significantly different in the stimulated sample and control.
pol_test <- RunDPA(pg_data_combined,
target = "Stimulated",
reference = "Control",
contrast_column = "sample"
)
# Volcano plot
PolarizationScores(pg_data_combined, meta_data_columns = "sample") %>%
filter(sample == "Stimulated") %>%
group_by(marker) %>%
summarize(pol_mean = mean(morans_z)) %>%
left_join(pol_test, by = "marker") %>%
ggplot(aes(estimate, -log10(p_adj), label = marker, color = pol_mean)) +
geom_point() +
geom_text_repel(max.overlaps = 18) +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 3) +
xlim(-5, 5) +
theme_minimal() +
guides(color = guide_colorbar(title.position = "top")) +
labs(
x = "Median difference",
y = expression(-log[10] ~ (adj. ~ p - value)),
color = "Mean polarity"
) +
theme(
legend.position = "bottom",
legend.justification = "right"
)
# Filter proteins that differ with more than 1 standard deviation
pol_test %>%
filter(abs(estimate) >= 1, p_adj < 0.001)
# A tibble: 14 × 14
estimate data_type target reference n1 n2 statistic p conf.low
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 1.95 morans_z Stimula… Control 328 457 104107 1.33e-20 1.51
2 2.02 morans_z Stimula… Control 325 457 122970 3.54e-55 1.80
3 3.87 morans_z Stimula… Control 328 454 118894 4.12e-46 3.34
4 1.16 morans_z Stimula… Control 328 457 95325 7.87e-11 0.798
5 1.77 morans_z Stimula… Control 328 457 102682 8.68e-19 1.37
6 1.05 morans_z Stimula… Control 328 457 101522 2.24e-17 0.810
7 3.51 morans_z Stimula… Control 328 457 121224 2.34e-49 3.00
8 1.37 morans_z Stimula… Control 328 457 105355 2.90e-22 1.10
9 3.15 morans_z Stimula… Control 328 457 121287 1.74e-49 2.79
10 2.03 morans_z Stimula… Control 328 457 107974 5.66e-26 1.66
11 4.43 morans_z Stimula… Control 327 457 124819 8.78e-58 3.94
12 3.50 morans_z Stimula… Control 316 457 106863 6.97e-30 3.01
13 1.46 morans_z Stimula… Control 328 457 108907 2.28e-27 1.20
14 2.01 morans_z Stimula… Control 328 457 99968 1.41e-15 1.49
# ℹ 5 more variables: conf.high <dbl>, method <chr>, alternative <chr>,
# marker <chr>, p_adj <dbl>
We see that many proteins are differentially polarized between the two conditions, with higher polarity in the stimulated sample. Each protein is also color coded to reflect the mean polarity score. Based on that we can see that CD50 shows the greatest magnitude of both absolute polarity score and difference from the unsimulated sample. If we take a closer look at the distribution of polarity scores for that protein in particular we see that most of the cells in the unstimulated control have a low degree of polarity of CD50, with polarity scores close to zero, while cells from the stimulated sample have a stronger degree of polarity in general.
# Violin plot of CD50
DefaultAssay(pg_data_combined) <- "mpxCells"
pg_data_combined <- PolarizationScoresToAssay(pg_data_combined)
DefaultAssay(pg_data_combined) <- "polarization"
VlnPlot(pg_data_combined, features = "CD50", group.by = "sample") +
labs(y = "Moran's Z", x = "")
# Don't forget to switch back before proceeding with downstream steps
DefaultAssay(pg_data_combined) <- "mpxCells"
Differential colocalization analysis
Our second question is whether we can observe an difference in colocalization of any protein pairs upon stimulation, which could represent proteins that migrate to the uropod. We will perform a differential colocalization analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX colocalization scores are significantly different in the stimulated sample compared to control.
colocalization_test <- RunDCA(pg_data_combined,
target = "Stimulated",
reference = "Control",
contrast_column = "sample"
)
# Volcano plot
ColocalizationScores(pg_data_combined, meta_data_columns = "sample") %>%
filter(sample == "Stimulated") %>%
group_by(marker_1, marker_2) %>%
summarize(coloc_mean = mean(pearson_z), .groups = "drop") %>%
left_join(colocalization_test, by = c("marker_1", "marker_2")) %>%
ggplot(aes(estimate, -log10(p_adj), label = paste(marker_1, marker_2, sep = "/"), color = coloc_mean)) +
geom_point() +
geom_text_repel(max.overlaps = 10) +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = -10) +
scale_x_continuous(expand = expansion(0.1)) +
scale_y_continuous(expand = expansion(0.1)) +
theme_minimal() +
guides(color = guide_colorbar(title.position = "top")) +
labs(
x = "Median difference",
y = expression(-log[10] ~ (adj. ~ p - value)),
color = "Mean colocalization score"
) +
theme(
legend.position = "bottom",
legend.justification = "right"
)
# Filter proteins that differ with more than 1 standard deviation
colocalization_test %>%
filter(abs(estimate) >= 1, p_adj < 0.001) %>%
arrange(-estimate)
# A tibble: 469 × 15
estimate data_type target reference n1 n2 statistic p conf.low
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 4.96 pearson_z Stimula… Control 327 457 106755 1.24e-24 4.09
2 4.64 pearson_z Stimula… Control 327 454 104493 2.25e-22 3.79
3 4.43 pearson_z Stimula… Control 328 454 101150 1.10e-17 3.50
4 -1.38 pearson_z Stimula… Control 327 454 58524 4.44e- 7 -1.93
5 -1.40 pearson_z Stimula… Control 328 457 57924 5.54e- 8 -1.90
6 -1.43 pearson_z Stimula… Control 327 454 57828 1.34e- 7 -1.97
7 -1.47 pearson_z Stimula… Control 328 457 59388 6.85e- 7 -2.08
8 -1.48 pearson_z Stimula… Control 328 457 58594 1.80e- 7 -2.04
9 -1.48 pearson_z Stimula… Control 316 438 54442 5.67e- 7 -2.09
10 -1.54 pearson_z Stimula… Control 328 457 59468 7.81e- 7 -2.18
# ℹ 459 more rows
# ℹ 6 more variables: conf.high <dbl>, method <chr>, alternative <chr>,
# marker_1 <chr>, marker_2 <chr>, p_adj <dbl>
We find that many protein pairs have significantly different colocalization in the uropod sample in comparison to control. Most of the pairs display a decreased colocalization, while a low number of protein pairs increase in their colocalization. Let’s display the measured differences in colocalization scores in a heatmap to get a better look at the effect for each protein pair.
ColocalizationHeatmap(colocalization_test,
fontsize = 6,
cellwidth = 6,
cellheight = 6
)
We see that CD37, CD50, CD162 increase in colocalization to each other, while their colocalization is decreased to several other proteins, with largest effect towards proteins that we expect to be quite ubiquitous on the T cell, including B2M, HLA-ABC, CD3, CD2, CD18, and CD45. Combined, these two pieces of information seem to indicate that CD37, CD50, CD162 move from being dispersed and intermixed with other protein to cluster and colocalize together, which is consistent with our hypothesis that these proteins migrate to form the uropod upon stimulation. See the tutorial on 2D cell visualization to see some of these cells’ graphs visualized.
We can also visualize the colocalization of these markers in each sample as a network where each node represents a marker and the color and width of each connecting edge represents the average colocalization score.
markers_to_plot <- c("CD50", "CD162", "CD37", "CD18", "CD45")
df_net <- ColocalizationScores(pg_data_combined) %>%
filter(marker_1 %in% markers_to_plot & marker_2 %in% markers_to_plot) %>%
select(marker_1, marker_2, pearson_z, component) %>%
mutate(treatment = gsub("_.*", "", component)) %>%
group_by(treatment, marker_1, marker_2) %>%
summarize(average_coloc = mean(pearson_z)) %>%
filter(!is.na(average_coloc)) %>%
mutate(from = marker_1,
to = marker_2)
tidygraph::as_tbl_graph(df_net) %>%
ggraph(layout = "fr") +
geom_edge_link(aes(color = average_coloc, width = abs(average_coloc))) +
geom_node_label(
aes(label = name),
size = 3
) +
scale_edge_color_gradientn(
colours = c("#728BB1", "white", "#CD6F8D"),
limits = c(-max(abs(df_net$average_coloc)), max(abs(df_net$average_coloc)))
) +
facet_edges(~treatment, nrow = 1) +
theme(aspect.ratio = 1,
panel.background = element_rect(fill = "white"),
panel.spacing = unit(1, "lines"))
These networks clearly show that CD37, CD50 and CD162 colocalize with each other but are strongly separated from markers such as CD45 and CD18. In addition, the stimulation seems to have strengthened the patterns that were already visible in the control cells.
Let’s take a closer look at CD50 and its colocalization with CD162. We see that these two proteins have a higher degree of colocalization in the stimulated sample and when cells are plotted by the colocalization of CD50 and CD162 against the CD45 and CD50, we see that many cells from the stimulated sample are distributed towards lower colocalization between CD50 and CD45 and increased colocalization of CD50 and CD162.
# Make sure that we are using the correct assay
DefaultAssay(pg_data_combined) <- "mpxCells"
# Convert colocalization table to a new assay
pg_data_combined <- ColocalizationScoresToAssay(pg_data_combined)
DefaultAssay(pg_data_combined) <- "colocalization"
# Violin plot of CD162-CD50
VlnPlot(pg_data_combined, features = "CD162-CD50", group.by = "sample")
# Scatter of CD162-CD50 and CD45-CD50
FeatureScatter(pg_data_combined,
feature1 = "CD162-CD50",
feature2 = "CD45-CD50",
group.by = "sample"
) +
theme_minimal() +
theme(plot.title = element_blank())
Now let’s take a holistic view of colocalization scores and make a UMAP. We will filter the proteins we want to include to only the ones that had the largest difference in colocalization score between our two samples.
# Filter to >5 std difference
proteins_of_interest <-
colocalization_test %>%
filter(abs(estimate) >= 5, p_adj < 0.001) %>%
arrange(-estimate) %>%
select(marker_1, marker_2) %>%
unlist() %>%
unique()
# Find pairs of interest
pairs_of_interest <- colocalization_test %>%
select(marker_1, marker_2) %>%
filter(
marker_1 %in% proteins_of_interest,
marker_2 %in% proteins_of_interest
) %>%
tidyr::unite(marker_1, marker_2, col = "pairs", sep = "-") %>%
pull(pairs)
# Set variable faetures
VariableFeatures(pg_data_combined) <- pairs_of_interest
# Create UMAP
pg_data_combined <-
pg_data_combined %>%
ScaleData() %>%
RunPCA() %>%
RunUMAP(dims = 1:10, reduction = "pca")
# Plot
DimPlot(pg_data_combined, group.by = "sample") +
theme_bw()
Armed with the powerful insights gleaned from analyzing MPX data, we have delved into the spatial organization of proteins on the cell surface, uncovering how they cluster and interact differently under various conditions. The next tutorial in this series takes us one step further by exploring the creation of 2D cell graph visualizations. This exciting approach will allow us to not only see, but truly feel, the dynamic nature of proteins as they polarize and colocalize, offering a deeper understanding of their roles in shaping cellular function.