Spatial Analyses
In this tutorial we will show you how you can analyze the spatial configuration of proteins on the cell surface of single cells, using MPX data.
Pixelator will generate a number of metrics from the graph structure inherent to the MPX data. In this tutorial we will show you how you can approach spatial analysis using the MPX polarity scores, and MPX colocalization scores to get a deeper understanding of how proteins are organized on the cell surface.
The polarity score measures the degree of non-random spatial clustering of a protein on the cell surface of a single cell, using Moran’s I. Colocalization, on the other hand, measures the co-abundance of two proteins in small regions on the surface of a single cell. Here we will use both these measures to look at the spatial changes that occurs when T cells take on a migratory phenotype.
In this tutorial we will use both these measures to look at the spatial changes that occurs when T cells take on a migratory phenotype. Therefore, we start from the beginning with a dataset of T cells that have been stimulated to take on such a migratory phenotype (Karlsson et al, Nature Methods 2024). In this experiment T cells have been immobilized with CD54 and stimulated with a chemokine to induce them to form a uropod; a bulge on the cell to help propel the cell forward. Upon uropod formation, some proteins on the surface of the cell migrate to one pole to form the bulge, the uropod, which can be captured by MPX and manifests itself in the data as an increase in polarization and colocalization of the proteins participating in the uropod.
We will load two samples, one that has undergone the treatment outlined above, and a control sample.
After completing this tutorial, you will be able to:
-
Extract key features from MPX data: Analyze the inherent graph structure of MPX data to understand protein clustering (polarity) and co-abundance (colocalization) on cell surfaces.
-
Quantify spatial changes in protein organization: Utilize MPX polarity scores and Wilcoxon Rank Sum tests to measure and compare protein clustering between different conditions, revealing spatial rearrangements upon cellular stimulation.
-
Identify changes in protein colocalization: Conduct differential colocalization analysis to pinpoint protein pairs with significant changes in co-occurance across different samples, uncovering potential protein interactions under specific conditions.
-
Visualize and interpret data: Effectively communicate your findings through volcano plots and heatmaps for differential analyses, and leverage UMAPs to explore relationships between colocalization scores, identifying distinct protein groups based on their spatial dynamics.
-
Connect spatial data to biological functions: Correlate observed changes in protein organization with cellular phenotypes or functions to gaining deeper insights into protein interactions and their roles in biological processes like cellular migration.
Setup
library(dplyr)
library(tibble)
library(stringr)
library(ggrepel)
library(Seurat)
library(here)
library(pheatmap)
library(ggplot2)
library(tidygraph)
library(ggraph)
library(pixelatorR)
Load data
We begin by loading and merging the two samples, so that they can easily be analyzed together.
DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)
Run the following code in the terminal to download the dataset.
BASEURL="https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/0.18.x/uropod-t-cells-v1.0-immunology-I"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_control.layout.dataset.pxl"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"
# The folders the data is located in:
data_files <-
c(Control = file.path(DATA_DIR, "Uropod_control.layout.dataset.pxl"),
Stimulated = file.path(DATA_DIR, "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"))
# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)
# Combine data to Seurat object
pg_data_combined <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data)
)
# Add sample column to meta data
pg_data_combined <-
AddMetaData(pg_data_combined,
metadata = str_remove(colnames(pg_data_combined), "_.*"),
col.name = "sample")
Process data
First we need to clean up this data with a few simple quality control steps, by removing cells with fewer than 8000 unique antibody molecules detected (molecules), or those marked as possible antibody aggregates by pixelator.
# Filter cells
pg_data_combined <-
pg_data_combined %>%
subset(molecules >= 8000 & tau_type == "normal") %>%
JoinLayers()
Not all proteins will always be expressed by every cell, i.e. some markers might have very low abundance in some cells. For such proteins the spatial scores, polarity and colocalization, should be excluded in the corresponding cells. In pixelator versions >=0.18 we automatically exclude polarity and colocalization scores in a component (cell) for proteins with fewer than 5 counts in that component. This is a conservative threshold to ensure that no significant signals are removed. Typically, further filtration should be performed before spatial analysis to reduce the potential noise from low abundance markers. These further filter steps will be described in the sections on differential analyses below.
Differential polarity analysis
Our first question is whether we can observe an increase in polarity of any proteins upon stimulation, which could indicate an increase in uropod formation. To start with let’s take a look at the polarity scores of all proteins across the samples.
plot_data <- PolarizationScores(pg_data_combined, meta_data_columns = "sample")
# Create the plot
ggplot(
plot_data,
aes(
x = morans_z,
y = reorder(marker, morans_z, function(x) var(x)),
color = sample
)
) +
geom_point(
position = position_jitterdodge(dodge.width = 0.9, jitter.width = 0.3),
size = 0.5,
alpha = 0.6
) +
labs(
x = "Moran's Z",
y = "",
title = ""
) +
theme_minimal() +
theme(legend.position = "right")
Here we already spot some proteins that have different distributions in the control and stimulated conditions. To formally compare the samples we will perform a differential polarity analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX polarity scores are significantly different in the stimulated sample and control.
Removal of low abundance markers
As described above, we will perform some further filtering steps to remove lowly abundant markers. Depending on the nature and quality of the dataset we can decide on heuristic strategies such as to only include markers that have either more than for example 20 reads per component or an abundance that is higher than the most abundant isotype control. To inform the choice of cutoff value, we can plot the distribution of the most abundant isotype control. As seen in the plot, we observe that most components have a maximum isotype control abundance between 5 and 20 counts. Markers with fewer counts, can be considered as negative in this component.
# Collect the isotype controls
isotype_controls <- c("mIgG1", "mIgG2a", "mIgG2b")
# plot isotype abundance distribution
pg_data_combined %>%
JoinLayers() %>%
LayerData(layer = "counts") %>%
as.data.frame() %>%
tibble::rownames_to_column("marker") %>%
tidyr::pivot_longer(-marker, names_to = "component", values_to = "counts") %>%
filter(marker %in% isotype_controls) %>%
group_by(component) %>%
summarize(iso_max = max(counts)) %>%
ggplot(aes(iso_max)) +
geom_histogram(binwidth = 1) +
xlab("Maximum abundance of isotype control per component") +
ylab("Number of components") +
xlim(0,100) +
theme_bw()
# Create a filtered polarization object.
# Start with all scores
pol_scores <- PolarizationScores(pg_data_combined, meta_data_columns = "sample")
# Collect the isotype controls
isotype_controls <- c("mIgG1", "mIgG2a", "mIgG2b")
# Create an object with the count data per component and marker
cts <- pg_data_combined %>%
JoinLayers() %>%
LayerData(layer = "counts") %>%
as.data.frame() %>%
tibble::rownames_to_column("marker") %>%
tidyr::pivot_longer(-marker, names_to = "component", values_to = "counts")
# Create an object with the maximum abundance of the isotype controls per component
max_of_ctrls <- cts %>%
filter(marker %in% isotype_controls) %>%
group_by(component) %>%
summarize(iso_max = max(counts))
# Add this info to the polarization scores
augmented_pol_scores <- pol_scores %>%
left_join(cts) %>%
left_join(max_of_ctrls) %>%
rename(marker_count = counts,
isotype_control_counts = iso_max)
# Filter these to only retain markers with an abundance higher than 20 or the maximum isotype control abundance (whichever is higher)
augmented_pol_scores <- augmented_pol_scores %>%
rowwise() %>%
filter(marker_count > max(20, isotype_control_counts))
After removing markers from components where they are not significantly expressed, some markers might only be present in a few components.
augmented_pol_scores %>%
group_by(marker) %>%
summarise(n = n(),
median = median(morans_z)) %>%
ggplot(aes(x = n, y = median)) +
geom_point() +
xlab("count_of_expressing_cells") +
ylab("morans_z_median") +
ggtitle("Polarity Scores") +
theme_bw()
We observe that markers that appear in only a few components have a high range of median polarity scores. This is not unexpected as these are medians of very few components and more affected by individual outliers. Therefore, in this example we exclude markers that appear in fewer than 100 components.
Note that the choice of removing markers that are present in fewer components is likely depending on your experimental goals. If you would like to detect markers that change polarity in only a small subpopulation of cells, you might want to retain these for the downstream analysis.
# Filter to retain markers that are present in more than 100 cells per sample - note that this cutoff is dataset dependent.
pol_scores_filtered <- augmented_pol_scores %>%
group_by(marker, sample) %>%
filter(n() > 100)
# Filter markers - keep only those that are present in both samples
keep_markers <- pol_scores_filtered %>%
select(marker, sample) %>%
unique() %>%
group_by(marker) %>%
summarise(n = n()) %>%
filter(n == 2) %>%
pull(marker)
This filtered object will then be analyzed with the RunDPA
function.
# Run the polarization test on the filtered scores object (for the present markers only)
pol_test <- RunDPA(pol_scores_filtered %>% filter(marker %in% keep_markers),
target = "Stimulated",
reference = "Control",
contrast_column = "sample"
)
# Visualize this in a volcano plot
PolarizationScores(pg_data_combined, meta_data_columns = "sample") %>%
filter(sample == "Stimulated") %>%
group_by(marker) %>%
summarize(pol_mean = mean(morans_z)) %>%
left_join(pol_test, by = "marker") %>%
ggplot(aes(estimate, -log10(p_adj), label = marker, color = pol_mean)) +
geom_point() +
geom_text_repel(max.overlaps = 5) +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = 3) +
xlim(-5, 8) +
theme_minimal() +
guides(color = guide_colorbar(title.position = "top")) +
labs(
x = "Median difference",
y = expression(-log[10] ~ (adj. ~ p - value)),
color = "Mean polarity"
) +
theme(
legend.position = "bottom",
legend.justification = "right"
)
pol_test %>%
filter(abs(estimate) >= 1, p_adj < 0.001)
# A tibble: 16 × 14
estimate data_type target reference n1 n2 statistic p conf.low
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 1.16 morans_z Stimula… Control 326 458 98751 1.26e-14 0.838
2 1.22 morans_z Stimula… Control 149 407 42322 8.53e-13 0.881
3 4.40 morans_z Stimula… Control 291 436 102609 2.91e-45 3.82
4 1.43 morans_z Stimula… Control 326 458 103092 9.1 e-20 1.12
5 1.68 morans_z Stimula… Control 320 447 102583 9.99e-25 1.34
6 1.16 morans_z Stimula… Control 213 340 51018 5.58e-16 0.872
7 1.22 morans_z Stimula… Control 322 451 105230 1.6 e-26 1.01
8 5.21 morans_z Stimula… Control 259 347 76151 1.57e-48 4.46
9 1.76 morans_z Stimula… Control 113 248 19809 2.89e-10 1.15
10 1.61 morans_z Stimula… Control 328 458 101230 8.74e-17 1.23
11 1.02 morans_z Stimula… Control 326 458 97190 5.56e-13 0.742
12 3.96 morans_z Stimula… Control 281 137 28330 4.78e-15 2.99
13 1.20 morans_z Stimula… Control 306 355 76134 4.98e-19 0.942
14 2.65 morans_z Stimula… Control 178 321 43650 1.46e-22 2.03
15 1.44 morans_z Stimula… Control 318 455 102183 1.56e-22 1.14
16 1.89 morans_z Stimula… Control 328 458 104389 1.09e-20 1.42
# ℹ 5 more variables: conf.high <dbl>, method <chr>, alternative <chr>,
# marker <chr>, p_adj <dbl>
We see that many proteins are differentially polarized between the two conditions, with higher polarity in the stimulated sample. Each protein is also color coded to reflect the mean polarity score. Based on this we can see that for example CD50 shows both a large absolute polarity score and a sizeable difference to the unstimulated sample. If we take a closer look at the distribution of polarity scores for that protein in particular we see that most of the cells in the unstimulated control have a low degree of polarity of CD50, with polarity scores close to zero, while cells from the stimulated sample have a stronger degree of polarity in general.
# Violin plot of CD50
DefaultAssay(pg_data_combined) <- "mpxCells"
pg_data_combined <- PolarizationScoresToAssay(pg_data_combined)
DefaultAssay(pg_data_combined) <- "polarization"
VlnPlot(pg_data_combined, features = "CD50", group.by = "sample") +
labs(y = "Moran's Z", x = "")
# Don't forget to switch back before proceeding with downstream steps
DefaultAssay(pg_data_combined) <- "mpxCells"
Differential colocalization analysis
Our second question is whether we can observe an difference in colocalization of any protein pairs upon stimulation, which could represent proteins that migrate to the uropod. We will perform a differential colocalization analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX colocalization scores are significantly different in the stimulated sample compared to control.
Removal of low abundance markers
As described for the differential polarization analysis, we will perform some further filtering steps to remove lowly abundant markers. We will adopt the same strategy as for the polarization analysis but now filter on the abundance of the most lowly abundant marker of the marker pair.
# Create a filtered colocalization object.
# Start with all scores
col_scores <- ColocalizationScores(pg_data_combined, meta_data_columns = "sample")
# Add this info to the colocalization scores
augmented_col_scores <- col_scores %>%
left_join(cts, by = c("marker_1" = "marker", "component")) %>%
rename(marker_1_count = counts) %>%
left_join(cts, by = c("marker_2" = "marker", "component")) %>%
rename(marker_2_count = counts) %>%
left_join(max_of_ctrls) %>%
rename(isotype_control_counts = iso_max) %>%
rowwise() %>%
mutate(lower_marker_count = min(marker_1_count, marker_2_count))
# Filter these to only retain marker pairs where the lowest abundant marker has an abundance higher than 20 or the maximum isotype control abundance (whichever is higher)
augmented_col_scores <- augmented_col_scores %>%
rowwise() %>%
filter(lower_marker_count > max(20, isotype_control_counts))
After the removal of marker pairs from components where they are not significantly abundant some marker pairs might only be present in a few components.
augmented_col_scores %>%
group_by(marker_1, marker_2) %>%
summarise(n = n(),
median = median(pearson_z)) %>%
ggplot(aes(x = n, y = median)) +
geom_point() +
xlab("count_of_expressing_cells") +
ylab("pearson_z_median") +
ggtitle("Colocalization Scores") +
theme_bw()
We observe that marker pairs that appear in only a few components have a
high range of median
colocalization scores. This is not unexpected as these are medians of
very few components and more affected
by individual outliers. Therefore, in this example we exclude markers
that appear in fewer than 100 components.
Note that the choice of removing marker pairs that are present in fewer components is likely depending on your experimental goals. If you would like to detect marker pairs that change colocalization in only a small subpopulation of cells, you might want to retain these for the downstream analysis.
# Filter to retain markers that are present in more than 100 cells per sample - note that this cutoff is dataset dependent.
col_scores_filtered <- augmented_col_scores %>%
group_by(marker_1, marker_2, sample) %>%
filter(n() >= 100)
This filtered object will then be analyzed with the RunDCA
function.
colocalization_test <- RunDCA(col_scores_filtered,
target = "Stimulated",
reference = "Control",
contrast_column = "sample"
)
# Volcano plot
ColocalizationScores(pg_data_combined, meta_data_columns = "sample") %>%
filter(sample == "Stimulated") %>%
group_by(marker_1, marker_2) %>%
summarize(coloc_mean = mean(pearson_z), .groups = "drop") %>%
left_join(colocalization_test, by = c("marker_1", "marker_2")) %>%
ggplot(aes(estimate, -log10(p_adj), label = paste(marker_1, marker_2, sep = "/"), color = coloc_mean)) +
geom_point() +
geom_text_repel(max.overlaps = 5) +
scale_color_gradient2(low = "#0066FFFF", mid = "#74D055FF", high = "#FF0000FF", midpoint = -10) +
scale_x_continuous(expand = expansion(0.1)) +
scale_y_continuous(expand = expansion(0.1)) +
theme_minimal() +
guides(color = guide_colorbar(title.position = "top")) +
labs(
x = "Median difference",
y = expression(-log[10] ~ (adj. ~ p - value)),
color = "Mean colocalization score"
) +
theme(
legend.position = "bottom",
legend.justification = "right"
)
# Filter proteins that have a median difference colocalization score > 1
colocalization_test %>%
filter(abs(estimate) >= 1, p_adj < 0.001) %>%
arrange(-estimate)
# A tibble: 159 × 15
estimate data_type target reference n1 n2 statistic p conf.low
<dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
1 14.6 pearson_z Stimula… Control 230 331 65768 9.87e-49 13.2
2 14.0 pearson_z Stimula… Control 232 113 21306 4.15e-21 11.5
3 10.6 pearson_z Stimula… Control 260 133 25937 4.87e-16 8.35
4 8.69 pearson_z Stimula… Control 103 237 19030 2.54e-16 6.72
5 8.46 pearson_z Stimula… Control 106 234 19557 1.57e-17 6.39
6 7.78 pearson_z Stimula… Control 255 345 64659 7.01e-23 6.31
7 6.96 pearson_z Stimula… Control 102 239 18761 3.17e-15 5.11
8 6.87 pearson_z Stimula… Control 103 315 22044 4.53e- 8 4.13
9 6.78 pearson_z Stimula… Control 326 458 105894 1.59e-23 5.36
10 6.72 pearson_z Stimula… Control 252 328 62468 4.23e-26 5.57
# ℹ 149 more rows
# ℹ 6 more variables: conf.high <dbl>, method <chr>, alternative <chr>,
# marker_1 <chr>, marker_2 <chr>, p_adj <dbl>
We find that many marker pairs have significantly different colocalization in the uropod sample in comparison to control. Most of the pairs display a decreased colocalization, while a low number of marker pairs increase in their colocalization. Let’s display the measured differences in colocalization scores in a heatmap to get a better look at the effect for each marker pair.
ColocalizationHeatmap(colocalization_test,
fontsize = 6,
cellwidth = 6,
cellheight = 6
)
We see that CD37, CD50, CD162 increase in colocalization to each other, while their colocalization is decreased to several other proteins, with largest effect towards proteins that we expect to be quite ubiquitous on the T cell, including B2M, HLA-ABC, CD3, CD2, CD18, and CD45. Combined, these two pieces of information seem to indicate that CD37, CD50, CD162 move from being dispersed and intermixed with other protein to cluster and colocalize together, which is consistent with our hypothesis that these proteins migrate to form the uropod upon stimulation. See the tutorial on 2D cell visualization to see some of these cells’ graphs visualized.
We can also visualize the colocalization of these markers in each sample as a network where each node represents a marker and the color and width of each connecting edge represents the average colocalization score.
markers_to_plot <- c("CD50", "CD37", "CD162", "CD18", "CD45")
df_net <- ColocalizationScores(pg_data_combined) %>%
filter(marker_1 %in% markers_to_plot & marker_2 %in% markers_to_plot) %>%
select(marker_1, marker_2, pearson_z, component) %>%
mutate(treatment = gsub("_.*", "", component)) %>%
group_by(treatment, marker_1, marker_2) %>%
summarize(average_coloc = mean(pearson_z)) %>%
mutate(from = marker_1,
to = marker_2)
tidygraph::as_tbl_graph(df_net) %>%
ggraph(layout = "kk") +
geom_edge_link(aes(color = average_coloc, width = abs(average_coloc))) +
geom_node_label(
aes(label = name),
size = 3
) +
scale_edge_color_gradientn(
colours = c("#728BB1", "white", "#CD6F8D"),
limits = c(-abs(max(df_net$average_coloc)), abs(max(df_net$average_coloc)))
) +
facet_edges(~treatment, nrow = 1) +
theme(aspect.ratio = 1,
panel.background = element_rect(fill = "white"),
panel.spacing = unit(1, "lines"))
These networks clearly show that CD37, CD50 and CD162 colocalize with each other but are distinct from the colocalizing pair CD45 and CD18. In addition, the stimulation seems to have strengthened the patterns that were already visible in the control cells.
Let’s take a closer look at CD50 and its colocalization with CD162. We see that these two proteins have a higher degree of colocalization in the stimulated sample and when cells are plotted by the colocalization of CD50 and CD162 against the CD45 and CD50, we see that many cells from the stimulated sample are distributed towards lower colocalization between CD50 and CD45 and increased colocalization of CD50 and CD162.
# Make sure that we are using the correct assay
DefaultAssay(pg_data_combined) <- "mpxCells"
# Convert colocalization table to a new assay
pg_data_combined <- ColocalizationScoresToAssay(pg_data_combined)
DefaultAssay(pg_data_combined) <- "colocalization"
# Violin plot of CD162-CD50
VlnPlot(pg_data_combined, features = "CD162-CD50", group.by = "sample")
# Scatter of CD162-CD50 and CD45-CD50
FeatureScatter(pg_data_combined,
feature1 = "CD162-CD50",
feature2 = "CD45-CD50",
group.by = "sample"
) +
theme_minimal() +
theme(plot.title = element_blank())
Now let’s take a holistic view of colocalization scores and make a UMAP. We will filter the proteins we want to include to only the ones that had the largest difference in colocalization score between our two samples.
# Filter to >5 std difference
proteins_of_interest <-
colocalization_test %>%
filter(abs(estimate) >= 5, p_adj < 0.001) %>%
arrange(-estimate) %>%
select(marker_1, marker_2) %>%
unlist() %>%
unique()
# Find pairs of interest
pairs_of_interest <- colocalization_test %>%
select(marker_1, marker_2) %>%
filter(
marker_1 %in% proteins_of_interest,
marker_2 %in% proteins_of_interest
) %>%
tidyr::unite(marker_1, marker_2, col = "pairs", sep = "-") %>%
pull(pairs)
# Set variable faetures
VariableFeatures(pg_data_combined) <- pairs_of_interest
# Create UMAP
pg_data_combined <-
pg_data_combined %>%
ScaleData() %>%
RunPCA() %>%
RunUMAP(dims = 1:10, reduction = "pca")
# Plot
DimPlot(pg_data_combined, group.by = "sample") +
theme_bw()
Armed with the powerful insights gleaned from analyzing MPX data, we have delved into the spatial organization of proteins on the cell surface, uncovering how they cluster and interact differently under various conditions. The next tutorial in this series takes us one step further by exploring the creation of 2D cell graph visualizations. This exciting approach will allow us to not only see, but truly feel, the dynamic nature of proteins as they polarize and colocalize, offering a deeper understanding of their roles in shaping cellular function.
We will now save this new object to proceed to the next tutorial where we will visualize the graph components that constitute cells in MPX data in two dimensions.
# Save filtered dataset for next step. This is an optional pause step; if you prefer to continue to the next tutorial without saving the object explicitly, that will also work.
saveRDS(pg_data_combined, file.path(DATA_DIR, "uropod_spatial.rds"))