Skip to main content
Version: 0.18.x

Visualizing spatial patterns

In the spatial analysis tutorial, we explored the polarity and colocalization metrics that allow us to identify protein-specific patterns of spatial organization across cell populations. Once a cell population has been defined with a large effect size of a spatial metric of interest, we can select components from that population and draw 2D or 3D layouts to show the spatial distribution of protein markers.

The polarity and colocalization metrics are summary metrics that indicate the presence of a general spatial pattern in a component, for example, whether one marker is clustered (polarity) or if two markers are colocalized (colocalization). However, these metrics do not tell us where in the component graph the polarity or colocalization event appear. For this, we can use a local spatial metric that can help us to score nodes in the graph based on how much the local marker abundance deviates from the expectation.

In this tutorial, we will learn how to use the Local G spatial metric to identify regions of MPX component graphs where a spatial pattern in marker abundance is present. In the 2D and 3D visualization tutorials, we could already see some patterns by simply coloring nodes based on marker abundance. However, for low abundant markers, such patterns can be quite challenging to pick up since the marker counts per node are sparse. The Local G spatial metric summarizes the abundance over an extended local neighborhood around each node which addresses this problem.

Setup

library(dplyr)
library(tidyr)
library(here)
library(pixelatorR)
library(stringr)
library(SeuratObject)
library(ggplot2)

Load data

We will continue with the stimulated T cell data that we analyzed in the previous tutorial (Karlsson et al, Nature Methods 2024). Here, we read the data files and apply the same filtering steps as in Spatial Analysis.

We begin by downloading and loading the PXL file that is needed for this first part of the Data Handling tutorial.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)

Run the following code in the terminal to download the dataset.

BASEURL="https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/0.18.x/uropod-t-cells-v1.0-immunology-I"

curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_control.layout.dataset.pxl"
curl -Ls -O -C - --create-dirs --output-dir $DATA_DIR "${BASEURL}/Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"

And than proceed in R.

# The folders the data is located in:
data_files <-
c(Control = file.path(DATA_DIR, "Uropod_control.layout.dataset.pxl"),
Stimulated = file.path(DATA_DIR, "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"))


# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)

# Combine data to Seurat object
pg_data_combined <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data))

# Add sample column to meta data
pg_data_combined <-
AddMetaData(pg_data_combined,
metadata = str_remove(colnames(pg_data_combined), "_.*"),
col.name = "sample")
rm(pg_data)
gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)  max used  (Mb)
Ncells 3516627 187.9 5369504 286.8 NA 5369504 286.8
Vcells 34197691 261.0 113903636 869.1 102400 105874639 807.8
# Filter cells
pg_data_combined <-
pg_data_combined %>%
subset(molecules >= 8000 & tau_type == "normal")

Now we can select components with high colocalization for a specific marker pair to get some good example components to work with. First, we will filter our colocalization scores to keep only CD162/CD50 and then we will add the abundance data for CD162, CD37 and CD50 to the colocalization table.

colocalization_scores <- 
ColocalizationScores(pg_data_combined, meta_data_columns = "sample") %>%
unite(contrast, marker_1, marker_2, sep = "/") %>%
filter(contrast == "CD162/CD50") %>%
left_join(y = FetchData(pg_data_combined, vars = c("CD162", "CD50", "CD37"), layer = "counts") %>%
as_tibble(rownames = "component"),
by = "component")

Next, we will select components with high colocalization scores for CD162/CD50. In the histogram below, we see the distribution of Pearson Z scores for the CD162/CD50 colocalization. Since we want to narrow down the selection of components, we set a threshold for the Pearson Z scores. Here, we use a threshold of 35 to keep the components with the highest colocalization scores. Note that the threshold is arbitrary and should be adjusted based on the data.

# Set Pearson Z threshold
pearson_z_threshold <- 35

ggplot(colocalization_scores, aes(pearson_z)) +
geom_histogram(bins = 50) +
geom_vline(xintercept = pearson_z_threshold, linetype = "dashed") +
labs(title = "Pearson Z scores [CD162/CD50]")

Then we apply our threshold to get a final selection of 13 components.

plot_components <- 
colocalization_scores %>%
filter(pearson_z > pearson_z_threshold, sample == "Stimulated") %>%
arrange(-CD162) %>%
select(component, pearson_z, CD162, CD50, CD37)
plot_components
# A tibble: 13 × 5
component pearson_z CD162 CD50 CD37
<chr> <dbl> <dbl> <dbl> <dbl>
1 Stimulated_RCVCMP0000082 57.6 1039 1105 598
2 Stimulated_RCVCMP0000045 39.5 929 1205 621
3 Stimulated_RCVCMP0000132 42.9 688 700 444
4 Stimulated_RCVCMP0000568 36.8 566 462 185
5 Stimulated_RCVCMP0000296 38.5 486 738 248
6 Stimulated_RCVCMP0000308 47.2 485 557 347
7 Stimulated_RCVCMP0000535 39.3 412 651 286
8 Stimulated_RCVCMP0000059 36.6 302 467 115
9 Stimulated_RCVCMP0001141 37.8 211 113 41
10 Stimulated_RCVCMP0000122 42.0 193 241 79
11 Stimulated_RCVCMP0000453 43.9 192 212 93
12 Stimulated_RCVCMP0001049 35.8 108 212 75
13 Stimulated_RCVCMP0000431 41.5 68 118 69

Local G scores

Local G is a Z score that measures how much the marker counts in a local neighborhood deviates from the global mean. Neighborhoods are formed around each node in the graph, where the size of the neighborhood is determined by the k parameter. The larger k is, the larger the neighborhood around each node becomes, which in turn increases the spatial scale at which the score is calculated.

For instance, if we set k = 1, the local neighborhood around each node will be the direct neighbors (1 step away). If we set k = 2, the neighborhood will include the direct neighbors and the neighbors that are 2 steps away.

The effect of using a higher k is that the spatial scale increases; we are leveraging information from node that are further away in the graph, and the local G score will be more smoothed out. You can visit our algorithms section about Local G if you want to learn more about this statistic.

Before we proceed, we first need to load the component graphs of interest:

pg_data_combined <- LoadCellGraphs(pg_data_combined, cells = plot_components$component)

We can compute the Local G metric for a single component graph with the local_G function. This function requires a graph object and a node count matrix as input and returns a matrix where each row corresponds to a node in the graph and each column corresponds to a marker. Here, we select one of the components from our plot_components table.

cg <- CellGraphs(pg_data_combined)[["Stimulated_RCVCMP0001141"]]
counts <- CellGraphData(cg, "counts")
g <- CellGraphData(cg, "cellgraph")

# Compute local G
gi_mat <- local_G(g, counts, k = 1)
gi_mat[1:5, 1:5]
                                   B2M       CD152        CD18      CD197
ACTACGGTAGTTATCAAATCTAGTG-A -0.2029695 0.74505561 0.03396755 0.2031598
TAGGTTGAAGTTATCCTTTTCTTTG-A 1.1800282 30.75888074 0.90995322 0.7820440
ATGCTTGGACATGAGGTATATGCAG-A 1.8818359 -0.06803439 1.91521763 1.0269832
GTATAAAAATTTGGCCATGATCTTA-A 1.1246014 -0.07468989 1.24239984 4.0527747
GTAGTAATGTGAAGTGATTTGGGGT-A -0.9036178 -0.05201513 -0.55971691 -0.2505301
CD278
ACTACGGTAGTTATCAAATCTAGTG-A 0.01165937
TAGGTTGAAGTTATCCTTTTCTTTG-A -0.17776235
ATGCTTGGACATGAGGTATATGCAG-A -0.21809074
GTATAAAAATTTGGCCATGATCTTA-A -0.23942557
GTAGTAATGTGAAGTGATTTGGGGT-A -0.16673946
note

There are in fact to definitions of local G: GiG_i and GiG_i^*. The main difference between these two variants is how they define a neighborhood around a center node ii; GiG_i excludes ii from the neighborhood while GiG_i^* includes it. You can see the full definitions of the two statistics in the algorithms section for Local G. For visualization purposes, the choice between GiG_i and GiG_i^* is not critical, but it can affect the interpretation of the scores. You can select what method to use by setting type to one of “gi” or “gstari” in the local_G function.

In the plot below, we show the distribution (histogram) of local G scores for CD162 with k = 1 for our selected component. We see that a large fraction of nodes have G scores around 0, while a smaller fraction of the nodes (right tail) have very high scores. The latter corresponds to nodes that are located in regions of the graph where the abundance of CD162 is higher than expected.

gi_mat[, "CD162"] %>% 
as.data.frame() %>%
setNames(c("CD162")) %>%
ggplot(aes(x = CD162)) +
geom_histogram(bins = 50) +
geom_vline(xintercept = 1, color = "red", linetype = "dashed") +
labs(title = "Local G scores [CD162]", x = "Local G", y = "Number of nodes")

If we want to visualize the spatial distribution of the local G scores, we can project them back onto a 3D layout as we did in the 3D cell visualization tutorial. The local G scores cannot be directly visualized with Plot3DGraph, so we will need to create the plot manually.

First, let’s compute 3D weighted pmds layouts for our loaded component graphs:

pg_data_combined <- ComputeLayout(pg_data_combined, dim = 3, pivots = 200, layout_method = "wpmds")
cg <- CellGraphs(pg_data_combined)[["Stimulated_RCVCMP0001141"]]

In the plot below, we project the local G scores for CD162 onto the wpmds 3D layout. The color scale is centered at 0, since we are visualizing Z scores.

layout <- CellGraphData(cg, "layout")$wpmds_3d

# Fetch local G scores for CD162
vals <- gi_mat[, "CD162"]

max_abs_val <- max(abs(vals))

plotly::plot_ly(
layout,
type = "scatter3d",
mode = "markers",
x = ~x, y = ~y, z = ~z,
size = 1,
marker = list(color = vals, colorscale = "RdBu", opacity = 1,
size = scales::rescale(vals, to = c(20, 100)),
cmin = -max_abs_val, cmax = max_abs_val, # Center color scale at 0
line = list(width = 0))) %>%
plotly::layout(title = "Gi scores for CD162 (k = 1)",
scene = list(xaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F),
yaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F),
zaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F)))

Note that the Local G scores can have extreme outliers which tend to skew the color scale. To mitigate this, we can trim the values to a certain quantile range. Below, we adjust values higher than the 99th percentile.

layout <- CellGraphData(cg, "layout")$wpmds_3d

vals[vals > quantile(vals, 0.99)] <- quantile(vals, 0.99)

max_abs_val <- max(abs(vals))

plotly::plot_ly(
layout,
type = "scatter3d",
mode = "markers",
x = ~x, y = ~y, z = ~z,
size = 1,
marker = list(color = vals, colorscale = "RdBu", opacity = 1,
size = scales::rescale(vals, to = c(20, 100)),
cmin = -max_abs_val, cmax = max_abs_val,
line = list(width = 0))) %>%
plotly::layout(title = "Gi scores for CD162 (k = 1)",
scene = list(xaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F),
yaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F),
zaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F)))

If we generate the same plot using raw counts, we can see that the spatial pattern is still quite clear, but less concentrated in the uropod.

layout <- CellGraphData(cg, "layout")$wpmds_3d

vals <- CellGraphData(cg, "counts")[, "CD162"] %>% log1p()

plotly::plot_ly(
layout,
type = "scatter3d",
mode = "markers",
x = ~x, y = ~y, z = ~z,
size = 1,
marker = list(color = vals, colorscale = c("lightgrey", "mistyrose", "red", "darkred"), opacity = 1,
size = scales::rescale(vals, to = c(20, 100)),
line = list(width = 0))) %>%
plotly::layout(title = "log1p counts for CD162 (k = 1)",
scene = list(xaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F),
yaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F),
zaxis = list(title = "", showgrid = F, nticks = 0, showticklabels = F)))