Skip to main content

Spatial Analyses

In this tutorial we will show you how you can analyze the spatial configuration of proteins on the cell surface of single cells, using MPX data.

Pixelator will generate a number of metrics from the graph structure inherent to the MPX data. In this tutorial we will show you how you can approach spatial analysis using the MPX polarity scores, and MPX colocalization scores to get a deeper understanding of how proteins are organized on the cell surface.

The polarity score measures the degree of non-random spatial clustering of a protein on the cell surface of a single cell, using Moran’s I. Colocalization, on the other hand, measures the co-expression of two proteins in small regions on the surface of a single cell. Here we will use both these measures to look at the spatial changes that occurs when T cells take on a migratory phenotype. In this tutorial we will use a dataset of T cells that have been stimulated to take on such a migratory phenotype (Karlsson et al, bioRxiv 2023). In this experiment T cells have been immobilized with CD54 and stimulated with a chemokine to induce them to form a uropod; a bulge on the cell to help propel the cell forward.

Upon uropod formation, some proteins on the surface of the cell migrate to one pole to form the bulge, the uropod, which can be captured by MPX and manifests itself in the data as an increase in polarization and colocalization of the proteins participating in the uropod.

In this tutorial we will load two samples, one that has undergone the treatment outlined above, and a control sample.

After completing this tutorial, you will be able to:

  • Extract key features from MPX data: Analyze the inherent graph structure of MPX data to understand protein clustering (polarity) and co-expression (colocalization) on cell surfaces.

  • Quantify spatial changes in protein organization: Utilize MPX polarity scores and Wilcoxon Rank Sum tests to measure and compare protein clustering between different conditions, revealing spatial rearrangements upon cellular stimulation.

  • Identify changes in protein colocalization: Conduct differential colocalization analysis to pinpoint protein pairs with significant changes in co-occurance across different samples, uncovering potential protein interactions under specific conditions.

  • Visualize and interpret data: Effectively communicate your findings through volcano plots and heatmaps for differential analyses, and leverage UMAPs to explore relationships between colocalization scores, identifying distinct protein groups based on their spatial dynamics.

  • Connect spatial data to biological functions: Correlate observed changes in protein organization with cellular phenotypes or functions to gaining deeper insights into protein interactions and their roles in biological processes like cellular migration.

Setup

import numpy as np
import pandas as pd
from pathlib import Path
from pixelator import read, simple_aggregate
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu, false_discovery_control, zscore
from statsmodels.stats.multitest import multipletests
import umap

sns.set_style("whitegrid")

Load data

We begin by loading and merging the two samples, so that they can easily be analyzed together.

DATA_DIR = Path.cwd().parents[3] / "data"
data_files = [
DATA_DIR / "Uropod_control.dataset.pxl",
DATA_DIR / "Uropod_CD54_fixed_RANTES_stimulated.dataset.pxl",
]

pg_data_combined = simple_aggregate(
["Control", "Stimulated"], [read(path) for path in data_files]
)

Process data

First we need to clean up this data with a few simple quality control steps, by removing cells with fewer than 8000 unique antibody molecules detected (edges), or those marked as possible antibody aggregates by Pixelator. Additionally, we are removing proteins with a median of less than 5 counts. See the Quality Control tutorial for more details.

components_to_keep = pg_data_combined.adata[
(pg_data_combined.adata.obs["edges"] >= 8000)
& (pg_data_combined.adata.obs["tau_type"] == "normal")
].obs.index

pg_data_combined = pg_data_combined.filter(
components=components_to_keep
)

control_markers = ["ACTB", "mIgG1", "mIgG2a", "mIgG2b"]
counts_df = pg_data_combined.adata.to_df()
counts_df = counts_df[counts_df.columns[(counts_df.median() >= 5)]]
counts_df = counts_df[counts_df.columns[~counts_df.columns.isin(control_markers)]]

pg_data_combined = pg_data_combined.filter(
markers=counts_df.columns
)

Differential polarity analysis

Our first question is whether we can observe an increase in polarity of any proteins upon stimulation, which could signify an increase in uropod formation. To answer this, we will perform a differential polarity analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX polarity scores are significantly different in the stimulated sample compared to control.

def polarization_wilcoxon(marker, df):
control = df[df["sample"] == "Control"]
stimulated = df[df["sample"] == "Stimulated"]
# mannwhitneyu is equivalent to a Wilcoxon test when the groups are
# on different sizes
stat, p_value = mannwhitneyu(
x=control["morans_z"], y=stimulated["morans_z"], alternative="two-sided"
)
estimate = np.median(
stimulated["morans_z"].to_numpy()[:, None] - control["morans_z"].to_numpy()
)
return {"marker": marker, "stat": stat, "p_value": p_value, "estimate": estimate}


# Prepare the polarization data, and do the Wilcoxon test on each marker separately.
polarization = pg_data_combined.polarization
polarization_test_results = pd.DataFrame.from_records(
polarization.groupby(["marker"]).apply(
lambda marker_data: polarization_wilcoxon(marker_data.name, marker_data)
)
)
# Since we do a large number of tests, we will correct their p-values using
# Bonferroni correction.
_, pvals_corrected, *_ = multipletests(
polarization_test_results["p_value"], method="bonferroni"
)
polarization_test_results["p_adj"] = pvals_corrected
polarization_test_results["-log10(adjusted p-value)"] = -np.log10(pvals_corrected)

# Compute the mean moran's z for each marker and merge with polarization test results
filtered_data = polarization[polarization['sample'] == "Stimulated"]
grouped_data = filtered_data.groupby('marker')['morans_z'].mean().reset_index()
plot_data = pd.merge(grouped_data, polarization_test_results, on='marker', how='left')

# Plot the results in a volcano plot
from matplotlib.colors import LinearSegmentedColormap

color_gradient = ['#0066FFFF', '#74D055FF', '#FF0000FF']
cmap = LinearSegmentedColormap.from_list('custom', color_gradient, N=256)

fig, ax = plt.subplots(figsize=(10,8))
p = ax.scatter(x=plot_data['estimate'], y=plot_data['-log10(adjusted p-value)'],
c=plot_data['morans_z'], s= 20, marker='o', cmap=cmap)

for line in range(0, plot_data.shape[0]):
label = plt.text(plot_data['estimate'][line] + 0.1, -np.log10(plot_data['p_adj'][line]),
plot_data['marker'][line], horizontalalignment='left', size='small', color='black')

axlabs = ax.set(xlim=(-6, 6), xlabel='Median difference', ylabel=r'$-\log_{10}$(adj. p-value)')
cbar = fig.colorbar(p, label="Mean polarity", cmap=cmap)

plt.show()

# Show all proteins that are significan and deviate
# by more than 1 standard deviation
significant_polarization_test_results = polarization_test_results[
(polarization_test_results["estimate"] >= 1)
& (polarization_test_results["p_adj"] < 0.001)
]
significant_polarization_test_results
     marker     stat  ...         p_adj  -log10(adjusted p-value)
0 B2M 45789.0 ... 6.388812e-19 18.194580
2 CD11a 25555.0 ... 1.699994e-53 52.769553
5 CD162 30018.0 ... 1.977512e-44 43.703881
6 CD18 54571.0 ... 3.779161e-09 8.422605
12 CD26 47214.0 ... 4.165984e-17 16.380282
17 CD29 48374.0 ... 1.074197e-15 14.968916
19 CD37 28672.0 ... 1.123060e-47 46.949597
20 CD38 44541.0 ... 1.391553e-20 19.856500
25 CD44 28609.0 ... 8.332447e-48 47.079227
26 CD45 41922.0 ... 2.716767e-24 23.565948
33 CD50 24620.0 ... 4.215047e-56 55.375198
41 CD8 37549.0 ... 3.345798e-28 27.475500
42 CD82 40989.0 ... 1.096330e-25 24.960059
45 HLA-ABC 49928.0 ... 6.757111e-14 13.170239

[14 rows x 6 columns]

We see that many proteins are differentially polarized between the two conditions, with higher polarity in the stimulated sample. Each protein is also color coded to reflect the mean polarity score. Based on that we can see that CD50 shows the greatest magnitude of both absolute polarity score and difference from the unsimulated sample. If we take a closer look at the distribution of polarity scores for that protein in particular we see that most of the cells in the unstimulated control have a low degree of polarity of CD50, with polarity scores close to zero, while cells from the stimulated sample have a stronger degree of polarity in general.

polarization_cd50_df = polarization[polarization["marker"] == "CD50"]
sns.catplot(polarization_cd50_df, kind="box", x="sample", y="morans_z")
<seaborn.axisgrid.FacetGrid object at 0x7f77c3b088d0>
plt.show()

Differential colocalization analysis

Our second question is whether we can observe an difference in colocalization of any protein pairs upon stimulation, which could represent proteins that migrate to the uropod. We will perform a differential colocalization analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX colocalization scores are significantly different in the stimulated sample compared to control.

# Remove all self-correlations
colocalization = pg_data_combined.colocalization[
~(
pg_data_combined.colocalization["marker_1"]
== pg_data_combined.colocalization["marker_2"]
)
]


def colocalization_wilcoxon(markers, df):
control = df[df["sample"] == "Control"]
stimulated = df[df["sample"] == "Stimulated"]
stat, p_value = mannwhitneyu(
x=control["pearson_z"], y=stimulated["pearson_z"], alternative="two-sided"
)
estimate = np.median(
stimulated["pearson_z"].to_numpy()[:, None] - control["pearson_z"].to_numpy()
)
return {
"marker_1": markers[0],
"marker_2": markers[1],
"markers": "/".join(markers),
"stat": stat,
"p_value": p_value,
"estimate": estimate,
}


colocalization_test_results = pd.DataFrame.from_records(
colocalization.groupby(["marker_1", "marker_2"]).apply(
lambda marker_data: colocalization_wilcoxon(marker_data.name, marker_data)
)
)
_, pvals_corrected, *_ = multipletests(
colocalization_test_results["p_value"], method="bonferroni"
)
colocalization_test_results["p_adj"] = pvals_corrected
colocalization_test_results["-log10(p_adj)"] = -np.log10(pvals_corrected)
colocalization_test_results.index = colocalization_test_results["markers"]
colocalization_test_results["estimate_positive"] = (
colocalization_test_results["estimate"] >= 0
)


# Plot colocalization results, only displaying text on the
# top five most changed for both the positive and the negative
# estimates
top_5_estimated = (
colocalization_test_results.groupby("estimate_positive")
.apply(lambda x: x.sort_values("estimate", key=np.abs, ascending=False).head(5))[
"markers"
]
.values
)

# Compute the mean moran's z for each marker and merge with polarization test results
filtered_coloc = colocalization[colocalization['sample'] == "Stimulated"]
grouped_coloc = filtered_coloc.groupby(['marker_1', 'marker_2'])['pearson_z'].mean().reset_index()
plot_data_coloc = pd.merge(grouped_coloc, colocalization_test_results, on=['marker_1', 'marker_2'], how='left')

color_gradient = ['#0066FFFF', '#74D055FF', '#FF0000FF']
cmap = LinearSegmentedColormap.from_list('custom', color_gradient, N=256)

fig, ax = plt.subplots(figsize=(10,8))
p = ax.scatter(x=plot_data_coloc['estimate'], y=plot_data_coloc['-log10(p_adj)'],
c=plot_data_coloc['pearson_z'], s= 20, marker='o', cmap=cmap)

for text, row in colocalization_test_results[
colocalization_test_results["markers"].isin(top_5_estimated)
].iterrows():
x, y = row[["estimate", "-log10(p_adj)"]]
label = plt.text(
x + 0.05, y, text, horizontalalignment="left", fontsize="xx-small"
)

axlabs = ax.set(xlim=(-15, 6), xlabel='Median difference', ylabel=r'$-\log_{10}$(adj. p-value)')
cbar = fig.colorbar(p, label="Mean colocalization score", cmap=cmap)

plt.show()

# Display colocalization test results that have an estimated
# difference of more than one standard deviation, and that
# are significant
significant_colocalization_test_results = colocalization_test_results[
(np.abs(colocalization_test_results["estimate"]) > 1)
& (colocalization_test_results["p_adj"] < 0.001)
].sort_values(by="estimate", ascending=False)
significant_colocalization_test_results
             marker_1 marker_2  ... -log10(p_adj)  estimate_positive
markers ...
CD37/CD50 CD37 CD50 ... 20.855962 True
CD162/CD50 CD162 CD50 ... 18.595506 True
CD162/CD37 CD162 CD37 ... 13.908243 True
CD84/HLA-ABC CD84 HLA-ABC ... 3.300293 False
CD197/CD45 CD197 CD45 ... 4.203870 False
... ... ... ... ... ...
CD45RB/CD50 CD45RB CD50 ... 37.015101 False
CD50/HLA-ABC CD50 HLA-ABC ... 48.941835 False
CD26/CD50 CD26 CD50 ... 47.660276 False
CD45/CD50 CD45 CD50 ... 54.611243 False
CD18/CD50 CD18 CD50 ... 50.698168 False

[469 rows x 9 columns]

We find that many protein pairs have significantly different colocalization in the uropod sample in comparison to control. Most of the pairs display a decreased colocalization, while a low number of protein pairs increase in their colocalization. Let’s display the measured differences in colocalization scores in a heatmap to get a better look at the effect for each protein pair.

# We are adding all markers turned around, in order
# to create a symetric matrix for the heapmap
colocalization_estimated_df = colocalization_test_results[
["marker_1", "marker_2", "estimate"]
]
colocalization_estimated_df = pd.DataFrame(
np.concatenate(
[
colocalization_estimated_df.to_numpy(),
colocalization_test_results[
["marker_2", "marker_1", "estimate"]
].to_numpy(),
],
),
columns=colocalization_estimated_df.columns,
)

# Pivot so that we have markers on both the row and the columns
pivoted_colocalization_estimated_df = pd.pivot_table(
colocalization_estimated_df[["marker_1", "marker_2", "estimate"]],
index=["marker_1"],
columns=["marker_2"],
values=["estimate"],
fill_value=0,
)
pivoted_colocalization_estimated_df.columns = (
pivoted_colocalization_estimated_df.columns.droplevel(0)
)

# We get the largest absolute value of, to let us set a symetric scale
max_value = np.max(
[
pivoted_colocalization_estimated_df.to_numpy().max(),
np.abs(pivoted_colocalization_estimated_df.to_numpy().min()),
]
)

# Create a clustered heatmap or marker vs. marker
sns.clustermap(
pivoted_colocalization_estimated_df,
yticklabels=True,
xticklabels=True,
linewidths=0.1,
method="complete",
vmin=-max_value,
vmax=max_value,
cmap="vlag",
)
<seaborn.matrix.ClusterGrid object at 0x7f7987907c90>
plt.show()

We see that CD37, CD50, CD162 increase in colocalization to each other, while their colocalization is decreased to several other proteins, with largest effect towards proteins that we expect to be quite ubiquitous on the T cell, including B2M, HLA-ABC, CD3, CD2, CD18, and CD45. Combined, these two pieces of information seem to indicate that CD37, CD50, CD162 move from being dispersed and intermixed with other protein to cluster and colocalize together, which is consistent with our hypothesis that these proteins migrate to form the uropod upon stimulation. See the tutorial on 2D cell visualization to see some of these cells’ graphs visualized.

Let’s taka a closer look at CD50 and its colocalization with CD162. We see that these two proteins have a higher degree of colocalization in the stimulated sample and when cells are plotted by the colocalization of CD50 and CD162 against the CD45 and CD50, we see that many cells from the stimulated sample are distributed towards lower colocalization between CD50 and CD45 and increased colocalization of CD50 and CD162.

colocalization_cd162_vs_cd50 = colocalization[
(colocalization["marker_1"] == "CD162") & (colocalization["marker_2"] == "CD50")
]
sns.catplot(colocalization_cd162_vs_cd50, kind="box", x="sample", y="pearson_z")
<seaborn.axisgrid.FacetGrid object at 0x7f798749b190>
plt.show()

colocalization["contrast"] = (
colocalization["marker_1"].astype(str)
+ "/"
+ colocalization["marker_2"].astype(str)
)

uropod_markers_df = colocalization[
colocalization["contrast"].isin(["CD162/CD50", "CD45/CD50"])
]
uropod_markers_df_pivoted = uropod_markers_df.pivot_table(
index=["sample", "component", "contrast"], values=["pearson_z"]
).pivot_table(index=["sample", "component"], columns=["contrast"], values=["pearson_z"])
uropod_markers_df_pivoted.columns = uropod_markers_df_pivoted.columns.droplevel(0)

sns.relplot(uropod_markers_df_pivoted, x="CD162/CD50", y="CD45/CD50", hue="sample")
<seaborn.axisgrid.FacetGrid object at 0x7f768425db90>
plt.show()

Now let’s take a holistic view of colocalization scores and make a UMAP. We will filter the proteins we want to include to only the ones that had the largest difference in colocalization score between our two samples.

# We only pick proteins which have a high estimated difference
# and that are siginficant at out adjusted p-value threshold
proteins_of_interest = pd.unique(
colocalization_test_results[
(
(np.abs(colocalization_test_results["estimate"]) >= 5)
& (colocalization_test_results["p_adj"] < 0.001)
)
][["marker_1", "marker_2"]].values.ravel("K")
)

# We build a matrix with each colocalization with marker pairs
# in the columns, their pearson Z score as the values, and
# the components in the rows
umap_df = colocalization[
colocalization["marker_1"].isin(proteins_of_interest)
& colocalization["marker_2"].isin(proteins_of_interest)
].pivot(index="component", columns="contrast", values="pearson_z")
umap_df[umap_df.isna()] = 0

# Scale all marker pairs using z-scaling
scaled_data = zscore(umap_df.to_numpy(dtype="float64"), axis=0)
umap_df = pd.DataFrame(
scaled_data,
index=umap_df.index,
columns=umap_df.columns,
)

# Build a UMAP, and re-attach the sample information to it
reducer = umap.UMAP()
embedding = pd.DataFrame(
reducer.fit_transform(umap_df), index=umap_df.index, columns=["UMAP1", "UMAP2"]
)
embedding = embedding.merge(
colocalization[["component", "sample"]], how="left", on="component"
)

# Plot the UMAP
umap_plot = sns.scatterplot(
embedding,
x="UMAP1",
y="UMAP2",
hue="sample",
)
plt.show()

Armed with the powerful insights gleaned from analyzing MPX data, we have delved into the spatial organization of proteins on the cell surface, uncovering how they cluster and interact differently under various conditions. The next tutorial in this series takes us a step further by exploring the creation of 2D cell graph visualizations. This exciting approach will allow us to not only see, but truly feel, the dynamic nature of proteins as they polarize and colocalize, offering a deeper understanding of their roles in shaping cellular function.