Version: 0.18.x

Spatial Analyses

In this tutorial we will show you how you can analyze the spatial configuration of proteins on the cell surface of single cells, using MPX data.

Pixelator generates a number of metrics from the graph structure inherent to MPX data. In this tutorial we will show how you can approach spatial analysis using the MPX polarity and colocalization scores to get a deeper understanding of how proteins are organized on the cell surface.

The polarity score measures the degree of non-random spatial clustering of a protein on the cell surface of a single cell, using Moran’s I. Colocalization, on the other hand, measures the co-expression of two proteins in small regions on the surface of a single cell. Here we will use both measures to look at the spatial changes that occur when T cells take on a migratory phenotype. We will use a dataset of T cells that have been stimulated to adopt such a migratory phenotype (Karlsson et al, bioRxiv 2023). In this experiment, T cells were immobilized with CD54 and stimulated with a chemokine to induce them to form a uropod: a bulge on the cell that helps propel it forward.

Upon uropod formation, some proteins on the cell surface migrate to one pole to form the bulge. MPX captures this, and it manifests in the data as an increase in polarization and colocalization of the proteins participating in the uropod.
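
For reference, the polarity score is built on Moran’s I evaluated on the graph of each cell. A standard formulation for a marker with count $x_i$ on node $i$, graph adjacency weights $w_{ij}$, $N$ nodes, and mean count $\bar{x}$ is

$$I = \frac{N}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^2}$$

Pixelator reports a z-scored version of this statistic (the morans_z column used below); the exact weighting and normalization it uses may differ from this textbook form.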

In this tutorial we will load two samples, one that has undergone the treatment outlined above, and a control sample.

After completing this tutorial, you will be able to:

  • Extract key features from MPX data: Analyze the inherent graph structure of MPX data to understand protein clustering (polarity) and co-abundance (colocalization) on cell surfaces.

  • Quantify spatial changes in protein organization: Utilize MPX polarity scores and Wilcoxon Rank Sum tests to measure and compare protein clustering between different conditions, revealing spatial rearrangements upon cellular stimulation.

  • Identify changes in protein colocalization: Conduct differential colocalization analysis to pinpoint protein pairs with significant changes in co-occurrence across different samples, uncovering potential protein interactions under specific conditions.

  • Visualize and interpret data: Effectively communicate your findings through volcano plots and heatmaps for differential analyses, and leverage UMAPs to explore relationships between colocalization scores, identifying distinct protein groups based on their spatial dynamics.

  • Connect spatial data to biological functions: Correlate observed changes in protein organization with cellular phenotypes or functions to gain deeper insights into protein interactions and their roles in biological processes like cellular migration.

Setup

import numpy as np
import pandas as pd
from pathlib import Path
from pixelator import read, simple_aggregate
from pixelator.plot import plot_colocalization_diff_heatmap
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as mcolors
import networkx as nx
from scipy.stats import mannwhitneyu, false_discovery_control, zscore
from statsmodels.stats.multitest import multipletests
import umap

sns.set_style("whitegrid")
DATA_DIR = Path("<path to the directory to save datasets to>")

Load data

We begin by loading and merging the two samples, so that they can easily be analyzed together.

baseurl = "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/0.18.x/uropod-t-cells-v1.0-immunology-I"

!curl -L -O -C - --create-dirs --output-dir {DATA_DIR} "{baseurl}/Uropod_control.layout.dataset.pxl"
!curl -L -O -C - --create-dirs --output-dir {DATA_DIR} "{baseurl}/Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl"
data_files = [
    DATA_DIR / "Uropod_control.layout.dataset.pxl",
    DATA_DIR / "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl",
]

pg_data_combined = simple_aggregate(
    ["Control", "Stimulated"], [read(path) for path in data_files]
)

Process data

First, we need to clean up the data with a few simple quality control steps: removing cells with fewer than 8000 unique antibody molecules detected (edges) and cells marked as possible antibody aggregates by Pixelator.

components_to_keep = pg_data_combined.adata[
    (pg_data_combined.adata.obs["molecules"] >= 8000)
    & (pg_data_combined.adata.obs["tau_type"] == "normal")
].obs.index

pg_data_combined = pg_data_combined.filter(
    components=components_to_keep
)

Not all proteins are expressed by every cell; some may have very low abundance in a given cell. For such proteins, the spatial scores (polarity and colocalization) should be excluded in the corresponding cells. In pixelator versions >=0.18, polarity and colocalization scores are automatically excluded in a component (cell) for proteins with fewer than 5 counts in that component. This is a conservative threshold chosen so that no significant signal is removed. Typically, further filtering should be performed before spatial analysis to reduce potential noise from low-abundance markers. These additional filtering steps are described in the sections on differential analyses below.
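
As an optional sanity check (a small sketch that assumes the counts in adata match those used for the threshold), we can confirm that no polarity score remains for a marker with fewer than 5 counts in its component, using the same (component, marker) indexing pattern used later in this tutorial:

# Optional sanity check: every (component, marker) pair that has a polarity
# score should have at least 5 counts in the antibody count matrix.
counts = pg_data_combined.adata.to_df().stack()
score_index = pg_data_combined.polarization.set_index(["component", "marker"]).index
assert (counts[score_index] >= 5).all()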

Differential polarity analysis

Our first question is whether we can observe an increase in polarity of any proteins upon stimulation, which could indicate an increase in uropod formation. To start with let’s take a look at the polarity scores of all proteins across the samples.

polarization = pg_data_combined.polarization

# Calculate the variance of morans_z for each marker
marker_variance = (
    polarization.groupby("marker")["morans_z"].var().sort_values(ascending=False)
)

# Find markers present in both conditions
markers_in_both = polarization.groupby("marker")["sample"].nunique() == 2
markers_to_plot = markers_in_both[markers_in_both].index

# Filter the data and recalculate variance
filtered_polarization = polarization[polarization["marker"].isin(markers_to_plot)]
filtered_marker_variance = (
    filtered_polarization.groupby("marker")["morans_z"].var().sort_values(ascending=False)
)

# Create polarity strip plot
plt.figure(figsize=(10, 12))

_ = sns.stripplot(
    y="marker",
    x="morans_z",
    hue="sample",
    data=filtered_polarization,
    jitter=0.2,
    alpha=0.6,
    dodge=True,
    size=2,
    order=filtered_marker_variance.index,
)

_ = plt.ylabel("", fontsize=14)
_ = plt.xlabel("Moran's Z", fontsize=14)
_ = plt.legend(title="Sample", loc="center left", bbox_to_anchor=(1, 0.5))


plt.tight_layout()
plt.show()

Here we can already spot some proteins that show different distributions in the control and stimulated conditions. To formally compare the samples, we will perform a differential polarity analysis, using a Wilcoxon Rank Sum test to assess whether the MPX polarity scores differ significantly between the stimulated and control samples.

Removal of low abundance markers

As described above, we will perform some further filtering steps to remove lowly abundant markers. Depending on the nature and quality of the dataset, we can adopt heuristic strategies such as only including markers that have, for example, more than 20 counts per component, or an abundance higher than the most abundant isotype control. To inform the choice of cutoff value, we can plot the distribution of the most abundant isotype control per component. As seen in the plot, most components have a maximum isotype control abundance between 5 and 20 counts. Markers with fewer counts can be considered negative in that component.

isotype_controls = ["mIgG1", "mIgG2a", "mIgG2b"]
isotype_control_counts = pg_data_combined.adata.to_df()[isotype_controls].max(axis=1)
sns.histplot(isotype_control_counts, log_scale=True)
plt.xlabel("Maximum abundance of isotype control per component")
plt.ylabel("Number of components")
plt.show()

To be able to apply the filters, we should first add information about marker counts to the spatial data.

def add_polarity_marker_counts(polarity_df, counts_df, isotype_controls):
    augmented_polarity_df = polarity_df.copy()
    # Per-component marker counts in long format, indexed by (component, marker)
    cell_marker_count = counts_df.stack()
    # Count of each marker in its own component
    marker_index = polarity_df.set_index(["component", "marker"]).index
    augmented_polarity_df["marker_count"] = cell_marker_count[marker_index].values
    # Highest isotype control count in each component
    component_index = polarity_df.set_index("component").index
    augmented_polarity_df["isotype_control_counts"] = (
        counts_df.loc[component_index, isotype_controls].max(axis=1).values
    )
    return augmented_polarity_df


augmented_polarity = add_polarity_marker_counts(
    pg_data_combined.polarization, pg_data_combined.adata.to_df(), isotype_controls
)

The function add_polarity_marker_counts adds columns to the polarization dataframe containing the count of each marker and the maximum isotype control count in the corresponding component. Using this information, we can now remove the polarization scores for low-abundance markers. Here we remove markers whose count is below the higher of two thresholds: a hard cutoff of 20 molecules, or the count of the most abundant isotype control in that component.

polarity_filtered = augmented_polarity[
    augmented_polarity["marker_count"]
    > np.maximum(20, augmented_polarity["isotype_control_counts"])
]

After removing markers in components where they are not significantly abundant, some markers might only be present in a few components.

polarity_summary = polarity_filtered.groupby("marker").morans_z.agg(
    count_of_expressing_cells="count", morans_z_median="median"
)
polarity_summary_plot = sns.scatterplot(
    polarity_summary, x="count_of_expressing_cells", y="morans_z_median"
)
plt.title("Polarity scores")
plt.show()

We observe that markers present in only a few components show a wide range of median polarity scores. This is not unexpected, since these medians are computed over very few components and are therefore more affected by individual outliers. Therefore, in this example we exclude markers that appear in fewer than 100 components.

note

Note that the choice of threshold for removing markers present in only a few components depends on your experimental goals. If you would like to detect markers that change polarity in only a small subpopulation of cells, you might want to retain them for the downstream analysis.

pg_data_combined.polarization = polarity_filtered.groupby(["marker", "sample"]).filter(
    lambda x: len(x) >= 100
)

This filtered object will then be analyzed using a Wilcoxon Rank Sum test.

def polarization_wilcoxon(marker, df):
    control = df[df["sample"] == "Control"]
    stimulated = df[df["sample"] == "Stimulated"]
    # If a marker is missing from one of the samples, return a neutral result
    if control.shape[0] == 0 or stimulated.shape[0] == 0:
        return {"marker": marker, "stat": 0, "p_value": 1, "estimate": 0}
    # mannwhitneyu is the Wilcoxon rank-sum test for two independent samples
    # (the groups may differ in size)
    stat, p_value = mannwhitneyu(
        x=control["morans_z"], y=stimulated["morans_z"], alternative="two-sided"
    )
    # Hodges-Lehmann estimate of the shift: median of all pairwise differences
    estimate = np.median(
        stimulated["morans_z"].to_numpy()[:, None] - control["morans_z"].to_numpy()
    )
    return {"marker": marker, "stat": stat, "p_value": p_value, "estimate": estimate}


# Prepare the polarization data, and do the Wilcoxon test on each marker separately.
polarization = pg_data_combined.polarization
polarization_test_results = pd.DataFrame.from_records(
    polarization.groupby(["marker"]).apply(
        lambda marker_data: polarization_wilcoxon(marker_data.name, marker_data)
    )
)
# Since we do a large number of tests, we will correct their p-values using
# Bonferroni correction.
_, pvals_corrected, *_ = multipletests(
    polarization_test_results["p_value"], method="bonferroni"
)
polarization_test_results["p_adj"] = pvals_corrected
polarization_test_results["-log10(adjusted p-value)"] = -np.log10(pvals_corrected)

# Compute the mean moran's z for each marker and merge with polarization test results
filtered_data = polarization[polarization['sample'] == "Stimulated"]
grouped_data = filtered_data.groupby('marker')['morans_z'].mean().reset_index()
plot_data = pd.merge(grouped_data, polarization_test_results, on='marker', how='left')

# Plot the results in a volcano plot
from matplotlib.colors import LinearSegmentedColormap

color_gradient = ['#0066FFFF', '#74D055FF', '#FF0000FF']
cmap = LinearSegmentedColormap.from_list('custom', color_gradient, N=256)

fig, ax = plt.subplots(figsize=(10,8))
p = ax.scatter(
    x=plot_data['estimate'], y=plot_data['-log10(adjusted p-value)'],
    c=plot_data['morans_z'], s=20, marker='o', cmap=cmap,
)

for line in range(0, plot_data.shape[0]):
    label = plt.text(
        plot_data['estimate'][line] + 0.1, -np.log10(plot_data['p_adj'][line]),
        plot_data['marker'][line], horizontalalignment='left', size='small', color='black',
    )

axlabs = ax.set(xlim=(-4, 8), xlabel='Median difference', ylabel=r'$-\log_{10}$(adj. p-value)')
cbar = fig.colorbar(p, label="Mean polarity", cmap=cmap)

plt.show()

# Show all proteins that are significant and whose median difference
# in Moran's Z is at least 1
significant_polarization_test_results = polarization_test_results[
    (polarization_test_results["estimate"] >= 1)
    & (polarization_test_results["p_adj"] < 0.001)
]
significant_polarization_test_results
marker stat p_value estimate p_adj -log10(adjusted p-value)
0 B2M 50557.0 1.255757e-14 1.161477 5.776481e-13 12.238337
2 CD11a 18321.0 8.525177e-13 1.223192 3.921581e-11 10.406539
6 CD162 24267.0 2.913696e-45 4.402370 1.340300e-43 42.872798
7 CD18 46216.0 9.095649e-20 1.428976 4.183998e-18 17.378408
13 CD26 40457.0 9.987438e-25 1.683954 4.594222e-23 22.337788
16 CD278 21402.0 5.578068e-16 1.162637 2.565911e-14 13.590758
17 CD29 39992.0 1.603305e-26 1.222381 7.375202e-25 24.132226
19 CD37 13722.0 1.568433e-48 5.205240 7.214794e-47 46.141776
23 CD40 8215.0 2.894927e-10 1.760985 1.331666e-08 7.875605
25 CD44 48994.0 8.736993e-17 1.613348 4.019017e-15 14.395880
26 CD45 52118.0 5.563764e-13 1.024670 2.559332e-11 10.591873
33 CD50 10167.0 4.780272e-15 3.964547 2.198925e-13 12.657790
37 CD59 32496.0 4.978303e-19 1.203938 2.290020e-17 16.640161
40 CD8 13488.0 1.460250e-22 2.648023 6.717150e-21 20.172815
41 CD82 42507.0 1.559294e-22 1.437827 7.172752e-21 20.144314
44 HLA-ABC 45835.0 1.086724e-20 1.892822 4.998931e-19 18.301123

We see that many proteins are differentially polarized between the two conditions, with higher polarity in the stimulated sample. Each protein is also color coded to reflect its mean polarity score. Based on this, CD50 shows both a large absolute polarity score and a sizeable difference compared with the unstimulated sample. If we take a closer look at the distribution of polarity scores for this protein in particular, we see that most cells in the unstimulated control show little polarity of CD50, with polarity scores close to zero, while cells from the stimulated sample are generally more strongly polarized.

polarization_cd50_df = polarization[polarization["marker"] == "CD50"]
sns.catplot(polarization_cd50_df, kind="violin", x="sample", y="morans_z", hue="sample")
plt.show()

Differential colocalization analysis

Our second question is whether we can observe a difference in colocalization of any protein pairs upon stimulation, which could represent proteins that migrate to the uropod. We will perform a differential colocalization analysis by employing a Wilcoxon Rank Sum test to assess whether the MPX colocalization scores are significantly different in the stimulated sample compared to the control.

Removal of low abundance markers

As described for the differential polarization analysis, we will perform some further filtering steps to remove lowly abundant markers. We adopt the same strategy as for the polarization analysis, but now filter on the abundance of the less abundant marker of each marker pair.

To be able to apply the filters, we should again first add information about marker counts to the spatial data.

def add_colocalization_marker_counts(colocalization_df, counts_df, isotype_controls):
    augmented_colocalization_df = colocalization_df.copy()
    # Per-component marker counts in long format, indexed by (component, marker)
    cell_marker_count = counts_df.stack()
    # Counts of both markers of the pair in their component
    marker_1_index = colocalization_df.set_index(["component", "marker_1"]).index
    marker_2_index = colocalization_df.set_index(["component", "marker_2"]).index
    augmented_colocalization_df["marker_1_count"] = cell_marker_count[marker_1_index].values
    augmented_colocalization_df["marker_2_count"] = cell_marker_count[marker_2_index].values
    # Count of the less abundant marker of the pair
    augmented_colocalization_df["lower_marker_count"] = augmented_colocalization_df[
        ["marker_1_count", "marker_2_count"]
    ].min(axis=1)
    # Highest isotype control count in each component
    component_index = colocalization_df.set_index("component").index
    augmented_colocalization_df["isotype_control_counts"] = (
        counts_df.loc[component_index, isotype_controls].max(axis=1).values
    )
    return augmented_colocalization_df


augmented_colocalization = add_colocalization_marker_counts(
    pg_data_combined.colocalization, pg_data_combined.adata.to_df(), isotype_controls
)

The function add_colocalization_marker_counts adds columns to the colocalization dataframe containing the counts of each marker in the pair, the count of the less abundant marker, and the maximum isotype control count in the corresponding component. We can now remove the colocalization scores for low-abundance marker pairs. Here we remove pairs whose less abundant marker has a count below the higher of two thresholds: a hard cutoff of 20 molecules, or the count of the most abundant isotype control in that component.

colocalization_filtered = augmented_colocalization[
    augmented_colocalization["lower_marker_count"]
    > np.maximum(20, augmented_colocalization["isotype_control_counts"])
]

After the removal of marker pairs from components where they are not significantly abundant, some marker pairs might only be present in a few components.

colocalization_summary = colocalization_filtered.groupby(["marker_1", "marker_2"]).pearson_z.agg(
    count_of_expressing_cells="count", pearson_z_median="median"
)

sns.scatterplot(colocalization_summary, x="count_of_expressing_cells", y="pearson_z_median")
plt.title("Colocalization scores")
plt.show()

We observe that marker pairs present in only a few components show a wide range of median colocalization scores. This is not unexpected, since these medians are computed over very few components and are therefore more affected by individual outliers. Therefore, in this example we exclude marker pairs that appear in fewer than 100 components.

note

Note that the choice of threshold for removing marker pairs present in only a few components depends on your experimental goals. If you would like to detect marker pairs that change colocalization in only a small subpopulation of cells, you might want to retain them for the downstream analysis.

pg_data_combined.colocalization = colocalization_filtered.groupby(
    ["marker_1", "marker_2", "sample"]
).filter(lambda x: len(x) >= 100)

This filtered object will then be analyzed using a Wilcoxon Rank Sum test.

# Remove all self-correlations
colocalization = pg_data_combined.colocalization[
    ~(
        pg_data_combined.colocalization["marker_1"]
        == pg_data_combined.colocalization["marker_2"]
    )
]


def colocalization_wilcoxon(markers, df):
    control = df[df["sample"] == "Control"]
    stimulated = df[df["sample"] == "Stimulated"]
    # If a marker pair is missing from one of the samples, return a neutral result
    if control.shape[0] == 0 or stimulated.shape[0] == 0:
        return {
            "marker_1": markers[0],
            "marker_2": markers[1],
            "markers": "/".join(markers),
            "stat": 0,
            "p_value": 1,
            "estimate": 0,
        }
    # mannwhitneyu is the Wilcoxon rank-sum test for two independent samples
    stat, p_value = mannwhitneyu(
        x=control["pearson_z"], y=stimulated["pearson_z"], alternative="two-sided"
    )
    # Hodges-Lehmann estimate of the shift: median of all pairwise differences
    estimate = np.median(
        stimulated["pearson_z"].to_numpy()[:, None] - control["pearson_z"].to_numpy()
    )
    return {
        "marker_1": markers[0],
        "marker_2": markers[1],
        "markers": "/".join(markers),
        "stat": stat,
        "p_value": p_value,
        "estimate": estimate,
    }


colocalization_test_results = pd.DataFrame.from_records(
    colocalization.groupby(["marker_1", "marker_2"]).apply(
        lambda marker_data: colocalization_wilcoxon(marker_data.name, marker_data)
    )
)
_, pvals_corrected, *_ = multipletests(
    colocalization_test_results["p_value"], method="bonferroni"
)
colocalization_test_results["p_adj"] = pvals_corrected
colocalization_test_results["-log10(p_adj)"] = -np.log10(pvals_corrected)
colocalization_test_results.index = colocalization_test_results["markers"]
colocalization_test_results["estimate_positive"] = (
    colocalization_test_results["estimate"] >= 0
)


# Plot the colocalization results, displaying labels only for the
# five most changed marker pairs among the positive and the negative
# estimates, respectively
top_5_estimated = (
    colocalization_test_results.groupby("estimate_positive")
    .apply(lambda x: x.sort_values("estimate", key=np.abs, ascending=False).head(5))[
        "markers"
    ]
    .values
)

# Compute the mean pearson_z for each marker pair and merge with the colocalization test results
filtered_coloc = colocalization[colocalization['sample'] == "Stimulated"]
grouped_coloc = filtered_coloc.groupby(['marker_1', 'marker_2'])['pearson_z'].mean().reset_index()
plot_data_coloc = pd.merge(
    grouped_coloc, colocalization_test_results, on=['marker_1', 'marker_2'], how='left'
)

color_gradient = ['#0066FFFF', '#74D055FF', '#FF0000FF']
cmap = LinearSegmentedColormap.from_list('custom', color_gradient, N=256)

fig, ax = plt.subplots(figsize=(10,8))
p = ax.scatter(
    x=plot_data_coloc['estimate'], y=plot_data_coloc['-log10(p_adj)'],
    c=plot_data_coloc['pearson_z'], s=20, marker='o', cmap=cmap,
)

for text, row in colocalization_test_results[
    colocalization_test_results["markers"].isin(top_5_estimated)
].iterrows():
    x, y = row[["estimate", "-log10(p_adj)"]]
    label = plt.text(
        x + 0.05, y, text, horizontalalignment="left", fontsize="xx-small"
    )

axlabs = ax.set(xlim=(-20, 20), xlabel='Median difference', ylabel=r'$-\log_{10}$(adj. p-value)')
cbar = fig.colorbar(p, label="Mean colocalization score", cmap=cmap)

plt.show()

# Display colocalization test results that are significant and have an
# estimated difference of more than one standard deviation
significant_colocalization_test_results = colocalization_test_results[
    (np.abs(colocalization_test_results["estimate"]) > 1)
    & (colocalization_test_results["p_adj"] < 0.001)
].sort_values(by="estimate", ascending=False)
significant_colocalization_test_results
marker_1 marker_2 markers stat p_value estimate p_adj -log10(p_adj) estimate_positive
markers
CD162/CD37 CD162 CD37 CD162/CD37 10362.0 9.869027e-49 14.619902 9.405183e-46 45.026633 True
CD37/CD50 CD37 CD50 CD37/CD50 4910.0 4.152916e-21 13.992899 3.957729e-18 17.402554 True
CD162/CD50 CD162 CD50 CD162/CD50 8643.0 4.871441e-16 10.625301 4.642484e-13 12.333250 True
CD162/CD40 CD162 CD40 CD162/CD40 5381.0 2.541274e-16 8.691192 2.421834e-13 12.615856 True
CD40/CD5 CD40 CD5 CD40/CD5 5247.0 1.570786e-17 8.460419 1.496959e-14 13.824790 True
... ... ... ... ... ... ... ... ... ...
CD18/CD37 CD18 CD37 CD18/CD37 68791.0 3.584490e-30 -7.443459 3.416019e-27 26.466480 False
CD37/HLA-ABC CD37 HLA-ABC CD37/HLA-ABC 69085.0 9.795865e-30 -7.489132 9.335460e-27 26.029864 False
CD40/CD45 CD40 CD45 CD40/CD45 21405.0 8.974948e-16 -7.870820 8.553125e-13 12.067875 False
B2M/CD40 B2M CD40 B2M/CD40 21968.0 5.040846e-18 -11.488603 4.803926e-15 14.318404 False
CD40/HLA-ABC CD40 HLA-ABC CD40/HLA-ABC 21941.0 6.517812e-18 -12.764979 6.211475e-15 14.206805 False

153 rows × 9 columns
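
Before interpreting the table, the estimate_positive column gives a quick tally of how many of these significant pairs increase versus decrease in colocalization upon stimulation (an optional check on the results computed above):

# Count significant marker pairs with increased (True) vs. decreased (False)
# colocalization in the stimulated sample
significant_colocalization_test_results["estimate_positive"].value_counts()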

We find that many marker pairs show significantly different colocalization in the stimulated sample in comparison to the control. Most pairs show decreased colocalization, while a smaller number show an increase. Let’s display the measured differences in colocalization scores in a heatmap to get a better look at the effect for each marker pair.

fig, ax = plot_colocalization_diff_heatmap(pg_data_combined.colocalization, reference="Control", target="Stimulated")
plt.show()

We see that CD37, CD50, and CD162 increase in colocalization with each other, while their colocalization with several other proteins decreases, with the largest effect for proteins that we expect to be quite ubiquitous on the T cell surface, including B2M, HLA-ABC, CD3, CD2, CD18, and CD45. Combined, these two observations indicate that CD37, CD50, and CD162 go from being dispersed and intermixed with other proteins to being clustered and colocalized with each other, which is consistent with our hypothesis that these proteins migrate to form the uropod upon stimulation. See the tutorial on 2D cell visualization to see some of these cells’ graphs visualized.
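
To inspect the numbers behind this interpretation, we can pull out the significant test results that involve any of these putative uropod markers (a small convenience filter on the results table computed above):

# Subset the significant results to pairs that involve CD37, CD50 or CD162
uropod_markers = ["CD37", "CD50", "CD162"]
uropod_pairs = significant_colocalization_test_results[
    significant_colocalization_test_results["marker_1"].isin(uropod_markers)
    | significant_colocalization_test_results["marker_2"].isin(uropod_markers)
]
uropod_pairs.sort_values("estimate", ascending=False)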

We can also visualize the colocalization of these markers in each sample as a network where each node represents a marker and the color and width of each connecting edge represents the average colocalization score.

markers_to_plot = ["CD37", "CD50", "CD162", "CD18", "CD45"]

# Subset colocalization scores to the markers_to_plot
colocalization_network_df = colocalization[
    (colocalization["marker_1"].isin(markers_to_plot))
    & (colocalization["marker_2"].isin(markers_to_plot))
].groupby(['sample', 'marker_1', 'marker_2'], as_index=False)[['pearson_z']].mean()
# And rearrange columns to feed into networkx
colocalization_network_df = colocalization_network_df[
    ['marker_1', 'marker_2', 'pearson_z', 'sample']
]

# Create Graph object
g = nx.from_pandas_edgelist(
    colocalization_network_df,
    source="marker_1",
    target="marker_2",
    edge_attr="pearson_z",
)
# Calculate the node coordinates on the whole dataset
pos = nx.spring_layout(g)

# Set up colorbar
pearson_values = list(nx.get_edge_attributes(g, "pearson_z").values())
norm_pearson_values = mcolors.Normalize(
    vmin=-np.max(pearson_values), vmax=np.max(pearson_values)
)
colormap = plt.cm.coolwarm
scalarmap = cm.ScalarMappable(norm=norm_pearson_values, cmap=colormap)
scalarmap.set_array(pearson_values)

# Create the networks per treatment:

# Control
g_control = nx.from_pandas_edgelist(
    colocalization_network_df[colocalization_network_df["sample"] == "Control"],
    source="marker_1",
    target="marker_2",
    edge_attr="pearson_z",
)
edges_control, weights_control = zip(
    *nx.get_edge_attributes(g_control, 'pearson_z').items()
)
# Calculate the absolute pearson_z to use in the edge width adjustment
edge_widths_control = np.abs(weights_control)

# Stimulated
g_stim = nx.from_pandas_edgelist(
    colocalization_network_df[colocalization_network_df["sample"] == "Stimulated"],
    source="marker_1",
    target="marker_2",
    edge_attr="pearson_z",
)
edges_stim, weights_stim = zip(
    *nx.get_edge_attributes(g_stim, 'pearson_z').items()
)
# Calculate the absolute pearson_z to use in the edge width adjustment
edge_widths_stim = np.abs(weights_stim)

# Create 2 network plots side-by-side
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, width_ratios=[1, 1, 0.1], figsize=(10, 5))
nx.draw(
    g_control,
    pos,
    with_labels=True,
    edge_color=weights_control,
    width=edge_widths_control,
    edge_cmap=colormap,
    node_color="grey",
    ax=ax1,
)
ax1.set_title("Control")
nx.draw(
    g_stim,
    pos,
    with_labels=True,
    edge_color=weights_stim,
    width=edge_widths_stim,
    edge_cmap=colormap,
    node_color="grey",
    ax=ax2,
)
ax2.set_title("Stimulated")
ax3.axis("off")
plt.colorbar(scalarmap, ax=ax3, label="Average pearson-z")
plt.show()

These networks clearly show that CD37, CD50 and CD162 colocalize with each other but are distinct from the colocalizing pair CD45 and CD18. In addition, the stimulation seems to have strengthened the patterns that were already visible in the control cells.

Let’s take a closer look at CD50 and its colocalization with CD162. These two proteins show a higher degree of colocalization in the stimulated sample, and when cells are plotted by the colocalization of CD50 and CD162 against that of CD50 and CD45, many cells from the stimulated sample shift towards lower colocalization between CD50 and CD45 and higher colocalization between CD50 and CD162.

colocalization_cd162_vs_cd50 = colocalization[
    (colocalization["marker_1"] == "CD162") & (colocalization["marker_2"] == "CD50")
]
sns.catplot(colocalization_cd162_vs_cd50, kind="violin", x="sample", y="pearson_z", hue="sample")
plt.show()

colocalization["contrast"] = (
colocalization["marker_1"].astype(str)
+ "/"
+ colocalization["marker_2"].astype(str)
)

uropod_markers_df = colocalization[
colocalization["contrast"].isin(["CD162/CD50", "CD45/CD50"])
]
uropod_markers_df_pivoted = uropod_markers_df.pivot_table(
index=["sample", "component", "contrast"], values=["pearson_z"]
).pivot_table(index=["sample", "component"], columns=["contrast"], values=["pearson_z"])
uropod_markers_df_pivoted.columns = uropod_markers_df_pivoted.columns.droplevel(0)

sns.relplot(uropod_markers_df_pivoted, x="CD162/CD50", y="CD45/CD50", hue="sample")
plt.show()

Now let’s take a holistic view of colocalization scores and make a UMAP. We will filter the proteins we want to include to only the ones that had the largest difference in colocalization score between our two samples.

# We only pick proteins that have a large estimated difference
# and that are significant at our adjusted p-value threshold
proteins_of_interest = pd.unique(
    colocalization_test_results[
        (np.abs(colocalization_test_results["estimate"]) >= 5)
        & (colocalization_test_results["p_adj"] < 0.001)
    ][["marker_1", "marker_2"]].values.ravel("K")
)

# We build a matrix with the marker-pair colocalizations in the columns,
# their pearson_z scores as the values, and the components in the rows
umap_df = colocalization[
    colocalization["marker_1"].isin(proteins_of_interest)
    & colocalization["marker_2"].isin(proteins_of_interest)
].pivot(index="component", columns="contrast", values="pearson_z")
umap_df[umap_df.isna()] = 0

# Scale all marker pairs using z-scaling
scaled_data = zscore(umap_df.to_numpy(dtype="float64"), axis=0)
umap_df = pd.DataFrame(
    scaled_data,
    index=umap_df.index,
    columns=umap_df.columns,
)

# Build a UMAP, and re-attach the sample information to it
reducer = umap.UMAP()
embedding = pd.DataFrame(
    reducer.fit_transform(umap_df), index=umap_df.index, columns=["UMAP1", "UMAP2"]
)
embedding = embedding.merge(
    colocalization[["component", "sample"]], how="left", on="component"
)

# Plot the UMAP
umap_plot = sns.scatterplot(
    embedding,
    x="UMAP1",
    y="UMAP2",
    hue="sample",
)
plt.show()

Armed with the insights gained from analyzing MPX data, we have explored the spatial organization of proteins on the cell surface, uncovering how they cluster and interact differently under different conditions. The next tutorial in this series takes this a step further by creating 2D cell graph visualizations, which let us see how proteins polarize and colocalize on individual cells and offer a deeper understanding of their roles in shaping cellular function.

We will now save this new object to proceed to the next tutorial where we will visualize the graph components that constitute cells in MPX data in two dimensions.

pg_data_combined.save(DATA_DIR / "uropod_spatial.pxl", force_overwrite=True)