Skip to main content
Version: 0.18.x

Differential Abundance

Now we have merged, normalized, integrated and annotated the cell types in our dataset, we can proceed to the biological interpretation of our experiment. MPX data contains a wealth of spatial data that we will discuss in the following tutorials. Here, we will focus on the differential abundance of markers across conditions.

After completing this tutorial, you should be able to:

  • Identify abundance changes across conditions within a cell type
  • Visualize these changes in a clear way

Setup

from matplotlib.pyplot import rc_context
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pandas as pd
from pixelator import read
from pixelator.statistics import clr_transformation
import scanpy as sc
import seaborn as sns

sns.set_style("whitegrid")
DATA_DIR = Path("<path to the directory to save datasets to>")

Load data

To illustrate the annotation workflow we continue with the integrated object we created in the previous tutorial. As described before, this merged and harmonized object contains four samples: a resting and PHA stimulated PBMC sample, both in duplicate.

pg_data_combined_pxl_object =  read(DATA_DIR/ "combined_data_annotated.pxl")

# In this tutorial we will mostly work with the AnnData object
# so we will start by selecting that from the pixel data object
pg_data_combined_processed = pg_data_combined_pxl_object.adata

Differential Abundance in CD8 T cells

As PHA is a T cell mitogen, one might be interested in how PHA treatment affects protein abundance in T cells. Therefore, in this tutorial, we will subset the data to only include the CD8 T cells and then perform a differential abundance analysis of the PHA treated cells versus the resting cells. As mentioned in the Abundance Normalization tutorial, we will use the shifted_dsb layer for the differential abundance analysis. This is necessary to avoid negative average expression values in one of the conditions that would otherwise break the calculation of log fold changes. If you have not applied the shifting step in the Abundance Normalization tutorial, you can do so now by running the following code chunk:

# Calculate minimum value per marker
mins = pg_data_combined_processed.to_df("dsb").min()
# If minimum value is positive, set to zero
mins[mins > 0] = 0
# Add layer to use in Differential Abundance test
pg_data_combined_processed.layers["shifted_dsb"] = pg_data_combined_processed.to_df("dsb").sub(mins)

Now we can perform the differential abundance analysis.

# Subset data to only include CD8 T cells
cd8_t_cells = pg_data_combined_processed[pg_data_combined_processed.obs['cell_type'] == 'CD8 T',:]

# Calculate the difference in dsb value between stimulated and resting CD8 T cells
sc.tl.rank_genes_groups(cd8_t_cells, groupby='condition', method='wilcoxon', groups=['stimulated'], reference='resting',layer = 'shifted_dsb')
diff_exp_df = sc.get.rank_genes_groups_df(cd8_t_cells, group=None)
diff_exp_df
names scores logfoldchanges pvals pvals_adj
0 CD25 30.322737 4.042856 5.750194e-202 4.830163e-200
1 CD71 29.554913 4.984287 5.678719e-192 2.385062e-190
2 CD278 28.903751 2.034037 1.071103e-183 2.999088e-182
3 ACTB 28.679626 1.948678 6.849651e-181 1.438427e-179
4 CD82 27.994154 2.719679 1.914118e-172 3.215718e-171
... ... ... ... ... ...
79 CD32 -16.559921 -0.725752 1.357702e-61 4.751956e-61
80 CD244 -18.051432 -1.856854 7.687215e-73 3.398558e-72
81 CD36 -20.640167 -0.997535 1.196482e-94 7.178890e-94
82 CD43 -21.210070 -0.972586 7.709916e-100 4.981792e-99
83 CD50 -22.468721 -0.942513 8.397506e-112 6.412641e-111

84 rows × 5 columns

# Subset data to only include CD8 T cells
cd8_t_cells = pg_data_combined_processed[pg_data_combined_processed.obs['cell_type'] == 'CD8 T',:]

# Calculate the difference in dsb value between stimulated and resting CD8 T cells
sc.tl.rank_genes_groups(cd8_t_cells, groupby='condition', method='wilcoxon', groups=['stimulated'], reference='resting',layer = 'shifted_dsb')
diff_exp_df = sc.get.rank_genes_groups_df(cd8_t_cells, group=None)
diff_exp_df
names scores logfoldchanges pvals pvals_adj
0 CD25 30.322737 4.042856 5.750194e-202 4.830163e-200
1 CD71 29.554913 4.984287 5.678719e-192 2.385062e-190
2 CD278 28.903751 2.034037 1.071103e-183 2.999088e-182
3 ACTB 28.679626 1.948678 6.849651e-181 1.438427e-179
4 CD82 27.994154 2.719679 1.914118e-172 3.215718e-171
... ... ... ... ... ...
79 CD32 -16.559921 -0.725752 1.357702e-61 4.751956e-61
80 CD244 -18.051432 -1.856854 7.687215e-73 3.398558e-72
81 CD36 -20.640167 -0.997535 1.196482e-94 7.178890e-94
82 CD43 -21.210070 -0.972586 7.709916e-100 4.981792e-99
83 CD50 -22.468721 -0.942513 8.397506e-112 6.412641e-111

84 rows × 5 columns

Let’s visualize these results as a volcano plot and annotate some T cell-relevant markers

diff_exp_df["-log10(adjusted p-value)"] = -np.log10(diff_exp_df["pvals_adj"])
diff_exp_df["Significant"] = diff_exp_df["pvals_adj"] < 0.01

# Create scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=diff_exp_df, x='logfoldchanges', y='-log10(adjusted p-value)', hue='Significant')

# Add labels for points with absolute logfoldchanges > 1.5
for idx, row in diff_exp_df.iterrows():
if abs(row['logfoldchanges']) > 1.5:
plt.annotate(row['names'],
(row['logfoldchanges'], row['-log10(adjusted p-value)']),
xytext=(5, 5), textcoords='offset points')

Alternatively, in a heatmap we can visualize the average normalized values in each of the four samples for the 7 most upregulated and the 7 most downregulated markers in PHA versus resting conditions.

# Get the 7 highest and lowest logfoldchanges
top_markers = diff_exp_df.nlargest(7, 'logfoldchanges')['names'].tolist()
bottom_markers = diff_exp_df.nsmallest(7, 'logfoldchanges')['names'].tolist()
markers_to_plot = top_markers + bottom_markers

# Calculate mean dsb values per sample for CD8 T cells
mean_values = pd.DataFrame()
for sample in pg_data_combined_processed.obs['sample'].unique():
sample_cells = pg_data_combined_processed[pg_data_combined_processed.obs['sample'] == sample]
mean_values[sample] = sample_cells[:, markers_to_plot].to_df('dsb').mean()

# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(mean_values.loc[markers_to_plot],
cmap='coolwarm', # Blue-Yellow-Red colormap
center=0,
yticklabels=True,
xticklabels=True,
linewidths=1, # Add cell borders with width of 1
linecolor='black') # Make the borders black
plt.title('Average DSB values in CD8 T cells per sample')
plt.tight_layout()

In this example, it seems that many of the T cell relevant markers (such as CD71, CD38 and CD25) show a significant increase in abundance upon PHA stimulation.

However, abundance is only one facet of the rich multidimensional data generated by MPX experiments. We are now ready to transition into the spatial realm, exploring the arrangement and interactions of proteins on the cell surface. In the upcoming tutorials, we will leverage the graph nature and spatial metrics within MPX data to unravel the spatial biology of cells, surveying protein polarization and colocalization patterns and how these contribute to cell identity, signaling, and function. Abundance has illuminated cell types; spatiality will illuminate the molecular geography that defines them.