Differential Abundance
Now we have merged, normalized, integrated and annotated the cell types in our dataset, we can proceed to the biological interpretation of our experiment. MPX data contains a wealth of spatial data that we will discuss in the following tutorials. Here, we will focus on the differential abundance of markers across conditions.
After completing this tutorial, you should be able to:
- Identify abundance changes across conditions within a cell type
- Visualize these changes in a clear way
Setup
from matplotlib.pyplot import rc_context
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pandas as pd
from pixelator import read
from pixelator.statistics import clr_transformation
import scanpy as sc
import seaborn as sns
sns.set_style("whitegrid")
DATA_DIR = Path("<path to the directory to save datasets to>")
Load data
To illustrate the annotation workflow we continue with the integrated object we created in the previous tutorial. As described before, this merged and harmonized object contains four samples: a resting and PHA stimulated PBMC sample, both in duplicate.
pg_data_combined_pxl_object = read(DATA_DIR/ "combined_data_annotated.pxl")
# In this tutorial we will mostly work with the AnnData object
# so we will start by selecting that from the pixel data object
pg_data_combined_processed = pg_data_combined_pxl_object.adata
Differential Abundance in CD8 T cells
As PHA is a T cell mitogen, one might be interested in how PHA treatment
affects protein abundance in T cells. Therefore, in this tutorial, we
will subset the data to only include the CD8 T cells and then perform a
differential abundance analysis of the PHA treated cells versus the
resting cells. As mentioned in the Abundance
Normalization tutorial, we
will use the shifted_dsb
layer for the differential abundance
analysis. This is necessary to avoid negative average expression values
in one of the conditions that would otherwise break the calculation of
log fold changes. If you have not applied the shifting step in the
Abundance Normalization
tutorial, you can do so now by running the following code chunk:
# Calculate minimum value per marker
mins = pg_data_combined_processed.to_df("dsb").min()
# If minimum value is positive, set to zero
mins[mins > 0] = 0
# Add layer to use in Differential Abundance test
pg_data_combined_processed.layers["shifted_dsb"] = pg_data_combined_processed.to_df("dsb").sub(mins)
Now we can perform the differential abundance analysis.
# Subset data to only include CD8 T cells
cd8_t_cells = pg_data_combined_processed[pg_data_combined_processed.obs['cell_type'] == 'CD8 T',:]
# Calculate the difference in dsb value between stimulated and resting CD8 T cells
sc.tl.rank_genes_groups(cd8_t_cells, groupby='condition', method='wilcoxon', groups=['stimulated'], reference='resting',layer = 'shifted_dsb')
diff_exp_df = sc.get.rank_genes_groups_df(cd8_t_cells, group=None)
diff_exp_df
names | scores | logfoldchanges | pvals | pvals_adj | |
---|---|---|---|---|---|
0 | CD25 | 30.322737 | 4.042856 | 5.750194e-202 | 4.830163e-200 |
1 | CD71 | 29.554913 | 4.984287 | 5.678719e-192 | 2.385062e-190 |
2 | CD278 | 28.903751 | 2.034037 | 1.071103e-183 | 2.999088e-182 |
3 | ACTB | 28.679626 | 1.948678 | 6.849651e-181 | 1.438427e-179 |
4 | CD82 | 27.994154 | 2.719679 | 1.914118e-172 | 3.215718e-171 |
... | ... | ... | ... | ... | ... |
79 | CD32 | -16.559921 | -0.725752 | 1.357702e-61 | 4.751956e-61 |
80 | CD244 | -18.051432 | -1.856854 | 7.687215e-73 | 3.398558e-72 |
81 | CD36 | -20.640167 | -0.997535 | 1.196482e-94 | 7.178890e-94 |
82 | CD43 | -21.210070 | -0.972586 | 7.709916e-100 | 4.981792e-99 |
83 | CD50 | -22.468721 | -0.942513 | 8.397506e-112 | 6.412641e-111 |
84 rows × 5 columns
# Subset data to only include CD8 T cells
cd8_t_cells = pg_data_combined_processed[pg_data_combined_processed.obs['cell_type'] == 'CD8 T',:]
# Calculate the difference in dsb value between stimulated and resting CD8 T cells
sc.tl.rank_genes_groups(cd8_t_cells, groupby='condition', method='wilcoxon', groups=['stimulated'], reference='resting',layer = 'shifted_dsb')
diff_exp_df = sc.get.rank_genes_groups_df(cd8_t_cells, group=None)
diff_exp_df
names | scores | logfoldchanges | pvals | pvals_adj | |
---|---|---|---|---|---|
0 | CD25 | 30.322737 | 4.042856 | 5.750194e-202 | 4.830163e-200 |
1 | CD71 | 29.554913 | 4.984287 | 5.678719e-192 | 2.385062e-190 |
2 | CD278 | 28.903751 | 2.034037 | 1.071103e-183 | 2.999088e-182 |
3 | ACTB | 28.679626 | 1.948678 | 6.849651e-181 | 1.438427e-179 |
4 | CD82 | 27.994154 | 2.719679 | 1.914118e-172 | 3.215718e-171 |
... | ... | ... | ... | ... | ... |
79 | CD32 | -16.559921 | -0.725752 | 1.357702e-61 | 4.751956e-61 |
80 | CD244 | -18.051432 | -1.856854 | 7.687215e-73 | 3.398558e-72 |
81 | CD36 | -20.640167 | -0.997535 | 1.196482e-94 | 7.178890e-94 |
82 | CD43 | -21.210070 | -0.972586 | 7.709916e-100 | 4.981792e-99 |
83 | CD50 | -22.468721 | -0.942513 | 8.397506e-112 | 6.412641e-111 |
84 rows × 5 columns
Let’s visualize these results as a volcano plot and annotate some T cell-relevant markers
diff_exp_df["-log10(adjusted p-value)"] = -np.log10(diff_exp_df["pvals_adj"])
diff_exp_df["Significant"] = diff_exp_df["pvals_adj"] < 0.01
# Create scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=diff_exp_df, x='logfoldchanges', y='-log10(adjusted p-value)', hue='Significant')
# Add labels for points with absolute logfoldchanges > 1.5
for idx, row in diff_exp_df.iterrows():
if abs(row['logfoldchanges']) > 1.5:
plt.annotate(row['names'],
(row['logfoldchanges'], row['-log10(adjusted p-value)']),
xytext=(5, 5), textcoords='offset points')
Alternatively, in a heatmap we can visualize the average normalized values in each of the four samples for the 7 most upregulated and the 7 most downregulated markers in PHA versus resting conditions.
# Get the 7 highest and lowest logfoldchanges
top_markers = diff_exp_df.nlargest(7, 'logfoldchanges')['names'].tolist()
bottom_markers = diff_exp_df.nsmallest(7, 'logfoldchanges')['names'].tolist()
markers_to_plot = top_markers + bottom_markers
# Calculate mean dsb values per sample for CD8 T cells
mean_values = pd.DataFrame()
for sample in pg_data_combined_processed.obs['sample'].unique():
sample_cells = pg_data_combined_processed[pg_data_combined_processed.obs['sample'] == sample]
mean_values[sample] = sample_cells[:, markers_to_plot].to_df('dsb').mean()
# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(mean_values.loc[markers_to_plot],
cmap='coolwarm', # Blue-Yellow-Red colormap
center=0,
yticklabels=True,
xticklabels=True,
linewidths=1, # Add cell borders with width of 1
linecolor='black') # Make the borders black
plt.title('Average DSB values in CD8 T cells per sample')
plt.tight_layout()
In this example, it seems that many of the T cell relevant markers (such as CD71, CD38 and CD25) show a significant increase in abundance upon PHA stimulation.
However, abundance is only one facet of the rich multidimensional data generated by MPX experiments. We are now ready to transition into the spatial realm, exploring the arrangement and interactions of proteins on the cell surface. In the upcoming tutorials, we will leverage the graph nature and spatial metrics within MPX data to unravel the spatial biology of cells, surveying protein polarization and colocalization patterns and how these contribute to cell identity, signaling, and function. Abundance has illuminated cell types; spatiality will illuminate the molecular geography that defines them.