Multiple Comparisons of Abundance and Spatial Scores
In many biological analyses, researchers focus on comparing a single condition or cell type to a specific reference group. For example, they might compare B cells to CD4 T cells or examine treated versus control monocytes. However, there are scenarios where investigating and comparing multiple experimental conditions, cell types, or time points against a single reference group can yield valuable insights. Traditionally, one approach to achieve this has been to manually loop over samples or conditions in the metadata slot of a PXL file object, running multiple tests in succession. While effective, this method can be time-consuming and prone to error.
As we will demonstrate in this tutorial, differential abundance testing
using the rank_genes_groups
function from Scanpy has this
functionality out of the box. Recent extensions to the pixelator
functions get_differential_polarity
and
get_differential_colocalization
now offer a more streamlined approach
to do the same for the spatial score testing. These updated functions
allow for multi-group comparisons without the need to set up complex
looping code.
By the end of this tutorial, you will be able to:
- Perform differential abundance, polarization, and colocalization tests comparing multiple groups to a single reference group.
- Utilize the enhanced capabilities of
get_differential_polarity
andget_differential_colocalization
for efficient multi-group analyses. - Visualize the results from these complex comparisons using
plot_polarity_diff_volcano
andplot_colocalization_diff_volcano
.
Setup
import numpy as np
import pandas as pd
from pathlib import Path
from pixelator.mpx import read, simple_aggregate
from pixelator.common.statistics import dsb_normalize
from pixelator.mpx.analysis.polarization import get_differential_polarity
from pixelator.mpx.analysis.colocalization import get_differential_colocalization
from pixelator.mpx.plot import plot_colocalization_diff_volcano, plot_polarity_diff_volcano
import seaborn as sns
import scanpy as sc
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu, false_discovery_control, zscore
from statsmodels.stats.multitest import multipletests
import umap
sns.set_style("whitegrid")
Loading data
To demonstrate the multiple comparison functionality, we’ll revisit the uropod data first introduced in the spatial analysis tutorial. As a refresher, this experiment involves T cells immobilized with CD54 and stimulated with either RANTES or MCP1 to induce uropod formation. A uropod is a protrusion on the cell surface that aids in cellular movement. During uropod formation, certain surface proteins migrate to one pole of the cell, creating a bulge. This process can be captured by MPX and is evident in the data as increased polarization and colocalization of the proteins involved in uropod formation. To start off this tutorial, we’ll begin by downloading and loading the required PXL files.
DATA_DIR = Path("<path to the directory to save datasets to>")
And then proceed in Python.
# The folders the data is located in:
data_files = [
DATA_DIR / "Uropod_CD54_fixed.layout.dataset.pxl",
DATA_DIR / "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl",
DATA_DIR / "Uropod_CD54_fixed_MCP1_stimulated.layout.dataset.pxl",
]
datasets = list([read(path) for path in data_files])
obj = simple_aggregate(
["Control", "Rantes", "Mcpi"], datasets
)
obj
Pixel dataset contains:
AnnData with 1607 obs and 80 vars
Edge list with 20651890 edges
Polarization scores with 80583 elements
Colocalization scores with 2160250 elements
Contains precomputed layouts
Metadata:
samples: {'Control': {'version': '0.19.0', 'sample': 'Uropod_CD54_fixed', 'analysis': {'params': {'polarization': {'polarization': {'transformation': 'log1p', 'permutations': 50, 'min_marker_count': 5, 'random_seed': None}}, 'colocalization': {'colocalization': {'transformation_type': 'rate-diff', 'neighbourhood_size': 1, 'n_permutations': 50, 'min_region_count': 5, 'min_marker_count': 5}}}}}, 'Rantes': {'version': '0.19.0', 'sample': 'Uropod_CD54_fixed_RANTES_stimulated', 'analysis': {'params': {'polarization': {'polarization': {'transformation': 'log1p', 'permutations': 50, 'min_marker_count': 5, 'random_seed': None}}, 'colocalization': {'colocalization': {'transformation_type': 'rate-diff', 'neighbourhood_size': 1, 'n_permutations': 50, 'min_region_count': 5, 'min_marker_count': 5}}}}}, 'Mcpi': {'version': '0.19.0', 'sample': 'Uropod_CD54_fixed_MCP1_stimulated', 'analysis': {'params': {'polarization': {'polarization': {'transformation': 'log1p', 'permutations': 50, 'min_marker_count': 5, 'random_seed': None}}, 'colocalization': {'colocalization': {'transformation_type': 'rate-diff', 'neighbourhood_size': 1, 'n_permutations': 50, 'min_region_count': 5, 'min_marker_count': 5}}}}}}
Since quality control and filtering are not the primary focus of this tutorial, we will only perform minimal steps in these areas. However, it’s important to note that thorough quality control and appropriate filtering are crucial for ensuring the reliability and accuracy of your analysis results. For this demonstration, we’ll implement basic quality control measures and apply some essential filtering criteria to prepare our data for further analysis.
QC & Filter Data
edge_rank_df = obj.adata.obs[["sample", "molecules"]].copy()
edge_rank_df["rank"] = edge_rank_df.groupby(["sample"])["molecules"].rank(
ascending=False, method="first"
)
edgerank_plot = (
sns.relplot(data=edge_rank_df, x="rank", y="molecules", hue="sample", aspect=1.6)
.set(xscale="log", yscale="log")
.set_xlabels("Component rank (by number of molecules)")
.set_ylabels("Number of molecules")
)
edgerank_plot
Prior to downstream analysis, we will remove components with less than 3000 molecules as well as the antibody count outliers.
# Add rank to the object for further filtering
obj.adata.obs = obj.adata.obs.join(edge_rank_df["rank"], how="left")
# Remove components with less than 3000 molecules
obj.adata = obj.adata[obj.adata.obs["molecules"] > 3000]
# Only keep the components where tau_type is normal
obj.adata = obj.adata[obj.adata.obs["tau_type"] == "normal"]
Differential Abundance
Prior to running the differential abundance test, we normalize the data.
# Add dsb values as a layer to the object
obj.adata.layers["dsb"] = dsb_normalize(
obj.adata.to_df(),
isotype_controls = ["mIgG1", "mIgG2a", "mIgG2b"]
)
# Calculate minimum value per marker - See the Normalization tutorial for more details
mins = obj.adata.to_df("dsb").min()
# If minimum value is positive, set to zero
mins[mins > 0] = 0
# Add layer to use in Differential Abundance test
obj.adata.layers["shifted_dsb"] = obj.adata.to_df("dsb").sub(mins)
To perform differential abundance testing across multiple conditions, we
use the rank_genes_groups
function from Scanpy.
The end result is a dataframe containing differential abundance results for all conditions, with adjusted p-values to account for multiple comparisons. This approach streamlines the analysis process and facilitates easier interpretation of results across different experimental conditions.
targets = ["Rantes", "Mcpi"]
reference = "Control"
# Run a wilcoxon test to compare groups
sc.tl.rank_genes_groups(
obj.adata,
groupby="sample",
method="wilcoxon",
layer="shifted_dsb",
groups=targets,
reference="Control",
correction_method="BH"
)
# Collect results and calculate -log10(adjusted p-value) for plotting
diff_exp_df = sc.get.rank_genes_groups_df(obj.adata, group=None)
diff_exp_df["-log10(adjusted p-value)"] = -np.log10(diff_exp_df["pvals_adj"])
diff_exp_df["label"] = (diff_exp_df["pvals_adj"] < 0.01) & (abs(diff_exp_df["logfoldchanges"]) > 0.5)
We can visualize the results of this differential analysis as a volcano plot using functions presented in the Abundance Normalization tutorial.
diff_abund_plot = (
sns.relplot(
diff_exp_df,
x="logfoldchanges",
y="-log10(adjusted p-value)",
hue="logfoldchanges",
col="group",
col_wrap=2,
height=4,
s=8,
).set_xlabels("log2 fold change")
)
for text, row in diff_exp_df[diff_exp_df["label"]].iterrows():
x, y, group = row[["logfoldchanges", "-log10(adjusted p-value)", "group"]]
if group == diff_exp_df["group"].unique()[0]:
which_axis = 0
else:
which_axis = 1
label = diff_abund_plot.axes[which_axis].text(
x + 0.05, y, diff_exp_df.loc[text, "names"] , horizontalalignment="left", fontsize="xx-small"
)
diff_abund_plot
Differential Polarization
In this section, we’ll explore how to perform differential polarization testing, comparing both RANTES and MCP1 treatments against a control reference. While typical workflows often begin with filtering steps to remove lowly abundant markers, we’ll omit that for this demonstration.
We’ll highlight a new feature of the get_differential_polarity
function from the pixelator package. As mentioned before, comparing
multiple conditions to a single reference group used to require
iterative approaches and/or data subsetting. However, the updated
get_differential_polarity
function streamlines this process. It
accepts multiple experimental conditions to be tested against a single
reference group in one operation. This functionality not only saves time
but also automatically handles p-value adjustments for multiple
comparisons (Bonferroni correction by default). By leveraging this
feature, we can simultaneously assess the differential polarization
effects of both RANTES and MCP1 treatments compared to the control
condition.
targets = ["Rantes", "Mcpi"]
reference = "Control"
DPA = get_differential_polarity(obj.polarization,
targets = targets,
reference = "Control",
contrast_column = "sample")
DPA.sort_values(by =['p_adj'])
marker | stat | p_value | median_difference | p_adj | target | |
---|---|---|---|---|---|---|
15 | CD162 | 57482.0 | 6.490396e-46 | 3.276248 | 5.192317e-44 | Rantes |
57 | CD50 | 69329.0 | 6.749971e-24 | 2.484719 | 5.399977e-22 | Rantes |
28 | CD26 | 97543.0 | 2.133449e-12 | 1.042112 | 1.706759e-10 | Rantes |
70 | CD82 | 101036.0 | 5.078217e-11 | 0.788962 | 4.062574e-09 | Rantes |
17 | CD18 | 104385.0 | 1.747884e-10 | 0.908812 | 1.398307e-08 | Rantes |
... | ... | ... | ... | ... | ... | ... |
75 | HLA-DR | 177188.0 | 1.143129e-01 | -0.146772 | 1.000000e+00 | Mcpi |
76 | TCRVb5 | 81015.0 | 5.179407e-01 | 0.041924 | 1.000000e+00 | Mcpi |
77 | mIgG1 | 17007.0 | 7.778279e-01 | -0.028209 | 1.000000e+00 | Mcpi |
78 | mIgG2a | 38053.0 | 1.518312e-01 | -0.123990 | 1.000000e+00 | Mcpi |
79 | mIgG2b | 48753.0 | 3.982969e-01 | 0.056669 | 1.000000e+00 | Mcpi |
160 rows × 6 columns
Alternatively, we can visualize the output of these tests next to each
other in a facetted volcano plot. The plot_polarity_diff_volcano
function will perform the differential analysis and create volcano plots
facetted on the targets.
# Create polarity strip plot
pol_volcano_plot = plot_polarity_diff_volcano(obj.polarization,
targets = targets,
reference = "Control",
contrast_column = "sample")
_ = pol_volcano_plot[1][0].set_xlim(-1, 1.5)
_ = pol_volcano_plot[1][1].set_xlim(-1, 3)
Differential Colocalization
Similarly, we can run multiple differential colocalization tests from a
single call to get_differential_colocalization
.
DCA = get_differential_colocalization(obj.colocalization,
targets = targets,
reference = "Control",
contrast_column = "sample")
DCA.sort_values(by =['p_adj'])
marker_1 | marker_2 | stat | p_value | median_difference | p_adj | target | |
---|---|---|---|---|---|---|---|
1121 | CD162 | CD50 | 60522.0 | 1.990746e-28 | 7.118788 | 6.290758e-25 | Rantes |
151 | B2M | HLA-ABC | 90377.0 | 2.380435e-20 | 4.897676 | 7.522174e-17 | Rantes |
1241 | CD18 | CD45RB | 93332.0 | 1.790114e-17 | 3.771053 | 5.656759e-14 | Rantes |
2138 | CD29 | CD43 | 101524.0 | 8.688518e-16 | -3.437684 | 2.745572e-12 | Rantes |
1114 | CD162 | CD45 | 157426.0 | 1.085987e-15 | -3.713952 | 3.431718e-12 | Rantes |
... | ... | ... | ... | ... | ... | ... | ... |
3156 | TCRVb5 | mIgG2b | 39590.0 | 2.514257e-01 | -0.386695 | 1.000000e+00 | Mcpi |
3157 | mIgG1 | mIgG2a | 10733.0 | 1.917360e-01 | -0.804866 | 1.000000e+00 | Mcpi |
3158 | mIgG1 | mIgG2b | 11533.0 | 6.731368e-01 | -0.226777 | 1.000000e+00 | Mcpi |
3159 | mIgG2a | mIgG2b | 19345.0 | 7.051521e-01 | 0.176204 | 1.000000e+00 | Mcpi |
0 | ACTB | B2M | 14082.0 | 9.354814e-01 | 0.079239 | 1.000000e+00 | Rantes |
6320 rows × 7 columns
Again, side-by-side visualization of the volcano plots is easily
achieved by facetting on the targets using the
plot_colocalization_diff_volcano
function. This function will perform
the differential analysis and create volcano plots facetted on the
targets.
# Volcano plot
col_volcano_plot = plot_colocalization_diff_volcano(obj.colocalization,
targets = targets,
reference = "Control",
contrast_column = "sample")
_ = col_volcano_plot[1][1].set_xlim(-4, 6)
This concludes this short tutorial on performing multiple comparisons of abundance, polarization, and colocalization. Now, you’ll be able to perform multiple comparisons simultaneously, which significantly accelerates the analysis process for more complex experimental designs. The combination of speed, accuracy, and statistical robustness provided by this approach will help ensure that your results are not only more reliable but also more likely to stand up to peer review and replication efforts.
Remember, while this method is powerful, it’s always important to consider the biological context of your experiments and interpret the results in light of your specific research questions and hypotheses.