Multiple Comparisons of Abundance and Spatial Scores

In many biological analyses, researchers focus on comparing a single condition or cell type to a specific reference group. For example, they might compare B cells to CD4 T cells or examine treated versus control monocytes. However, there are scenarios where investigating and comparing multiple experimental conditions, cell types, or time points against a single reference group can yield valuable insights. Traditionally, one approach to achieve this has been to manually loop over samples or conditions in the metadata slot of a PXL file object, running multiple tests in succession. While effective, this method can be time-consuming and prone to error.

As we will demonstrate in this tutorial, differential abundance testing using the rank_genes_groups function from Scanpy has this functionality out of the box. Recent extensions to the pixelator functions get_differential_polarity and get_differential_colocalization now offer a more streamlined approach to do the same for the spatial score testing. These updated functions allow for multi-group comparisons without the need to set up complex looping code.

By the end of this tutorial, you will be able to:

Perform differential abundance, polarization, and colocalization tests comparing multiple groups to a single reference group.
Utilize the enhanced capabilities of get_differential_polarity and get_differential_colocalization for efficient multi-group analyses.
Visualize the results from these complex comparisons using plot_polarity_diff_volcano and plot_colocalization_diff_volcano.

Setup

import numpy as np
import pandas as pd
from pathlib import Path
from pixelator.mpx import read, simple_aggregate
from pixelator.common.statistics import dsb_normalize
from pixelator.mpx.analysis.polarization import get_differential_polarity
from pixelator.mpx.analysis.colocalization import get_differential_colocalization
from pixelator.mpx.plot import plot_colocalization_diff_volcano, plot_polarity_diff_volcano
import seaborn as sns
import scanpy as sc
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu, false_discovery_control, zscore
from statsmodels.stats.multitest import multipletests
import umap

sns.set_style("whitegrid")

Loading data

To demonstrate the multiple comparison functionality, we’ll revisit the uropod data first introduced in the spatial analysis tutorial. As a refresher, this experiment involves T cells immobilized with CD54 and stimulated with either RANTES or MCP1 to induce uropod formation. A uropod is a protrusion on the cell surface that aids in cellular movement. During uropod formation, certain surface proteins migrate to one pole of the cell, creating a bulge. This process can be captured by MPX and is evident in the data as increased polarization and colocalization of the proteins involved in uropod formation. To start off this tutorial, we’ll begin by downloading and loading the required PXL files.

DATA_DIR = Path("<path to the directory to save datasets to>")

And then proceed in Python.

# The folders the data is located in:
data_files = [
    DATA_DIR / "Uropod_CD54_fixed.layout.dataset.pxl",
    DATA_DIR / "Uropod_CD54_fixed_RANTES_stimulated.layout.dataset.pxl",
    DATA_DIR / "Uropod_CD54_fixed_MCP1_stimulated.layout.dataset.pxl",
] 

datasets = list([read(path) for path in data_files])
obj = simple_aggregate(
      ["Control", "Rantes", "Mcpi"], datasets
  )
obj

Pixel dataset contains:
  AnnData with 1607 obs and 80 vars
  Edge list with 20651890 edges
  Polarization scores with 80583 elements
  Colocalization scores with 2160250 elements
  Contains precomputed layouts
  Metadata:
    samples: {'Control': {'version': '0.19.0', 'sample': 'Uropod_CD54_fixed', 'analysis': {'params': {'polarization': {'polarization': {'transformation': 'log1p', 'permutations': 50, 'min_marker_count': 5, 'random_seed': None}}, 'colocalization': {'colocalization': {'transformation_type': 'rate-diff', 'neighbourhood_size': 1, 'n_permutations': 50, 'min_region_count': 5, 'min_marker_count': 5}}}}}, 'Rantes': {'version': '0.19.0', 'sample': 'Uropod_CD54_fixed_RANTES_stimulated', 'analysis': {'params': {'polarization': {'polarization': {'transformation': 'log1p', 'permutations': 50, 'min_marker_count': 5, 'random_seed': None}}, 'colocalization': {'colocalization': {'transformation_type': 'rate-diff', 'neighbourhood_size': 1, 'n_permutations': 50, 'min_region_count': 5, 'min_marker_count': 5}}}}}, 'Mcpi': {'version': '0.19.0', 'sample': 'Uropod_CD54_fixed_MCP1_stimulated', 'analysis': {'params': {'polarization': {'polarization': {'transformation': 'log1p', 'permutations': 50, 'min_marker_count': 5, 'random_seed': None}}, 'colocalization': {'colocalization': {'transformation_type': 'rate-diff', 'neighbourhood_size': 1, 'n_permutations': 50, 'min_region_count': 5, 'min_marker_count': 5}}}}}}

Since quality control and filtering are not the primary focus of this tutorial, we will only perform minimal steps in these areas. However, it’s important to note that thorough quality control and appropriate filtering are crucial for ensuring the reliability and accuracy of your analysis results. For this demonstration, we’ll implement basic quality control measures and apply some essential filtering criteria to prepare our data for further analysis.

QC & Filter Data

edge_rank_df = obj.adata.obs[["sample", "molecules"]].copy()
edge_rank_df["rank"] = edge_rank_df.groupby(["sample"])["molecules"].rank(
    ascending=False, method="first"
)
edgerank_plot = (
    sns.relplot(data=edge_rank_df, x="rank", y="molecules", hue="sample", aspect=1.6)
    .set(xscale="log", yscale="log")
    .set_xlabels("Component rank (by number of molecules)")
    .set_ylabels("Number of molecules")
)
edgerank_plot

Prior to downstream analysis, we will remove components with less than 3000 molecules as well as the antibody count outliers.

# Add rank to the object for further filtering
obj.adata.obs = obj.adata.obs.join(edge_rank_df["rank"], how="left")

# Remove components with less than 3000 molecules
obj.adata = obj.adata[obj.adata.obs["molecules"] > 3000]
# Only keep the components where tau_type is normal
obj.adata = obj.adata[obj.adata.obs["tau_type"] == "normal"]

Differential Abundance

Prior to running the differential abundance test, we normalize the data.

# Add dsb values as a layer to the object
obj.adata.layers["dsb"] = dsb_normalize(
    obj.adata.to_df(), 
    isotype_controls = ["mIgG1", "mIgG2a", "mIgG2b"]
    )

# Calculate minimum value per marker - See the Normalization tutorial for more details
mins = obj.adata.to_df("dsb").min()
# If minimum value is positive, set to zero
mins[mins > 0] = 0
# Add layer to use in Differential Abundance test
obj.adata.layers["shifted_dsb"] = obj.adata.to_df("dsb").sub(mins)

To perform differential abundance testing across multiple conditions, we use the rank_genes_groups function from Scanpy.

The end result is a dataframe containing differential abundance results for all conditions, with adjusted p-values to account for multiple comparisons. This approach streamlines the analysis process and facilitates easier interpretation of results across different experimental conditions.

targets = ["Rantes", "Mcpi"]
reference = "Control"

# Run a wilcoxon test to compare groups
sc.tl.rank_genes_groups(
    obj.adata,
    groupby="sample",
    method="wilcoxon",
    layer="shifted_dsb",
    groups=targets,
    reference="Control",
    correction_method="BH"
)

# Collect results and calculate -log10(adjusted p-value) for plotting
diff_exp_df = sc.get.rank_genes_groups_df(obj.adata, group=None)
diff_exp_df["-log10(adjusted p-value)"] = -np.log10(diff_exp_df["pvals_adj"])
diff_exp_df["label"] = (diff_exp_df["pvals_adj"] < 0.01) & (abs(diff_exp_df["logfoldchanges"]) > 0.5)  

We can visualize the results of this differential analysis as a volcano plot using functions presented in the Abundance Normalization tutorial.

diff_abund_plot = (
    sns.relplot(
    diff_exp_df,
    x="logfoldchanges",
    y="-log10(adjusted p-value)",
    hue="logfoldchanges",
    col="group",
    col_wrap=2,
    height=4,
    s=8,
).set_xlabels("log2 fold change")
)

for text, row in diff_exp_df[diff_exp_df["label"]].iterrows():
    x, y, group = row[["logfoldchanges", "-log10(adjusted p-value)", "group"]]
    if group == diff_exp_df["group"].unique()[0]:
        which_axis = 0
    else:
        which_axis = 1 
    label = diff_abund_plot.axes[which_axis].text(
        x + 0.05, y, diff_exp_df.loc[text, "names"] , horizontalalignment="left", fontsize="xx-small"
    )

diff_abund_plot

Differential Polarization

In this section, we’ll explore how to perform differential polarization testing, comparing both RANTES and MCP1 treatments against a control reference. While typical workflows often begin with filtering steps to remove lowly abundant markers, we’ll omit that for this demonstration.

We’ll highlight a new feature of the get_differential_polarity function from the pixelator package. As mentioned before, comparing multiple conditions to a single reference group used to require iterative approaches and/or data subsetting. However, the updated get_differential_polarity function streamlines this process. It accepts multiple experimental conditions to be tested against a single reference group in one operation. This functionality not only saves time but also automatically handles p-value adjustments for multiple comparisons (Bonferroni correction by default). By leveraging this feature, we can simultaneously assess the differential polarization effects of both RANTES and MCP1 treatments compared to the control condition.

targets = ["Rantes", "Mcpi"]
reference = "Control"

DPA = get_differential_polarity(obj.polarization,
    targets = targets,
    reference = "Control",
    contrast_column = "sample")

DPA.sort_values(by =['p_adj'])

	marker	stat	p_value	median_difference	p_adj	target
15	CD162	57482.0	6.490396e-46	3.276248	5.192317e-44	Rantes
57	CD50	69329.0	6.749971e-24	2.484719	5.399977e-22	Rantes
28	CD26	97543.0	2.133449e-12	1.042112	1.706759e-10	Rantes
70	CD82	101036.0	5.078217e-11	0.788962	4.062574e-09	Rantes
17	CD18	104385.0	1.747884e-10	0.908812	1.398307e-08	Rantes
...	...	...	...	...	...	...
75	HLA-DR	177188.0	1.143129e-01	-0.146772	1.000000e+00	Mcpi
76	TCRVb5	81015.0	5.179407e-01	0.041924	1.000000e+00	Mcpi
77	mIgG1	17007.0	7.778279e-01	-0.028209	1.000000e+00	Mcpi
78	mIgG2a	38053.0	1.518312e-01	-0.123990	1.000000e+00	Mcpi
79	mIgG2b	48753.0	3.982969e-01	0.056669	1.000000e+00	Mcpi

160 rows × 6 columns

Alternatively, we can visualize the output of these tests next to each other in a facetted volcano plot. The plot_polarity_diff_volcano function will perform the differential analysis and create volcano plots facetted on the targets.

# Create polarity strip plot

pol_volcano_plot = plot_polarity_diff_volcano(obj.polarization,
    targets = targets,
    reference = "Control",
    contrast_column = "sample")
_ = pol_volcano_plot[1][0].set_xlim(-1, 1.5)
_ = pol_volcano_plot[1][1].set_xlim(-1, 3)

Differential Colocalization

Similarly, we can run multiple differential colocalization tests from a single call to get_differential_colocalization.

DCA = get_differential_colocalization(obj.colocalization,
    targets = targets,
    reference = "Control",
    contrast_column = "sample")

DCA.sort_values(by =['p_adj'])

	marker_1	marker_2	stat	p_value	median_difference	p_adj	target
1121	CD162	CD50	60522.0	1.990746e-28	7.118788	6.290758e-25	Rantes
151	B2M	HLA-ABC	90377.0	2.380435e-20	4.897676	7.522174e-17	Rantes
1241	CD18	CD45RB	93332.0	1.790114e-17	3.771053	5.656759e-14	Rantes
2138	CD29	CD43	101524.0	8.688518e-16	-3.437684	2.745572e-12	Rantes
1114	CD162	CD45	157426.0	1.085987e-15	-3.713952	3.431718e-12	Rantes
...	...	...	...	...	...	...	...
3156	TCRVb5	mIgG2b	39590.0	2.514257e-01	-0.386695	1.000000e+00	Mcpi
3157	mIgG1	mIgG2a	10733.0	1.917360e-01	-0.804866	1.000000e+00	Mcpi
3158	mIgG1	mIgG2b	11533.0	6.731368e-01	-0.226777	1.000000e+00	Mcpi
3159	mIgG2a	mIgG2b	19345.0	7.051521e-01	0.176204	1.000000e+00	Mcpi
0	ACTB	B2M	14082.0	9.354814e-01	0.079239	1.000000e+00	Rantes

6320 rows × 7 columns

Again, side-by-side visualization of the volcano plots is easily achieved by facetting on the targets using the plot_colocalization_diff_volcano function. This function will perform the differential analysis and create volcano plots facetted on the targets.

# Volcano plot
col_volcano_plot = plot_colocalization_diff_volcano(obj.colocalization,
    targets = targets,
    reference = "Control",
    contrast_column = "sample")

_ = col_volcano_plot[1][1].set_xlim(-4, 6)

This concludes this short tutorial on performing multiple comparisons of abundance, polarization, and colocalization. Now, you’ll be able to perform multiple comparisons simultaneously, which significantly accelerates the analysis process for more complex experimental designs. The combination of speed, accuracy, and statistical robustness provided by this approach will help ensure that your results are not only more reliable but also more likely to stand up to peer review and replication efforts.

Remember, while this method is powerful, it’s always important to consider the biological context of your experiments and interpret the results in light of your specific research questions and hypotheses.

Setup​

Loading data​

QC & Filter Data​

Differential Abundance​

Differential Polarization​

Differential Colocalization​