MPX Quality Control
This tutorial covers the first steps of data analysis: quality control and cleanup of the MPX data output from Pixelator.
After completing this tutorial, you should be able to:
- Use the edge rank plot to manually set cell calling thresholds and filter low-quality cells.
- Aggregate data across samples and visualize sample-level QC metrics like number of cells.
- Check distributions of quality metrics like molecule counts and graph connectivity.
- Identify and remove cell outliers using the antibody count distribution metric Tau.
Setup
First, we will load packages necessary for downstream processing.
library(pixelatorR)
library(dplyr)
library(stringr)
library(SeuratObject)
library(here)
library(ggplot2)
library(tidyr)
library(tibble)
Load data
We will begin by loading the merged dataset we created at the end of the previous tutorial, Data handling. As described previously, this merged object contains four samples: a resting and a PHA-stimulated PBMC sample, both in duplicate. Note that if you continue directly from the previous tutorial, you don’t need to load the data again.
DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)
# Load the object that you saved in the previous tutorial. This is not needed if it is still in your workspace.
pg_data_combined <- readRDS(file.path(DATA_DIR, "combined_data.rds"))
pg_data_combined
An object of class Seurat
84 features across 4454 samples within 1 assay
Active assay: mpxCells (84 features, 84 variable features)
1 layer present: counts
Cell calling: Edge rank plot
Here, we use the molecule rank plot (also called the edge rank plot, because each edge in an MPX component corresponds to one detected antibody molecule) as an additional quality control step, manually adjusting the set of cells called by Pixelator. This removes components that deviate from the expected component size distribution and might not represent whole cells.
edgerank_plot <- MoleculeRankPlot(pg_data_combined, group_by = "sample")
edgerank_plot
Component sizes decline rapidly at around 10,000 molecules (edges), so we set a manual cutoff at that point, shown as a dashed line.
cutoff <- 10000
edgerank_plot +
  geom_hline(yintercept = cutoff,
             linetype = "dashed")
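To make the idea behind the rank plot concrete, here is a minimal sketch that builds one by hand from per-component molecule counts. The data frame `toy`, its sample names, and its counts are synthetic and purely hypothetical; this is not how `MoleculeRankPlot` is implemented, only an illustration of ranking components by size within each sample and looking for the point where sizes drop off.

```r
library(dplyr)
library(ggplot2)

# Synthetic per-component molecule counts (hypothetical values, not MPX data)
set.seed(1)
toy <- data.frame(
  sample = rep(c("S1", "S2"), each = 200),
  molecules = round(c(rlnorm(200, meanlog = 10, sdlog = 0.6),
                      rlnorm(200, meanlog = 9.8, sdlog = 0.7)))
)

# Rank components within each sample, largest component first
toy_ranked <- toy %>%
  group_by(sample) %>%
  mutate(rank = rank(-molecules, ties.method = "first")) %>%
  ungroup()

# Log-log rank plot with a dashed line at a candidate cutoff
toy_rank_plot <- ggplot(toy_ranked, aes(rank, molecules, color = sample)) +
  geom_point(size = 0.5) +
  scale_x_log10() +
  scale_y_log10() +
  geom_hline(yintercept = 10000, linetype = "dashed") +
  theme_minimal()
```

Printing `toy_rank_plot` draws the curve; components below the dashed line are the ones a cutoff at that value would remove.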
# Keep only cells with at least `cutoff` (10000) molecules
pg_data_combined <-
  pg_data_combined %>%
  subset(molecules >= cutoff)
pg_data_combined
An object of class Seurat
84 features across 3860 samples within 1 assay
Active assay: mpxCells (84 features, 84 variable features)
1 layer present: counts
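Because the cutoff is chosen by eye, it can help to check how sensitive the number of retained cells is to that choice. The sketch below does this on a synthetic metadata table; `meta`, its sample names, and the candidate cutoffs are hypothetical. With real data, the same table could presumably be taken from the object's metadata via `pg_data_combined[[]]` before filtering.

```r
library(dplyr)

# Synthetic stand-in for per-component metadata (hypothetical values)
set.seed(42)
meta <- data.frame(
  sample = rep(c("Resting_1", "PHA_1"), each = 500),
  molecules = round(rlnorm(1000, meanlog = 9.5, sdlog = 0.8))
)

# Candidate thresholds to compare (illustrative only)
candidate_cutoffs <- c(5000, 10000, 20000)

# For each threshold, count the components per sample that would be kept
retained <- lapply(candidate_cutoffs, function(thr) {
  meta %>%
    group_by(sample) %>%
    summarize(cutoff = thr,
              n_total = n(),
              n_kept = sum(molecules >= thr),
              .groups = "drop")
}) %>%
  bind_rows() %>%
  mutate(frac_kept = n_kept / n_total)

retained
```

If the retained fraction changes sharply between neighboring cutoffs, the threshold sits in a dense part of the size distribution and may deserve a closer look.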
Next, we plot the number of cells that remain after filtering in each sample, that is, per condition and replicate.
CellCountPlot(pg_data_combined, color_by = "sample")
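A table of the same counts is often handy alongside the plot. The following sketch tallies cells per sample with `dplyr::count` and draws a simple bar chart; `toy_cells` and its sample names are hypothetical stand-ins for the object's metadata, and the chart only approximates what `CellCountPlot` shows.

```r
library(dplyr)
library(ggplot2)

# Synthetic metadata with one row per cell (hypothetical sample names)
toy_cells <- data.frame(
  sample = c(rep("Resting_1", 5), rep("Resting_2", 4),
             rep("PHA_1", 3), rep("PHA_2", 6))
)

# One row per sample with the number of cells
cell_counts <- toy_cells %>%
  count(sample, name = "n_cells")

# Bar chart of cells per sample
cell_count_plot <- ggplot(cell_counts, aes(sample, n_cells, fill = sample)) +
  geom_col() +
  theme_minimal()
```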
Next, we visualize the per-component distributions of four metrics in each sample: molecule count, mean molecules per A pixel, read count, and mean reads per molecule.
# Metrics to compare across samples
qc_metrics <- c("molecules", "mean_molecules_per_a_pixel", "reads", "mean_reads_per_molecule")

pg_data_combined[[]] %>%
  select(sample, all_of(qc_metrics)) %>%
  pivot_longer(cols = all_of(qc_metrics), names_to = "metric", values_to = "value") %>%
  # Keep the facets in the order listed in qc_metrics
  mutate(metric = factor(metric, levels = qc_metrics)) %>%
  ggplot(aes(sample, value)) +
  # One violin per sample, with the median marked
  geom_violin(draw_quantiles = 0.5,
              fill = "gray") +
  facet_grid(metric ~ ., scales = "free") +
  scale_y_log10() +
  theme_minimal()
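Violin plots show the shape of each distribution; a numeric summary makes it easier to compare samples side by side. Here is a minimal sketch that computes the median and interquartile range per sample and metric. The table `toy_metrics` is synthetic and its column values are hypothetical; with real data, the input would be the metadata columns selected above.

```r
library(dplyr)
library(tidyr)

# Synthetic per-component metrics (hypothetical values)
set.seed(7)
toy_metrics <- data.frame(
  sample = rep(c("S1", "S2"), each = 100),
  molecules = round(rlnorm(200, meanlog = 10, sdlog = 0.5)),
  reads = round(rlnorm(200, meanlog = 11, sdlog = 0.5))
)

# Long format, then one summary row per sample and metric
metric_summary <- toy_metrics %>%
  pivot_longer(cols = c(molecules, reads),
               names_to = "metric", values_to = "value") %>%
  group_by(sample, metric) %>%
  summarize(q25 = quantile(value, 0.25, names = FALSE),
            med = median(value),
            q75 = quantile(value, 0.75, names = FALSE),
            .groups = "drop")

metric_summary
```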
Antibody count distribution outlier removal
Below, we plot the umi_per_upia statistic (the mean number of molecules per DNA pixel A, which corresponds to the mean_molecules_per_a_pixel metric plotted above) against Tau and color each component by Pixelator’s classification of Tau (low, normal, or high). Pixelator appears to have accurately picked out outliers that might be antibody aggregates or components with low specificity, that is, components binding many more types of antibodies than we would expect from a normal cell. As these outliers are likely technical artefacts, we remove them from the analysis.
### Pixel content vs Marker specificity (requires pixelatorR >= 0.10.0)
TauPlot(pg_data_combined, group_by = "sample")
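For intuition about what Tau measures, the sketch below computes a Tau specificity index for a single vector of antibody counts, assuming the commonly used definition in which each count is scaled by the maximum count. The helper `tau_index` and the example vectors are hypothetical; Pixelator's exact computation (for example, any count transformation applied first) and its rules for assigning `tau_type` may differ, so treat this as an illustration of the general formula only.

```r
# Tau specificity index for one component (assumed definition):
# values near 0 mean counts are spread evenly over markers,
# values near 1 mean counts are concentrated in a single marker.
tau_index <- function(counts) {
  stopifnot(length(counts) > 1, all(counts >= 0))
  if (max(counts) == 0) return(NA_real_)   # undefined for an all-zero profile
  x_hat <- counts / max(counts)            # scale by the most abundant marker
  sum(1 - x_hat) / (length(counts) - 1)
}

# Hypothetical count profiles over 10 markers
even_profile     <- rep(100, 10)        # every marker equally abundant
specific_profile <- c(1000, rep(0, 9))  # a single marker dominates

tau_index(even_profile)      # 0
tau_index(specific_profile)  # 1
```

Under this definition, a component that binds nearly every antibody at similar levels scores close to 0, which matches the "low specificity" outliers described above, while a component dominated by very few markers scores close to 1.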
# Only keep the components where tau_type is normal
pg_data_combined <-
  pg_data_combined %>%
  subset(tau_type == "normal")
pg_data_combined
An object of class Seurat
84 features across 3844 samples within 1 assay
Active assay: mpxCells (84 features, 84 variable features)
1 layer present: counts
By going through the steps above, we have performed essential quality control: filtering out low-quality cells and removing outliers. With a clean, high-quality MPX dataset in hand, we are ready to proceed to the next step, in which we will normalize and denoise protein abundance to facilitate the annotation of different cell populations.
If you want to pause here and save the filtered data for the next tutorial, you can do so with the following code.
# Save filtered dataset for next step. This is an optional pause step; if you prefer to continue to the next tutorial without saving the object explicitly, that will also work.
saveRDS(pg_data_combined, file.path(DATA_DIR, "combined_data_filtered.rds"))