Skip to main content
Version: 0.17.x

Data Handling

This tutorial describes how to start working with the MPX output data format. The primary data output file of * *pixelator** is the PXL file. Here, you will learn how to read and interact with it. We will go through the different components of a single sample PXL file and also how to aggregate results from multiple samples. You can read more about the PXL file format here.

After completing this tutorial, you will be able to:

  • Load PXL files in R and access the multi-modal data contents including protein counts, metadata and spatial scores.
  • Understand and work with spatial polarity (protein clustering patterns).
  • Interpret and work with colocalization scores (protein co-clustering).
  • Directly access the spatial graph structure through the edge list.
  • Aggregate multiple PXL files into an integrated data object with sample identities.
  • Save aggregated data objects for later reuse in analyses.

Setup

To start with, we need to load some packages and functions that we will need.

library(pixelatorR)
library(SeuratObject)
library(dplyr)
library(stringr)
library(here)

Loading data

We begin by locating the output from the pixelator pipeline. The input for downstream analysis is the PXL file contained in the analysis directory.

filename <- here("data/Sample01_human_pbmcs_unstimulated.dataset.pxl")
pg_data <- ReadMPX_Seurat(filename)

Component meta data

In addition to the counts data, the object contains meta data of each MPX graph component, each corresponding to a cell. This table contains information that can be useful in quality control, with information such as how many protein molecules were detected (edges), the sequencing depth (mean_reads), and the graph connectivity of each component (mean_upia_degree).

as_tibble(pg_data[[]], rownames = "component")
# A tibble: 477 × 19
component antibodies edges leiden mean_reads mean_umi_per_upia
<chr> <int> <int> <fct> <dbl> <dbl>
1 RCVCMP0000000 77 23925 2 6.10 8.18
2 RCVCMP0000002 72 6719 1 5.87 3.86
3 RCVCMP0000003 78 8596 5 5.96 3.65
4 RCVCMP0000005 79 17206 3 5.74 4.78
5 RCVCMP0000006 76 21254 2 5.51 5.41
6 RCVCMP0000007 69 6687 1 5.68 5.20
7 RCVCMP0000008 62 7687 1 5.79 5.10
8 RCVCMP0000010 66 4778 6 5.60 2.83
9 RCVCMP0000012 69 3873 4 5.96 2.62
10 RCVCMP0000013 80 31414 3 6.11 7.05
# ℹ 467 more rows
# ℹ 13 more variables: mean_upia_degree <dbl>, median_reads <dbl>,
# median_umi_per_upia <dbl>, median_upia_degree <dbl>, reads <int>,
# tau <dbl>, tau_type <fct>, umi <int>, umi_per_upia <dbl>, upia <int>,
# upia_per_upib <dbl>, upib <int>, vertices <int>

The antibody count matrix can be accessed from the Seurat object using the LayerData method.

# Display the 10 first rows and 10 first columns of the antibody counts
LayerData(pg_data, layer = "counts")[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"

CD274 62 11 25 31 16 8 8 3 6 65
CD44 553 180 66 347 213 72 131 69 126 804
CD25 12 7 6 8 5 2 2 2 2 27
CD279 4 2 . 8 2 3 2 . 1 10
CD41 6 1 3 5 6 1 21 . 5 25
HLA-ABC 10311 2169 3617 5545 10250 2035 2775 1344 1161 10840
CD54 381 99 160 224 428 103 103 58 44 458
CD26 129 175 58 148 52 115 112 21 68 1104
CD27 132 48 51 177 60 49 51 9 62 182
CD38 37 268 401 66 93 91 53 79 14 150

Spatial data

In addition to count data corresponding to protein abundance, MPX also generates spatial data which be used to visualize individual cells, and to find spatial structures of proteins across cells. The PXL file comes with MPX polarity scores, describing the degree of spatial clustering of each protein on each cell, and MPX colocalization scores, which describe the degree of spatial colocalization of pairs of proteins on each cell. If you want to read more about spatial metrics in MPX data, click here.

pixelatorR Assay classes

SeuratObject provides the Assay class and more recently the Assay5 class to store single-cell abundance data in. PixelatorR provides additional classes CellGraphAssay and CellGraphAssay5 to store both abundance data and spatial metrics in. These classes inherit the corresponding SeuratObject classes but also contain additional slots for the spatial metrics. Here for instance, pg_data contains a CellGraphAssay5 object called “mpxCells” which can be accessed as follows:

pg_data[["mpxCells"]]
CellGraphAssay (v5) data with 80 features for 477 cells
Top 10 variable features:
CD274, CD44, CD25, CD279, CD41, HLA-ABC, CD54, CD26, CD27, CD38
Layers:
counts
Loaded CellGraph objects:
0

In the output above, we can confirm that the object is of class CellGraphAssay5 and that it currently contains 0 loaded CellGraph objects. More about that later.

MPX Polarity scores

The spatial metrics can be accessed from the CellGraphAssay5 object with:

PolarizationScores(pg_data)
# A tibble: 33,479 × 6
morans_i morans_p_value morans_p_adjusted morans_z marker component
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 -0.00299 0.772 1.00 -0.290 ACTB RCVCMP0000830
2 -0.0161 0.177 0.771 -1.35 B2M RCVCMP0000830
3 0.0147 0.125 0.633 1.53 CD102 RCVCMP0000830
4 -0.00678 0.590 1.00 -0.539 CD11a RCVCMP0000830
5 0.000836 0.891 1.00 0.137 CD127 RCVCMP0000830
6 -0.00132 0.145 0.693 -1.46 CD150 RCVCMP0000830
7 -0.00103 0.395 1.00 -0.851 CD152 RCVCMP0000830
8 -0.00162 0.0365 0.265 -2.09 CD154 RCVCMP0000830
9 -0.000919 0.971 1.00 -0.0364 CD162 RCVCMP0000830
10 -0.000345 0.561 1.00 0.582 CD163 RCVCMP0000830
# ℹ 33,469 more rows

MPX colocalization scores

Similarly, pixelatorR provides a method to access the colocalization scores:

ColocalizationScores(pg_data)
# A tibble: 1,211,823 × 15
marker_1 marker_2 pearson pearson_mean pearson_stdev pearson_z
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ACTB ACTB 1 1 0 NA
2 ACTB B2M 0.324 0.317 0.0162 0.429
3 B2M B2M 1 1 0 NA
4 ACTB CD102 0.304 0.235 0.0167 4.17
5 B2M CD102 0.604 0.614 0.00966 -1.06
6 CD102 CD102 1 1 0 NA
7 ACTB CD11a 0.303 0.311 0.0154 -0.526
8 B2M CD11a 0.928 0.934 0.00252 -2.34
9 CD102 CD11a 0.568 0.601 0.00976 -3.32
10 CD11a CD11a 1 1 0 NA
# ℹ 1,211,813 more rows
# ℹ 9 more variables: pearson_p_value <dbl>, pearson_p_value_adjusted <dbl>,
# jaccard <dbl>, jaccard_mean <dbl>, jaccard_stdev <dbl>, jaccard_z <dbl>,
# jaccard_p_value <dbl>, jaccard_p_value_adjusted <dbl>, component <chr>

Edgelist

The edgelist is a list of uniquely identified antibodies per each cell, and simultaneously constitutes the bipartite graph which makes up an MPX experiment. Each row contains an edge (a uniquely identified antibody) which is identified by a UMI, UPIA, and a UPIB. The UPIA is the unique identifier for the specific A pixel, while the UPIB is the unique identifier for the B pixel. Both A pixels and B pixels may contain multiple edges, resulting in a spatial graph which can be used to infer spatial relationships between antibody molecules, to calculate spatial statistics, and to visualize the cell.

edge_list <- 
ReadMPX_edgelist(filename)
edge_list
# A tibble: 4,655,115 × 9
upia upib umi marker sequence count umi_unique_count upi_unique_count
<chr> <chr> <chr> <chr> <chr> <int> <int> <int>
1 ATTAGAAG… TATA… GCGT… CD48 GACCACTC 28 1 1
2 TTAACATA… CGAT… ACTG… CD48 GACCACTC 26 1 1
3 AAAAATAC… GGTC… GCAG… CD48 GACCACTC 26 1 1
4 ATTTGTTT… TGTT… AGAT… CD48 GACCACTC 26 1 2
5 TTTAGCGT… GTGC… CTCC… CD48 GACCACTC 25 1 2
6 ACTTCGTT… GTCA… AACG… CD48 GACCACTC 25 1 3
7 CCAAATTC… TAGT… GTAC… CD48 GACCACTC 25 1 1
8 GTGAACAA… TCCT… TTAA… CD48 GACCACTC 24 1 1
9 GAAGATAT… AAGC… TGCA… CD48 GACCACTC 24 1 1
10 ACCGGACA… GGAT… ACTT… CD48 GACCACTC 24 1 1
# ℹ 4,655,105 more rows
# ℹ 1 more variable: component <chr>

Seurat objects and PXL files

For some analytical tasks, we will need access to the edgelist. For instance, if we want to load component graphs for visualization, we need access to the edgelist to build these graphs. However, the edgelist is not stored in the Seurat object, it’s only available in the PXL file. The CellGraphAssay5 that was mentioned earlier in this tutorial keeps the path to the PXL file so that we can access it when needed. This has the advantage that we can keep the size of the Seurat object relatively small until we decide to load the graphs. On the down side, it also means that the path to the PXL file need to remain unchanged. If the PXL file is moved or renamed, we can no longer access it.

This has an important implication when saving and sharing Seurat objects containing MPX data. If you want to export a Seurat object as an RDS file and share it with someone else, you should also share associated the PXL file. In addition, the path to the PXL file has to be updated in the Seurat object.

If this is unclear at the moment, don’t worry. We will come back to this topic in the tutorials about 2D and 3D cell visualization.

Aggregating data

For cases when we want to process and analyze samples in parallel, it is convenient to merge them into a single object. Here, we read multiple files and aggregate them into a single object. When doing so, the cell ID and sample ID are merged to form the new cell ID in the merged object (e.g. “S1_RCVCMP0000830”).

When merging Seurat objects, you can provide add.cell.ids to control the sample prefixes.

# The folders the data is located in:
data_files <-
c(S1 = here("data/Sample01_human_pbmcs_unstimulated.dataset.pxl"),
S2 = here("data/Sample02_human_pbmcs_unstimulated.dataset.pxl"))

# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)

# Combine data in a merged Seurat object
pg_data_combined <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data))

# Add sample column to meta data
pg_data_combined <-
AddMetaData(pg_data_combined,
metadata = str_remove(colnames(pg_data_combined), "_.*"),
col.name = "sample")

pg_data_combined
An object of class Seurat 
80 features across 1056 samples within 1 assay
Active assay: mpxCells (80 features, 80 variable features)
2 layers present: counts.1, counts.2

Save data

After combining the samples, we can save the merged data to an RDS file to use later. Remember that a Seurat object containing MPX data is “connected” to a PXL file, so you need to share the PXL file along with the RDS file to ensure that component graphs can be loaded. If component graphs are not needed for subsequent analysis, the PXL file is no longer needed.

saveRDS(pg_data_combined, here("data/combined_data.rds"))

We have now seen how to load MPX data, inspect its key properties and prepare the data for integrated analysis across experimental samples. In the next tutorial we will apply these skills and demonstrate how to quality control MPX data.