Skip to main content

Data Handling

This tutorial describes how to start working with the Proximity Network Assay (PNA) output data format. The primary data output file of pixelator is the PXL file. Here, you will learn how to read and interact with it. We will go through the different components of a single sample PXL file and also how to aggregate results from multiple samples. You can read more about the PXL file format here.

After completing this tutorial, you will be able to:

  • Build Seurat objects from PXL files and access the multi-modal PNA data including protein abundance data, spatial data and additional metadata.
  • Access spatial scores which quantify protein clustering and colocalization (co-clustering) patterns.
  • Access the edge list representing component (cell) graphs.
  • Aggregate multiple PNA datasets into a single Seurat object.

Setup

First, we will load some packages needed for the tutorial.

library(pixelatorR)
library(SeuratObject)
library(dplyr)
library(stringr)
library(here)
library(tibble)

Loading data

We begin by locating the output from the pixelator pipeline. The input for downstream analysis is the PXL file contained in the layout directory.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)

Run the following code in the terminal to download the dataset to your local machine.

BASEURL <- "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/pna-datasets/v1"

pxl_file <- "PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl"

if (!file.exists(file.path(DATA_DIR, pxl_file))) {
download.file(paste0(BASEURL, "/", pxl_file), destfile = file.path(DATA_DIR, pxl_file), method = "curl")
}

And then proceed in R to load the PNA data into Seurat object.

# Collect the path to the PXL file
filename <- file.path(DATA_DIR, "PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl")

# Load the PXL file using the read function in the pixelatorR package
pg_data <- ReadPNA_Seurat(filename)
Accessing the function documentation

If you like to know more about the functions provided in the pixelatorR package, you can access the function documentation by running for example ?ReadPNA_Seurat in the R console.

Component meta data

In PNA data, the term “component” generally corresponds to an individual cell. However, it’s important to note that in some instances, a component might represent other entities such as cell doublets, fragments of cells, or even antibody aggregates. For the sake of simplicity in these tutorials, we will primarily use “component” and “cell” as synonymous terms.

In addition to counts and spatial information, the Seurat object includes a component-level metadata table. This table contains key quality control metrics such as antibody counts (n_umi), sequencing depth (reads_in_component), and the number of detected proteins (n_antibodies).

as_tibble(pg_data[[]], rownames = "component") %>% 
head() %>%
knitr::kable()
component n_umi1 n_umi2 n_edges reads_in_component n_antibodies n_umi isotype_fraction intracellular_fraction tau_type tau disqualified_for_denoising number_of_nodes_removed_in_denoise sample antibodies average_k_core k_core_1 k_core_2 k_core_3 k_core_4 k_core_5 k_core_6 k_core_7 svd_var_expl_s1 svd_var_expl_s2 svd_var_expl_s3
f1b52a4758932fc7 13355 18311 93749 246004 133 31666 0.0003158 0 normal 0.9741447 FALSE 1343 PNA062_unstim_PBMCs_1000cells_S02_S2 133 3.377471 5773 3009 5429 8402 9053 0 0 0.2844167 0.1568093 0.0893406
eca4983191f3647f 17311 23376 105495 273174 139 40687 0.0002212 0 normal 0.9753554 FALSE 2072 PNA062_unstim_PBMCs_1000cells_S02_S2 139 2.925185 7919 5180 9614 17974 0 0 0 0.2535575 0.1929813 0.1464334
f98240c66b2e49fc 11827 14934 78611 225033 158 26761 0.0017937 0 normal 0.9468124 FALSE 1545 PNA062_unstim_PBMCs_1000cells_S02_S2 158 3.373080 4706 3152 3553 8152 7198 0 0 0.2395867 0.1878860 0.1207094
b5f8b192a0cc2603 21450 28111 97857 296160 158 49561 0.0004035 0 normal 0.9495831 FALSE 3489 PNA062_unstim_PBMCs_1000cells_S02_S2 158 2.311616 12780 12920 19498 4363 0 0 0 0.3493261 0.2372026 0.1163323
7994e60e303c6dbb 15767 20905 126994 357256 144 36672 0.0002182 0 normal 0.9520865 FALSE 1715 PNA062_unstim_PBMCs_1000cells_S02_S2 144 3.855803 5937 2389 3245 7657 14342 3102 0 0.2922636 0.1730781 0.1403489
63649c2a30940d6c 15986 21763 122143 330860 147 37749 0.0004503 0 normal 0.9778780 FALSE 1678 PNA062_unstim_PBMCs_1000cells_S02_S2 147 3.582135 6942 2916 4898 7211 15782 0 0 0.2640928 0.1447063 0.1126697

Count data

The antibody count matrix can be accessed from the Seurat object using the LayerData method.

# Display the 10 first rows and 10 first columns of the antibody counts
LayerData(pg_data, layer = "counts")[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
                                                          
HLA-ABC 1643 1719 1029 1984 2065 2326 1258 7424 1171 1043
B2M 3402 3587 1922 4153 4303 5978 2091 10599 2057 3080
CD11b 1 . 13 74 2 17 6 5 13 1
CD11c 3 1 7 287 6 22 3 7 8 1
CD18 118 281 124 547 97 152 232 26 168 147
CD82 376 781 45 307 40 1124 17 912 14 503
CD8 8 4 2862 3564 4240 20 45 15 315 6704
TCRab 161 322 91 236 163 41 2 11 . 105
HLA-DR 1 2 10 70 13 489 9 20 5 3
CD45 2141 3939 1575 2454 2096 1235 1855 80 1147 2206

Spatial data

Beyond protein abundance, the Proximity Network Assay (PNA) generates spatial data that enables the analysis of protein proximity patterns both within and between cells. Moreover, by leveraging graph drawing methods, it is even possible to create 3D layouts of the cells for visualization.

The PXL file includes a table of proximity scores, quantifying the spatial clustering of individual proteins and the co-localization of protein pairs on each cell.

pixelatorR Assay classes

PixelatorR introduces two important Seurat extensions to store PNA data: PNAAssay (based on Assay from Seurat v3) and PNAAssay5 (based on Assay5 from Seurat v5). These two classes are used to store abundance data, just like an Assay or Assay5, but also spatial metrics and CellGraph objects. When a Seurat object is created from a PXL file with ReadPNA_Seurat, the data is automatically converted into a PNAAssay5 object named “PNA” which can be accessed as follows:

pg_data[["PNA"]]
PNAAssay (v5) data with 158 features for 1083 cells
Top 10 variable features:
HLA-ABC, B2M, CD11b, CD11c, CD18, CD82, CD8, TCRab, HLA-DR, CD45
Layers:
counts
Loaded CellGraph objects:
0

In the output above, we see that there are 158 proteins and 1084 cells available and that the object currently contains 0 loaded CellGraph objects. More about that later.

PNA Proximity scores

The spatial metrics can be accessed from the PNAAssay5 object with:

ProximityScores(pg_data) %>% 
head() %>%
knitr::kable()
marker_1 marker_2 join_count join_count_expected_mean join_count_expected_sd join_count_z join_count_p component log2_ratio
CD33 CD33 0 0.25 0.5198096 -0.25000 0.4012937 91984005700471b2 0.0000000
CD33 CD59 5 3.02 1.9590763 1.01068 0.1560847 91984005700471b2 0.7273795
CD33 CD45RB 0 0.08 0.2726599 -0.08000 0.4681186 91984005700471b2 0.0000000
CD33 KLRG1 1 0.14 0.3487351 0.86000 0.1948945 91984005700471b2 0.0000000
CD33 VISTA 0 0.20 0.5124707 -0.20000 0.4207403 91984005700471b2 0.0000000
CD33 CD62P 0 0.16 0.4197161 -0.16000 0.4364405 91984005700471b2 0.0000000

Edgelist

The edgelist table lists linked protein pairs. Each row represents an edge, identified by two integer-encoded UMI barcodes (UMI1 and UMI2). Each UMI uniquely maps to a single protein (UMI1 to marker_1, UMI2 to marker_2).

This edgelist is a form of graph representation where UMI1 and UMI2 represent distinct sets of nodes (vertices), and a pair of UMI1/UMI2 denote edges (links) between them. This bipartite structure, inherent to how relationships are formed in the Proximity Network Assay, means a UMI1 barcode can only connect to a UMI2 barcode, and vice versa. In more technical terms, a PNA component graph is a simple, undirected bipartite graph. Given that edges typically form between spatially proximal proteins, the PNA graph is also a spatial graph.

Importantly, the edgelist is organized by component (individual cell), with no edges spanning across components. While a PXL file’s edgelist can be large (e.g., over 100 million rows in our sample), making direct manipulation cumbersome, pixelatorR offers tools to convert these component edgelists into more manageable graph representations for analysis.

Using the ReadPNA_edgelist function, we can for example lazy load the edgelist from the PXL file without loading it into memory. The resulting object is a tbl_lazy object, which can be manipulated with most dplyr verbs (thanks to the dbplyr R package) and materialized in memory when needed with collect.

edge_list <- ReadPNA_edgelist(filename)
edge_list %>%
head() %>%
collect() %>%
knitr::kable()
marker_1 marker_2 umi1 umi2 read_count uei_count corrected_read_count component
CD29 CD43 555638967335511 28226901571890329 9 2 0 8cd2f4585f9102dd
B2M CD44 1592993435260922 33899519959851077 3 2 0 8cd2f4585f9102dd
B2M CD5 1772702303642965 23228591926107523 6 2 1 8cd2f4585f9102dd
CD45 CD45 1905030979812053 14976911110869850 3 2 0 8cd2f4585f9102dd
CD82 CD82 1927887423101473 8679240914900276 1 1 0 8cd2f4585f9102dd
CD11a CD43 2083036410634253 10588034069358815 2 1 0 8cd2f4585f9102dd

Seurat objects and PXL files

For some analyses, we will need to access information from the edgelist. For instance, if we want to load component graphs for visualization, we fetch the relevant component edges. It is important to note that the edgelist is not stored in the Seurat object, it’s only available in the PXL file. The PNAAssay5, mentioned earlier in this tutorial, stores the path(s) to the PXL file(s) from which the Seurat object was built from so that we can access the edgelist when needed. This has the advantage that we can keep the size of the Seurat object relatively small until we decide to load the graphs. On the down side, it also means that the path(s) to the PXL file(s) need to remain unchanged. If the PXL file(s) are moved or renamed, we can no longer access them.

This has an important implication when saving and sharing Seurat objects containing PNA data. If you want to export a Seurat object as an RDS file and share it with someone else, you should also share associated the PXL file(s). In addition, the path(s) to the PXL file(s) has to be updated in the Seurat object. If you want to see the PXL file path(s) or update them, you can use the FSMap method (see ?FSMap for more details):

# Get the path to the PXL file
FSMap(pg_data)
# A tibble: 1 × 3
id_map sample pxl_file
<list> <int> <chr>
1 <tibble [1,083 × 2]> 1 data/PNA062_unstim_PBMCs_1000cells_S02_S2.layout.…

If this is unclear at the moment, don’t worry. We will come back to this topic in the tutorial about cell visualization.

Aggregating data

For cases when we want to process and analyze multiple PNA datasets, we can load the corresponding PXL files as Seurat objects and merge them. The cell IDs should be unique across different samples, but you can provide a unique prefix to each sample by passing a character vector to the add.cell.ids parameter in merge.

In the code chunk below, we will create a merged object from two PXL files. These PXL files represent a resting and PHA stimulated PBMC sample. We will create a merged object Seurat and update the object’s metadata to keep information abiut the condition. The resulting object will be explored further in the next tutorial on Quality Control.

BASEURL <- "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/pna-datasets/v1"

pxl_files <- c("PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl", "PNA062_PHA_PBMCs_1000cells_S04_S4.layout.pxl")

for (f in pxl_files) {
if (!file.exists(file.path(DATA_DIR, f))) {
download.file(paste0(BASEURL, "/", f), destfile = file.path(DATA_DIR, f), method = "curl")
}
}
# The folders the data is located in:
data_files <- c(
# Resting
resting = file.path(DATA_DIR, "PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl"),
# PHA
stimulated = file.path(DATA_DIR, "PNA062_PHA_PBMCs_1000cells_S04_S4.layout.pxl")
)

# Read PXL files as a list of Seurat objects
obj_list <- lapply(data_files, ReadPNA_Seurat)

# Combine files into a single Seurat object
pg_data_combined <- merge(obj_list[[1]], y = obj_list[-1], add.cell.ids = names(obj_list))

# Add meta data describing the sample and donor origin
pg_data_combined <- AddMetaData(
pg_data_combined,
metadata = data.frame(id = colnames(pg_data_combined)) %>%
mutate(condition = str_remove(id, "_.*")) %>%
column_to_rownames("id")
)
pg_data_combined
An object of class Seurat 
158 features across 2137 samples within 1 assay
Active assay: PNA (158 features, 158 variable features)
2 layers present: counts.1, counts.2

As you can see in the object description above, the count data in the merged Seurat object is separated in multiple matrices (“counts.1”, “counts.2”, …). Our PNA data is stored in a PNAAssay5 object which extends the Assay5 object introduced in Seurat v5 which keeps the sample count matrices separated after merging. For many downstream applications, one needs a single merged count matrix.

One option is to set options(Seurat.object.assay.version = "v3") and rerun this tutorial in which case the PNA data will be stored in a PNAAssay (Seurat v3) instead. Alternatively, you can join the data using the JoinLayers method. After joining, you should see that there’s now only 1 layer present.

pg_data_combined <- JoinLayers(pg_data_combined)
pg_data_combined
An object of class Seurat 
158 features across 2137 samples within 1 assay
Active assay: PNA (158 features, 158 variable features)
1 layer present: counts

Save data

After combining the samples, we can save the merged data to an RDS file to use for later. Remember that a Seurat object containing PNA data is associated with one or several a PXL file(s), so you need to share the PXL file(s) along with the RDS file to ensure that all pixelatorR features can be used by the recipient. If the component graphs are not needed for subsequent analysis, the PXL file(s) are not needed.

# This is an optional pause step; if you prefer to continue to the next tutorial without saving the object, that will also work.
saveRDS(pg_data_combined, file.path(DATA_DIR, "combined_data.rds"))

We have now learned how to load multiple PNA datasets, inspect its key properties and prepare the data for downstream analysis. In the next tutorial we will apply these skills and demonstrate how to quality control PNA data.