Data Handling

This tutorial describes how to start working with the Proximity Network Assay (PNA) output data format. The primary data output file of pixelator is the PXL file. Here, you will learn how to read and interact with it. We will go through the different components of a single sample PXL file and also how to aggregate results from multiple samples. You can read more about the PXL file format here.

After completing this tutorial, you will be able to:

Build Seurat objects from PXL files and access the multi-modal PNA data including protein abundance data, spatial data and additional metadata.
Access spatial scores which quantify protein clustering and colocalization (co-clustering) patterns.
Access the edge list representing component (cell) graphs.
Aggregate multiple PNA datasets into a single Seurat object.

Setup

First, we will load some packages needed for the tutorial.

library(pixelatorR)
library(SeuratObject)
library(dplyr)
library(stringr)
library(here)
library(tibble)

Loading data

We begin by locating the output from the pixelator pipeline. The input for downstream analysis is the PXL file contained in the layout directory.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)

Run the following code in the terminal to download the dataset to your local machine.

BASEURL <- "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/pna-datasets/v1"

pxl_file <- "PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl"

if (!file.exists(file.path(DATA_DIR, pxl_file))) {
  download.file(paste0(BASEURL, "/", pxl_file), destfile = file.path(DATA_DIR, pxl_file), method = "curl")
}

And then proceed in R to load the PNA data into Seurat object.

# Collect the path to the PXL file
filename <- file.path(DATA_DIR, "PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl")

# Load the PXL file using the read function in the pixelatorR package
pg_data <- ReadPNA_Seurat(filename)

Accessing the function documentation

If you like to know more about the functions provided in the pixelatorR package, you can access the function documentation by running for example ?ReadPNA_Seurat in the R console.

Component meta data

In PNA data, the term “component” generally corresponds to an individual cell. However, it’s important to note that in some instances, a component might represent other entities such as cell doublets, fragments of cells, or even antibody aggregates. For the sake of simplicity in these tutorials, we will primarily use “component” and “cell” as synonymous terms.

In addition to counts and spatial information, the Seurat object includes a component-level metadata table. This table contains key quality control metrics such as antibody counts (n_umi), sequencing depth (reads_in_component), and the number of detected proteins (n_antibodies).

as_tibble(pg_data[[]], rownames = "component") %>% 
  head() %>% 
  knitr::kable()

component	n_umi1	n_umi2	n_edges	reads_in_component	n_antibodies	n_umi	isotype_fraction	tau_type	tau	disqualified_for_denoising	number_of_nodes_removed_in_denoise	sample	antibodies	average_k_core	k_core_1	k_core_2	k_core_3	k_core_4	k_core_5	k_core_6	svd_var_expl_s1	svd_var_expl_s2	svd_var_expl_s3
f1b52a4758932fc7	13355	18311	93749	246004	133	31666	0.0003158	normal	0.9741447	FALSE	1343	PNA062_unstim_PBMCs_1000cells_S02_S2	133	3.377471	5773	3009	5429	8402	9053	0	0.2844167	0.1568093	0.0893406
eca4983191f3647f	17311	23376	105495	273174	139	40687	0.0002212	normal	0.9753554	FALSE	2072	PNA062_unstim_PBMCs_1000cells_S02_S2	139	2.925185	7919	5180	9614	17974	0	0	0.2535575	0.1929813	0.1464334
f98240c66b2e49fc	11827	14934	78611	225033	158	26761	0.0017937	normal	0.9468124	FALSE	1545	PNA062_unstim_PBMCs_1000cells_S02_S2	158	3.373080	4706	3152	3553	8152	7198	0	0.2395867	0.1878860	0.1207094
b5f8b192a0cc2603	21450	28111	97857	296160	158	49561	0.0004035	normal	0.9495831	FALSE	3489	PNA062_unstim_PBMCs_1000cells_S02_S2	158	2.311616	12780	12920	19498	4363	0	0	0.3493261	0.2372026	0.1163323
7994e60e303c6dbb	15767	20905	126994	357256	144	36672	0.0002182	normal	0.9520865	FALSE	1715	PNA062_unstim_PBMCs_1000cells_S02_S2	144	3.855803	5937	2389	3245	7657	14342	3102	0.2922636	0.1730781	0.1403489
63649c2a30940d6c	15986	21763	122143	330860	147	37749	0.0004503	normal	0.9778780	FALSE	1678	PNA062_unstim_PBMCs_1000cells_S02_S2	147	3.582135	6942	2916	4898	7211	15782	0	0.2640928	0.1447063	0.1126697

Count data

The antibody count matrix can be accessed from the Seurat object using the LayerData method.

# Display the 10 first rows and 10 first columns of the antibody counts
LayerData(pg_data, layer = "counts")[1:10, 1:10]

10 x 10 sparse Matrix of class "dgCMatrix"

                                                          
HLA-ABC 1643 1719 1029 1984 2065 2326 1258  7424 1171 1043
B2M     3402 3587 1922 4153 4303 5978 2091 10599 2057 3080
CD11b      1    .   13   74    2   17    6     5   13    1
CD11c      3    1    7  287    6   22    3     7    8    1
CD18     118  281  124  547   97  152  232    26  168  147
CD82     376  781   45  307   40 1124   17   912   14  503
CD8        8    4 2862 3564 4240   20   45    15  315 6704
TCRab    161  322   91  236  163   41    2    11    .  105
HLA-DR     1    2   10   70   13  489    9    20    5    3
CD45    2141 3939 1575 2454 2096 1235 1855    80 1147 2206

Spatial data

Beyond protein abundance, the Proximity Network Assay (PNA) generates spatial data that enables the analysis of protein proximity patterns both within and between cells. Moreover, by leveraging graph drawing methods, it is even possible to create 3D layouts of the cells for visualization.

The PXL file includes a table of proximity scores, quantifying the spatial clustering of individual proteins and the co-localization of protein pairs on each cell.

pixelatorR Assay classes

PixelatorR introduces two important Seurat extensions to store PNA data: PNAAssay (based on Assay from Seurat v3) and PNAAssay5 (based on Assay5 from Seurat v5). These two classes are used to store abundance data, just like an Assay or Assay5, but also spatial metrics and CellGraph objects. When a Seurat object is created from a PXL file with ReadPNA_Seurat, the data is automatically converted into a PNAAssay5 object named “PNA” which can be accessed as follows:

pg_data[["PNA"]]

PNAAssay (v5) data with 158 features for 1083 cells
Top 10 variable features:
 HLA-ABC, B2M, CD11b, CD11c, CD18, CD82, CD8, TCRab, HLA-DR, CD45 
Layers:
 counts 
Loaded CellGraph objects:
 0

In the output above, we see that there are 158 proteins and 1084 cells available and that the object currently contains 0 loaded CellGraph objects. More about that later.

PNA Proximity scores

The spatial metrics can be accessed from the PNAAssay5 object with:

ProximityScores(pg_data) %>% 
  head() %>% 
  knitr::kable()

marker_1	marker_2	join_count	join_count_expected_mean	join_count_expected_sd	join_count_z	join_count_p	component	log2_ratio
CD33	CD33	0	0.25	0.5198096	-0.25000	0.4012937	91984005700471b2	0.0000000
CD33	CD59	5	3.02	1.9590763	1.01068	0.1560847	91984005700471b2	0.7273795
CD33	CD45RB	0	0.08	0.2726599	-0.08000	0.4681186	91984005700471b2	0.0000000
CD33	KLRG1	1	0.14	0.3487351	0.86000	0.1948945	91984005700471b2	0.0000000
CD33	VISTA	0	0.20	0.5124707	-0.20000	0.4207403	91984005700471b2	0.0000000
CD33	CD62P	0	0.16	0.4197161	-0.16000	0.4364405	91984005700471b2	0.0000000

Edgelist

The edgelist table lists linked protein pairs. Each row represents an edge, identified by two integer-encoded UMI barcodes (UMI1 and UMI2). Each UMI uniquely maps to a single protein (UMI1 to marker_1, UMI2 to marker_2).

This edgelist is a form of graph representation where UMI1 and UMI2 represent distinct sets of nodes (vertices), and a pair of UMI1/UMI2 denote edges (links) between them. This bipartite structure, inherent to how relationships are formed in the Proximity Network Assay, means a UMI1 barcode can only connect to a UMI2 barcode, and vice versa. In more technical terms, a PNA component graph is a simple, undirected bipartite graph. Given that edges typically form between spatially proximal proteins, the PNA graph is also a spatial graph.

Importantly, the edgelist is organized by component (individual cell), with no edges spanning across components. While a PXL file’s edgelist can be large (e.g., over 100 million rows in our sample), making direct manipulation cumbersome, pixelatorR offers tools to convert these component edgelists into more manageable graph representations for analysis.

Using the ReadPNA_edgelist function, we can for example lazy load the edgelist from the PXL file without loading it into memory. The resulting object is a tbl_lazy object, which can be manipulated with most dplyr verbs (thanks to the dbplyr R package) and materialized in memory when needed with collect.

edge_list <- ReadPNA_edgelist(filename)
edge_list %>% 
  head() %>% 
  collect() %>% 
  knitr::kable()

marker_1	marker_2	umi1	umi2	read_count	uei_count	corrected_read_count	component
CD29	CD43	555638967335511	28226901571890329	9	2	0	8cd2f4585f9102dd
B2M	CD44	1592993435260922	33899519959851077	3	2	0	8cd2f4585f9102dd
B2M	CD5	1772702303642965	23228591926107523	6	2	1	8cd2f4585f9102dd
CD45	CD45	1905030979812053	14976911110869850	3	2	0	8cd2f4585f9102dd
CD82	CD82	1927887423101473	8679240914900276	1	1	0	8cd2f4585f9102dd
CD11a	CD43	2083036410634253	10588034069358815	2	1	0	8cd2f4585f9102dd

Seurat objects and PXL files

For some analyses, we will need to access information from the edgelist. For instance, if we want to load component graphs for visualization, we fetch the relevant component edges. It is important to note that the edgelist is not stored in the Seurat object, it’s only available in the PXL file. The PNAAssay5, mentioned earlier in this tutorial, stores the path(s) to the PXL file(s) from which the Seurat object was built from so that we can access the edgelist when needed. This has the advantage that we can keep the size of the Seurat object relatively small until we decide to load the graphs. On the down side, it also means that the path(s) to the PXL file(s) need to remain unchanged. If the PXL file(s) are moved or renamed, we can no longer access them.

This has an important implication when saving and sharing Seurat objects containing PNA data. If you want to export a Seurat object as an RDS file and share it with someone else, you should also share associated the PXL file(s). In addition, the path(s) to the PXL file(s) has to be updated in the Seurat object. If you want to see the PXL file path(s) or update them, you can use the FSMap method (see ?FSMap for more details):

# Get the path to the PXL file
FSMap(pg_data)

# A tibble: 1 × 3
  id_map               sample pxl_file                                          
  <list>                <int> <chr>                                             
1 <tibble [1,083 × 2]>      1 data/PNA062_unstim_PBMCs_1000cells_S02_S2.layout.…

If this is unclear at the moment, don’t worry. We will come back to this topic in the tutorial about cell visualization.

Aggregating data

For cases when we want to process and analyze multiple PNA datasets, we can load the corresponding PXL files as Seurat objects and merge them. The cell IDs should be unique across different samples, but you can provide a unique prefix to each sample by passing a character vector to the add.cell.ids parameter in merge.

In the code chunk below, we will create a merged object from two PXL files. These PXL files represent a resting and PHA stimulated PBMC sample. We will create a merged object Seurat and update the object’s metadata to keep information abiut the condition. The resulting object will be explored further in the next tutorial on Quality Control.

BASEURL <- "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/pna-datasets/v1"

pxl_files <- c("PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl", "PNA062_PHA_PBMCs_1000cells_S04_S4.layout.pxl")

for (f in pxl_files) {
  if (!file.exists(file.path(DATA_DIR, f))) {
    download.file(paste0(BASEURL, "/", f), destfile = file.path(DATA_DIR, f), method = "curl")
  }
}

# The folders the data is located in:
data_files <- c(
  # Resting
  resting = file.path(DATA_DIR, "PNA062_unstim_PBMCs_1000cells_S02_S2.layout.pxl"),
  # PHA
  stimulated = file.path(DATA_DIR, "PNA062_PHA_PBMCs_1000cells_S04_S4.layout.pxl")
)

# Read PXL files as a list of Seurat objects
obj_list <- lapply(data_files, ReadPNA_Seurat)

# Combine files into a single Seurat object
pg_data_combined <- merge(obj_list[[1]],  y = obj_list[-1], add.cell.ids = names(obj_list)) 

# Add meta data describing the sample and donor origin
pg_data_combined <- AddMetaData(
  pg_data_combined, 
  metadata = data.frame(id = colnames(pg_data_combined)) %>% 
    mutate(condition = str_remove(id, "_.*")) %>% 
    column_to_rownames("id")
  )
pg_data_combined

An object of class Seurat 
158 features across 2137 samples within 1 assay 
Active assay: PNA (158 features, 158 variable features)
 2 layers present: counts.1, counts.2

As you can see in the object description above, the count data in the merged Seurat object is separated in multiple matrices (“counts.1”, “counts.2”, …). Our PNA data is stored in a PNAAssay5 object which extends the Assay5 object introduced in Seurat v5 which keeps the sample count matrices separated after merging. For many downstream applications, one needs a single merged count matrix.

One option is to set options(Seurat.object.assay.version = "v3") and rerun this tutorial in which case the PNA data will be stored in a PNAAssay (Seurat v3) instead. Alternatively, you can join the data using the JoinLayers method. After joining, you should see that there’s now only 1 layer present.

pg_data_combined <- JoinLayers(pg_data_combined)
pg_data_combined

An object of class Seurat 
158 features across 2137 samples within 1 assay 
Active assay: PNA (158 features, 158 variable features)
 1 layer present: counts

Save data

After combining the samples, we can save the merged data to an RDS file to use for later. Remember that a Seurat object containing PNA data is associated with one or several a PXL file(s), so you need to share the PXL file(s) along with the RDS file to ensure that all pixelatorR features can be used by the recipient. If the component graphs are not needed for subsequent analysis, the PXL file(s) are not needed.

# This is an optional pause step; if you prefer to continue to the next tutorial without saving the object, that will also work.
saveRDS(pg_data_combined, file.path(DATA_DIR, "combined_data.rds"))

We have now learned how to load multiple PNA datasets, inspect its key properties and prepare the data for downstream analysis. In the next tutorial we will apply these skills and demonstrate how to quality control PNA data.

Data Handling

Setup​

Loading data​

Component meta data​

Count data​

Spatial data​

pixelatorR Assay classes​

PNA Proximity scores​

Edgelist​

Seurat objects and PXL files​

Aggregating data​

Save data​