Glossary
Component
A component is a subset of a graph in which every node can be reached from every other node through a path of edges, and no node in the subset is connected to any node outside it. pixelator builds components from MPX/PNA data; these components correspond to the spatial networks formed on the surface of cells. Components are qualified as cells after pixelator has carried out cell calling.
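To make the definition concrete, here is a minimal sketch that splits an edge list into connected components using union-find (disjoint sets). It is a generic plain-Python illustration, not pixelator's implementation, and the toy edge list is hypothetical.

```python
from collections import defaultdict


def connected_components(edges):
    """Group the nodes of an undirected edge list into connected components."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Largest components first, since those are the usual cell candidates.
    return sorted(groups.values(), key=len, reverse=True)


edges = [("a", "b"), ("b", "c"), ("d", "e")]
print([sorted(c) for c in connected_components(edges)])  # [['a', 'b', 'c'], ['d', 'e']]
```

Union-find keeps the sketch dependency-free; a graph library would give the same grouping.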
Connectivity (average coreness)
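Represented as average_k_core in your dataset, this measures how "tightly knit" the graph is. The coreness (core number) of a node is the largest k for which the node belongs to the graph's k-core, the maximal subgraph in which every node has at least k neighbors; average coreness is the mean of this value over all nodes. A high average coreness (>1.5) indicates that the PNA graph likely represents an intact cell surface rather than cellular fragments or low-connectivity noise.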
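The quantity behind average_k_core can be sketched by "peeling" the graph: repeatedly remove the node with the fewest remaining neighbors and record the running maximum of those degrees. This is a plain-Python illustration under the standard k-core definition; pixelator's own computation is not shown here, and libraries such as NetworkX expose the same quantity as `networkx.core_number`.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node, computed by min-degree peeling."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-loops do not count toward coreness here
            adj[a].add(b)
            adj[b].add(a)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        # Peel the node with the smallest remaining degree (O(n^2), fine for a sketch).
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def average_coreness(edges):
    core = core_numbers(edges)
    return sum(core.values()) / len(core) if core else 0.0


# A triangle a-b-c with a pendant node d attached to c.
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(average_coreness(toy))  # 1.75
```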
DNA pixels
DNA pixels, and their use to map proteins on cell surfaces, are a core innovation of Pixelgen Technologies. They are DNA concatemers with predefined parts that make it possible to perform MPX. For a brief introduction to MPX, check this document.
Edge
An edge is a connection between two nodes in a graph. It represents a relationship or link between the nodes, forming part of the graph's structure. In MPX/PNA data, an edge represents a spatial adjacency that is detected between two DNA pixels (MPX) or protein molecules (PNA).
Fraction of Isotypes
Represented as isotype_fraction in your dataset, this is the proportion of a cell's molecules that come from isotype control antibodies. High values indicate significant non-specific binding in that cell.
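As a formula, the isotype fraction is the isotype control molecule count divided by the cell's total molecule count. The sketch below assumes per-cell molecule counts keyed by marker name; the marker names, including the isotype control names, are hypothetical examples rather than a specific panel.

```python
def isotype_fraction(marker_counts, isotype_markers):
    """Fraction of a cell's molecules that come from isotype control antibodies."""
    total = sum(marker_counts.values())
    if total == 0:
        return 0.0
    isotype = sum(marker_counts.get(m, 0) for m in isotype_markers)
    return isotype / total


# Hypothetical per-cell counts: 100 isotype molecules out of 1000 in total.
counts = {"CD3": 600, "CD4": 300, "mIgG1": 60, "mIgG2a": 40}
print(isotype_fraction(counts, {"mIgG1", "mIgG2a", "mIgG2b"}))  # 0.1
```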
Graph Edge Saturation
This is the fraction of observed protein connections (edges) relative to the estimated theoretical total. Values >40% indicate a well-sampled, reliable protein network.
Graph Layout
A graph layout is a way of arranging the nodes and edges of a graph in a visual space. The goal is typically to create a visually informative representation of the graph, minimizing clutter, overlapping elements, or ambiguity in the relationships between nodes.
In MPX and PNA data, graph layouts are used to create 3D representations of individual cells, which can be used to visually inspect the shape of the cell component or spatial effects that are apparent in spatial statistics. There are many layout strategies; our visualization tutorial explores some of them.
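One classic layout strategy is force-directed placement. The sketch below is a minimal 3D Fruchterman-Reingold layout: every pair of nodes repels, every edge pulls its endpoints together, and step sizes shrink over the iterations. It is a generic illustration only; the layouts pixelator actually offers may use different algorithms, and the function name and parameters here are hypothetical.

```python
import math
import random


def spring_layout_3d(nodes, edges, iterations=200, seed=0):
    """Minimal Fruchterman-Reingold force-directed layout in 3D.

    All edge endpoints must appear in `nodes`.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    k = 1.0 / math.sqrt(len(nodes)) if nodes else 1.0  # ideal edge length
    pos = {v: [rng.uniform(-1, 1) for _ in range(3)] for v in nodes}
    max_step = 0.1
    for step in range(iterations):
        disp = {v: [0.0, 0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes (O(n^2), fine for small graphs).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                delta = [pu - pv for pu, pv in zip(pos[u], pos[v])]
                dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
                force = k * k / dist
                for axis in range(3):
                    disp[u][axis] += delta[axis] / dist * force
                    disp[v][axis] -= delta[axis] / dist * force
        # Attraction along edges.
        for u, v in edges:
            delta = [pu - pv for pu, pv in zip(pos[u], pos[v])]
            dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
            force = dist * dist / k
            for axis in range(3):
                disp[u][axis] -= delta[axis] / dist * force
                disp[v][axis] += delta[axis] / dist * force
        # Move each node, capping the step by a linearly cooling "temperature".
        t = max_step * (1 - step / iterations)
        for v in nodes:
            length = max(math.sqrt(sum(d * d for d in disp[v])), 1e-9)
            scale = min(length, t) / length
            pos[v] = [p + d * scale for p, d in zip(pos[v], disp[v])]
    return pos


pos = spring_layout_3d(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)])
```

The returned dictionary maps each node to an `[x, y, z]` coordinate that can be passed to any 3D scatter plot.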
Graph Node Saturation
This measures the fraction of observed unique molecules (UMIs) against the theoretical total. A value >60% typically indicates a robustly sampled surface proteome.
log2 ratio (Colocalization)
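The log2-transformed ratio of the observed number of connections between two proteins to the number expected by chance. Positive values indicate that the two proteins are connected more often than expected (colocalization); negative values indicate that they are connected less often.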
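One common way to estimate the "expected by chance" count is to shuffle protein labels across nodes many times and average the resulting edge counts. The sketch below assumes that permutation approach and adds a pseudocount to avoid division by zero; it is an illustration, not necessarily how pixelator computes its expected values, and all names are hypothetical.

```python
import math
import random


def count_pair_edges(edges, label, a, b):
    """Count edges joining a node labelled `a` to a node labelled `b`."""
    return sum(1 for u, v in edges if {label[u], label[v]} == {a, b})


def log2_ratio(edges, label, a, b, n_permutations=200, seed=0, pseudocount=1.0):
    """log2((observed + pc) / (expected + pc)), expected from label permutations."""
    rng = random.Random(seed)
    observed = count_pair_edges(edges, label, a, b)
    nodes = list(label)
    labels = [label[n] for n in nodes]
    total = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # random relabelling keeps marker counts fixed
        total += count_pair_edges(edges, dict(zip(nodes, labels)), a, b)
    expected = total / n_permutations
    return math.log2((observed + pseudocount) / (expected + pseudocount))


# Toy path graph where the three "A" molecules sit next to each other.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
label = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(log2_ratio(edges, label, "A", "A"))  # positive: A clusters with A
```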
Molecule Count
Represented as n_umi in your dataset, this is the total number of unique antibody molecules identified in a cell graph. It is the primary metric for determining cell size.
Molecule Rank Knee
The point on the molecule rank plot (components sorted by decreasing molecule count) where the molecule count drops sharply. Components ranked above this "knee" are generally considered high-quality "called" cells.
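A simple way to locate such a knee is a geometric heuristic: in log-log space, take the point that lies farthest above the straight line joining the first and last points of the rank curve. This is a generic heuristic shown for illustration, not pixelator's cell-calling algorithm, and the toy counts are synthetic.

```python
import math


def knee_index(counts):
    """0-based rank of the knee on a descending molecule rank curve."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if not ranked:
        raise ValueError("need at least one component with molecules")
    if len(ranked) < 3:
        return len(ranked) - 1
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(c) for c in ranked]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    # Signed distance above the chord joining the first and last points;
    # the knee is where the curve bulges furthest above that chord.
    heights = [(dx * (y - ys[0]) - dy * (x - xs[0])) / norm for x, y in zip(xs, ys)]
    return max(range(len(ranked)), key=heights.__getitem__)


# Synthetic data: 50 large components followed by 500 small background ones.
counts = [1000 - i for i in range(50)] + [10] * 500
print(knee_index(counts))  # 49, so the top 50 components would be kept
```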
Moran's I (Polarization)
A measure of spatial autocorrelation. A high (positive) score suggests that a protein clusters on the cell surface rather than being spread evenly; values near zero are consistent with a random distribution.
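For a graph, Moran's I is usually computed with binary weights, where w_ij = 1 if nodes i and j share an edge: I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, with N nodes, W the sum of all weights and m the mean value. The sketch below implements that textbook formula on a toy graph with hypothetical per-node protein counts; pixelator's implementation may differ in weighting and normalization.

```python
def morans_i(values, edges):
    """Moran's I of per-node values on a graph with binary, symmetric weights.

    I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {v: x - mean for v, x in values.items()}
    denom = sum(d * d for d in dev.values())
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    # Each undirected edge contributes w_ij and w_ji, so W = 2 * number of edges.
    w_total = 2 * len(pairs)
    num = sum(2 * dev[u] * dev[v] for u, v in (tuple(p) for p in pairs))
    return (n / w_total) * num / denom


path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
clustered = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
print(morans_i(clustered, path))  # about 0.6: the protein is clustered
```

Under a random arrangement the expected value is -1/(N - 1), which approaches zero for large graphs.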
MPX
Molecular Pixelation (MPX) is the name of the first generation of spatial network technology powering Pixelgen Technologies' assays.
Nextflow
Nextflow is a workflow orchestration engine with a domain-specific language (DSL) that enables scalable and reproducible scientific workflows using software containers. It is compatible with the most common scripting languages and can be configured to deploy complex parallel and reactive workflows on clouds and clusters.
nf-core
A community effort, driven by an open-source philosophy, to collect a curated set of analysis pipelines built with Nextflow. nf-core pipelines adhere to strict guidelines, including versioning, which allow easy reproducibility and validated releases and make them well suited to academic facilities. Workflow developers can use companion templates and tools to help validate pipeline code and simplify common tasks.
nf-core/pixelator
nf-core/pixelator is Pixelgen Technologies' open-source nf-core workflow. It is highly reliable, validated on different platforms and developed with reproducibility in mind.
Node
A node is a fundamental element in a graph, representing an entity or point. Nodes are connected to each other by edges, which define the graph's framework. In MPX/PNA data, each node represents a location sampled on the surface of a cell: a DNA pixel in MPX, or a protein molecule in PNA.
pixelator
Our software pixelator is the library that contains the underlying logic for processing and analyzing MPX and PNA data, from raw FASTQ reads to PXL files. Pixelgen Technologies packages this library in software containers for use with nf-core/pixelator, creating highly reproducible workflows.
pixelatorR
pixelatorR is the R version of the pixelator library that can be used to analyze MPX and PNA data.
Sequencing Saturation (in MPX/PNA)
Saturation metrics help you assess the stability of your PNA data. Sufficient sequencing depth is required to build highly connected cell graphs, which are the foundation for accurate cell calling, protein quantification and proximity scoring. PNA graphs reach stability at relatively low sequencing depths, and additional sequencing yields diminishing returns.
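The diminishing returns can be seen by downsampling: take growing fractions of the reads and count how many unique molecules each fraction reveals. The sketch below uses nested random subsamples (prefixes of one shuffled read list) so the curve never decreases; it is an illustration of the idea, not how pixelator computes its saturation metrics, and the simulated library is hypothetical.

```python
import random


def saturation_curve(read_molecule_ids, fractions, seed=0):
    """(fraction, unique molecules) pairs for nested random subsamples of reads."""
    rng = random.Random(seed)
    reads = list(read_molecule_ids)
    rng.shuffle(reads)
    curve, seen, taken = [], set(), 0
    for f in sorted(fractions):
        target = round(f * len(reads))
        # Growing prefixes of one shuffled list make each subsample contain the previous one.
        seen.update(reads[taken:target])
        taken = max(taken, target)
        curve.append((f, len(seen)))
    return curve


# Simulated library: 10,000 reads drawn from 500 distinct molecules.
sim = random.Random(1)
reads = [sim.randrange(500) for _ in range(10_000)]
for fraction, n_unique in saturation_curve(reads, [0.1, 0.25, 0.5, 1.0]):
    print(fraction, n_unique)
```

Doubling the reads from 50% to 100% adds almost no new molecules in this simulation, which is what a saturated library looks like.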
PNA
Proximity Network Assay (PNA) is the name of the second generation of spatial network technology powering Pixelgen Technologies' assays.
UMI
UMIs (Unique Molecular Identifiers) are tags made of unique nucleotide combinations that are attached to a molecule before amplification (e.g. PCR), so that each original molecule can be uniquely identified and quantification biases are reduced (Kivioja et al., 2011). MPX and PNA use UMIs to count antibody molecules uniquely.
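The counting idea can be sketched in a few lines: reads that share the same marker and UMI are PCR copies of one molecule, so the molecule count is the number of distinct UMIs per marker. The sketch assumes exact UMI matching and a hypothetical `(marker, umi)` read representation; real pipelines typically also tolerate sequencing errors in UMIs, which this example does not do.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse PCR duplicates: reads sharing a (marker, UMI) pair are one molecule."""
    umis = defaultdict(set)
    for marker, umi in reads:
        umis[marker].add(umi)
    return {marker: len(s) for marker, s in umis.items()}


reads = [("CD3", "ACGTAC"), ("CD3", "ACGTAC"), ("CD3", "TTGACA"), ("CD19", "GGCATT")]
print(count_molecules(reads))  # {'CD3': 2, 'CD19': 1}
```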
UPI
The UPI (Unique Pixel Identifier) is a randomly generated nucleotide sequence, and each MPX DNA pixel concatemer includes multiple repeated copies of it. Following the MPX assay, each read contains a UPIA and a UPIB sequence, corresponding to the two rounds of pixelation. These sequences make it possible to map each UMI to its originating pixel A and pixel B.
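Because every read links a UPIA to a UPIB, the reads directly define the edges of an MPX graph. The sketch below assumes reads have already been parsed into hypothetical `(upia, upib, umi)` tuples; it removes duplicate reads and counts the unique UMIs supporting each pixel A to pixel B edge. It is an illustration, not pixelator's parser or graph builder.

```python
from collections import Counter


def pixel_edges(reads):
    """Count unique UMIs per (UPIA, UPIB) edge from parsed MPX reads."""
    unique = {(upia, upib, umi) for upia, upib, umi in reads}  # drop PCR duplicates
    return Counter((upia, upib) for upia, upib, _ in unique)


reads = [
    ("AAAC", "GGTT", "u1"),
    ("AAAC", "GGTT", "u1"),  # duplicate read of the same molecule
    ("AAAC", "GGTT", "u2"),
    ("AAAC", "CCTA", "u3"),
]
print(pixel_edges(reads))  # Counter({('AAAC', 'GGTT'): 2, ('AAAC', 'CCTA'): 1})
```

Since UPIA and UPIB come from different pixelation rounds, the resulting graph is bipartite, with A pixels on one side and B pixels on the other.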
Valid Read Saturation
An estimate of how much of the library has been sequenced. High saturation (>20%) suggests that sequencing deeper will yield few new unique molecules.
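A widely used definition of sequencing saturation (for example in 10x Genomics Cell Ranger) is 1 - unique molecules / reads, i.e. the fraction of reads that duplicate a molecule already seen. The sketch below implements that formula as an illustration; pixelator's exact definition of Valid Read Saturation is assumed here, not confirmed, and may differ.

```python
def sequencing_saturation(n_reads, n_unique_molecules):
    """1 - unique/reads: fraction of reads that re-observe an already seen molecule."""
    if n_reads == 0:
        return 0.0
    return 1.0 - n_unique_molecules / n_reads


# 10,000 valid reads that resolve to 8,000 unique molecules.
print(sequencing_saturation(10_000, 8_000))  # approximately 0.2
```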
Valid Reads
The percentage of total sequenced reads that contain recognizable Protein IDs (PIDs) and Unique Molecular Identifiers (UMIs) for the RCPs. Low values usually indicate library prep or sequencing quality issues.