Pixelator commands

This section provides an overview of the commands from pixelator that the pipeline executes. For detailed usage and options, please refer to the command-specific documentation.

Overview of the pipeline

The PNA pipeline consists of the following steps:

Amplicon
Do quality control checks of input reads and build amplicons
Demultiplexing
Create groups of amplicons based on their marker assignments
Molecule collapsing and error correction
Derive original molecules to use as the edge list downstream by error correcting and counting input reads for each amplicon
Graph construction
Compute the components of the graph from the edge list in order to create putative cells
Denoising
Denoise the cell graphs
Analysis
Analyze the spatial information in the cell graphs
Sample calling
Split hashed pools into per-sample data (only run for proxiome-v2 data)
Layout creation
Generate 3D graph layouts for visualization of cells

Amplicon

Command:
pixelator single-cell-pna amplicon

The Amplicon step converts raw FASTQ data into high-quality, standardized sequences based on the Proximity Network Assay (PNA) design. It identifies protein identifiers (PIDs), unique molecular identifiers (UMIs), and unique event identifiers (UEIs) from the reads.

Key Operations:

Consensus Building: Merges paired-end reads by resolving overlaps and selecting high-confidence bases.
Sequence Cleaning: Performs quality trimming and removes homopolymer artifacts.
Rigorous Filtering: Discards reads with low-complexity UMIs, excessive ambiguous bases (Ns), or LBS-UMI structural errors.

The output is a refined, compressed FASTQ file.
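The filtering idea can be illustrated with a minimal sketch. It is not Pixelator's implementation: the function name, the `max_n` and `min_distinct_bases` thresholds, and the definition of "low complexity" used here are all assumptions chosen for illustration.

```python
def passes_filters(read: str, umi: str, max_n: int = 3,
                   min_distinct_bases: int = 2) -> bool:
    """Return True if a read survives two illustrative filters.

    Both thresholds are hypothetical defaults, not Pixelator's values.
    """
    # Reject reads with too many ambiguous bases (Ns).
    if read.upper().count("N") > max_n:
        return False
    # Reject low-complexity UMIs, defined here (as an assumption) as UMIs
    # built from fewer than `min_distinct_bases` distinct nucleotides.
    if len(set(umi.upper())) < min_distinct_bases:
        return False
    return True
```

For example, a homopolymer UMI such as AAAAAA would be rejected under these assumed settings, while a read with no Ns and a mixed UMI would pass.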

Demultiplexing

Command:
pixelator single-cell-pna demux

The Demultiplexing (Demux) step organizes sequencing data by antibody markers, creating appropriately sized groups for the Collapse step to work on.

Key Operations:

Barcode Correction: Matches read barcodes against a known antibody panel, allowing for minor mismatches and labeling read headers with marker names.
Partitioning Strategies: Groups reads into manageable batches.
High-Performance Storage: Uses DuckDB to sort and deduplicate batches, outputting compressed Parquet files.

This structure ensures rapid, contiguous data access for further processing by the collapse command.
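Barcode correction against a known panel can be sketched as follows. This is a simplified illustration, not the demux implementation: the panel contents, the marker names, the `max_mismatches` default, and the rule of rejecting ambiguous matches are assumptions.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_marker(barcode: str, panel: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the marker whose reference barcode matches within the
    allowed number of mismatches, or None if there is no unique match."""
    hits = [
        name for name, ref in panel.items()
        if len(ref) == len(barcode) and hamming(barcode, ref) <= max_mismatches
    ]
    # Ambiguous or missing matches are left unassigned in this sketch.
    return hits[0] if len(hits) == 1 else None


# Hypothetical two-marker panel, for illustration only.
panel = {"CD3": "ACGTACGT", "CD4": "TTGGCCAA"}
corrected = assign_marker("ACGTACGA", panel)  # one mismatch against CD3
```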

Molecule collapsing and error correction

Command:
pixelator single-cell-pna collapse

The Collapse step performs deduplication and corrects likely sequencing errors from the UMIs by consolidating near-identical sequences that originated from the same molecule into a single representative record.

Key Operations:

Similarity Detection: Partitions data by antibody marker pairs and uses binary indices to find similar UMIs and UEIs based on a defined Hamming distance.
Group PCR duplicates: Constructs adjacency networks and clusters similar UMIs to resolve sequencing errors.
Representative Selection: Identifies the sequence with the highest read support within each cluster and aggregates counts.

The result is a refined Parquet dataset of unique "protein links" (edge list). This can then be used by the graph step to construct the cell graphs.
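The three operations above can be sketched for a single partition of UMIs. The sketch compares all pairs directly, which is quadratic and only suitable for small inputs; the indexed search that the collapse step uses is not reproduced here. The function names and the `max_distance` default are assumptions.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(counts: dict[str, int], max_distance: int = 1) -> dict[str, int]:
    """Cluster UMIs within `max_distance` of each other and return
    {representative UMI: summed read count} per cluster."""
    parent = {umi: umi for umi in counts}

    def find(umi: str) -> str:
        # Union-find lookup with path halving.
        while parent[umi] != umi:
            parent[umi] = parent[parent[umi]]
            umi = parent[umi]
        return umi

    # Similarity detection: link every pair of similar UMIs.
    for a, b in combinations(counts, 2):
        if hamming(a, b) <= max_distance:
            parent[find(a)] = find(b)

    # Group linked UMIs into clusters.
    clusters: dict[str, list[str]] = {}
    for umi in counts:
        clusters.setdefault(find(umi), []).append(umi)

    # Representative selection: highest read support wins, counts are summed.
    collapsed = {}
    for members in clusters.values():
        representative = max(members, key=lambda u: (counts[u], u))
        collapsed[representative] = sum(counts[u] for u in members)
    return collapsed


collapsed = collapse_umis({"AAAA": 10, "AAAT": 2, "CCCC": 5})
```

In this toy input, AAAT differs from AAAA at one position, so its two reads are folded into AAAA, while CCCC stays a separate molecule.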

Graph construction

Command:
pixelator single-cell-pna graph

The Graph step identifies and partitions connected components within the molecular edge list to define individual putative cells.

Key Operations:

Network Construction: Builds a global graph using antibody markers as nodes and collapsed molecules as edges.
Multiplet Recovery: Resolves "mega-clusters" into distinct components (cell graphs) using community detection methods.
Edge Pruning: Employs optional edge cycle verification to remove spurious crossing edges between components.
Size Filtering: Discards components that do not meet quality thresholds to ensure robust single-cell data.
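The core idea of finding connected components in an edge list and filtering them by size can be sketched as below. Multiplet recovery and edge pruning are not shown, and the function name and the `min_size` parameter are assumptions made for illustration.

```python
from collections import defaultdict, deque


def connected_components(edges, min_size: int = 1) -> list[set]:
    """Return the connected components of an undirected edge list,
    keeping only components with at least `min_size` nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # Size filtering: drop components below the threshold.
        if len(component) >= min_size:
            components.append(component)
    return components


edges = [("a", "b"), ("b", "c"), ("x", "y")]
components = connected_components(edges)
large_only = connected_components(edges, min_size=3)
```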

From this step onwards, the output files are in PXL format, a custom format used by Pixelator to make PNA data easier to work with. Internally, it uses DuckDB to store the data. For more information on the PXL format, please refer to the Pixelator documentation.

Denoising

Command:
pixelator single-cell-pna denoise

The Denoise step provides a comprehensive noise-reduction framework using two complementary graph-based techniques to refine the PNA components.

Key Operations:

Adaptive Core Expansion (ACE):
Prunes peripheral-like structures with low connectivity.
Partial Least Squares (PLS):
Models the relationship between local protein composition and connectivity, allowing the identification and removal of nodes whose composition correlates with low connectivity.

Identified noise nodes are collectively removed and the remaining PNA components are stabilized, resulting in a .pxl dataset with a significantly improved signal-to-noise ratio for accurate biological analysis.

Sample calling

Command:
pixelator single-cell-pna sample-calling

Note: this step is only run for proxiome-v2 data.

When samples are stained with hashing antibodies and pooled, sample calling splits the pool back into per-sample data.

Key Operations:

Define hashing panels:
Each sample in the pool is associated with a set of hashing antibodies (often three per sample).
Count per cell:
For each cell, total the hashing signal that supports each sample, e.g. sample-1: 200, sample-2: 20, sample-3: 10.
Pick the best-matching sample:
The sample with the highest total is taken as the cell’s likely origin (in the example, sample-1 with 200).
Score confidence:
Let $c$ be the set of those per-sample counts. A confidence score is $t = \max(c) / \sum(c)$ (in the example, $t = 200 / (200 + 20 + 10) \approx 0.87$ ).
Apply a threshold:
If $t$ is below a cutoff $T$ , the cell is labeled undetermined; otherwise it is assigned to the winning sample (in the example, assign to sample-1 if $t \geq T$ , else undetermined).
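The scoring and thresholding steps above can be written as a short sketch. The function name, the "undetermined" label as a return value, and the example cutoffs of 0.8 and 0.9 are assumptions; the source does not state the value of $T$ that the command uses.

```python
def call_sample(counts: dict[str, int], threshold: float) -> tuple[str, float]:
    """Return (assigned sample or "undetermined", confidence score t)
    for one cell, where t = max(c) / sum(c)."""
    total = sum(counts.values())
    if total == 0:
        # No hashing signal at all: nothing to assign.
        return "undetermined", 0.0
    best = max(counts, key=counts.get)
    score = counts[best] / total
    label = best if score >= threshold else "undetermined"
    return label, score


cell = {"sample-1": 200, "sample-2": 20, "sample-3": 10}
label, score = call_sample(cell, threshold=0.8)        # hypothetical cutoff
strict_label, _ = call_sample(cell, threshold=0.9)     # stricter hypothetical cutoff
```

With the example counts, the score is 200 / 230, so the cell is assigned to sample-1 at a cutoff of 0.8 and labeled undetermined at a cutoff of 0.9.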

The sample-calling command writes one .pxl file per sample in the pool, and optionally a separate .pxl for undetermined cells.

Analysis

Command:
pixelator single-cell-pna analysis

The Analysis step enriches cell graphs with spatial and network metrics to characterize protein co-localization and structural complexity.

Primary Analytical Modules:

Proximity Analysis: Computes statistical proximity scores between pairs of protein markers.
K-Core Analysis: Quantifies graph connectivity via k-core decomposition.

This step updates the .pxl file with a proximity score table and adds quality metrics to the cell-specific metadata.
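As an illustration of k-core decomposition, the sketch below computes the core number of every node by repeatedly peeling off the node with the lowest remaining degree. It is a simple quadratic version of the standard peeling algorithm for small graphs, not the analysis implementation; the function name and the adjacency-dictionary input format are assumptions.

```python
def core_numbers(adjacency: dict[str, set[str]]) -> dict[str, int]:
    """Return the core number of each node: the largest k such that the
    node belongs to a subgraph where every node has degree >= k."""
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# A triangle (a, b, c) with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cores = core_numbers(graph)
```

The triangle nodes end up in the 2-core, while the pendant node only belongs to the 1-core.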

Compute layouts for visualization

Command:
pixelator single-cell-pna layout

The Layout step generates 3D spatial coordinates for nodes in each PNA component (cell graph), transforming abstract connectivity data into a visualizable topography.

Key Operations:

wPMDS (weighted PMDS):
Projects PNA component graph data into a 3D coordinate system using weighted pivot multidimensional scaling (wPMDS). Enhances accuracy by incorporating local density weights, providing a more refined representation of spatial relationships.

The resulting coordinates are stored in the layouts table within the .pxl file, serving as the foundation for 3D visualization and structural interpretation of single-cell proximity networks.
