
Samplesheet

Format

The samplesheet is a CSV or TSV file formatted with a few required and some optional columns. You can export to CSV from spreadsheet programs such as Microsoft Excel, Google Sheets and LibreOffice Calc.

Here is an example of a simple samplesheet with two samples:

sample,design,panel,fastq_1,fastq_2
uropod_control,D21,human-sc-immunology-spatial-proteomics,uropod_control_S1_R1_001.fastq.gz,uropod_control_S1_R2_001.fastq.gz
uropod_stimulated,D21,human-sc-immunology-spatial-proteomics,uropod_stimulated_S1_R1_001.fastq.gz,uropod_stimulated_S1_R2_001.fastq.gz
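This minimal Python sketch parses a samplesheet like the one above with the standard csv module, producing one dict per row keyed by column name. The helper read_samplesheet is hypothetical. It assumes that files ending in .tsv are tab-separated and all others comma-separated. The pipeline's own parser may work differently.

```python
import csv
import io
from pathlib import Path


def read_samplesheet(text: str, filename: str = "samplesheet.csv") -> list:
    """Parse samplesheet text into one dict per row, keyed by column name.

    Hypothetical helper. Assumption: files ending in .tsv are tab-separated,
    all others comma-separated.
    """
    delimiter = "\t" if Path(filename).suffix.lower() == ".tsv" else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


example = (
    "sample,design,panel,fastq_1,fastq_2\n"
    "uropod_control,D21,human-sc-immunology-spatial-proteomics,"
    "uropod_control_S1_R1_001.fastq.gz,uropod_control_S1_R2_001.fastq.gz\n"
    "uropod_stimulated,D21,human-sc-immunology-spatial-proteomics,"
    "uropod_stimulated_S1_R1_001.fastq.gz,uropod_stimulated_S1_R2_001.fastq.gz\n"
)

rows = read_samplesheet(example)
for row in rows:
    print(row["sample"], row["panel"])
```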

The table below provides an overview of all possible columns in the samplesheet. The nf-core/pixelator samplesheet can have as many columns as you like; however, the first 5 columns are strictly required.

Columns not defined in the following table are ignored by the pipeline but can be useful to add extra information for downstream processing.

| Column | Required | Description |
| --- | --- | --- |
| sample | Yes | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (_). |
| design | Yes | The name of the pixelator design configuration. |
| panel or panel_file | Yes | panel: name of the panel to use. panel_file: path to a CSV file containing a custom panel. |
| fastq_1 | Yes | Path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| fastq_2 | No | Path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". Only used for paired-end data. |
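The per-column rules in the table can be sketched as a small row check. This Python sketch uses a hypothetical normalize_row helper and is not the pipeline's validator. It does the following:

- keeps only the columns listed in the table, dropping any extra ones
- requires values for sample, design and fastq_1
- replaces spaces in sample names with underscores
- checks the FastQ file extensions

Stripping surrounding whitespace is an added assumption, and the panel/panel_file rule is not covered here.

```python
# Columns defined in the samplesheet table; any other column is dropped here.
KNOWN_COLUMNS = ("sample", "design", "panel", "panel_file", "fastq_1", "fastq_2")
REQUIRED_VALUES = ("sample", "design", "fastq_1")
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def normalize_row(row: dict) -> dict:
    """Apply the table's per-column rules to one row (hypothetical helper)."""
    # Missing columns become empty strings; whitespace stripping is an assumption.
    clean = {col: (row.get(col) or "").strip() for col in KNOWN_COLUMNS}

    for col in REQUIRED_VALUES:
        if not clean[col]:
            raise ValueError(f"missing value for required column: {col}")

    # Spaces in sample names are converted to underscores.
    clean["sample"] = clean["sample"].replace(" ", "_")

    # Only the extension is checked here, not whether the file is really gzipped.
    for col in ("fastq_1", "fastq_2"):
        if clean[col] and not clean[col].endswith(FASTQ_SUFFIXES):
            raise ValueError(f"{col} must end in .fastq.gz or .fq.gz: {clean[col]}")
    return clean


row = normalize_row({
    "sample": "uropod control",
    "design": "D21",
    "panel": "human-sc-immunology-spatial-proteomics",
    "fastq_1": "uropod_control_S1_R1_001.fastq.gz",
    "notes": "extra column, ignored",
})
print(row["sample"])  # uropod_control
```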

The design field accepts a predefined design name.

The panel and panel_file fields are mutually exclusive. Exactly one of them must be specified for each sample, and the pipeline will throw an error if both are set. See the Panels section for more information.
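This minimal sketch of the panel/panel_file rule uses a hypothetical resolve_panel helper. It returns which of the two fields a row uses and raises an error when both or neither are set. The error messages are illustrative, not the pipeline's.

```python
def resolve_panel(row: dict) -> tuple:
    """Return ("panel", name) or ("panel_file", path) for one row (hypothetical helper)."""
    panel = (row.get("panel") or "").strip()
    panel_file = (row.get("panel_file") or "").strip()
    if panel and panel_file:
        raise ValueError("panel and panel_file are mutually exclusive")
    if panel:
        return ("panel", panel)
    if panel_file:
        return ("panel_file", panel_file)
    raise ValueError("either panel or panel_file must be specified")


print(resolve_panel({"panel": "human-sc-immunology-spatial-proteomics", "panel_file": ""}))
```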

The pipeline auto-detects whether a sample is single- or paired-end. It checks whether both fastq_1 and fastq_2 are present in the samplesheet, or only fastq_1. For single-end sequencing, leave the fastq_2 field empty.
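The single-/paired-end detection amounts to checking which FastQ columns are filled in. This Python sketch uses a hypothetical read_layout helper and illustrates the rule as described, not the pipeline's code.

```python
def read_layout(row: dict) -> str:
    """Classify a row as single- or paired-end from its FastQ columns (hypothetical helper)."""
    has_r1 = bool((row.get("fastq_1") or "").strip())
    has_r2 = bool((row.get("fastq_2") or "").strip())
    if not has_r1:
        raise ValueError("fastq_1 is required")
    return "paired-end" if has_r2 else "single-end"


print(read_layout({"fastq_1": "s_R1.fq.gz", "fastq_2": "s_R2.fq.gz"}))  # paired-end
print(read_layout({"fastq_1": "s_R1.fq.gz", "fastq_2": ""}))            # single-end
```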

Multiple runs of the same sample

The sample identifiers have to be the same when you have re-sequenced the same sample more than once, e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is a samplesheet for the same experimental sample sequenced across 3 lanes:

sample,design,panel,panel_file,fastq_1,fastq_2
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L001_R1_001.fastq.gz,uropod_control_S1_L001_R2_001.fastq.gz
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L002_R1_001.fastq.gz,uropod_control_S1_L002_R2_001.fastq.gz
uropod_control_1,D21,human-sc-immunology-spatial-proteomics,,uropod_control_S1_L003_R1_001.fastq.gz,uropod_control_S1_L003_R2_001.fastq.gz
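The grouping and concatenation step can be sketched as follows. The group_runs and concat_gzip helpers are hypothetical. Joining gzipped files byte-for-byte works because a gzip stream may contain several concatenated members. Whether the pipeline concatenates reads exactly this way is an assumption. The reads in the example are toy in-memory data.

```python
import gzip
from collections import defaultdict


def group_runs(rows):
    """Map each sample name to its (fastq_1, fastq_2) pairs, in samplesheet order."""
    runs = defaultdict(list)
    for row in rows:
        runs[row["sample"]].append((row["fastq_1"], row.get("fastq_2") or ""))
    return dict(runs)


def concat_gzip(members):
    """Join gzipped contents; a multi-member gzip stream is still valid gzip."""
    return b"".join(members)


rows = [
    {
        "sample": "uropod_control_1",
        "fastq_1": f"uropod_control_S1_L00{lane}_R1_001.fastq.gz",
        "fastq_2": f"uropod_control_S1_L00{lane}_R2_001.fastq.gz",
    }
    for lane in (1, 2, 3)
]
grouped = group_runs(rows)
print(len(grouped["uropod_control_1"]))  # 3 lanes for one sample

# In-memory stand-ins for three gzipped R1 files (toy reads, not real data).
lanes = [gzip.compress(f"@read{i}\nACGT\n+\nIIII\n".encode()) for i in (1, 2, 3)]
merged = gzip.decompress(concat_gzip(lanes)).decode()
print(merged.count("@read"))  # 3
```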

Relative paths

Relative paths in a samplesheet are supported. This makes it easier to relocate data, since you do not have to edit the file paths in the samplesheet.

By default, relative paths are resolved against the directory that contains the samplesheet file.

Given the following directory structure:

  • data
    • samplesheet.csv
    • fastq
      • sample1_R1.fq.gz
      • sample1_R2.fq.gz

you can use the following samplesheet:

sample,design,panel,panel_file,fastq_1,fastq_2
sample1,D21,human-sc-immunology-spatial-proteomics,,fastq/sample1_R1.fq.gz,fastq/sample1_R2.fq.gz
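The default resolution, where a relative entry is resolved against the samplesheet's directory, can be sketched with pathlib. The resolve_local helper is hypothetical. PurePosixPath is used so the result always uses forward slashes regardless of platform. Remote URIs are not handled here.

```python
from pathlib import PurePosixPath


def resolve_local(samplesheet: str, entry: str) -> str:
    """Resolve a samplesheet entry against the samplesheet's directory (hypothetical helper).

    Absolute paths are returned unchanged; remote URIs are not handled here.
    """
    path = PurePosixPath(entry)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(samplesheet).parent / path)


print(resolve_local("data/samplesheet.csv", "fastq/sample1_R1.fq.gz"))
# data/fastq/sample1_R1.fq.gz
```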

With the --input_basedir option, you can specify a different location against which relative paths are resolved. This location can be a local or a remote path. On a local file system, you could run the following command from the top of the directory tree shown above:

nextflow run nf-core/pixelator --input data/samplesheet.csv --input_basedir data

The same samplesheet also works with a remote file system. Suppose the samplesheet is on the local machine and the input data is located in an AWS S3 bucket:

  • s3://my-company-data/experiment-1/fastq
    • sample1_R1.fq.gz
    • sample1_R2.fq.gz

In that case, you can run:

nextflow run nf-core/pixelator --input samplesheet.csv --input_basedir s3://my-company-data/experiment-1/
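The effect of --input_basedir can be sketched as swapping the base used for resolving relative entries. This sketch uses a hypothetical resolve_entry helper. A remote base such as an s3:// URI is joined as a plain string, because pathlib would collapse the double slash in "s3://". The list of remote prefixes is an assumption taken from the Remote files section. The pipeline's actual resolution logic may differ.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: the remote prefixes listed under "Remote files".
REMOTE_PREFIXES = ("s3://", "gs://", "az://")


def resolve_entry(entry: str, samplesheet: str, input_basedir: Optional[str] = None) -> str:
    """Resolve one samplesheet path (hypothetical helper).

    Without input_basedir, the samplesheet's own directory is the base.
    """
    if entry.startswith(REMOTE_PREFIXES) or entry.startswith("/"):
        return entry  # already absolute, local or remote
    if input_basedir is None:
        base = str(PurePosixPath(samplesheet).parent)
    else:
        base = input_basedir
    if base.startswith(REMOTE_PREFIXES):
        # Join URIs as strings: PurePosixPath would turn "s3://" into "s3:/".
        return base.rstrip("/") + "/" + entry
    return str(PurePosixPath(base) / entry)


print(resolve_entry("fastq/sample1_R1.fq.gz", "data/samplesheet.csv", "data"))
# data/fastq/sample1_R1.fq.gz
print(resolve_entry("fastq/sample1_R1.fq.gz", "samplesheet.csv",
                    "s3://my-company-data/experiment-1/"))
# s3://my-company-data/experiment-1/fastq/sample1_R1.fq.gz
```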

Remote files

Nextflow supports a variety of remote file systems. Please refer to the Nextflow documentation for details on how to specify remote paths and authenticate with the storage provider.

The most common cloud storage providers are supported and use the standard prefixes:

  • Google Cloud Storage: gs://my-bucket/some/path
  • AWS S3: s3://my-bucket/some/path
  • Azure Blob Storage: az://my-bucket/some/path
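To illustrate, a hypothetical storage_provider helper can map the prefixes listed above to their provider. It only inspects the path prefix and does not handle authentication, which is configured through Nextflow.

```python
from typing import Optional

# Prefixes from the list above, mapped to their storage provider.
PROVIDERS = {
    "gs://": "Google Cloud Storage",
    "s3://": "AWS S3",
    "az://": "Azure Blob Storage",
}


def storage_provider(path: str) -> Optional[str]:
    """Return the provider for a remote path, or None for a local path (hypothetical helper)."""
    for prefix, name in PROVIDERS.items():
        if path.startswith(prefix):
            return name
    return None


print(storage_provider("s3://my-bucket/some/path"))     # AWS S3
print(storage_provider("data/fastq/sample1_R1.fq.gz"))  # None
```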

Design

The design column specifies the name of the MPX assay design configuration to use.

The available designs can be listed by running the following command:

pixelator single-cell --list-designs

Currently, a single design is available:

  • D21

Panels

The panel file contains all the information used to link antibody barcodes to their respective targets. Panel files can be specified in two ways:

  • By name, to use one of the predefined built-in panels.
  • By passing a CSV file with a customized panel.

Predefined panels can be passed in the panel field. Custom panels can be passed in the panel_file field. Every sample should have either panel or panel_file specified.

The available panels can be listed by running the following command:

pixelator single-cell --list-panels

Currently, a single built-in panel is available:

  • human-sc-immunology-spatial-proteomics