Quick start

nf-core/pixelator helps you go from raw sequencing data (FASTQ) to PXL output files that you can use for downstream analysis. It runs with Nextflow, which can execute the same workflow on a single server, a large HPC cluster, or in the cloud. This quickstart guide is aimed at first-time users of Nextflow and nf-core/pixelator.

Feeling lost?

If you are not used to working with servers or high-performance computing (HPC) systems, and parts of this guide feel difficult, consider contacting your local IT support.

If you use a central HPC facility, its administrators are often comfortable setting up and configuring nf-core pipelines. There are also many cluster-specific configurations available via nf-core/configs. Perhaps your cluster is among them?

If you do not have access to these resources, that’s okay — the guide below will still walk you through the basics you need to get started.

Before you start (checklist)

  • Your system is supported
    • Linux or macOS (Windows is supported via WSL2)
    • You have x86_64 CPUs (ARM systems such as Apple M series are currently not supported)
  • You have enough RAM
    • Plan for at least 512 GB RAM available on one or more machines for memory-heavy steps
  • You have enough disk space
    • Plan for several times the total size of your FASTQ files (the pipeline creates intermediate files while it runs)
  • Nextflow works on your machine
    • nextflow -v runs successfully
  • You have a container runtime installed; any of the following is fine
    • Docker: docker -v works
    • Apptainer: apptainer --version works
    • Singularity: singularity --version works
  • You know where results should be written
    • You have chosen an output folder for --outdir (for example ./results)
  • You have prepared a samplesheet with the paths to the FASTQ files you want to process
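To tick off the tool checks above, you can run a quick sanity check in your shell. This is a minimal sketch: only one of the three container runtimes needs to be found, and the exact tools you need depend on your setup.

```shell
# Report which of the relevant tools are on PATH
# (only one container runtime is needed)
for tool in nextflow docker apptainer singularity; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done

# Show free disk space for the current directory
df -h .
```

If a container runtime reports "not found", install or load one (for example via your cluster's module system) before continuing.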

Choose your setup

Pick the option that best matches where your data lives and how much computing power you need.

Option 1: Single server / workstation

Best when you want the simplest setup, and your dataset fits comfortably on one machine.

  • Pros
    • Quick to get started (ideal for first-time users)
    • Great for testing and smaller runs
  • Cons
    • Limited by the CPU, RAM, and disk of a single machine
    • Large datasets may run slowly, since parallelism is limited to the capacity of one machine

Go to Run on a single server.

Option 2: HPC cluster and/or cloud

Best when you need to scale up (more samples, faster turnaround) or your organization runs compute through a scheduler (e.g. Slurm or PBS).

  • Pros
    • Can use many compute nodes in parallel (faster, more scalable)
    • Better for large datasets and shared infrastructure
    • Often provides fast scratch storage for the pipeline workDir
  • Cons
    • Requires extra preparation (usually a nextflow.config for executor/queues/storage)
    • More environment-specific settings (you may need help from an HPC admin)

Go to Run on HPC or cloud.

Run on a single server

1) Run the test dataset

This command downloads a small public test dataset and runs the pipeline end-to-end.

nextflow run nf-core/pixelator \
    -profile test,docker \
    --outdir "./results"

What you should see

  • Nextflow prints a banner that includes nf-core/pixelator and a pipeline version.
  • A results/ folder is created in your current directory.
  • The run finishes without errors.

If the test run fails, fix that first (container runtime, Java/Nextflow, permissions, disk space) before moving on to real data.

2) Run your own data

To run real data, you typically provide a samplesheet (a CSV file that tells the pipeline where your FASTQ files are) and choose an output directory.

nextflow run nf-core/pixelator \
    -profile docker \
    --input "samplesheet.csv" \
    --outdir "./results"

Replace these values

  • samplesheet.csv: the path to your samplesheet file
  • ./results: where you want output files to be written
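A minimal samplesheet could look like the example below. The column names and values shown here are illustrative assumptions; consult the nf-core/pixelator usage documentation for the exact schema required by your pipeline version, in particular the design and panel values, which depend on your assay kit.

```csv
sample,design,panel,fastq_1,fastq_2
sample1,D21,human-sc-immunology-spatial-proteomics,sample1_R1.fastq.gz,sample1_R2.fastq.gz
```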

Start small

Keep your first real run small (a subset of samples) so you can validate runtime, disk usage, and output before scaling up.

Run on HPC or cloud

On a cluster or cloud environment, the pipeline usually cannot “just run” with defaults, because Nextflow needs to know things like:

  • Which scheduler/executor to use (for example Slurm or PBS)
  • Which queue/partition to submit jobs to
  • Where to put the work directory (often a fast scratch filesystem)
  • Which container runtime is allowed (often Apptainer/Singularity)

There are two common approaches.

Option A: Use an institutional profile from nf-core/configs

Many institutions already have a ready-made configuration in nf-core/configs. If your cluster is listed, this is usually the fastest path.

You would then run with something like:

nextflow run nf-core/pixelator \
    -profile <institution>,apptainer \
    --input "samplesheet.csv" \
    --outdir "./results"

Replace <institution> with the name of your cluster profile from nf-core/configs.

Option B: Create a minimal nextflow.config

If there is no existing profile (or you need custom settings), create a small config file in your run directory. Below is a minimal starting point. You will still need to fill in the correct values for your system.

/*
 * Minimal Nextflow config for running nf-core/pixelator on HPC/cloud.
 * Replace placeholders (<>), and ask your HPC admin if unsure.
 */

process {
    executor = '<slurm|pbs|lsf|...>'
    queue    = '<partition_or_queue_name>'
    // clusterOptions = '--account <project> --qos <qos>' // optional, system-specific
}

// Put temporary work files on fast storage if available
workDir = '<path_to_fast_scratch_workdir>'

// Choose ONE container runtime (set the one you use to true)
docker.enabled      = false
apptainer.enabled   = false
singularity.enabled = false

// Optional: set a default output folder (you can still override with --outdir)
// params.outdir = './results'

Then run the pipeline using your config file:

nextflow run nf-core/pixelator \
    -c nextflow.config \
    -profile apptainer \
    --input "samplesheet.csv" \
    --outdir "./results"

Common gotchas

  • Start with the test dataset on your cluster first (it’s faster to debug): -profile test,apptainer or -profile test,singularity
  • Use a workDir on a filesystem that is intended for heavy temporary I/O (often scratch). Writing intermediate files to network-attached storage can be very slow and degrade pipeline performance.
  • If your cluster requires an account or project, you may need to add scheduler-specific flags (ask your admin).
  • You do not need to merge your FASTQ files manually before running nf-core/pixelator. If you have multiple FASTQ files per sample, simply add one entry per file to the samplesheet with the same sample name. The nf-core/pixelator pipeline will recognize these as data from the same sample and merge them before proceeding with the rest of the pipeline.
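For example, a sample split across two sequencing lanes could be entered as two rows sharing the same sample name. As in the earlier example, the column names and values are illustrative assumptions; check the pipeline's usage documentation for the exact schema.

```csv
sample,design,panel,fastq_1,fastq_2
sample1,D21,human-sc-immunology-spatial-proteomics,sample1_L001_R1.fastq.gz,sample1_L001_R2.fastq.gz
sample1,D21,human-sc-immunology-spatial-proteomics,sample1_L002_R1.fastq.gz,sample1_L002_R2.fastq.gz
```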

Where to go next