The run type (single-cell or bulk) is determined by whether the barcodes_txt file is provided. If it is provided, the single-cell IsoSeq pipeline, including barcode correction, will run; otherwise, the bulk IsoSeq pipeline will run.
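As a minimal sketch of this selection rule (the helper name is hypothetical and not part of the workflow):

```python
def select_pipeline(barcodes_txt=None):
    """Return which IsoSeq pipeline runs, following the rule above.

    barcodes_txt -- path to a barcode whitelist file, or None.
    Hypothetical helper for illustration only.
    """
    # Providing a barcode whitelist triggers the single-cell pipeline,
    # including barcode correction; otherwise the bulk pipeline runs.
    return "single-cell" if barcodes_txt is not None else "bulk"
```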
Batch name; used for naming files
Array of HiFi reads in BAM format
FASTA file containing forward and reverse primer sequences. Used to demultiplex and refine reads.
Reference data and associated files. See the IsoSeq docs for more information.
Reference name; used to name outputs (e.g., “GRCh38”)
Reference genome and index
Annotation file for the reference genome in GTF format
CAGE peaks in BED format
Intropolis data in custom format
polyA motif list in custom format
Optional FASTA file containing adapter sequences, ordered as the adapters are expected to appear within the reads. If this file is provided, skera will run first to segment the reads from the sample movie BAM. Required if the movie BAMs were generated using MAS-Seq.
Optional file containing valid whitelisted barcode sequences. If provided, the single-cell IsoSeq pipeline will be run. Otherwise, bulk IsoSeq will run.
Optional UMI/barcode design. If not provided and the single-cell pipeline is run, the isoseq tag default design [T-8U-10B] will be used.
Backend where the workflow will be executed
Zones where compute will take place; required if backend is set to ‘AWS’ or ‘GCP’.
Queue ARN for the spot batch queue; required if backend is set to ‘AWS’ and preemptible is set to true
Queue ARN for the on demand batch queue; required if backend is set to ‘AWS’ and preemptible is set to false
If set to true, tasks run preemptibly where possible; if the backend is set to GCP, on-demand VMs are used only for tasks that run for more than 24 hours. If set to false, on-demand VMs are used for every task. Ignored if backend is set to HPC.
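The VM selection rules above can be sketched as follows (hypothetical helper for illustration; the workflow itself does not expose this function):

```python
def vm_type(preemptible, backend, runtime_hours=0):
    """Which VM class a task uses, per the rules described above.

    backend is one of "GCP", "AWS", or "HPC"; runtime_hours is the
    task's expected runtime. Hypothetical helper for illustration.
    """
    if backend == "HPC":
        return "ignored"      # the preemptible flag does not apply on HPC
    if not preemptible:
        return "on-demand"    # every task runs on-demand
    if backend == "GCP" and runtime_hours > 24:
        return "on-demand"    # long-running GCP tasks use on-demand VMs
    return "preemptible"
```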
Common outputs are produced regardless of which pipeline is run; in addition, either single-cell or bulk outputs are produced depending on the pipeline.
Metadata output from the polyA and concatemer removal step
Summary JSON output from the polyA and concatemer removal step
Report CSV output from the polyA and concatemer removal step
Reads aligned to the reference genome in BAM format
Read stats output from the transcript collapse step
Report JSON output from the transcript collapse step
Sorted GFF output from pigeon
Summary file output by pigeon transcript classification
Report JSON output by pigeon transcript classification
Classification file output by pigeon transcript classification
A junctions txt file containing every junction for each isoform. Follows the SQANTI3 junction file convention.
The filter tool outputs a txt file containing the reasons an isoform was filtered. See the pigeon documentation for reasons an isoform can be filtered.
A txt file containing the read count and number of unique genes found in a subsampled set of reads
Corrected barcode report JSON
Stats for group barcodes in JSON format
Stats for group barcodes in TSV format
Files required to run tertiary analysis with Seurat.
Clustered BAM and index
Clustering report in CSV format
Reference datasets are hosted publicly for use in the pipeline.
Docker image definitions used by this workflow can be found in the wdl-dockerfiles repository. Images are hosted on PacBio’s quay.io. Docker images used in the workflow are pinned to specific versions by referring to their digests rather than tags.
The Docker image used by a particular step of the workflow can be identified by looking at the docker key in the runtime block for the given task.
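For example, a task's runtime block might look like the following (the task name, image name, and digest are placeholders):

```wdl
task example_task {
  # ... command and inputs ...
  runtime {
    # The image is referenced by digest rather than tag
    docker: "quay.io/pacbio/example_image@sha256:<digest>"
  }
}
```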