Analyze targeted HiFi sequence datasets using PacBio read data. Call and phase small and structural variants.
Workflow for target enrichment analysis of PacBio HiFi data, written in Workflow Description Language (WDL). At a minimum, the workflow runs demultiplexing, duplicate marking, alignment to the reference, small variant calling using DeepVariant, and structural variant (SV) calling using pbsv. It also phases and haplotags each sample. Cohort analysis, QC, HS metrics, and PharmCAT steps are optional.
Batch name; used for naming files
HiFi reads in BAM format.
FASTA file containing forward and reverse barcode sequences; used to demultiplex reads.
BED file specifying the coordinates of the regions of interest.
Files associated with the reference genome.
Run optional cohort analysis steps
Run optional QC steps
Low coverage cutoff for QC 
BED file specifying the coordinates for the probes used to prepare the target capture library. The same file used for target_regions_bed may be used in place of the probes_bed if you do not have access to the probes_bed. If this file is specified, the HS metrics workflow will run.
Sample size for Picard CollectHsMetrics; the sample size used for Theoretical Het Sensitivity sampling. 
Near distance cutoff for Picard CollectHsMetrics; the maximum distance between a read and the nearest probe/bait/amplicon for the read to be considered ‘near probe’ and included in the percent selected. 
Run optional pharmcat and pangu_cyp2d6 steps
VCF file and index specifying PharmCAT positions; required if run_pharmcat is set to true.
Minimum coverage cutoff used to filter the preprocessed VCF passed to PharmCAT
Version of DeepVariant to use [1.4.0]
Optional alternate DeepVariant model file to use
Backend where the workflow will be executed [“Azure”, “AWS”, “GCP”, “HPC”]
Zones where compute will take place; required if backend is set to ‘AWS’ or ‘GCP’.
Queue ARN for the spot batch queue; required if backend is set to ‘AWS’ and
Queue ARN for the on demand batch queue; required if backend is set to ‘AWS’ and
Container registry where workflow images are hosted. If left blank, PacBio’s public Quay.io registry will be used.
If set to true, run tasks preemptibly where possible. On-demand VMs will be used only for tasks that run for >24 hours if the backend is set to GCP. If set to false, on-demand VMs will be used for every task. Ignored if backend is set to HPC.
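Several of the inputs above are only conditionally required. As a minimal sketch of those rules, the following Python helper checks a flat dictionary of inputs before submission. The key names are illustrative assumptions: `backend`, `zones`, `run_pharmcat`, and `probes_bed` follow the wording of the descriptions above, `pharmcat_positions_vcf` is a hypothetical name, and a real WDL inputs JSON normally prefixes each key with the workflow name, so check the workflow's own input definitions before relying on any of them.

```python
# Allowed backend values, as listed in the backend input description.
ALLOWED_BACKENDS = {"Azure", "AWS", "GCP", "HPC"}


def check_inputs(inputs: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    backend = inputs.get("backend")
    if backend not in ALLOWED_BACKENDS:
        problems.append(
            f"backend must be one of {sorted(ALLOWED_BACKENDS)}, got {backend!r}"
        )

    # Zones are required only for the AWS and GCP backends.
    if backend in {"AWS", "GCP"} and not inputs.get("zones"):
        problems.append("zones is required when backend is AWS or GCP")

    # The positions VCF is required only when run_pharmcat is true.
    # "pharmcat_positions_vcf" is a hypothetical key name.
    if inputs.get("run_pharmcat") and not inputs.get("pharmcat_positions_vcf"):
        problems.append("pharmcat_positions_vcf is required when run_pharmcat is true")

    return problems


def hs_metrics_will_run(inputs: dict) -> bool:
    """HS metrics run only when a probes BED file is supplied."""
    return bool(inputs.get("probes_bed"))
```

For example, `check_inputs({"backend": "GCP"})` reports the missing zones, while `check_inputs({"backend": "HPC"})` returns an empty list. The AWS queue ARN inputs are left out of the sketch because the conditions under which they are required are not fully stated above.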
The set of workflow outputs will depend on which set of analyses are specified to run, determined by the set of inputs that are provided as well as whether options such as
run_pharmcat are set to
A file listing samples that failed demultiplexing.
Sample analysis outputs are produced for each demultiplexed sample generated from the input HiFi reads.
Cohort analysis outputs will be produced if the input
QC outputs will be produced if the input
HS metrics outputs will be generated if the input
PharmCAT outputs will be produced if the input
Reference datasets are hosted publicly for use in the pipeline.
Docker image definitions used by this workflow can be found in the wdl-dockerfiles repository. Images are hosted in PacBio’s Quay.io registry. Docker images used in the workflow are pinned to specific versions by referring to their digests rather than tags.
The Docker image used by a particular step of the workflow can be identified by looking at the docker key in the runtime block for the given task.
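To illustrate the difference between a digest-pinned image and a tagged one, here is a small Python sketch that tests whether an image reference string ends in a sha256 digest. It assumes the standard `name@sha256:<64 hex characters>` reference form; the function name and the example image names are hypothetical and do not correspond to images used by this workflow.

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image: str) -> bool:
    """Return True if the image reference is pinned to a sha256 digest."""
    return DIGEST_RE.search(image) is not None


# Placeholder references for illustration only.
pinned = "quay.io/example/tool@sha256:" + "0" * 64
tagged = "quay.io/example/tool:1.0.0"
```

Here `is_pinned_by_digest(pinned)` is true and `is_pinned_by_digest(tagged)` is false. A tag can later be moved to a different image, whereas a digest identifies one specific image, which is why digest references give reproducible runs.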