Sentieon Long Read Germline Whole Genome Sequencing Analysis

Perform alignment and variant calling for SNPs, small indels, and structural variants using Sentieon.

This workflow uses the Sentieon® Genomics software, a set of tools for highly accurate and computationally efficient analysis of genomic data. It performs read alignment, SNP and small indel calling, and structural variant calling on long read data. The workflow supports a variety of reference genomes, which are downloaded as part of workflow execution.

The workflow outputs structural variant calls and, when a DNAscope LongRead model is provided, small variant calls.

This workflow was developed by the Sentieon development team and is written in Workflow Description Language (WDL). Further documentation can be found here.

Sentieon Long Read Germline Whole Genome Sequencing Analysis workflow diagram

Workflow Inputs

The long read pipeline is capable of processing either PacBio or Oxford Nanopore long reads. Either read type may be used as the fastq input.

Input	Description
`fastq`	Long read FASTQ files, generated by either PacBio or Oxford Nanopore sequencers
`read_groups`	Sample read groups
`reference_name`	The name of the human reference genome build (`hg38_alt`, `hg38_gatk`, `hg38`, `hg38_noalt`, `hs38`, `b37_gatk`, `b37`, `hs37d5`, `hg19`, `ucsc_hg19`)
`dnascope_lr_model`	The DNAscope LongRead model file. If provided, small variant calls will be output
`longreadsv_model`	The LongReadSV model file
`canonical_user_id`	Your account’s AWS canonical user ID. Used to acquire a Sentieon license
`sentieon_docker`	Sentieon docker image
`n_threads`	Number of vCPUs to allocate for the task [32]
`memory`	Memory to allocate for the task [64 GiB]
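
The inputs above can be assembled and sanity-checked before submitting a run. The following is a minimal Python sketch, not the workflow's own tooling. The input names, the reference build list, and the defaults of 32 vCPUs and 64 GiB come from the table; the `build_inputs` helper, the value types (lists of strings for `fastq` and `read_groups`, a string for `memory`), and the decision to treat `longreadsv_model` as required are assumptions made for illustration. Check the workflow's WDL for the exact types and any key prefix it expects.

```python
import json

# Reference build names copied from the inputs table above.
SUPPORTED_REFERENCES = {
    "hg38_alt", "hg38_gatk", "hg38", "hg38_noalt", "hs38",
    "b37_gatk", "b37", "hs37d5", "hg19", "ucsc_hg19",
}


def build_inputs(fastq, read_groups, reference_name, longreadsv_model,
                 canonical_user_id, sentieon_docker,
                 dnascope_lr_model=None, n_threads=32, memory="64 GiB"):
    """Hypothetical helper: collect workflow inputs into a dict.

    Raises ValueError when reference_name is not one of the listed builds.
    The value types used here are assumptions, not taken from the WDL.
    """
    if reference_name not in SUPPORTED_REFERENCES:
        raise ValueError(f"unsupported reference_name: {reference_name!r}")

    inputs = {
        "fastq": list(fastq),
        "read_groups": list(read_groups),
        "reference_name": reference_name,
        "longreadsv_model": longreadsv_model,
        "canonical_user_id": canonical_user_id,
        "sentieon_docker": sentieon_docker,
        "n_threads": n_threads,  # default from the table: 32 vCPUs
        "memory": memory,        # default from the table: 64 GiB
    }
    # The DNAscope LongRead model is only included when supplied;
    # small variant calls are output only in that case.
    if dnascope_lr_model is not None:
        inputs["dnascope_lr_model"] = dnascope_lr_model
    return inputs


def to_json(inputs):
    """Serialize the inputs dict as pretty-printed JSON."""
    return json.dumps(inputs, indent=2)
```

Calling `to_json(build_inputs(...))` gives a JSON document that can be adapted to whatever input format the execution environment requires.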

Workflow Outputs

The Sentieon long read germline pipeline outputs aligned reads and a long read structural variant VCF. If a DNAscope long read model is provided, it will also output a VCF containing SNPs and small indels.

Output	Description
`aligned_reads`	Aligned and duplicate-marked reads
`aligned_index`	Index for `aligned_reads`
`sv_vcf`	Long read structural variant VCF
`sv_vcf_tbi`	Index for `sv_vcf`
`calls_vcf`	SNP and small indel call VCF; output if `dnascope_lr_model` is provided
`calls_vcf_tbi`	Index for `calls_vcf`; output if `dnascope_lr_model` is provided
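
Because the small variant outputs are conditional, downstream scripts may want to know which outputs to expect for a given run. The sketch below encodes the rule from the table in Python. The `expected_outputs` function and the plain-dict representation of the inputs are hypothetical and only illustrate the condition; they are not part of the workflow.

```python
# Outputs that the table lists without any condition.
ALWAYS_PRESENT = ["aligned_reads", "aligned_index", "sv_vcf", "sv_vcf_tbi"]

# Outputs listed as produced only when `dnascope_lr_model` is provided.
SMALL_VARIANT_OUTPUTS = ["calls_vcf", "calls_vcf_tbi"]


def expected_outputs(inputs):
    """Return the output names a run should produce for the given inputs.

    `inputs` is assumed to be a dict keyed by the workflow input names.
    """
    outputs = list(ALWAYS_PRESENT)
    if inputs.get("dnascope_lr_model"):
        outputs.extend(SMALL_VARIANT_OUTPUTS)
    return outputs
```

A check such as `"calls_vcf" in expected_outputs(inputs)` can then guard any step that consumes the small variant VCF.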

Containers

The latest version of the Sentieon Docker image can be run by following the instructions listed here.