Sentieon Germline Whole Genome Sequencing Analysis

Align reads to the reference genome, call variants, and calculate quality metrics using Sentieon.

This workflow implements the Sentieon® Genomics software, a set of software tools that perform highly accurate and computationally efficient analysis of genomic data. This workflow performs read alignment, duplicate marking, base quality score recalibration (BQSR), and variant calling steps. The workflow is designed for use with a variety of reference genomes, which are downloaded as part of workflow execution. The workflow also computes quality metrics on the deduplicated alignments and produces various plots which can be used to quickly inspect sample quality.

The workflow can optionally output a gVCF rather than a VCF file, which can be combined with other sample gVCFs for use in joint genotyping.

This workflow was developed by the Sentieon development team and is written in Workflow Description Language (WDL). Further documentation can be found here.

Workflow Inputs

The workflow can be run from either paired FASTQ files or aligned BAM/CRAM files. If using the FASTQ entrypoint, r1_fastq, r2_fastq, and read_groups must be defined. If using the BAM/CRAM entrypoint, input_aln and input_aln_idx must be defined.
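As an illustration of the two entrypoints, here is a minimal Python sketch that checks a dictionary of workflow inputs before submission. The input names come from the paragraph above; the function name `pick_entrypoint` is hypothetical, and rejecting inputs that define both groups is an assumption, since this page does not say how the workflow behaves in that case.

```python
# Input names taken from the workflow description.
FASTQ_KEYS = ("r1_fastq", "r2_fastq", "read_groups")
ALN_KEYS = ("input_aln", "input_aln_idx")


def pick_entrypoint(inputs):
    """Return "fastq" or "aln" depending on which input group is fully defined."""
    # A key counts as defined only if it is present and non-empty.
    has_fastq = all(inputs.get(key) for key in FASTQ_KEYS)
    has_aln = all(inputs.get(key) for key in ALN_KEYS)
    if has_fastq and has_aln:
        # Assumption: treat a mixed definition as an error (not stated in the docs).
        raise ValueError("define either the FASTQ inputs or the BAM/CRAM inputs, not both")
    if not (has_fastq or has_aln):
        raise ValueError(
            "define r1_fastq, r2_fastq and read_groups, or input_aln and input_aln_idx"
        )
    return "fastq" if has_fastq else "aln"
```

A check like this can run locally before the inputs are submitted, so an incomplete entrypoint definition is caught before any compute is allocated.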


R1 fastq files


R2 fastq files


Sample read groups


Input alignment (BAM/CRAM) files


Input alignment (BAM/CRAM) index files


The name of the human reference genome build. (‘hg38_alt’, ‘hg38_gatk’, ‘hg38’, ‘hg38_noalt’, ‘hs38’, ‘b37_gatk’, ‘b37’, ‘hs37d5’, ‘hg19’, ‘ucsc_hg19’)


If true, mark duplicates and output QC metrics and plots. [true]


Output variant calls in the gVCF format instead of VCF [false]


The Sentieon DNAscope variant calling model


Your account’s AWS canonical user ID. Used to acquire a Sentieon license


Sentieon docker image


Set to true if the data is processed with a PCR-free library prep [false]


Number of vCPUs to allocate for the task [32]


Memory to allocate for the task [64 GiB]

Workflow Outputs

The workflow produces variant calls in either VCF or gVCF format. Other outputs depend on the options selected.


Variant calls in VCF or gVCF format


Variant calls index

Metrics and reads files

Metrics and reads files are produced if run_dedup_and_qc is set to true.

aligned_reads: Aligned and duplicate-marked reads
aligned_index: Index for aligned_reads
dedup_metrics: File containing the metrics data from the deduplication stage
mq_metrics: File containing metrics quantifying the mean base quality score for each sequencing cycle
qd_metrics: File containing metrics quantifying the number of bases with a specific base quality score
gc_summary: Summary of the GC bias in the reference and the sample
gc_metrics: File containing metrics quantifying the GC bias in the reference and the sample
as_metrics: File containing statistics about the read alignment
is_metrics: File containing metrics quantifying the statistical distribution of insert sizes
mq_plot: Plot of the mq_metrics
qd_plot: Plot of the qd_metrics
gc_plot: Plot of the gc_metrics
is_plot: Plot of the is_metrics
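
The dependence of these outputs on run_dedup_and_qc can be sketched as a small post-run check. This is only an illustration: the output names are the ones listed above, while the helper `missing_qc_outputs` and the idea of passing in the names a run actually produced are hypothetical.

```python
# Output names listed in the "Metrics and reads files" section.
QC_OUTPUTS = [
    "aligned_reads", "aligned_index", "dedup_metrics", "mq_metrics",
    "qd_metrics", "gc_summary", "gc_metrics", "as_metrics", "is_metrics",
    "mq_plot", "qd_plot", "gc_plot", "is_plot",
]


def missing_qc_outputs(run_dedup_and_qc, produced):
    """List the metrics and reads outputs that were expected but not produced."""
    if not run_dedup_and_qc:
        # These files are only produced when run_dedup_and_qc is true.
        return []
    present = set(produced)
    return [name for name in QC_OUTPUTS if name not in present]
```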

Recalibration table for the sample, produced by BQSR. Base quality score recalibration runs only if no custom dnascope_model is provided.


The latest version of the Sentieon Docker image can be run by following the instructions listed here.