Sentieon Germline Whole Genome Sequencing Analysis

Align reads to the reference genome, call variants, and calculate quality metrics using Sentieon.

This workflow implements the Sentieon® Genomics software, a set of software tools that perform highly accurate and computationally efficient analysis of genomic data. This workflow performs read alignment, duplicate marking, base quality score recalibration (BQSR), and variant calling steps. The workflow is designed for use with a variety of reference genomes, which are downloaded as part of workflow execution. The workflow also computes quality metrics on the deduplicated alignments and produces various plots which can be used to quickly inspect sample quality.

The workflow can optionally output a gVCF rather than a VCF file, which can be combined with other sample gVCFs for use in joint genotyping.

This workflow was developed by the Sentieon development team and is written in Workflow Description Language (WDL). Further documentation can be found here.

Workflow Inputs

The workflow can be run using either paired FASTQ files or aligned BAM/CRAM files. If using the FASTQ entrypoint, r1_fastq, r2_fastq, and read_groups must be defined. If using the BAM/CRAM entrypoint, input_aln and input_aln_idx must be defined.

Input | Description
r1_fastq | R1 FASTQ files
r2_fastq | R2 FASTQ files
read_groups | Sample read groups
input_aln | Input alignment (BAM/CRAM) files
input_aln_idx | Input alignment (BAM/CRAM) index files
reference_name | The name of the human reference genome build (‘hg38_alt’, ‘hg38_gatk’, ‘hg38’, ‘hg38_noalt’, ‘hs38’, ‘b37_gatk’, ‘b37’, ‘hs37d5’, ‘hg19’, ‘ucsc_hg19’)
run_dedup_and_qc | If true, mark duplicates and output QC metrics and plots [true]
output_gvcf | Output variant calls in the gVCF format instead of VCF [false]
dnascope_model | The Sentieon DNAscope variant calling model
canonical_user_id | Your account’s AWS canonical user ID. Used to acquire a Sentieon license
sentieon_docker | Sentieon Docker image
is_pcr_free | Set to true if the data was generated with a PCR-free library prep [false]
n_threads | Number of vCPUs to allocate for the task [32]
memory | Memory to allocate for the task [64 GiB]
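
The entrypoint rule described above can be checked before a run is submitted. The following is a minimal sketch, not part of the workflow itself. It assumes the inputs are held in a Python dictionary keyed by the bare input names from the table (a real inputs file may prefix them with the workflow name), and it assumes that defining both entrypoints at once should be rejected, which the documentation does not state explicitly. The function name detect_entrypoint is hypothetical.

```python
from typing import Any, Dict

# Input groups taken from the input table above.
FASTQ_KEYS = ("r1_fastq", "r2_fastq", "read_groups")
ALN_KEYS = ("input_aln", "input_aln_idx")


def detect_entrypoint(inputs: Dict[str, Any]) -> str:
    """Return "fastq" or "aln" depending on which input group is fully defined.

    Hypothetical helper: raises ValueError when neither group is complete,
    or (by assumption) when both groups are defined at the same time.
    """

    def defined(key: str) -> bool:
        # Missing keys, None, and empty lists all count as "not defined".
        return bool(inputs.get(key))

    has_fastq = all(defined(k) for k in FASTQ_KEYS)
    has_aln = all(defined(k) for k in ALN_KEYS)

    if has_fastq and has_aln:
        # Assumption: the two entrypoints are treated as mutually exclusive.
        raise ValueError("define either the FASTQ inputs or the BAM/CRAM inputs, not both")
    if has_fastq:
        return "fastq"
    if has_aln:
        return "aln"

    missing_fastq = [k for k in FASTQ_KEYS if not defined(k)]
    missing_aln = [k for k in ALN_KEYS if not defined(k)]
    raise ValueError(
        f"incomplete inputs: FASTQ entrypoint is missing {missing_fastq}, "
        f"BAM/CRAM entrypoint is missing {missing_aln}"
    )
```

For example, a dictionary that sets only input_aln and input_aln_idx is reported as the BAM/CRAM entrypoint, while one that sets r1_fastq without r2_fastq and read_groups raises an error listing the missing names.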

Workflow Outputs

The workflow produces variant calls in either VCF or gVCF format. Other outputs depend on the options selected.

Output | Description
calls_vcf | Variant calls in VCF or gVCF format
calls_vcf_tbi | Variant calls index

Metrics and reads files

Metrics and reads files are produced if run_dedup_and_qc is set to true.

aligned_reads | Aligned and duplicate-marked reads
aligned_index | Index for aligned_reads
dedup_metrics | File containing the metrics data from the deduplication stage
mq_metrics | File containing metrics quantifying the mean base quality score for each sequencing cycle
qd_metrics | File containing metrics quantifying the number of bases with a specific base quality score
gc_summary | Summary of the GC bias in the reference and the sample
gc_metrics | File containing metrics quantifying the GC bias in the reference and the sample
as_metrics | File containing statistics about the read alignment
is_metrics | File containing metrics quantifying the statistical distribution of insert sizes
mq_plot | Plot of the mq_metrics
qd_plot | Plot of the qd_metrics
gc_plot | Plot of the gc_metrics
is_plot | Plot of the is_metrics
bqsr_table | Sample recalibration table output by running BQSR. Base quality score recalibration runs if no custom dnascope_model is provided.
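
Which outputs to expect from a given run follows from the options described above, and can be sketched as a small lookup. This is an illustrative sketch only. The function name expected_outputs is hypothetical, and it assumes that bqsr_table depends solely on whether a custom dnascope_model was supplied. The documentation lists bqsr_table alongside the metrics files, so it may also require run_dedup_and_qc; verify this against an actual run.

```python
from typing import List, Optional

# Outputs produced on every run, per the output table.
ALWAYS = ["calls_vcf", "calls_vcf_tbi"]

# Outputs produced only when run_dedup_and_qc is true.
QC_OUTPUTS = [
    "aligned_reads", "aligned_index", "dedup_metrics",
    "mq_metrics", "qd_metrics", "gc_summary", "gc_metrics",
    "as_metrics", "is_metrics",
    "mq_plot", "qd_plot", "gc_plot", "is_plot",
]


def expected_outputs(
    run_dedup_and_qc: bool = True,          # documented default is true
    dnascope_model: Optional[str] = None,   # None means no custom model was given
) -> List[str]:
    """Return the output names a run is expected to produce (hypothetical helper)."""
    outputs = list(ALWAYS)
    if run_dedup_and_qc:
        outputs += QC_OUTPUTS
    if dnascope_model is None:
        # Assumption: BQSR runs, and bqsr_table is emitted, whenever no
        # custom model is provided, independent of run_dedup_and_qc.
        outputs.append("bqsr_table")
    return outputs
```

Comparing this list with the files actually present after a run is a quick way to spot a missing output.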

Containers

The latest version of the Sentieon Docker image can be run by following the instructions listed here.
