Sentieon Somatic Variant Calling

Compare tumor/normal samples and call somatic variants using Sentieon.

This workflow implements the Sentieon® Genomics software, a set of tools for accurate and computationally efficient analysis of genomic data. For both tumor and normal samples, the workflow performs read alignment, deduplication, and base quality score recalibration (BQSR). It then calls somatic variants, identifying candidate sites where the tumor genome differs from the normal genome, and calculates genotypes at those sites. Finally, the variants are filtered.

The workflow is designed for use with a variety of reference genomes, which are downloaded as part of workflow execution.

This workflow was developed by the Sentieon development team and is written in Workflow Description Language (WDL). Further documentation can be found here. TNseq®-specific documentation may be found here.

Sentieon Somatic Variant Calling workflow diagram

Workflow Inputs

Tumor sample reads and, optionally, normal sample reads are input. Each sample’s reads are aligned and corrected separately before being combined during somatic variant calling and filtering.

Input                   Description
r1_fastq                R1 fastq files for the tumor sample
r2_fastq                R2 fastq files for the tumor sample
read_groups             Read groups for the tumor sample
normal_r1_fastq         R1 fastq files for the normal sample
normal_r2_fastq         R2 fastq files for the normal sample
normal_read_groups      Read groups for the normal sample
reference_name          The name of the human reference genome build. (‘hg38_alt’, ‘hg38_gatk’, ‘hg38’, ‘hg38_noalt’, ‘hs38’, ‘b37_gatk’, ‘b37’, ‘hs37d5’, ‘hg19’, ‘ucsc_hg19’)
pon_vcf                 The panel of normals VCF file
pon_vcf_tbi             The panel of normals VCF index file
germline_vcf            The germline VCF file
germline_vcf_tbi        The germline VCF index file
contamination_vcf       The VCF file of germline sites for contamination detection
contamination_vcf_tbi   The VCF index file of germline sites for contamination detection
canonical_user_id       Your account’s AWS canonical user ID. Used to acquire a Sentieon license
sentieon_docker         Sentieon docker image
n_threads               Number of vCPUs to allocate for the task [32]
memory                  Memory to allocate for the task [64 GiB]
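
Before submitting a run, it can help to sanity-check the inputs. The following Python snippet is a minimal sketch, not part of the workflow itself. The function `check_inputs` and the example file names are hypothetical. It also assumes, without confirmation from the workflow source, that the R1, R2, and read group lists are parallel (same length), that at least one tumor R1 file is required, and that each VCF is supplied together with its index.

```python
# Reference builds listed in the reference_name description.
SUPPORTED_REFERENCES = {
    "hg38_alt", "hg38_gatk", "hg38", "hg38_noalt", "hs38",
    "b37_gatk", "b37", "hs37d5", "hg19", "ucsc_hg19",
}


def check_inputs(inputs: dict) -> list:
    """Return a list of human-readable problems found in an inputs dict."""
    problems = []

    if inputs.get("reference_name") not in SUPPORTED_REFERENCES:
        problems.append("reference_name is not one of the listed builds")

    # Assumption: the tumor sample is mandatory, the normal sample optional.
    if not inputs.get("r1_fastq"):
        problems.append("no tumor R1 fastq files given")

    # Assumption: R1, R2 and read group lists line up one-to-one.
    for prefix in ("", "normal_"):
        r1 = inputs.get(prefix + "r1_fastq", [])
        r2 = inputs.get(prefix + "r2_fastq", [])
        rgs = inputs.get(prefix + "read_groups", [])
        if not (len(r1) == len(r2) == len(rgs)):
            problems.append(prefix + "r1_fastq, r2_fastq and read_groups differ in length")

    # Assumption: a VCF and its index are provided together or not at all.
    for vcf in ("pon_vcf", "germline_vcf", "contamination_vcf"):
        if bool(inputs.get(vcf)) != bool(inputs.get(vcf + "_tbi")):
            problems.append(vcf + " and " + vcf + "_tbi must be given together")

    return problems


# Hypothetical tumor-only example; file names and read groups are placeholders.
example_inputs = {
    "r1_fastq": ["tumor_R1.fastq.gz"],
    "r2_fastq": ["tumor_R2.fastq.gz"],
    "read_groups": ["tumor_rg1"],
    "reference_name": "hg38",
    "pon_vcf": "pon.vcf.gz",
    "pon_vcf_tbi": "pon.vcf.gz.tbi",
}
print(check_inputs(example_inputs))
```

An empty list means none of the assumed checks failed. It does not guarantee that the workflow will accept the inputs.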

Workflow Outputs

The main output of the pipeline is a VCF containing somatic variant calls. Various metrics and plots are also produced.

Output          Description
calls_vcf       Somatic variant calls output by TNfilter
calls_vcf_tbi   Index for the calls_vcf

Metrics and reads files

Each output listed here is produced for the tumor sample. If any normal sample fastqs are provided (i.e. if length(normal_r1_fastq) > 0), each output is also produced for the normal sample, with its name prefixed with normal_.

aligned_reads   Aligned and duplicate marked reads
aligned_index   Index for aligned_reads
dedup_metrics   File containing the metrics data from the deduplication stage
qd_metrics      File containing metrics quantifying the number of bases with a specific base quality score
gc_summary      Summary of the GC bias in the reference and the sample
gc_metrics      File containing metrics quantifying the GC bias in the reference and the sample
as_metrics      File containing statistics about the read alignment
is_metrics      File containing metrics quantifying the statistical distribution of insert sizes
mq_plot         Plot of the mq_metrics
qd_plot         Plot of the qd_metrics
gc_plot         Plot of the gc_metrics
is_plot         Plot of the is_metrics
bqsr_table      Sample recalibration table output by running BQSR
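
The naming rule for normal sample outputs can be illustrated with a short Python sketch. The helper `expected_outputs` is hypothetical and only mirrors the rule as described on this page: the per-sample names come from the list above, and the `normal_`-prefixed copies appear only when `normal_r1_fastq` is non-empty.

```python
# Per-sample output names, taken from the list on this page.
PER_SAMPLE_OUTPUTS = [
    "aligned_reads", "aligned_index", "dedup_metrics", "qd_metrics",
    "gc_summary", "gc_metrics", "as_metrics", "is_metrics",
    "mq_plot", "qd_plot", "gc_plot", "is_plot", "bqsr_table",
]


def expected_outputs(normal_r1_fastq):
    """List the output names expected for a run (hypothetical helper)."""
    names = ["calls_vcf", "calls_vcf_tbi"] + list(PER_SAMPLE_OUTPUTS)
    # Normal sample outputs exist only if normal fastqs were provided.
    if len(normal_r1_fastq) > 0:
        names += ["normal_" + name for name in PER_SAMPLE_OUTPUTS]
    return names


tumor_only = expected_outputs([])
tumor_normal = expected_outputs(["normal_R1.fastq.gz"])  # placeholder file name
print(len(tumor_only), len(tumor_normal))
```

A list like this can be compared against the files of a finished run to spot missing outputs.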

Containers

The latest version of the Sentieon Docker image can be run by following the instructions listed here.
