DRAGEN – Variant Calling and Filtering

DRAGEN functional equivalence discovery and filtering of SNPs and indels.

This workflow uses the GATK HaplotypeCaller for SNP and indel discovery according to the GATK Best Practices. When run in DRAGEN mode, the workflow composes a DRAGstr model that is used during variant calling, and it performs hard filtering.

This workflow is maintained by the Broad Institute and is written in Workflow Description Language (WDL). Further documentation can be found here.

DRAGEN Variant Calling and Filtering workflow diagram

Workflow Inputs

The workflow requires sample and reference information. To choose whether the pipeline runs in DRAGEN functional equivalence mode, set the run_dragen_mode_variant_calling input.

The Broad Institute provides various test inputs hosted in GCP that can be used to run the pipeline.

Input | Description
calling_interval_list | Interval list used for variant calling
evaluation_interval_list | File containing the target set of genomic intervals
haplotype_scatter_count | Scatter count used for variant calling
break_bands_at_multiples_of | Breaks reference bands up at genomic positions that are multiples of this number; used to reduce (g)VCF file size
input_bam | Input BAM
input_bam_index | Input BAM index
ref_fasta | Reference FASTA
ref_fasta_index | Reference FASTA index
ref_dict | Reference dictionary
dbsnp_vcf | dbSNP VCF file
dbsnp_vcf_index | dbSNP VCF file index
base_file_name | String used for output files; can be set to a read group ID
final_vcf_base_name | Base name for the output (g)VCF file; can be set to a read group ID
agg_preemptible_tries | Number of preemptible machine tries
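To make the effect of break_bands_at_multiples_of concrete, the following Python sketch splits a single reference band at positions that are multiples of a given number. It is an illustration only, not the GATK implementation: the function name break_band is hypothetical, and the sketch assumes that a new band begins at every position divisible by the multiple and that a value of 0 leaves bands intact.

```python
from typing import List, Tuple


def break_band(start: int, end: int, multiple: int) -> List[Tuple[int, int]]:
    """Split the inclusive band [start, end] at multiples of `multiple`.

    Assumption (not taken from GATK source): a new band begins at every
    position divisible by `multiple`; a value <= 0 disables splitting.
    """
    if multiple <= 0:
        return [(start, end)]

    bands = []
    band_start = start
    # First multiple strictly greater than the band start.
    boundary = (start // multiple + 1) * multiple
    while boundary <= end:
        bands.append((band_start, boundary - 1))
        band_start = boundary
        boundary += multiple
    bands.append((band_start, end))
    return bands
```

Under these assumptions, a band covering positions 5 to 25 with a multiple of 10 is split into three bands: 5 to 9, 10 to 19, and 20 to 25.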

Optional inputs

Set run_dragen_mode_variant_calling to true and use_spanning_event_genotyping to false to run in DRAGEN functional equivalence mode.

Input | Description [default]
run_dragen_mode_variant_calling | Boolean used to indicate that DRAGEN mode should be used for variant calling; must be true to compose the DRAGstr model and perform variant calling with HaplotypeCaller in dragen-mode [false]
use_spanning_event_genotyping | Boolean used to call the HaplotypeCaller --disable-spanning-event-genotyping parameter; default set to true so that variant calling includes spanning events. Set to false to run the DRAGEN pipeline [true]
contamination | Optional estimated contamination for the variant calling tasks
ref_str | Optional STR location table used for DRAGstr model auto-calibration; see ComposeSTRTableFile for more information
make_gvcf | Boolean that indicates whether the workflow should produce a GVCF or a VCF file [true]
make_bamout | Boolean that indicates whether the workflow should sort and gather the BAM files into one file [false]
use_gatk3_haplotype_caller | Use the GATK 3.5 HaplotypeCaller [false]
skip_reblocking | Skip reblocking (read more in the Reblocking section) [false]
use_dragen_hard_filtering | Boolean that indicates whether the workflow should perform hard filtering using the GATK VariantFiltration tool with --filter-name "DRAGENHardQUAL" [false]
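WDL workflows are typically configured through a JSON inputs file whose keys are prefixed with the workflow name. The Python sketch below builds such a file with the two flags needed for DRAGEN functional equivalence mode and checks an inputs dictionary against the defaults listed above. The workflow name VariantCalling, the helper function names, and the gs:// paths are assumptions made for illustration; confirm the real workflow name in the WDL and replace the paths with your own data.

```python
import json

# Assumed workflow name used as the key prefix; verify it against the WDL.
WORKFLOW = "VariantCalling"


def dragen_mode_inputs(required: dict) -> dict:
    """Return workflow-prefixed inputs with DRAGEN-mode flags set."""
    dragen_flags = {
        "run_dragen_mode_variant_calling": True,
        "use_spanning_event_genotyping": False,
    }
    merged = {**required, **dragen_flags}  # DRAGEN flags take precedence
    return {f"{WORKFLOW}.{name}": value for name, value in merged.items()}


def is_dragen_functional_equivalence(inputs: dict) -> bool:
    """Check both flags, falling back to the documented defaults."""
    run = inputs.get(f"{WORKFLOW}.run_dragen_mode_variant_calling", False)
    spanning = inputs.get(f"{WORKFLOW}.use_spanning_event_genotyping", True)
    return run is True and spanning is False


if __name__ == "__main__":
    # Hypothetical placeholder values; remaining required inputs omitted.
    required = {
        "input_bam": "gs://my-bucket/sample.bam",
        "input_bam_index": "gs://my-bucket/sample.bam.bai",
        "base_file_name": "sample",
        "final_vcf_base_name": "sample",
    }
    print(json.dumps(dragen_mode_inputs(required), indent=2))
```

Run as a script, this prints a JSON object that can be saved as a starting point for an inputs file, once the remaining required inputs are filled in.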

Workflow Outputs

The key outputs of the workflow are variant calls in either VCF or gVCF format. Various metrics and associated files are also produced.

Output | Description
CollectVariantCallingMetrics | Outputs from calling variants with HaplotypeCaller.
vcf_summary_metrics | (g)VCF summary metrics
vcf_detail_metrics | (g)VCF detail metrics
output_vcf | Final (g)VCF
output_vcf_index | Index for the final (g)VCF
bamout | Optional corrected and merged BAM if make_bamout is true
bamout_index | Optional index for the corrected and merged BAM if make_bamout is true
MergeBamOuts | Output from sorting and merging the BAM files, then correcting the merged BAM file.
output_bam | Output aligned recalibrated BAM if provided_output_bam is true
output_bam_index | Optional index for the aligned recalibrated BAM if provided_output_bam is true

References

Reference data hosted in GCP may be found here.

Containers

Containers used by the pipeline are hosted in the Broad Institute’s public container registry and in the public BioContainers registry on quay.io.
