Whole Genome Variant Calling

Call small and structural variants using Oxford Nanopore data.

This workflow calls small and structural variants from Oxford Nanopore long reads. It merges alignments, computes alignment metrics, and calls variants. It can optionally call small variants within the mitochondrial genome.

This workflow is maintained by the Broad Institute and is written in the Workflow Description Language (WDL). Further documentation can be found here.

Whole Genome Variant Calling workflow diagram

Workflow Inputs

The workflow is run once per sample. Each sample may have multiple associated aligned BAMs.

| Input | Description |
| --- | --- |
| aligned_bams | Paths to the aligned BAM files |
| aligned_bais | Paths to the aligned BAM index files |
| participant_name | Name of the participant from whom these samples were obtained |
| ref_map_file | Table indicating the locations of the reference sequence and auxiliary files |
| gcs_out_root_dir | GCS bucket in which to store the output reads, variants, and metrics files |
| bams_suspected_to_contain_dup_record | Boolean parameter indicating whether the provided BAM files are suspected to contain duplicate records |
| ref_scatter_interval_list_locator | File holding paths to interval_list files; needed only when running DV-Pepper |
| ref_scatter_interval_list_ids | File that gives short IDs to the interval_list files; needed only when running DV-Pepper |
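
As a rough illustration of how the inputs above might be assembled for one sample, the following Python sketch builds an inputs JSON in the `<workflow>.<input>` key style used by WDL runners such as Cromwell. The workflow name `WholeGenomeVariantCalling`, the helper `build_inputs`, and all `gs://` paths are hypothetical placeholders, not values taken from the workflow itself. The check that every BAM has a matching index is likewise an assumption about how the two lists pair up.

```python
import json

# Hypothetical workflow name; replace it with the name declared in the WDL file.
WORKFLOW_NAME = "WholeGenomeVariantCalling"


def build_inputs(aligned_bams, aligned_bais, participant_name, ref_map_file,
                 gcs_out_root_dir, suspected_dups=False, workflow=WORKFLOW_NAME):
    """Return a dict of workflow inputs keyed as '<workflow>.<input>'."""
    # Assumption: each BAM has exactly one index, listed in the same order.
    if len(aligned_bams) != len(aligned_bais):
        raise ValueError("aligned_bams and aligned_bais must have the same length")
    values = {
        "aligned_bams": list(aligned_bams),
        "aligned_bais": list(aligned_bais),
        "participant_name": participant_name,
        "ref_map_file": ref_map_file,
        "gcs_out_root_dir": gcs_out_root_dir,
        "bams_suspected_to_contain_dup_record": suspected_dups,
    }
    return {f"{workflow}.{key}": value for key, value in values.items()}


if __name__ == "__main__":
    # All paths below are made-up examples.
    inputs = build_inputs(
        aligned_bams=["gs://example-bucket/s1.run1.bam", "gs://example-bucket/s1.run2.bam"],
        aligned_bais=["gs://example-bucket/s1.run1.bam.bai", "gs://example-bucket/s1.run2.bam.bai"],
        participant_name="participant_001",
        ref_map_file="gs://example-bucket/ref_map.tsv",
        gcs_out_root_dir="gs://example-bucket/results",
    )
    print(json.dumps(inputs, indent=2))
```

The two `ref_scatter_interval_list_*` inputs are left out of this sketch because they are needed only when running DV-Pepper.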

Workflow Outputs

The workflow outputs alignment statistics and the variant calls produced by several small and structural variant callers.

| Output | Description |
| --- | --- |
| merged_bam | Merged BAM file composed of the input `aligned_bams` |
| merged_bai | Index of the merged BAM file |
| aligned_num_reads | Number of aligned reads |
| aligned_num_bases | Number of aligned bases |
| aligned_frac_bases | Fraction of bases that are aligned |
| aligned_est_fold_cov | Estimated fold coverage of the aligned reads |
| aligned_read_length_mean | Mean aligned read length |
| aligned_read_length_median | Median aligned read length |
| aligned_read_length_stdev | Standard deviation of the aligned read length |
| aligned_read_length_N50 | N50 of the aligned read length |
| average_identity | Average identity value obtained from NanoPlot |
| median_identity | Median identity value obtained from NanoPlot |
| pbsv_vcf, pbsv_tbi | VCF file and index output by the PacBio structural variant caller (PBSV) |
| sniffles_vcf, sniffles_tbi | VCF file and index output by the Sniffles structural variant caller |
| clair_vcf, clair_tbi | VCF file and index output by the Clair deep-neural-network-based variant caller |
| clair_gvcf, clair_gtbi | gVCF file and index output by Clair |
| dvp_vcf, dvp_tbi | VCF file and index output by DeepVariant Pepper (DVPepper) |
| dvp_g_vcf, dvp_g_tbi | gVCF file and index output by DVPepper |
| dvp_phased_vcf, dvp_phased_tbi | Phased VCF file and index output by DVPepper |
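
To make two of the alignment metrics above more concrete, here is a minimal Python sketch of the usual definitions of read length N50 and fold coverage. It is not the workflow's own implementation, which may compute these values differently. The function names and the genome size used in the example are assumptions chosen for illustration.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the largest length L such that reads of length >= L
    together contain at least half of all bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def estimated_fold_coverage(aligned_bases, genome_size):
    """Return aligned bases divided by the genome size (assumed definition)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return aligned_bases / genome_size


if __name__ == "__main__":
    # Toy read lengths: 20 bases in total, and the reads of length >= 5 hold 11 of them.
    print(read_length_n50([2, 3, 4, 5, 6]))
    # Hypothetical numbers: 93 Gb aligned against a 3.1 Gb genome.
    print(estimated_fold_coverage(93_000_000_000, 3_100_000_000))
```

Unlike the mean or median, the N50 weights each read by the bases it contributes, which is why it is commonly reported alongside them for long-read data.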

Containers

Containers used by the pipeline are hosted in the Broad Institute’s public container registry and in the public BioContainers registry on quay.io.
