Examples

Chris edited this page Aug 5, 2019 · 11 revisions

Note_1: In all of the following examples, you must replace the value of each option with your own value(s); the commands are illustrations only and cannot be run as is.
Note_2: Example_1 is the core command line that users should use in most cases; the other examples are for more advanced use, or for users who want to run the VCF preparation and the merging steps on their own.

Example_1

This example is the core example and the easiest way to merge VCFs using vcfMerger2.

vcfMerger2.py \
-d DirTempFiles \
-g hs37d5.fa \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "./raw_tool_vcfs/strelka2.raw.vcf|./raw_tool_vcfs/mutect2.raw.vcf|./raw_tool_vcfs/lancet.raw.vcf|./raw_tool_vcfs/octopus.raw.vcf" \
--prep-outfilenames "strelka2.prepped.vcf|mutect2.prepped.vcf|lancet.prepped.vcf|octopus.prepped.vcf" \
--normal-sname NORMAL_SAMPLENAME \
--tumor-sname TUMOR_SAMPLENAME \
-o merged.vcf

WARNING regarding Example_1:

  • Only RAW VCFs can be used with this command. You CANNOT use vcfMerger2-ready VCFs, or a mix of RAW and vcfMerger2-ready VCFs; if you do, vcfMerger2 will fail.
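
The pipe-delimited options in Example_1 line up by position: the first tool name goes with the first VCF and the first prepped output name. The following is a minimal Python sketch of assembling that command, assuming vcfMerger2 expects the same number of entries in each list (this page does not state that explicitly). `build_merge_command` is a hypothetical helper, not part of vcfMerger2.

```python
import shlex

def build_merge_command(tools, raw_vcfs, prepped_names, ref_fasta,
                        normal, tumor, tmp_dir, out_vcf):
    """Assemble the Example_1 command line (hypothetical helper)."""
    # Assumption: the three pipe-delimited lists need one entry per tool.
    if not (len(tools) == len(raw_vcfs) == len(prepped_names)):
        raise ValueError(
            "toolnames, vcfs and prep-outfilenames must have the same length")
    args = [
        "vcfMerger2.py",
        "-d", tmp_dir,
        "-g", ref_fasta,
        "--toolnames", "|".join(tools),
        "--vcfs", "|".join(raw_vcfs),
        "--prep-outfilenames", "|".join(prepped_names),
        "--normal-sname", normal,
        "--tumor-sname", tumor,
        "-o", out_vcf,
    ]
    # shlex.join quotes values containing "|" so the shell does not
    # interpret them as pipes.
    return shlex.join(args)

tools = ["strelka2", "mutect2"]
cmd = build_merge_command(
    tools,
    [f"./raw_tool_vcfs/{t}.raw.vcf" for t in tools],
    [f"{t}.prepped.vcf" for t in tools],
    "hs37d5.fa", "NORMAL_SAMPLENAME", "TUMOR_SAMPLENAME",
    "DirTempFiles", "merged.vcf",
)
print(cmd)
```

Building the lists from a single `tools` sequence keeps the three options in the same order, which is the easiest way to avoid pairing a tool name with the wrong VCF.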




The following commands illustrate more advanced uses of vcfMerger2.

Example_2
vcfMerger2.py \
-d ${DIROUT} \
-g ${REF_GENOME_FASTA_FILE} \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "strelka2.vcfMerger2-prepped.vcf|mutect2.vcfMerger2-prepped.vcf|lancet.vcfMerger2-prepped.vcf|octopus.vcfMerger2-prepped.vcf" \
--normal-sname NORMAL_SAMPLENAME \
--tumor-sname TUMOR_SAMPLENAME \
-o merged.vcf \
--skip-prep-vcfs

Use this example when you already have vcfMerger2-prepped VCFs to merge and therefore want to skip vcfMerger2's built-in prep stage; hence the option --skip-prep-vcfs.


The most important point in the Example_2 command above is the type of input VCF that must be used. "vcfMerger2-prepped" (also known as "vcfMerger2-prepped-ready" or "vcfMerger2-ready") is the key term here (see Glossary). It means that all input VCFs must already meet the vcfMerger2 specifications before merging.
This implies:

  • VCFs must already conform to the VCF specification version 4.2 or later
  • VCFs must be decomposed and have only one variant per line; multi-allelic variants must be decomposed
  • VCFs must contain the contig information in the header (ideally the same list of contigs in each VCF)
  • VCFs must have the following fields/flags present in the FORMAT column:
    • GT     (GenoType must be of format 0/0, 0/1, or 1/0 as we have decomposed the RAW VCF)
    • DP     (Total DePth represented by 1 integer value)
    • AD     (Allele Depth for reference REF and first ALT allele)
    • AR     (Allele Ratio of the first ALT allele)
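
The requirements above can be checked roughly before merging. The following is a minimal Python sketch, not vcfMerger2's own validation: `check_prepped_vcf` is a hypothetical helper that works on plain text lines, only checks presence of the FORMAT keys (not their values), and treats a comma in ALT as a sign that a record is still multi-allelic.

```python
REQUIRED_FORMAT_KEYS = ("GT", "DP", "AD", "AR")

def check_prepped_vcf(lines):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    version = None
    has_contig = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##fileformat=VCFv"):
            parts = line.split("VCFv", 1)[1].split(".")
            version = tuple(int(p) for p in parts[:2])
        elif line.startswith("##contig="):
            has_contig = True
        elif line and not line.startswith("#"):
            # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT samples...
            fields = line.split("\t")
            where = f"{fields[0]}:{fields[1]}"
            if len(fields) < 10:
                problems.append(f"{where}: missing FORMAT or sample columns")
                continue
            alt, format_keys = fields[4], fields[8].split(":")
            if "," in alt:
                problems.append(f"{where}: multi-allelic ALT '{alt}' is not decomposed")
            missing = [k for k in REQUIRED_FORMAT_KEYS if k not in format_keys]
            if missing:
                problems.append(f"{where}: FORMAT lacks {','.join(missing)}")
    if version is None or version < (4, 2):
        problems.append("VCF version must be 4.2 or later")
    if not has_contig:
        problems.append("no ##contig lines in the header")
    return problems

good = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=249250621>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "1\t100\t.\tA\tT\t.\tPASS\t.\tGT:DP:AD:AR\t0/0:30:30,0:0.0\t0/1:40:30,10:0.25",
]
bad = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR",
    "1\t100\t.\tA\tT,G\t.\tPASS\t.\tGT:DP\t0/0:30\t0/1:40",
]
print(check_prepped_vcf(good))
for problem in check_prepped_vcf(bad):
    print(problem)
```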

How to prepare the VCF files only, using vcfMerger2 (that is, no merging)?

Users can prepare tool-specific VCFs by running only the vcfMerger2 sub-step called prep_vcf. The script prep_vcf.sh brings VCFs from the supported tools up to vcfMerger2 specifications. It can be called directly and run independently for each VCF and tool.

Example_3:
prep_vcf.sh \
-d DIR_PREPPED_VCFs \
--toolname lancet \
--normal-sname MySampleNameForNormal \
--tumor-sname sampleNameForTumorSample \
--vcf ./raw_tool_vcfs/lancet.raw.vcf \
-g ./ref_genome/grch37.22.fa \
-o lancet.prepped.vcf
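
Because prep_vcf.sh runs once per VCF and tool, preparing several tools means repeating the Example_3 command with different values. The following is a minimal Python sketch that generates one command per tool and the pipe-delimited --vcfs value to use later with --skip-prep-vcfs (as in Example_2). It assumes the raw files are named `<tool>.raw.vcf` and that prepped files land in the directory given with -d; both are assumptions, and `prep_then_merge_inputs` is a hypothetical helper.

```python
import shlex

def prep_then_merge_inputs(tools, raw_dir, out_dir, ref_fasta, normal, tumor):
    """Return (prep command lines, pipe-delimited list of prepped VCFs)."""
    prep_cmds, prepped = [], []
    for tool in tools:
        out_name = f"{tool}.prepped.vcf"
        prep_cmds.append(shlex.join([
            "prep_vcf.sh",
            "-d", out_dir,
            "--toolname", tool,
            "--normal-sname", normal,
            "--tumor-sname", tumor,
            "--vcf", f"{raw_dir}/{tool}.raw.vcf",  # assumed naming scheme
            "-g", ref_fasta,
            "-o", out_name,
        ]))
        # Assumption: the prepped file is written under the -d directory.
        prepped.append(f"{out_dir}/{out_name}")
    return prep_cmds, "|".join(prepped)

prep_cmds, merge_vcfs = prep_then_merge_inputs(
    ["strelka2", "mutect2", "lancet", "octopus"],
    "./raw_tool_vcfs", "DIR_PREPPED_VCFs", "./ref_genome/grch37.22.fa",
    "NORMAL", "TUMOR",
)
for prep_cmd in prep_cmds:
    print(prep_cmd)
print(merge_vcfs)
```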
Example_4:
prep_vcf.sh \
--toolname lancet \
--normal-sname NORMAL \
--tumor-sname TUMOR \
--vcf raw_tool_vcfs/lancet.raw.vcf \
-g ref_genome/grch37.22.fa \
-o lancet.prepped.vcf \
--contigs-file ./contigs/contigs/txt \
--bam ${TUMOR_BAM_FILE}

Note: --contigs-file allows you to add contigs to the VCF if they are missing from the header of the RAW VCF. Recent variant callers normally already add the correct contig lines to the VCF header.
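
Whether --contigs-file is needed can be decided by looking at the header of the RAW VCF. The following is a minimal Python sketch, assuming the option is only useful when the header has no `##contig=` lines; `has_contig_lines` and `extra_prep_args` are hypothetical helpers, and the format of the contigs file itself is not covered here.

```python
def has_contig_lines(vcf_lines):
    """Return True if the VCF header contains at least one ##contig line."""
    for line in vcf_lines:
        if line.startswith("##contig="):
            return True
        if not line.startswith("##"):
            break  # meta-information lines are over, stop scanning
    return False

def extra_prep_args(vcf_lines, contigs_file):
    """Return the extra prep_vcf.sh arguments needed for this VCF, if any."""
    if has_contig_lines(vcf_lines):
        return []
    return ["--contigs-file", contigs_file]

header_without_contigs = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT"]
header_with_contigs = ["##fileformat=VCFv4.2", "##contig=<ID=1>", "#CHROM\tPOS"]
print(extra_prep_args(header_without_contigs, "contigs.txt"))
print(extra_prep_args(header_with_contigs, "contigs.txt"))
```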

Example_5 (ALPHA VERSION: germline calls instead of somatic; adding acronyms to reduce file size):

vcfMerger2.py -g ${REF_GENOME} --toolnames "haplotypecaller|freebayes|samtools" --vcfs "testFile_HC.100000lines.vcf|testFile_FB.100000lines.vcf|testFile_ST.100000lines.vcf" --prep-outfilenames "HC_prep.vcf|FB_prep.vcf|ST_prep.vcf" --germline --germline-snames "HAPI_0001_000001_OV_Whole_T1_TSWGS_A28333" -o "merged_germline_calls_3tools.vcf" -a "HC|FB|ST"

Example_6 (using filtering options [here both filtering options are being used]):
vcfMerger2.py \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "strelka2.somatic.snvs_indels.vcf|mutect2.somatic.snvs_indels.FiltMutCallsTool.vcf|lancet.commpressed.somatic.snvs_indels.vcf.gz|octopus.legacy.vcf" \
--normal-sname "COLO829_C2" \
--tumor-sname "COLO829_T1" \
-g ${REF_GENOME_FASTA_FILE} \
-o colo829.merged.vcf \
--path-jar-snpsift /tools/snpEff/4.3/snpEff/SnpSift.jar \
--filter-by-pass \
--prep-outfilenames "SLK.prep.vcf|MUT.prepped.vcf|LAN.prepped.vcf|OCT_prepped.vcf" \
--filter "( GEN[0].DP>=10 && GEN[1].DP>=10 ) && ( GEN[0].AR<=0.02 && GEN[1].AR >= 0.05 )" \
--do-venn
Example_7 (same as Example_6, but each VCF is filtered independently using different fields or flags):
vcfMerger2.py \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "strelka2.somatic.snvs_indels.vcf|mutect2.somatic.snvs_indels.FiltMutCallsTool.vcf|lancet.commpressed.somatic.snvs_indels.vcf.gz|octopus.legacy.vcf" \
--normal-sname "COLO829_C2" \
--tumor-sname "COLO829_T1" \
-g ${REF_GENOME_FASTA_FILE} \
-o colo829.merged.vcf \
--path-jar-snpsift /tools/snpEff/4.3/snpEff/SnpSift.jar \
--filter-by-pass \
--prep-outfilenames "SLK.prep.vcf|MUT.prepped.vcf|LAN.prepped.vcf|OCT_prepped.vcf" \
--filter "( GEN[0].DP>=10 && QSS>0.5 )|(GEN[0].DP>=10 && MUT>2 )|( GEN[0].AR<=0.02 && GEN[1].AR>=0.5 )|(GEN[1].AR >= 0.05 )" \
--do-venn

Note: Unlike Example_6, in Example_7 --filter is a pipe-delimited list with one filter per tool. In Example_6, the same filter is applied to ALL the VCFs, which requires that ALL VCFs have the same fields and/or flags.
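
The difference between the two --filter forms can be sketched in Python as follows. This is an illustration of the rule in the note, not vcfMerger2's actual parsing: it assumes the value is split on "|" (so a single filter expression cannot itself contain a pipe) and that a multi-part value must have exactly one part per tool. `filters_per_tool` is a hypothetical helper.

```python
def filters_per_tool(tools, filter_arg):
    """Map each tool name to the filter expression that would apply to it."""
    parts = [part.strip() for part in filter_arg.split("|")]
    if len(parts) == 1:
        # One expression: the same filter is applied to every VCF (Example_6).
        return {tool: parts[0] for tool in tools}
    if len(parts) != len(tools):
        raise ValueError("expected one filter per tool or a single shared filter")
    # One expression per tool, matched by position (Example_7).
    return dict(zip(tools, parts))

tools = ["strelka2", "mutect2", "lancet", "octopus"]
shared = filters_per_tool(tools, "GEN[1].AR >= 0.05")
per_tool = filters_per_tool(
    tools,
    "( GEN[0].DP>=10 && QSS>0.5 )|(GEN[0].DP>=10 && MUT>2 )"
    "|( GEN[0].AR<=0.02 && GEN[1].AR>=0.5 )|(GEN[1].AR >= 0.05 )",
)
for tool in tools:
    print(tool, "->", per_tool[tool])
```

With a shared filter, every field it names has to exist in every VCF; with per-tool filters, each expression only needs the fields of its own tool's VCF.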


MORE EXAMPLES COMING SOON

