Examples
Note_1: In all the following examples, you must replace the value of each option with your own value(s); these are only examples and are not meant to be run as-is.
Note_2: Example_1 is the MAIN CORE command line that users should use in most cases; the other examples are for more advanced use, or for users who want to run the VCF preparation and merging steps on their own.
Example_1:
This example represents the CORE use case and the easiest way to merge VCFs with vcfMerger2.
vcfMerger2.py \
-d DirTempFiles \
-g hs37d5.fa \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "./raw_tool_vcfs/strelka2.raw.vcf|./raw_tool_vcfs/mutect2.raw.vcf|./raw_tool_vcfs/lancet.raw.vcf|./raw_tool_vcfs/octopus.raw.vcf" \
--prep-outfilenames "strelka2.prepped.vcf|mutect2.prepped.vcf|lancet.prepped.vcf|octopus.prepped.vcf" \
--normal-sname NORMAL_SAMPLENAME \
--tumor-sname TUMOR_SAMPLENAME \
-o merged.vcf
WARNING regarding Example_1:
- Only RAW VCFs can be used with this command; you CANNOT use vcfMerger2-Ready VCFs, or a mix of RAW and vcfMerger2-Ready VCFs. If you do, vcfMerger2 will fail.
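Example_1 passes one pipe-delimited list per option (--toolnames, --vcfs, --prep-outfilenames), with one entry per tool in the same order. The sketch below is a hypothetical pre-flight check, not part of vcfMerger2, that catches a missing or extra entry before launching the merge; the function names are illustrative only.

```python
# Hypothetical helper (not part of vcfMerger2): checks that every pipe-delimited
# per-tool option has exactly one entry per tool listed in --toolnames.


def split_pipe_list(value: str) -> list:
    """Split a pipe-delimited option value such as "strelka2|mutect2"."""
    return [item.strip() for item in value.split("|")]


def check_per_tool_lists(toolnames: str, **per_tool_options: str) -> dict:
    """Return {tool: [value for each option]} or raise ValueError on a length mismatch."""
    tools = split_pipe_list(toolnames)
    columns = {}
    for option, value in per_tool_options.items():
        items = split_pipe_list(value)
        if len(items) != len(tools):
            raise ValueError(
                f"--{option.replace('_', '-')} has {len(items)} entries, "
                f"expected {len(tools)} (one per tool)"
            )
        columns[option] = items
    # Regroup column-wise lists into one row per tool, preserving option order.
    return {tool: [columns[opt][i] for opt in columns] for i, tool in enumerate(tools)}


mapping = check_per_tool_lists(
    "strelka2|mutect2|lancet|octopus",
    vcfs="./raw_tool_vcfs/strelka2.raw.vcf|./raw_tool_vcfs/mutect2.raw.vcf|"
         "./raw_tool_vcfs/lancet.raw.vcf|./raw_tool_vcfs/octopus.raw.vcf",
    prep_outfilenames="strelka2.prepped.vcf|mutect2.prepped.vcf|"
                      "lancet.prepped.vcf|octopus.prepped.vcf",
)
for tool, (vcf, prepped) in mapping.items():
    print(f"{tool}: {vcf} -> {prepped}")
```

Keeping the check separate from the command line means a typo in a long pipe-delimited list fails fast, rather than after the prep stage has already started.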
Example_2:
vcfMerger2.py \
-d ${DIROUT} \
-g ${REF_GENOME_FASTA_FILE} \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "strelka2.vcfMerger2-prepped.vcf|mutect2.vcfMerger2-prepped.vcf|lancet.vcfMerger2-prepped.vcf|octopus.vcfMerger2-prepped.vcf" \
--normal-sname NORMAL_SAMPLENAME \
--tumor-sname TUMOR_SAMPLENAME \
-o merged.vcf \
--skip-prep-vcfs
This example is for the case where you already have vcfMerger2-prepped VCFs to merge and therefore want to skip the prep stage embedded in vcfMerger2; hence the use of the option --skip-prep-vcfs.
The most important point in the Example_2 command above is the type of input VCF that must be used.
"vcfMerger2-prepped" (aka "vcfMerger2-prepped-ready" or "vcfMerger2-ready") is the most important term here (see Glossary).
This means all the input VCFs must already meet the vcfMerger2 specifications before merging.
This implies:
- VCFs must already conform to the VCF specification version 4.2 or later
- VCFs must be decomposed and have only one variant per line; multi-allelic variants must be decomposed
- VCFs must contain the contig information in the header (ideally the same list of contigs in each VCF)
- VCFs must have the following fields present in the FORMAT column:
  - GT (GenoType; must be of the form 0/0, 0/1, or 1/0, since the RAW VCF has been decomposed)
  - DP (total DePth, represented by a single integer value)
  - AD (Allele Depth for the reference (REF) allele and the first ALT allele)
  - AR (Allele Ratio of the first ALT allele)
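The requirements listed above can be partly checked from the text of a VCF before trying --skip-prep-vcfs. The following is a minimal, hypothetical sketch (not part of vcfMerger2) that flags a missing or pre-4.2 ##fileformat line, missing ##contig lines, multi-allelic ALT values, and FORMAT columns lacking GT, DP, AD or AR. It does not validate the values themselves (for example the GT encoding or the number of AD entries), and it assumes an uncompressed VCF.

```python
import re

# Hypothetical pre-flight check (not part of vcfMerger2).
REQUIRED_FORMAT_KEYS = ("GT", "DP", "AD", "AR")


def check_vcfmerger2_ready(lines):
    """Return a list of problems; an empty list means none were detected."""
    problems = []
    has_contig = False
    fileformat_ok = False
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##fileformat="):
            match = re.match(r"##fileformat=VCFv(\d+)\.(\d+)", line)
            fileformat_ok = bool(match) and (int(match.group(1)), int(match.group(2))) >= (4, 2)
        elif line.startswith("##contig="):
            has_contig = True
        elif not line or line.startswith("#"):
            continue  # other meta lines, the #CHROM line, or blank lines
        else:
            fields = line.split("\t")
            if len(fields) < 10:
                problems.append(f"line {lineno}: no FORMAT/sample columns")
                continue
            if "," in fields[4]:  # ALT column holds several alleles
                problems.append(f"line {lineno}: multi-allelic ALT {fields[4]!r}, decompose first")
            missing = [k for k in REQUIRED_FORMAT_KEYS if k not in fields[8].split(":")]
            if missing:
                problems.append(f"line {lineno}: FORMAT lacks {','.join(missing)}")
    if not has_contig:
        problems.append("header: no ##contig lines")
    if not fileformat_ok:
        problems.append("header: ##fileformat missing or older than VCFv4.2")
    return problems


example = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=22,length=51304566>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n",
    "22\t100\t.\tA\tT\t.\tPASS\t.\tGT:DP:AD:AR\t0/0:30:30,0:0.0\t0/1:28:14,14:0.5\n",
    "22\t200\t.\tC\tG,T\t.\tPASS\t.\tGT:DP\t0/0:30\t1/2:25\n",
]
for problem in check_vcfmerger2_ready(example):
    print(problem)
```

With a real file, pass the open handle instead of a list, e.g. `with open("strelka2.vcfMerger2-prepped.vcf") as fh: check_vcfmerger2_ready(fh)`.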
Users can prepare tool-specific VCFs by running only the vcfMerger2 sub-step called prep_vcf.
The script prep_vcf.sh brings VCFs from the supported tools up to vcfMerger2 specifications.
It can be called directly and run independently for each VCF and tool.
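Because prep_vcf.sh runs once per VCF and tool, preparing several tools is easy to script. The sketch below builds one command per tool using only the options shown in the prep_vcf.sh commands that follow (-d, --toolname, --normal-sname, --tumor-sname, --vcf, -g, -o). The file paths, sample names and the helper name are placeholders, and the actual call is left commented out so the snippet only prints the commands.

```python
import shlex
import subprocess  # used only if the run line below is uncommented


def build_prep_command(toolname, vcf, outfile, ref_fasta, normal, tumor, outdir):
    """Hypothetical helper: assemble one prep_vcf.sh invocation as an argument list."""
    return [
        "prep_vcf.sh",
        "-d", outdir,
        "--toolname", toolname,
        "--normal-sname", normal,
        "--tumor-sname", tumor,
        "--vcf", vcf,
        "-g", ref_fasta,
        "-o", outfile,
    ]


# Placeholder tool -> RAW VCF mapping; replace with your own files.
RAW_VCFS = {
    "strelka2": "./raw_tool_vcfs/strelka2.raw.vcf",
    "mutect2": "./raw_tool_vcfs/mutect2.raw.vcf",
    "lancet": "./raw_tool_vcfs/lancet.raw.vcf",
    "octopus": "./raw_tool_vcfs/octopus.raw.vcf",
}

for tool, raw_vcf in RAW_VCFS.items():
    cmd = build_prep_command(
        tool, raw_vcf, f"{tool}.prepped.vcf", "./ref_genome/grch37.22.fa",
        "NORMAL", "TUMOR", "DIR_PREPPED_VCFs",
    )
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run each prep step
```

Passing an argument list (rather than one shell string) to subprocess avoids quoting problems with paths that contain spaces.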
Example_3:
prep_vcf.sh \
-d DIR_PREPPED_VCFs \
--toolname lancet \
--normal-sname MySampleNameForNormal \
--tumor-sname sampleNameForTumorSample \
--vcf ./raw_tool_vcfs/lancet.raw.vcf \
-g ./ref_genome/grch37.22.fa \
-o lancet.prepped.vcf
Example_4:
prep_vcf.sh \
--toolname lancet \
--normal-sname NORMAL \
--tumor-sname TUMOR \
--vcf raw_tool_vcfs/lancet.raw.vcf \
-g ref_genome/grch37.22.fa \
-o lancet.prepped.vcf \
--contigs-file ./contigs/contigs.txt \
--bam ${TUMOR_BAM_FILE}
Note: --contigs-file lets you add contig lines to the VCF header when they are missing from the header of the RAW VCF.
Most recent variant callers already write the correct contig lines into the VCF header.
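To decide whether --contigs-file is needed for a given RAW VCF, you can list the chromosomes used by its records that have no matching ##contig header line. The following is a minimal, hypothetical sketch (not part of vcfMerger2); it reads lines from an uncompressed VCF (for a .vcf.gz, open it with `gzip.open(path, "rt")` instead).

```python
import re


def missing_contigs(lines):
    """Hypothetical helper: CHROM values used by records but not declared in ##contig lines."""
    declared, used = set(), set()
    for line in lines:
        if line.startswith("##contig=<"):
            match = re.search(r"ID=([^,>]+)", line)  # contig name inside <ID=...,...>
            if match:
                declared.add(match.group(1))
        elif line.startswith("#"):
            continue  # other meta lines and the #CHROM line
        elif line.strip():
            used.add(line.split("\t", 1)[0])  # first column is CHROM
    return sorted(used - declared)


# Usage with a real file (path is a placeholder):
# with open("raw_tool_vcfs/lancet.raw.vcf") as fh:
#     if missing_contigs(fh):
#         print("header lacks contig lines; consider --contigs-file")
demo = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=22,length=51304566>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "22\t100\t.\tA\tT\n",
    "X\t500\t.\tG\tC\n",
]
print(missing_contigs(demo))
```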
Example_5:
vcfMerger2.py \
-g ${REF_GENOME} \
--toolnames "haplotypecaller|freebayes|samtools" \
--vcfs "testFile_HC.100000lines.vcf|testFile_FB.100000lines.vcf|testFile_ST.100000lines.vcf" \
--prep-outfilenames "HC_prep.vcf|FB_prep.vcf|ST_prep.vcf" \
--germline \
--germline-snames "HAPI_0001_000001_OV_Whole_T1_TSWGS_A28333" \
-o "merged_germline_calls_3tools.vcf" \
-a "HC|FB|ST"
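The germline command above passes a sample name through --germline-snames. Before choosing that name, it can help to see which sample column(s) each RAW VCF actually contains. The hypothetical helper below (not part of vcfMerger2) reads the sample names from the #CHROM header line; whether vcfMerger2 requires the name to match that column exactly is an assumption this sketch does not settle.

```python
def sample_names(lines):
    """Hypothetical helper: return the sample column names from a VCF #CHROM line."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT);
            # any remaining columns are samples.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data records without finding the #CHROM line
    raise ValueError("no #CHROM header line found")


# Usage with a real file (path is a placeholder):
# with open("testFile_HC.100000lines.vcf") as fh:
#     print(sample_names(fh))
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\n",
]
print(sample_names(header))
```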
Example_6:
vcfMerger2.py \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "strelka2.somatic.snvs_indels.vcf|mutect2.somatic.snvs_indels.FiltMutCallsTool.vcf|lancet.commpressed.somatic.snvs_indels.vcf.gz|octopus.legacy.vcf" \
--normal-sname "COLO829_C2" \
--tumor-sname "COLO829_T1" \
-g ${REF_GENOME_FASTA_FILE} \
-o colo829.merged.vcf \
--path-jar-snpsift /tools/snpEff/4.3/snpEff/SnpSift.jar \
--filter-by-pass \
--prep-outfilenames "SLK.prep.vcf|MUT.prepped.vcf|LAN.prepped.vcf|OCT_prepped.vcf" \
--filter "( GEN[0].DP>=10 && GEN[1].DP>=10 ) && ( GEN[0].AR<=0.02 && GEN[1].AR >= 0.05 )" \
--do-venn
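The --filter expression above is evaluated per record by SnpSift. To make its intent concrete, the sketch below restates it in plain Python, assuming GEN[0] is the normal sample and GEN[1] the tumor sample (as the AR thresholds suggest) and reading the DP clause as a minimum depth of 10 in both samples. It is an illustration only; it does not reproduce SnpSift's parser or the separate --filter-by-pass step, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleCall:
    dp: int    # FORMAT/DP: total depth
    ar: float  # FORMAT/AR: allele ratio of the first ALT allele


def parse_sample(format_field: str, sample_field: str) -> SampleCall:
    """Pair FORMAT keys (e.g. GT:DP:AD:AR) with one sample's values."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    return SampleCall(dp=int(values["DP"]), ar=float(values["AR"]))


def passes_somatic_filter(normal: SampleCall, tumor: SampleCall,
                          min_dp: int = 10, max_normal_ar: float = 0.02,
                          min_tumor_ar: float = 0.05) -> bool:
    """Depth in both samples, near-zero ALT ratio in normal, ALT evidence in tumor."""
    depth_ok = normal.dp >= min_dp and tumor.dp >= min_dp
    ratio_ok = normal.ar <= max_normal_ar and tumor.ar >= min_tumor_ar
    return depth_ok and ratio_ok


record = "22\t100\t.\tA\tT\t.\tPASS\t.\tGT:DP:AD:AR\t0/0:30:30,0:0.0\t0/1:28:14,14:0.5"
cols = record.split("\t")
normal_call = parse_sample(cols[8], cols[9])   # GEN[0]
tumor_call = parse_sample(cols[8], cols[10])   # GEN[1]
print(passes_somatic_filter(normal_call, tumor_call))
```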
Example_7 (same as Example_6, but each VCF is filtered independently, using different fields or flags):
vcfMerger2.py \
--toolnames "strelka2|mutect2|lancet|octopus" \
--vcfs "strelka2.somatic.snvs_indels.vcf|mutect2.somatic.snvs_indels.FiltMutCallsTool.vcf|lancet.commpressed.somatic.snvs_indels.vcf.gz|octopus.legacy.vcf" \
--normal-sname "COLO829_C2" \
--tumor-sname "COLO829_T1" \
-g ${REF_GENOME_FASTA_FILE} \
-o colo829.merged.vcf \
--path-jar-snpsift /tools/snpEff/4.3/snpEff/SnpSift.jar \
--filter-by-pass \
--prep-outfilenames "SLK.prep.vcf|MUT.prepped.vcf|LAN.prepped.vcf|OCT_prepped.vcf" \
--filter "( GEN[0].DP>=10 && QSS>0.5 )|(GEN[0].DP>=10 && MUT>2 )|( GEN[0].AR<=0.02 && GEN[1].AR>=0.5 )|(GEN[1].AR >= 0.05 )" \
--do-venn
Note: Compared to Example_6, in Example_7 --filter is a pipe-delimited list with one filter per tool.
In Example_6, the same filter is applied to ALL the VCFs, which requires that ALL the VCFs have the same fields and/or flags.
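The difference between Example_6 and Example_7 comes down to how the --filter value lines up with --toolnames: one shared expression, or one expression per tool in the same order. The hypothetical helper below (not part of vcfMerger2) shows that pairing; it uses a naive split on "|", so it assumes no individual expression itself contains a "|" character.

```python
def per_tool_filters(toolnames: str, filters: str) -> dict:
    """Hypothetical helper: map each tool to the --filter expression applied to its VCF."""
    tools = [t.strip() for t in toolnames.split("|")]
    # Naive split: assumes no single "|" appears inside an individual expression.
    exprs = [f.strip() for f in filters.split("|")]
    if len(exprs) == 1:
        # Example_6 style: the same expression is applied to every VCF.
        return {tool: exprs[0] for tool in tools}
    if len(exprs) != len(tools):
        raise ValueError(f"{len(exprs)} filters given for {len(tools)} tools")
    # Example_7 style: one expression per tool, matched by position.
    return dict(zip(tools, exprs))


example7 = per_tool_filters(
    "strelka2|mutect2|lancet|octopus",
    "( GEN[0].DP>=10 && QSS>0.5 )|(GEN[0].DP>=10 && MUT>2 )|"
    "( GEN[0].AR<=0.02 && GEN[1].AR>=0.5 )|(GEN[1].AR >= 0.05 )",
)
for tool, expr in example7.items():
    print(f"{tool}: {expr}")
```

Matching by position is why the order of entries in --filter must follow the order of --toolnames exactly.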