Author: Shashwat Sahay (shashwat.sahay@charite.de)
This pipeline was developed under supervision of Dr. Naveed Ishaque (naveed.ishaque@charite.de) and Prof. Roland Eils (roland.eils@charite.de).
The pipeline was tested and supported by Daniel Steiert.
The pipeline aligns UMI-based WGS, WES and panel sequencing data and computes the associated QC metrics.
Sequencing must have been performed in paired-end mode.
Currently the ability to provide support is limited.
To run the pipeline, make sure you have a working snakemake installation in a conda environment. We highly recommend using miniforge3 rather than any other conda distribution. Please follow this guide on how to install mamba.
Clone this repository with the command
git clone https://github.com/HiDiHlabs/umi_alignment.git
and change into the directory
cd umi_alignment
Please create a conda environment
mamba env create -f workflow/envs/umi-dedup-base.yaml
An all-in-one conda environment is available at workflow/envs/umi-dedup-full.yaml. However, this is not the recommended way to prepare the conda environment in which the pipeline is run, since each rule has its own conda environment and can, and should, be run independently of the base environment.
mamba env create -f workflow/envs/umi-dedup-full.yaml
To start the pipeline, certain configurations must be made in the template config config/config.yaml. It is recommended that a new config file, based on the template, be created for each run of the pipeline. It is also recommended that the config file be stored in the output folder.
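The recommendation above (one config per run, kept in the output folder) can be sketched in a few lines of Python. This is only an illustration: the helper name `prepare_run_config` and the `<PID>_<SAMPLE>` folder naming are assumptions made for this example and are not part of the pipeline.

```python
import shutil
from pathlib import Path


def prepare_run_config(template: Path, output_root: Path, pid: str, sample: str) -> Path:
    """Copy the template config into a per-run output folder (hypothetical helper)."""
    # One folder per patient/sample pair so that runs do not overwrite each other.
    run_dir = Path(output_root) / f"{pid}_{sample}"
    run_dir.mkdir(parents=True, exist_ok=True)

    run_config = run_dir / "config.yaml"
    # Refuse to overwrite a config left behind by an earlier run.
    if run_config.exists():
        raise FileExistsError(f"{run_config} already exists")

    shutil.copyfile(template, run_config)
    return run_config
```

The returned path points to the copy, which can then be edited for the run while the template in config/config.yaml stays unchanged.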
Please modify the following entries:
- SeqType: Should be either Panel, WGS or WES.
- library_prep_kit: Library prep kit used for preparing the sample. If not available, it will be set to Unknown.
- pid: The patient ID, as given in the PATIENT ID column. Needs to be string.
- sample: Since this pipeline is run sample-wise, give the sample name as it appears in the sample_name column of the metadata file. Needs to be string.
- metadata: Absolute path to the metadata sheet (see the Metadata section for the format specification of the metadata sheet). Needs to be Path.
- work_dir: Absolute path to a working folder in which the output of the pipeline is stored. It is recommended that this folder is suffixed with PID_SAMPLE to avoid overwriting results. Needs to be Path.
- log_dir: Absolute path to a folder in which the logs of the pipeline are stored. It is recommended that this is inside the work_dir. Needs to be Path; if not provided or left empty, it reverts to the default <workdir>/logs.
- genome: Absolute path to the genome.fa file. The genome should be indexed for use with bwa mem, and the indexes should be in the same folder as the genome.
- dbsnp: Path to a VCF file used for recalibration by BaseRecalibrator.
- trim_adapters: A boolean that switches adapter trimming with cutadapt on or off. Adapter trimming is highly recommended, but it can be switched off in rare cases.
- Adapter_R1: The adapter sequences for Read 1 of the library prep. Needs to be List; can be an empty list if the trim_adapters switch is set to False.
- Adapter_R3: The adapter sequences for Read 3 of the library prep. Needs to be List; can be an empty list if the trim_adapters switch is set to False.
- target_regions: Absolute path to the target regions; must be set when SeqType is WES or Panel.
- bait_regions: Absolute path to the bait regions. If unset and SeqType is WES or Panel, a slop of 100 bp on the target_regions is computed and used as bait regions.
- chrom_sizes: Absolute path to the chromosome lengths for the given genome; ignored if SeqType is WGS.
- dict_genome: Absolute path to the dict file for the given genome; ignored if SeqType is WGS.
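The rules in the list above interact (for example, target_regions is only mandatory for targeted data), so it can help to check a config before starting a run. The following is a minimal sketch of such a check, assuming the config has already been loaded into a Python dict; the function `check_config` is hypothetical and is not provided by the pipeline, and it covers only a subset of the entries.

```python
from pathlib import PurePosixPath

VALID_SEQ_TYPES = {"Panel", "WGS", "WES"}


def check_config(cfg: dict) -> dict:
    """Return a copy of cfg with the documented defaults applied (hypothetical helper)."""
    cfg = dict(cfg)

    if cfg.get("SeqType") not in VALID_SEQ_TYPES:
        raise ValueError("SeqType must be one of Panel, WGS or WES")

    # These entries are documented as absolute paths.
    for key in ("metadata", "work_dir", "genome"):
        value = cfg.get(key)
        if not value or not PurePosixPath(str(value)).is_absolute():
            raise ValueError(f"{key} must be an absolute path")

    # Documented fallbacks for optional entries.
    if not cfg.get("library_prep_kit"):
        cfg["library_prep_kit"] = "Unknown"
    if not cfg.get("log_dir"):
        cfg["log_dir"] = str(PurePosixPath(cfg["work_dir"]) / "logs")

    # Targeted data needs target regions; WGS does not.
    if cfg["SeqType"] in {"WES", "Panel"} and not cfg.get("target_regions"):
        raise ValueError("target_regions must be set for WES and Panel")

    # Assumption: adapter lists may only be empty when trimming is switched off.
    if cfg.get("trim_adapters"):
        for key in ("Adapter_R1", "Adapter_R3"):
            if not cfg.get(key):
                raise ValueError(f"{key} must list adapters when trim_adapters is True")

    return cfg
```

The check raises a ValueError on the first conflict it finds and otherwise returns the config with library_prep_kit and log_dir filled in where they were left empty.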
- group_allowed_edits: Number of edits allowed when grouping based on UMI. Defaults to 0; should be set to 0 if correct_umi is True.
- group_min_mapq: 20: Sets --min-map-q of groupReadsByUMI.
- group_strategy: Sets the --strategy parameter of groupReadsByUMI. Defaults to Adjacency.
- consensus_min_reads: 1
- consensus_min_base_qual: 2
- consensus_min_input_base_mapq: 10
- consensus_error_rate_pre_umi: 45
- consensus_error_rate_post_umi: 30
- filter_min_reads: 3
- filter_min_base_qual: 2
- filter_max_base_error_rate: 0.1
- filter_max_read_error_rate: 0.05
- filter_max_no_call_fraction: 0.2
- read_structure: 8M143T 8M143T
- correct_umi: False
- correct_umi_max_mismatches: 3
- correct_umi_min_distance: 1
- umi_file:
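The read_structure value is the least self-explanatory of these settings. The sketch below shows how such a string can be split into segments, assuming it follows the fgbio-style notation, in which each segment is a length followed by a letter (T for template, M for molecular barcode, B for sample barcode, S for skip) and `+` stands for "all remaining bases". The function `parse_read_structure` is hypothetical and handles only these four segment types.

```python
import re

# Segment kinds in the read-structure notation assumed here (fgbio-style).
SEGMENT_KINDS = {
    "T": "template",
    "M": "molecular barcode (UMI)",
    "B": "sample barcode",
    "S": "skip",
}

_SEGMENT = re.compile(r"(\d+|\+)([TMBS])")


def parse_read_structure(structure: str) -> list[tuple[str, str]]:
    """Split one read structure such as '8M143T' into (length, kind) pairs."""
    segments = _SEGMENT.findall(structure)
    # Reject strings containing characters that the pattern did not consume.
    rebuilt = "".join(length + kind for length, kind in segments)
    if not segments or rebuilt != structure:
        raise ValueError(f"invalid read structure: {structure!r}")
    return [(length, SEGMENT_KINDS[kind]) for length, kind in segments]


if __name__ == "__main__":
    # The template value holds one structure per read, separated by a space.
    for read_number, structure in enumerate("8M143T 8M143T".split(), start=1):
        print(read_number, parse_read_structure(structure))
```

Under this reading, the template value describes two reads that each begin with an 8-base UMI followed by 143 template bases.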