UMI Based Sequencing Alignment Pipeline

Author: Shashwat Sahay (shashwat.sahay@charite.de)

This pipeline was developed under the supervision of Dr. Naveed Ishaque (naveed.ishaque@charite.de) and Prof. Roland Eils (roland.eils@charite.de).

The pipeline was tested and supported by Daniel Steiert.

The pipeline aligns UMI-based WGS, WES, and panel sequencing data and computes the associated QC metrics.

Sequencing must be performed in paired-end mode.

Support for the pipeline is currently limited.

Prerequisites

To run the pipeline, make sure you have a working Snakemake installation in a conda environment. We highly recommend using miniforge3 rather than any other conda distribution. Please follow the mamba installation guide to install mamba.

Clone this repository with the command:

git clone https://github.com/HiDiHlabs/umi_alignment.git

Then change into the repository directory:

cd umi_alignment

Recommended Installation

Create the base conda environment:

mamba env create -f workflow/envs/umi-dedup-base.yaml

One step installation

!!! Not Recommended !!!

An all-in-one conda environment is available at workflow/envs/umi-dedup-full.yaml. This is not the recommended way to prepare the conda environment in which the pipeline runs, because each rule has its own conda environment and can, and should, be run independently of the base environment.

mamba env create -f workflow/envs/umi-dedup-full.yaml

Pipeline Preparation

!!! Important !!!

Config file

Before starting the pipeline, certain settings must be made in the template config config/config.yaml. We recommend creating a new config file from the template for each run of the pipeline, and storing that config file in the output folder.
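
The per-run copy can be scripted. The following is a minimal sketch, not part of the pipeline: the function name make_run_config and the output file name config.yaml are hypothetical, and the function refuses to overwrite an existing run config.

```python
import shutil
from pathlib import Path


def make_run_config(template: Path, work_dir: Path) -> Path:
    """Copy the template config into work_dir and return the path of the copy."""
    work_dir.mkdir(parents=True, exist_ok=True)
    run_config = work_dir / "config.yaml"  # hypothetical name for the per-run config
    if run_config.exists():
        # Never silently replace the config of an earlier run.
        raise FileExistsError(f"{run_config} already exists; refusing to overwrite")
    shutil.copyfile(template, run_config)
    return run_config


# Example call (the output path is a placeholder):
# make_run_config(Path("config/config.yaml"), Path("/path/to/output/PID_SAMPLE"))
```

The copy is then edited for the run at hand, leaving the template untouched.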

Please modify the following entries:

SeqType: Must be one of Panel, WGS or WES.
library_prep_kit: Library prep kit used to prepare the sample. If not available, it is set to Unknown.
pid: The patient ID, as given in the PATIENT ID column. Must be a string.
sample: The pipeline runs sample-wise, so give the sample name as it appears in the sample_name column of the metadata file. Must be a string.
metadata: Absolute path to the metadata sheet (see the Metadata section for the format specification of the metadata sheet). Must be a path.
work_dir: Absolute path to a working folder that stores the output of the pipeline. We recommend suffixing this folder with PID_SAMPLE to avoid overwriting results. Must be a path.
log_dir: Absolute path to a log folder that stores the logs of the pipeline. We recommend placing it inside the work_dir. Must be a path; if not provided or left empty, it defaults to <workdir>/logs.
genome: Absolute path to the genome.fa file. The genome must be indexed for use with bwa mem, and the index files must be in the same folder as the genome.
dbsnp: Path to a VCF file used for recalibration by BaseRecalibrator.
trim_adapters: Boolean that switches adapter trimming with cutadapt on or off. Adapter trimming is highly recommended but can be switched off in rare cases.
Adapter_R1: Adapter sequences for Read 1 of the library prep. Must be a list; can be an empty list if trim_adapters is set to False.
Adapter_R3: Adapter sequences for Read 3 of the library prep. Must be a list; can be an empty list if trim_adapters is set to False.
target_regions: Absolute path to the target regions. Must be set when SeqType is WES or Panel.
bait_regions: Absolute path to the bait regions. If unset and SeqType is WES or Panel, a slop of 100 bp on the target_regions is computed and used as the bait regions.
chrom_sizes: Absolute path to the chromosome lengths for the given genome. Ignored if SeqType is WGS.
dict_genome: Absolute path to the dict file for the given genome. Ignored if SeqType is WGS.
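
To illustrate what a 100 bp slop on the target regions means, here is a minimal sketch. It is not the pipeline's own code, and the function name slop_regions and the toy intervals are made up for the example. It assumes BED-style 0-based, half-open intervals and clips the padded intervals to the chromosome lengths, which is why a chromosome sizes file is needed.

```python
def slop_regions(regions, chrom_sizes, slop=100):
    """Extend each (chrom, start, end) interval by `slop` bp on both sides.

    Intervals are 0-based and half-open (BED convention). Padded intervals
    are clipped so they never leave the range [0, chromosome length].
    """
    padded = []
    for chrom, start, end in regions:
        length = chrom_sizes[chrom]
        padded.append((chrom, max(0, start - slop), min(length, end + slop)))
    return padded


# Toy data: two targets near the ends of a 1000 bp chromosome.
targets = [("chr1", 50, 200), ("chr1", 950, 1000)]
sizes = {"chr1": 1000}
print(slop_regions(targets, sizes))
# [('chr1', 0, 300), ('chr1', 850, 1000)]
```

Both example intervals are clipped on one side, which shows why padding without the chromosome lengths could produce invalid coordinates.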

http://fulcrumgenomics.github.io/fgbio/tools/latest/GroupReadsByUmi.html

group_allowed_edits: Number of edits allowed when grouping reads by UMI. Defaults to 0; should be set to 0 if correct_umi is True.
group_min_mapq: 20. Sets the --min-map-q parameter of GroupReadsByUmi.
group_strategy: Sets the --strategy parameter of GroupReadsByUmi. Defaults to Adjacency.
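
As a rough illustration of what group_allowed_edits controls, the sketch below counts mismatches between two UMIs of equal length and compares the count with the allowed number of edits. This is a simplification and not fgbio's implementation: treating an edit as a single-base mismatch is an assumption here, the function names are hypothetical, and the Adjacency strategy applies further rules that are not modelled.

```python
def umi_mismatches(umi_a: str, umi_b: str) -> int:
    """Count positions at which two equal-length UMIs differ."""
    if len(umi_a) != len(umi_b):
        raise ValueError("UMIs must have the same length")
    return sum(a != b for a, b in zip(umi_a, umi_b))


def within_allowed_edits(umi_a: str, umi_b: str, allowed_edits: int = 0) -> bool:
    """Return True if the two UMIs differ by at most `allowed_edits` bases."""
    return umi_mismatches(umi_a, umi_b) <= allowed_edits


print(within_allowed_edits("ACGTACGT", "ACGTACGT"))                   # True
print(within_allowed_edits("ACGTACGT", "ACGTACGA"))                   # False
print(within_allowed_edits("ACGTACGT", "ACGTACGA", allowed_edits=1))  # True
```

With the value 0, only identical UMIs are treated as the same molecule, which fits the recommendation to use 0 when UMIs have already been corrected.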

https://fulcrumgenomics.github.io/fgbio/tools/latest/CallMolecularConsensusReads.html

consensus_min_reads: 1
consensus_min_base_qual: 2
consensus_min_input_base_mapq: 10
consensus_error_rate_pre_umi: 45
consensus_error_rate_post_umi: 30
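
The two error-rate values are easier to read as probabilities. The sketch below assumes that these config keys are passed to the --error-rate-pre-umi and --error-rate-post-umi options of CallMolecularConsensusReads, which fgbio documents as Phred-scaled; the mapping from config key to option is an assumption, and the helper name is hypothetical.

```python
def phred_to_error_probability(phred: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


for name, q in (
    ("consensus_error_rate_pre_umi", 45),
    ("consensus_error_rate_post_umi", 30),
):
    print(f"{name}: Q{q} -> {phred_to_error_probability(q):.2e}")
# consensus_error_rate_pre_umi: Q45 -> 3.16e-05
# consensus_error_rate_post_umi: Q30 -> 1.00e-03
```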

https://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html

filter_min_reads: 3
filter_min_base_qual: 2
filter_max_base_error_rate: 0.1
filter_max_read_error_rate: 0.05
filter_max_no_call_fraction: 0.2

http://fulcrumgenomics.github.io/fgbio/tools/latest/FastqToBam.html

read_structure: 8M143T 8M143T
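
A read structure is a sequence of <length><operator> segments, one structure per read. In fgbio, M marks molecular barcode (UMI) bases, T template bases, B sample barcode bases and S skipped bases, and the last segment may use + in place of a length to mean "all remaining bases". The following simplified parser is a sketch for reading such strings and is not fgbio's own parser; the function name is hypothetical.

```python
import re

SEGMENT_KINDS = {
    "T": "template",
    "M": "molecular barcode (UMI)",
    "B": "sample barcode",
    "S": "skip",
}


def parse_read_structure(structure: str):
    """Split a read structure such as '8M143T' into (length, operator) pairs.

    A length of None stands for '+', i.e. all remaining bases.
    """
    segments = []
    pos = 0
    for match in re.finditer(r"(\d+|\+)([TMBS])", structure):
        if match.start() != pos:
            break  # unparsable characters between segments
        length, kind = match.groups()
        segments.append((None if length == "+" else int(length), kind))
        pos = match.end()
    if pos != len(structure) or not segments:
        raise ValueError(f"invalid read structure: {structure!r}")
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError("'+' is only allowed on the last segment")
    return segments


# The template value holds one structure per read, separated by a space.
for read_number, structure in enumerate("8M143T 8M143T".split(), start=1):
    for length, kind in parse_read_structure(structure):
        print(f"read {read_number}: {length} bases of {SEGMENT_KINDS[kind]}")
```

For the template value, each read therefore starts with 8 UMI bases followed by 143 template bases.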

http://fulcrumgenomics.github.io/fgbio/tools/latest/CorrectUmis.html

correct_umi: False
correct_umi_max_mismatches: 3
correct_umi_min_distance: 1
umi_file:
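
Several of the constraints described for the config entries can be checked before a run is started. The sketch below is not the pipeline's own validation: the function name check_config is hypothetical, it works on an already loaded config dictionary, and it covers only the rules stated above (allowed SeqType values, target_regions for WES and Panel, the log_dir and library_prep_kit defaults, and group_allowed_edits with correct_umi).

```python
from pathlib import PurePosixPath

VALID_SEQ_TYPES = {"Panel", "WGS", "WES"}


def check_config(config: dict) -> dict:
    """Return a copy of config with defaults filled in; raise ValueError on problems."""
    cfg = dict(config)
    errors = []

    if cfg.get("SeqType") not in VALID_SEQ_TYPES:
        errors.append("SeqType must be one of Panel, WGS or WES")
    elif cfg["SeqType"] != "WGS" and not cfg.get("target_regions"):
        errors.append("target_regions must be set when SeqType is WES or Panel")

    if not cfg.get("work_dir"):
        errors.append("work_dir must be set")

    if cfg.get("correct_umi") and cfg.get("group_allowed_edits", 0) != 0:
        errors.append("group_allowed_edits should be 0 when correct_umi is True")

    if errors:
        raise ValueError("; ".join(errors))

    # Defaults described in the entry list above.
    if not cfg.get("library_prep_kit"):
        cfg["library_prep_kit"] = "Unknown"
    if not cfg.get("log_dir"):
        cfg["log_dir"] = str(PurePosixPath(cfg["work_dir"]) / "logs")
    return cfg


example = {"SeqType": "WGS", "work_dir": "/data/out/P1_S1", "correct_umi": False}
print(check_config(example)["log_dir"])
# /data/out/P1_S1/logs
```

Collecting all problems before raising lets a user fix the whole config in one pass instead of one error at a time.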
