Seedability: A tool to determine parameters t (shared number of seeds) and k (seed length) between a set of reads or between a reference and set of reads.

lorrainea/Seedability

Seedability

Installation: To compile Seedability, run make.

INPUT: A set of reads in multiFASTA format and a reference sequence (optional).

OUTPUT: A TSV file containing an estimate of the seed length (k) and the number of shared seeds (t) for every pair of reads (i, j), or for every read i and the given reference, as well as the optimal number of shared seeds and seed length over all sequences.

Column 1: Read_i id or Ref id (if parameter -r is used).

Column 2: Read_j id or Read_i id (if parameter -r is used).

Column 4: Number of shared seeds (t) between every sequence pair.

Column 5: Length of shared seeds (k) between every sequence pair.
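As an illustration of the column layout above, the following Python sketch reads the per-pair rows of the output file. It is not part of Seedability: the names `PairEstimate` and `read_pair_estimates` are hypothetical, the sample row is made up, and the sketch assumes tab-separated rows with no quoting. Column 3 is not described above, so it is ignored, and any row that does not carry integer values in columns 4 and 5 (for example a header or a summary line, whose exact layout is not specified here) is skipped.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class PairEstimate(NamedTuple):
    """One per-pair row: the two sequence ids plus the t and k estimates."""
    first_id: str   # Column 1: Read_i id, or Ref id when -r is used
    second_id: str  # Column 2: Read_j id, or Read_i id when -r is used
    t: int          # Column 4: number of shared seeds
    k: int          # Column 5: length of shared seeds


def read_pair_estimates(handle: TextIO) -> Iterator[PairEstimate]:
    """Yield per-pair estimates from an open Seedability TSV file.

    Assumption: rows with fewer than five fields, or with non-integer
    values in columns 4 and 5, are not per-pair rows and are skipped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue
        try:
            t, k = int(row[3]), int(row[4])
        except ValueError:
            continue
        yield PairEstimate(row[0], row[1], t, k)


if __name__ == "__main__":
    # Made-up sample row, used only to exercise the parser.
    sample = io.StringIO("read1\tread2\tn/a\t4\t11\n")
    for estimate in read_pair_estimates(sample):
        print(estimate)
```

To read a real output file, open it with `open("out_reads.tsv", newline="")` and pass the handle to `read_pair_estimates`.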

Usage: seedability
Standard (Mandatory):
-q, --reads-file	<str>		multiFASTA reads filename. 
-o, --output-filename	<str>		Output filename.
Optional:
-r, --ref-file		<str>		FASTA reference filename.
-l, --min-k		<int>		Minimum k value to explore (Default: 3).
-k, --max-k		<int>		Maximum k value to explore (Default: 15).
-d, --delta		<double>	Threshold allowance between best alignment identity and alignment identity for larger k values (Default: 0.05).
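The description of -d leaves room for interpretation. One plausible reading, sketched below in Python, is that a larger k is still accepted as long as its alignment identity is within delta of the best identity observed. This is an assumption made for illustration only, not Seedability's actual implementation; the function `pick_k` and the identity values in the example are hypothetical.

```python
from typing import Mapping


def pick_k(identity_by_k: Mapping[int, float], delta: float = 0.05) -> int:
    """Return the largest k whose identity is within delta of the best.

    Assumed interpretation of -d: identity_by_k maps each explored k to
    the alignment identity obtained with it; larger k values are kept
    as candidates while their identity loss stays at or below delta.
    """
    best = max(identity_by_k.values())
    return max(k for k, identity in identity_by_k.items()
               if best - identity <= delta)


if __name__ == "__main__":
    # Made-up identities for k = 3, 5, 7, 9.
    identities = {3: 0.90, 5: 0.92, 7: 0.89, 9: 0.80}
    print(pick_k(identities))             # default delta of 0.05
    print(pick_k(identities, delta=0.0))  # no allowance at all
```

Under this reading, a larger delta trades a small loss in identity for longer seeds, while a delta of 0 keeps only the k values that reach the best identity.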

Examples

./seedability -q ./data/synthetic/reads.fasta -o out_reads.tsv
./seedability -q ./data/synthetic/reads.fasta -r ./data/synthetic/ref.fasta -o out_reads_ref.tsv
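When Seedability is called from a script, the same commands can be assembled programmatically. The Python sketch below builds the argument list using only the options listed under Usage. The helper `build_command` is hypothetical, and the default binary path `./seedability` assumes the tool was compiled in the current directory.

```python
import subprocess
from typing import List, Optional


def build_command(reads: str, output: str, ref: Optional[str] = None,
                  min_k: Optional[int] = None, max_k: Optional[int] = None,
                  delta: Optional[float] = None,
                  binary: str = "./seedability") -> List[str]:
    """Assemble a Seedability command line from the documented options."""
    cmd = [binary, "-q", reads, "-o", output]  # mandatory options
    if ref is not None:
        cmd += ["-r", ref]          # optional FASTA reference
    if min_k is not None:
        cmd += ["-l", str(min_k)]   # minimum k value to explore
    if max_k is not None:
        cmd += ["-k", str(max_k)]   # maximum k value to explore
    if delta is not None:
        cmd += ["-d", str(delta)]   # identity threshold allowance
    return cmd


if __name__ == "__main__":
    cmd = build_command("./data/synthetic/reads.fasta", "out_reads_ref.tsv",
                        ref="./data/synthetic/ref.fasta")
    print(" ".join(cmd))
    # To actually run the tool (requires the compiled binary):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.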

Experimental Results

The results show that, for datasets with sequence lengths from 100 bp to 15,000 bp and divergences from 5% to 25%, the average alignment identity across 100 sequence pairs is higher when Minimap2 is run with the parameters output by Seedability than when it is run with its default values. Note that some sequences remain unmapped when Minimap2 is run with its default values.

Citation:

Lorraine A. K. Ayad, Rayan Chikhi and Solon P. Pissis, Seedability: Optimising alignment parameters for sensitive sequence comparison, Bioinformatics Advances, 2023; vbad108, https://doi.org/10.1093/bioadv/vbad108

License: GNU GPLv3 License; Copyright (C) 2023 Lorraine A.K. Ayad, Rayan Chikhi and Solon P. Pissis.
