Open
Description
For metagenomic references used with --aemb, the index often reaches several hundred gigabytes. At that size, loading the index into memory takes longer than the actual mapping.
Could strobealign accept a list of samples as input and iterate over them, mapping each sample's reads and writing its output before moving on to the next sample, all without exiting? The index would then only have to be loaded into memory once instead of once per sample.
This would make strobealign three to four times faster for our use case.
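To make the reasoning behind that estimate explicit, here is a rough back-of-the-envelope sketch. It assumes a constant index load time and a constant mapping time per sample; the function name and the numbers in the usage lines are illustrative placeholders, not measurements.

```python
def batch_speedup(n_samples: int, load_time: float, map_time: float) -> float:
    """Speedup from loading the index once instead of once per sample.

    Assumes every sample costs the same `map_time` and every index load
    costs the same `load_time` (both in the same unit, e.g. minutes).
    """
    per_sample_reload = n_samples * (load_time + map_time)  # current behaviour
    single_load = load_time + n_samples * map_time          # proposed behaviour
    return per_sample_reload / single_load


# Hypothetical numbers: loading takes 3x as long as mapping one sample.
for n in (1, 10, 100):
    print(n, round(batch_speedup(n, load_time=3.0, map_time=1.0), 2))
```

Under these assumptions the speedup approaches (load_time + map_time) / map_time as the number of samples grows, so a three- to fourfold gain corresponds to index loading taking roughly two to three times as long as mapping a single sample.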
An input TSV file could look like this (sample name, forward reads, reverse reads, output file):
sample1 fq/sample1_1.fq.gz fq/sample1_2.fq.gz map/sample1.tsv
sample2 fq/sample2_1.fq.gz fq/sample2_2.fq.gz map/sample2.tsv
sample3 fq/sample3_1.fq.gz fq/sample3_2.fq.gz map/sample3.tsv
sample4 fq/sample4_1.fq.gz fq/sample4_2.fq.gz map/sample4.tsv
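A minimal sketch of how such a file could be read, in Python for illustration only. Nothing here is part of strobealign: the `SampleJob` type, its field names, and `parse_manifest` are hypothetical, and the parser splits on any whitespace so that it accepts both tabs and the spaces shown above.

```python
from typing import NamedTuple


class SampleJob(NamedTuple):
    """One row of the proposed sample list (field names are assumptions)."""
    name: str
    reads1: str
    reads2: str
    output: str


def parse_manifest(text: str) -> list[SampleJob]:
    """Parse a four-column sample list, skipping blank lines."""
    jobs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # tolerate tabs or spaces between columns
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected 4 columns, got {len(fields)}"
            )
        jobs.append(SampleJob(*fields))
    return jobs


manifest = (
    "sample1\tfq/sample1_1.fq.gz\tfq/sample1_2.fq.gz\tmap/sample1.tsv\n"
    "sample2\tfq/sample2_1.fq.gz\tfq/sample2_2.fq.gz\tmap/sample2.tsv\n"
)
jobs = parse_manifest(manifest)
for job in jobs:
    print(job.name, "->", job.output)
```

Each `SampleJob` would then be mapped in turn against the index that is already in memory.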
Or just a list of sample names:
sample1
sample2
sample3
sample4
where strobealign is then told to look for the input files sampleX_1.fq.gz and sampleX_2.fq.gz in a specified input folder and to write each sample's --aemb output TSV file to a specified output folder.
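The name-based variant could be expanded into the same per-sample paths as follows. This is a sketch, not existing strobealign behaviour: `expand_sample` is a hypothetical helper, and the output name sampleX.tsv is an assumption carried over from the first example.

```python
from pathlib import PurePosixPath


def expand_sample(name: str, input_dir: str, output_dir: str) -> tuple[str, str, str]:
    """Derive read and output paths from a sample name.

    Follows the naming in this request: <name>_1.fq.gz and <name>_2.fq.gz
    in the input folder; <name>.tsv in the output folder (assumed).
    """
    in_dir = PurePosixPath(input_dir)
    out_dir = PurePosixPath(output_dir)
    return (
        str(in_dir / f"{name}_1.fq.gz"),
        str(in_dir / f"{name}_2.fq.gz"),
        str(out_dir / f"{name}.tsv"),
    )


samples = ["sample1", "sample2", "sample3", "sample4"]
for sample in samples:
    reads1, reads2, output = expand_sample(sample, "fq", "map")
    print(sample, reads1, reads2, output)
```

With the folders fq and map, this reproduces the rows of the four-column example above.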