Feature request: Keep index in memory while mapping multiple samples in a row #483

For metagenomics and `--aemb`, the index often reaches several hundred gigabytes. At that size, loading the index into memory takes longer than the actual mapping.

Could strobealign accept a list of samples as input and iterate over them, mapping each sample's reads and writing its output before moving on to the next sample without exiting? The index would then only need to be loaded into memory once.

This would make strobealign three to four times faster for our use case.

An input TSV file could look like:
```
sample1 fq/sample1_1.fq.gz fq/sample1_2.fq.gz map/sample1.tsv
sample2 fq/sample2_1.fq.gz fq/sample2_2.fq.gz map/sample2.tsv
sample3 fq/sample3_1.fq.gz fq/sample3_2.fq.gz map/sample3.tsv
sample4 fq/sample4_1.fq.gz fq/sample4_2.fq.gz map/sample4.tsv
```
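To make the proposed format concrete, here is a minimal Python sketch of how such a sample sheet could be read. It assumes four tab-separated columns (sample name, read 1, read 2, output file) and no header row. The names `Sample` and `parse_sample_sheet` are made up for illustration and are not part of strobealign.

```python
import csv
import io
from typing import NamedTuple


class Sample(NamedTuple):
    """One row of the hypothetical sample sheet."""
    name: str
    reads1: str
    reads2: str
    output: str


def parse_sample_sheet(text: str) -> list[Sample]:
    """Parse a four-column, tab-separated sample sheet without a header row."""
    samples = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row_no, row in enumerate(reader, start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"row {row_no}: expected 4 columns, got {len(row)}")
        samples.append(Sample(*(field.strip() for field in row)))
    return samples


sheet = (
    "sample1\tfq/sample1_1.fq.gz\tfq/sample1_2.fq.gz\tmap/sample1.tsv\n"
    "sample2\tfq/sample2_1.fq.gz\tfq/sample2_2.fq.gz\tmap/sample2.tsv\n"
)
samples = parse_sample_sheet(sheet)
for s in samples:
    print(s.name, s.reads1, s.reads2, s.output)
```

With the index loaded once, the mapper would then loop over these records instead of being started once per sample.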

Or just a list of sample names:
```
sample1
sample2
sample3
sample4
```
where strobealign is then told to look for the input files `sampleX_1.fq.gz` and `sampleX_2.fq.gz` in a specified input folder and to write the `--aemb` output TSV file to a specified output folder.
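As a rough sketch of this second variant, the snippet below expands sample names into input and output paths. The `_1.fq.gz` / `_2.fq.gz` suffixes follow the naming above; the output name `sampleX.tsv` is an assumption borrowed from the TSV example, and `expand_sample` is a hypothetical helper, not an existing strobealign function.

```python
from pathlib import PurePosixPath


def expand_sample(name: str, input_dir: str, output_dir: str) -> tuple[str, str, str]:
    """Derive (reads1, reads2, output) paths from a sample name.

    Assumes inputs are named <name>_1.fq.gz and <name>_2.fq.gz and that the
    abundance table is written to <name>.tsv (output naming is an assumption).
    """
    in_dir = PurePosixPath(input_dir)
    out_dir = PurePosixPath(output_dir)
    return (
        str(in_dir / f"{name}_1.fq.gz"),
        str(in_dir / f"{name}_2.fq.gz"),
        str(out_dir / f"{name}.tsv"),
    )


names = ["sample1", "sample2", "sample3", "sample4"]
jobs = [expand_sample(name, "fq", "map") for name in names]
for reads1, reads2, output in jobs:
    print(reads1, reads2, output)
```

This variant is shorter to write but fixes the file naming scheme, whereas the four-column sheet allows arbitrary paths per sample.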
