Open
Labels
enhancement (New feature or request), epic (Feature being added to readfish), needs discussion (A topic/feature that needs discussion from maintainers and users)
Description
Feature being added - Filtering!
Filter reads and alignments based on a flexible "mini language" specified in the TOML.
Would this go in the config TOML under the section being filtered, i.e. `caller_settings`/`mapper_settings`?
Applied in the tight targets loop
```toml
# for base calling
filter = [
    "metadata.sequence_length > 0",
    "metadata.sequence_length < 1000",
]
# for alignment
filter = [
    "is_primary",
    "mapq > 40",
    "strand == -1",
]
```

This is parsed into magic Enums and Classes in `_filter.py`
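To make the parsing step concrete, here is a minimal sketch of how one rule string might be turned into an Enum plus a callable class. Nothing here is the actual `_filter.py`: the names `Op`, `Filter`, `_RULE` and `_coerce` are hypothetical, and the grammar is assumed to be just `attr`, or `attr <op> value`, with dotted attribute access.

```python
import operator
import re
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Any, Optional


class Op(Enum):
    """Comparison operators understood by this sketch (hypothetical set)."""
    LT = "<"
    LE = "<="
    GT = ">"
    GE = ">="
    EQ = "=="
    NE = "!="


_OP_FUNCS = {
    Op.LT: operator.lt, Op.LE: operator.le, Op.GT: operator.gt,
    Op.GE: operator.ge, Op.EQ: operator.eq, Op.NE: operator.ne,
}

# Either a bare attribute ("is_primary") or "attr <op> value".
_RULE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(?:(<=|>=|==|!=|<|>)\s*(\S+))?\s*$")


def _coerce(raw: str) -> Any:
    """Try int, then float, otherwise keep the value as an unquoted string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw.strip("\"'")


@dataclass(frozen=True)
class Filter:
    attr: str
    op: Optional[Op] = None
    value: Any = None

    @classmethod
    def parse(cls, rule: str) -> "Filter":
        match = _RULE.match(rule)
        if match is None:
            raise ValueError(f"Cannot parse filter rule: {rule!r}")
        attr, op, raw = match.groups()
        if op is None:
            return cls(attr)  # bare attribute: a truthiness check
        return cls(attr, Op(op), _coerce(raw))

    def __call__(self, obj: Any) -> bool:
        target = obj
        for name in self.attr.split("."):  # follow dotted attribute paths
            target = getattr(target, name)
        if self.op is None:
            return bool(target)
        return _OP_FUNCS[self.op](target, self.value)


# Stand-in alignment object; real alignments would come from the mapper plugin.
aln = SimpleNamespace(is_primary=True, mapq=60, strand=-1)
rules = [Filter.parse(r) for r in ["is_primary", "mapq > 40", "strand == -1"]]
passes = all(rule(aln) for rule in rules)
```

Keeping the operator as an Enum means an unknown operator fails at parse time (config validation) rather than in the targets loop.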
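```python
from itertools import chain

# Assumes partition(pred, iterable) returns (items failing pred, items passing pred),
# as more_itertools.partition does.
chunks = read_until_client.get_read_batch(...)
filtered_calls, calls = partition(basecall_filter, basecall(chunks))
filtered_aligns, aligns = partition(alignment_filter, align(calls))
for result in aligns:
    print("boo these alignments are trash")
# chain works whether partition returns lists or lazy iterators; + only works on lists
for filtered_item in chain(filtered_calls, filtered_aligns):
    print("Woohoo we have great success in filtering")
```

I suppose we would store these on the respective classes? `_PluginModule` or something, I forget.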
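For anyone unsure which half of the tuple is which, here is a small stand-alone sketch of the partition step. It assumes `partition` follows the well-known itertools recipe (the same contract as `more_itertools.partition`): items that fail the predicate come first, items that pass come second. The read lengths and the lambda are made-up stand-ins for basecalled reads and `basecall_filter`.

```python
from itertools import chain, filterfalse, tee


def partition(pred, iterable):
    """Split iterable into (items failing pred, items passing pred)."""
    failing, passing = tee(iterable)
    return filterfalse(pred, failing), filter(pred, passing)


# Stand-in data: sequence lengths instead of real basecalled reads.
lengths = [0, 250, 999, 4000]
basecall_filter = lambda n: 0 < n < 1000  # mirrors the two TOML length rules

filtered, kept = partition(basecall_filter, lengths)
filtered, kept = list(filtered), list(kept)

# Both halves together still cover every input item, so nothing is lost.
everything = sorted(chain(filtered, kept))
```

Both returned objects are lazy iterators, so they have to be materialised (or combined with `chain`) before being concatenated.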
Ideas
- Extend language to startsWith/endsWith
- and/or/not logical operators
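One possible shape for these ideas, sketched as plain predicate combinators rather than as syntax in the mini language. All helper names (`all_of`, `any_of`, `negate`, `starts_with`, `ends_with`) and the `contig` attribute are hypothetical and only illustrate how parsed rules could be composed.

```python
from types import SimpleNamespace
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND: every predicate must pass."""
    return lambda obj: all(pred(obj) for pred in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR: at least one predicate must pass."""
    return lambda obj: any(pred(obj) for pred in preds)


def negate(pred: Predicate) -> Predicate:
    """Logical NOT."""
    return lambda obj: not pred(obj)


def starts_with(attr: str, prefix: str) -> Predicate:
    return lambda obj: str(getattr(obj, attr)).startswith(prefix)


def ends_with(attr: str, suffix: str) -> Predicate:
    return lambda obj: str(getattr(obj, attr)).endswith(suffix)


# Stand-in alignment with a hypothetical `contig` attribute.
aln = SimpleNamespace(contig="chr20", is_primary=True, mapq=60)

keep = all_of(
    lambda a: a.is_primary,
    starts_with("contig", "chr"),
    negate(ends_with("contig", "_random")),
    any_of(lambda a: a.mapq > 40, lambda a: a.mapq == 0),
)
result = keep(aln)
```

Because each combinator returns another predicate, the existing list-of-rules behaviour is just `all_of(*rules)`.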
Issues that need resolving/clarification
- VERY footgunny - for example `sequence.metadata.length < 0`, `mapq < 0` and goodbye all reads. How can we safeguard against this? I suggest maybe starting only with PAF, and maybe adding some checks in validation.
- Where do we add the tracking of filtering status. Do we add it directly to the `Result` object, do we add it straight into the plugin `basecall`/`map_reads` methods (would involve having to write separate implementations for new plugins), and have the plugin return two Iterables, one of reads/`Results` instances that passed and one that failed?
- What do we do with Results that fail validations, `unblock` or `proceed`?
  - Fails basecalling filtering
  - Fails alignment filtering
- Do we add a `fails_validation` to the toml/Conditions section, which defaults to `proceed`? This then relies on the exceeded max chunk behaviour
- How and where do we log this?
- What will it look like/where will it be placed in the config?
- What will the API between targets and plug-ins look like?
- How will we ensure that targets doesn’t miss any data?
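As one possible answer to the footgun question, a validation pass could reject filters that no read can ever satisfy. The sketch below is only an illustration: the function name `satisfiable`, the `(op, value)` tuple format, and the per-attribute domains are all assumptions, and it only handles integer-valued attributes with `<`, `<=`, `>`, `>=` and `==`.

```python
import math


def satisfiable(bounds, domain=(0, math.inf)):
    """Return True if some integer in `domain` satisfies every (op, value) bound.

    `bounds` holds the rules for ONE integer-valued attribute, for example
    [(">", 0), ("<", 1000)] for a sequence length.
    """
    lo, hi = domain
    for op, value in bounds:
        if op == ">":
            lo = max(lo, value + 1)
        elif op == ">=":
            lo = max(lo, value)
        elif op == "<":
            hi = min(hi, value - 1)
        elif op == "<=":
            hi = min(hi, value)
        elif op == "==":
            lo, hi = max(lo, value), min(hi, value)
        else:
            raise ValueError(f"Unsupported operator in this sketch: {op!r}")
    return lo <= hi


# Lengths can never be negative, so "< 0" would throw away every read.
length_ok = satisfiable([(">", 0), ("<", 1000)])
length_bad = satisfiable([("<", 0)])

# Assumed domain of 0..255 for mapping quality.
mapq_ok = satisfiable([(">", 40)], domain=(0, 255))
mapq_bad = satisfiable([("<", 0)], domain=(0, 255))
```

A check like this only catches rules that are impossible on their own; a filter that is technically satisfiable but far too strict would still need logging of how many reads it rejects.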
### Tasks
- [ ] #304
- [ ] Describe mini-language
- [ ] Needs mad tests