feat: Add 62 new unit tests for match_regions_to_genes, transcript, candidate processing, and parsers#13
Closed
TianYuan-Liu wants to merge 2 commits into master
Add comprehensive test coverage across multiple modules:

- match_regions_to_genes integration tests (9 tests): empty regions/genes handling, sorted region processing, max_gene_length effects, multiple chromosomes, large gaps between genes, gene level reporting with merging, and region order preservation.
- Transcript advanced tests (7 tests): set_length interactions with calculate_size, exons extending beyond boundaries, renumber with many exons for both strands, overlapping exons robustness, single exon transcripts, and clone preserving exon numbers.
- Process candidates edge cases (7 tests): single candidate at each reporting level (exon/transcript/gene), multiple genes at gene level, same gene different transcripts merging, mixed areas at transcript level, and all-below-threshold handling.
- Config validation tests (8 tests): empty/comma-only rules parsing, partial valid rules, invalid tag handling, multiple set_distance_kb calls, all-zero and large max_lookback values, default rules order.
- TSS boundary condition tests (5 tests): region at exact TSS position, very large distances, negative strand at end, zero length TSS zone, large region spanning all zones.
- TTS boundary condition tests (5 tests): region at exact TTS position, very large distances, negative strand at start, zero TTS zone, large region spanning TTS and downstream.
- BED reader edge cases (6 tests): single line files, exact chunk size, chunk larger than file, mixed comments, browser/track lines, Windows line endings.
- GTF parser edge cases (6 tests): single exon genes, many exons (20), unsorted exons, mixed strands on same chromosome, genes without exons, duplicate exons.
- Rules priority tests (5 tests): FirstExon beats Promoter, TSS beats FirstExon, custom rules with Downstream first, same priority ties, pctg_region tiebreaker.
- Output line format validation (5 tests): field count with/without metadata, field order verification, percentage rounding, hundred percent formatting.

Unit test count increased from 282 to 344 tests.
Co-Authored-By: Claude (claude-opus-4-5) <noreply@anthropic.com>
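
For readers unfamiliar with the BED reader edge cases named in the description, the following is a minimal sketch of the behaviour those tests appear to target. It is not the project's code: the PR does not show the reader's API or even the project's language, so `read_bed_chunks`, its parameters, and the three-field tuple layout are all hypothetical, and Python is used purely for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end); hypothetical layout


def read_bed_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[Region]]:
    """Yield lists of at most chunk_size regions parsed from BED lines.

    Hypothetical helper: skips blank lines, '#' comments, and
    browser/track header lines, and tolerates Windows (CRLF) endings.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunk: List[Region] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # strips both LF and CRLF terminators
        if not line.strip() or line.startswith(("#", "browser", "track")):
            continue
        fields = line.split("\t")
        chunk.append((fields[0], int(fields[1]), int(fields[2])))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final partial chunk, never an empty one
        yield chunk


sample = [
    "track name=example\n",
    "# a comment\n",
    "chr1\t10\t20\r\n",  # Windows line ending
    "browser position chr1\n",
    "chr2\t5\t9\n",
]
chunks = list(read_bed_chunks(sample, chunk_size=2))
```

With this sketch, a chunk size equal to the number of data lines produces exactly one chunk with no trailing empty chunk, and a chunk size larger than the file produces one partial chunk. These are the "exact chunk size" and "chunk larger than file" situations the description lists.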
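
The rules priority cases listed above (FirstExon beats Promoter, TSS beats FirstExon, custom rules with Downstream first, pctg_region tiebreaker) can be sketched as a small selection function. This is an illustration under stated assumptions, not the project's implementation: `Candidate`, `pick_best`, and `ASSUMED_RULES` are hypothetical names, the rule order shown contains only the tags mentioned in the description, and the choice that a higher pctg_region wins a tie is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed order, consistent with "TSS beats FirstExon" and
# "FirstExon beats Promoter"; the real default list is not shown in the PR.
ASSUMED_RULES = ("TSS", "FirstExon", "Promoter", "Downstream")


@dataclass(frozen=True)
class Candidate:  # hypothetical record for one region-to-gene association
    gene: str
    area: str
    pctg_region: float


def pick_best(
    candidates: Sequence[Candidate],
    rules: Sequence[str] = ASSUMED_RULES,
) -> Optional[Candidate]:
    """Return the candidate whose area comes first in rules.

    Ties on area are broken by the larger pctg_region (an assumption).
    Candidates whose area is not in rules are ignored.
    """
    rank = {tag: i for i, tag in enumerate(rules)}
    ranked = [c for c in candidates if c.area in rank]
    if not ranked:
        return None
    return min(ranked, key=lambda c: (rank[c.area], -c.pctg_region))


promoter = Candidate("geneA", "Promoter", 80.0)
first_exon = Candidate("geneA", "FirstExon", 20.0)
downstream = Candidate("geneB", "Downstream", 50.0)

best_default = pick_best([promoter, first_exon, downstream])
best_custom = pick_best(
    [promoter, first_exon, downstream],
    rules=("Downstream", "TSS", "FirstExon", "Promoter"),
)
```

Here `best_default` is the FirstExon candidate even though its pctg_region is lower, because rule order is compared first. `best_custom` is the Downstream candidate because the custom rule list places that tag first.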