calculate coverage using run report instead of fastq processing #4
mrmonkington wants to merge 6 commits into main from …
Conversation
…summary-only support
| parser.add_argument("--summary", help="Path to the sequencing summary TXT file.") | ||
| parser.add_argument("--below", type=int, help="Only output lines where coverage is below this value.") | ||
| parser.add_argument("--bin-threshold", type=int, default=7000, help="Read length threshold for binned stats (default: 7000).") | ||
| parser.add_argument("--csv", action="store_true", help="Output in CSV format.") |
help="Prints stdout in CSV format" - maybe?
```python
thresh_kb = f"{bin_threshold/1000:g}kb"
if output_csv:
    writer = csv.writer(sys.stdout)
    csv_cols = [c if c not in ['short_total_mb', 'short_avg_len', 'long_total_mb', 'long_avg_len'] else
```
Would change all 'avgs' to 'means' to be more clear - most people in the FASTQ space would use medians or N50/N90s, as they are less susceptible to skew / wide distributions of read lengths, which ONT generally has.
Median is more annoying to calculate - so I think that's why we have historically used the mean (and it would be impossible when using JSON as input? - so mean is fine for us, I think).
Would also be good to get short and long read counts, like long_read_n or short_read_n. This will help us tell if a sample is bad - e.g. "50% of reads are short" is easier to interpret.
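A minimal sketch of what those counts could look like, assuming the per-read lengths are already collected in a list (the names `short_read_n` / `long_read_n` are the suggestion above, not existing fields in the script):

```python
def binned_counts(read_lengths, bin_threshold=7000):
    """Count reads either side of the bin threshold (default 7 kb)."""
    short_read_n = sum(1 for length in read_lengths if length < bin_threshold)
    long_read_n = len(read_lengths) - short_read_n
    short_fraction = short_read_n / len(read_lengths) if read_lengths else 0.0
    return short_read_n, long_read_n, short_fraction
```

A `short_fraction` of 0.5 then reads directly as "50% of reads are short".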
You're right, I should be explicit - I was going from the run template the lab use, which expects means, though it just says 'ave':
File | Total_Yield_(bp) | Ave_length_(bp) | Yield_<7kbp_(bp) | Ave_length_<7kbp_(bp) | Yield_>7kbp_(bp) | Ave_length_>7kbp_(bp) | N50
But maybe they haven't really given any thought to it and have just inherited this approach.
@AndrewMNG would median be more useful for QCing the read distribution? Do the lab use this info, or is it just you? The old script produced means because it was just bodged with awk, but I can do whatever is most useful.
Here's the SOP:
https://docs.google.com/document/d/1ZC_R_JkTnHM5Nm80IkZ5OI9OKszJfHQJl847PmWxkMs/edit?tab=t.0
and an example of where the lab put this info.
Of course, ultimately we want to do away with all of these manual steps - the Run Manager should just report this info where it's needed (or something derived from this info).
@mrmonkington Median or N50 is more useful than mean, since the read lengths are not normally distributed and the mean will be skewed by longer reads. Measuring yield <7kb and >7kb is good to get a measure of how fragmented the DNA is, although a low median will imply this anyway. At the moment it's not something we use; we do run Nanoplot for LR, which computes a lot of stats about the reads, but I only check it if the assembly is very bad - and typically it's down to long reads that are too short. It's something we could use in future though, e.g. databasing - it could be a way of tracking how certain species perform, or the performance of native vs. rapid.
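For reference, both are cheap to compute once the per-read lengths are in hand - a sketch, not part of this PR, and (as noted above) only possible from the sequencing summary, not the JSON report:

```python
import statistics

def n50(read_lengths):
    """Smallest length L such that reads of length >= L cover half the total yield."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def median_length(read_lengths):
    return statistics.median(read_lengths) if read_lengths else 0
```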
R-Cardenas left a comment
Just some small changes to some names, no functional changes. It's really nice!
What does this Pull Request do and why?
Provides a coverage analysis tool which uses MinKNOW's own run stats output, rather than scraping FASTQs.
Will allow faster run summaries, including mid-run, without hammering the data disks.
Can use either of:
- `report_xxxxxxx.json` for instant coverage stats.
- `sequencing_summary_xxxx.txt` (TSV) for read size binning (see the sketch below).

Can optionally output only samples below a target coverage, and the default bin threshold of 7kb can be changed.
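For instance, read lengths for the binned stats can be pulled from the summary with something like this (a sketch; it assumes the standard `sequence_length_template` and `passes_filtering` columns that MinKNOW writes, which may differ between versions):

```python
import csv

def read_lengths_from_summary(path):
    """Yield lengths of pass reads from a sequencing summary TSV."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Treat rows without a passes_filtering column as pass reads.
            if row.get("passes_filtering", "TRUE").upper() == "TRUE":
                yield int(row["sequence_length_template"])
```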
What are the relevant GitHub Issues or support tickets?
None.
Where should the reviewer start?
See the README!
How can the reviewer see this in action and/or test?
Experiment with the files in `examples/`.
Compliance
None