+ WASP2
+ Allele-Specific Analysis Pipeline
+
diff --git a/docs/source/_static/logo.png b/docs/source/_static/logo.png
new file mode 100644
index 0000000..a0b4a97
Binary files /dev/null and b/docs/source/_static/logo.png differ
diff --git a/docs/source/_static/podcast/artwork/README.md b/docs/source/_static/podcast/artwork/README.md
new file mode 100644
index 0000000..27675c3
--- /dev/null
+++ b/docs/source/_static/podcast/artwork/README.md
@@ -0,0 +1,85 @@
+# The WASP's Nest - Podcast Artwork
+
+## Official WASP2 Logo
+
+The podcast uses the official WASP2 hexagonal logo featuring:
+- **Two wasps** facing each other (representing paired alleles)
+- **Colored bands** (red/blue) symbolizing allelic variants
+- **Hexagonal frame** - perfect honeycomb/hive aesthetic
+
+**Logo file:** `wasp2_logo.png` (from `doc/wasp2_hex_logo_v1.png`)
+
+## Cover Art Specifications
+
+The podcast cover should embody "The WASP's Nest" theme:
+
+### Required Files
+
+- `cover.png` - Main podcast cover (3000x3000 px)
+- `cover-small.png` - Thumbnail version (500x500 px)
+- `banner.png` - Episode banner (1920x1080 px)
+
+### Design Guidelines
+
+**Theme:** Scientific beehive meets bioinformatics
+
+**Visual Elements:**
+- 🐝 Stylized queen bee (elegant, scientific)
+- 🧬 DNA helix or chromosome imagery
+- 📊 Hexagonal honeycomb pattern (data visualization aesthetic)
+- 🔬 Subtle scientific/genomics motifs
+
+**Color Palette** (from official logo):
+- Teal/seafoam (#5DAB9E) - hexagon border
+- Mint green (#7FCBBA) - hexagon fill
+- Honey gold (#F5C244) - wasp body
+- Charcoal black (#2D2D2D) - wasp stripes
+- Allele red (#E8747C) - allele band
+- Allele blue (#5B9BD5) - allele band
+- Clean white (#FFFFFF) - background
+
+**Typography:**
+- Title: Bold, modern sans-serif
+- Subtitle: Clean, readable
+- Include tagline: "Buzz from the Hive"
+
+**Layout:**
+```
+┌─────────────────────────┐
+│     THE WASP'S NEST     │
+│                         │
+│    ┌───────────────┐    │
+│    │  [Official    │    │
+│    │  WASP2 hex    │    │
+│    │  logo with    │    │
+│    │  two wasps]   │    │
+│    └───────────────┘    │
+│                         │
+│   Buzz from the Hive    │
+│    ─────────────────    │
+│    Changelog Podcast    │
+└─────────────────────────┘
+```
+
+The official WASP2 logo already perfectly embodies the hive theme with its
+hexagonal shape and paired wasps representing allelic variants.
+
+### Technical Requirements
+
+- Format: PNG (preferred) or JPG
+- Color space: sRGB
+- Resolution: 72 DPI minimum, 300 DPI preferred
+- No transparent backgrounds for main cover
+- Square aspect ratio for cover images
+
+### Generation Tools
+
+Cover art can be generated using:
+- DALL-E 3 / Midjourney with prompt engineering
+- Figma/Illustrator for vector design
+- Stable Diffusion with appropriate LoRAs
+
+**Example prompt for AI generation:**
+> "Scientific podcast cover art, stylized queen bee wearing tiny lab coat,
+> hexagonal honeycomb pattern made of DNA helices, bioinformatics theme,
+> gold and blue color scheme, modern minimalist design, podcast cover format"
diff --git a/docs/source/_static/podcast/artwork/wasp2_logo.png b/docs/source/_static/podcast/artwork/wasp2_logo.png
new file mode 100644
index 0000000..a0b4a97
Binary files /dev/null and b/docs/source/_static/podcast/artwork/wasp2_logo.png differ
diff --git a/docs/source/_static/podcast/audio/.gitkeep b/docs/source/_static/podcast/audio/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/docs/source/_static/podcast/audio/episode-001-origin-swarm.mp3 b/docs/source/_static/podcast/audio/episode-001-origin-swarm.mp3
new file mode 100644
index 0000000..829f3ac
Binary files /dev/null and b/docs/source/_static/podcast/audio/episode-001-origin-swarm.mp3 differ
diff --git a/docs/source/_static/podcast/audio/episode-002-new-hive.mp3 b/docs/source/_static/podcast/audio/episode-002-new-hive.mp3
new file mode 100644
index 0000000..c0c3573
Binary files /dev/null and b/docs/source/_static/podcast/audio/episode-002-new-hive.mp3 differ
diff --git a/docs/source/_static/podcast/audio/episode-003-rust-metamorphosis.mp3 b/docs/source/_static/podcast/audio/episode-003-rust-metamorphosis.mp3
new file mode 100644
index 0000000..f029cf6
Binary files /dev/null and b/docs/source/_static/podcast/audio/episode-003-rust-metamorphosis.mp3 differ
diff --git a/docs/source/_static/podcast/chronicles/.gitkeep b/docs/source/_static/podcast/chronicles/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/docs/source/_static/podcast/chronicles/TEMPLATE.md b/docs/source/_static/podcast/chronicles/TEMPLATE.md
new file mode 100644
index 0000000..51f8275
--- /dev/null
+++ b/docs/source/_static/podcast/chronicles/TEMPLATE.md
@@ -0,0 +1,122 @@
+# Buzz Report Template
+# Episode: [NUMBER] | Version: [VERSION]
+# Date: [DATE]
+
+---
+
+## 🐝 Opening
+
+[happy buzz]
+
+Welcome to the Hive, fellow worker bees!
+
+I'm the Queen Bee, and this is The WASP's Nest - bringing you the latest
+buzz from WASP2 development.
+
+Today's Buzz Report covers version [VERSION], and we have some exciting
+news from the colony!
+
+---
+
+## 🌸 Foraging: New Features
+
+[excited waggle]
+
+The worker bees have been busy foraging for new capabilities...
+
+### Feature Name
+
+[Description of new feature]
+
+[technical tone]
+From a technical perspective, this means [technical details].
+
+---
+
+## 🏗️ Building: Improvements
+
+[precise tone]
+
+The architects of the hive have been building...
+
+### Improvement Name
+
+[Description of improvement]
+
+---
+
+## 🛡️ Defending: Bug Fixes
+
+[satisfied celebration]
+
+Our defenders have squashed some pesky bugs...
+
+### Bug Name
+
+[Description of bug fix]
+
+Buzz buzz! Another one bites the dust.
+
+---
+
+## 🌺 Pollinating: Community
+
+[playful buzz]
+
+Cross-pollination with the broader ecosystem...
+
+### Contribution/Integration
+
+[Description]
+
+---
+
+## 📊 Illumination
+
+```mermaid
+graph LR
+ A[Previous Version] --> B[This Release]
+ B --> C[New Feature 1]
+ B --> D[Improvement 1]
+ B --> E[Bug Fix 1]
+```
+
+---
+
+## 🐝 Closing
+
+[pause]
+
+And that's the buzz for version [VERSION], worker bees!
+
+Remember:
+- [Key takeaway 1]
+- [Key takeaway 2]
+
+Keep building, keep buzzing!
+May your reads map true and your alleles balance.
+
+From the WASP's Nest, this is the Queen Bee.
+
+Buzz out! 🐝
+
+---
+
+## Episode Metadata
+
+```yaml
+episode:
+ number: [NUMBER]
+ version: "[VERSION]"
+ date: "[DATE]"
+ duration_estimate: "5-7 minutes"
+ chapters:
+ - name: "Foraging"
+ topics: []
+ - name: "Building"
+ topics: []
+ - name: "Defending"
+ topics: []
+ - name: "Pollinating"
+ topics: []
+```
diff --git a/docs/source/_static/podcast/chronicles/episode-001-origin-swarm.md b/docs/source/_static/podcast/chronicles/episode-001-origin-swarm.md
new file mode 100644
index 0000000..7b7e977
--- /dev/null
+++ b/docs/source/_static/podcast/chronicles/episode-001-origin-swarm.md
@@ -0,0 +1,149 @@
+# Buzz Report: The Origin Swarm
+# Episode: 001 | The WASP Chronicles
+# Date: 2026-02-03
+
+---
+
+## Opening
+
+Welcome to the Hive, fellow worker bees.
+
+I'm the Queen Bee, and this is The WASP's Nest. Today we're bringing you something special. Instead of our usual release notes, we're going back to the beginning. This is Episode One of The WASP Chronicles... where we trace the lineage of our hive.
+
+Today's Buzz Report takes us back to 2015... when the first WASP was born.
+
+---
+
+## The Problem: Mapping Bias
+
+Picture this, worker bees. You're a researcher trying to understand which version of a gene is more active. You sequence RNA from cells, map those reads to the genome, and count how many come from each allele.
+
+Simple, right?... Wrong.
+
+Here's the sting. Reads carrying the reference allele map differently than reads carrying the alternate allele. If your read has a variant that doesn't match the reference genome, the aligner might map it to the wrong place... give it a lower quality score... or fail to map it entirely.
+
+This creates systematic bias toward the reference allele. And when you're looking for allele-specific expression?... That bias looks exactly like the biological signal you're hunting for.
+
+False positives everywhere. Real signals getting buried.
+
+---
+
+## The Foraging: A Clever Solution
+
+In 2015, a team of brilliant researchers at Stanford and the University of Chicago forged a solution. Bryce van de Geijn, Graham McVicker, Yoav Gilad, and Jonathan Pritchard published their landmark paper in Nature Methods.
+
+The title... "WASP: allele-specific software for robust molecular quantitative trait locus discovery."
+
+Their approach was elegantly simple. The WASP Read Filtering Strategy works in four steps.
+
+First... find reads overlapping variants. Identify which reads touch heterozygous sites.
+
+Second... swap the alleles. Create an alternate version of each read with the other allele.
+
+Third... remap both versions. Send both through the aligner.
+
+Fourth... filter discordant reads. If they don't map to the same place with the same quality... throw them out.
+
+The genius of this approach is clear. Any read that maps differently depending on which allele it carries is biased by definition. By removing these reads... you eliminate the bias at its source.
+
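+To make the rule concrete, here is a minimal Python sketch of the swap-and-compare idea. The function names and the (chrom, pos, mapq) tuples are illustrative, not the original WASP code.
+
+```python
+def swap_allele(seq: str, offset: int, ref: str, alt: str) -> str:
+    """Return the read sequence with the other allele substituted (SNV case)."""
+    assert seq[offset] == ref, "read does not carry the expected allele here"
+    return seq[:offset] + alt + seq[offset + 1:]
+
+
+def keep_read(orig_hit: tuple, swapped_hit: tuple) -> bool:
+    """WASP's discordance rule: keep a read only if both versions map identically."""
+    return orig_hit == swapped_hit
+
+
+# A read carrying the REF base 'A' at read offset 3:
+swapped = swap_allele("CCGATTGC", 3, "A", "G")  # -> "CCGGTTGC"
+# Both versions go back through the aligner; the two alignments are then compared:
+print(keep_read(("chr1", 100, 60), ("chr1", 100, 60)))  # True  -> keep
+print(keep_read(("chr1", 100, 60), ("chr7", 500, 30)))  # False -> biased, discard
+```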
+---
+
+## Building: The Combined Haplotype Test
+
+But wait... there's more. The original WASP didn't just fix mapping bias. It introduced a powerful statistical test called the Combined Haplotype Test... or CHT.
+
+Traditional approaches tested either read depth... does a genetic variant affect total expression?... or allelic imbalance... among heterozygotes, is one allele more expressed?
+
+The CHT combined both signals into a single test.
+
+The test integrates across individuals, combining total read counts at the gene level... allele-specific read counts at heterozygous sites within the gene... and proper handling of overdispersion using a beta-binomial model.
+
+This gave substantially more power to detect expression QTLs than either approach alone.
+
+---
+
+## The Original Architecture
+
+The 2015 WASP was built for its era.
+
+The technology stack included Python 3.x with C extensions... about 77 percent Python and 19 percent C. HDF5 format for variant storage via PyTables. NumPy and SciPy for numerical computation. And pysam for BAM file handling.
+
+The tools were straightforward. snp2h5 converted VCF files to HDF5 format. find_intersecting_snps.py found reads overlapping variants. filter_remapped_reads.py removed biased reads after remapping. And combined_test.py ran the CHT for QTL discovery.
+
+The HDF5 requirement was pragmatic for 2015... it offered fast random access to millions of variants. But it also meant users had to convert their VCF files before running the pipeline.
+
+---
+
+## Deep Dive: The Science
+
+For the bioinformaticians in the hive... let's go deeper.
+
+The key insight was modeling read mapping as a stochastic process. Given a heterozygous site with alleles A and B, a read carrying allele A might have mapping probability P_A... while the same read with allele B has probability P_B.
+
+If P_A is not equal to P_B... that read is biased. By simulating the alternate allele and testing empirically, WASP avoided the need to model aligner behavior analytically.
+
+The CHT used a likelihood ratio test. The null hypothesis states no genetic effect... expression is independent of genotype. The alternative hypothesis states a genetic effect is present... a QTL exists.
+
+The test statistic follows a chi-squared distribution under the null... with overdispersion handled by the beta-binomial model for allelic counts.
+
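+The arithmetic of that last step fits in a few lines. A sketch with made-up log-likelihood values:
+
+```python
+from scipy.stats import chi2
+
+ll_null = -1042.7  # log-likelihood under H0: no genetic effect (illustrative)
+ll_alt = -1035.2   # log-likelihood under H1: a QTL exists (illustrative)
+lr_stat = 2.0 * (ll_alt - ll_null)  # likelihood-ratio statistic
+print(chi2.sf(lr_stat, df=1))       # chi-squared survival function, 1 df
+```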
+---
+
+## The Impact
+
+The original WASP made a lasting mark.
+
+529 commits over four-plus years of development. 111 stars on GitHub at github.com slash bmvdgeijn slash WASP. Last release v0.3.4 in April 2019. And cited by hundreds of eQTL and ASE studies worldwide.
+
+But perhaps most importantly... it established the fundamental approach that all subsequent allele-specific analysis tools would build upon.
+
+---
+
+## Closing
+
+And that's the buzz on where it all began, worker bees.
+
+The original WASP showed us that mapping bias isn't just a nuisance... it's a fundamental problem that requires a principled solution. By swapping alleles and filtering discordant reads, van de Geijn and colleagues gave the field a tool that remains influential a decade later.
+
+The key takeaways from this episode. Mapping bias is real and can masquerade as biological signal. The WASP filtering strategy removes bias at its source. And combining read depth and allelic imbalance increases statistical power.
+
+In our next episode... we'll see how the McVicker Lab took these foundational ideas and built something new.
+
+Keep building... keep buzzing. May your reads map true and your alleles balance.
+
+From the WASP's Nest... this is the Queen Bee.
+
+Buzz out.
+
+---
+
+## Episode Metadata
+
+```yaml
+episode:
+ number: 1
+ title: "The Origin Swarm"
+ subtitle: "Original WASP (2015)"
+ series: "The WASP Chronicles"
+ date: "2026-02-03"
+ duration_estimate: "8-10 minutes"
+ source_paper:
+ title: "WASP: allele-specific software for robust molecular quantitative trait locus discovery"
+ authors: ["van de Geijn B", "McVicker G", "Gilad Y", "Pritchard JK"]
+ journal: "Nature Methods"
+ year: 2015
+ pmid: 26366987
+ doi: "10.1038/nmeth.3582"
+ source_repo: "https://github.com/bmvdgeijn/WASP"
+ note: "The original WASP used the Combined Haplotype Test (CHT). WASP2 replaced CHT with a beta-binomial model for allelic imbalance detection."
+ chapters:
+ - name: "The Problem"
+ topics: ["mapping bias", "allele-specific analysis", "false positives"]
+ - name: "Foraging"
+ topics: ["WASP filtering", "allele swapping", "read remapping"]
+ - name: "Building"
+ topics: ["Combined Haplotype Test", "beta-binomial", "QTL detection"]
+ - name: "Deep Dive"
+ topics: ["statistical model", "likelihood ratio test"]
+ - name: "Impact"
+ topics: ["citations", "field influence"]
+```
diff --git a/docs/source/_static/podcast/chronicles/episode-002-new-hive.md b/docs/source/_static/podcast/chronicles/episode-002-new-hive.md
new file mode 100644
index 0000000..4b1ad89
--- /dev/null
+++ b/docs/source/_static/podcast/chronicles/episode-002-new-hive.md
@@ -0,0 +1,170 @@
+# Buzz Report: Building the New Hive
+# Episode: 002 | The WASP Chronicles
+# Date: 2026-02-03
+
+---
+
+## Opening
+
+Welcome to the Hive, fellow worker bees.
+
+I'm the Queen Bee, and this is The WASP's Nest. Today we continue The WASP Chronicles with Episode Two... Building the New Hive.
+
+In our last episode, we explored the original WASP from 2015... a groundbreaking tool that solved mapping bias. But by 2021, the field had evolved. Single-cell technologies exploded. VCF files became the universal standard. And a new generation of researchers needed modern tools.
+
+This is the story of how WASP2 was born at the McVicker Lab.
+
+---
+
+## The Call to Rebuild
+
+Let's set the scene. It's late 2021 at the Salk Institute. The original WASP is still widely used... but showing its age.
+
+The pain points were real. Researchers had to convert every VCF file to HDF5 format before running any analysis. Single-cell experiments? Not supported. The command-line tools were scattered Python scripts with inconsistent interfaces. Dependencies were becoming harder to manage. And performance bottlenecks were slowing down large-scale studies.
+
+Researchers were spending more time wrestling with file formats... than doing actual biology.
+
+But there was opportunity. VCF and BCF had become universal standards. Single-cell ATAC-seq and RNA-seq were now mainstream. Modern Python packaging... with pyproject.toml, typer, and rich... had made CLI development elegant. The core algorithms were still sound. Only the interface needed modernization.
+
+---
+
+## Foraging: The New Design
+
+Aaron Ho, working with the McVicker Lab, established a new repository... mcvickerlab WASP2. The vision was clear from day one.
+
+The design principles were straightforward. First... no format conversion. Read VCF and BCF files directly. Eliminate the HDF5 step entirely. Second... a unified CLI. One tool with many subcommands, like git. Third... single-cell native support. First-class handling for scATAC and scRNA experiments. Fourth... modern packaging. A simple pip install. Clean dependencies. No headaches.
+
+Here's what the transformation looked like in practice. The old way required multiple tools... snp2h5 to convert variants... find intersecting snps dot py to identify overlaps... filter remapped reads dot py for the filtering step. Multiple commands, multiple outputs, multiple opportunities for confusion.
+
+The new way is elegantly simple. wasp2-count for counting alleles at variant sites. wasp2-map for the mapping bias correction pipeline. wasp2-analyze for detecting allelic imbalance. Clean. Intuitive. No HDF5 in sight.
+
+---
+
+## Building: The Architecture
+
+The architects of WASP2 made thoughtful choices about the new hive's structure.
+
+For the command-line interface, they chose Typer. Modern argument parsing with automatic help generation and shell completion. Each subcommand became a focused tool. wasp2-count handles allele counting at heterozygous variant sites. wasp2-map provides the unbiased read mapping pipeline. wasp2-analyze runs statistical analysis for detecting allelic imbalance. And wasp2-ipscore enables QTL scoring workflows.
+
+For terminal output, they integrated Rich. Beautiful progress bars, colored output, and informative error messages. No more walls of text flooding the terminal.
+
+For single-cell support, they built native AnnData integration. The scanpy ecosystem's data structure became a first-class citizen. Single-cell researchers could take WASP2 output and flow directly into downstream analysis.
+
+The module organization reflects this clarity. The counting module handles allele counting at heterozygous sites. The mapping module manages the read filtering pipeline. The analysis module contains the statistical models... specifically the beta-binomial distribution for detecting allelic imbalance. And the I/O module supports VCF, BCF, and even the high-performance PGEN format.
+
+Pure Python... cleanly organized... well-documented.
+
+---
+
+## Defending: The Statistical Heart
+
+One thing WASP2 never compromised on... the core science.
+
+The mapping bias correction strategy remained unchanged from the original. Find reads overlapping heterozygous variants. Swap the alleles in the read sequence. Remap both versions. Filter out any reads that map differently. Simple. Principled. Effective.
+
+But the statistical analysis evolved. While the original WASP used the Combined Haplotype Test... WASP2 took a different approach. The new analysis module centers on the beta-binomial distribution.
+
+Here's why this matters. When you count alleles at a heterozygous site, you expect roughly fifty-fifty between reference and alternate. But biological and technical variation create overdispersion... more variance than a simple binomial would predict. The beta-binomial model captures this elegantly with two parameters. Mu represents the mean imbalance probability. Rho captures the dispersion.
+
+WASP2 fits these parameters using likelihood optimization, then runs a likelihood ratio test. The null hypothesis... no allelic imbalance, mu equals 0.5. The alternative... imbalance exists. The test statistic follows a chi-squared distribution... giving you a p-value you can trust.
+
+The model supports both phased and unphased genotypes. For phased data, the optimization is direct. For unphased data, a clever dynamic programming approach averages over possible phase configurations.
+
+This is the scientific heart of WASP2. Robust statistical testing... properly accounting for overdispersion... with principled inference.
+
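+A minimal sketch of this test in Python, using scipy's beta-binomial and treating the dispersion rho as fixed for brevity... WASP2's own optimization is more involved:
+
+```python
+from scipy.optimize import minimize_scalar
+from scipy.stats import betabinom, chi2
+
+ref, alt = 42, 18          # allele counts at one heterozygous site (illustrative)
+n, rho = ref + alt, 0.05   # total depth and an assumed dispersion
+
+
+def neg_ll(mu: float) -> float:
+    # Map (mu, rho) onto the (a, b) shape parameters of the beta-binomial
+    a = mu * (1.0 - rho) / rho
+    b = (1.0 - mu) * (1.0 - rho) / rho
+    return -betabinom.logpmf(ref, n, a, b)
+
+
+fit = minimize_scalar(neg_ll, bounds=(1e-4, 1 - 1e-4), method="bounded")
+lr_stat = 2.0 * (neg_ll(0.5) - fit.fun)  # H0: mu = 0.5 vs the fitted mu
+print("mu_hat =", round(fit.x, 3), "p =", chi2.sf(lr_stat, df=1))
+```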
+---
+
+## Deep Dive: VCF Native
+
+For the technically curious bees... let's explore the VCF handling innovation.
+
+The original WASP used HDF5 because random access to variants was critical. You need to quickly look up which variants overlap each read. HDF5 provided indexed arrays for this.
+
+WASP2 solved this problem differently. VCF indexing via tabix provides genomic coordinate indexing through the tbi files. Pysam's TabixFile class enables fast region queries without any format conversion. And for maximum speed, the cyvcf2 backend offers C-accelerated VCF parsing... roughly seven times faster than pure Python.
+
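+In pysam, an indexed region query looks roughly like this... the file name and region are illustrative:
+
+```python
+import pysam
+
+vcf = pysam.VariantFile("variants.vcf.gz")  # the .tbi/.csi index is used automatically
+for rec in vcf.fetch("chr1", 1_000_000, 1_010_000):
+    print(rec.chrom, rec.pos, rec.ref, rec.alts)
+```
+
+No conversion step, no intermediate files... just a coordinate query against the index.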
+But WASP2 went further. Beyond VCF, the BCF format... the binary version of VCF... offers another seven-fold speedup through native binary parsing. And for the ultimate performance, PGEN format support via Pgenlib delivers a stunning twenty-five times speedup over standard VCF.
+
+Users can keep their existing files... no conversion pipeline required. Just choose the format that matches your performance needs.
+
+---
+
+## Pollinating: The Ecosystem
+
+WASP2 was designed to play nicely with the broader bioinformatics ecosystem.
+
+For inputs... BAM or CRAM files from any aligner. VCF, BCF, or PGEN from any variant caller or imputation pipeline. Standard FASTQ for the remapping step.
+
+For outputs... TSV files for simple downstream processing. Parquet for efficient columnar storage and fast queries. And AnnData in H5AD format for seamless single-cell integration.
+
+The interoperability is deliberate. Standard bcftools and samtools compatibility. Integration with the scanpy and AnnData ecosystem. Bioconda packaging for easy installation.
+
+WASP2 didn't reinvent wheels... it connected them.
+
+---
+
+## The Timeline
+
+The journey from concept to release tells a story of steady progress.
+
+December 2021... the repository was established. Through 2022... the core counting and mapping modules took shape. In 2023... single-cell support arrived alongside robust testing infrastructure. September 2024 marked the v1.0.0 official release. November 2024 brought v1.1.0... and the beginning of Rust acceleration.
+
+That performance revolution... that's a story for our next episode.
+
+---
+
+## Closing
+
+And that's the buzz on building the new hive, worker bees.
+
+WASP2 represented a modern reimagining of the original vision. Same proven science for mapping bias correction. New accessible interface for modern workflows. The McVicker Lab took a decade of lessons learned and built something that feels native to 2020s research.
+
+The key insights from this chapter... Modernization doesn't mean reinvention. The core science remained. Developer experience matters... unified CLI, no format conversion, clean outputs. And ecosystem integration accelerates adoption.
+
+In our next episode... we'll witness the Rust metamorphosis. When WASP2 learned to fly at lightning speed.
+
+Keep building... keep buzzing. May your reads map true and your alleles balance.
+
+From the WASP's Nest... this is the Queen Bee.
+
+Buzz out.
+
+---
+
+## Episode Metadata
+
+```yaml
+episode:
+ number: 2
+ title: "Building the New Hive"
+ subtitle: "McVicker Lab WASP2"
+ series: "The WASP Chronicles"
+ date: "2026-02-03"
+ duration_estimate: "10-12 minutes"
+ source_repo: "https://github.com/mcvickerlab/WASP2"
+ authors:
+ - "Aaron Ho - Creator of WASP2"
+ - "Jeff Jaureguy - Developer and maintainer"
+ - "McVicker Lab, Salk Institute"
+ timeline:
+ established: "2021-12"
+ v1_release: "2024-09"
+ v1_1_release: "2024-11"
+ technical_highlights:
+ - "Beta-binomial model for allelic imbalance (NOT CHT)"
+ - "VCF/BCF/PGEN native support (no HDF5)"
+ - "Single-cell via AnnData/H5AD"
+ - "Unified CLI: wasp2-count, wasp2-map, wasp2-analyze, wasp2-ipscore"
+ chapters:
+ - name: "The Call"
+ topics: ["modernization", "pain points", "opportunity"]
+ - name: "Foraging"
+ topics: ["design principles", "unified CLI", "no HDF5"]
+ - name: "Building"
+ topics: ["Typer", "Rich", "AnnData", "module organization"]
+ - name: "Defending"
+ topics: ["beta-binomial model", "likelihood ratio test", "phased/unphased"]
+ - name: "Deep Dive"
+ topics: ["VCF native", "BCF 7x", "PGEN 25x", "pysam", "cyvcf2"]
+ - name: "Pollinating"
+ topics: ["ecosystem integration", "format support", "AnnData output"]
+```
diff --git a/docs/source/_static/podcast/chronicles/episode-003-rust-metamorphosis.md b/docs/source/_static/podcast/chronicles/episode-003-rust-metamorphosis.md
new file mode 100644
index 0000000..60ac24c
--- /dev/null
+++ b/docs/source/_static/podcast/chronicles/episode-003-rust-metamorphosis.md
@@ -0,0 +1,245 @@
+# Buzz Report: The Rust Metamorphosis
+# Episode: 003 | The WASP Chronicles
+# Date: 2026-02-03
+
+---
+
+## Opening
+
+Welcome to the Hive, fellow worker bees.
+
+I'm the Queen Bee, and this is The WASP's Nest. Today we conclude The WASP Chronicles with Episode Three... The Rust Metamorphosis.
+
+WASP2 was modern and accessible. But in late 2024, a new challenge emerged... scale. Researchers wanted to analyze hundreds of samples. Thousands of cells. Millions of reads. And Python, for all its elegance, was becoming the bottleneck.
+
+This is the story of how WASP2 learned to fly at the speed of compiled code.
+
+---
+
+## The Performance Problem
+
+Let's talk about the numbers that drove this transformation.
+
+The bottleneck analysis was revealing. BAM-BED intersection using pybedtools took 152 seconds... just to find which reads overlap which variants. When you're running this on dozens of samples, those minutes become hours. Those hours become days.
+
+The root causes were clear. First... pybedtools overhead. Creating intermediate files, spawning subprocess calls. Second... Python string operations in the hot path. Allele swapping happening character by character. Third... GIL limitations. Single-threaded execution despite multi-core machines sitting idle. Fourth... repeated VCF parsing. Reading the same variants over and over for every BAM file.
+
+The algorithms were sound. The implementation was the constraint.
+
+---
+
+## The Rust Revolution
+
+Enter Rust... a systems programming language with zero-cost abstractions, memory safety without garbage collection, fearless concurrency, and C-level performance.
+
+And critically... PyO3. A library that lets Rust code be called from Python seamlessly.
+
+The decision wasn't to rewrite everything in Rust. It was surgical. Rewrite the three things that matter most. BAM-variant intersection. Allele counting with INDEL support. And statistical analysis using the beta-binomial model.
+
+Leave the CLI, file I/O orchestration, and user-facing code in Python.
+
+---
+
+## Foraging: The Rust Modules
+
+Over ten thousand lines of Rust code later, WASP2 had its acceleration modules.
+
+### bam_intersect.rs: The Speed Demon
+
+This module replaced pybedtools with pure Rust and a secret weapon... COITrees. Cache-Oblivious Interval Trees. Fifty to one hundred times faster than BEDTools for genomic interval queries. Memory-efficient even for millions of intervals.
+
+The performance gain speaks for itself. 152 seconds drops to 2 or 3 seconds. That's a 50 to 75 times speedup on the most expensive operation in the pipeline.
+
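+The underlying idea... query an index instead of scanning everything... can be sketched in Python with sorted positions and binary search. COITrees does the same job for full intervals, in Rust, with a cache-friendly memory layout:
+
+```python
+import bisect
+
+starts = [100, 250, 300, 900]  # sorted variant positions (SNVs, illustrative)
+
+
+def overlapping(read_start: int, read_end: int) -> list[int]:
+    lo = bisect.bisect_left(starts, read_start)
+    hi = bisect.bisect_right(starts, read_end)
+    return starts[lo:hi]  # variants covered by the read
+
+
+print(overlapping(240, 310))  # -> [250, 300]
+```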
+### bam_counter.rs: Parallel Counting with INDEL Support
+
+The core allele counting engine received a major upgrade... full INDEL support.
+
+Not just SNPs anymore. Proper CIGAR string interpretation. Insertion and deletion allele matching with variable-length sequences. The counting logic handles reference and alternate alleles of any length.
+
+And it runs in parallel. Rayon-powered multi-threading chunks the BAM file by genomic region and aggregates results with lock-free data structures. Performance scales linearly with CPU cores.
+
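+The chunk-by-region, aggregate-at-the-end pattern looks like this when sketched in Python... the real counter is Rayon-parallel Rust, and count_region here is a stand-in:
+
+```python
+from collections import Counter
+from concurrent.futures import ProcessPoolExecutor
+
+
+def count_region(region: str) -> Counter:
+    # Stand-in: a real implementation counts ref/alt alleles for reads in one region
+    return Counter({f"{region}:ref": 10, f"{region}:alt": 8})
+
+
+if __name__ == "__main__":
+    regions = [f"chr{i}" for i in range(1, 23)]
+    with ProcessPoolExecutor() as pool:  # processes sidestep the GIL for CPU-bound work
+        totals = sum(pool.map(count_region, regions), Counter())
+    print(sum(totals.values()), "alleles counted")
+```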
+### analysis.rs: The Beta-Binomial Engine
+
+The statistical analysis module brings precision to allelic imbalance detection.
+
+The beta-binomial distribution is the right model for this problem. When counting alleles at heterozygous sites, you expect roughly fifty-fifty. But biological and technical variation create overdispersion... more variance than a simple binomial predicts.
+
+The beta-binomial captures this elegantly. The likelihood ratio test compares the null hypothesis... no imbalance, mu equals 0.5... against the alternative where imbalance exists. P-values come from the chi-squared distribution.
+
+Performance improvement... 2.7 seconds down to 0.5 seconds. A five times speedup on the statistical core.
+
+### bam_remapper.rs: CIGAR Wizardry
+
+For the mapping bias correction pipeline, the bam_remapper module handles the tricky work. CIGAR-aware read manipulation. Proper handling of soft clips, insertions, and deletions. Quality score preservation during allele swapping.
+
+This is the heart of the WASP filtering strategy... now running at compiled speed.
+
+---
+
+## Building: The Integration
+
+The PyO3 bridge made Rust feel native to Python. From the user's perspective... same CLI. Same Python API. Just faster.
+
+Under the hood, Python calls Rust seamlessly. The fast path goes through compiled code for counting alleles, intersecting intervals, and running statistical tests. All the orchestration, configuration, and user interface stays in Python where it belongs.
+
+The best optimizations are invisible to users.
+
+---
+
+## Deep Dive: The Benchmark Numbers
+
+For the performance engineers in the hive, here are the verified benchmarks.
+
+BAM-BED intersection... 50 to 75 times faster with COITrees. Statistical analysis... 5 times faster with the Rust beta-binomial implementation. VCF parsing with cyvcf2... 7 times faster than pure Python. PGEN format support via Pgenlib... 25 times faster than standard VCF. The full pipeline end-to-end... about 10 times faster overall.
+
+And the WASP filtering operation that replaced GATK AlleleCounter... 61 times faster with validation showing r-squared greater than 0.99. The results match. The speed doesn't.
+
+### New Capabilities Enabled
+
+The performance gains enabled capabilities that weren't practical before. Full INDEL support means insertions and deletions work throughout the pipeline... counting, filtering, statistical testing. Multi-format auto-detection handles VCF, BCF, or PGEN files transparently. Single-cell scale processes millions of cells without memory issues. Streaming processing maintains constant memory usage regardless of input size.
+
+The Rust modules didn't just make WASP2 faster. They made analyses possible that weren't before.
+
+---
+
+## The Architecture Insight
+
+There's a philosophy embedded in this design.
+
+We didn't rewrite everything in Rust. We rewrote the three things that matter most.
+
+What stayed in Python... CLI argument parsing, because Typer is excellent. High-level workflow orchestration. Configuration and user-facing messages. I/O format detection and dispatch.
+
+What moved to Rust... inner loops over millions of reads. Interval tree operations. Statistical log-likelihood calculations. CIGAR string manipulation.
+
+The 80/20 rule in action... only more extreme here. Ten percent of the code was responsible for ninety-five percent of the runtime.
+
+---
+
+## Pollinating: The Deployment Ecosystem
+
+The Rust metamorphosis wasn't just about speed. It was about making WASP2 deployable everywhere.
+
+### Nextflow Pipelines
+
+Four production-ready Nextflow DSL2 pipelines emerged from this work.
+
+nf-rnaseq handles bulk RNA-seq allele-specific expression. nf-atacseq processes bulk ATAC-seq for chromatin accessibility analysis. nf-scatac scales to single-cell ATAC-seq experiments. nf-outrider integrates with the OUTRIDER framework for outlier detection.
+
+Each pipeline integrates WASP2's CLI tools into reproducible workflows with automatic resource management.
+
+### Container Support
+
+For Docker... a simple pull and run gives you the full WASP2 environment. Multi-stage builds with Rust compilation produce optimized images.
+
+For Singularity and Apptainer... HPC-ready containers that work on clusters without root access. Pull the Docker image, convert to SIF format, and run anywhere.
+
+### Distribution Channels
+
+pip install wasp2... one command to get started. Rust extensions compile automatically via maturin. Pre-built wheels for common platforms eliminate the toolchain requirement for most users.
+
+conda install from bioconda... native integration with the bioinformatics conda ecosystem.
+
+---
+
+## The Current State
+
+As of early 2026, WASP2 represents a complete production ecosystem.
+
+By the numbers... over ten thousand lines of Rust. 50 to 100 times faster intersection. 61 times faster WASP filtering. Full INDEL support for insertions and deletions. Multi-format handling with VCF, BCF, and PGEN auto-detection. Beta-binomial statistical model with phased and unphased support. Single-cell capabilities at scale. Four Nextflow pipelines. Docker and Singularity containers. PyPI and Bioconda packages.
+
+The transformation is complete.
+
+---
+
+## Closing
+
+And that's the buzz on the Rust metamorphosis, worker bees.
+
+We've traveled from 2015 to 2026. From Python to Rust. From a research tool to an enterprise-ready pipeline. The journey of WASP shows how good science and good engineering evolve together.
+
+The arc of WASP tells a clear story. 2015 was about solving mapping bias... the science. 2021 was about modernizing the interface... the developer experience. 2024 through 2026 was about achieving scale... the performance.
+
+The key insights from this chapter. Surgical optimization beats total rewrite. The algorithms were always sound... execution speed was the constraint. And 50 to 100 times speedups come from choosing the right data structures... COITrees for interval queries, Rayon for parallelism, beta-binomial for statistics.
+
+The WASP has completed its metamorphosis. From larva to adult. From concept to production.
+
+Keep building... keep buzzing. May your reads map true and your alleles balance.
+
+From the WASP's Nest... this is the Queen Bee.
+
+Buzz out.
+
+---
+
+## Episode Metadata
+
+```yaml
+episode:
+ number: 3
+ title: "The Rust Metamorphosis"
+ subtitle: "High Performance & Deployment"
+ series: "The WASP Chronicles"
+ date: "2026-02-03"
+ duration_estimate: "12-15 minutes"
+ version: "1.3.0"
+ source_repos:
+ - "mcvickerlab/WASP2 (upstream)"
+ - "Jaureguy760/WASP2-final (production)"
+ authors:
+ - "Aaron Ho - Creator of WASP2"
+ - "Jeff Jaureguy - Rust acceleration, CI/CD, packaging"
+ - "McVicker Lab, Salk Institute"
+ rust_modules:
+ - name: "bam_counter.rs"
+ purpose: "Parallel allele counting with full INDEL support"
+ speedup: "10-50x"
+ - name: "bam_filter.rs"
+ purpose: "WASP filtering (replaces GATK AlleleCounter)"
+ speedup: "61x"
+ - name: "bam_intersect.rs"
+ purpose: "COITree interval trees for BAM-variant intersection"
+ speedup: "50-75x (15-30x documented)"
+ - name: "bam_remapper.rs"
+ purpose: "CIGAR-aware allele swapping for remapping"
+ - name: "analysis.rs"
+ purpose: "Beta-binomial statistical model"
+ speedup: "~5x"
+ performance_gains:
+ wasp_filtering: "61x (r² > 0.99 validation)"
+ bam_bed_intersect: "15-30x (coitrees vs pybedtools)"
+ allele_counting: "10-50x"
+ vcf_parsing: "7x (with cyvcf2)"
+ pgen_format: "25x (with Pgenlib)"
+ key_features:
+ - "Full INDEL support (variable-length alleles)"
+ - "Beta-binomial model (NOT CHT)"
+ - "Phased and unphased genotype support"
+ - "Single-cell scale processing"
+ - "Multi-format: VCF/BCF/PGEN auto-detection"
+ deployment:
+ nextflow_pipelines:
+ - "nf-rnaseq (bulk RNA-seq ASE)"
+ - "nf-atacseq (bulk ATAC-seq ASOC)"
+ - "nf-scatac (single-cell ATAC-seq)"
+ - "nf-outrider (outlier detection)"
+ containers:
+ - "Docker (ghcr.io/jaureguy760/wasp2-final)"
+ - "Singularity/Apptainer"
+ packages:
+ - "PyPI (pip install wasp2)"
+ - "Bioconda (conda install wasp2)"
+ chapters:
+ - name: "The Problem"
+ topics: ["performance bottlenecks", "pybedtools overhead", "GIL limitations"]
+ - name: "The Revolution"
+ topics: ["Rust language", "PyO3 integration", "surgical optimization"]
+ - name: "Foraging"
+ topics: ["bam_counter.rs", "bam_intersect.rs", "analysis.rs", "COITrees"]
+ - name: "Building"
+ topics: ["Python/Rust boundary", "invisible optimization"]
+ - name: "Deep Dive"
+ topics: ["benchmark numbers", "INDEL support", "new capabilities"]
+ - name: "Pollinating"
+ topics: ["Nextflow pipelines", "Docker", "Singularity", "PyPI"]
+```
diff --git a/docs/source/_static/podcast/enhance_audio.py b/docs/source/_static/podcast/enhance_audio.py
new file mode 100644
index 0000000..0720aab
--- /dev/null
+++ b/docs/source/_static/podcast/enhance_audio.py
@@ -0,0 +1,415 @@
+#!/usr/bin/env python3
+"""
+Audio post-processing pipeline for WASP's Nest podcast.
+
+Applies professional audio enhancement using ffmpeg:
+1. Noise reduction (afftdn filter)
+2. High-pass filter (remove rumble < 80Hz)
+3. Low-pass filter (remove hiss > 12kHz)
+4. Compression (reduce dynamic range)
+5. Loudness normalization (podcast standard: -16 LUFS)
+
+Requirements:
+ - ffmpeg with libavfilter (auto-detects static-ffmpeg if installed)
+
+Usage:
+ python enhance_audio.py # Enhance all episodes
+ python enhance_audio.py --episode 2 # Enhance specific episode
+ python enhance_audio.py --dry-run # Show commands without running
+ python enhance_audio.py --verbose # Verbose output
+"""
+
+from __future__ import annotations
+
+import argparse
+import logging
+import os
+import shutil
+import subprocess
+import sys
+import tempfile
+from collections.abc import Iterator
+from contextlib import contextmanager
+from pathlib import Path
+
+# Configure logging
+logging.basicConfig(
+ format="%(asctime)s [%(levelname)s] %(message)s",
+ datefmt="%H:%M:%S",
+)
+logger = logging.getLogger(__name__)
+
+SCRIPT_DIR = Path(__file__).parent
+AUDIO_DIR = SCRIPT_DIR / "audio"
+ENHANCED_DIR = SCRIPT_DIR / "audio_enhanced"
+
+# Processing timeout (10 minutes per file)
+PROCESS_TIMEOUT_SECONDS = 600
+
+
+class AudioEnhanceError(Exception):
+ """Raised when audio enhancement fails."""
+
+ pass
+
+
+def find_ffmpeg() -> str:
+ """
+ Find ffmpeg executable, trying multiple sources.
+
+ Returns:
+ Path to ffmpeg executable
+
+ Raises:
+ AudioEnhanceError: If ffmpeg is not found
+ """
+ # Try system ffmpeg first
+ ffmpeg_path = shutil.which("ffmpeg")
+ if ffmpeg_path:
+ logger.debug(f"Found system ffmpeg: {ffmpeg_path}")
+ return ffmpeg_path
+
+ # Try static-ffmpeg package
+ try:
+ import static_ffmpeg
+ except ImportError:
+ logger.debug("static-ffmpeg package not installed, trying other ffmpeg sources")
+ else:
+ # Package is installed - failure here is an error, not a fallback
+ try:
+ ffmpeg_path, _ = static_ffmpeg.run.get_or_fetch_platform_executables_else_raise()
+ logger.debug(f"Found static-ffmpeg: {ffmpeg_path}")
+ return ffmpeg_path
+ except Exception as e:
+ raise AudioEnhanceError(
+ f"static-ffmpeg is installed but failed: {e}\n"
+ "Try: pip uninstall static-ffmpeg && pip install static-ffmpeg"
+ )
+
+ # Try common installation paths
+ common_paths = [
+ "/usr/bin/ffmpeg",
+ "/usr/local/bin/ffmpeg",
+ os.path.expanduser("~/.local/bin/ffmpeg"),
+ ]
+ for path in common_paths:
+ if os.path.isfile(path) and os.access(path, os.X_OK):
+ logger.debug(f"Found ffmpeg at: {path}")
+ return path
+
+ raise AudioEnhanceError(
+ "ffmpeg not found. Install with:\n"
+ " pip install static-ffmpeg && python -c 'import static_ffmpeg; static_ffmpeg.add_paths()'\n"
+ " or: conda install -c conda-forge ffmpeg\n"
+ " or: apt-get install ffmpeg"
+ )
+
+
+def build_ffmpeg_filter(add_fades: bool = True) -> str:
+ """
+ Build the ffmpeg audio filter chain for podcast enhancement.
+
+ Filter chain:
+ 1. afade in - Smooth fade-in to avoid abrupt TTS start (0.5s)
+ 2. afftdn - FFT-based noise reduction (reduces steady background noise)
+ 3. highpass - Remove low-frequency rumble (< 80Hz)
+ 4. lowpass - Remove high-frequency hiss (> 12kHz)
+ 5. firequalizer - De-esser for sibilance reduction (4-8kHz)
+ 6. acompressor - Dynamic range compression (voice clarity)
+ 7. loudnorm - EBU R128 loudness normalization (-16 LUFS for podcasts)
+
+ Args:
+ add_fades: Whether to add fade-in effect (default True)
+
+ Returns:
+ str: Comma-separated ffmpeg audio filter chain string
+ """
+ filters = []
+
+ # Fade in: smooth start to avoid jarring TTS beginning
+ # t=in means fade type, d=0.5 is duration in seconds
+ if add_fades:
+ filters.append("afade=t=in:st=0:d=0.5")
+
+ filters.extend(
+ [
+ # Noise reduction: removes steady background noise
+ # nr=12 = noise reduction strength, nf=-25 = noise floor threshold in dB
+ "afftdn=nr=12:nf=-25",
+ # High-pass filter: remove rumble below 80Hz
+ # Human voice fundamentals start ~85Hz, so 80Hz cutoff is safe
+ "highpass=f=80",
+ # Low-pass filter: attenuate frequencies above 12kHz
+ # Preserves voice clarity while removing high-freq artifacts
+ "lowpass=f=12000",
+ # De-esser: reduce sibilance (harsh 's' sounds common in TTS)
+ # Targets 4-8kHz range where sibilance occurs
+ "firequalizer=gain_entry='entry(4000,-2);entry(6000,-4);entry(8000,-2)'",
+ # Dynamic range compression for consistent volume
+ # threshold=-20dB, ratio=4:1, attack=5ms, release=50ms
+ "acompressor=threshold=-20dB:ratio=4:attack=5:release=50",
+ # Loudness normalization to podcast standard
+ # -16 LUFS is the standard for podcasts (Spotify, Apple Podcasts)
+ # TP=-1.5 = true peak limit to prevent clipping
+ "loudnorm=I=-16:TP=-1.5:LRA=11",
+ ]
+ )
+
+ # Note: Fade-out requires knowing audio duration, so we apply it separately
+ # using areverse,afade,areverse trick if needed (computationally expensive)
+
+ return ",".join(filters)
+
+
+@contextmanager
+def temp_file_context(suffix: str = ".mp3") -> Iterator[Path]:
+ """Context manager for temporary file with guaranteed cleanup."""
+ fd, path = tempfile.mkstemp(suffix=suffix)
+ os.close(fd)
+ temp_path = Path(path)
+ try:
+ yield temp_path
+ finally:
+ if temp_path.exists():
+ try:
+ temp_path.unlink()
+ except OSError as e:
+ logger.warning(f"Failed to cleanup temp file {temp_path}: {e}")
+
+
+def validate_audio_file(path: Path) -> None:
+ """
+ Validate that a file is a valid audio file.
+
+ Raises:
+ AudioEnhanceError: If file is invalid
+ """
+ if not path.exists():
+ raise AudioEnhanceError(f"File not found: {path}")
+
+ if not path.is_file():
+ raise AudioEnhanceError(f"Not a file: {path}")
+
+ # Check file size (minimum 1KB for valid audio)
+ size = path.stat().st_size
+ if size < 1024:
+ raise AudioEnhanceError(f"File too small ({size} bytes), may be corrupted: {path}")
+
+ # Check file extension
+ if path.suffix.lower() not in {".mp3", ".wav", ".m4a", ".ogg", ".flac"}:
+ logger.warning(f"Unusual audio extension: {path.suffix}")
+
+
+def enhance_audio(
+ input_file: Path, output_file: Path, ffmpeg_path: str, dry_run: bool = False
+) -> Path:
+ """
+ Apply audio enhancement to a single file.
+
+ Args:
+ input_file: Path to input audio file
+ output_file: Path for enhanced output
+ ffmpeg_path: Path to ffmpeg executable
+ dry_run: If True, print command without executing
+
+ Returns:
+ Path to the enhanced audio file
+
+ Raises:
+ AudioEnhanceError: If enhancement fails
+ """
+ # Validate input
+ validate_audio_file(input_file)
+
+ filter_chain = build_ffmpeg_filter()
+
+ cmd = [
+ ffmpeg_path,
+ "-y", # Overwrite output
+ "-i",
+ str(input_file),
+ "-af",
+ filter_chain,
+ "-c:a",
+ "libmp3lame", # MP3 output
+ "-b:a",
+ "192k", # 192kbps bitrate
+ "-ar",
+ "44100", # 44.1kHz sample rate
+ str(output_file),
+ ]
+
+ logger.info(f"Processing: {input_file.name}")
+
+ if dry_run:
+ print(f" Command: {' '.join(cmd)}")
+ return output_file
+
+ try:
+ result = subprocess.run(
+ cmd, capture_output=True, text=True, check=True, timeout=PROCESS_TIMEOUT_SECONDS
+ )
+ logger.debug(f"ffmpeg stdout: {result.stdout}")
+ except subprocess.TimeoutExpired:
+ raise AudioEnhanceError(
+ f"ffmpeg timed out after {PROCESS_TIMEOUT_SECONDS}s for {input_file.name}"
+ )
+ except subprocess.CalledProcessError as e:
+ raise AudioEnhanceError(f"ffmpeg failed for {input_file.name}: {e.stderr}")
+
+ # Validate output was created and is valid
+ if not output_file.exists():
+ raise AudioEnhanceError(f"Output file was not created: {output_file}")
+
+ output_size = output_file.stat().st_size
+ input_size = input_file.stat().st_size
+
+ # Output should be reasonably sized (at least 10% of input)
+ if output_size < input_size * 0.1:
+ raise AudioEnhanceError(
+ f"Output file suspiciously small ({output_size} bytes vs "
+ f"{input_size} bytes input), enhancement may have failed"
+ )
+
+ logger.info(f" -> Enhanced: {output_file.name} ({output_size / 1024 / 1024:.1f} MB)")
+ return output_file
+
+
+def validate_episode_number(value: str) -> int:
+ """Validate episode number is a positive integer."""
+ try:
+ episode = int(value)
+ if episode < 1 or episode > 999:
+ raise argparse.ArgumentTypeError(
+ f"Episode number must be between 1 and 999, got {episode}"
+ )
+ return episode
+ except ValueError:
+ raise argparse.ArgumentTypeError(f"Episode must be a number, got '{value}'")
+
+
+def main() -> int:
+ """Main entry point. Returns exit code."""
+ parser = argparse.ArgumentParser(
+ description="Enhance podcast audio with noise reduction and normalization"
+ )
+ parser.add_argument(
+ "--episode",
+ type=validate_episode_number,
+ help="Enhance only specific episode number (1-999)",
+ )
+ parser.add_argument(
+ "--dry-run", action="store_true", help="Show ffmpeg commands without running"
+ )
+ parser.add_argument(
+ "--in-place",
+ action="store_true",
+ help="Overwrite original files instead of creating enhanced copies",
+ )
+ parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose output")
+ parser.add_argument("--debug", action="store_true", help="Enable debug output")
+ args = parser.parse_args()
+
+ # Configure logging level
+ if args.debug:
+ logger.setLevel(logging.DEBUG)
+ elif args.verbose:
+ logger.setLevel(logging.INFO)
+ else:
+ logger.setLevel(logging.WARNING)
+
+ # Find ffmpeg
+ try:
+ ffmpeg_path = find_ffmpeg()
+ except AudioEnhanceError as e:
+ print(f"Error: {e}", file=sys.stderr)
+ return 1
+
+ # Determine output directory
+ if args.in_place:
+ output_dir = AUDIO_DIR
+ else:
+ output_dir = ENHANCED_DIR
+ output_dir.mkdir(exist_ok=True)
+
+ # Find audio files
+ if args.episode:
+ pattern = f"episode-{args.episode:03d}-*.mp3"
+ audio_files = list(AUDIO_DIR.glob(pattern))
+ if not audio_files:
+ print(f"No audio file found matching: {pattern}", file=sys.stderr)
+ return 1
+ else:
+ audio_files = sorted(AUDIO_DIR.glob("episode-*.mp3"))
+
+ if not audio_files:
+ print(f"No audio files found in {AUDIO_DIR}", file=sys.stderr)
+ return 1
+
+ print(f"Found {len(audio_files)} audio file(s)")
+ print(f"Output: {output_dir}")
+ print(f"ffmpeg: {ffmpeg_path}")
+ print("-" * 40)
+
+ errors = []
+ for audio_file in audio_files:
+ try:
+ if args.in_place:
+ # Create temp file, enhance, then replace original
+ # Use manual temp file management to preserve on move failure
+ fd, temp_path_str = tempfile.mkstemp(suffix=".mp3")
+ os.close(fd)
+ temp_file = Path(temp_path_str)
+ try:
+ enhance_audio(audio_file, temp_file, ffmpeg_path, args.dry_run)
+ if not args.dry_run:
+ try:
+ shutil.move(str(temp_file), str(audio_file))
+ except Exception as e:
+ # Keep enhanced file for recovery
+ backup_path = audio_file.with_suffix(".enhanced.mp3")
+ shutil.copy(str(temp_file), str(backup_path))
+ raise AudioEnhanceError(
+ f"Failed to replace original: {e}. "
+ f"Enhanced version saved to: {backup_path}"
+ )
+ finally:
+ # Only cleanup if file still exists (wasn't moved)
+ if temp_file.exists():
+ try:
+ temp_file.unlink()
+ except OSError:
+ pass
+ else:
+ output_file = output_dir / audio_file.name
+ enhance_audio(audio_file, output_file, ffmpeg_path, args.dry_run)
+ except AudioEnhanceError as e:
+ logger.error(str(e))
+ errors.append((audio_file.name, str(e)))
+ except Exception as e:
+ logger.exception(f"Unexpected error processing {audio_file.name}")
+ errors.append((audio_file.name, str(e)))
+
+ print("-" * 40)
+
+ if errors:
+ print(f"Completed with {len(errors)} error(s):")
+ for name, error in errors:
+ print(f" - {name}: {error}")
+ return 1
+
+ print("Done! Enhanced audio files in:", output_dir)
+ print()
+ print("Enhancement applied:")
+ print(" - Fade-in (0.5s smooth start)")
+ print(" - Noise reduction (afftdn)")
+ print(" - High-pass filter (80Hz)")
+ print(" - Low-pass filter (12kHz)")
+ print(" - De-esser (sibilance reduction)")
+ print(" - Dynamic compression (4:1 ratio)")
+ print(" - Loudness normalization (-16 LUFS)")
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/docs/source/_static/podcast/generate_audio.py b/docs/source/_static/podcast/generate_audio.py
new file mode 100644
index 0000000..afb4523
--- /dev/null
+++ b/docs/source/_static/podcast/generate_audio.py
@@ -0,0 +1,474 @@
+#!/usr/bin/env python3
+"""
+Generate audio for WASP's Nest podcast episodes.
+
+Supports two TTS backends:
+1. ElevenLabs API (premium quality, requires API key)
+2. edge-tts (free fallback)
+
+Usage:
+ # With ElevenLabs (set ELEVEN_API_KEY environment variable)
+ python generate_audio.py --engine elevenlabs
+
+ # With edge-tts (free, default)
+ python generate_audio.py --engine edge-tts
+
+ # Regenerate specific episode
+ python generate_audio.py --episode 2
+
+ # Verbose output
+ python generate_audio.py --verbose
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import functools
+import logging
+import os
+import re
+import shutil
+import sys
+import time
+from collections.abc import Iterator
+from pathlib import Path
+
+# Configure logging
+logging.basicConfig(
+ format="%(asctime)s [%(levelname)s] %(message)s",
+ datefmt="%H:%M:%S",
+)
+logger = logging.getLogger(__name__)
+
+# Directories
+SCRIPT_DIR = Path(__file__).parent
+CHRONICLES_DIR = SCRIPT_DIR / "chronicles"
+AUDIO_DIR = SCRIPT_DIR / "audio"
+
+# ElevenLabs has a 5000 character limit per request
+ELEVENLABS_CHAR_LIMIT = 5000
+
+# Timeout for TTS operations (5 minutes)
+TTS_TIMEOUT_SECONDS = 300
+
+
+class AudioGenerationError(Exception):
+ """Raised when audio generation fails."""
+
+ pass
+
+
+def clean_markdown(text: str) -> str:
+ """Convert markdown to speakable text optimized for TTS."""
+ # Remove YAML front matter
+ text = re.sub(r"^---.*?---\s*", "", text, flags=re.DOTALL)
+
+ # Remove code blocks
+ text = re.sub(r"```[\s\S]*?```", "", text)
+
+ # Remove inline code
+ text = re.sub(r"`[^`]+`", "", text)
+
+ # Remove markdown headers but keep text
+ text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
+
+ # Remove bold/italic markers
+ text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)
+ text = re.sub(r"\*([^*]+)\*", r"\1", text)
+
+ # Remove links but keep text
+ text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
+
+ # Remove tables
+ text = re.sub(r"\|.*\|", "", text)
+
+ # Remove horizontal rules
+ text = re.sub(r"^---+$", "", text, flags=re.MULTILINE)
+
+ # Remove episode metadata section
+ text = re.sub(r"## Episode Metadata[\s\S]*$", "", text)
+
+ # Remove illumination references
+ text = re.sub(r"See:.*?\.md.*", "", text)
+
+ # Clean up whitespace
+ text = re.sub(r"\n{3,}", "\n\n", text)
+ text = re.sub(r"[ \t]+", " ", text)
+
+ return text.strip()
+
+
+def chunk_text(text: str, max_chars: int = ELEVENLABS_CHAR_LIMIT) -> Iterator[str]:
+ """
+ Split text into chunks that fit within character limits.
+
+ Splits on sentence boundaries to avoid cutting words.
+ """
+ if len(text) <= max_chars:
+ yield text
+ return
+
+ # Split on sentence boundaries
+ sentences = re.split(r"(?<=[.!?])\s+", text)
+ current_chunk = ""
+
+ for sentence in sentences:
+ if len(current_chunk) + len(sentence) + 1 <= max_chars:
+ current_chunk = f"{current_chunk} {sentence}".strip()
+ else:
+ if current_chunk:
+ yield current_chunk
+ # Handle sentences longer than max_chars
+ if len(sentence) > max_chars:
+ # Split on word boundaries as fallback
+ words = sentence.split()
+ current_chunk = ""
+ for word in words:
+ if len(current_chunk) + len(word) + 1 <= max_chars:
+ current_chunk = f"{current_chunk} {word}".strip()
+ else:
+ if current_chunk:
+ yield current_chunk
+ current_chunk = word
+ else:
+ current_chunk = sentence
+
+ if current_chunk:
+ yield current_chunk
+
+
+# Exceptions worth retrying (transient network/server issues)
+RETRYABLE_EXCEPTIONS = (ConnectionError, TimeoutError, OSError)
+
+
+def retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0):
+ """
+ Decorator for retrying functions with exponential backoff.
+
+ Only retries on transient errors (ConnectionError, TimeoutError, OSError).
+ Non-retryable errors (ValueError, AuthenticationError, etc.) fail immediately.
+ """
+
+ def decorator(func):
+ @functools.wraps(func)
+ def wrapper(*args, **kwargs):
+ last_exception = None
+ for attempt in range(max_retries):
+ try:
+ return func(*args, **kwargs)
+ except RETRYABLE_EXCEPTIONS as e:
+ last_exception = e
+ if attempt < max_retries - 1:
+ delay = base_delay * (2**attempt)
+ logger.warning(
+ f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s..."
+ )
+ time.sleep(delay)
+ except Exception:
+ # Non-retryable error - fail immediately
+ raise
+ raise last_exception
+
+ return wrapper
+
+ return decorator
+
+
+async def generate_with_edge_tts(text: str, output_file: Path) -> None:
+ """Generate audio using edge-tts (free Microsoft TTS)."""
+ try:
+ import edge_tts
+ except ImportError:
+ raise AudioGenerationError("edge-tts not installed. Install with: pip install edge-tts")
+
+ # Voice configuration for Queen Bee character
+ voice = "en-US-JennyNeural"
+ rate = "-5%" # Slightly slower for clarity
+ pitch = "+2Hz" # Slightly higher for Queen Bee character
+
+ try:
+ communicate = edge_tts.Communicate(text, voice, rate=rate, pitch=pitch)
+ await asyncio.wait_for(communicate.save(str(output_file)), timeout=TTS_TIMEOUT_SECONDS)
+ except asyncio.TimeoutError:
+ raise AudioGenerationError(f"edge-tts timed out after {TTS_TIMEOUT_SECONDS}s")
+ except Exception as e:
+ raise AudioGenerationError(f"edge-tts failed: {e}")
+
+
+@retry_with_backoff(max_retries=3)
+def generate_with_elevenlabs(text: str, output_file: Path) -> None:
+ """Generate audio using ElevenLabs API (premium quality)."""
+ try:
+ from elevenlabs import save
+ from elevenlabs.client import ElevenLabs
+ except ImportError:
+ raise AudioGenerationError("elevenlabs not installed. Install with: pip install elevenlabs")
+
+ api_key = os.environ.get("ELEVEN_API_KEY")
+ if not api_key:
+ raise AudioGenerationError(
+ "ELEVEN_API_KEY environment variable not set. "
+ "Get your API key from https://elevenlabs.io/app/settings/api-keys"
+ )
+
+ client = ElevenLabs(api_key=api_key)
+
+ # Use a warm, professional voice for the Queen Bee character
+ voice_id = os.environ.get("ELEVEN_VOICE_ID", "21m00Tcm4TlvDq8ikWAM")
+
+ # Handle text chunking for long content
+ chunks = list(chunk_text(text, ELEVENLABS_CHAR_LIMIT))
+
+ if len(chunks) == 1:
+ # Single chunk - straightforward
+ audio = client.text_to_speech.convert(
+ voice_id=voice_id,
+ text=text,
+ model_id="eleven_multilingual_v2",
+ output_format="mp3_44100_128",
+ voice_settings={
+ "stability": 0.5,
+ "similarity_boost": 0.75,
+ },
+ )
+ save(audio, str(output_file))
+ else:
+ # Multiple chunks - generate and concatenate
+ logger.info(f"Text split into {len(chunks)} chunks")
+ temp_files = []
+ try:
+ for i, chunk in enumerate(chunks):
+ logger.debug(f"Processing chunk {i + 1}/{len(chunks)}")
+ temp_file = output_file.with_suffix(f".part{i}.mp3")
+ audio = client.text_to_speech.convert(
+ voice_id=voice_id,
+ text=chunk,
+ model_id="eleven_multilingual_v2",
+ output_format="mp3_44100_128",
+ voice_settings={
+ "stability": 0.5,
+ "similarity_boost": 0.75,
+ },
+ )
+ save(audio, str(temp_file))
+ temp_files.append(temp_file)
+
+ # Concatenate using pydub or ffmpeg
+ _concatenate_audio_files(temp_files, output_file)
+ finally:
+ # Cleanup temp files
+ for temp_file in temp_files:
+ if temp_file.exists():
+ temp_file.unlink()
+
+
+def _concatenate_audio_files(input_files: list[Path], output_file: Path) -> None:
+ """
+ Concatenate multiple audio files into one.
+
+ Attempts pydub first (re-encodes at 128kbps), falls back to the ffmpeg
+ concat demuxer (stream copy, no re-encoding) if pydub is unavailable.
+ """
+ try:
+ from pydub import AudioSegment
+
+ combined = AudioSegment.empty()
+ for f in input_files:
+ combined += AudioSegment.from_mp3(str(f))
+ combined.export(str(output_file), format="mp3", bitrate="128k")
+ except ImportError:
+ # Fallback to ffmpeg
+ import subprocess
+
+ # Try to find ffmpeg
+ ffmpeg_cmd = shutil.which("ffmpeg")
+ if not ffmpeg_cmd:
+ # Try static-ffmpeg package
+ try:
+ import static_ffmpeg
+ except ImportError:
+ pass # Package not installed - acceptable
+ else:
+ try:
+ static_ffmpeg.add_paths()
+ ffmpeg_cmd = shutil.which("ffmpeg")
+ except Exception as e:
+ logger.warning(f"static_ffmpeg.add_paths() failed: {e}")
+
+ if not ffmpeg_cmd:
+ raise AudioGenerationError(
+ "ffmpeg not found for audio concatenation. Install with:\n"
+ " pip install pydub (preferred)\n"
+ " pip install static-ffmpeg\n"
+ " conda install -c conda-forge ffmpeg"
+ )
+
+ # Create concat file list
+ list_file = output_file.with_suffix(".txt")
+ with open(list_file, "w") as f:
+ for input_file in input_files:
+ f.write(f"file '{input_file}'\n")
+
+ try:
+ result = subprocess.run(
+ [
+ ffmpeg_cmd,
+ "-y",
+ "-f",
+ "concat",
+ "-safe",
+ "0",
+ "-i",
+ str(list_file),
+ "-c",
+ "copy",
+ str(output_file),
+ ],
+ capture_output=True,
+ text=True,
+ check=True,
+ )
+ except subprocess.CalledProcessError as e:
+ stderr = e.stderr if e.stderr else "Unknown error"
+ raise AudioGenerationError(f"Failed to concatenate audio: {stderr}")
+ finally:
+ if list_file.exists():
+ list_file.unlink()
+
+
+async def generate_episode_audio(
+ episode_file: Path, output_file: Path, engine: str = "edge-tts"
+) -> Path:
+ """Generate audio for a single episode."""
+ logger.info(f"Processing: {episode_file.name}")
+ logger.info(f" Engine: {engine}")
+
+ # Validate input file exists
+ if not episode_file.exists():
+ raise AudioGenerationError(f"Episode file not found: {episode_file}")
+
+ # Read and clean the markdown
+ try:
+ content = episode_file.read_text(encoding="utf-8")
+ except (OSError, UnicodeDecodeError) as e:
+ raise AudioGenerationError(f"Failed to read {episode_file}: {e}")
+
+ text = clean_markdown(content)
+
+ # Validate text is not empty
+ if not text or len(text.strip()) < 10:
+ raise AudioGenerationError(
+ f"Episode {episode_file.name} has no speakable content after cleaning"
+ )
+
+ logger.debug(f" Text length: {len(text)} characters")
+
+ # Generate audio based on engine choice
+ if engine == "elevenlabs":
+ generate_with_elevenlabs(text, output_file)
+ else:
+ await generate_with_edge_tts(text, output_file)
+
+ # Validate output was created
+ if not output_file.exists():
+ raise AudioGenerationError(f"Output file was not created: {output_file}")
+
+ file_size = output_file.stat().st_size
+ if file_size < 1000: # Less than 1KB is suspicious
+ raise AudioGenerationError(
+ f"Output file is too small ({file_size} bytes), generation may have failed"
+ )
+
+ logger.info(f" -> Saved: {output_file.name} ({file_size / 1024:.1f} KB)")
+ return output_file
+
+
+def validate_episode_number(value: str) -> int:
+ """Validate episode number is a positive integer."""
+ try:
+ episode = int(value)
+ if episode < 1 or episode > 999:
+ raise argparse.ArgumentTypeError(
+ f"Episode number must be between 1 and 999, got {episode}"
+ )
+ return episode
+ except ValueError:
+ raise argparse.ArgumentTypeError(f"Episode must be a number, got '{value}'")
+
+
+async def main() -> int:
+ """Main entry point. Returns exit code."""
+ parser = argparse.ArgumentParser(description="Generate podcast audio from episode scripts")
+ parser.add_argument(
+ "--engine",
+ choices=["edge-tts", "elevenlabs"],
+ default="edge-tts",
+ help="TTS engine to use (default: edge-tts)",
+ )
+ parser.add_argument(
+ "--episode",
+ type=validate_episode_number,
+ help="Generate only specific episode number (1-999)",
+ )
+ parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose output")
+ parser.add_argument("--debug", action="store_true", help="Enable debug output")
+ args = parser.parse_args()
+
+ # Configure logging level
+ if args.debug:
+ logger.setLevel(logging.DEBUG)
+ elif args.verbose:
+ logger.setLevel(logging.INFO)
+ else:
+ logger.setLevel(logging.WARNING)
+
+ # Ensure output directory exists
+ AUDIO_DIR.mkdir(exist_ok=True)
+
+ # Find episode files
+ if args.episode:
+ pattern = f"episode-{args.episode:03d}-*.md"
+ episodes = list(CHRONICLES_DIR.glob(pattern))
+ if not episodes:
+ logger.error(f"No episode file found matching: {pattern}")
+ return 1
+ else:
+ episodes = sorted(CHRONICLES_DIR.glob("episode-*.md"))
+
+ if not episodes:
+ logger.error("No episode files found in %s", CHRONICLES_DIR)
+ return 1
+
+ print(f"Found {len(episodes)} episode(s)")
+ print(f"Engine: {args.engine}")
+ print("-" * 40)
+
+ errors = []
+ for episode_file in episodes:
+ output_name = episode_file.stem + ".mp3"
+ output_file = AUDIO_DIR / output_name
+
+ try:
+ await generate_episode_audio(episode_file, output_file, args.engine)
+ except AudioGenerationError as e:
+ logger.error(f"Failed to generate {episode_file.name}: {e}")
+ errors.append((episode_file.name, str(e)))
+ except Exception as e:
+ logger.exception(f"Unexpected error processing {episode_file.name}")
+ errors.append((episode_file.name, str(e)))
+
+ print("-" * 40)
+
+ if errors:
+ print(f"Completed with {len(errors)} error(s):")
+ for name, error in errors:
+ print(f" - {name}: {error}")
+ return 1
+
+ print("Done! Audio files generated in:", AUDIO_DIR)
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(asyncio.run(main()))
diff --git a/docs/source/_static/podcast/illuminations/.gitkeep b/docs/source/_static/podcast/illuminations/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/docs/source/_static/podcast/illuminations/README.md b/docs/source/_static/podcast/illuminations/README.md
new file mode 100644
index 0000000..ce896e2
--- /dev/null
+++ b/docs/source/_static/podcast/illuminations/README.md
@@ -0,0 +1,52 @@
+# Illuminations - Visual Diagrams
+
+This directory contains Mermaid diagrams and visual aids for podcast episodes.
+
+## Purpose
+
+Illuminations are visual companions to Buzz Reports, helping illustrate:
+- Architecture changes
+- Feature workflows
+- Data flow diagrams
+- Version comparisons
+
+## File Naming Convention
+
+```
+illumination-{episode_number}-{topic}.md
+```
+
+Example: `illumination-001-new-counting-module.md`
+
+## Template
+
+```markdown
+# Illumination: [Topic]
+# Episode: [NUMBER]
+
+## Overview Diagram
+
+\`\`\`mermaid
+graph TD
+ A[Input] --> B[Process]
+ B --> C[Output]
+\`\`\`
+
+## Detailed Flow
+
+\`\`\`mermaid
+sequenceDiagram
+ participant User
+ participant WASP2
+ participant Output
+ User->>WASP2: Run analysis
+ WASP2->>Output: Generate results
+\`\`\`
+```
+
+## Rendering
+
+Diagrams can be rendered using:
+- Mermaid CLI: `mmdc -i input.md -o output.png`
+- GitHub's built-in Mermaid support
+- VS Code Mermaid extensions
diff --git a/docs/source/_static/podcast/illuminations/illumination-001-wasp-mapping.md b/docs/source/_static/podcast/illuminations/illumination-001-wasp-mapping.md
new file mode 100644
index 0000000..6d5fd66
--- /dev/null
+++ b/docs/source/_static/podcast/illuminations/illumination-001-wasp-mapping.md
@@ -0,0 +1,126 @@
+# Illumination: WASP Mapping Bias Correction
+# Episode: 001 - The Origin Swarm
+
+## The Problem: Mapping Bias
+
+When reads contain genetic variants, they may map differently depending on which allele they carry.
+
+```mermaid
+graph TD
+ subgraph "The Bias Problem"
+ R1["Read with REF allele