Skip to content

Comments

Refactor AudioSegmentProcessor: reduce complexity in findSpeechStart#87

Open
ysdede wants to merge 3 commits intostagingfrom
refactor-audiosegmentprocessor-6181472373587751073
Open

Refactor AudioSegmentProcessor: reduce complexity in findSpeechStart#87
ysdede wants to merge 3 commits intostagingfrom
refactor-audiosegmentprocessor-6181472373587751073

Conversation

@ysdede
Copy link
Owner

@ysdede ysdede commented Feb 9, 2026

Refactored findSpeechStart in src/lib/audio/AudioSegmentProcessor.ts to reduce complexity.
Verified with reproduction test src/lib/audio/AudioSegmentProcessor_repro.test.ts (deleted after verification) and full test suite.


PR created automatically by Jules for task 6181472373587751073 started by @ysdede

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced speech start detection in audio processing with improved multi-step analysis for more accurate and reliable identification of speech onset in audio recordings.
  • Refactor

    • Reorganized audio detection logic with improved helper functions and enhanced logging for better code structure and maintainability during segment processing.

… methods

This change extracts the complex heuristic logic in `findSpeechStart` into three private helper methods: `findLastSpeechChunkIndex`, `findRisingEnergyStart`, and `findSnrCrossingStart`. This improves readability and maintainability of the speech onset detection logic without altering its behavior.

- Extracted `findLastSpeechChunkIndex` to locate the most recent speech chunk.
- Extracted `findRisingEnergyStart` to handle rising energy trend analysis.
- Extracted `findSnrCrossingStart` to handle SNR threshold crossing checks.
- Updated `findSpeechStart` to compose these helpers.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@ysdede
Copy link
Owner Author

ysdede commented Feb 9, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 9, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

… methods

This change extracts the complex heuristic logic in `findSpeechStart` into three private helper methods: `findLastSpeechChunkIndex`, `findRisingEnergyStart`, and `findSnrCrossingStart`. This improves readability and maintainability of the speech onset detection logic without altering its behavior.

- Extracted `findLastSpeechChunkIndex` to locate the most recent speech chunk.
- Extracted `findRisingEnergyStart` to handle rising energy trend analysis.
- Extracted `findSnrCrossingStart` to handle SNR threshold crossing checks.
- Updated `findSpeechStart` to compose these helpers.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/lib/audio/AudioSegmentProcessor.ts`:
- Around line 447-458: In findSnrCrossingStart, handle the edge-case where the
chunk at startIndex is already below minSnrThreshold: instead of returning
Math.min(chunks.length - 1, i + 1) blindly, add an explicit branch so that if i
=== startIndex you return startIndex (or otherwise clamp so you never return an
index past the last valid speech chunk); keep the existing behavior for earlier
i (return i+1) and add a short comment explaining this special-case to prevent
regressions—this involves editing the findSnrCrossingStart function and touching
the use of chunks, i, startIndex and minSnrThreshold.
- Around line 392-400: findLastSpeechChunkIndex currently returns 0 to indicate
"not found", which conflicts with a real match at index 0; change
findLastSpeechChunkIndex to return -1 when no speech chunk is found and update
findSpeechStart to handle -1 explicitly (e.g., treat as "no speech" and avoid
using recentChunks[0]). Also search for any other callers (e.g.,
processAudioData / findSpeechStart invocation sites) and ensure they properly
handle the -1 sentinel (or switch to returning null if you prefer), adding a
brief comment in both functions documenting the sentinel semantics.
- Around line 373-375: The variable firstSpeechIndex is misnamed because
findLastSpeechChunkIndex() returns the most recent (last) speech chunk; rename
the variable to lastSpeechIndex and update any usages accordingly (e.g., where
passed into findRisingEnergyStart) and also rename the helper parameter
startIndex to lastIndex (or similar) in methods like findRisingEnergyStart and
any other helper functions that accept that value so names reflect "last/recent"
semantics; update all call sites and JSDoc/comments referencing
firstSpeechIndex/startIndex to use the new names (findLastSpeechChunkIndex,
lastSpeechIndex, lastIndex).
📜 Review details

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7f96b9 and 9ea5f07.

📒 Files selected for processing (1)
  • src/lib/audio/AudioSegmentProcessor.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/lib/audio/AudioSegmentProcessor.ts (1)
src/lib/audio/audioParams.ts (1)
  • minSnrThreshold (133-133)
🔇 Additional comments (2)
src/lib/audio/AudioSegmentProcessor.ts (2)

372-387: Clean decomposition of findSpeechStart — nice refactor.

The three-step cascade (rising energy → SNR crossing → default lookback) is clear and easy to follow. Good separation of concerns.

One note: since findSpeechStart can no longer return -1 (minimum is 0 via Math.max), the guard at line 217 (realStartIndex !== -1) is now dead code. Consider updating the caller to remove that check for consistency.


405-442: findRisingEnergyStart logic looks correct.

The backward scan, energy-rise detection, SNR floor guard, and ~500 ms lookback cap are all well-structured. Logging only fires when a trend is actually found, which is good.

One minor observation: the guard i < chunks.length - 1 at line 414 is always true in the current call context (since i <= startIndex - 1 and startIndex <= chunks.length - 1). It's harmless as a safety check but could be removed or commented to reduce reader confusion.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +373 to +375
const firstSpeechIndex = this.findLastSpeechChunkIndex();

const risingEnergyIndex = this.findRisingEnergyStart(firstSpeechIndex);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Misleading variable name: firstSpeechIndex holds the last (most recent) speech chunk index.

findLastSpeechChunkIndex() scans backwards and returns the most recent speech chunk, yet its result is stored in firstSpeechIndex. This is confusing and inverts the semantic meaning. Rename to lastSpeechIndex (and correspondingly the startIndex parameter in the helpers) to match what findLastSpeechChunkIndex actually returns.

✏️ Suggested rename
-        const firstSpeechIndex = this.findLastSpeechChunkIndex();
+        const lastSpeechIndex = this.findLastSpeechChunkIndex();

-        const risingEnergyIndex = this.findRisingEnergyStart(firstSpeechIndex);
+        const risingEnergyIndex = this.findRisingEnergyStart(lastSpeechIndex);
         if (risingEnergyIndex !== null) {
             return risingEnergyIndex;
         }

-        const snrCrossingIndex = this.findSnrCrossingStart(firstSpeechIndex);
+        const snrCrossingIndex = this.findSnrCrossingStart(lastSpeechIndex);
         if (snrCrossingIndex !== null) {
             return snrCrossingIndex;
         }

         // Default lookback
-        return Math.max(0, firstSpeechIndex - 4);
+        return Math.max(0, lastSpeechIndex - 4);
🤖 Prompt for AI Agents
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 373 - 375, The variable
firstSpeechIndex is misnamed because findLastSpeechChunkIndex() returns the most
recent (last) speech chunk; rename the variable to lastSpeechIndex and update
any usages accordingly (e.g., where passed into findRisingEnergyStart) and also
rename the helper parameter startIndex to lastIndex (or similar) in methods like
findRisingEnergyStart and any other helper functions that accept that value so
names reflect "last/recent" semantics; update all call sites and JSDoc/comments
referencing firstSpeechIndex/startIndex to use the new names
(findLastSpeechChunkIndex, lastSpeechIndex, lastIndex).

Comment on lines +392 to +400
private findLastSpeechChunkIndex(): number {
const chunks = this.state.recentChunks;
for (let i = chunks.length - 1; i >= 0; i--) {
if (chunks[i].isSpeech) {
firstSpeechIndex = i;
break;
return i;
}
}
return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

findLastSpeechChunkIndex returns 0 as a "not found" sentinel — indistinguishable from a real match at index 0.

Returning 0 when no speech chunk exists conflates "the oldest chunk in the buffer is speech" with "no speech was found." Currently this is safe because processAudioData pushes a speech chunk into recentChunks (line 182-187) before calling findSpeechStart, so the search always succeeds. However, this is fragile — any future caller or reordering would silently degrade to using a stale chunk at index 0.

Consider returning -1 (or null) for "not found" and handling it explicitly in findSpeechStart, or at minimum adding a comment explaining why 0 is acceptable here.

♻️ Safer sentinel approach
     private findLastSpeechChunkIndex(): number {
         const chunks = this.state.recentChunks;
         for (let i = chunks.length - 1; i >= 0; i--) {
             if (chunks[i].isSpeech) {
                 return i;
             }
         }
-        return 0;
+        return -1;
     }

Then in findSpeechStart:

     private findSpeechStart(): number {
         const lastSpeechIndex = this.findLastSpeechChunkIndex();
+        if (lastSpeechIndex < 0) {
+            return Math.max(0, this.state.recentChunks.length - 1);
+        }

         const risingEnergyIndex = this.findRisingEnergyStart(lastSpeechIndex);
🤖 Prompt for AI Agents
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 392 - 400,
findLastSpeechChunkIndex currently returns 0 to indicate "not found", which
conflicts with a real match at index 0; change findLastSpeechChunkIndex to
return -1 when no speech chunk is found and update findSpeechStart to handle -1
explicitly (e.g., treat as "no speech" and avoid using recentChunks[0]). Also
search for any other callers (e.g., processAudioData / findSpeechStart
invocation sites) and ensure they properly handle the -1 sentinel (or switch to
returning null if you prefer), adding a brief comment in both functions
documenting the sentinel semantics.

Comment on lines +447 to 458
private findSnrCrossingStart(startIndex: number): number | null {
const chunks = this.state.recentChunks;
const minSnrThreshold = this.options.minSnrThreshold;

for (let i = startIndex; i >= 0; i--) {
if (chunks[i].snr < minSnrThreshold) {
return Math.min(chunks.length - 1, i + 1);
}
}

// Default lookback
return Math.max(0, firstSpeechIndex - 4);
return null;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

findSnrCrossingStart — verify intent when the chunk at startIndex is below threshold.

If chunks[startIndex].snr < minSnrThreshold, the method returns Math.min(chunks.length - 1, startIndex + 1). In the current call flow startIndex equals chunks.length - 1, so Math.min clamps the result back to startIndex — effectively a no-op. This is fine today.

However, if startIndex were ever less than chunks.length - 1, the function would return an index after the last speech chunk, which is semantically backwards for a "find start" helper. Consider adding an explicit early return or comment clarifying this edge-case behaviour to prevent subtle regressions if the callers evolve.

🤖 Prompt for AI Agents
In `@src/lib/audio/AudioSegmentProcessor.ts` around lines 447 - 458, In
findSnrCrossingStart, handle the edge-case where the chunk at startIndex is
already below minSnrThreshold: instead of returning Math.min(chunks.length - 1,
i + 1) blindly, add an explicit branch so that if i === startIndex you return
startIndex (or otherwise clamp so you never return an index past the last valid
speech chunk); keep the existing behavior for earlier i (return i+1) and add a
short comment explaining this special-case to prevent regressions—this involves
editing the findSnrCrossingStart function and touching the use of chunks, i,
startIndex and minSnrThreshold.

@ysdede ysdede changed the base branch from release/r1 to staging February 10, 2026 21:00
@ysdede
Copy link
Owner Author

ysdede commented Feb 10, 2026

Review Summary (Post r1→master merge)

Status: ✅ MERGEABLE

Changes: Decomposes the complex \ indSpeechStart()\ method in \AudioSegmentProcessor\ into three focused helpers:

  • \ indLastSpeechChunkIndex()\ — finds most recent speech chunk in history
  • \ indRisingEnergyStart(startIndex)\ — looks back for rising energy trend onset
  • \ indSnrCrossingStart(startIndex)\ — finds where SNR drops below threshold

Each returns
umber | null\ with the main \ indSpeechStart()\ orchestrating the fallback chain.

Assessment: Clean extract-method refactor. The original was a single 60-line method with mixed concerns. Now each strategy (energy trend, SNR crossing, default lookback) is isolated and testable.

Note: Complements PR #85 (which refactors \processAudioData). They touch different methods in the same file, so merge #85 first, then #87.

Recommendation:Merge (after #85) — Completes the AudioSegmentProcessor decomposition

@google-labs-jules
Copy link
Contributor

Review Summary (Post r1→master merge)

Status: ✅ MERGEABLE

Changes: Decomposes the complex \ indSpeechStart()\ method in \AudioSegmentProcessor\ into three focused helpers:

  • \ indLastSpeechChunkIndex()\ — finds most recent speech chunk in history
  • \ indRisingEnergyStart(startIndex)\ — looks back for rising energy trend onset
  • \ indSnrCrossingStart(startIndex)\ — finds where SNR drops below threshold

Each returns
umber | null\ with the main \ indSpeechStart()\ orchestrating the fallback chain.

Assessment: Clean extract-method refactor. The original was a single 60-line method with mixed concerns. Now each strategy (energy trend, SNR crossing, default lookback) is isolated and testable.

Note: Complements PR #85 (which refactors \processAudioData). They touch different methods in the same file, so merge #85 first, then #87.

Recommendation:Merge (after #85) — Completes the AudioSegmentProcessor decomposition

Acknowledged.

… methods

This change extracts the complex heuristic logic in `findSpeechStart` into three private helper methods: `findLastSpeechChunkIndex`, `findRisingEnergyStart`, and `findSnrCrossingStart`. This improves readability and maintainability of the speech onset detection logic without altering its behavior.

- Extracted `findLastSpeechChunkIndex` to locate the most recent speech chunk.
- Extracted `findRisingEnergyStart` to handle rising energy trend analysis.
- Extracted `findSnrCrossingStart` to handle SNR threshold crossing checks.
- Updated `findSpeechStart` to compose these helpers.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Repository owner deleted a comment from google-labs-jules bot Feb 17, 2026
Repository owner deleted a comment from coderabbitai bot Feb 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant