Conversation

@narsimhaReddyJuspay (Contributor) commented Feb 3, 2026

  • Added TTS fallback for ElevenLabs and Cartesia
  • Added Slack alerts after triggering TTS fallback
  • Control fallback via feature flags

Summary by CodeRabbit

  • New Features

    • Added automatic TTS service fallback when the primary provider encounters failures
    • Implemented background health monitoring for TTS services with failure detection
    • Added configurable failure thresholds and timeout settings for TTS operations
  • Chores

    • Extended dynamic configuration system to support TTS fallback management

coderabbitai bot commented Feb 3, 2026

Walkthrough

This PR introduces a TTS health observer system that monitors service health, detects provider failures through provider-aware detection strategies, and automatically triggers fallback to alternate providers via Redis state updates and Slack alerts. The system integrates an asynchronous health-check loop into the Breeze Buddy pipeline and adds dynamic configuration accessors for fallback thresholds and timeout settings.

Changes

  • TTS Health Observer Module (app/ai/voice/tts/tts_health_observer.py):
    New module implementing stateful TTS health monitoring with provider-aware failure detection for ElevenLabs (audio bytes) and Cartesia (ErrorFrame), per-provider failure counters, automatic fallback orchestration via Redis/Slack, timeout checks, and a background health-check loop.
  • Agent Integration (app/ai/voice/agents/breeze_buddy/agent/__init__.py, app/ai/voice/agents/breeze_buddy/agent/pipeline.py):
    Starts an asynchronous TTS health-check loop at pipeline initialization as a named task with cleanup on termination; always prepares and includes the TTS health observer in the observer list for both dev and non-dev environments.
  • TTS Module Re-exports (app/ai/voice/tts/__init__.py, app/ai/voice/agents/breeze_buddy/tts/__init__.py):
    The main TTS module re-exports health observer components; the Breeze Buddy TTS module adds fallback logic to resolve the actual provider via a swap map when the configured provider fails, applied across both voice-specific and default service selection paths.
  • Dynamic Configuration (app/core/config/dynamic.py):
    Adds four async accessors for runtime fallback configuration: BB_TTS_FALLBACK_ENABLED(), BB_TTS_FAILURE_THRESHOLD(), BB_TTS_AUDIO_TIMEOUT_SECONDS(), and BB_TTS_FAILED_PROVIDER().
  • Config Store (app/services/live_config/store.py):
    Adds an async set_config(key, value) function to update a single runtime config value in Redis, with logging and failure handling.

Sequence Diagram

sequenceDiagram
    participant Pipeline as Breeze Buddy Pipeline
    participant Observer as TTS Health Observer
    participant Config as Dynamic Config
    participant TTS as TTS Service
    participant Redis as Redis Store
    participant Slack as Slack Alerts

    Pipeline->>Observer: start_tts_health_check_loop() on init
    Observer->>Observer: Begin async monitoring task

    loop TTS Request Cycle
        TTS->>Observer: TTSStartedFrame (with provider)
        TTS->>Observer: TTSAudioRawFrame (audio bytes) or ErrorFrame
        TTS->>Observer: TTSStoppedFrame
        Observer->>Observer: Track failure count per provider
    end

    loop Health Check Interval
        Observer->>Config: Check BB_TTS_FAILURE_THRESHOLD
        Observer->>Config: Check BB_TTS_AUDIO_TIMEOUT_SECONDS
        Observer->>Observer: Evaluate timeout conditions
        
        alt Failure Threshold Exceeded
            Observer->>Config: Get BB_TTS_FAILED_PROVIDER
            Observer->>Config: Get BB_TTS_FALLBACK_ENABLED
            alt Fallback Enabled
                Observer->>Redis: Update swap state for provider
                Observer->>Slack: Alert on fallback trigger
                Note over TTS: Subsequent requests use swapped provider
            end
        end
    end

    Pipeline->>Observer: Cancel health check task on termination
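The pipeline integration described in the walkthrough (a named health-check task started at initialization and cancelled on termination) follows a standard asyncio pattern. A minimal sketch, with the loop body and the task name as illustrative placeholders:

```python
import asyncio

async def tts_health_check_loop(interval: float = 1.0) -> None:
    """Illustrative monitoring loop; the real checks live in tts_health_observer."""
    try:
        while True:
            # ... evaluate per-provider failure counts and audio timeouts here ...
            await asyncio.sleep(interval)
    except asyncio.CancelledError:
        # Runs when the pipeline cancels the task on termination.
        raise

async def main() -> None:
    # Named task, so it is identifiable in debug output and task dumps.
    task = asyncio.create_task(tts_health_check_loop(0.01), name="tts-health-check")
    await asyncio.sleep(0.05)
    # On termination: cancel and await so the CancelledError cleanup runs.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    print(task.get_name(), task.cancelled())  # tts-health-check True

asyncio.run(main())
```

Awaiting the cancelled task (rather than fire-and-forget cancellation) matters: it guarantees the loop's cleanup has finished before the pipeline tears down shared resources.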

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested Reviewers

  • badri-singhal
  • murdore
  • swaroopvarma1

Poem

🐰 A health check hops through the TTS stream,
Detecting when providers fail to seem,
With Redis and Slack to spread the news,
And fallback magic when voices blues,
Now Breeze Buddy bounces, forever on cue!

🚥 Pre-merge checks | ✅ 3 passed
  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title clearly summarizes the main change, adding TTS fallback support for Breeze Buddy, which aligns with the core objective of the pull request)
  • Docstring Coverage: ✅ Passed (docstring coverage is 96.88%, above the required 80.00% threshold)




@narsimhaReddyJuspay force-pushed the tts-fallback-support branch 4 times, most recently from ddbe08b to 363206f on February 3, 2026 at 08:08
- added tts fallback for eleven labs and cartesia
- added slack alerts after triggering tts fallback
- control fallback with feature flags
coderabbitai bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/ai/voice/agents/breeze_buddy/tts/__init__.py (1)

156-221: ⚠️ Potential issue | 🟡 Minor

Normalize provider names before comparisons to keep fallback reliable.

If BB_TTS_FAILED_PROVIDER / BREEZE_BUDDY_TTS_SERVICE ever come in mixed case, the equality checks won’t match and fallback won’t trigger. Lowercasing both values makes the routing more robust.

Proposed fix
-    failed_provider = await BB_TTS_FAILED_PROVIDER()
+    failed_provider = (await BB_TTS_FAILED_PROVIDER()).lower()
...
-    configured_service = await BB_TTS_SERVICE()
+    configured_service = (await BB_TTS_SERVICE()).lower()
🤖 Fix all issues with AI agents
In `@app/ai/voice/tts/tts_health_observer.py`:
- Around line 122-131: The _detect_provider function currently defaults unknown
source names to TTSProvider.ELEVENLABS which can misattribute errors; update
_detect_provider to explicitly handle unknown providers by checking source_name
(in function _detect_provider) against known values (TTSProvider.ELEVENLABS,
TTSProvider.CARTESIA) and when none match either log a warning with the
source_name and return a sentinel (e.g., None) or raise a specific exception, or
alternatively add the new provider to the TTSProvider enum and handle it
explicitly; ensure callers of _detect_provider (wherever it's used) are updated
to handle the None/exception/new-enum case to avoid silently assuming
ElevenLabs.
- Around line 320-336: The _trigger_fallback logic currently ignores the return
value of the first set_config call; modify _trigger_fallback to capture both
results (the set for "BB_TTS_FAILED_PROVIDER" and the set for
"BB_TTS_FALLBACK_ENABLED") and only mark self._fallback_triggered = True, reset
self._failure_counts[failed_service] and call await
self._send_fallback_alert(...) when both set_config calls succeed; keep the
order (set failed provider first, then enable fallback) and log distinct errors
when either write fails (include which key failed) instead of proceeding when
only one succeeded.

In `@app/services/live_config/store.py`:
- Around line 238-269: set_config currently does a non-atomic read-modify-write
on FEATURE_FLAGS_KEY using client.get/client.set which can cause lost updates
when concurrent writers (e.g., set_config and fetch_and_update_feature_flags)
run; make the update atomic by either converting storage to a Redis HASH and
using HSET/HGET/HGETALL for individual flag updates (update normalize_key(key)
via HSET FEATURE_FLAGS_KEY normalized_key value) or implement optimistic locking
with WATCH/MULTI/EXEC around FEATURE_FLAGS_KEY in set_config: WATCH
FEATURE_FLAGS_KEY, read raw, modify the JSON, MULTI -> set -> EXEC and retry on
EXEC failure; reference the existing helpers get_redis_service, normalize_key
and the competing function fetch_and_update_feature_flags when adding the atomic
retry logic.
🧹 Nitpick comments (2)
app/ai/voice/agents/breeze_buddy/tts/__init__.py (1)

149-154: Deduplicate provider/voice maps to avoid drift.

VOICE_TO_PROVIDER (and PROVIDER_TO_VOICE) are defined in two functions. Consider promoting them to module-level constants to keep mapping changes in one place.

Also applies to: 276-286
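The deduplication suggested above is mechanical. A sketch of module-level constants (provider names match the PR; the voice IDs are placeholders, and deriving one map from the other guarantees they cannot drift):

```python
# Module-level constants so the voice<->provider mapping lives in one place.
# Voice IDs below are illustrative placeholders, not values from the diff.
PROVIDER_TO_VOICE: dict[str, str] = {
    "elevenlabs": "voice-el-default",
    "cartesia": "voice-ca-default",
}

# Derive the inverse mapping instead of maintaining it by hand.
VOICE_TO_PROVIDER: dict[str, str] = {v: k for k, v in PROVIDER_TO_VOICE.items()}
```

Both functions then import the constants, and a mapping change touches exactly one dict.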

app/ai/voice/tts/tts_health_observer.py (1)

363-385: Drop unused current_service parameter or include it in the alert.

Comment on lines +122 to +131
def _detect_provider(self, source_name: str) -> TTSProvider:
    """Detect TTS provider from source class name."""
    name_lower = source_name.lower()
    if "elevenlabs" in name_lower:
        return TTSProvider.ELEVENLABS
    elif "cartesia" in name_lower:
        return TTSProvider.CARTESIA
    else:
        # Default to ElevenLabs detection strategy (more conservative)
        return TTSProvider.ELEVENLABS

⚠️ Potential issue | 🟠 Major

Handle non‑ElevenLabs/Cartesia providers explicitly.

Defaulting unknown providers to ElevenLabs can misattribute failures and trigger incorrect fallback/alerts when Sarvam (or future providers) is configured. Consider explicitly handling unknown providers (log + ignore) or adding them to the enum/config instead of defaulting.


Comment on lines +320 to +336
async def _trigger_fallback(self, failed_service: str):
    """Enable fallback in Redis and notify."""
    alt_service = TTS_SWAP_MAP.get(failed_service, "unknown")

    # Set the failed provider FIRST to avoid race conditions
    await set_config("BB_TTS_FAILED_PROVIDER", failed_service)
    # THEN enable the fallback flag
    success = await set_config("BB_TTS_FALLBACK_ENABLED", True)
    if success:
        self._fallback_triggered = True
        self._failure_counts[failed_service] = 0
        logger.warning(
            f"🔄 TTS FALLBACK TRIGGERED: {failed_service} → {alt_service}"
        )
        await self._send_fallback_alert(failed_service, alt_service)
    else:
        logger.error("[TTS Health] Failed to update Redis for fallback!")

⚠️ Potential issue | 🟠 Major

Verify both Redis updates before marking fallback triggered.

_trigger_fallback ignores the result of setting BB_TTS_FAILED_PROVIDER. If that write fails but fallback is enabled, routing can switch without an accurate failed-provider value. Check both updates and only mark fallback as triggered when both succeed.

Proposed fix
-        await set_config("BB_TTS_FAILED_PROVIDER", failed_service)
-        # THEN enable the fallback flag
-        success = await set_config("BB_TTS_FALLBACK_ENABLED", True)
-        if success:
+        failed_set = await set_config("BB_TTS_FAILED_PROVIDER", failed_service)
+        fallback_set = False
+        if failed_set:
+            fallback_set = await set_config("BB_TTS_FALLBACK_ENABLED", True)
+        if failed_set and fallback_set:
             self._fallback_triggered = True
             self._failure_counts[failed_service] = 0
             logger.warning(
                 f"🔄 TTS FALLBACK TRIGGERED: {failed_service} → {alt_service}"
             )
             await self._send_fallback_alert(failed_service, alt_service)
         else:
-            logger.error("[TTS Health] Failed to update Redis for fallback!")
+            logger.error(
+                "[TTS Health] Failed to update Redis for fallback "
+                f"(failed_provider={failed_set}, fallback_enabled={fallback_set})"
+            )

Comment on lines +238 to +269
async def set_config(key: str, value: Any) -> bool:
    """Update a single config value in Redis at runtime.

    This is used for runtime updates like TTS fallback triggers.
    Note: This does NOT update DevCycle - that remains the source of truth.
    When DevCycle syncs, it may overwrite values set through this function.

    Args:
        key: Configuration key to update
        value: New value to set

    Returns:
        True if update was successful, False otherwise
    """
    try:
        redis = await get_redis_service()
        client = await redis.get_client()

        # Load existing flags
        raw = await client.get(FEATURE_FLAGS_KEY)
        all_flags = json.loads(raw) if raw else {}

        # Update the specific key (normalize to match DevCycle format)
        normalized_key = normalize_key(key)
        all_flags[normalized_key] = value

        # Write back
        await client.set(FEATURE_FLAGS_KEY, json.dumps(all_flags))
        logger.info(f"Runtime config updated: {normalized_key} = {value}")
        return True
    except Exception as e:
        logger.error(f"Failed to set config {key}: {e}")
        return False

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the store.py file
fd -t f "store.py" --path "*live_config*"

Repository: juspay/clairvoyance

Length of output: 296


🏁 Script executed:

# Look at the actual file to verify the snippet
cat -n app/services/live_config/store.py | sed -n '230,280p'

Repository: juspay/clairvoyance

Length of output: 2126


🏁 Script executed:

# Search for other places where FEATURE_FLAGS_KEY is written
rg "FEATURE_FLAGS_KEY" -B 2 -A 2

Repository: juspay/clairvoyance

Length of output: 1947


🏁 Script executed:

# Check if WATCH/MULTI/transactions are used elsewhere in the file
rg "WATCH|MULTI|EXEC|watch|multi|exec" app/services/live_config/

Repository: juspay/clairvoyance

Length of output: 45


🏁 Script executed:

# Check for concurrent patterns - see how Redis client is used
rg "get_redis_service|get_client" -B 2 -A 5 app/services/live_config/

Repository: juspay/clairvoyance

Length of output: 2771


🏁 Script executed:

# Look at the DevCycle sync operation around line 203
cat -n app/services/live_config/store.py | sed -n '195,215p'

Repository: juspay/clairvoyance

Length of output: 788


🏁 Script executed:

# Check if there's any locking mechanism or mutex elsewhere in the codebase
rg "asyncio.Lock|Lock\(\)|threading.Lock|mutex" app/services/live_config/

Repository: juspay/clairvoyance

Length of output: 45


🏁 Script executed:

# Check how frequently set_config and sync are called
rg "set_config\(" -B 2 -A 2

Repository: juspay/clairvoyance

Length of output: 932


🏁 Script executed:

# Find the DevCycle sync operation that writes to FEATURE_FLAGS_KEY
rg "def.*sync|def.*update.*flag" app/services/live_config/ -A 10 | head -50

Repository: juspay/clairvoyance

Length of output: 807


🏁 Script executed:

# Check if there are any background tasks that sync DevCycle
rg "devcycle|DevCycle" app/services/live_config/ -B 2 -A 5

Repository: juspay/clairvoyance

Length of output: 10253


🏁 Script executed:

# Look at the tts_health_observer context more - this shows the race condition
cat -n app/ai/voice/tts/tts_health_observer.py | grep -B 5 -A 5 "set_config"

Repository: juspay/clairvoyance

Length of output: 1111


Guard against lost updates in concurrent config writes.

set_config performs a read-modify-write on FEATURE_FLAGS_KEY without atomic guarantees. With multiple writers (e.g., set_config from TTS health observer and fetch_and_update_feature_flags from DevCycle sync both executing concurrently), updates can be lost—for example, one process reads the full flags dict, another process reads it and writes back, then the first process overwrites with its stale copy. The codebase already acknowledges this risk in the docstring ("When DevCycle syncs, it may overwrite values set through this function"), but there is no atomic mechanism to prevent it.

Consider using Redis transactions (WATCH/MULTI/EXEC for optimistic locking) or storing flags as individual Redis HASHes instead of a single JSON string to avoid RMW cycles.
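A toy in-memory model makes the difference between the two schemes concrete (this is plain dict code, not the Redis client API): with one JSON blob under a single key, a writer holding a stale snapshot silently drops a concurrent update, while per-field writes (HSET semantics) have no read-modify-write cycle to race on.

```python
import json

# Scheme A (current): the whole flags dict serialized under one key -> RMW race.
store_a = {"feature_flags": json.dumps({"x": 1})}

def set_config_rmw(snapshot: str, key: str, value) -> str:
    """Writer that read `snapshot` earlier; writing it back loses concurrent updates."""
    flags = json.loads(snapshot)
    flags[key] = value
    return json.dumps(flags)

stale = store_a["feature_flags"]  # writer 1 takes its snapshot
store_a["feature_flags"] = set_config_rmw(store_a["feature_flags"], "y", 2)  # writer 2 commits
store_a["feature_flags"] = set_config_rmw(stale, "z", 3)  # writer 1 overwrites with stale copy
print(json.loads(store_a["feature_flags"]))  # {'x': 1, 'z': 3} -- 'y' was lost

# Scheme B (suggested): one hash field per flag (HSET semantics) -> no RMW.
store_b = {"x": 1}
store_b["y"] = 2  # writer 2
store_b["z"] = 3  # writer 1; both updates survive
print(store_b)    # {'x': 1, 'y': 2, 'z': 3}
```

In real Redis, scheme B is `HSET feature_flags <field> <value>`; the WATCH/MULTI/EXEC alternative keeps the JSON blob but retries the whole RMW cycle whenever the watched key changed underneath it.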

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 267-267: Consider moving this statement to an else block

(TRY300)


[warning] 268-268: Do not catch blind exception: Exception

(BLE001)

