(feat): add support for tts fallback for breeze buddy #522
base: release
Conversation
Walkthrough

This PR introduces a TTS health observer system that monitors service health, detects provider failures through provider-aware detection strategies, and automatically triggers fallback to alternate providers via Redis state updates and Slack alerts. The system integrates an asynchronous health-check loop into the Breeze Buddy pipeline and adds dynamic configuration accessors for fallback thresholds and timeout settings.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Pipeline as Breeze Buddy Pipeline
    participant Observer as TTS Health Observer
    participant Config as Dynamic Config
    participant TTS as TTS Service
    participant Redis as Redis Store
    participant Slack as Slack Alerts
    Pipeline->>Observer: start_tts_health_check_loop() on init
    Observer->>Observer: Begin async monitoring task
    loop TTS Request Cycle
        TTS->>Observer: TTSStartedFrame (with provider)
        TTS->>Observer: TTSAudioRawFrame (audio bytes) or ErrorFrame
        TTS->>Observer: TTSStoppedFrame
        Observer->>Observer: Track failure count per provider
    end
    loop Health Check Interval
        Observer->>Config: Check BB_TTS_FAILURE_THRESHOLD
        Observer->>Config: Check BB_TTS_AUDIO_TIMEOUT_SECONDS
        Observer->>Observer: Evaluate timeout conditions
        alt Failure Threshold Exceeded
            Observer->>Config: Get BB_TTS_FAILED_PROVIDER
            Observer->>Config: Get BB_TTS_FALLBACK_ENABLED
            alt Fallback Enabled
                Observer->>Redis: Update swap state for provider
                Observer->>Slack: Alert on fallback trigger
                Note over TTS: Subsequent requests use swapped provider
            end
        end
    end
    Pipeline->>Observer: Cancel health check task on termination
```
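The per-provider failure bookkeeping that the request-cycle loop describes can be sketched roughly as follows. This is a minimal illustration, not the PR's actual observer: the class and method names here are hypothetical, and the real implementation also handles audio timeouts and async scheduling.

```python
# Illustrative sketch of per-provider failure counting with a fallback
# threshold; names are hypothetical, not the PR's actual code.
from collections import defaultdict


class TTSFailureTracker:
    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self._failure_counts: dict[str, int] = defaultdict(int)

    def record_error(self, provider: str) -> None:
        # An ErrorFrame (or a TTS cycle with no audio) increments the count.
        self._failure_counts[provider] += 1

    def record_success(self, provider: str) -> None:
        # A healthy cycle resets the streak for that provider.
        self._failure_counts[provider] = 0

    def should_fallback(self, provider: str) -> bool:
        # The health-check loop compares the streak to the threshold.
        return self._failure_counts[provider] >= self.failure_threshold
```

In the real pipeline the threshold would come from the `BB_TTS_FAILURE_THRESHOLD` dynamic config rather than a constructor default.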
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
- added tts fallback for eleven labs and cartesia
- added slack alerts after triggering tts fallback
- control fallback with feature flags
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/ai/voice/agents/breeze_buddy/tts/__init__.py (1)
156-221: ⚠️ Potential issue | 🟡 Minor

Normalize provider names before comparisons to keep fallback reliable.
If `BB_TTS_FAILED_PROVIDER` / `BREEZE_BUDDY_TTS_SERVICE` ever come in mixed case, the equality checks won't match and fallback won't trigger. Lowercasing both values makes the routing more robust.

Proposed fix:

```diff
- failed_provider = await BB_TTS_FAILED_PROVIDER()
+ failed_provider = (await BB_TTS_FAILED_PROVIDER()).lower()
  ...
- configured_service = await BB_TTS_SERVICE()
+ configured_service = (await BB_TTS_SERVICE()).lower()
```
🤖 Fix all issues with AI agents
In `@app/ai/voice/tts/tts_health_observer.py`:
- Around line 122-131: The _detect_provider function currently defaults unknown
source names to TTSProvider.ELEVENLABS which can misattribute errors; update
_detect_provider to explicitly handle unknown providers by checking source_name
(in function _detect_provider) against known values (TTSProvider.ELEVENLABS,
TTSProvider.CARTESIA) and when none match either log a warning with the
source_name and return a sentinel (e.g., None) or raise a specific exception, or
alternatively add the new provider to the TTSProvider enum and handle it
explicitly; ensure callers of _detect_provider (wherever it's used) are updated
to handle the None/exception/new-enum case to avoid silently assuming
ElevenLabs.
- Around line 320-336: The _trigger_fallback logic currently ignores the return
value of the first set_config call; modify _trigger_fallback to capture both
results (the set for "BB_TTS_FAILED_PROVIDER" and the set for
"BB_TTS_FALLBACK_ENABLED") and only mark self._fallback_triggered = True, reset
self._failure_counts[failed_service] and call await
self._send_fallback_alert(...) when both set_config calls succeed; keep the
order (set failed provider first, then enable fallback) and log distinct errors
when either write fails (include which key failed) instead of proceeding when
only one succeeded.
In `@app/services/live_config/store.py`:
- Around line 238-269: set_config currently does a non-atomic read-modify-write
on FEATURE_FLAGS_KEY using client.get/client.set which can cause lost updates
when concurrent writers (e.g., set_config and fetch_and_update_feature_flags)
run; make the update atomic by either converting storage to a Redis HASH and
using HSET/HGET/HGETALL for individual flag updates (update normalize_key(key)
via HSET FEATURE_FLAGS_KEY normalized_key value) or implement optimistic locking
with WATCH/MULTI/EXEC around FEATURE_FLAGS_KEY in set_config: WATCH
FEATURE_FLAGS_KEY, read raw, modify the JSON, MULTI -> set -> EXEC and retry on
EXEC failure; reference the existing helpers get_redis_service, normalize_key
and the competing function fetch_and_update_feature_flags when adding the atomic
retry logic.
🧹 Nitpick comments (2)
app/ai/voice/agents/breeze_buddy/tts/__init__.py (1)
149-154: Deduplicate provider/voice maps to avoid drift.
`VOICE_TO_PROVIDER` (and `PROVIDER_TO_VOICE`) are defined in two functions. Consider promoting them to module-level constants to keep mapping changes in one place.

Also applies to: 276-286
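The suggested dedup could look roughly like this: define the forward map once at module level and derive the inverse from it, so an added provider only needs one edit. The voice ids below are placeholders, not the PR's actual values.

```python
# Sketch of module-level constants replacing the per-function copies;
# voice ids are hypothetical placeholders.
VOICE_TO_PROVIDER: dict[str, str] = {
    "voice_a": "elevenlabs",
    "voice_b": "cartesia",
}

# Derived inverse map: stays in sync with the forward map automatically.
PROVIDER_TO_VOICE: dict[str, str] = {
    provider: voice for voice, provider in VOICE_TO_PROVIDER.items()
}
```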
app/ai/voice/tts/tts_health_observer.py (1)
363-385: Drop the unused `current_service` parameter or include it in the alert.
```python
def _detect_provider(self, source_name: str) -> TTSProvider:
    """Detect TTS provider from source class name."""
    name_lower = source_name.lower()
    if "elevenlabs" in name_lower:
        return TTSProvider.ELEVENLABS
    elif "cartesia" in name_lower:
        return TTSProvider.CARTESIA
    else:
        # Default to ElevenLabs detection strategy (more conservative)
        return TTSProvider.ELEVENLABS
```
Handle non‑ElevenLabs/Cartesia providers explicitly.
Defaulting unknown providers to ElevenLabs can misattribute failures and trigger incorrect fallback/alerts when Sarvam (or future providers) is configured. Consider explicitly handling unknown providers (log + ignore) or adding them to the enum/config instead of defaulting.
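One possible shape for the suggested fix is to return a sentinel for unknown providers instead of defaulting to ElevenLabs. This is a sketch only: the real `TTSProvider` enum and logger live in the PR's module, and callers would still need updating to handle `None`.

```python
# Sketch of the suggested fix: unknown sources are logged and return None
# rather than being misattributed to ElevenLabs.
from enum import Enum
from typing import Optional
import logging

logger = logging.getLogger(__name__)


class TTSProvider(Enum):
    ELEVENLABS = "elevenlabs"
    CARTESIA = "cartesia"


def detect_provider(source_name: str) -> Optional[TTSProvider]:
    name_lower = source_name.lower()
    if "elevenlabs" in name_lower:
        return TTSProvider.ELEVENLABS
    if "cartesia" in name_lower:
        return TTSProvider.CARTESIA
    # Unknown provider (e.g. Sarvam): warn and let the caller skip tracking.
    logger.warning("Unknown TTS provider source: %s", source_name)
    return None
```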
```python
async def _trigger_fallback(self, failed_service: str):
    """Enable fallback in Redis and notify."""
    alt_service = TTS_SWAP_MAP.get(failed_service, "unknown")

    # Set the failed provider FIRST to avoid race conditions
    await set_config("BB_TTS_FAILED_PROVIDER", failed_service)
    # THEN enable the fallback flag
    success = await set_config("BB_TTS_FALLBACK_ENABLED", True)
    if success:
        self._fallback_triggered = True
        self._failure_counts[failed_service] = 0
        logger.warning(
            f"🔄 TTS FALLBACK TRIGGERED: {failed_service} → {alt_service}"
        )
        await self._send_fallback_alert(failed_service, alt_service)
    else:
        logger.error("[TTS Health] Failed to update Redis for fallback!")
```
Verify both Redis updates before marking fallback triggered.
_trigger_fallback ignores the result of setting BB_TTS_FAILED_PROVIDER. If that write fails but fallback is enabled, routing can switch without an accurate failed-provider value. Check both updates and only mark fallback as triggered when both succeed.
Proposed fix:

```diff
-    await set_config("BB_TTS_FAILED_PROVIDER", failed_service)
-    # THEN enable the fallback flag
-    success = await set_config("BB_TTS_FALLBACK_ENABLED", True)
-    if success:
+    failed_set = await set_config("BB_TTS_FAILED_PROVIDER", failed_service)
+    fallback_set = False
+    if failed_set:
+        fallback_set = await set_config("BB_TTS_FALLBACK_ENABLED", True)
+    if failed_set and fallback_set:
         self._fallback_triggered = True
         self._failure_counts[failed_service] = 0
         logger.warning(
             f"🔄 TTS FALLBACK TRIGGERED: {failed_service} → {alt_service}"
         )
         await self._send_fallback_alert(failed_service, alt_service)
     else:
-        logger.error("[TTS Health] Failed to update Redis for fallback!")
+        logger.error(
+            "[TTS Health] Failed to update Redis for fallback "
+            f"(failed_provider={failed_set}, fallback_enabled={fallback_set})"
+        )
```
```python
async def set_config(key: str, value: Any) -> bool:
    """Update a single config value in Redis at runtime.

    This is used for runtime updates like TTS fallback triggers.
    Note: This does NOT update DevCycle - that remains the source of truth.
    When DevCycle syncs, it may overwrite values set through this function.

    Args:
        key: Configuration key to update
        value: New value to set

    Returns:
        True if update was successful, False otherwise
    """
    try:
        redis = await get_redis_service()
        client = await redis.get_client()

        # Load existing flags
        raw = await client.get(FEATURE_FLAGS_KEY)
        all_flags = json.loads(raw) if raw else {}

        # Update the specific key (normalize to match DevCycle format)
        normalized_key = normalize_key(key)
        all_flags[normalized_key] = value

        # Write back
        await client.set(FEATURE_FLAGS_KEY, json.dumps(all_flags))
        logger.info(f"Runtime config updated: {normalized_key} = {value}")
        return True
    except Exception as e:
        logger.error(f"Failed to set config {key}: {e}")
```
🧩 Analysis chain

🏁 Scripts executed (repository: juspay/clairvoyance):

```shell
# First, let's find and read the store.py file
fd -t f "store.py" --path "*live_config*"                                     # Length of output: 296

# Look at the actual file to verify the snippet
cat -n app/services/live_config/store.py | sed -n '230,280p'                  # Length of output: 2126

# Search for other places where FEATURE_FLAGS_KEY is written
rg "FEATURE_FLAGS_KEY" -B 2 -A 2                                              # Length of output: 1947

# Check if WATCH/MULTI/transactions are used elsewhere in the file
rg "WATCH|MULTI|EXEC|watch|multi|exec" app/services/live_config/              # Length of output: 45

# Check for concurrent patterns - see how Redis client is used
rg "get_redis_service|get_client" -B 2 -A 5 app/services/live_config/         # Length of output: 2771

# Look at the DevCycle sync operation around line 203
cat -n app/services/live_config/store.py | sed -n '195,215p'                  # Length of output: 788

# Check if there's any locking mechanism or mutex elsewhere in the codebase
rg "asyncio.Lock|Lock\(\)|threading.Lock|mutex" app/services/live_config/     # Length of output: 45

# Check how frequently set_config and sync are called
rg "set_config\(" -B 2 -A 2                                                   # Length of output: 932

# Find the DevCycle sync operation that writes to FEATURE_FLAGS_KEY
rg "def.*sync|def.*update.*flag" app/services/live_config/ -A 10 | head -50   # Length of output: 807

# Check if there are any background tasks that sync DevCycle
rg "devcycle|DevCycle" app/services/live_config/ -B 2 -A 5                    # Length of output: 10253

# Look at the tts_health_observer context more - this shows the race condition
cat -n app/ai/voice/tts/tts_health_observer.py | grep -B 5 -A 5 "set_config"  # Length of output: 1111
```
Guard against lost updates in concurrent config writes.
set_config performs a read-modify-write on FEATURE_FLAGS_KEY without atomic guarantees. With multiple writers (e.g., set_config from TTS health observer and fetch_and_update_feature_flags from DevCycle sync both executing concurrently), updates can be lost—for example, one process reads the full flags dict, another process reads it and writes back, then the first process overwrites with its stale copy. The codebase already acknowledges this risk in the docstring ("When DevCycle syncs, it may overwrite values set through this function"), but there is no atomic mechanism to prevent it.
Consider using Redis transactions (WATCH/MULTI/EXEC for optimistic locking) or storing flags as individual Redis HASHes instead of a single JSON string to avoid RMW cycles.
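The optimistic-locking option can be illustrated against an in-memory stand-in for Redis; the `compare_and_set` step below is what WATCH/MULTI/EXEC gives you atomically in redis-py (via `pipeline(transaction=True)` with `pipe.watch(key)`). The store class and function names here are purely illustrative, not the repository's code.

```python
# Sketch of the retry-on-conflict (optimistic locking) pattern for the
# FEATURE_FLAGS_KEY read-modify-write; InMemoryStore emulates Redis.
import json


class InMemoryStore:
    """Stand-in for Redis; compare_and_set models WATCH + MULTI/EXEC."""

    def __init__(self):
        self._data: dict[str, str] = {}

    def get(self, key):
        return self._data.get(key)

    def compare_and_set(self, key, expected, new) -> bool:
        # With real Redis, EXEC fails if the WATCHed key changed after the
        # read; here we compare the raw value we read against the current one.
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


def set_flag(store, flags_key: str, flag: str, value, max_retries: int = 5) -> bool:
    for _ in range(max_retries):
        raw = store.get(flags_key)                 # read
        all_flags = json.loads(raw) if raw else {}
        all_flags[flag] = value                    # modify
        if store.compare_and_set(flags_key, raw, json.dumps(all_flags)):
            return True                            # write won; no lost update
        # A concurrent writer interleaved; re-read and retry.
    return False
```

The alternative mentioned in the comment — storing each flag as a field of a Redis HASH and updating it with a single `HSET` — avoids the read-modify-write cycle entirely, at the cost of migrating the existing JSON-string format.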
🧰 Tools

🪛 Ruff (0.14.14)

[warning] 267-267: Consider moving this statement to an else block (TRY300)

[warning] 268-268: Do not catch blind exception: Exception (BLE001)
Summary by CodeRabbit
New Features
Chores