# Model Fallbacks
Automatically try alternative models when your primary model fails.
```ruby
module LLM
  class MyAgent < ApplicationAgent
    model "gpt-4o"
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
  end
end
```

When the primary model fails (after any retries):

```
1. Primary: gpt-4o
   └─ Fails after retries

2. Fallback 1: gpt-4o-mini
   └─ Succeeds! Return result

# If fallback 1 also fails:
3. Fallback 2: claude-3-5-sonnet
   └─ Succeeds! Return result

# If all fail:
   └─ Raise error
```
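Under the hood this is just a nested loop. Here is a minimal sketch of the idea in plain Ruby (illustrative only, not the gem's actual implementation; `perform_request` is a hypothetical stand-in for the underlying API call):

```ruby
# Illustrative fallback loop: give each model its retry budget,
# return on the first success, raise only when all are exhausted.
def call_with_fallbacks(models, max_retries: 0)
  last_error = nil
  models.each do |model|
    (1 + max_retries).times do
      return perform_request(model) # hypothetical request method
    rescue StandardError => e
      last_error = e # record the failure and keep going
    end
  end
  raise last_error
end
```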
Each model gets its own retry attempts:
```ruby
module LLM
  class MyAgent < ApplicationAgent
    model "gpt-4o"
    retries max: 2
    fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
  end
end

# Total possible attempts:
#   gpt-4o:            3 attempts (1 + 2 retries)
#   gpt-4o-mini:       3 attempts
#   claude-3-5-sonnet: 3 attempts
#   = up to 9 attempts total
```

Check which model succeeded:

```ruby
result = LLM::MyAgent.call(query: "test")

result.model_id        # Original model requested
result.chosen_model_id # Model that actually succeeded
result.used_fallback?  # true if not the primary model

# Example
result.model_id        # => "gpt-4o"
result.chosen_model_id # => "claude-3-5-sonnet"
result.used_fallback?  # => true
```

Every execution records which models were attempted:

```ruby
execution = RubyLLM::Agents::Execution.last
execution.model_id # => "gpt-4o"
execution.chosen_model_id # => "claude-3-5-sonnet"
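
# Each attempt is stored as a hash. An illustrative shape, using the
# string keys read below (the error class here is just an example):
#   { "model_id" => "gpt-4o", "success" => false, "error_class" => "Timeout::Error" }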
execution.attempts.each do |attempt|
  puts "Model: #{attempt['model_id']}"
  puts "Success: #{attempt['success']}"
  puts "Error: #{attempt['error_class']}" unless attempt['success']
end
```

Start expensive, fall back to cheaper:

```ruby
module LLM
  class CostOptimizedAgent < ApplicationAgent
    model "gpt-4o"                # Best quality
    fallback_models "gpt-4o-mini" # Cheaper fallback
  end
end
```

Spread across providers for outage resilience:

```ruby
module LLM
  class MultiProviderAgent < ApplicationAgent
    model "gpt-4o"
    fallback_models "claude-3-5-sonnet", "gemini-2.0-flash"
    # OpenAI → Anthropic → Google
  end
end
```

Progressively lower quality:

```ruby
module LLM
  class TieredAgent < ApplicationAgent
    model "gpt-4o"
    fallback_models "gpt-4o-mini", "gpt-3.5-turbo"
  end
end
```

Fastest models first:

```ruby
module LLM
  class SpeedFirstAgent < ApplicationAgent
    model "gemini-2.0-flash"
    fallback_models "gpt-4o-mini", "claude-3-haiku"
  end
end
```

Set fallbacks for all agents:

```ruby
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.default_fallback_models = ["gpt-4o-mini", "claude-3-haiku"]
end
```

Per-agent configuration overrides the global default:

```ruby
module LLM
  class MyAgent < ApplicationAgent
    model "gpt-4o"
    fallback_models "claude-3-5-sonnet" # Overrides global
  end
end
```

When using fallbacks across providers, ensure your prompts work with all models.
All fallback models should support your schema:
```ruby
module LLM
  class MyAgent < ApplicationAgent
    model "gpt-4o"
    fallback_models "claude-3-5-sonnet", "gemini-2.0-flash"
    # All three support JSON mode/structured output

    def schema
      @schema ||= RubyLLM::Schema.create do
        string :result
      end
    end
  end
end
```

Avoid provider-specific prompt features:

```ruby
# Good: universal prompt
def system_prompt
  "You are a helpful assistant."
end

# Potentially problematic: provider-specific syntax
def system_prompt
  "<|im_start|>system..." # OpenAI-specific
end
```

Be aware of capability differences:
| Feature | GPT-4o | Claude 3.5 | Gemini 2.0 |
|---|---|---|---|
| JSON mode | Yes | Yes | Yes |
| Vision | Yes | Yes | Yes |
| Function calling | Yes | Yes | Yes |
| Context window | 128K | 200K | 2M |
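
If an agent depends on particular capabilities, you can fail fast when a fallback chain includes an unsuitable model. A hand-rolled sketch (the capability table and helper below are illustrative, not part of the gem):

```ruby
# Illustrative, hand-maintained capability table and a boot-time check.
MODEL_CAPABILITIES = {
  "gpt-4o"            => %i[json_mode vision function_calling],
  "claude-3-5-sonnet" => %i[json_mode vision function_calling],
  "gemini-2.0-flash"  => %i[json_mode vision function_calling]
}.freeze

def assert_capabilities!(models, required)
  models.each do |model|
    missing = required - MODEL_CAPABILITIES.fetch(model, [])
    raise ArgumentError, "#{model} is missing: #{missing.join(', ')}" if missing.any?
  end
end

# Raises if any model in the chain lacks a required feature
assert_capabilities!(%w[gpt-4o claude-3-5-sonnet gemini-2.0-flash], %i[json_mode vision])
```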
Track how often fallbacks are used:
```ruby
# Fallback rate this week
total = RubyLLM::Agents::Execution.this_week.count
fallbacks = RubyLLM::Agents::Execution
  .this_week
  .where("chosen_model_id != model_id")
  .count

# Guard against division by zero when there are no executions yet
fallback_rate = total.zero? ? 0.0 : fallbacks.to_f / total
puts "Fallback rate: #{(fallback_rate * 100).round(1)}%"

# Breakdown by model
RubyLLM::Agents::Execution
  .this_week
  .where("chosen_model_id != model_id")
  .group(:model_id, :chosen_model_id)
  .count
# => { ["gpt-4o", "claude-3-5-sonnet"] => 45, ... }
```

Alert when the fallback rate climbs too high:

```ruby
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.alerts = {
    on_events: [:high_fallback_rate],
    slack_webhook_url: ENV['SLACK_WEBHOOK_URL'],
    fallback_rate_threshold: 0.1 # Alert if > 10%
  }
end
```

The first fallback should be your best alternative:

```ruby
fallback_models "best_alternative", "second_choice", "last_resort"
```

Know the cost implications:

```ruby
model "gpt-4o" # $0.005/1K input
fallback_models "claude-3-opus" # $0.015/1K input (more expensive!)
# Better: Fall back to cheaper
fallback_models "gpt-4o-mini" # $0.00015/1K input# In tests, verify each model works
["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"].each do |model|
result = LLM::MyAgent.call(query: "test", model: model)
expect(result.success?).to be true
end# Good: 2-3 fallbacks
fallback_models "alternative1", "alternative2"
# Excessive: Too many
fallback_models "a", "b", "c", "d", "e", "f"
# Wastes time trying failed providers- Reliability - Overview of reliability features
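The time cost is easy to estimate: in the worst case, every model burns its full attempt budget before the next one is tried. A rough sketch, assuming a fixed per-attempt timeout:

```ruby
# Worst-case wall-clock time before an error is finally raised,
# assuming each attempt can run up to `timeout_seconds`.
def worst_case_latency(model_count:, max_retries:, timeout_seconds:)
  model_count * (1 + max_retries) * timeout_seconds
end

worst_case_latency(model_count: 3, max_retries: 2, timeout_seconds: 30) # => 270 (4.5 minutes)
worst_case_latency(model_count: 7, max_retries: 2, timeout_seconds: 30) # => 630 (10.5 minutes)
```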
See also:

- Reliability - Overview of reliability features
- Automatic Retries - Retry configuration
- Circuit Breakers - Prevent cascading failures
- Agent DSL - Configuration reference