This guide explains how to use Google's Gemini model with the Code Mode benchmark.

To get a Gemini API key:

- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy your API key
Add your Gemini API key to `.env`:
```
GOOGLE_API_KEY=your_google_api_key_here
```

You can have both Claude and Gemini keys in the same `.env` file:

```
ANTHROPIC_API_KEY=sk-ant-xxxxx
GOOGLE_API_KEY=your_google_api_key_here
```

```bash
# Run full benchmark with Gemini
python benchmark.py --model gemini

# Run quick test (2 scenarios) with Gemini
python benchmark.py --model gemini --limit 2

# Run specific scenario with Gemini
python benchmark.py --model gemini --scenario 3
```

Or via Make:

```bash
# Run full benchmark with Gemini
make run-gemini

# Run quick test with Gemini
make run-gemini-quick
```

You can run the benchmark with both models to compare:

```bash
# Run with Claude (default)
python benchmark.py --limit 2
# Results saved to: benchmark_results_claude.json

# Run with Gemini
python benchmark.py --model gemini --limit 2
# Results saved to: benchmark_results_gemini.json

# Compare the JSON files
```

The Regular Agent uses Gemini's native function calling:

- Converts Anthropic tool schemas to Gemini format
- Uses `genai.GenerativeModel` with a `tools` parameter
- Handles function calls via `function_call` parts
- Returns function responses using the `FunctionResponse` proto
Location: `agents/gemini_regular_agent.py`
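The schema conversion in the first bullet can be sketched as follows. This is a simplified illustration, not the benchmark's actual `_convert_schemas_to_gemini`: Anthropic tools carry their JSON schema under `input_schema`, while Gemini function declarations expect it under `parameters` (the real conversion may also need to adjust types or drop unsupported JSON Schema keywords).

```python
def convert_schema_to_gemini(tool: dict) -> dict:
    """Sketch: map an Anthropic-style tool schema to a Gemini-style
    function declaration by renaming the schema field."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        # Anthropic's "input_schema" becomes Gemini's "parameters"
        "parameters": tool["input_schema"],
    }


tool = {
    "name": "get_weather",
    "description": "Look up the weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
print(convert_schema_to_gemini(tool)["parameters"]["required"])
```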
The Code Mode Agent uses Gemini to generate Python code:

- Same system prompt as the Claude version
- Generates code in ```python blocks
- Executes in the same sandbox (RestrictedPython)
- Uses the same tools API

Location: `agents/gemini_codemode_agent.py`
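Extracting the generated code from a ```python block can be sketched with a regular expression. This is an assumed helper for illustration, not the agent's actual extraction logic:

```python
import re


def extract_python_block(text: str) -> str:
    """Return the contents of the first fenced ```python block in a reply."""
    match = re.search(r"```python\n(.*?)```", text, re.DOTALL)
    if match is None:
        raise ValueError("no ```python block found in model output")
    return match.group(1).strip()


reply = "Here is the code:\n```python\nx = 1\nprint(x)\n```\nDone."
print(extract_python_block(reply))
```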
```bash
# Test Gemini Regular Agent
cd codemode_benchmark
source venv/bin/activate
python agents/gemini_regular_agent.py

# Test Gemini Code Mode Agent
python agents/gemini_codemode_agent.py

# Test agent factory with both models
python agents/agent_factory.py
```

Key differences from the Claude agents:

- Schema format: Gemini uses a different parameter schema format
- Function calling: Gemini uses proto-based function responses
- Token counting: the two APIs count tokens differently
- Context window: the two models have different limits
Both models should:
- Complete all scenarios successfully
- Pass validation checks
- Generate the correct final state
Performance may differ in:
- Token usage
- Execution time
- Number of iterations
- Code quality (for Code Mode)
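A small script can tally these metrics from the two results files. The field names below (`"tokens"`, `"seconds"`) are assumptions for illustration; adjust them to the actual schema of the `benchmark_results_*.json` files:

```python
import json


def summarize(path: str) -> dict:
    """Load a results file and total up comparable metrics.

    Assumes (hypothetically) the file holds a list of per-scenario
    dicts with "tokens" and "seconds" fields.
    """
    with open(path) as f:
        runs = json.load(f)
    return {
        "total_tokens": sum(r["tokens"] for r in runs),
        "total_seconds": sum(r["seconds"] for r in runs),
    }


# Example usage, comparing the two files side by side:
# for path in ("benchmark_results_claude.json", "benchmark_results_gemini.json"):
#     print(path, summarize(path))
```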
If the API key is not found:

- Check that the `.env` file exists
- Verify the key is correct
- Make sure there are no extra spaces or quotes

If you hit rate limits:

- Gemini has free-tier rate limits
- Wait a minute between runs, or upgrade to a paid tier

If the model is unavailable:

- The code uses `gemini-1.5-pro-latest`
- Ensure your API key has access to this model
- You can change the model name in the agent files if needed

If a tool doesn't work:

- Check the schema conversion in `_convert_schemas_to_gemini`
- Gemini may have stricter requirements for some parameter types
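The API-key checks above can be automated with a small diagnostic. This is a hypothetical helper, not part of the benchmark; it only inspects the environment variable, so run it after your `.env` has been loaded:

```python
import os


def check_gemini_key() -> str:
    """Return a short diagnostic for the GOOGLE_API_KEY setting."""
    key = os.environ.get("GOOGLE_API_KEY", "")
    if not key:
        return "missing: is .env present and loaded?"
    # Keys pasted with surrounding quotes or whitespace are a common mistake
    if key != key.strip() or key.strip("\"'") != key:
        return "malformed: remove extra spaces or quotes"
    return "ok"


print(check_gemini_key())
```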
To change the Gemini model version, edit the agent files.

For the Regular Agent (`agents/gemini_regular_agent.py`):

```python
self.model_name = "gemini-1.5-pro-latest"  # Change here
```

For the Code Mode Agent (`agents/gemini_codemode_agent.py`):

```python
self.model_name = "gemini-1.5-pro-latest"  # Change here
```

Available models:

- `gemini-1.5-pro-latest`
- `gemini-1.5-flash-latest`
- `gemini-1.0-pro-latest`
- Gemini's function calling API is different from Claude's
- Schema conversion happens automatically
- Code Mode uses the same sandbox for both models
- Results are saved to separate files by model name
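The per-model output files follow the naming shown earlier (`benchmark_results_claude.json`, `benchmark_results_gemini.json`). A helper mirroring that convention, hypothetical but consistent with the filenames above, might look like:

```python
def results_path(model: str) -> str:
    """Build the results filename for a given model name,
    matching the benchmark_results_<model>.json convention."""
    return f"benchmark_results_{model}.json"


print(results_path("gemini"))  # benchmark_results_gemini.json
```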
To add support for another model (e.g., GPT-4), follow this pattern:

- Create `agents/yourmodel_regular_agent.py`
- Create `agents/yourmodel_codemode_agent.py`
- Add an entry to `agents/agent_factory.py`:

```python
"yourmodel": {
    "name": "Your Model Name",
    "api_key_env": "YOUR_API_KEY",
    "regular": YourModelRegularAgent,
    "codemode": YourModelCodeModeAgent,
}
```

- Update `benchmark.py` to handle the new API key
- Update the documentation
The agent factory makes it easy to support multiple LLM providers!
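The registry pattern behind the factory can be sketched as follows. This is a minimal illustration of the idea, with stub classes standing in for the real agent implementations; the actual `agents/agent_factory.py` may differ in detail:

```python
# Stub agent classes standing in for the real implementations
class GeminiRegularAgent: ...
class GeminiCodeModeAgent: ...


# Registry keyed by model name, in the shape shown above
MODELS = {
    "gemini": {
        "name": "Gemini",
        "api_key_env": "GOOGLE_API_KEY",
        "regular": GeminiRegularAgent,
        "codemode": GeminiCodeModeAgent,
    },
}


def create_agent(model: str, mode: str):
    """Look up the registry entry and instantiate the requested agent."""
    return MODELS[model][mode]()


agent = create_agent("gemini", "codemode")
print(type(agent).__name__)
```

Adding a new provider then means writing the two agent classes and one registry entry; the benchmark code never needs to know which provider it is driving.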