Problem

HTTPAgent was returning the entire OpenAI API response JSON as a string instead of extracting the actual message content. This caused agent validation failures when using OpenAI-compatible servers like vLLM.

Current Behavior

  • Agent responses contain raw API JSON: {'id': 'chatcmpl-xxx', 'object': 'chat.completion', 'choices': [...]}
  • AgentBench tasks cannot parse these responses, leading to validation failures
  • Users must manually modify the agent code to work with OpenAI-compatible servers

Expected Behavior

  • Agent responses should contain only the text content: the actual LLM output
  • Content should be extracted from choices[0].message.content for OpenAI format
  • No manual intervention required
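
For illustration, this is roughly what the mismatch looks like (the payload below is a hypothetical vLLM chat-completion response and the content string is made up for the example):

# Hypothetical OpenAI-format response returned by a vLLM server
resp = {
    "id": "chatcmpl-xxx",
    "object": "chat.completion",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Action: answer(42)"}}
    ],
}

# Before this change the agent's reply was effectively the whole payload as a string
broken_reply = str(resp)

# What AgentBench tasks need is just the message text
expected_reply = resp["choices"][0]["message"]["content"]  # "Action: answer(42)"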

Solution

This PR modifies HTTPAgent.inference() to detect and unwrap OpenAI-compatible API responses (see Implementation Details below):

  1. Detect OpenAI format: Check for the presence of a choices field in the response
  2. Extract content: Read the actual message text from choices[0].message.content
  3. Fallback mechanism: If the response is not in OpenAI format, keep the original return_format behavior

Changes

  • File: src/client/agents/http_agent.py
  • Lines: 212-222 (inference method)
  • Impact: 9 lines added (detection + extraction logic)

Effects

Enables vLLM Integration

  • AgentBench can now work directly with vLLM's OpenAI-compatible server
  • No need for custom wrappers or response transformation layers
  • Supports one of the most widely used open-source LLM inference engines

Eliminates Agent Validation Failures

  • Agents receive properly formatted text responses
  • Task validation logic works correctly
  • Benchmark results become meaningful and accurate

Backward Compatibility

  • Existing configurations continue to work without modification
  • Non-OpenAI API formats remain supported via fallback mechanism
  • Zero breaking changes for current users

Broader Compatibility

  • Works with any OpenAI-compatible inference server:
    • vLLM
    • Text Generation Inference (TGI)
    • LocalAI
    • LiteLLM
    • And others following OpenAI's response format

Implementation Details

The fix adds format-aware response parsing:

# Extract content from OpenAI-compatible API response (vLLM)
if isinstance(resp, dict) and "choices" in resp and len(resp["choices"]) > 0:
    message = resp["choices"][0].get("message", {})
    content = message.get("content", "")
    if content:
        return content

# Fallback to return_format if not OpenAI format
return self.return_format.format(response=resp)
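
As a self-contained sketch of how the two branches behave (the helper name, the sample responses, and the return_format value are made up for illustration, not taken from the repository):

def extract_reply(resp, return_format="{response}"):
    # Mirrors the logic above as a standalone function for demonstration
    if isinstance(resp, dict) and "choices" in resp and len(resp["choices"]) > 0:
        message = resp["choices"][0].get("message", {})
        content = message.get("content", "")
        if content:
            return content
    return return_format.format(response=resp)

# OpenAI-format response (e.g. from vLLM): the message text is extracted
print(extract_reply({"choices": [{"message": {"content": "Hello!"}}]}))      # Hello!
# Any other shape: the configured return_format is applied as before
print(extract_reply({"text": "Hello!"}, return_format="{response[text]}"))   # Hello!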

This approach ensures that:

  • OpenAI-format responses are properly parsed
  • Other formats continue working as before
  • No configuration changes are needed
  • The added code is short and clearly commented

Use Case

This fix is particularly important for researchers and practitioners who:

  • Use vLLM for efficient LLM serving
  • Want to benchmark open-source models with AgentBench
  • Need cost-effective alternatives to proprietary APIs
  • Require high-throughput inference for large-scale evaluations

Related

This addresses a common pain point when using AgentBench with modern open-source inference engines. The OpenAI API format has become the de facto standard for LLM serving, and this fix ensures AgentBench works seamlessly with that ecosystem.
