Binary file modified .gitignore
Binary file not shown.
333 changes: 333 additions & 0 deletions SETUP_GUIDE.md
@@ -0,0 +1,333 @@
# AgentBench Setup Guide (Azure OpenAI + WSL2)

This guide walks you through setting up and running AgentBench with Azure OpenAI on Windows using WSL2.

## Prerequisites
- Windows 10/11 with WSL2 enabled
- A WSL2 Ubuntu distribution (this guide assumes Ubuntu)
- Docker Desktop for Windows with WSL2 integration enabled
- An Azure OpenAI resource with a deployed model (e.g., `gpt-4o-mini`)

---

## Step 1: Enable WSL2 Integration in Docker Desktop

1. Open **Docker Desktop**
2. Go to **Settings** → **Resources** → **WSL Integration**
3. Enable integration with your WSL2 distro (e.g., Ubuntu)
4. Click **Apply & Restart**

---

## Step 2: Clone the Repository

```bash
# In WSL2 terminal
cd ~
git clone https://github.com/Jay-Dev01/AgentBench.git
cd AgentBench
git checkout ubuntu-azure-setup
```

---

## Step 3: Set Up Python Environment

```bash
# Install Python 3.11 if not available
sudo apt update
sudo apt install -y python3.11 python3.11-venv python3.11-dev

# Create virtual environment
python3.11 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

---

## Step 4: Configure Azure OpenAI API Key

Set your Azure OpenAI API key as an environment variable:

```bash
export AZURE_OPENAI_API_KEY="your-azure-api-key-here"
```

To make it persistent, add it to your `~/.bashrc`:

```bash
echo 'export AZURE_OPENAI_API_KEY="your-azure-api-key-here"' >> ~/.bashrc
source ~/.bashrc
```

### Finding Your Azure OpenAI Credentials

1. Go to [Azure Portal](https://portal.azure.com)
2. Navigate to your **Azure OpenAI resource**
3. Click **Keys and Endpoint**
4. Copy **Key 1** or **Key 2**

The configuration file (`configs/agents/openai-chat.yaml`) is already set up to use:
- **Endpoint**: `https://algoverse-ab.openai.azure.com/`
- **Deployment**: `gpt-4o-mini`
- **API Version**: `2024-08-01-preview`

If your resource name, deployment, or API version differs, update the URL in `configs/agents/openai-chat.yaml` accordingly.
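As a sanity check on the endpoint format: the URL is just three values (resource name, deployment name, API version) slotted into a fixed template. The helper below is purely illustrative and not part of the repository:

```python
def azure_chat_url(resource: str, deployment: str, api_version: str) -> str:
    """Assemble an Azure OpenAI chat-completions URL from its three parts."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

# The values used by this guide's config:
print(azure_chat_url("algoverse-ab", "gpt-4o-mini", "2024-08-01-preview"))
```

Compare the printed URL against the `url:` line in `configs/agents/openai-chat.yaml` before running anything.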

---

## Step 5: Start Docker Services

```bash
cd ~/AgentBench/extra

# Start the controller, redis, and alfworld worker
docker compose up -d controller redis alfworld-std

# Wait for services to initialize (~30-60 seconds)

# Verify services are running
docker compose ps

# Check that the worker registered
curl http://localhost:5020/api/list_workers
```

You should see output showing `alfworld-std` with workers registered.

### Verify Direct Worker Access

```bash
curl http://localhost:5021/api/get_sessions
```

This should return `[]` or a list of sessions.
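If you prefer one command that probes both endpoints, the two checks above can be wrapped in a short script. The ports (5020 and 5021) follow this guide's compose setup; nothing here is part of AgentBench itself:

```python
import json
from urllib import error, request

def check_services(base: str = "http://localhost") -> dict:
    """Probe the controller and the alfworld worker; never raises."""
    endpoints = {
        "controller": f"{base}:5020/api/list_workers",
        "alfworld-worker": f"{base}:5021/api/get_sessions",
    }
    status = {}
    for name, url in endpoints.items():
        try:
            with request.urlopen(url, timeout=5) as resp:
                status[name] = json.loads(resp.read())
        except (error.URLError, OSError, ValueError) as exc:
            # Connection refused, timeout, or non-JSON response
            status[name] = f"unreachable ({exc})"
    return status

if __name__ == "__main__":
    for name, result in check_services().items():
        print(f"{name}: {result}")
```

An "unreachable" entry usually means the corresponding container is not up yet; re-run `docker compose ps` and check the logs.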

---

## Step 6: Run the Benchmark

```bash
cd ~/AgentBench
source venv/bin/activate

# Make sure API key is set
echo $AZURE_OPENAI_API_KEY

# Run the assigner
python -m src.assigner
```

### Expected Output

```
TaskClient created: alfworld-std (http://localhost:5020/api)
-> Using direct worker address: http://localhost:5021/api
Message: 109 samples remaining.
Agent "gpt-4o-mini" needs to run 1 tasks with total 109 samples:
Task "alfworld-std": 109
Running Count: 0
Assigned gpt-4o-mini/alfworld-std#108
...
```

The benchmark will run through 109 ALFWorld tasks. Results are saved to the `outputs/` directory.
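For a quick tally of how far a run has progressed, you can count result files under `outputs/`. The exact file layout inside `outputs/{TIMESTAMP}` is an assumption here; adjust the glob pattern to match what your run actually produces:

```python
from pathlib import Path

def count_result_files(outputs_dir: str, pattern: str = "*.json") -> int:
    """Count files matching `pattern` anywhere under `outputs_dir`."""
    return sum(1 for _ in Path(outputs_dir).rglob(pattern))

# e.g. count_result_files("outputs") after a run has started
```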

---

## Troubleshooting

### Rate Limit Errors

If you see `RateLimitReached` errors: concurrency is already set to 1 in `configs/assignments/default.yaml` to minimize them. You can:

1. Wait and retry (the error message tells you how long)
2. Increase your Azure quota at [aka.ms/oai/quotaincrease](https://aka.ms/oai/quotaincrease)
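If you script your own retries around transient `RateLimitReached` errors, capped exponential backoff is the usual pattern. This is a generic sketch, not behaviour built into AgentBench:

```python
def backoff_delays(retries: int = 5, base: float = 2.0, cap: float = 60.0) -> list:
    """Delays (seconds) before each retry: base * 2**i, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]

# Usage sketch: sleep between attempts, e.g.
#   for delay in backoff_delays():
#       if try_request_once():
#           break
#       time.sleep(delay)
```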

### Connection Refused

If you get connection errors:

```bash
# Check Docker services are running
docker compose ps

# Check controller logs
docker logs agentrl-controller --tail 50

# Check worker logs
docker logs agentbench-fc-alfworld-std-1 --tail 50

# Restart services
docker compose down
docker compose up -d controller redis alfworld-std
```

### Worker Not Registering

If `curl http://localhost:5020/api/list_workers` shows empty workers:

```bash
# Check worker logs for errors
docker logs agentbench-fc-alfworld-std-1 --tail 100

# Rebuild and restart
docker compose down
docker compose build alfworld-std
docker compose up -d controller redis alfworld-std
```

---

## Configuration Files

| File | Purpose |
|------|---------|
| `configs/agents/openai-chat.yaml` | Azure OpenAI endpoint and API key |
| `configs/agents/api_agents.yaml` | Agent definitions (gpt-4o-mini) |
| `configs/assignments/default.yaml` | Task assignments and concurrency |
| `configs/assignments/definition.yaml` | Controller address (port 5020) |
| `extra/docker-compose.yml` | Docker service definitions |

---

## Architecture Overview

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│     Python      │     │    Controller    │     │    ALFWorld     │
│    Assigner     │────▶│   (port 5020)    │────▶│     Worker      │
│                 │     │                  │     │   (port 5021)   │
└─────────────────┘     └──────────────────┘     └─────────────────┘
        │                                                 ▲
        │   (direct communication - bypasses controller)  │
        └─────────────────────────────────────────────────┘

        ┌─────────────────┐
        │  Azure OpenAI   │
        │  (gpt-4o-mini)  │
        └─────────────────┘
```

**Note:** The Python client talks directly to the worker (port 5021) because the controller has a bug that prevents proper `/interact` forwarding.

---

## Stopping Services

```bash
cd ~/AgentBench/extra
docker compose down
```

---

## Running Different Tasks

All tasks are pre-configured. To switch between tasks:

### Option 1: Use the run script

```bash
chmod +x run_task.sh
./run_task.sh alfworld-std # or dbbench-std, os-std, kg-std, webshop-std
```

### Option 2: Manual setup

#### 1. Edit `configs/assignments/default.yaml`

Uncomment the task you want to run:

```yaml
task:
  # - alfworld-std   # House-holding tasks
  - dbbench-std      # Database tasks (uncomment this one)
  # - os-std         # OS interaction tasks
  # - kg-std         # Knowledge graph tasks
  # - webshop-std    # Web shopping tasks
```

#### 2. Start the Docker service

```bash
cd ~/AgentBench/extra

# For alfworld (house-holding)
docker compose up -d controller redis alfworld-std

# For dbbench (database)
docker compose up -d controller redis dbbench-std

# For os-std (OS interaction) - requires building images first
docker compose up -d controller redis os_interaction-std

# For kg-std (knowledge graph) - requires freebase data
docker compose up -d controller redis knowledgegraph-std freebase

# For webshop (web shopping) - requires ~16GB RAM
docker compose up -d controller redis webshop-std
```
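The service combinations above can be captured in a small lookup, which is handy if you wrap the commands in your own script. The service names mirror `extra/docker-compose.yml` as described in this guide; verify them against your checkout:

```python
COMPOSE_SERVICES = {
    "alfworld-std": ["controller", "redis", "alfworld-std"],
    "dbbench-std": ["controller", "redis", "dbbench-std"],
    "os-std": ["controller", "redis", "os_interaction-std"],
    "kg-std": ["controller", "redis", "knowledgegraph-std", "freebase"],
    "webshop-std": ["controller", "redis", "webshop-std"],
}

def compose_up_command(task: str) -> str:
    """Return the `docker compose up` command for a given task."""
    return "docker compose up -d " + " ".join(COMPOSE_SERVICES[task])
```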

#### 3. Run the assigner

```bash
cd ~/AgentBench
source venv/bin/activate
python -m src.assigner
```

---

## Task-Specific Requirements

### OS Interaction (os-std)

Build the required Docker images first:

```bash
cd ~/AgentBench
docker build -t local-os/default -f data/os_interaction/res/dockerfiles/default data/os_interaction/res/dockerfiles
docker build -t local-os/packages -f data/os_interaction/res/dockerfiles/packages data/os_interaction/res/dockerfiles
docker build -t local-os/ubuntu -f data/os_interaction/res/dockerfiles/ubuntu data/os_interaction/res/dockerfiles
```

### Knowledge Graph (kg-std)

Requires Freebase data:

1. Download data from [Freebase-Setup](https://github.com/dki-lab/Freebase-Setup)
2. Extract and place at `./extra/virtuoso_db/virtuoso.db`
3. Start with: `docker compose up -d controller redis knowledgegraph-std freebase`

### WebShop (webshop-std)

- Requires ~16GB RAM
- Takes ~3 minutes to start
- Start with: `docker compose up -d controller redis webshop-std`

---

## Port Mapping Reference

| Task | Host Port | Worker Port |
|------|-----------|-------------|
| Controller | 5020 | 5020 |
| alfworld-std | 5021 | 5021 |
| dbbench-std | 5022 | 5021 |
| os-std | 5023 | 5021 |
| kg-std | 5024 | 5021 |
| webshop-std | 5025 | 5021 |
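In scripts, the table above reduces to a simple mapping from task name to host port. This snippet is illustrative only, mirroring the table; it is not part of the repository:

```python
HOST_PORTS = {
    "controller": 5020,
    "alfworld-std": 5021,
    "dbbench-std": 5022,
    "os-std": 5023,
    "kg-std": 5024,
    "webshop-std": 5025,
}

def worker_api(task: str) -> str:
    """Host-side API base URL for a task's worker (or the controller)."""
    return f"http://localhost:{HOST_PORTS[task]}/api"
```

Note that inside the containers every worker listens on 5021; these are the host-side ports.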

---

## License

Apache-2.0 - See [LICENSE](LICENSE) for details.

5 changes: 2 additions & 3 deletions configs/agents/api_agents.yaml
```diff
@@ -1,9 +1,8 @@
-gpt-3.5-turbo-0613:
+gpt-4o-mini:
   import: "./openai-chat.yaml"
   parameters:
-    name: "gpt-3.5-turbo-0613"
+    name: "gpt-4o-mini"
     body:
-      model: "gpt-3.5-turbo-0613"
       max_tokens: 512
 
 text-davinci-003:
```
10 changes: 4 additions & 6 deletions configs/agents/openai-chat.yaml
```diff
@@ -1,13 +1,11 @@
 module: src.client.agents.HTTPAgent
 parameters:
-  url: https://api.openai.com/v1/chat/completions
+  url: https://algoverse-ab.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-08-01-preview
   headers:
     Content-Type: application/json
-    Authorization: Bearer <% PUT-YOUR-OPENAI-KEY-HERE %>
+    api-key: ${AZURE_OPENAI_API_KEY}
   body:
     temperature: 0
   prompter:
-    name: role_content_dict
-    args:
-      agent_role: assistant
-  return_format: "{response[choices][0][message][content]}"
+    name: openai_passthrough
+  return_format: openai_chat
```
21 changes: 14 additions & 7 deletions configs/assignments/default.yaml
```diff
@@ -2,16 +2,23 @@ import: definition.yaml
 
 concurrency:
   task:
-    dbbench-std: 5
-    os-std: 5
+    alfworld-std: 1
+    dbbench-std: 1
+    os-std: 1
+    kg-std: 1
+    webshop-std: 1
   agent:
-    gpt-3.5-turbo-0613: 5
+    gpt-4o-mini: 1
 
 assignments: # List[Assignment] | Assignment
-  - agent: # "task": List[str] | str , "agent": List[str] | str
-      - gpt-3.5-turbo-0613
+  - agent:
+      - gpt-4o-mini
     task:
-      - dbbench-std
-      - os-std
+      # ===== UNCOMMENT THE TASK(S) YOU WANT TO RUN =====
+      - alfworld-std   # House-holding tasks (ALFWorld)
+      # - dbbench-std  # Database tasks
+      # - os-std       # OS interaction tasks
+      # - kg-std       # Knowledge graph tasks (requires freebase)
+      # - webshop-std  # Web shopping tasks (requires ~16GB RAM)
 
 output: "outputs/{TIMESTAMP}"
```