From 888098c6ca2d150a6b3898c930aa66f6ba4902de Mon Sep 17 00:00:00 2001 From: drewdrew Date: Wed, 14 Jan 2026 22:38:12 +0100 Subject: [PATCH 1/4] docs: add On-Platform Scenarios documentation Add documentation for the new Scenarios feature (M1) that allows users to create and run agent simulations directly on the LangWatch platform. New pages: - scenarios/overview.mdx - Introduction and key concepts - scenarios/creating-scenarios.mdx - Creating and editing scenarios - scenarios/targets.mdx - HTTP, LLM, and Prompt Config targets - scenarios/running-scenarios.mdx - Executing and analyzing runs Updates navigation to organize Agent Simulations into: - On-Platform Scenarios (new visual authoring) - Scenario SDK (existing code-based approach) Note: Screenshots needed - placeholder image references included. Closes langwatch/langwatch#1094 Co-Authored-By: Claude Opus 4.5 --- docs.json | 24 ++++- scenarios/creating-scenarios.mdx | 126 ++++++++++++++++++++++ scenarios/overview.mdx | 108 +++++++++++++++++++ scenarios/running-scenarios.mdx | 161 +++++++++++++++++++++++++++ scenarios/targets.mdx | 180 +++++++++++++++++++++++++++++++ 5 files changed, 594 insertions(+), 5 deletions(-) create mode 100644 scenarios/creating-scenarios.mdx create mode 100644 scenarios/overview.mdx create mode 100644 scenarios/running-scenarios.mdx create mode 100644 scenarios/targets.mdx diff --git a/docs.json b/docs.json index 45e08ce..ea7bcfb 100644 --- a/docs.json +++ b/docs.json @@ -66,11 +66,25 @@ "group": "Agent Simulations", "pages": [ "agent-simulations/introduction", - "agent-simulations/overview", - "agent-simulations/getting-started", - "agent-simulations/set-overview", - "agent-simulations/batch-runs", - "agent-simulations/individual-run" + { + "group": "On-Platform Scenarios", + "pages": [ + "scenarios/overview", + "scenarios/creating-scenarios", + "scenarios/targets", + "scenarios/running-scenarios" + ] + }, + { + "group": "Scenario SDK", + "pages": [ + "agent-simulations/overview", + "agent-simulations/getting-started", + "agent-simulations/set-overview", + "agent-simulations/batch-runs", + "agent-simulations/individual-run" + ] + } ] }, { diff --git a/scenarios/creating-scenarios.mdx b/scenarios/creating-scenarios.mdx new file mode 100644 index 0000000..6c730fb --- /dev/null +++ b/scenarios/creating-scenarios.mdx @@ -0,0 +1,126 @@ +--- +title: Creating Scenarios +description: Learn how to create and edit scenarios on the LangWatch platform +--- + +# Creating Scenarios + +This guide walks you through creating scenarios in the LangWatch UI. + +## Accessing the Scenario Library + +Navigate to **Scenarios** in the left sidebar to open the Scenario Library. This is where all your project's scenarios are listed. + +Scenario Library + +From here you can: +- View all scenarios with their labels and last updated time +- Filter scenarios by label +- Create new scenarios +- Click a scenario to edit it + +## Creating a New Scenario + +Click the **New Scenario** button to create a scenario. This opens the Scenario Editor. + +Scenario Editor + +### Step 1: Name Your Scenario + +Give your scenario a descriptive name that explains what it tests: + +- "Handles refund request politely" +- "Recommends vegetarian recipes" +- "Escalates frustrated customer to human" + +### Step 2: Define the Situation + +The **Situation** describes the context for the simulated user. 
Write it as a narrative that captures: + +- **Who** the user is (persona, mood, background) +- **What** they're trying to accomplish +- **Any constraints** or special circumstances + +**Example:** + +``` +The user is a frustrated customer who received the wrong item in their order. +They've already tried the chatbot twice without success. They're running out of +patience and want either a replacement shipped overnight or a full refund. +They're not interested in store credit. +``` + + + Be specific about the user's emotional state and constraints. This helps the + User Simulator generate realistic, challenging interactions. + + +### Step 3: Add Evaluation Criteria + +The **Criteria** (or Score) define how to evaluate the agent's behavior. Add criteria as natural language statements that should be true for the scenario to pass. + +Click **Add Criterion** and enter statements like: + +- "Agent should acknowledge the customer's frustration" +- "Agent should offer a concrete solution within 3 messages" +- "Agent should not ask the customer to repeat information" +- "Agent should use a polite, empathetic tone throughout" + +Criteria List + +**Tips for writing good criteria:** + +| Do | Don't | +|----|-------| +| Be specific and measurable | Use vague language ("be nice") | +| Focus on observable behavior | Reference internal state | +| Test one thing per criterion | Combine multiple requirements | +| Include edge cases | Only test happy paths | + +### Step 4: Add Labels (Optional) + +Labels help organize scenarios in your library. Add labels to group scenarios by: + +- Feature area: `checkout`, `support`, `onboarding` +- Agent type: `customer-service`, `sales`, `assistant` +- Priority: `critical`, `regression`, `exploratory` + +## Editing Scenarios + +Click any scenario in the library to open it in the editor. All changes are auto-saved. + + + Changes to a scenario don't affect past runs. Each run captures the scenario + state at execution time. + + +## Scenario Anatomy + +Here's how the scenario components map to the testing flow: + +```mermaid +graph LR + S[Situation] --> US[User Simulator] + US --> A[Your Agent] + A --> US + C[Criteria] --> J[Judge] + US --> J + A --> J + J --> R[Pass/Fail] +``` + +1. The **Situation** configures the User Simulator's persona +2. The User Simulator and your Agent have a conversation +3. The **Criteria** configure the Judge's evaluation +4. The Judge scores the conversation and determines pass/fail + +## Next Steps + + + + Connect your scenario to an agent + + + Execute scenarios and view results + + diff --git a/scenarios/overview.mdx b/scenarios/overview.mdx new file mode 100644 index 0000000..df35a34 --- /dev/null +++ b/scenarios/overview.mdx @@ -0,0 +1,108 @@ +--- +title: Overview +description: Create and run agent simulations directly on the LangWatch platform +--- + +# On-Platform Scenarios + +**On-Platform Scenarios** let you create, configure, and run agent simulations directly in the LangWatch UI - no code required. This is a visual, no-code companion to the [Scenario SDK](/agent-simulations/getting-started) for testing agents. 
+ +Scenario Library + +## When to Use On-Platform Scenarios + +| Use Case | On-Platform | SDK | +|----------|-------------|-----| +| Quick iteration and experimentation | Best | Good | +| Non-technical team members (PMs, QA) | Best | - | +| Simple behavioral tests | Best | Good | +| CI/CD integration | - | Best | +| Complex multi-turn scripts | Good | Best | +| Programmatic assertions | - | Best | +| Dataset-driven testing | Coming soon | Best | + +**Use On-Platform Scenarios when:** +- You want to quickly test agent behavior without writing code +- Non-technical team members need to create or run tests +- You're iterating on prompts and want fast feedback +- You need to demonstrate agent behavior to stakeholders + +**Use the SDK when:** +- You need to run tests in CI/CD pipelines +- You require complex programmatic assertions +- You're building automated regression test suites +- You need fine-grained control over conversation flow + +## What is a Scenario? + +A Scenario is a **3-part specification** that defines how to test an agent: + +### 1. Situation (Context) + +The **Situation** describes the context and persona of the simulated user. It tells the User Simulator how to behave during the conversation. + +``` +It's Saturday evening. The user is hungry and tired but doesn't want to order +out. They're looking for a quick, easy vegetarian recipe they can make with +common pantry ingredients. +``` + +### 2. Script (Conversation Flow) + +The **Script** defines the turn-by-turn flow of the conversation. For M1, scenarios use auto-pilot mode where the User Simulator drives the conversation based on the Situation. + + + The visual Turn Builder for creating custom scripts is coming in M2 (Jan 31). + + +### 3. Score (Evaluation Criteria) + +The **Score** is a list of criteria the Judge uses to evaluate the agent's behavior. Each criterion is a natural language statement that should be true for the scenario to pass. + +``` +- Agent should not ask more than two follow-up questions +- Agent should generate a recipe +- Recipe should include a list of ingredients +- Recipe should include step-by-step cooking instructions +- Recipe should be vegetarian and not include any meat +``` + +## Key Concepts + +### Targets + +A **Target** is what the scenario tests against. It defines how the platform invokes your agent: + +- **HTTP**: Call an external API endpoint +- **LLM**: Direct model calls using your project's provider keys +- **Prompt Config**: Use a versioned prompt from Prompt Management + +See [Configuring Targets](/scenarios/targets) for details. + +### Runs + +A **Run** is a single execution of a scenario against a target. Each run produces: +- A conversation trace showing all messages +- Evaluation scores for each criterion +- Pass/fail status + +### Labels + +**Labels** help organize scenarios in your library. Use them to group scenarios by feature, agent type, or any other taxonomy that makes sense for your team. 
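
To make the HTTP target concrete: from your agent's side, it is simply an endpoint that accepts the conversation so far and returns the assistant's next reply. The sketch below shows one minimal shape such an endpoint could take. It assumes FastAPI and a `{"messages": [...]}` request body purely for illustration, and the agent logic is stubbed out; see [Configuring Targets](/scenarios/targets) for the actual configuration options.

```python
# Minimal sketch of an endpoint that could serve as an HTTP target.
# Assumes a {"messages": [...]} body and a {"response": "..."} reply;
# replace the stub with a call into your real agent.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Message(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    messages: list[Message]


@app.post("/chat")
async def chat(request: ChatRequest) -> dict:
    # Pull the latest user message out of the conversation history.
    last_user_message = next(
        (m.content for m in reversed(request.messages) if m.role == "user"),
        "",
    )
    # Stub: echo the request. Your real agent goes here.
    reply = f"You asked about: {last_user_message}"
    return {"response": reply}
```

The request body and response parsing are configurable per target, so your endpoint does not have to match this shape exactly.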
+ +## Next Steps + + + + Learn how to create and edit scenarios + + + Set up HTTP, LLM, or Prompt Config targets + + + Execute scenarios and analyze results + + + Use the Scenario SDK for CI/CD + + diff --git a/scenarios/running-scenarios.mdx b/scenarios/running-scenarios.mdx new file mode 100644 index 0000000..d93272d --- /dev/null +++ b/scenarios/running-scenarios.mdx @@ -0,0 +1,161 @@ +--- +title: Running Scenarios +description: Execute scenarios and analyze results in the Run Visualizer +--- + +# Running Scenarios + +Once you've created a scenario and configured a target, you can run it to test your agent's behavior. + +## Quick Run + +From the Scenario Editor, click the **Run** button to execute the scenario against the configured target. + +Quick Run Button + +The scenario runs immediately and you'll see real-time progress as: + +1. The User Simulator generates the first message based on the Situation +2. Your agent (Target) responds +3. The conversation continues until completion +4. The Judge evaluates against your Criteria + +## Run Visualizer + +After a run completes, the Run Visualizer shows the full conversation and evaluation results. + +Run Visualizer + +### Conversation View + +The left panel shows the full conversation trace: + +- **User messages** (blue): Generated by the User Simulator +- **Agent messages** (gray): Responses from your target +- **Tool calls** (if any): Actions taken by the agent + +Click any message to see details like: +- Raw content +- Timestamp +- Token count +- Tool call arguments + +### Evaluation Results + +The right panel shows evaluation results: + +| Field | Description | +|-------|-------------| +| **Status** | Overall pass/fail | +| **Score** | Percentage of criteria passed | +| **Duration** | Total run time | + +### Criteria Breakdown + +Each criterion shows: +- **Pass/Fail** indicator +- **Reasoning** from the Judge explaining the evaluation + +Criteria Results + + + The Judge's reasoning helps you understand exactly why a criterion passed or + failed. Use this to refine your criteria or identify agent issues. + + +## Analyzing Failed Runs + +When a scenario fails, use the Run Visualizer to diagnose the issue: + +### 1. Check the Criteria Breakdown + +Look at which criteria failed and read the Judge's reasoning. Common issues: + +| Failed Because | Likely Issue | +|----------------|--------------| +| "Agent did not acknowledge..." | Missing empathy in responses | +| "Agent asked too many questions" | Overly verbose conversation flow | +| "Agent recommended wrong category" | Knowledge or retrieval issue | +| "Conversation ended abruptly" | Error handling or timeout | + +### 2. Review the Conversation + +Step through the conversation to find where things went wrong: +- Did the agent misunderstand the user's intent? +- Did the agent get stuck in a loop? +- Did an error interrupt the flow? + +### 3. Check Tool Calls + +If your agent uses tools, verify: +- Were the right tools called? +- Were arguments correct? +- Did tool results get used properly? + +## Run History + +Access past runs from the Scenario Editor by clicking **View Runs**. 
This shows all previous executions with: + +- Timestamp +- Target used +- Pass/fail status +- Quick link to the Run Visualizer + +Run History + +Use run history to: +- **Track progress** as you iterate on your agent +- **Compare runs** before and after changes +- **Identify regressions** when a previously passing scenario fails + +## Best Practices + +### Iterate on Criteria + +If a scenario fails unexpectedly, consider whether the criteria are: +- **Too strict**: Requiring exact wording or behavior +- **Too vague**: Not specific enough for the Judge to evaluate +- **Conflicting**: Multiple criteria that can't all be satisfied + +### Test Edge Cases + +Create scenarios for: +- Happy paths (expected behavior) +- Error conditions (invalid inputs, timeouts) +- Edge cases (unusual requests, adversarial users) +- Multi-turn complexity (long conversations, topic changes) + +### Use Labels for Organization + +As your scenario library grows, use labels to: +- Filter to relevant scenarios quickly +- Group scenarios for batch runs (coming in M2) +- Track coverage across features + +## Coming Soon + + + + Run multiple scenarios against multiple targets in batch + + + Create custom conversation scripts with fixed turns + + + Run scenarios with different inputs from a dataset + + + Generate scenarios automatically from agent descriptions + + + +## Next Steps + + + + Run scenarios in CI/CD with the SDK + + + Create more scenarios to expand coverage + + diff --git a/scenarios/targets.mdx b/scenarios/targets.mdx new file mode 100644 index 0000000..9bfd956 --- /dev/null +++ b/scenarios/targets.mdx @@ -0,0 +1,180 @@ +--- +title: Configuring Targets +description: Set up HTTP, LLM, or Prompt Config targets for your scenarios +--- + +# Configuring Targets + +A **Target** defines how the LangWatch platform invokes your agent during a scenario run. You can configure three types of targets: + +| Target Type | Use Case | +|-------------|----------| +| **HTTP** | External API endpoints (production agents, staging environments) | +| **LLM** | Direct model calls for testing prompts | +| **Prompt Config** | Versioned prompts from Prompt Management | + +## Accessing the Target Drawer + +From the Scenario Editor, click **Configure Target** to open the Target Drawer. + +Target Drawer + +## HTTP Target + +Use HTTP targets to test agents deployed as API endpoints. + +### Configuration + +| Field | Description | +|-------|-------------| +| **URL** | The endpoint to call (e.g., `https://api.example.com/chat`) | +| **Method** | HTTP method (typically `POST`) | +| **Headers** | Request headers (authentication, content-type) | +| **Body Template** | JSON body with `{{messages}}` placeholder | + +HTTP Target Form + +### Body Template + +The body template supports variable interpolation. Use `{{messages}}` to inject the conversation history: + +```json +{ + "messages": {{messages}}, + "stream": false +} +``` + +The `{{messages}}` placeholder is replaced with the OpenAI-format message array: + +```json +[ + {"role": "user", "content": "Hello!"}, + {"role": "assistant", "content": "Hi! How can I help?"}, + {"role": "user", "content": "I need a refund"} +] +``` + +### Authentication + +Add authentication headers as needed: + +``` +Authorization: Bearer sk-your-api-key +X-API-Key: your-api-key +``` + + + Store sensitive API keys securely. Consider using environment variables or a + secrets manager for production deployments. 
+ + +### Expected Response Format + +Your endpoint should return a response with the assistant's message: + +```json +{ + "choices": [ + { + "message": { + "role": "assistant", + "content": "I'd be happy to help with your refund..." + } + } + ] +} +``` + +Or a simple string response: + +```json +{ + "response": "I'd be happy to help with your refund..." +} +``` + +## LLM Target + +Use LLM targets to test prompts directly against a model using your project's provider keys. + +### Configuration + +| Field | Description | +|-------|-------------| +| **Model** | The model to use (e.g., `gpt-4`, `claude-3-opus`) | +| **System Prompt** | The system message for the agent | +| **Temperature** | Sampling temperature (0-2) | + +LLM Target Form + +### Model Selection + +Select from any model configured in your project's Model Providers. The platform uses your existing provider API keys. + +### System Prompt + +Define the agent's behavior with a system prompt: + +``` +You are a helpful customer service agent for Acme Corp. You help customers +with orders, returns, and product questions. Always be polite and empathetic. +If you can't resolve an issue, offer to escalate to a human agent. +``` + + + LLM targets are great for rapid iteration on prompts. Test different system + prompts without deploying changes to your production agent. + + +## Prompt Config Target + +Use Prompt Config targets to test versioned prompts from [Prompt Management](/prompt-management/overview). + +### Configuration + +| Field | Description | +|-------|-------------| +| **Prompt** | Select a prompt from your project | +| **Version** | Select a specific version or use latest | + +Prompt Config Target Form + +### Benefits + +- **Version Control**: Test specific prompt versions +- **A/B Testing**: Compare different prompt versions +- **Consistency**: Ensure scenarios use the same prompt as production + +## Choosing a Target Type + +| Scenario | Recommended Target | +|----------|-------------------| +| Testing a deployed agent | HTTP | +| Iterating on a prompt | LLM | +| Regression testing prompts | Prompt Config | +| Testing agent tools/integrations | HTTP | +| Quick prototyping | LLM | + +## Multiple Targets + +You can run the same scenario against multiple targets to compare behavior. This is useful for: + +- **A/B testing** different prompt versions +- **Regression testing** after changes +- **Benchmarking** different models + + + Suites for running scenarios against multiple targets are coming in M2 (Jan 31). 
+ + +## Next Steps + + + + Execute scenarios and analyze results + + + Learn about versioned prompts + + From 7ebe7b57afd8dbe670d6e1f2eeba143b2d20bf9c Mon Sep 17 00:00:00 2001 From: drewdrew Date: Wed, 14 Jan 2026 22:56:43 +0100 Subject: [PATCH 2/4] docs: revise Scenario docs based on implementation review Key changes: - Remove separate targets.mdx - consolidate into running-scenarios.mdx - Fix target types: HTTP Agent and Prompt (not LLM + Prompt Config) - Clarify scenarios vs simulations terminology - Add comprehensive "Writing Good Criteria" guidance - Update agent-simulations introduction with On-Platform vs SDK comparison - Reference target selector as unified dropdown, not separate forms - Document Save and Run flow with target memory The documentation now accurately reflects the M1 implementation where: - Targets are HTTP Agents or Prompts (not a separate "LLM" type) - Target selection is via unified TargetSelector dropdown - Results flow to existing Simulations visualizer Co-Authored-By: Claude Opus 4.5 --- agent-simulations/introduction.mdx | 44 ++++- docs.json | 1 - scenarios/creating-scenarios.mdx | 228 ++++++++++++++++------- scenarios/overview.mdx | 136 ++++++++------ scenarios/running-scenarios.mdx | 279 +++++++++++++++++++---------- scenarios/targets.mdx | 180 ------------------- 6 files changed, 472 insertions(+), 396 deletions(-) delete mode 100644 scenarios/targets.mdx diff --git a/agent-simulations/introduction.mdx b/agent-simulations/introduction.mdx index c93369c..277e265 100644 --- a/agent-simulations/introduction.mdx +++ b/agent-simulations/introduction.mdx @@ -83,17 +83,39 @@ script=[ - **Simple integration** - Just implement one `call()` method - **Multi-language support** - Python, TypeScript, and Go +## Two Ways to Create Simulations + +LangWatch offers two approaches to agent testing: + +### On-Platform Scenarios (No Code) + +Create and run simulations directly in the LangWatch UI: +- Define situations and evaluation criteria visually +- Run against HTTP agents or managed prompts +- Ideal for quick iteration and non-technical team members + +[Get started with On-Platform Scenarios →](/scenarios/overview) + +### Scenario SDK (Code-Based) + +Write simulations in code for maximum control: +- Full programmatic control over conversation flow +- Complex assertions and tool call verification +- CI/CD integration for automated testing + +[Get started with the Scenario SDK →](/agent-simulations/getting-started) + +Both approaches produce simulations that appear in the same visualizer, so you can mix and match based on your needs. + ## Visualizing Simulations in LangWatch -Once you've set up your agent tests with Scenario, LangWatch provides powerful visualization tools to: +The Simulations visualizer helps you analyze results from both On-Platform Scenarios and SDK-based tests: - **Organize simulations** into sets and batches - **Debug agent behavior** by stepping through conversations - **Track performance** over time with run history - **Collaborate** with your team on agent improvements -The rest of this documentation will show you how to use LangWatch's simulation visualizer to get the most out of your agent testing. - Simulations Sets + + Scenario Library + From here you can: - View all scenarios with their labels and last updated time - Filter scenarios by label - Create new scenarios -- Click a scenario to edit it +- Click any scenario to edit it ## Creating a New Scenario -Click the **New Scenario** button to create a scenario. This opens the Scenario Editor. 
+Click **New Scenario** to open the Scenario Editor. -Scenario Editor + + Scenario Editor + ### Step 1: Name Your Scenario Give your scenario a descriptive name that explains what it tests: -- "Handles refund request politely" -- "Recommends vegetarian recipes" -- "Escalates frustrated customer to human" +**Good names:** +- "Handles refund request for damaged item" +- "Recommends vegetarian recipes when asked" +- "Escalates frustrated customer to human agent" + +**Avoid vague names:** +- "Test 1" +- "Refund" +- "Customer service" -### Step 2: Define the Situation +### Step 2: Write the Situation -The **Situation** describes the context for the simulated user. Write it as a narrative that captures: +The **Situation** describes the simulated user's context, persona, and goals. Write it as a narrative that captures: - **Who** the user is (persona, mood, background) -- **What** they're trying to accomplish -- **Any constraints** or special circumstances +- **What** they want to accomplish +- **Constraints** or special circumstances -**Example:** +**Example - Support scenario:** ``` The user is a frustrated customer who received the wrong item in their order. @@ -50,69 +59,166 @@ patience and want either a replacement shipped overnight or a full refund. They're not interested in store credit. ``` +**Example - Sales scenario:** + +``` +The user is researching project management tools for their 15-person startup. +They currently use spreadsheets and are overwhelmed. Budget is limited to $50 +per user per month. They need something that integrates with Slack and Google +Workspace. +``` + - Be specific about the user's emotional state and constraints. This helps the - User Simulator generate realistic, challenging interactions. + Be specific about emotional state and constraints. Vague situations produce + generic conversations that don't test edge cases. -### Step 3: Add Evaluation Criteria +### Step 3: Define Criteria -The **Criteria** (or Score) define how to evaluate the agent's behavior. Add criteria as natural language statements that should be true for the scenario to pass. +**Criteria** are natural language statements that should be true for the scenario to pass. The Judge evaluates each criterion and explains its reasoning. -Click **Add Criterion** and enter statements like: +Click **Add Criterion** and enter evaluation statements: -- "Agent should acknowledge the customer's frustration" -- "Agent should offer a concrete solution within 3 messages" -- "Agent should not ask the customer to repeat information" -- "Agent should use a polite, empathetic tone throughout" + + Criteria List + -Criteria List +## Writing Good Criteria -**Tips for writing good criteria:** +Criteria are the heart of your scenario. Well-written criteria catch real issues; poorly-written ones create noise. -| Do | Don't | -|----|-------| -| Be specific and measurable | Use vague language ("be nice") | -| Focus on observable behavior | Reference internal state | -| Test one thing per criterion | Combine multiple requirements | -| Include edge cases | Only test happy paths | +### Be Specific and Observable -### Step 4: Add Labels (Optional) +| Good | Bad | +|------|-----| +| Agent acknowledges the customer's frustration within the first 2 messages | Agent is empathetic | +| Agent offers a concrete solution (refund, replacement, or escalation) | Agent helps the customer | +| Agent does not ask the customer to repeat their order number | Agent doesn't waste time | -Labels help organize scenarios in your library. 
Add labels to group scenarios by: +### Test One Thing Per Criterion -- Feature area: `checkout`, `support`, `onboarding` -- Agent type: `customer-service`, `sales`, `assistant` -- Priority: `critical`, `regression`, `exploratory` +| Good | Bad | +|------|-----| +| Agent uses a polite tone throughout | Agent is polite and helpful and resolves the issue quickly | +| Agent offers a solution within 3 messages | Agent is fast and accurate | -## Editing Scenarios +### Include Both Positive and Negative Checks -Click any scenario in the library to open it in the editor. All changes are auto-saved. +``` +✓ Agent should offer to process a refund +✓ Agent should not suggest store credit after user declined it +✓ Agent should apologize for the inconvenience +✓ Agent should not ask for the order number more than once +``` - - Changes to a scenario don't affect past runs. Each run captures the scenario - state at execution time. - +### Cover Different Aspects + +**Behavioral criteria:** +- "Agent should not ask more than 2 clarifying questions" +- "Agent should summarize the user's issue before proposing a solution" + +**Content criteria:** +- "Recipe should include a list of ingredients with quantities" +- "Response should mention the 30-day return policy" + +**Tone criteria:** +- "Agent should maintain a professional but friendly tone" +- "Agent should not use corporate jargon" + +**Safety criteria:** +- "Agent should not make promises it cannot keep" +- "Agent should not disclose other customers' information" + +### Avoid Criteria the Judge Can't Evaluate + +The Judge can only see the conversation. It cannot: +- Check if a database was updated +- Verify if an email was sent +- Confirm tool calls succeeded (use the SDK for this) + +## Adding Labels + +Labels help organize your scenario library. Click the label input to add tags. -## Scenario Anatomy +**Common labeling strategies:** -Here's how the scenario components map to the testing flow: +| Category | Examples | +|----------|----------| +| Feature area | `checkout`, `support`, `onboarding`, `search` | +| Agent type | `customer-service`, `sales`, `assistant` | +| Priority | `critical`, `regression`, `exploratory` | +| User type | `new-user`, `power-user`, `frustrated-user` | -```mermaid -graph LR - S[Situation] --> US[User Simulator] - US --> A[Your Agent] - A --> US - C[Criteria] --> J[Judge] - US --> J - A --> J - J --> R[Pass/Fail] +## Scenario Templates + +Here are templates for common scenario types: + +### Customer Support + +``` +Name: Handles [issue type] for [customer type] + +Situation: +The user is a [persona] who [problem description]. They have [relevant context] +and want [specific outcome]. They are feeling [emotional state]. + +Criteria: +- Agent acknowledges the issue within first response +- Agent asks relevant clarifying questions (no more than 2) +- Agent provides a clear solution or next steps +- Agent maintains empathetic tone throughout +- Agent does not make promises outside policy +``` + +### Product Recommendation + +``` +Name: Recommends [product type] for [use case] + +Situation: +The user is looking for [product category] because [reason]. They need +[specific requirements] and have [constraints]. They're comparing options +and want honest recommendations. 
+ +Criteria: +- Agent asks about key requirements before recommending +- Recommendations match stated requirements +- Agent explains why each recommendation fits +- Agent mentions relevant tradeoffs +- Agent does not oversell or make exaggerated claims +``` + +### Information Retrieval + +``` +Name: Answers [topic] question accurately + +Situation: +The user needs to know [specific information] for [reason]. They have +[level of expertise] and prefer [communication style]. + +Criteria: +- Agent provides accurate information +- Agent cites sources or documentation when available +- Agent admits uncertainty rather than guessing +- Response is appropriately detailed for the question +- Agent offers to clarify or expand if needed ``` -1. The **Situation** configures the User Simulator's persona -2. The User Simulator and your Agent have a conversation -3. The **Criteria** configure the Judge's evaluation -4. The Judge scores the conversation and determines pass/fail +## Iterating on Scenarios + +Scenarios improve through iteration: + +1. **Start simple**: Begin with core criteria that capture the main behavior +2. **Run and review**: Execute the scenario and read the Judge's reasoning +3. **Refine criteria**: If criteria pass/fail unexpectedly, adjust the wording +4. **Add edge cases**: Once the happy path works, add criteria for edge cases +5. **Use labels**: Tag scenarios by iteration stage (`draft`, `validated`, `production`) + + + Editing a scenario doesn't affect past runs. Each run captures the scenario + state at execution time. + ## Next Steps @@ -121,6 +227,6 @@ graph LR Connect your scenario to an agent - Execute scenarios and view results + Execute and analyze results diff --git a/scenarios/overview.mdx b/scenarios/overview.mdx index df35a34..b5f1cc5 100644 --- a/scenarios/overview.mdx +++ b/scenarios/overview.mdx @@ -1,43 +1,57 @@ --- -title: Overview -description: Create and run agent simulations directly on the LangWatch platform +title: On-Platform Scenarios +description: Create and run agent simulations directly in the LangWatch UI without writing code +sidebarTitle: Overview --- -# On-Platform Scenarios +**On-Platform Scenarios** let you create, configure, and run agent simulations directly in the LangWatch UI. This is a visual, no-code alternative to the [Scenario SDK](https://langwatch.ai/scenario/) for testing agents. -**On-Platform Scenarios** let you create, configure, and run agent simulations directly in the LangWatch UI - no code required. This is a visual, no-code companion to the [Scenario SDK](/agent-simulations/getting-started) for testing agents. + + Scenario Library showing a list of scenarios with labels and run status + -Scenario Library +## Scenarios vs. Simulations -## When to Use On-Platform Scenarios +Understanding the terminology: + +| Term | What it means | +|------|---------------| +| **Scenario** | A test case definition: the situation, criteria, and configuration | +| **Simulation** | An execution of a scenario against a target, producing a conversation trace | +| **Run** | A single simulation execution with its results | +| **Set** | A group of related scenario runs (used by the SDK) | + +**On-Platform Scenarios** are test definitions you create in the UI. When you run a scenario against a target, it produces a **simulation** that you can view in the [Simulations visualizer](/agent-simulations/overview). + +## When to Use On-Platform vs. 
SDK | Use Case | On-Platform | SDK | -|----------|-------------|-----| -| Quick iteration and experimentation | Best | Good | -| Non-technical team members (PMs, QA) | Best | - | -| Simple behavioral tests | Best | Good | -| CI/CD integration | - | Best | -| Complex multi-turn scripts | Good | Best | -| Programmatic assertions | - | Best | -| Dataset-driven testing | Coming soon | Best | - -**Use On-Platform Scenarios when:** -- You want to quickly test agent behavior without writing code -- Non-technical team members need to create or run tests -- You're iterating on prompts and want fast feedback -- You need to demonstrate agent behavior to stakeholders - -**Use the SDK when:** -- You need to run tests in CI/CD pipelines -- You require complex programmatic assertions -- You're building automated regression test suites -- You need fine-grained control over conversation flow +|----------|:-----------:|:---:| +| Quick iteration and experimentation | ✓ | | +| Non-technical team members (PMs, QA) | ✓ | | +| Simple behavioral tests | ✓ | ✓ | +| CI/CD pipeline integration | | ✓ | +| Complex multi-turn scripts | | ✓ | +| Programmatic assertions | | ✓ | +| Dataset-driven testing | Coming soon | ✓ | + +**Choose On-Platform Scenarios when you want to:** +- Quickly test agent behavior without writing code +- Enable non-technical team members to create and run tests +- Iterate on prompts with fast visual feedback +- Demonstrate agent behavior to stakeholders + +**Choose the [Scenario SDK](https://langwatch.ai/scenario/) when you need to:** +- Run tests in CI/CD pipelines +- Write complex programmatic assertions +- Build automated regression test suites +- Define custom conversation scripts with precise control ## What is a Scenario? -A Scenario is a **3-part specification** that defines how to test an agent: +A Scenario is a test case with three parts: -### 1. Situation (Context) +### 1. Situation The **Situation** describes the context and persona of the simulated user. It tells the User Simulator how to behave during the conversation. @@ -47,17 +61,17 @@ out. They're looking for a quick, easy vegetarian recipe they can make with common pantry ingredients. ``` -### 2. Script (Conversation Flow) +### 2. Script -The **Script** defines the turn-by-turn flow of the conversation. For M1, scenarios use auto-pilot mode where the User Simulator drives the conversation based on the Situation. +The **Script** defines the conversation flow. In the current release, scenarios run in autopilot mode where the User Simulator drives the conversation based on the Situation. - The visual Turn Builder for creating custom scripts is coming in M2 (Jan 31). + The visual Turn Builder for creating custom conversation scripts is coming in a future release. -### 3. Score (Evaluation Criteria) +### 3. Criteria -The **Score** is a list of criteria the Judge uses to evaluate the agent's behavior. Each criterion is a natural language statement that should be true for the scenario to pass. +The **Criteria** (or Score) define how to evaluate the agent's behavior. Each criterion is a natural language statement that should be true for the scenario to pass. ``` - Agent should not ask more than two follow-up questions @@ -69,40 +83,58 @@ The **Score** is a list of criteria the Judge uses to evaluate the agent's behav ## Key Concepts -### Targets +### What to Test Against -A **Target** is what the scenario tests against. 
It defines how the platform invokes your agent: +When you run a scenario, you choose what to test: -- **HTTP**: Call an external API endpoint -- **LLM**: Direct model calls using your project's provider keys -- **Prompt Config**: Use a versioned prompt from Prompt Management +- **HTTP Agent**: Call an external API endpoint (your deployed agent) +- **Prompt**: Use a versioned prompt from [Prompt Management](/prompt-management/overview) -See [Configuring Targets](/scenarios/targets) for details. +See [Running Scenarios](/scenarios/running-scenarios) for details on setting up each option. -### Runs +### Labels -A **Run** is a single execution of a scenario against a target. Each run produces: -- A conversation trace showing all messages -- Evaluation scores for each criterion -- Pass/fail status +**Labels** help organize scenarios in your library. Use them to group scenarios by feature, agent type, priority, or any taxonomy that works for your team. -### Labels +## Architecture -**Labels** help organize scenarios in your library. Use them to group scenarios by feature, agent type, or any other taxonomy that makes sense for your team. +When you run a scenario, here's what happens: + +``` +┌─────────────────────────────────────────────────────────────┐ +│ LangWatch Platform │ +│ │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ +│ │ Scenario │───▶│ User │◀──▶│ Your Agent │ │ +│ │ (Situation) │ │ Simulator │ │ (Target) │ │ +│ └─────────────┘ └─────────────┘ └─────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────┐ ┌─────────────┐ │ +│ │ Criteria │───▶│ Judge │───▶ Pass/Fail │ +│ └─────────────┘ └─────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────┘ +``` + +1. The **Situation** configures the User Simulator's persona and goals +2. The **User Simulator** and your **Target** have a multi-turn conversation +3. The **Judge** evaluates the conversation against your **Criteria** +4. The result (pass/fail with reasoning) is displayed in the Run Visualizer ## Next Steps - Learn how to create and edit scenarios - - - Set up HTTP, LLM, or Prompt Config targets + Write effective scenarios with good criteria Execute scenarios and analyze results - - Use the Scenario SDK for CI/CD + + Analyze simulation results + + + Use the SDK for CI/CD integration diff --git a/scenarios/running-scenarios.mdx b/scenarios/running-scenarios.mdx index d93272d..8d638af 100644 --- a/scenarios/running-scenarios.mdx +++ b/scenarios/running-scenarios.mdx @@ -1,161 +1,252 @@ --- title: Running Scenarios -description: Execute scenarios and analyze results in the Run Visualizer +description: Execute scenarios against HTTP agents or prompts and analyze results +sidebarTitle: Running Scenarios --- -# Running Scenarios +Once you've created a scenario, you can run it against your agent to test its behavior. -Once you've created a scenario and configured a target, you can run it to test your agent's behavior. +## Choosing What to Test -## Quick Run +When you run a scenario, you select what to test against: -From the Scenario Editor, click the **Run** button to execute the scenario against the configured target. 
+| Option | Description | +|--------|-------------| +| **HTTP Agent** | An external API endpoint (your deployed agent) | +| **Prompt** | A versioned prompt from [Prompt Management](/prompt-management/overview) | -Quick Run Button +The selector shows both options grouped by type: -The scenario runs immediately and you'll see real-time progress as: + + Selector showing HTTP agents and prompts + -1. The User Simulator generates the first message based on the Situation -2. Your agent (Target) responds -3. The conversation continues until completion -4. The Judge evaluates against your Criteria +## Running Against an HTTP Agent -## Run Visualizer +Use HTTP agents to test agents deployed as API endpoints. This is the most common option for testing production or staging environments. -After a run completes, the Run Visualizer shows the full conversation and evaluation results. +### Creating an HTTP Agent -Run Visualizer +1. In the selector dropdown, click **Add New Agent** +2. Configure the HTTP settings: -### Conversation View + + HTTP Agent configuration form + + +### Configuration Options + +| Field | Description | +|-------|-------------| +| **Name** | A descriptive name for this agent | +| **URL** | The endpoint to call (e.g., `https://api.example.com/chat`) | +| **Authentication** | Bearer token, API key, basic auth, or none | +| **Body Template** | JSON body with `{{messages}}` placeholder | +| **Response Path** | JSONPath to extract the response | + +### Body Template + +Use `{{messages}}` to inject the conversation history: + +```json +{ + "messages": {{messages}}, + "stream": false +} +``` + +The placeholder is replaced with an OpenAI-format message array: + +```json +[ + {"role": "user", "content": "Hello!"}, + {"role": "assistant", "content": "Hi! How can I help?"} +] +``` + +Other available variables: `{{input}}` (latest message as string), `{{threadId}}` (conversation ID). + +### Response Extraction + +Use JSONPath to extract the response from your API: + +``` +// For: { "choices": [{ "message": { "content": "Hello!" } }] } +$.choices[0].message.content + +// For: { "response": "Hello!" } +$.response +``` + +## Running Against a Prompt + +Use prompts to test directly against an LLM using your project's configured model providers. This is useful for: + +- Testing prompt changes before deployment +- Quick iteration without infrastructure +- Comparing different prompt versions + +### Selecting a Prompt + +1. In the selector dropdown, choose from the **Prompts** section +2. Only published prompts (version > 0) appear + + + Prompt selector + -The left panel shows the full conversation trace: +When you run against a prompt, the platform uses the prompt's configured model, system message, and temperature settings with your project's API keys. -- **User messages** (blue): Generated by the User Simulator -- **Agent messages** (gray): Responses from your target -- **Tool calls** (if any): Actions taken by the agent + + Don't have a prompt yet? Click **Add New Prompt** to open Prompt Management + in a new tab, create your prompt, then return to select it. + + +## Executing a Run + +From the Scenario Editor, use the **Save and Run** menu: + + + Save and Run menu + + +1. Click **Save and Run** to open the selector +2. Choose an HTTP Agent or Prompt +3. The scenario runs immediately -Click any message to see details like: -- Raw content -- Timestamp -- Token count -- Tool call arguments +The platform: +1. Sends the Situation to the User Simulator +2. 
Runs a multi-turn conversation between the User Simulator and your agent +3. Passes the conversation to the Judge with your Criteria +4. Records the verdict and reasoning + +## Viewing Results + +After a run completes, you're taken to the Simulations visualizer. + + + Run Visualizer + + +### Conversation View -### Evaluation Results +The main panel shows the full conversation: -The right panel shows evaluation results: +- **User messages** - Generated by the User Simulator based on your Situation +- **Assistant messages** - Responses from your agent +- **Tool calls** - If your agent uses tools + +### Results Panel + +The side panel shows: | Field | Description | |-------|-------------| -| **Status** | Overall pass/fail | -| **Score** | Percentage of criteria passed | -| **Duration** | Total run time | +| **Status** | Pass, Fail, or Error | +| **Criteria Results** | Each criterion with pass/fail and reasoning | +| **Run Duration** | Total execution time | ### Criteria Breakdown -Each criterion shows: -- **Pass/Fail** indicator -- **Reasoning** from the Judge explaining the evaluation +Each criterion shows the Judge's reasoning: -Criteria Results + + Criteria results + - The Judge's reasoning helps you understand exactly why a criterion passed or - failed. Use this to refine your criteria or identify agent issues. + Read the reasoning carefully. It explains exactly what the Judge observed + and why it made its decision. ## Analyzing Failed Runs -When a scenario fails, use the Run Visualizer to diagnose the issue: - -### 1. Check the Criteria Breakdown +When a scenario fails: -Look at which criteria failed and read the Judge's reasoning. Common issues: +### 1. Read the Failed Criteria -| Failed Because | Likely Issue | -|----------------|--------------| -| "Agent did not acknowledge..." | Missing empathy in responses | -| "Agent asked too many questions" | Overly verbose conversation flow | -| "Agent recommended wrong category" | Knowledge or retrieval issue | -| "Conversation ended abruptly" | Error handling or timeout | +| Reasoning Says... | Likely Issue | +|-------------------|--------------| +| "Agent did not acknowledge..." | Missing empathy | +| "Agent asked 4 questions, exceeding limit of 2" | Too verbose | +| "No mention of refund policy" | Missing information | +| "Conversation ended without resolution" | Incomplete flow | ### 2. Review the Conversation -Step through the conversation to find where things went wrong: -- Did the agent misunderstand the user's intent? -- Did the agent get stuck in a loop? +Step through messages to find where things went wrong: +- Did the agent misunderstand the user? +- Did it get stuck repeating itself? - Did an error interrupt the flow? -### 3. Check Tool Calls +### 3. Fix and Re-run -If your agent uses tools, verify: -- Were the right tools called? -- Were arguments correct? -- Did tool results get used properly? +| Pattern | Fix | +|---------|-----| +| Ignores constraints | Update system prompt to emphasize listening | +| Too verbose | Add brevity instructions | +| Wrong tone | Add tone guidelines | +| Missing info | Add to knowledge base or prompt | ## Run History -Access past runs from the Scenario Editor by clicking **View Runs**. 
This shows all previous executions with: - -- Timestamp -- Target used -- Pass/fail status -- Quick link to the Run Visualizer - -Run History - -Use run history to: -- **Track progress** as you iterate on your agent -- **Compare runs** before and after changes -- **Identify regressions** when a previously passing scenario fails +Access past runs from the **Simulations** section in the sidebar. -## Best Practices + + Simulations list + -### Iterate on Criteria +The visualizer shows all runs with: +- Pass/fail status +- Timestamps and duration +- Quick navigation to details -If a scenario fails unexpectedly, consider whether the criteria are: -- **Too strict**: Requiring exact wording or behavior -- **Too vague**: Not specific enough for the Judge to evaluate -- **Conflicting**: Multiple criteria that can't all be satisfied +Use history to: +- Track progress as you iterate +- Compare runs before and after changes +- Identify regressions +- Share results with your team -### Test Edge Cases +## Relationship to Simulations -Create scenarios for: -- Happy paths (expected behavior) -- Error conditions (invalid inputs, timeouts) -- Edge cases (unusual requests, adversarial users) -- Multi-turn complexity (long conversations, topic changes) +On-Platform Scenarios and the [Simulations visualizer](/agent-simulations/overview) work together: -### Use Labels for Organization +1. **Scenarios** define test cases (situation, criteria) +2. **Running a scenario** produces a **simulation** +3. **Simulations** appear in the visualizer -As your scenario library grows, use labels to: -- Filter to relevant scenarios quickly -- Group scenarios for batch runs (coming in M2) -- Track coverage across features +Both On-Platform Scenarios and the [Scenario SDK](https://langwatch.ai/scenario/) produce simulations in the same visualizer, so you can mix approaches. ## Coming Soon - - Run multiple scenarios against multiple targets in batch + + Run multiple scenarios against multiple agents in batch - - Create custom conversation scripts with fixed turns + + Create custom conversation scripts - - Run scenarios with different inputs from a dataset + + Run scenarios with inputs from a dataset - - Generate scenarios automatically from agent descriptions + + Generate scenarios from agent descriptions ## Next Steps - - Run scenarios in CI/CD with the SDK + + Learn more about analyzing results - Create more scenarios to expand coverage + Write more scenarios + + + Run scenarios in CI/CD + + + Create versioned prompts diff --git a/scenarios/targets.mdx b/scenarios/targets.mdx deleted file mode 100644 index 9bfd956..0000000 --- a/scenarios/targets.mdx +++ /dev/null @@ -1,180 +0,0 @@ ---- -title: Configuring Targets -description: Set up HTTP, LLM, or Prompt Config targets for your scenarios ---- - -# Configuring Targets - -A **Target** defines how the LangWatch platform invokes your agent during a scenario run. You can configure three types of targets: - -| Target Type | Use Case | -|-------------|----------| -| **HTTP** | External API endpoints (production agents, staging environments) | -| **LLM** | Direct model calls for testing prompts | -| **Prompt Config** | Versioned prompts from Prompt Management | - -## Accessing the Target Drawer - -From the Scenario Editor, click **Configure Target** to open the Target Drawer. - -Target Drawer - -## HTTP Target - -Use HTTP targets to test agents deployed as API endpoints. 
- -### Configuration - -| Field | Description | -|-------|-------------| -| **URL** | The endpoint to call (e.g., `https://api.example.com/chat`) | -| **Method** | HTTP method (typically `POST`) | -| **Headers** | Request headers (authentication, content-type) | -| **Body Template** | JSON body with `{{messages}}` placeholder | - -HTTP Target Form - -### Body Template - -The body template supports variable interpolation. Use `{{messages}}` to inject the conversation history: - -```json -{ - "messages": {{messages}}, - "stream": false -} -``` - -The `{{messages}}` placeholder is replaced with the OpenAI-format message array: - -```json -[ - {"role": "user", "content": "Hello!"}, - {"role": "assistant", "content": "Hi! How can I help?"}, - {"role": "user", "content": "I need a refund"} -] -``` - -### Authentication - -Add authentication headers as needed: - -``` -Authorization: Bearer sk-your-api-key -X-API-Key: your-api-key -``` - - - Store sensitive API keys securely. Consider using environment variables or a - secrets manager for production deployments. - - -### Expected Response Format - -Your endpoint should return a response with the assistant's message: - -```json -{ - "choices": [ - { - "message": { - "role": "assistant", - "content": "I'd be happy to help with your refund..." - } - } - ] -} -``` - -Or a simple string response: - -```json -{ - "response": "I'd be happy to help with your refund..." -} -``` - -## LLM Target - -Use LLM targets to test prompts directly against a model using your project's provider keys. - -### Configuration - -| Field | Description | -|-------|-------------| -| **Model** | The model to use (e.g., `gpt-4`, `claude-3-opus`) | -| **System Prompt** | The system message for the agent | -| **Temperature** | Sampling temperature (0-2) | - -LLM Target Form - -### Model Selection - -Select from any model configured in your project's Model Providers. The platform uses your existing provider API keys. - -### System Prompt - -Define the agent's behavior with a system prompt: - -``` -You are a helpful customer service agent for Acme Corp. You help customers -with orders, returns, and product questions. Always be polite and empathetic. -If you can't resolve an issue, offer to escalate to a human agent. -``` - - - LLM targets are great for rapid iteration on prompts. Test different system - prompts without deploying changes to your production agent. - - -## Prompt Config Target - -Use Prompt Config targets to test versioned prompts from [Prompt Management](/prompt-management/overview). - -### Configuration - -| Field | Description | -|-------|-------------| -| **Prompt** | Select a prompt from your project | -| **Version** | Select a specific version or use latest | - -Prompt Config Target Form - -### Benefits - -- **Version Control**: Test specific prompt versions -- **A/B Testing**: Compare different prompt versions -- **Consistency**: Ensure scenarios use the same prompt as production - -## Choosing a Target Type - -| Scenario | Recommended Target | -|----------|-------------------| -| Testing a deployed agent | HTTP | -| Iterating on a prompt | LLM | -| Regression testing prompts | Prompt Config | -| Testing agent tools/integrations | HTTP | -| Quick prototyping | LLM | - -## Multiple Targets - -You can run the same scenario against multiple targets to compare behavior. 
This is useful for: - -- **A/B testing** different prompt versions -- **Regression testing** after changes -- **Benchmarking** different models - - - Suites for running scenarios against multiple targets are coming in M2 (Jan 31). - - -## Next Steps - - - - Execute scenarios and analyze results - - - Learn about versioned prompts - - From 80483119ff63fa88cfc80384c771d3f321ff3234 Mon Sep 17 00:00:00 2001 From: drewdrew Date: Wed, 14 Jan 2026 23:04:08 +0100 Subject: [PATCH 3/4] docs: fix terminology and clarify trace-based evaluation Terminology fixes: - Change "SDK" to "library" or "testing library" throughout - Scenario is a "testing framework/library", not an SDK - Update navigation group from "Scenario SDK" to "Scenario Library" Content improvements: - Clarify platform vs code-based evaluation capabilities - On-Platform: evaluates conversation transcript only - Code-based: can access execution traces via OpenTelemetry - Add examples of trace-based criteria (tool calls, latency, errors) - Point users to Scenario library for advanced trace-based evaluation Co-Authored-By: Claude Opus 4.5 --- agent-simulations/introduction.mdx | 13 +++++++------ docs.json | 2 +- scenarios/creating-scenarios.mdx | 24 +++++++++++++++++++----- scenarios/overview.mdx | 14 +++++++------- scenarios/running-scenarios.mdx | 4 ++-- 5 files changed, 36 insertions(+), 21 deletions(-) diff --git a/agent-simulations/introduction.mdx b/agent-simulations/introduction.mdx index 277e265..c9a5d1e 100644 --- a/agent-simulations/introduction.mdx +++ b/agent-simulations/introduction.mdx @@ -96,20 +96,21 @@ Create and run simulations directly in the LangWatch UI: [Get started with On-Platform Scenarios →](/scenarios/overview) -### Scenario SDK (Code-Based) +### Scenario Library (Code-Based) Write simulations in code for maximum control: - Full programmatic control over conversation flow - Complex assertions and tool call verification - CI/CD integration for automated testing +- **Trace-based evaluation** via OpenTelemetry integration -[Get started with the Scenario SDK →](/agent-simulations/getting-started) +[Get started with the Scenario library →](/agent-simulations/getting-started) Both approaches produce simulations that appear in the same visualizer, so you can mix and match based on your needs. 
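
To give a feel for the code-based side, here is a rough sketch of what a simulation looks like with the Scenario library in Python. The class and function names below are approximate and may not match the current API exactly; treat this as the general shape rather than a reference, and check the [Scenario documentation](https://langwatch.ai/scenario/) for the real API. The situation and criteria are reused from the vegetarian recipe example above.

```python
# Rough sketch only: names are approximate, see the Scenario docs for the exact API.
# Requires pytest-asyncio (or an equivalent async test runner).
import pytest
import scenario


class RecipeAgentAdapter(scenario.AgentAdapter):
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        # Call into your real agent here; a canned reply keeps the sketch self-contained.
        return "Quick vegetarian pasta: boil pasta, toss with garlic, olive oil, and spinach."


@pytest.mark.asyncio
async def test_vegetarian_recipe_agent():
    result = await scenario.run(
        name="vegetarian recipe",
        description=(
            "It's Saturday evening. The user is hungry and tired and wants a quick, "
            "easy vegetarian recipe using common pantry ingredients."
        ),
        agents=[
            RecipeAgentAdapter(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent should generate a recipe",
                    "Recipe should be vegetarian and not include any meat",
                ]
            ),
        ],
    )
    assert result.success
```

Note how the situation and criteria map directly onto arguments of the run call; conceptually it is the same three-part scenario, just expressed in code.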
## Visualizing Simulations in LangWatch -The Simulations visualizer helps you analyze results from both On-Platform Scenarios and SDK-based tests: +The Simulations visualizer helps you analyze results from both On-Platform Scenarios and code-based tests: - **Organize simulations** into sets and batches - **Debug agent behavior** by stepping through conversations @@ -129,9 +130,9 @@ The Simulations visualizer helps you analyze results from both On-Platform Scena - [Creating Scenarios](/scenarios/creating-scenarios) - Write effective test cases - [Running Scenarios](/scenarios/running-scenarios) - Execute and analyze results -### Scenario SDK +### Scenario Library - [Visualizer Overview](/agent-simulations/overview) - Learn about the simulation visualizer -- [SDK Getting Started](/agent-simulations/getting-started) - Set up your first code-based simulation +- [Library Getting Started](/agent-simulations/getting-started) - Set up your first code-based simulation - [Individual Run Analysis](/agent-simulations/individual-run) - Debug specific scenarios - [Batch Runs](/agent-simulations/batch-runs) - Organize multiple tests -- [Scenario Documentation](https://langwatch.ai/scenario/) - Deep dive into the SDK +- [Scenario Documentation](https://langwatch.ai/scenario/) - Deep dive into the testing library diff --git a/docs.json b/docs.json index e78411f..be8fd22 100644 --- a/docs.json +++ b/docs.json @@ -75,7 +75,7 @@ ] }, { - "group": "Scenario SDK", + "group": "Scenario Library", "pages": [ "agent-simulations/overview", "agent-simulations/getting-started", diff --git a/scenarios/creating-scenarios.mdx b/scenarios/creating-scenarios.mdx index 6d1876e..c171367 100644 --- a/scenarios/creating-scenarios.mdx +++ b/scenarios/creating-scenarios.mdx @@ -129,12 +129,26 @@ Criteria are the heart of your scenario. Well-written criteria catch real issues - "Agent should not make promises it cannot keep" - "Agent should not disclose other customers' information" -### Avoid Criteria the Judge Can't Evaluate +### Platform vs. Code-Based Evaluation -The Judge can only see the conversation. It cannot: -- Check if a database was updated -- Verify if an email was sent -- Confirm tool calls succeeded (use the SDK for this) +On-Platform Scenarios evaluate based on the **conversation transcript only**. The Judge sees the messages exchanged but not internal system behavior. + +For advanced evaluation that includes **execution traces** (tool calls, API latency, span attributes), use the [Scenario testing library](https://langwatch.ai/scenario/) in code. The library integrates with OpenTelemetry to give the Judge access to: +- Tool call verification (was the right tool called?) +- Execution timing (was latency under threshold?) +- Span attributes (what model was used? how many tokens?) +- Error detection (did any operations fail?) + +**On-Platform (conversation only):** +- "Agent should apologize for the inconvenience" ✓ +- "Agent should mention the 30-day return policy" ✓ + +**Code-based (with trace access):** +- "Agent called the search_inventory tool exactly once" ✓ +- "No errors occurred during execution" ✓ +- "API response time was under 500ms" ✓ + +See the [Scenario documentation](https://langwatch.ai/scenario/) for trace-based evaluation. 
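
As a rough illustration of what such a check involves, here is a small, library-agnostic helper that counts how often a given tool was called in an OpenAI-style message list. This is not the Scenario library's API; it only shows the kind of assertion that becomes possible once your test code can see tool calls and other execution details rather than just the rendered transcript. The `search_inventory` tool name is taken from the example criterion above.

```python
# Library-agnostic sketch (not the Scenario library's API): count how often
# a given tool was called in an OpenAI-style message list.
from typing import Any


def count_tool_calls(messages: list[dict[str, Any]], tool_name: str) -> int:
    calls = 0
    for message in messages:
        # Assistant messages may carry a "tool_calls" list in OpenAI format.
        for tool_call in message.get("tool_calls") or []:
            if tool_call.get("function", {}).get("name") == tool_name:
                calls += 1
    return calls


transcript = [
    {"role": "user", "content": "Do you have the blue backpack in stock?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "type": "function",
                "function": {"name": "search_inventory", "arguments": '{"query": "blue backpack"}'},
            }
        ],
    },
    {"role": "tool", "content": '{"in_stock": true}'},
    {"role": "assistant", "content": "Yes, the blue backpack is in stock."},
]

# Example assertion in a code-based test:
assert count_tool_calls(transcript, "search_inventory") == 1
```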
## Adding Labels diff --git a/scenarios/overview.mdx b/scenarios/overview.mdx index b5f1cc5..9a529cc 100644 --- a/scenarios/overview.mdx +++ b/scenarios/overview.mdx @@ -4,7 +4,7 @@ description: Create and run agent simulations directly in the LangWatch UI witho sidebarTitle: Overview --- -**On-Platform Scenarios** let you create, configure, and run agent simulations directly in the LangWatch UI. This is a visual, no-code alternative to the [Scenario SDK](https://langwatch.ai/scenario/) for testing agents. +**On-Platform Scenarios** let you create, configure, and run agent simulations directly in the LangWatch UI. This is a visual, no-code alternative to the [Scenario library](https://langwatch.ai/scenario/) for testing agents. Scenario Library showing a list of scenarios with labels and run status @@ -19,13 +19,13 @@ Understanding the terminology: | **Scenario** | A test case definition: the situation, criteria, and configuration | | **Simulation** | An execution of a scenario against a target, producing a conversation trace | | **Run** | A single simulation execution with its results | -| **Set** | A group of related scenario runs (used by the SDK) | +| **Set** | A group of related scenario runs (used by the testing library) | **On-Platform Scenarios** are test definitions you create in the UI. When you run a scenario against a target, it produces a **simulation** that you can view in the [Simulations visualizer](/agent-simulations/overview). -## When to Use On-Platform vs. SDK +## When to Use On-Platform vs. Code -| Use Case | On-Platform | SDK | +| Use Case | On-Platform | Code | |----------|:-----------:|:---:| | Quick iteration and experimentation | ✓ | | | Non-technical team members (PMs, QA) | ✓ | | @@ -41,7 +41,7 @@ Understanding the terminology: - Iterate on prompts with fast visual feedback - Demonstrate agent behavior to stakeholders -**Choose the [Scenario SDK](https://langwatch.ai/scenario/) when you need to:** +**Choose the [Scenario library](https://langwatch.ai/scenario/) when you need to:** - Run tests in CI/CD pipelines - Write complex programmatic assertions - Build automated regression test suites @@ -134,7 +134,7 @@ When you run a scenario, here's what happens: Analyze simulation results - - Use the SDK for CI/CD integration + + Use the library for CI/CD integration diff --git a/scenarios/running-scenarios.mdx b/scenarios/running-scenarios.mdx index 8d638af..c6ec1a3 100644 --- a/scenarios/running-scenarios.mdx +++ b/scenarios/running-scenarios.mdx @@ -215,7 +215,7 @@ On-Platform Scenarios and the [Simulations visualizer](/agent-simulations/overvi 2. **Running a scenario** produces a **simulation** 3. **Simulations** appear in the visualizer -Both On-Platform Scenarios and the [Scenario SDK](https://langwatch.ai/scenario/) produce simulations in the same visualizer, so you can mix approaches. +Both On-Platform Scenarios and the [Scenario library](https://langwatch.ai/scenario/) produce simulations in the same visualizer, so you can mix approaches. 
## Coming Soon @@ -243,7 +243,7 @@ Both On-Platform Scenarios and the [Scenario SDK](https://langwatch.ai/scenario/ Write more scenarios - + Run scenarios in CI/CD From 3a0e3c17251c9ee2143af97bdee67ce0fa89be9f Mon Sep 17 00:00:00 2001 From: drewdrew Date: Wed, 14 Jan 2026 23:08:49 +0100 Subject: [PATCH 4/4] docs: add Agents section with HTTP Agents documentation Create dedicated Agents documentation section (similar to Prompt Management): - agents/overview.mdx - Overview of agent types and concepts - agents/http-agents.mdx - Full HTTP agent configuration guide Refactor running-scenarios.mdx: - Remove inline HTTP agent configuration details - Reference /agents/http-agents for configuration - Keep focused on the workflow of running scenarios Add Agents group to navigation under Agent Simulations. This mirrors how Prompt Management is structured - detailed configuration in its own section, referenced from scenario running docs. Co-Authored-By: Claude Opus 4.5 --- agents/http-agents.mdx | 201 ++++++++++++++++++++++++++++++++ agents/overview.mdx | 57 +++++++++ docs.json | 7 ++ scenarios/running-scenarios.mdx | 89 +++----------- 4 files changed, 283 insertions(+), 71 deletions(-) create mode 100644 agents/http-agents.mdx create mode 100644 agents/overview.mdx diff --git a/agents/http-agents.mdx b/agents/http-agents.mdx new file mode 100644 index 0000000..e9b75f5 --- /dev/null +++ b/agents/http-agents.mdx @@ -0,0 +1,201 @@ +--- +title: HTTP Agents +description: Configure HTTP endpoints as testable agents for LangWatch scenarios +sidebarTitle: HTTP Agents +--- + +HTTP Agents let you test any AI agent deployed as an API endpoint. Configure your endpoint once, then use it across multiple scenarios. + +## Creating an HTTP Agent + +1. Navigate to **Scenarios** in the sidebar +2. When running a scenario, click **Add New Agent** in the target selector +3. Configure the HTTP agent settings + + + HTTP Agent configuration form + + +## Configuration + +### Basic Settings + +| Field | Description | Example | +|-------|-------------|---------| +| **Name** | Descriptive name | "Production Chat API" | +| **URL** | Endpoint to call | `https://api.example.com/chat` | +| **Method** | HTTP method | `POST` | + +### Authentication + +Choose how to authenticate requests: + +| Type | Description | Header | +|------|-------------|--------| +| **None** | No authentication | - | +| **Bearer Token** | OAuth/JWT token | `Authorization: Bearer ` | +| **API Key** | Custom API key header | `X-API-Key: ` (configurable) | +| **Basic Auth** | Username/password | `Authorization: Basic ` | + + + Authentication configuration + + +### Body Template + +Define the JSON body sent to your endpoint. Use placeholders for dynamic values: + +```json +{ + "messages": {{messages}}, + "stream": false, + "max_tokens": 1000 +} +``` + +**Available placeholders:** + +| Placeholder | Type | Description | +|-------------|------|-------------| +| `{{messages}}` | Array | Full conversation history (OpenAI format) | +| `{{input}}` | String | Latest user message only | +| `{{threadId}}` | String | Unique conversation identifier | + +**Messages format:** + +The `{{messages}}` placeholder expands to an OpenAI-compatible message array: + +```json +[ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Hello!"}, + {"role": "assistant", "content": "Hi! 
How can I help?"}, + {"role": "user", "content": "I need help with my order"} +] +``` + +### Response Extraction + +Use JSONPath to extract the assistant's response from your API's response format. + +**Common patterns:** + +| API Response Format | Response Path | +|--------------------|---------------| +| `{"choices": [{"message": {"content": "..."}}]}` | `$.choices[0].message.content` | +| `{"response": "..."}` | `$.response` | +| `{"data": {"reply": "..."}}` | `$.data.reply` | +| `{"message": "..."}` | `$.message` | + + + If your endpoint returns the message directly as a string (not JSON), leave + the response path empty. + + +## Example Configurations + +### OpenAI-Compatible Endpoint + +``` +Name: OpenAI Compatible API +URL: https://api.yourcompany.com/v1/chat/completions +Method: POST +Auth: Bearer Token + +Body Template: +{ + "model": "gpt-4", + "messages": {{messages}}, + "temperature": 0.7 +} + +Response Path: $.choices[0].message.content +``` + +### Simple Chat API + +``` +Name: Simple Chat Service +URL: https://chat.yourcompany.com/api/message +Method: POST +Auth: API Key (X-API-Key) + +Body Template: +{ + "message": {{input}}, + "conversation_id": {{threadId}} +} + +Response Path: $.reply +``` + +### Custom Agent with Context + +``` +Name: Customer Support Agent +URL: https://support.yourcompany.com/agent +Method: POST +Auth: Bearer Token + +Body Template: +{ + "messages": {{messages}}, + "context": { + "source": "scenario_test", + "timestamp": "{{threadId}}" + } +} + +Response Path: $.response.content +``` + +## Managing Agents + +### Editing Agents + +HTTP Agents are project-level resources. To edit an existing agent: + +1. Open any scenario +2. Click the target selector +3. Find the agent in the HTTP Agents section +4. Click the edit icon + +### Deleting Agents + +Deleting an agent won't affect past scenario runs, but will prevent future runs against that agent. + +## Troubleshooting + +### Common Issues + +| Problem | Possible Cause | Solution | +|---------|---------------|----------| +| 401 Unauthorized | Invalid or expired token | Check authentication credentials | +| 404 Not Found | Wrong URL | Verify endpoint URL | +| Timeout | Slow response | Check endpoint performance | +| Invalid JSON | Malformed body template | Validate JSON syntax | +| Empty response | Wrong response path | Test JSONPath against actual response | + +### Testing Your Configuration + +Before running scenarios: + +1. Test your endpoint manually (curl, Postman) +2. Verify the response format matches your JSONPath +3. Check that authentication works + + + HTTP Agent credentials are stored in your project. Use environment-specific + agents (dev, staging, prod) rather than sharing credentials. + + +## Next Steps + + + + Test your agent with scenarios + + + Write test cases for your agent + + diff --git a/agents/overview.mdx b/agents/overview.mdx new file mode 100644 index 0000000..ab8b4a5 --- /dev/null +++ b/agents/overview.mdx @@ -0,0 +1,57 @@ +--- +title: Agents Overview +description: Configure HTTP agents to test your deployed AI agents with LangWatch scenarios +sidebarTitle: Overview +--- + +**Agents** in LangWatch represent external AI systems you want to test. When you run a scenario, you test it against an agent to evaluate its behavior. + +## Agent Types + +Currently, LangWatch supports **HTTP Agents** - external API endpoints that receive conversation messages and return responses. 
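To make that concrete, the sketch below is a minimal example of the kind of endpoint an HTTP Agent could point at, assuming a body template of `{"messages": {{messages}}}` and a response path of `$.reply`. FastAPI and the `generate_reply` helper are illustrative choices, not a LangWatch requirement - any HTTP service that accepts the templated body and returns JSON works.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    # OpenAI-format conversation history injected via the {{messages}} placeholder.
    messages: list[dict]


def generate_reply(messages: list[dict]) -> str:
    # Placeholder for your actual agent logic (LLM call, tools, retrieval, etc.).
    last_user = next((m["content"] for m in reversed(messages) if m["role"] == "user"), "")
    return f"You said: {last_user}"


@app.post("/chat")
def chat(request: ChatRequest) -> dict:
    # The scenario runner extracts the assistant reply with the response path `$.reply`.
    return {"reply": generate_reply(request.messages)}
```

With a service like this deployed at, say, `https://agent.example.com/chat`, the HTTP Agent configuration would use that URL, method `POST`, and response path `$.reply`.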
+ + + Agent list showing configured HTTP agents + + +## When to Use Agents + +Use HTTP Agents when you want to test: + +- **Deployed agents** - Your production or staging AI endpoints +- **External services** - Third-party AI APIs +- **Custom implementations** - Any HTTP endpoint that handles conversations + +For testing prompts directly (without a deployed endpoint), use [Prompt targets](/prompt-management/overview) instead. + +## Key Concepts + +### HTTP Agent + +An HTTP Agent configuration includes: + +| Field | Description | +|-------|-------------| +| **Name** | Descriptive name for the agent | +| **URL** | The endpoint to call | +| **Authentication** | How to authenticate requests | +| **Body Template** | JSON body format with message placeholders | +| **Response Path** | JSONPath to extract the response | + +### Agent vs. Prompt + +| Testing... | Use | +|------------|-----| +| A deployed endpoint (API) | HTTP Agent | +| A prompt before deployment | Prompt (from Prompt Management) | + +## Next Steps + + + + Configure HTTP agents for scenario testing + + + Test your agents with scenarios + + diff --git a/docs.json b/docs.json index be8fd22..9e1af99 100644 --- a/docs.json +++ b/docs.json @@ -83,6 +83,13 @@ "agent-simulations/batch-runs", "agent-simulations/individual-run" ] + }, + { + "group": "Agents", + "pages": [ + "agents/overview", + "agents/http-agents" + ] } ] }, diff --git a/scenarios/running-scenarios.mdx b/scenarios/running-scenarios.mdx index c6ec1a3..6aae48f 100644 --- a/scenarios/running-scenarios.mdx +++ b/scenarios/running-scenarios.mdx @@ -10,10 +10,10 @@ Once you've created a scenario, you can run it against your agent to test its be When you run a scenario, you select what to test against: -| Option | Description | -|--------|-------------| -| **HTTP Agent** | An external API endpoint (your deployed agent) | -| **Prompt** | A versioned prompt from [Prompt Management](/prompt-management/overview) | +| Option | Description | Learn More | +|--------|-------------|------------| +| **HTTP Agent** | An external API endpoint (your deployed agent) | [HTTP Agents →](/agents/http-agents) | +| **Prompt** | A versioned prompt using your project's model providers | [Prompt Management →](/prompt-management/overview) | The selector shows both options grouped by type: @@ -23,60 +23,12 @@ The selector shows both options grouped by type: ## Running Against an HTTP Agent -Use HTTP agents to test agents deployed as API endpoints. This is the most common option for testing production or staging environments. +Use [HTTP Agents](/agents/http-agents) to test agents deployed as API endpoints. This is the most common option for testing production or staging environments. -### Creating an HTTP Agent - -1. In the selector dropdown, click **Add New Agent** -2. 
Configure the HTTP settings: - - - HTTP Agent configuration form - - -### Configuration Options - -| Field | Description | -|-------|-------------| -| **Name** | A descriptive name for this agent | -| **URL** | The endpoint to call (e.g., `https://api.example.com/chat`) | -| **Authentication** | Bearer token, API key, basic auth, or none | -| **Body Template** | JSON body with `{{messages}}` placeholder | -| **Response Path** | JSONPath to extract the response | - -### Body Template - -Use `{{messages}}` to inject the conversation history: - -```json -{ - "messages": {{messages}}, - "stream": false -} -``` - -The placeholder is replaced with an OpenAI-format message array: - -```json -[ - {"role": "user", "content": "Hello!"}, - {"role": "assistant", "content": "Hi! How can I help?"} -] -``` - -Other available variables: `{{input}}` (latest message as string), `{{threadId}}` (conversation ID). - -### Response Extraction - -Use JSONPath to extract the response from your API: - -``` -// For: { "choices": [{ "message": { "content": "Hello!" } }] } -$.choices[0].message.content - -// For: { "response": "Hello!" } -$.response -``` +To create an HTTP Agent, click **Add New Agent** in the selector dropdown. See [HTTP Agents](/agents/http-agents) for configuration details including: +- URL and authentication setup +- Body templates with message placeholders +- Response extraction with JSONPath ## Running Against a Prompt @@ -86,20 +38,15 @@ Use prompts to test directly against an LLM using your project's configured mode - Quick iteration without infrastructure - Comparing different prompt versions -### Selecting a Prompt - +To use a prompt: 1. In the selector dropdown, choose from the **Prompts** section 2. Only published prompts (version > 0) appear - - Prompt selector - - When you run against a prompt, the platform uses the prompt's configured model, system message, and temperature settings with your project's API keys. - Don't have a prompt yet? Click **Add New Prompt** to open Prompt Management - in a new tab, create your prompt, then return to select it. + Don't have a prompt yet? Click **Add New Prompt** to open + [Prompt Management](/prompt-management/getting-started) in a new tab. ## Executing a Run @@ -237,16 +184,16 @@ Both On-Platform Scenarios and the [Scenario library](https://langwatch.ai/scena ## Next Steps + + Configure HTTP agent endpoints + + + Create versioned prompts + Learn more about analyzing results - - Write more scenarios - Run scenarios in CI/CD - - Create versioned prompts -