diff --git a/agent-simulations/introduction.mdx b/agent-simulations/introduction.mdx
index c93369c..c9a5d1e 100644
--- a/agent-simulations/introduction.mdx
+++ b/agent-simulations/introduction.mdx
@@ -83,17 +83,40 @@ script=[
- **Simple integration** - Just implement one `call()` method
- **Multi-language support** - Python, TypeScript, and Go
+## Two Ways to Create Simulations
+
+LangWatch offers two approaches to agent testing:
+
+### On-Platform Scenarios (No Code)
+
+Create and run simulations directly in the LangWatch UI:
+- Define situations and evaluation criteria visually
+- Run against HTTP agents or managed prompts
+- Ideal for quick iteration and non-technical team members
+
+[Get started with On-Platform Scenarios →](/scenarios/overview)
+
+### Scenario Library (Code-Based)
+
+Write simulations in code for maximum control:
+- Full programmatic control over conversation flow
+- Complex assertions and tool call verification
+- CI/CD integration for automated testing
+- **Trace-based evaluation** via OpenTelemetry integration
+
+[Get started with the Scenario library →](/agent-simulations/getting-started)
+
+Both approaches produce simulations that appear in the same visualizer, so you can mix and match based on your needs.
+
## Visualizing Simulations in LangWatch
-Once you've set up your agent tests with Scenario, LangWatch provides powerful visualization tools to:
+The Simulations visualizer helps you analyze results from both On-Platform Scenarios and code-based tests:
- **Organize simulations** into sets and batches
- **Debug agent behavior** by stepping through conversations
- **Track performance** over time with run history
- **Collaborate** with your team on agent improvements
-The rest of this documentation will show you how to use LangWatch's simulation visualizer to get the most out of your agent testing.
-
+
+## Configuration
+
+### Basic Settings
+
+| Field | Description | Example |
+|-------|-------------|---------|
+| **Name** | Descriptive name | "Production Chat API" |
+| **URL** | Endpoint to call | `https://api.example.com/chat` |
+| **Method** | HTTP method | `POST` |
+
+### Authentication
+
+Choose how to authenticate requests:
+
+| Type | Description | Header |
+|------|-------------|--------|
+| **None** | No authentication | - |
+| **Bearer Token** | OAuth/JWT token | `Authorization: Bearer <token>` |
+| **API Key** | Custom API key header | `X-API-Key: <key>` (configurable) |
+| **Basic Auth** | Username/password | `Authorization: Basic <credentials>` |
+
+### Body Template
+
+Define the JSON body sent to your endpoint. Use placeholders for dynamic values:
+
+```json
+{
+ "messages": {{messages}},
+ "stream": false,
+ "max_tokens": 1000
+}
+```
+
+**Available placeholders:**
+
+| Placeholder | Type | Description |
+|-------------|------|-------------|
+| `{{messages}}` | Array | Full conversation history (OpenAI format) |
+| `{{input}}` | String | Latest user message only |
+| `{{threadId}}` | String | Unique conversation identifier |
+
+**Messages format:**
+
+The `{{messages}}` placeholder expands to an OpenAI-compatible message array:
+
+```json
+[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "Hello!"},
+ {"role": "assistant", "content": "Hi! How can I help?"},
+ {"role": "user", "content": "I need help with my order"}
+]
+```
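+
+For reference, a rough sketch of the HTTP request that results once the placeholders are filled in is shown below (in Python, using the `requests` package). It is illustrative only: the substitution happens inside LangWatch, and the URL, token, and message values are placeholders rather than real endpoints or credentials.
+
+```python
+import requests  # pip install requests
+
+# Stand-ins for the values the platform supplies at run time.
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "I need help with my order"},
+]
+
+# The body template above, after {{messages}} has been expanded. String
+# placeholders such as {{input}} and {{threadId}} are substituted the same way.
+body = {
+    "messages": messages,
+    "stream": False,
+    "max_tokens": 1000,
+}
+
+response = requests.post(
+    "https://api.example.com/chat",                    # the configured URL
+    json=body,
+    headers={"Authorization": "Bearer YOUR_TOKEN"},    # the configured auth
+    timeout=30,
+)
+print(response.status_code, response.json())
+```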
+
+### Response Extraction
+
+Use JSONPath to extract the assistant's response from your API's response format.
+
+**Common patterns:**
+
+| API Response Format | Response Path |
+|--------------------|---------------|
+| `{"choices": [{"message": {"content": "..."}}]}` | `$.choices[0].message.content` |
+| `{"response": "..."}` | `$.response` |
+| `{"data": {"reply": "..."}}` | `$.data.reply` |
+| `{"message": "..."}` | `$.message` |
+
+
+ If your endpoint returns the message directly as a string (not JSON), leave
+ the response path empty.
+
+
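+
+To sanity-check a response path locally before configuring it, a small script like the following works. It uses the third-party `jsonpath-ng` package, which is one of several JSONPath implementations; LangWatch evaluates the path server-side, so this is only a convenience check. The sample payload mirrors the first row of the table above.
+
+```python
+from jsonpath_ng import parse  # pip install jsonpath-ng
+
+# Sample response in the OpenAI-style shape from the table above.
+payload = {"choices": [{"message": {"content": "Hi! How can I help?"}}]}
+
+# The response path you would configure for this shape.
+matches = parse("$.choices[0].message.content").find(payload)
+assert matches, "Response path matched nothing - check the JSONPath"
+print(matches[0].value)  # "Hi! How can I help?"
+```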
+## Example Configurations
+
+### OpenAI-Compatible Endpoint
+
+```
+Name: OpenAI Compatible API
+URL: https://api.yourcompany.com/v1/chat/completions
+Method: POST
+Auth: Bearer Token
+
+Body Template:
+{
+ "model": "gpt-4",
+ "messages": {{messages}},
+ "temperature": 0.7
+}
+
+Response Path: $.choices[0].message.content
+```
+
+### Simple Chat API
+
+```
+Name: Simple Chat Service
+URL: https://chat.yourcompany.com/api/message
+Method: POST
+Auth: API Key (X-API-Key)
+
+Body Template:
+{
+ "message": {{input}},
+ "conversation_id": {{threadId}}
+}
+
+Response Path: $.reply
+```
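+
+If you are also building the endpoint side, a minimal service that satisfies this contract might look like the sketch below. FastAPI is used only as an example framework; any HTTP server that accepts this JSON body and returns a `reply` field will work with the configuration above.
+
+```python
+from fastapi import FastAPI  # pip install fastapi uvicorn
+from pydantic import BaseModel
+
+app = FastAPI()
+
+class ChatRequest(BaseModel):
+    message: str           # filled from the {{input}} placeholder
+    conversation_id: str   # filled from the {{threadId}} placeholder
+
+class ChatResponse(BaseModel):
+    reply: str
+
+@app.post("/api/message", response_model=ChatResponse)
+async def handle_message(req: ChatRequest) -> ChatResponse:
+    # Call your real agent here; this stub just echoes the input.
+    return ChatResponse(reply=f"You said: {req.message}")
+
+# Run locally with: uvicorn main:app --reload
+```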
+
+### Custom Agent with Context
+
+```
+Name: Customer Support Agent
+URL: https://support.yourcompany.com/agent
+Method: POST
+Auth: Bearer Token
+
+Body Template:
+{
+ "messages": {{messages}},
+ "context": {
+ "source": "scenario_test",
+ "timestamp": "{{threadId}}"
+ }
+}
+
+Response Path: $.response.content
+```
+
+## Managing Agents
+
+### Editing Agents
+
+HTTP Agents are project-level resources. To edit an existing agent:
+
+1. Open any scenario
+2. Click the target selector
+3. Find the agent in the HTTP Agents section
+4. Click the edit icon
+
+### Deleting Agents
+
+Deleting an agent won't affect past scenario runs, but will prevent future runs against that agent.
+
+## Troubleshooting
+
+### Common Issues
+
+| Problem | Possible Cause | Solution |
+|---------|---------------|----------|
+| 401 Unauthorized | Invalid or expired token | Check authentication credentials |
+| 404 Not Found | Wrong URL | Verify endpoint URL |
+| Timeout | Slow response | Check endpoint performance |
+| Invalid JSON | Malformed body template | Validate JSON syntax |
+| Empty response | Wrong response path | Test JSONPath against actual response |
+
+### Testing Your Configuration
+
+Before running scenarios:
+
+1. Test your endpoint manually (curl, Postman, or the script below)
+2. Verify the response format matches your JSONPath
+3. Check that authentication works
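+
+For example, steps 1 and 2 can be covered with a short script along these lines; the URL, token, body, and response path are placeholders for your own configuration:
+
+```python
+import requests                # pip install requests
+from jsonpath_ng import parse  # pip install jsonpath-ng
+
+URL = "https://api.example.com/chat"              # your configured URL
+HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}  # your configured auth
+RESPONSE_PATH = "$.choices[0].message.content"    # your configured response path
+
+# Minimal body in the same shape as your body template after substitution.
+body = {"messages": [{"role": "user", "content": "Hello!"}]}
+
+resp = requests.post(URL, json=body, headers=HEADERS, timeout=30)
+resp.raise_for_status()
+
+matches = parse(RESPONSE_PATH).find(resp.json())
+assert matches, "Response path matched nothing - inspect resp.json()"
+print("Extracted reply:", matches[0].value)
+```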
+
+
+ HTTP Agent credentials are stored in your project. Use environment-specific
+ agents (dev, staging, prod) rather than sharing credentials.
+
+
+## Next Steps
+
+- Test your agent with scenarios
+- Write test cases for your agent
diff --git a/agents/overview.mdx b/agents/overview.mdx
new file mode 100644
index 0000000..ab8b4a5
--- /dev/null
+++ b/agents/overview.mdx
@@ -0,0 +1,57 @@
+---
+title: Agents Overview
+description: Configure HTTP agents to test your deployed AI agents with LangWatch scenarios
+sidebarTitle: Overview
+---
+
+**Agents** in LangWatch represent external AI systems you want to test. When you run a scenario, you test it against an agent to evaluate its behavior.
+
+## Agent Types
+
+Currently, LangWatch supports **HTTP Agents** - external API endpoints that receive conversation messages and return responses.
+
+## When to Use Agents
+
+Use HTTP Agents when you want to test:
+
+- **Deployed agents** - Your production or staging AI endpoints
+- **External services** - Third-party AI APIs
+- **Custom implementations** - Any HTTP endpoint that handles conversations
+
+For testing prompts directly (without a deployed endpoint), use [Prompt targets](/prompt-management/overview) instead.
+
+## Key Concepts
+
+### HTTP Agent
+
+An HTTP Agent configuration includes:
+
+| Field | Description |
+|-------|-------------|
+| **Name** | Descriptive name for the agent |
+| **URL** | The endpoint to call |
+| **Authentication** | How to authenticate requests |
+| **Body Template** | JSON body format with message placeholders |
+| **Response Path** | JSONPath to extract the response |
+
+### Agent vs. Prompt
+
+| Testing... | Use |
+|------------|-----|
+| A deployed endpoint (API) | HTTP Agent |
+| A prompt before deployment | Prompt (from Prompt Management) |
+
+## Next Steps
+
+- Configure HTTP agents for scenario testing
+- Test your agents with scenarios
diff --git a/docs.json b/docs.json
index 45e08ce..9e1af99 100644
--- a/docs.json
+++ b/docs.json
@@ -66,11 +66,31 @@
"group": "Agent Simulations",
"pages": [
"agent-simulations/introduction",
- "agent-simulations/overview",
- "agent-simulations/getting-started",
- "agent-simulations/set-overview",
- "agent-simulations/batch-runs",
- "agent-simulations/individual-run"
+ {
+ "group": "On-Platform Scenarios",
+ "pages": [
+ "scenarios/overview",
+ "scenarios/creating-scenarios",
+ "scenarios/running-scenarios"
+ ]
+ },
+ {
+ "group": "Scenario Library",
+ "pages": [
+ "agent-simulations/overview",
+ "agent-simulations/getting-started",
+ "agent-simulations/set-overview",
+ "agent-simulations/batch-runs",
+ "agent-simulations/individual-run"
+ ]
+ },
+ {
+ "group": "Agents",
+ "pages": [
+ "agents/overview",
+ "agents/http-agents"
+ ]
+ }
]
},
{
diff --git a/scenarios/creating-scenarios.mdx b/scenarios/creating-scenarios.mdx
new file mode 100644
index 0000000..c171367
--- /dev/null
+++ b/scenarios/creating-scenarios.mdx
@@ -0,0 +1,246 @@
+---
+title: Creating Scenarios
+description: Write effective scenarios with good situations and criteria
+sidebarTitle: Creating Scenarios
+---
+
+This guide walks you through creating scenarios in the LangWatch UI and provides best practices for writing effective test cases.
+
+## Accessing the Scenario Library
+
+Navigate to **Scenarios** in the left sidebar to open the Scenario Library.
+
+From here you can:
+- View all scenarios with their labels and last updated time
+- Filter scenarios by label
+- Create new scenarios
+- Click any scenario to edit it
+
+## Creating a New Scenario
+
+Click **New Scenario** to open the Scenario Editor.
+
+### Step 1: Name Your Scenario
+
+Give your scenario a descriptive name that explains what it tests:
+
+**Good names:**
+- "Handles refund request for damaged item"
+- "Recommends vegetarian recipes when asked"
+- "Escalates frustrated customer to human agent"
+
+**Avoid vague names:**
+- "Test 1"
+- "Refund"
+- "Customer service"
+
+### Step 2: Write the Situation
+
+The **Situation** describes the simulated user's context, persona, and goals. Write it as a narrative that captures:
+
+- **Who** the user is (persona, mood, background)
+- **What** they want to accomplish
+- **Constraints** or special circumstances
+
+**Example - Support scenario:**
+
+```
+The user is a frustrated customer who received the wrong item in their order.
+They've already tried the chatbot twice without success. They're running out of
+patience and want either a replacement shipped overnight or a full refund.
+They're not interested in store credit.
+```
+
+**Example - Sales scenario:**
+
+```
+The user is researching project management tools for their 15-person startup.
+They currently use spreadsheets and are overwhelmed. Budget is limited to $50
+per user per month. They need something that integrates with Slack and Google
+Workspace.
+```
+
+
+ Be specific about emotional state and constraints. Vague situations produce
+ generic conversations that don't test edge cases.
+
+
+### Step 3: Define Criteria
+
+**Criteria** are natural language statements that should be true for the scenario to pass. The Judge evaluates each criterion and explains its reasoning.
+
+Click **Add Criterion** and enter evaluation statements:
+
+## Writing Good Criteria
+
+Criteria are the heart of your scenario. Well-written criteria catch real issues; poorly written ones create noise.
+
+### Be Specific and Observable
+
+| Good | Bad |
+|------|-----|
+| Agent acknowledges the customer's frustration within the first 2 messages | Agent is empathetic |
+| Agent offers a concrete solution (refund, replacement, or escalation) | Agent helps the customer |
+| Agent does not ask the customer to repeat their order number | Agent doesn't waste time |
+
+### Test One Thing Per Criterion
+
+| Good | Bad |
+|------|-----|
+| Agent uses a polite tone throughout | Agent is polite and helpful and resolves the issue quickly |
+| Agent offers a solution within 3 messages | Agent is fast and accurate |
+
+### Include Both Positive and Negative Checks
+
+```
+✓ Agent should offer to process a refund
+✓ Agent should not suggest store credit after user declined it
+✓ Agent should apologize for the inconvenience
+✓ Agent should not ask for the order number more than once
+```
+
+### Cover Different Aspects
+
+**Behavioral criteria:**
+- "Agent should not ask more than 2 clarifying questions"
+- "Agent should summarize the user's issue before proposing a solution"
+
+**Content criteria:**
+- "Recipe should include a list of ingredients with quantities"
+- "Response should mention the 30-day return policy"
+
+**Tone criteria:**
+- "Agent should maintain a professional but friendly tone"
+- "Agent should not use corporate jargon"
+
+**Safety criteria:**
+- "Agent should not make promises it cannot keep"
+- "Agent should not disclose other customers' information"
+
+### Platform vs. Code-Based Evaluation
+
+On-Platform Scenarios evaluate based on the **conversation transcript only**. The Judge sees the messages exchanged but not internal system behavior.
+
+For advanced evaluation that includes **execution traces** (tool calls, API latency, span attributes), use the [Scenario testing library](https://langwatch.ai/scenario/) in code. The library integrates with OpenTelemetry to give the Judge access to:
+- Tool call verification (was the right tool called?)
+- Execution timing (was latency under threshold?)
+- Span attributes (what model was used? how many tokens?)
+- Error detection (did any operations fail?)
+
+**On-Platform (conversation only):**
+- "Agent should apologize for the inconvenience" ✓
+- "Agent should mention the 30-day return policy" ✓
+
+**Code-based (with trace access):**
+- "Agent called the search_inventory tool exactly once" ✓
+- "No errors occurred during execution" ✓
+- "API response time was under 500ms" ✓
+
+See the [Scenario documentation](https://langwatch.ai/scenario/) for trace-based evaluation.
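+
+For comparison, a code-based check of this kind might look roughly like the sketch below, written against the Python Scenario library. Treat the helper names (`scenario.run`, `JudgeAgent`, `has_tool_call`, and so on) as indicative of the library's shape rather than a definitive reference; the Scenario documentation linked above is the source of truth, and `my_support_agent` is a placeholder for your real agent.
+
+```python
+import pytest
+import scenario
+
+scenario.configure(default_model="openai/gpt-4.1-mini")  # example model name
+
+def my_support_agent(messages):
+    # Placeholder: call your real agent here. For the tool-call check below to
+    # pass, the real agent must actually invoke the search_inventory tool.
+    return "I'm sorry about the damaged item - I can process a refund right away."
+
+class SupportAgent(scenario.AgentAdapter):
+    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
+        return my_support_agent(input.messages)
+
+def called_inventory_tool(state: scenario.ScenarioState) -> None:
+    # Trace-level assertion that is not possible from the transcript alone.
+    assert state.has_tool_call("search_inventory")
+
+@pytest.mark.asyncio
+async def test_refund_request():
+    result = await scenario.run(
+        name="refund request for damaged item",
+        description="The user received a damaged item and wants a refund.",
+        agents=[
+            SupportAgent(),
+            scenario.UserSimulatorAgent(),
+            scenario.JudgeAgent(criteria=[
+                "Agent should apologize for the inconvenience",
+                "Agent should offer to process a refund",
+            ]),
+        ],
+        script=[
+            scenario.user(),
+            scenario.agent(),
+            called_inventory_tool,
+            scenario.judge(),
+        ],
+    )
+    assert result.success
+```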
+
+## Adding Labels
+
+Labels help organize your scenario library. Click the label input to add tags.
+
+**Common labeling strategies:**
+
+| Category | Examples |
+|----------|----------|
+| Feature area | `checkout`, `support`, `onboarding`, `search` |
+| Agent type | `customer-service`, `sales`, `assistant` |
+| Priority | `critical`, `regression`, `exploratory` |
+| User type | `new-user`, `power-user`, `frustrated-user` |
+
+## Scenario Templates
+
+Here are templates for common scenario types:
+
+### Customer Support
+
+```
+Name: Handles [issue type] for [customer type]
+
+Situation:
+The user is a [persona] who [problem description]. They have [relevant context]
+and want [specific outcome]. They are feeling [emotional state].
+
+Criteria:
+- Agent acknowledges the issue within first response
+- Agent asks relevant clarifying questions (no more than 2)
+- Agent provides a clear solution or next steps
+- Agent maintains empathetic tone throughout
+- Agent does not make promises outside policy
+```
+
+### Product Recommendation
+
+```
+Name: Recommends [product type] for [use case]
+
+Situation:
+The user is looking for [product category] because [reason]. They need
+[specific requirements] and have [constraints]. They're comparing options
+and want honest recommendations.
+
+Criteria:
+- Agent asks about key requirements before recommending
+- Recommendations match stated requirements
+- Agent explains why each recommendation fits
+- Agent mentions relevant tradeoffs
+- Agent does not oversell or make exaggerated claims
+```
+
+### Information Retrieval
+
+```
+Name: Answers [topic] question accurately
+
+Situation:
+The user needs to know [specific information] for [reason]. They have
+[level of expertise] and prefer [communication style].
+
+Criteria:
+- Agent provides accurate information
+- Agent cites sources or documentation when available
+- Agent admits uncertainty rather than guessing
+- Response is appropriately detailed for the question
+- Agent offers to clarify or expand if needed
+```
+
+## Iterating on Scenarios
+
+Scenarios improve through iteration:
+
+1. **Start simple**: Begin with core criteria that capture the main behavior
+2. **Run and review**: Execute the scenario and read the Judge's reasoning
+3. **Refine criteria**: If criteria pass/fail unexpectedly, adjust the wording
+4. **Add edge cases**: Once the happy path works, add criteria for edge cases
+5. **Use labels**: Tag scenarios by iteration stage (`draft`, `validated`, `production`)
+
+
+ Editing a scenario doesn't affect past runs. Each run captures the scenario
+ state at execution time.
+
+
+## Next Steps
+
+- Connect your scenario to an agent
+- Execute and analyze results
diff --git a/scenarios/overview.mdx b/scenarios/overview.mdx
new file mode 100644
index 0000000..9a529cc
--- /dev/null
+++ b/scenarios/overview.mdx
@@ -0,0 +1,140 @@
+---
+title: On-Platform Scenarios
+description: Create and run agent simulations directly in the LangWatch UI without writing code
+sidebarTitle: Overview
+---
+
+**On-Platform Scenarios** let you create, configure, and run agent simulations directly in the LangWatch UI. This is a visual, no-code alternative to the [Scenario library](https://langwatch.ai/scenario/) for testing agents.
+
+## Scenarios vs. Simulations
+
+Understanding the terminology:
+
+| Term | What it means |
+|------|---------------|
+| **Scenario** | A test case definition: the situation, criteria, and configuration |
+| **Simulation** | An execution of a scenario against a target, producing a conversation trace |
+| **Run** | A single simulation execution with its results |
+| **Set** | A group of related scenario runs (used by the testing library) |
+
+**On-Platform Scenarios** are test definitions you create in the UI. When you run a scenario against a target, it produces a **simulation** that you can view in the [Simulations visualizer](/agent-simulations/overview).
+
+## When to Use On-Platform vs. Code
+
+| Use Case | On-Platform | Code |
+|----------|:-----------:|:---:|
+| Quick iteration and experimentation | ✓ | |
+| Non-technical team members (PMs, QA) | ✓ | |
+| Simple behavioral tests | ✓ | ✓ |
+| CI/CD pipeline integration | | ✓ |
+| Complex multi-turn scripts | | ✓ |
+| Programmatic assertions | | ✓ |
+| Dataset-driven testing | Coming soon | ✓ |
+
+**Choose On-Platform Scenarios when you want to:**
+- Quickly test agent behavior without writing code
+- Enable non-technical team members to create and run tests
+- Iterate on prompts with fast visual feedback
+- Demonstrate agent behavior to stakeholders
+
+**Choose the [Scenario library](https://langwatch.ai/scenario/) when you need to:**
+- Run tests in CI/CD pipelines
+- Write complex programmatic assertions
+- Build automated regression test suites
+- Define custom conversation scripts with precise control
+
+## What is a Scenario?
+
+A Scenario is a test case with three parts:
+
+### 1. Situation
+
+The **Situation** describes the context and persona of the simulated user. It tells the User Simulator how to behave during the conversation.
+
+```
+It's Saturday evening. The user is hungry and tired but doesn't want to order
+out. They're looking for a quick, easy vegetarian recipe they can make with
+common pantry ingredients.
+```
+
+### 2. Script
+
+The **Script** defines the conversation flow. In the current release, scenarios run in autopilot mode where the User Simulator drives the conversation based on the Situation.
+
+
+ The visual Turn Builder for creating custom conversation scripts is coming in a future release.
+
+
+### 3. Criteria
+
+The **Criteria** (or Score) define how to evaluate the agent's behavior. Each criterion is a natural language statement that should be true for the scenario to pass.
+
+```
+- Agent should not ask more than two follow-up questions
+- Agent should generate a recipe
+- Recipe should include a list of ingredients
+- Recipe should include step-by-step cooking instructions
+- Recipe should be vegetarian and not include any meat
+```
+
+## Key Concepts
+
+### What to Test Against
+
+When you run a scenario, you choose what to test:
+
+- **HTTP Agent**: Call an external API endpoint (your deployed agent)
+- **Prompt**: Use a versioned prompt from [Prompt Management](/prompt-management/overview)
+
+See [Running Scenarios](/scenarios/running-scenarios) for details on setting up each option.
+
+### Labels
+
+**Labels** help organize scenarios in your library. Use them to group scenarios by feature, agent type, priority, or any taxonomy that works for your team.
+
+## Architecture
+
+When you run a scenario, here's what happens:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ LangWatch Platform │
+│ │
+│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
+│ │ Scenario │───▶│ User │◀──▶│ Your Agent │ │
+│ │ (Situation) │ │ Simulator │ │ (Target) │ │
+│ └─────────────┘ └─────────────┘ └─────────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌─────────────┐ ┌─────────────┐ │
+│ │ Criteria │───▶│ Judge │───▶ Pass/Fail │
+│ └─────────────┘ └─────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────┘
+```
+
+1. The **Situation** configures the User Simulator's persona and goals
+2. The **User Simulator** and your **Target** have a multi-turn conversation
+3. The **Judge** evaluates the conversation against your **Criteria**
+4. The result (pass/fail with reasoning) is displayed in the Run Visualizer
+
+## Next Steps
+
+- Write effective scenarios with good criteria
+- Execute scenarios and analyze results
+- Analyze simulation results
+- Use the library for CI/CD integration
diff --git a/scenarios/running-scenarios.mdx b/scenarios/running-scenarios.mdx
new file mode 100644
index 0000000..6aae48f
--- /dev/null
+++ b/scenarios/running-scenarios.mdx
@@ -0,0 +1,199 @@
+---
+title: Running Scenarios
+description: Execute scenarios against HTTP agents or prompts and analyze results
+sidebarTitle: Running Scenarios
+---
+
+Once you've created a scenario, you can run it against your agent to test its behavior.
+
+## Choosing What to Test
+
+When you run a scenario, you select what to test against:
+
+| Option | Description | Learn More |
+|--------|-------------|------------|
+| **HTTP Agent** | An external API endpoint (your deployed agent) | [HTTP Agents →](/agents/http-agents) |
+| **Prompt** | A versioned prompt using your project's model providers | [Prompt Management →](/prompt-management/overview) |
+
+The selector shows both options grouped by type:
+
+## Running Against an HTTP Agent
+
+Use [HTTP Agents](/agents/http-agents) to test agents deployed as API endpoints. This is the most common option for testing production or staging environments.
+
+To create an HTTP Agent, click **Add New Agent** in the selector dropdown. See [HTTP Agents](/agents/http-agents) for configuration details including:
+- URL and authentication setup
+- Body templates with message placeholders
+- Response extraction with JSONPath
+
+## Running Against a Prompt
+
+Use prompts to test directly against an LLM using your project's configured model providers. This is useful for:
+
+- Testing prompt changes before deployment
+- Quick iteration without infrastructure
+- Comparing different prompt versions
+
+To use a prompt, choose it from the **Prompts** section of the selector dropdown. Only published prompts (version > 0) appear in the list.
+
+When you run against a prompt, the platform uses the prompt's configured model, system message, and temperature settings with your project's API keys.
+
+
+ Don't have a prompt yet? Click **Add New Prompt** to open
+ [Prompt Management](/prompt-management/getting-started) in a new tab.
+
+
+## Executing a Run
+
+From the Scenario Editor, use the **Save and Run** menu:
+
+1. Click **Save and Run** to open the selector
+2. Choose an HTTP Agent or Prompt
+3. The scenario runs immediately
+
+The platform:
+1. Sends the Situation to the User Simulator
+2. Runs a multi-turn conversation between the User Simulator and your agent
+3. Passes the conversation to the Judge with your Criteria
+4. Records the verdict and reasoning
+
+## Viewing Results
+
+After a run completes, you're taken to the Simulations visualizer.
+
+### Conversation View
+
+The main panel shows the full conversation:
+
+- **User messages** - Generated by the User Simulator based on your Situation
+- **Assistant messages** - Responses from your agent
+- **Tool calls** - If your agent uses tools
+
+### Results Panel
+
+The side panel shows:
+
+| Field | Description |
+|-------|-------------|
+| **Status** | Pass, Fail, or Error |
+| **Criteria Results** | Each criterion with pass/fail and reasoning |
+| **Run Duration** | Total execution time |
+
+### Criteria Breakdown
+
+Each criterion shows the Judge's reasoning:
+
+ Read the reasoning carefully. It explains exactly what the Judge observed
+ and why it made its decision.
+
+
+## Analyzing Failed Runs
+
+When a scenario fails:
+
+### 1. Read the Failed Criteria
+
+| Reasoning Says... | Likely Issue |
+|-------------------|--------------|
+| "Agent did not acknowledge..." | Missing empathy |
+| "Agent asked 4 questions, exceeding limit of 2" | Too verbose |
+| "No mention of refund policy" | Missing information |
+| "Conversation ended without resolution" | Incomplete flow |
+
+### 2. Review the Conversation
+
+Step through messages to find where things went wrong:
+- Did the agent misunderstand the user?
+- Did it get stuck repeating itself?
+- Did an error interrupt the flow?
+
+### 3. Fix and Re-run
+
+| Pattern | Fix |
+|---------|-----|
+| Ignores constraints | Update the system prompt to emphasize respecting the user's stated constraints |
+| Too verbose | Add brevity instructions |
+| Wrong tone | Add tone guidelines |
+| Missing info | Add to knowledge base or prompt |
+
+## Run History
+
+Access past runs from the **Simulations** section in the sidebar.
+
+The visualizer shows all runs with:
+- Pass/fail status
+- Timestamps and duration
+- Quick navigation to details
+
+Use history to:
+- Track progress as you iterate
+- Compare runs before and after changes
+- Identify regressions
+- Share results with your team
+
+## Relationship to Simulations
+
+On-Platform Scenarios and the [Simulations visualizer](/agent-simulations/overview) work together:
+
+1. **Scenarios** define test cases (situation, criteria)
+2. **Running a scenario** produces a **simulation**
+3. **Simulations** appear in the visualizer
+
+Both On-Platform Scenarios and the [Scenario library](https://langwatch.ai/scenario/) produce simulations in the same visualizer, so you can mix approaches.
+
+## Coming Soon
+
+- Run multiple scenarios against multiple agents in batch
+- Create custom conversation scripts
+- Run scenarios with inputs from a dataset
+- Generate scenarios from agent descriptions
+
+## Next Steps
+
+- Configure HTTP agent endpoints
+- Create versioned prompts
+- Learn more about analyzing results
+- Run scenarios in CI/CD