134 changes: 26 additions & 108 deletions features/playground.mdx
@@ -1,125 +1,43 @@
---
title: "Playground"
description: "The Playground is where you test and refine your AI agent prompts using real data. It's a three-panel interface that lets you select test data, edit prompts with Jinja templating, configure AI models, and see results in real-time."
description: "A visual workflow builder for testing and evaluating your AI agent with real data."
---

import { DarkLightImage } from "/snippets/dark-light-image.jsx";

<DarkLightImage
lightSrc="/images/playground-light.png"
caption="Playground overview."
alt="Screenshot of the playground in the UI."
darkSrc="/images/playground-dark.png"
caption="Playground workflow with testcases, agent, evaluator, results, and scores."
alt="Screenshot of the Playground showing a visual node-based workflow with connected components."
/>

## Getting Started
## Overview

### 1. Select Your Test Data (Left Panel)
The Playground is a visual workflow builder where you connect nodes to test your AI agent:

**Choose a Testset:**
- Click the testset dropdown at the top of the left panel
- Select a testset that contains the data you want to test your prompt against
- If no testsets exist, click "Create testset" to create one directly from the Playground
- The first testcase will be automatically selected
- **Testset** (left): Your test data flows into the agent
- **Agent** (center): Configure prompts and model settings
- **Evaluator** (top right): Select metrics to score outputs
- **Results**: View agent responses for each testcase
- **Scores**: See pass/fail status and metric scores

**Select Testcases:**
- Click on individual testcases to select them
- Hold Shift and click to select multiple testcases
- Selected testcases have a blue left border
- Hover over the info icon to see the full testcase data
- Testcases with a green flask icon have been tested
## Running an Evaluation

### 2. Edit Your Prompt (Middle Panel)
1. **Select a testset** or add testcases manually
2. **Configure your prompt** using [Jinja syntax](https://jinja.palletsprojects.com/) with variables like `{{context}}` or `{{allInputs}}`
3. **Add metrics** to the Evaluator node
4. **Click Run** to execute the evaluation

**Choose a Prompt:**
- Select a prompt from the dropdown in the header
- If no prompts exist, click "Create prompt" to create one directly from the Playground
Results flow through the workflow: watch as responses appear in the Results node and scores populate in the Scores node.
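Conceptually, each testcase's fields are substituted into the prompt template before it is sent to the model. The sketch below illustrates that substitution step only (the `render` helper is hypothetical; the Playground uses the full Jinja engine, which also supports loops, filters, and conditionals):

```python
import re

def render(template: str, testcase: dict) -> str:
    """Substitute {{variable}} placeholders with testcase fields.

    Conceptual sketch only; the real Playground uses the full Jinja engine.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Leave unknown variables untouched so templating mistakes stay visible.
        return str(testcase.get(name, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

prompt = "Answer using this context:\n{{context}}\n\nQuestion: {{question}}"
testcase = {
    "context": "Refunds are processed within 5 business days.",
    "question": "How long do refunds take?",
}
print(render(prompt, testcase))
```

Running this prints the prompt with `{{context}}` and `{{question}}` replaced by the testcase values, which is what the rendered prompt sent to the agent looks like.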

**Work with Prompt Versions:**
- The left sidebar shows all versions of your selected prompt
- Click any version to switch to it
- Versions with unsaved changes show a save indicator
- The production version is marked with a badge
## Related

**Edit Prompt Templates:**
- Use the "Prompt Templates" tab to write your prompts
- The editor supports **Jinja syntax** for dynamic content
- Insert variables from your testcase data like `{{variable_name}}`
- Add multiple messages by clicking "+ Add Message"
- Set message roles (System, User, Assistant) using the dropdown
- Remove messages with the trash can icon

**Configure Model Settings:**
- Switch to the "Model Settings" tab
- Choose your AI model (GPT-4, Claude, etc.)
- Adjust parameters like temperature, max tokens, and top-p
- These settings affect how the AI generates responses

### 3. Preview and Test (Right Panel)

**Template Preview:**
- The "Template Preview" tab shows how your prompt looks with real data
- Variables are automatically replaced with values from selected testcases
- This helps you verify your Jinja templating is working correctly

**Run Tests:**
- Click "Try" to test your prompt on selected testcases
- Click "Kickoff Run" to create a full run for the entire testset, which will appear in your run history
- Results appear in the "Results" tab automatically

**View Results:**
- The "Results" tab shows AI responses for each testcase
- See response time, token count, and full output
- Click the completion badge to open a detailed results modal
- Green indicates successful completion, yellow shows partial results

## Key Features

### Jinja Templating
Your prompts support Jinja syntax for dynamic content:
```jinja
Hello {{name}}, your order #{{order_id}} is {{status}}.
```

### Multi-testcase Testing
- Test individual testcases for quick iteration
- Run all testcases for comprehensive evaluation
- Kickoff full runs that are tracked in your run history
- Compare results across different prompt versions

### Version Management
- Save new versions of your prompts with the "Save" button
- Switch between versions to compare performance
- Publish versions to production when ready

### Real-time Preview
- See exactly how your prompt will look with real data
- Catch templating errors before running tests
- Understand how variables are populated

## Best Practices

1. **Start Small**: Select one testcase first to quickly iterate on your prompt
2. **Use Variables**: Leverage Jinja templating to make prompts dynamic and reusable
3. **Test Thoroughly**: Run all testcases before publishing to production
4. **Save Versions**: Create new versions when making significant changes
5. **Monitor Results**: Check response times and token usage to optimize costs

## Common Workflows

**Quick Testing:**

Select a testcase → Edit prompt → Preview → Try → Review results

**Comprehensive Evaluation:**

Select all testcases → Edit prompt → Try All → Analyze results modal

**Version Comparison:**

Test Version A → Switch to Version B → Test → Compare results

**Full Run Creation:**

Select testset → Edit prompt → Kickoff Run → Monitor in run history

The Playground makes prompt engineering intuitive by providing immediate feedback and real data testing in a single interface.
<CardGroup cols={2}>
<Card title="Testsets" href="/features/testsets" icon="flask">
Create and manage test data
</Card>
<Card title="Metrics" href="/features/metrics" icon="chart-bar">
Define evaluation criteria
</Card>
</CardGroup>
26 changes: 25 additions & 1 deletion features/records.mdx
@@ -26,6 +26,24 @@ A **Record** is an individual test execution within a run. Each record contains:

Records are created when you run evaluations via the API, Playground, or from traces.

## Searching and Filtering

### Metadata Search

Search trace metadata to find specific records: click the search field and enter any text that appears in your trace metadata. The search scans all metadata fields and returns records from matching traces.

<Tip>
Metadata search uses ClickHouse for high-performance searches across large datasets. Results may take a few seconds to load for very large projects.
</Tip>

### Filtering Options

Use the filter dropdown to narrow results by:
- **Run**: Filter by specific evaluation run
- **Source**: How the record was created (API, Playground, Kickoff, Trace)
- **Status**: Scoring status (completed, pending, errored)
- **Date range**: Records created within a specific time period
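Conceptually, filters that are set all have to match at once. The sketch below illustrates that combination; the record fields and `matches` helper are hypothetical, not Scorecard's actual schema or API:

```python
from datetime import datetime

# Hypothetical record shape for illustration; not Scorecard's actual schema.
records = [
    {"run_id": "run_1", "source": "Playground", "status": "completed",
     "created_at": datetime(2024, 6, 1)},
    {"run_id": "run_2", "source": "API", "status": "errored",
     "created_at": datetime(2024, 6, 10)},
]

def matches(record, run_id=None, source=None, status=None, start=None, end=None):
    """Return True when the record passes every filter that is set."""
    if run_id is not None and record["run_id"] != run_id:
        return False
    if source is not None and record["source"] != source:
        return False
    if status is not None and record["status"] != status:
        return False
    if start is not None and record["created_at"] < start:
        return False
    if end is not None and record["created_at"] > end:
        return False
    return True

completed = [r for r in records if matches(r, status="completed")]
print([r["run_id"] for r in completed])
```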

## Customizing the Table

Click **Edit Table** to customize which columns appear and their order. You can add, remove, and reorder columns including:
@@ -62,7 +80,13 @@ Re-scoring uses the latest version of your metrics without re-running your AI system:

## Record Details

Click any record to view its full details. The details view differs based on how the record was created:
Click anywhere on a table row to view the full record details; the entire row is clickable to improve discoverability. You can also click the record ID link directly.

<Tip>
Interactive elements like checkboxes, score cards, and popover buttons won't trigger navigation; only clicking on empty areas of the row opens the record details.
</Tip>

The details view differs based on how the record was created:

### Testcase-Based Records

10 changes: 10 additions & 0 deletions features/runs.mdx
@@ -5,7 +5,7 @@

import { DarkLightImage } from '/snippets/dark-light-image.jsx';

A **Run** is an execution that evaluates your AI agent against a set of Testcases using specified metrics.

Runs generate **Records** (individual test executions) and **Scores** (evaluation results) for each Record that help you understand your agent's performance across different scenarios.

Expand All @@ -15,10 +15,10 @@

Every run consists of:

- (Optional) **Testset**: Collection of test cases to evaluate against
- (Optional) **Metrics**: Evaluation criteria that score system outputs
- (Optional) **System Version**: Configuration defining your AI system's behavior
- **Records**: Individual test executions, one per Testcase
- **Scores**: Evaluation results for each record against each metric
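The composition above can be sketched as plain data structures. These dataclasses are illustrative only, not Scorecard SDK types:

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative shapes only; not Scorecard SDK classes.

@dataclass
class Score:
    metric: str      # which metric produced this result
    value: float
    passed: bool

@dataclass
class Record:
    testcase_id: Optional[str]   # one record per Testcase, when a testset is used
    output: str
    scores: list[Score] = field(default_factory=list)

@dataclass
class Run:
    testset_id: Optional[str] = None      # optional test data
    metric_ids: list[str] = field(default_factory=list)  # optional metrics
    system_version: Optional[str] = None  # optional system configuration
    records: list[Record] = field(default_factory=list)

run = Run(
    testset_id="ts_1",
    metric_ids=["accuracy"],
    system_version="v2",
    records=[Record("tc_1", "Hello", [Score("accuracy", 1.0, True)])],
)
print(len(run.records), run.records[0].scores[0].passed)
```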

<DarkLightImage lightSrc="/images/runs-list-light.png" caption="List of recent runs with statuses." alt="Screenshot of viewing runs list in the UI." />
@@ -32,11 +32,15 @@
- **Playground**: Runs kicked off from the Playground
- **Kickoff**: Runs created via the Kickoff Run modal in the UI

<Note>
By default, monitor-created runs are excluded from the main runs list to reduce clutter. Use the source filter to specifically view monitor runs when needed.
</Note>

## Creating Runs

### Kickoff Run from the UI

You can kick off a run from the Projects dashboard, Testsets list, or a Run ("Run again" button). The **Kickoff Run** modal lets you choose the Testset, Prompt, and Metrics for the run.

The **Scorecard** tab lets you run using an LLM on Scorecard's servers, so you can specify the LLM parameters. The **GitHub** tab lets you trigger a run using GitHub Actions on your actual system.

@@ -46,7 +50,7 @@

The [Playground](/features/playground) allows you to test prompts interactively.

Click **Kickoff Run** to create a run with a specified testset, prompt version, and metrics.

<DarkLightImage lightSrc="/images/playground-light.png" caption="Playground overview." alt="Screenshot of the playground in the UI." />

@@ -128,6 +132,12 @@

<DarkLightImage lightSrc="/images/testrecord-score-explanation-light.png" caption="Record score explanation." alt="Screenshot of viewing testrecord score explanation in the UI." />

Click anywhere on a table row to view detailed record information; the entire row is clickable to improve discoverability.

<Tip>
Interactive elements like checkboxes, score cards, and truncated content popovers won't trigger navigation; only clicking on empty areas of the row opens the record details.
</Tip>

Drill down into specific test executions for detailed analysis:

**Record Overview:**
Binary file modified images/playground-dark.png
Binary file modified images/playground-light.png
24 changes: 21 additions & 3 deletions intro/langchain-quickstart.mdx
@@ -123,17 +123,35 @@

## What Gets Traced

OpenLLMetry automatically captures comprehensive telemetry from your LangChain applications:
OpenLLMetry automatically captures comprehensive telemetry from your LangChain applications. Scorecard includes enhanced LangChain/Traceloop adapter support for better trace visualization:

| Trace Data | Description |
|------------|-------------|
| **LLM Calls** | Every LLM invocation with full prompt and completion |
| **LLM Calls** | Every LLM invocation with full prompt and completion, including model information and token counts |
| **Chains** | Chain executions with inputs, outputs, and intermediate steps |
| **Agents** | Agent reasoning steps, tool selections, and action outputs |
| **Tools** | Tool invocations with proper tool call sections (not prompt/completion) |
| **Retrievers** | Document retrieval operations and retrieved content |
| **Token Usage** | Input, output, and total token counts per LLM call |
| **Token Usage** | Input, output, and total token counts per LLM call extracted from `gen_ai.*` attributes |
| **Errors** | Any failures with full error context and stack traces |

### Enhanced Span Classification

Scorecard's LangChain adapter recognizes both OpenInference (`openinference.*`) and Traceloop (`traceloop.*`) attribute formats:

- **Workflow spans** (`traceloop.span.kind: workflow`) - High-level application flows
- **Task spans** (`traceloop.span.kind: task`) - Individual processing steps
- **Tool spans** (`traceloop.span.kind: tool`) - Tool invocations with dedicated Tool Call sections
- **LLM spans** - Model calls with extracted model names, token counts, and costs
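As a rough sketch, classification by these attributes might look like the following. The `classify_span` logic here is illustrative only, not Scorecard's actual adapter code:

```python
def classify_span(attributes: dict) -> str:
    """Map span attributes to a display category.

    Illustrative sketch only. Checks the Traceloop convention first,
    then falls back to OpenInference, then to gen_ai.* LLM attributes.
    """
    kind = attributes.get("traceloop.span.kind")
    if kind in ("workflow", "task", "tool"):
        return kind
    oi_kind = str(attributes.get("openinference.span.kind", "")).lower()
    if oi_kind:
        return oi_kind   # e.g. "llm", "tool", "retriever"
    if any(key.startswith("gen_ai.") for key in attributes):
        return "llm"     # model calls carry gen_ai.* token/model attributes
    return "unknown"

print(classify_span({"traceloop.span.kind": "workflow"}))  # workflow
print(classify_span({"gen_ai.usage.input_tokens": 42}))    # llm
```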

### Tool Visualization

Common LangChain tools receive appropriate coloring and categorization:
- **Retrievers** (retriever, vectorstore, search)
- **SQL tools** (sql, database)
- **Web search** (search, google, bing)
- **Custom tools** - Automatically detected from span names
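A keyword-based categorization like the one described can be sketched as follows. The keyword map and its ordering are illustrative choices (ordering resolves overlapping keywords such as "search"), not Scorecard's actual rules:

```python
# Illustrative keyword map; not Scorecard's actual categorization rules.
# Ordering matters: specific categories are checked before broad ones.
TOOL_CATEGORIES = {
    "sql": ("sql", "database"),
    "web_search": ("google", "bing"),
    "retriever": ("retriever", "vectorstore", "search"),
}

def categorize_tool(span_name: str) -> str:
    """Pick a category by keyword match on the span name; first match wins."""
    name = span_name.lower()
    for category, keywords in TOOL_CATEGORIES.items():
        if any(keyword in name for keyword in keywords):
            return category
    return "custom"  # unmatched tools are treated as custom tools

print(categorize_tool("PineconeVectorstoreRetriever"))  # retriever
print(categorize_tool("GoogleSearchTool"))              # web_search
print(categorize_tool("my_weather_tool"))               # custom
```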

## Next Steps

<CardGroup cols={2}>