HTML/SVG based LogitLens widget, plus "chunkier" backend for logit lens. by davidbau · Pull Request #102 · ndif-team/workbench

davidbau · 2026-01-06T22:02:14Z

Summary

This PR includes several enhancements to the LogitLens widget and adds backend test infrastructure:

Widget Improvements

Bidirectional hover sync: Hovering tokens in the React TokenArea highlights the corresponding row in the widget heatmap, and vice versa
Popup positioning fix: Cell prediction popups now appear to the left when near the right edge of the viewport
Rank mode trajectory fix: Gray hover trajectory lines now correctly display rank data when in rank mode
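
The popup fix amounts to a flip rule: place the popup to the right of the cell by default, and switch to the left when it would overflow the viewport. The widget itself is JavaScript, so the following is only a minimal sketch of the arithmetic, written in Python. The function name, its parameters, and the 8-pixel gap are hypothetical and not taken from the PR.

```python
def popup_left(cell_left: float, cell_right: float, popup_width: float,
               viewport_width: float, gap: float = 8.0) -> float:
    """Return the x coordinate for the popup's left edge.

    Hypothetical helper: the name, arguments, and default gap are
    illustrative assumptions, not the widget's actual code.
    """
    right_side = cell_right + gap
    # Default placement: to the right of the cell, if it fits.
    if right_side + popup_width <= viewport_width:
        return right_side
    # Otherwise flip to the left of the cell, clamped to the viewport edge.
    return max(0.0, cell_left - gap - popup_width)
```

A cell in the middle of a 1000-pixel viewport keeps the popup on its right, while a cell at x = 900 flips it to the left.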

Backend Fixes

V2 endpoint entropy fix: Fixed bug where all_entropy (containing dead proxy objects) was used outside the trace context instead of the properly saved entropy tensor
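
In nnsight, values are only usable after a trace if they were explicitly saved inside it, which is why the saved entropy tensor is the right object to read. As a reference for what that tensor plausibly contains, here is a minimal sketch that assumes "entropy" means the Shannon entropy (in nats) of the next-token distribution at one layer and position. The PR does not state the definition or units, and plain Python lists stand in for tensors; `softmax_entropy` is a hypothetical name.

```python
import math
from typing import Sequence


def softmax_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits), computed stably.

    Assumption: this mirrors what the endpoint calls "entropy";
    the PR itself does not define it.
    """
    m = max(logits)
    shifted = [x - m for x in logits]          # subtract max to avoid overflow
    log_z = math.log(sum(math.exp(s) for s in shifted))
    # log p_i = shifted_i - log_z, and H = -sum_i p_i * log p_i
    return -sum(math.exp(s - log_z) * (s - log_z) for s in shifted)
```

Uniform logits give the maximum entropy, log of the vocabulary size, and a sharply peaked distribution gives a value near zero.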

Backend Test Infrastructure

Added pytest, pytest-asyncio, and httpx to dev dependencies
Created test config (test.toml) with only GPT-2 for fast local testing
Added comprehensive tests for lens endpoints (V2, grid, line)
Tests run with REMOTE=false using local nnsight execution
All 9 tests pass including V2 with entropy enabled

Test plan

Run uv run pytest workbench/_api/tests/ -v to verify all 9 tests pass
More automated tests needed.

Manual testing:

Test hover sync by hovering over tokens in the sidebar and seeing heatmap row highlight
Test popup positioning by clicking cells near the right edge of the screen
Test rank mode trajectory by switching to rank mode and hovering over cells

🤖 Generated with Claude Code

vercel · 2026-01-06T22:02:19Z

@davidbau is attempting to deploy a commit to the NDIF Team on Vercel.

A member of the Team first needs to authorize it.

Core features:
- V2 lens endpoint with rank and entropy data support
- LogitLens widget with heatmap, trajectory chart, and pin/group support
- React integration via LogitLensWidgetEmbed component
- Bidirectional hover sync between widget and React TokenArea

Widget fixes:
- Fix pinned row visibility with two-pass rendering algorithm
- Fix popup positioning when near right edge of viewport
- Fix hover trajectory display in rank mode
- Fix widget ID collisions in Jupyter notebooks

Co-Authored-By: Claude <noreply@anthropic.com>

- Add test configuration with GPT-2 only for fast local testing
- Add pytest fixtures for test client and app state
- Add comprehensive tests for V2, grid, and line lens endpoints
- Tests run with REMOTE=false using local nnsight execution

Co-Authored-By: Claude <noreply@anthropic.com>

- Add auto-pin last row with prominent token support
- Simplify show_logit_lens to use **kwargs
- Replace setEventHandlers with on/off event system
- Add rank and entropy support to collect_logit_lens
- Simplify pinned row serialization format
- Fix NDIF remote execution issues
- Unify collect_logit_lens between API and notebook module

Co-Authored-By: Claude <noreply@anthropic.com>
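
Rank support presumably means reporting, for each layer and position, where a token sits in the sorted vocabulary distribution. The sketch below assumes 1-based ranks (1 is the top prediction) and that tied tokens share the better rank; neither convention is stated in the PR, and `token_rank` is a hypothetical name rather than a function from `collect_logit_lens`.

```python
from typing import Sequence


def token_rank(logits: Sequence[float], token_id: int) -> int:
    """1-based rank of token_id: 1 means it has the highest logit.

    Assumption: ties share the better rank, since only strictly
    larger logits are counted.
    """
    target = logits[token_id]
    return 1 + sum(1 for x in logits if x > target)
```

Plotting this value across layers yields the kind of trajectory the widget draws in rank mode, where a lower number means the model is closer to predicting that token.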

Widget tests (Playwright):
- Initialization, rendering, hover interactions
- Pin/unpin tokens, metric switching (prob/rank)
- Dark mode, title editing, state serialization
- Visual regression tests

Module tests (pytest):
- Model architecture detection (GPT-2, Llama, Gemma)
- Data collection with collect_logit_lens()
- HTML/widget generation with show_logit_lens()

E2E tests:
- Full stack browser tests with real GPT-2 inference
- API endpoint validation

Co-Authored-By: Claude <noreply@anthropic.com>

- Add smoke test notebook for quick validation
- Add tutorial notebook with interactive walkthrough
- Add Playwright-based Colab test runner
- Add auth setup flow for Google Colab authentication
- Include data size measurement utilities for Llama 70B

Tests verify widget renders correctly in real Colab environment with NDIF remote execution.

Co-Authored-By: Claude <noreply@anthropic.com>

- LogitLens Python API documentation (collect and display functions)
- Data format specification (how data flows from model to widget)
- JavaScript widget API documentation for web embedding
- Frontend README with development and testing guides
- Testing guide with all test types and commands

Co-Authored-By: Claude <noreply@anthropic.com>

- Add unified test runner (scripts/test.sh) for all test types
- Auto-start/stop servers as needed for different test suites
- Add architecture diagram to README
- Add Colab link to tutorial notebook
- Update package dependencies for testing
- Clean up project structure

Co-Authored-By: Claude <noreply@anthropic.com>

Remove duplicate conversion logic from process_v2_results in lens.py. Now both API and notebook module use the same to_js_format function from workbench.logitlens.display for converting tensors to V2 JSON.

For local execution, result already contains vocab/model/input/layers. For remote execution (raw tensors only), we build missing metadata before calling to_js_format.

Co-Authored-By: Claude <noreply@anthropic.com>

Use the real collect_logit_lens from workbench.logitlens.collect instead of maintaining a separate 145-line copy in the test file.

Co-Authored-By: Claude <noreply@anthropic.com>

- Add svg() helper to utils.ts for cleaner SVG element creation
- Consolidate duplicated legend-building code into createLegendEntry()
- Refactor X-axis, Y-axis, and clip path construction to use svg() helper

Reduces chart.ts by ~60 lines while improving readability. The verbose setAttribute() calls are now declarative object literals.

Co-Authored-By: Claude <noreply@anthropic.com>

- Simplify cell text color determination from 16 lines to 4 lines
- Fix memory leak: hint hover listeners were re-attached on every rebuild
- Move hint listener setup to initialization (runs once, not per rebuild)

Co-Authored-By: Claude <noreply@anthropic.com>

- Fix bug where hovering row A didn't show pinned token trajectory when different row B was pinned (chart.ts positionsToShow logic)
- Add unit test for hover trajectory with pinned rows
- Add SQLite database initialization to test.sh and e2e.spec.ts
- Fix test.sh widget grep to exclude React Integration Tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

For long prompts, only return the last max_loc token positions. This reduces memory and bandwidth when a prompt has many tokens but only the final predictions matter.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
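
The truncation described here can be sketched as a slice over the position axis. This is a minimal illustration on plain lists, not the PR's implementation: `keep_last_positions` is a hypothetical helper, and treating `max_loc=None` as "keep everything" is an assumption about the default.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def keep_last_positions(
    tokens: Sequence[str],
    per_layer: Sequence[Sequence[T]],
    max_loc: Optional[int] = None,
) -> Tuple[List[str], List[List[T]]]:
    """Keep only the last max_loc token positions.

    per_layer[l][p] holds the value for layer l at position p.
    Hypothetical helper; None is assumed to mean "no truncation".
    """
    if max_loc is None or max_loc >= len(tokens):
        return list(tokens), [list(row) for row in per_layer]
    if max_loc <= 0:
        raise ValueError("max_loc must be a positive integer")
    # Use an explicit start index: seq[-0:] would return the whole sequence.
    start = len(tokens) - max_loc
    return list(tokens[start:]), [list(row[start:]) for row in per_layer]
```

With a five-token prompt and `max_loc=2`, only the last two tokens and the last two columns of each layer's data are returned, so the payload scales with `max_loc` rather than with prompt length.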

davidbau requested a review from AdamBelfki3 January 6, 2026 22:12

AdamBelfki3 changed the base branch from main to widget-refactoring January 7, 2026 19:24

davidbau and others added 7 commits January 8, 2026 05:50

davidbau force-pushed the kitwidget branch from 1dd3b6b to 206eca4 Compare January 8, 2026 10:55

davidbau and others added 6 commits January 8, 2026 09:34

Remove duplicate collect_logit_lens_inline from measure_data_size.py

e16f6c3


Add max_loc parameter to collect_logit_lens

18160c5


davidbau wants to merge 13 commits into ndif-team:widget-refactoring from davidbau:kitwidget
