HTML/SVG based LogitLens widget, plus "chunkier" backend for logit lens. #102
Open
davidbau wants to merge 13 commits into ndif-team:widget-refactoring
@davidbau is attempting to deploy a commit to the NDIF Team on Vercel. A member of the Team first needs to authorize it.
Core features:
- V2 lens endpoint with rank and entropy data support
- LogitLens widget with heatmap, trajectory chart, and pin/group support
- React integration via LogitLensWidgetEmbed component
- Bidirectional hover sync between widget and React TokenArea

Widget fixes:
- Fix pinned row visibility with two-pass rendering algorithm
- Fix popup positioning when near right edge of viewport
- Fix hover trajectory display in rank mode
- Fix widget ID collisions in Jupyter notebooks

Co-Authored-By: Claude <noreply@anthropic.com>

- Add test configuration with GPT-2 only for fast local testing
- Add pytest fixtures for test client and app state
- Add comprehensive tests for V2, grid, and line lens endpoints
- Tests run with REMOTE=false using local nnsight execution

Co-Authored-By: Claude <noreply@anthropic.com>

- Add auto-pin last row with prominent token support
- Simplify show_logit_lens to use **kwargs
- Replace setEventHandlers with on/off event system
- Add rank and entropy support to collect_logit_lens
- Simplify pinned row serialization format
- Fix NDIF remote execution issues
- Unify collect_logit_lens between API and notebook module

Co-Authored-By: Claude <noreply@anthropic.com>
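For readers unfamiliar with the rank and entropy metrics mentioned in the commits above, here is a minimal pure-Python sketch of how the two quantities can be derived from one position's logits. The function names `softmax` and `entropy_and_rank` are hypothetical and are not taken from `collect_logit_lens`; the 1-based rank convention and the use of natural-log entropy are also assumptions, since the PR text does not state them.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_and_rank(logits, token_id):
    """Return (entropy in nats, 1-based rank of token_id) for one position.

    Hypothetical helper for illustration only; the real backend may use a
    different rank convention or entropy base.
    """
    probs = softmax(logits)
    # Shannon entropy of the predicted distribution; skip zero-probability terms.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Rank = 1 + number of vocabulary entries with a strictly higher logit.
    target = logits[token_id]
    rank = 1 + sum(1 for x in logits if x > target)
    return entropy, rank
```

A low rank with high entropy suggests the token leads a still-diffuse distribution, which is one reason a widget might expose both metrics rather than probability alone.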

Widget tests (Playwright):
- Initialization, rendering, hover interactions
- Pin/unpin tokens, metric switching (prob/rank)
- Dark mode, title editing, state serialization
- Visual regression tests

Module tests (pytest):
- Model architecture detection (GPT-2, Llama, Gemma)
- Data collection with collect_logit_lens()
- HTML/widget generation with show_logit_lens()

E2E tests:
- Full stack browser tests with real GPT-2 inference
- API endpoint validation

Co-Authored-By: Claude <noreply@anthropic.com>

- Add smoke test notebook for quick validation
- Add tutorial notebook with interactive walkthrough
- Add Playwright-based Colab test runner
- Add auth setup flow for Google Colab authentication
- Include data size measurement utilities for Llama 70B

Tests verify widget renders correctly in real Colab environment with NDIF remote execution.

Co-Authored-By: Claude <noreply@anthropic.com>

- LogitLens Python API documentation (collect and display functions)
- Data format specification (how data flows from model to widget)
- JavaScript widget API documentation for web embedding
- Frontend README with development and testing guides
- Testing guide with all test types and commands

Co-Authored-By: Claude <noreply@anthropic.com>

- Add unified test runner (scripts/test.sh) for all test types
- Auto-start/stop servers as needed for different test suites
- Add architecture diagram to README
- Add Colab link to tutorial notebook
- Update package dependencies for testing
- Clean up project structure

Co-Authored-By: Claude <noreply@anthropic.com>

Remove duplicate conversion logic from process_v2_results in lens.py. Now both API and notebook module use the same to_js_format function from workbench.logitlens.display for converting tensors to V2 JSON.

For local execution, result already contains vocab/model/input/layers. For remote execution (raw tensors only), we build missing metadata before calling to_js_format.

Co-Authored-By: Claude <noreply@anthropic.com>

Use the real collect_logit_lens from workbench.logitlens.collect instead of maintaining a separate 145-line copy in the test file.

Co-Authored-By: Claude <noreply@anthropic.com>

- Add svg() helper to utils.ts for cleaner SVG element creation
- Consolidate duplicated legend-building code into createLegendEntry()
- Refactor X-axis, Y-axis, and clip path construction to use svg() helper

Reduces chart.ts by ~60 lines while improving readability. The verbose setAttribute() calls are now declarative object literals.

Co-Authored-By: Claude <noreply@anthropic.com>

- Simplify cell text color determination from 16 lines to 4 lines
- Fix memory leak: hint hover listeners were re-attached on every rebuild
- Move hint listener setup to initialization (runs once, not per rebuild)

Co-Authored-By: Claude <noreply@anthropic.com>

- Fix bug where hovering row A didn't show pinned token trajectory when different row B was pinned (chart.ts positionsToShow logic)
- Add unit test for hover trajectory with pinned rows
- Add SQLite database initialization to test.sh and e2e.spec.ts
- Fix test.sh widget grep to exclude React Integration Tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

For long prompts, only return the last max_loc token positions. This reduces memory and bandwidth when a prompt has many tokens but only the final predictions matter.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
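
The max_loc truncation described in the last commit can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `keep_last_positions` is a hypothetical name, and the data layout (a token list plus one row of per-position values per layer) is simplified from whatever the real backend returns. Only the parameter name `max_loc` comes from the commit message.

```python
def keep_last_positions(tokens, layers, max_loc=None):
    """Keep only the last `max_loc` token positions.

    tokens: list of input tokens, one per position.
    layers: list of rows, one per layer, each with one value per position.
    max_loc: number of trailing positions to keep; None keeps everything.

    Hypothetical helper; the actual backend data structures may differ.
    """
    if max_loc is None or max_loc >= len(tokens):
        # Short prompt or no limit requested: return the data unchanged.
        return tokens, layers
    if max_loc <= 0:
        raise ValueError("max_loc must be a positive integer")
    start = len(tokens) - max_loc
    # Slice every layer with the same offset so columns stay aligned with tokens.
    return tokens[start:], [row[start:] for row in layers]
```

For example, `keep_last_positions(["a", "b", "c", "d"], [[1, 2, 3, 4]], max_loc=2)` keeps only the columns for `"c"` and `"d"`, so the payload size scales with `max_loc` rather than with prompt length.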
Summary
This PR includes several enhancements to the LogitLens widget and adds backend test infrastructure:
Widget Improvements
Backend Fixes
- all_entropy (containing dead proxy objects) was used outside the trace context instead of the properly saved entropy tensor
Backend Test Infrastructure
- … test.toml) with only GPT-2 for fast local testing
- REMOTE=false using local nnsight execution
Test plan
- uv run pytest workbench/_api/tests/ -v to verify all 9 tests pass
Manual testing:
🤖 Generated with Claude Code