Skip to content

feat: visual validation, notable components, and Figma context improvements#254

Open
rubenmarcus wants to merge 1 commit intomainfrom
feat/visual-validation-and-figma-improvements
Open

feat: visual validation, notable components, and Figma context improvements#254
rubenmarcus wants to merge 1 commit intomainfrom
feat/visual-validation-and-figma-improvements

Conversation

@rubenmarcus
Copy link
Member

Summary

  • Visual validation: Automatic pixel-level + LLM vision comparison of implementation vs Figma screenshots on final loop iterations, with 2-iteration auto-extension to fix issues
  • Notable component detection: Identifies scroll indicators, social sidebars, nav elements in Figma sections with exact position hints, dimensions, and scroll behavior
  • Improved Figma context: Better z-index stacking guidance, STRETCH→cover mapping, scroll indicator patterns, completeness checklist, absolute positioning precision
  • New modules: dev-server.ts (Vite launcher + screenshot capture), visual-validation.ts (pixelmatch + vision API comparison)
  • CLI: Added --no-visual-check flag, vision cost tracking in CostTracker

Test plan

  • Run pnpm build — passes
  • Run ralph-starter run with a Figma spec that has screenshots in public/images/screenshots/ — visual validation triggers on final iteration
  • Verify --no-visual-check disables visual validation
  • Verify notable components appear in generated Figma plans (scroll indicators, sidebars)
  • Verify cost tracker reports vision API calls

🤖 Generated with Claude Code

…d Figma context

- Add visual comparison validation (pixelmatch + LLM vision) that runs on
  final iterations when Figma screenshots are present, with automatic
  2-iteration extension to fix visual issues
- Add --no-visual-check CLI flag to disable visual validation
- Add dev-server module for launching/capturing implementation screenshots
- Detect notable sub-components in Figma sections (scroll indicators,
  social sidebars, nav elements) with position hints and scroll behavior
- Improve context-builder Figma instructions: z-index stacking clarity,
  STRETCH→cover guidance, scroll indicator patterns, completeness checklist,
  absolute positioning precision
- Enhance plan-generator with inner shadow CSS output, deeper node traversal
  (depth 8), and text-descendant inclusion at deep levels
- Add vision call tracking to CostTracker

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@github-actions
Copy link
Contributor

Issue Linking Reminder

This PR doesn't appear to have a linked issue. Consider linking to:

  • This repo: Closes #123
  • ralph-ideas: Closes multivmlabs/ralph-ideas#123

Using Closes, Fixes, or Resolves will auto-close the issue when this PR is merged.


If this PR doesn't need an issue, you can ignore this message.

@github-actions
Copy link
Contributor

⚠️ Bundle Size Analysis

Metric Value
Base 2234.84 KB
PR 2314.06 KB
Diff 79.22 KB (3.00%)
Bundle breakdown
156K	dist/auth
32K	dist/automation
4.0K	dist/cli.d.ts
4.0K	dist/cli.d.ts.map
16K	dist/cli.js
12K	dist/cli.js.map
528K	dist/commands
28K	dist/config
4.0K	dist/index.d.ts
4.0K	dist/index.d.ts.map
4.0K	dist/index.js
4.0K	dist/index.js.map
828K	dist/integrations
84K	dist/llm
884K	dist/loop
188K	dist/mcp
32K	dist/presets
92K	dist/setup
40K	dist/skills
392K	dist/sources
76K	dist/ui
124K	dist/utils
336K	dist/wizard

Comment on lines +370 to +397
body: JSON.stringify({
model: model || config.defaultModel,
max_tokens: 2048,
messages: [
{ role: 'system', content: VISION_SYSTEM_PROMPT },
{
role: 'user',
content: [
{ type: 'text', text: 'TARGET DESIGN from Figma:' },
{
type: 'image_url',
image_url: { url: `data:image/png;base64,${designBase64}` },
},
{ type: 'text', text: 'CURRENT IMPLEMENTATION:' },
{
type: 'image_url',
image_url: { url: `data:image/png;base64,${implBase64}` },
},
{ type: 'text', text: 'PIXEL DIFF OVERLAY (red = different):' },
{
type: 'image_url',
image_url: { url: `data:image/png;base64,${diffBase64}` },
},
{ type: 'text', text: VISION_COMPARISON_PROMPT },
],
},
],
}),

Check warning

Code scanning / CodeQL

File data in outbound network request Medium

Outbound network request depends on
file data
.
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds a three-layer visual validation pipeline (pixelmatch → LLM Vision → strict pixel gate) that automatically compares implementation screenshots against Figma design screenshots on the final loop iteration, with 2-iteration auto-extension to fix issues. It also introduces notable component detection (scroll indicators, social sidebars, nav elements) in Figma plan generation and improves Figma context with better z-index stacking guidance, STRETCH→cover mapping, and a completeness checklist.

  • New modules: visual-validation.ts (609 lines — pixelmatch + LLM vision comparison) and dev-server.ts (171 lines — auto-detect and launch dev servers for screenshot capture)
  • Bug: CostTracker.reset() doesn't clear the new _visionCalls/_visionCost fields, leading to stale vision metrics after reset
  • Inconsistency: design-spec.ts still emits z-index: 1 in 4 places for text/content layers, while context-builder.ts and plan-generator.ts were updated to recommend z-index: 10+ — this creates contradictory guidance that may confuse the coding agent
  • Safety concern: dev-server.ts uses null as unknown as ResultPromise for pre-existing servers, which could cause runtime errors if .process is accessed by callers

Confidence Score: 3/5

  • Largely safe but has a z-index inconsistency bug that directly undermines one of the PR's goals, plus a missing reset of new state fields
  • The PR is well-structured and the visual validation pipeline is thoughtfully designed with proper fallback behavior. However, two concrete bugs lower confidence: (1) the z-index inconsistency in design-spec.ts contradicts the very fix this PR introduces — the agent will receive "use z-index: 1" from the spec output and "use z-index: 10+" from the context builder simultaneously, which is the stacking issue the PR aims to solve; (2) the CostTracker.reset() not clearing vision state will produce incorrect metrics. The unsafe null cast in dev-server.ts is also a runtime risk.
  • Pay close attention to src/integrations/figma/parsers/design-spec.ts (z-index inconsistency at lines 608, 670, 674, 704) and src/loop/cost-tracker.ts (incomplete reset method)

Important Files Changed

Filename Overview
src/loop/visual-validation.ts New 609-line module implementing a three-layer visual comparison pipeline (pixelmatch → LLM Vision → strict pixel gate). Well-structured with clear separation. Minor concern: auto-installing Playwright globally with npm rather than the project's package manager.
src/loop/dev-server.ts New module for auto-detecting and launching dev servers. Uses unsafe null as unknown as ResultPromise cast when a server is already running on the target port, which could cause runtime errors if .process is accessed.
src/loop/executor.ts Adds visual validation integration to the main loop with 2-iteration auto-extension pattern (mirroring existing build-fix extension). Non-fatal error handling is appropriate.
src/loop/cost-tracker.ts Adds vision call tracking with _visionCalls and _visionCost fields. Bug: reset() method doesn't clear the new vision tracking state, leading to stale data after reset.
src/integrations/figma/parsers/design-spec.ts Adds interactive element detection, deeper depth traversal (6→8), and containsText filter for deep nodes. Bug: z-index guidance still says z-index: 1 in 4 places while context-builder was updated to z-index: 10, creating contradictory instructions.
src/integrations/figma/parsers/plan-generator.ts Adds NotableComponent detection (indicators, sidebars, nav), inner shadow support, updated z-index stacking guidance. Well-implemented with proper category classification and position hints.
src/loop/context-builder.ts Updates implementation guidance: STRETCH→cover mapping, z-index 10+ for text layers, scroll indicator patterns, completeness checklist, absolute positioning precision. All changes are prompt text updates — no logic risk.
src/cli.ts Adds --no-visual-check CLI flag. Straightforward commander option addition.
src/commands/run.ts Wires visual validation by auto-detecting screenshot PNGs in public/images/screenshots/ and passing paths to the loop. Clean integration with existing patterns.
package.json Adds pixelmatch, pngjs, and @types/pngjs dependencies. All are well-known, lightweight npm packages appropriate for pixel-level image comparison.
pnpm-lock.yaml Lock file updated for new dependencies. Auto-generated.

Sequence Diagram

sequenceDiagram
    participant Executor as Loop Executor
    participant VV as Visual Validation
    participant DS as Dev Server
    participant PW as Playwright
    participant PM as pixelmatch
    participant LLM as LLM Vision API
    participant CT as Cost Tracker

    Executor->>VV: runVisualValidation(cwd, screenshots)
    VV->>PW: ensurePlaywright()
    alt Not installed
        PW-->>PW: npm install -g playwright
    end
    VV->>DS: startDevServer(cwd)
    DS->>DS: detectDevCommand(cwd)
    DS->>DS: waitForServer(url)
    DS-->>VV: DevServerInfo {url, port, kill}
    VV->>PW: captureScreenshot(url)
    PW-->>VV: implScreenshot (PNG Buffer)
    
    loop For each design screenshot
        VV->>PM: pixelDiff(design, impl)
        PM-->>VV: {diffRatio, diffImage}
        alt diffRatio ≤ 5% (normal) or ≤ 2% (strict)
            VV-->>VV: PASS — skip LLM
        else diffRatio > threshold
            VV->>LLM: compareWithLLMVision(design, impl, diff)
            LLM-->>VV: {matches, issues, usage}
        end
    end
    
    VV->>DS: server.kill()
    VV-->>Executor: VisualValidationResult
    Executor->>CT: recordVisionCall(usage)
    
    alt Issues found & first pass
        Executor-->>Executor: Extend loop +2 iterations
        Note over Executor: Agent fixes issues
        Executor->>VV: runVisualValidation(strictMode=true)
    end
Loading

Last reviewed commit: c14b83e

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +125 to +128
try {
await execa('npm', ['install', '-g', 'playwright'], {
timeout: 120_000,
reject: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global npm install -g has side effects and may fail in restricted environments

Auto-installing Playwright globally with npm install -g playwright can fail in CI, Docker containers, or systems without global npm write access. It also bypasses the project's package manager (the repo uses pnpm). Consider using the project's package manager (pnpm add -D playwright), installing locally, or documenting this requirement so users can pre-install Playwright.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/visual-validation.ts
Line: 125-128

Comment:
**Global `npm install -g` has side effects and may fail in restricted environments**

Auto-installing Playwright globally with `npm install -g playwright` can fail in CI, Docker containers, or systems without global npm write access. It also bypasses the project's package manager (the repo uses pnpm). Consider using the project's package manager (`pnpm add -D playwright`), installing locally, or documenting this requirement so users can pre-install Playwright.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +123 to +129
return {
url,
port: devCmd.port,
process: null as unknown as ResultPromise,
kill: async () => {}, // Nothing to kill — not our process
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe cast of null to ResultPromise for pre-existing server

When a server is already running on the detected port, process is set to null as unknown as ResultPromise. Any code that accesses server.process (e.g., checking .pid, calling methods on it) will get a runtime error. Consider making process optional (process?: ResultPromise) in the DevServerInfo interface, or using a discriminated union, so callers handle this case explicitly.

Suggested change
return {
url,
port: devCmd.port,
process: null as unknown as ResultPromise,
kill: async () => {}, // Nothing to kill — not our process
};
}
return {
url,
port: devCmd.port,
process: undefined as unknown as ResultPromise,
kill: async () => {}, // Nothing to kill — not our process
};
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/dev-server.ts
Line: 123-129

Comment:
**Unsafe cast of `null` to `ResultPromise` for pre-existing server**

When a server is already running on the detected port, `process` is set to `null as unknown as ResultPromise`. Any code that accesses `server.process` (e.g., checking `.pid`, calling methods on it) will get a runtime error. Consider making `process` optional (`process?: ResultPromise`) in the `DevServerInfo` interface, or using a discriminated union, so callers handle this case explicitly.

```suggestion
    return {
      url,
      port: devCmd.port,
      process: undefined as unknown as ResultPromise,
      kill: async () => {}, // Nothing to kill — not our process
    };
```

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 26, 2026

Additional Comments (2)

src/loop/cost-tracker.ts
reset() doesn't clear vision tracking state

The reset() method clears this.iterations but does not reset the new _visionCalls and _visionCost fields. If reset() is called (e.g., between sessions), the vision metrics will carry stale data, producing incorrect stats.

  reset(): void {
    this.iterations = [];
    this._visionCalls = 0;
    this._visionCost = 0;
  }
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/cost-tracker.ts
Line: 454-456

Comment:
**`reset()` doesn't clear vision tracking state**

The `reset()` method clears `this.iterations` but does not reset the new `_visionCalls` and `_visionCost` fields. If `reset()` is called (e.g., between sessions), the vision metrics will carry stale data, producing incorrect stats.

```suggestion
  reset(): void {
    this.iterations = [];
    this._visionCalls = 0;
    this._visionCost = 0;
  }
```

How can I resolve this? If you propose a fix, please make it concise.

src/integrations/figma/parsers/design-spec.ts
Inconsistent z-index guidance (1 vs 10)

The context-builder.ts and plan-generator.ts were updated in this PR to recommend z-index: 10 for text/content layers over background images. However, design-spec.ts still emits z-index: 1 in several places (lines 608, 670, 674, 704). This creates contradictory guidance — the context tells the agent "use z-index 10+" while the spec output says "z-index: 1". When there are multiple stacked image layers (z-0 through z-4), z-index: 1 will place text behind higher image layers, which is exactly the bug this PR aims to fix.

Lines affected:

  • Line 608: z-index: 1 for child content over background image
  • Line 670: z-index: 1 for text/content in composite hero parallax
  • Line 674: z-index: 1 for content in smaller composite backgrounds
  • Line 704: z-index: 1 for text/content in non-text-overlay composites
              `  * All child content must use \`position: relative; z-index: 10\` to appear above the image`
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/integrations/figma/parsers/design-spec.ts
Line: 608

Comment:
**Inconsistent z-index guidance (1 vs 10)**

The `context-builder.ts` and `plan-generator.ts` were updated in this PR to recommend `z-index: 10` for text/content layers over background images. However, `design-spec.ts` still emits `z-index: 1` in several places (lines 608, 670, 674, 704). This creates contradictory guidance — the context tells the agent "use z-index 10+" while the spec output says "z-index: 1". When there are multiple stacked image layers (z-0 through z-4), z-index: 1 will place text *behind* higher image layers, which is exactly the bug this PR aims to fix.

Lines affected:
- Line 608: `z-index: 1` for child content over background image
- Line 670: `z-index: 1` for text/content in composite hero parallax
- Line 674: `z-index: 1` for content in smaller composite backgrounds
- Line 704: `z-index: 1` for text/content in non-text-overlay composites

```suggestion
              `  * All child content must use \`position: relative; z-index: 10\` to appear above the image`
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant