@ryanbreen ryanbreen commented Jan 23, 2026

Summary

  • Optimize dirty region tracking with multiple rectangles instead of a single bounding-box union
  • Fix critical per-pixel dirty marking that was defeating all batching optimizations
  • Eliminate cascading flushes and batch cursor operations
  • Optimize terminal switching with smart log replay (skip offscreen lines)

Key Changes

Dirty Region Tracking (double_buffer.rs)

  • Track 4 separate dirty rectangles with smart merging (MERGE_PROXIMITY=32px)
  • Add mark_rect_dirty() for batch dirty marking
  • Fast-path flush for full-width dirty regions (single memcpy vs per-row)
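
The multi-rectangle tracking described above can be sketched roughly as follows. This is a minimal illustration, not the code in double_buffer.rs: the `Rect` and `DirtyTracker` types, the `MAX_RECTS` constant, and the distance metric (the larger of the horizontal and vertical gaps) are assumptions; only the limit of 4 rectangles, the 32px proximity, and the `mark_rect_dirty` name come from this PR.

```rust
/// Axis-aligned rectangle in pixels; `x1` and `y1` are exclusive.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x0: usize,
    pub y0: usize,
    pub x1: usize,
    pub y1: usize,
}

impl Rect {
    /// Smallest rectangle covering both `self` and `other`.
    fn union(&self, other: Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }

    /// Gap between two intervals on one axis; 0 if they overlap or touch.
    fn axis_gap(a0: usize, a1: usize, b0: usize, b1: usize) -> usize {
        if b0 > a1 {
            b0 - a1
        } else if a0 > b1 {
            a0 - b1
        } else {
            0
        }
    }

    /// Assumed metric: the larger of the horizontal and vertical gaps.
    fn distance(&self, other: Rect) -> usize {
        let dx = Self::axis_gap(self.x0, self.x1, other.x0, other.x1);
        let dy = Self::axis_gap(self.y0, self.y1, other.y0, other.y1);
        dx.max(dy)
    }
}

pub const MAX_RECTS: usize = 4; // hypothetical name for the 4-rect limit
pub const MERGE_PROXIMITY: usize = 32;

#[derive(Default)]
pub struct DirtyTracker {
    rects: Vec<Rect>,
}

impl DirtyTracker {
    /// Marks a whole rectangle dirty in one operation.
    pub fn mark_rect_dirty(&mut self, new: Rect) {
        // Merge into the first rect that overlaps or lies within MERGE_PROXIMITY.
        if let Some(r) = self
            .rects
            .iter_mut()
            .find(|r| r.distance(new) <= MERGE_PROXIMITY)
        {
            *r = r.union(new);
            return;
        }
        // No slot free: merge the two closest existing rects to make room.
        if self.rects.len() == MAX_RECTS {
            self.merge_closest_pair();
        }
        self.rects.push(new);
    }

    fn merge_closest_pair(&mut self) {
        let mut best = (0, 1, usize::MAX);
        for i in 0..self.rects.len() {
            for j in i + 1..self.rects.len() {
                let d = self.rects[i].distance(self.rects[j]);
                if d < best.2 {
                    best = (i, j, d);
                }
            }
        }
        let merged = self.rects[best.0].union(self.rects[best.1]);
        self.rects[best.0] = merged;
        // best.1 > best.0, so removing it leaves the merged slot in place.
        self.rects.swap_remove(best.1);
    }

    pub fn rects(&self) -> &[Rect] {
        &self.rects
    }
}
```

With this shape, two adjacent glyph cells collapse into one rectangle, while a cursor update far from the text keeps its own small rectangle instead of inflating a shared bounding box.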

Per-Pixel Fix (logger.rs, primitives.rs)

  • Critical: Removed per-pixel mark_region_dirty from Canvas set_pixel
  • Added single dirty mark at end of fill_rect and draw_glyph
  • Previously: 100 dirty ops per character → Now: 1 dirty op per character
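
A minimal sketch of the per-glyph marking pattern, assuming a simplified canvas. The `Canvas` struct, its fields, and the 8-pixel-wide one-byte-per-row glyph format are hypothetical and exist only to make the dirty-op count observable; the real Canvas trait in logger.rs and primitives.rs may look quite different.

```rust
/// Hypothetical canvas; names and layout are illustrative only.
pub struct Canvas {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u32>,
    /// Number of dirty-region operations issued so far.
    pub dirty_ops: usize,
    /// Last rectangle marked dirty, as (x, y, w, h).
    pub last_dirty: Option<(usize, usize, usize, usize)>,
}

impl Canvas {
    pub fn new(width: usize, height: usize) -> Self {
        Canvas {
            width,
            height,
            pixels: vec![0; width * height],
            dirty_ops: 0,
            last_dirty: None,
        }
    }

    /// Writes one pixel and deliberately does NOT mark anything dirty.
    pub fn set_pixel(&mut self, x: usize, y: usize, color: u32) {
        if x < self.width && y < self.height {
            self.pixels[y * self.width + x] = color;
        }
    }

    /// Stand-in for the real dirty tracker: counts calls and records the rect.
    pub fn mark_rect_dirty(&mut self, x: usize, y: usize, w: usize, h: usize) {
        self.dirty_ops += 1;
        self.last_dirty = Some((x, y, w, h));
    }

    /// Draws an 8-pixel-wide bitmap glyph (one byte per row, most significant
    /// bit is the leftmost pixel), then marks its bounding box dirty once.
    pub fn draw_glyph(&mut self, x: usize, y: usize, rows: &[u8], color: u32) {
        for (dy, &bits) in rows.iter().enumerate() {
            for dx in 0..8 {
                if (bits >> (7 - dx)) & 1 == 1 {
                    self.set_pixel(x + dx, y + dy, color);
                }
            }
        }
        self.mark_rect_dirty(x, y, 8, rows.len());
    }
}
```

The trade-off is that `set_pixel` alone no longer guarantees the pixel reaches the screen, so every higher-level drawing routine becomes responsible for marking its own bounding box.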

Cascading Flush Elimination (terminal_manager.rs, render_queue.rs)

  • Created write_bytes_to_shell_internal() non-flushing API for render thread
  • Batch cursor operations (hide once/show once vs per-character)
  • Render thread now handles all flushing
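
The cursor batching can be illustrated with a small counter-based model. The `Pane` type and its methods are hypothetical; this sketch counts only hide and show, so its per-character variant costs 2 operations per byte rather than the 3 reported in this PR, which also counts a draw step.

```rust
/// Hypothetical terminal pane that counts cursor operations.
#[derive(Default)]
pub struct Pane {
    pub cursor_ops: usize,
    pub text: Vec<u8>,
}

impl Pane {
    fn hide_cursor(&mut self) {
        self.cursor_ops += 1;
    }

    fn show_cursor(&mut self) {
        self.cursor_ops += 1;
    }

    fn put_byte(&mut self, b: u8) {
        self.text.push(b);
    }

    /// Unbatched style: the cursor is hidden and shown around every byte.
    pub fn write_per_char(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.hide_cursor();
            self.put_byte(b);
            self.show_cursor();
        }
    }

    /// Batched style: hide once, write everything, show once.
    pub fn write_batched(&mut self, bytes: &[u8]) {
        if bytes.is_empty() {
            return;
        }
        self.hide_cursor();
        for &b in bytes {
            self.put_byte(b);
        }
        self.show_cursor();
    }
}
```

For an 80-byte line, `write_per_char` issues 160 cursor operations in this model and `write_batched` issues 2, while both produce the same text.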

Terminal Switching (terminal_manager.rs)

  • Changed flush_full() to flush() - only dirty regions, not entire 8MB buffer
  • Reduced log buffer from 200 to 50 lines
  • Skip offscreen log lines during replay - zero scroll operations
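
The replay change amounts to rendering only the tail of the log that fits on screen. A rough sketch, assuming each log line occupies exactly one terminal row (no wrapping); both function names are hypothetical.

```rust
/// Returns the last `visible_rows` entries of `lines`, or all of them if
/// fewer are buffered. Assumes one log line per terminal row.
pub fn visible_tail<T>(lines: &[T], visible_rows: usize) -> &[T] {
    let skip = lines.len().saturating_sub(visible_rows);
    &lines[skip..]
}

/// Scroll operations a naive full replay would trigger: one for every line
/// written after the screen is already full.
pub fn scrolls_for_full_replay(total_lines: usize, visible_rows: usize) -> usize {
    total_lines.saturating_sub(visible_rows)
}
```

With 200 buffered lines and 40 visible rows, a full replay would scroll 160 times, whereas replaying `visible_tail` writes 40 lines and never scrolls.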

Performance Impact

| Operation | Before | After |
|---|---|---|
| Dirty marks per char | 100+ (per-pixel) | 1 (per-glyph) |
| Cursor ops per line | 3× per char | 2 total |
| Log tab switch | 200 lines + 160 scrolls | ~40 visible lines, 0 scrolls |
| Flush on switch | 8MB full buffer | Dirty regions only |

Test plan

  • Build succeeds with --features interactive
  • Run ./docker/qemu/run-interactive.sh and verify:
    • Character rendering is responsive
    • F1/F2 terminal switching is fast (<500ms)
    • Scrolling works correctly

🤖 Generated with Claude Code

ryanbreen and others added 5 commits January 23, 2026 08:22
Three performance improvements for framebuffer rendering:

1. Character-level dirty marking: Reduced mark_dirty() calls from
   ~80 per character (one per pixel) to 1 per character. Added
   write_pixel_no_mark() for batch operations with single
   mark_dirty_rect() call at the end.

2. Multiple dirty rectangles: Replace single bounding box with
   4 separate dirty rects. Uses intelligent merging:
   - Merge if rects overlap or are within 32 pixels
   - When full, merge closest pair to make room
   - Result: sparse updates (cursor + text) flush independently

3. Scroll optimization: Remove redundant flush_if_dirty() before
   scroll - pending dirty regions scroll with content, only the
   cleared bottom line needs flushing.

Expected improvements:
- Character rendering: 80x fewer dirty region operations
- Sparse updates: cursor blink + text no longer flush giant bbox
- Scroll: ~1.92MB less buffer copying per scroll operation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Additional optimizations for framebuffer rendering:

1. Add mark_rect_dirty() to DirtyRegionTracker - marks entire
   rectangle in one operation instead of per-row calls. For a
   16-pixel-tall character, this is 16 merge operations → 1.

2. Add mark_region_dirty_rect() to DoubleBufferedFrameBuffer
   as wrapper for the new efficient method.

3. Add fast path in flush() for full-width dirty rectangles -
   if x_start == 0 && x_end == stride, copy entire vertical
   block in one memcpy instead of per-row copies.

4. Update mark_dirty_rect() in logger.rs to use the new
   batch method.
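
The full-width fast path from item 3 can be sketched as below. This is an assumption-laden illustration: the `flush_rect` function, its parameter list, the use of `u32` pixels, and a stride measured in pixels are all hypothetical; only the `x_start == 0 && x_end == stride` condition is taken from the commit message. The return value is the number of copy calls, included purely so the difference is observable.

```rust
/// Copies a dirty rectangle from `back` to `front`. Both buffers hold
/// `stride * height` pixels; `x_end` and `y_end` are exclusive.
/// Returns the number of slice copies performed.
/// Panics if the rectangle lies outside either buffer.
pub fn flush_rect(
    back: &[u32],
    front: &mut [u32],
    stride: usize,
    x_start: usize,
    x_end: usize,
    y_start: usize,
    y_end: usize,
) -> usize {
    if x_start >= x_end || y_start >= y_end {
        return 0;
    }
    if x_start == 0 && x_end == stride {
        // Full-width rows are contiguous in memory, so one copy covers
        // the whole vertical block.
        let range = y_start * stride..y_end * stride;
        front[range.clone()].copy_from_slice(&back[range]);
        return 1;
    }
    // Partial-width: each row is a separate contiguous run.
    for y in y_start..y_end {
        let range = y * stride + x_start..y * stride + x_end;
        front[range.clone()].copy_from_slice(&back[range]);
    }
    y_end - y_start
}
```

Scroll and clear operations tend to dirty full-width bands, which is why this path pays off there; narrow updates such as a single glyph still take the per-row loop.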

Expected improvements:
- Character dirty marking: 16x fewer operations
- Scroll/clear flush: Nx fewer memcpy calls (N = number of rows)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Critical performance fixes for graphics rendering:

1. Create write_bytes_to_shell_internal() that doesn't flush
   - Render thread uses this non-flushing version
   - Render thread handles flush once after all batches
   - Before: render_batch -> write_bytes_to_shell [FLUSH] -> flush_framebuffer [FLUSH AGAIN]
   - After: render_batch -> write_bytes_to_shell_internal [NO FLUSH] -> flush_framebuffer [FLUSH ONCE]

2. Batch cursor operations in write_bytes_to_shell()
   - Hide cursor ONCE at start of batch
   - Write all bytes
   - Show cursor ONCE at end of batch
   - Before: 80 chars = 240 cursor ops (hide/draw/show per char)
   - After: 80 chars = 2 cursor ops (hide once, show once)

3. Update run-interactive.sh to kill existing containers
   - Prevents port conflict errors
   - Cleaner startup experience

Expected improvement: 10-50x for batch text output.
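
The flushing split in item 1 can be modelled with a flush counter. The `Shell` struct and `render_batches` are hypothetical stand-ins; the method names `write_bytes_to_shell`, `write_bytes_to_shell_internal`, and `flush_framebuffer` follow this commit message, but their real signatures are not shown here and are assumed.

```rust
/// Hypothetical shell terminal that records writes and counts flushes.
#[derive(Default)]
pub struct Shell {
    pub written: Vec<u8>,
    pub flushes: usize,
}

impl Shell {
    /// Non-flushing path, intended for the render thread.
    pub fn write_bytes_to_shell_internal(&mut self, bytes: &[u8]) {
        self.written.extend_from_slice(bytes);
    }

    /// Flushing path for callers outside the render thread.
    pub fn write_bytes_to_shell(&mut self, bytes: &[u8]) {
        self.write_bytes_to_shell_internal(bytes);
        self.flush_framebuffer();
    }

    /// Stand-in for the real flush: only counts invocations.
    pub fn flush_framebuffer(&mut self) {
        self.flushes += 1;
    }

    /// Render-thread step: write every queued batch, then flush once.
    pub fn render_batches(&mut self, batches: &[&[u8]]) {
        for batch in batches {
            self.write_bytes_to_shell_internal(batch);
        }
        self.flush_framebuffer();
    }
}
```

Three queued batches cost one flush through `render_batches`, compared with three when each goes through the flushing `write_bytes_to_shell`.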

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CRITICAL FIX: The Canvas::set_pixel implementation was calling
mark_region_dirty for EVERY SINGLE PIXEL, completely defeating
all our batching optimizations.

For a character with ~100 drawn pixels:
- Before: 100 dirty region operations (with merge logic each time)
- After: 1 dirty region operation for the entire glyph

Changes:
1. Remove mark_region_dirty from set_pixel in logger.rs
   - Pixel data still written, just not marked dirty per-pixel

2. Add mark_dirty_region at end of fill_rect in primitives.rs
   - Marks entire filled rectangle once

3. Add mark_dirty_region at end of draw_glyph in primitives.rs
   - Marks entire glyph bounding box once

This was the root cause of the slow rendering - O(n) dirty marking
per character where n = pixels in glyph.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Change flush_full() to flush() in switch_terminal - only flush dirty
  regions instead of copying entire 8MB framebuffer
- Reduce LOG_BUFFER_SIZE from 200 to 50 lines for faster tab switching
- Skip offscreen log lines during replay - only render visible rows,
  eliminating expensive scroll operations entirely

Previously switching to Logs tab would render all 200 lines causing
~160 scroll operations. Now we calculate visible rows and skip lines
that would scroll off, reducing to zero scroll operations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ryanbreen ryanbreen changed the title from "perf(graphics): optimize dirty region tracking and character rendering" to "perf(graphics): comprehensive framebuffer rendering optimizations" on Jan 23, 2026
@ryanbreen ryanbreen merged commit 0ab515c into main Jan 23, 2026
2 checks passed