perf(graphics): comprehensive framebuffer rendering optimizations #116

ryanbreen · 2026-01-23T13:23:11Z

Summary

Optimize dirty region tracking with multiple rectangles instead of single bounding box union
Fix critical per-pixel dirty marking that was defeating all batching optimizations
Eliminate cascading flushes and batch cursor operations
Optimize terminal switching with smart log replay (skip offscreen lines)

Key Changes

Dirty Region Tracking (double_buffer.rs)

Track 4 separate dirty rectangles with smart merging (MERGE_PROXIMITY=32px)
Add mark_rect_dirty() for batch dirty marking
Fast-path flush for full-width dirty regions (single memcpy vs per-row)

Per-Pixel Fix (logger.rs, primitives.rs)

Critical: Removed per-pixel mark_region_dirty from Canvas set_pixel
Added single dirty mark at end of fill_rect and draw_glyph
Previously: 100 dirty ops per character → Now: 1 dirty op per character
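
The per-pixel fix can be illustrated with a small sketch. This is not the code from logger.rs or primitives.rs: the `Canvas` layout, the `dirty_marks` vector standing in for the real dirty tracker, and the 8-pixel-wide one-byte-per-row glyph format are all assumptions made for illustration. What it shows is the pattern described above: `set_pixel` only writes, and `draw_glyph` marks the glyph's bounding box once at the end.

```rust
/// Hypothetical canvas; `dirty_marks` stands in for the real dirty tracker.
pub struct Canvas {
    width: usize,
    height: usize,
    pixels: Vec<u32>,
    /// Recorded dirty rectangles as (x, y, width, height).
    pub dirty_marks: Vec<(usize, usize, usize, usize)>,
}

impl Canvas {
    pub fn new(width: usize, height: usize) -> Self {
        Canvas { width, height, pixels: vec![0; width * height], dirty_marks: Vec::new() }
    }

    /// Writes one pixel and deliberately does NOT mark anything dirty.
    pub fn set_pixel(&mut self, x: usize, y: usize, color: u32) {
        if x < self.width && y < self.height {
            self.pixels[y * self.width + x] = color;
        }
    }

    pub fn pixel(&self, x: usize, y: usize) -> u32 {
        self.pixels[y * self.width + x]
    }

    /// Records one dirty rectangle (one tracker operation).
    pub fn mark_dirty_region(&mut self, x: usize, y: usize, w: usize, h: usize) {
        self.dirty_marks.push((x, y, w, h));
    }

    /// Draws a 1-bit glyph (assumed format: one byte per row, MSB leftmost),
    /// then marks the whole bounding box dirty exactly once.
    pub fn draw_glyph(&mut self, x: usize, y: usize, rows: &[u8], color: u32) {
        for (dy, &bits) in rows.iter().enumerate() {
            for dx in 0..8usize {
                if bits & (0x80u8 >> dx) != 0 {
                    self.set_pixel(x + dx, y + dy, color);
                }
            }
        }
        self.mark_dirty_region(x, y, 8, rows.len());
    }
}
```

With this shape, the number of dirty operations per character no longer depends on how many pixels the glyph lights up.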

Cascading Flush Elimination (terminal_manager.rs, render_queue.rs)

Created write_bytes_to_shell_internal() non-flushing API for render thread
Batch cursor operations (hide once/show once vs per-character)
Render thread now handles all flushing
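
A rough sketch of the non-flushing write path and cursor batching follows. The `Terminal` type, its counters, and `render_batches` are hypothetical stand-ins for the real code in terminal_manager.rs and render_queue.rs; the counters exist only so the effect of batching is observable.

```rust
/// Hypothetical terminal that counts cursor operations and flushes.
#[derive(Default)]
pub struct Terminal {
    pub output: Vec<u8>,
    pub cursor_ops: usize,
    pub flushes: usize,
    cursor_visible: bool,
}

impl Terminal {
    fn hide_cursor(&mut self) {
        self.cursor_visible = false;
        self.cursor_ops += 1;
    }

    fn show_cursor(&mut self) {
        self.cursor_visible = true;
        self.cursor_ops += 1;
    }

    pub fn cursor_visible(&self) -> bool {
        self.cursor_visible
    }

    pub fn flush(&mut self) {
        self.flushes += 1;
    }

    /// Non-flushing write: hide the cursor once, write every byte,
    /// show the cursor once. The caller decides when to flush.
    pub fn write_bytes_internal(&mut self, bytes: &[u8]) {
        if bytes.is_empty() {
            return;
        }
        self.hide_cursor();
        self.output.extend_from_slice(bytes);
        self.show_cursor();
    }

    /// Flushing variant for callers outside the render thread.
    pub fn write_bytes(&mut self, bytes: &[u8]) {
        self.write_bytes_internal(bytes);
        self.flush();
    }
}

/// Render-thread side: write all pending batches, then flush once.
pub fn render_batches(term: &mut Terminal, batches: &[&[u8]]) {
    for batch in batches {
        term.write_bytes_internal(batch);
    }
    term.flush();
}
```

In this sketch an 80-byte batch costs two cursor operations regardless of its length, and any number of batches costs a single flush.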

Terminal Switching (terminal_manager.rs)

Changed flush_full() to flush() - only dirty regions, not entire 8MB buffer
Reduced log buffer from 200 to 50 lines
Skip offscreen log lines during replay - zero scroll operations
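
The replay change amounts to rendering only the tail of the log that fits on screen. A minimal sketch, assuming each log line occupies exactly one terminal row (no wrapping); `visible_tail` is a hypothetical helper name, not a function from the PR:

```rust
/// Returns the log lines that would still be on screen after a full replay,
/// assuming one row per line. Lines before this tail would have scrolled off,
/// so skipping them avoids every scroll operation.
pub fn visible_tail<T>(lines: &[T], visible_rows: usize) -> &[T] {
    let skip = lines.len().saturating_sub(visible_rows);
    &lines[skip..]
}
```

When the log holds fewer lines than the terminal has rows, nothing is skipped and the whole log is rendered.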

Performance Impact

Operation	Before	After
Dirty marks per char	100+ (per-pixel)	1 (per-glyph)
Cursor ops per line	3× per char	2 total
Log tab switch	200 lines + 160 scrolls	~40 visible lines, 0 scrolls
Flush on switch	8MB full buffer	Dirty regions only

Test plan

Build succeeds with --features interactive
Run ./docker/qemu/run-interactive.sh and verify:
- Character rendering is responsive
- F1/F2 terminal switching is fast (<500ms)
- Scrolling works correctly

🤖 Generated with Claude Code

Three performance improvements for framebuffer rendering:

1. Character-level dirty marking: Reduced mark_dirty() calls from ~80 per character (one per pixel) to 1 per character. Added write_pixel_no_mark() for batch operations with single mark_dirty_rect() call at the end.

2. Multiple dirty rectangles: Replace single bounding box with 4 separate dirty rects. Uses intelligent merging:
   - Merge if rects overlap or are within 32 pixels
   - When full, merge closest pair to make room
   - Result: sparse updates (cursor + text) flush independently

3. Scroll optimization: Remove redundant flush_if_dirty() before scroll - pending dirty regions scroll with content, only the cleared bottom line needs flushing.

Expected improvements:
- Character rendering: 80x fewer dirty region operations
- Sparse updates: cursor blink + text no longer flush giant bbox
- Scroll: ~1.92MB less buffer copying per scroll operation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
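
The merging rules for multiple dirty rectangles can be sketched as follows. This is a minimal illustration, not the code from double_buffer.rs: `Rect`, `DirtyTracker`, `MAX_DIRTY_RECTS`, the half-open coordinate convention, and the gap-based distance used to pick the "closest pair" are all assumptions. Only the four-rectangle limit and the 32-pixel proximity threshold come from the description.

```rust
const MAX_DIRTY_RECTS: usize = 4; // from the PR description
const MERGE_PROXIMITY: usize = 32; // pixels, from the PR description

/// Half-open rectangle: x in [x0, x1), y in [y0, y1). Hypothetical layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect { pub x0: usize, pub y0: usize, pub x1: usize, pub y1: usize }

impl Rect {
    fn union(self, o: Rect) -> Rect {
        Rect { x0: self.x0.min(o.x0), y0: self.y0.min(o.y0),
               x1: self.x1.max(o.x1), y1: self.y1.max(o.y1) }
    }
    /// Gap along each axis; 0 when the rectangles overlap on that axis.
    fn gap(self, o: Rect) -> (usize, usize) {
        let dx = o.x0.saturating_sub(self.x1).max(self.x0.saturating_sub(o.x1));
        let dy = o.y0.saturating_sub(self.y1).max(self.y0.saturating_sub(o.y1));
        (dx, dy)
    }
    fn is_near(self, o: Rect) -> bool {
        let (dx, dy) = self.gap(o);
        dx <= MERGE_PROXIMITY && dy <= MERGE_PROXIMITY
    }
    /// Assumed metric for "closest pair": sum of the axis gaps.
    fn distance(self, o: Rect) -> usize {
        let (dx, dy) = self.gap(o);
        dx + dy
    }
}

pub struct DirtyTracker { rects: [Option<Rect>; MAX_DIRTY_RECTS] }

impl DirtyTracker {
    pub fn new() -> Self { DirtyTracker { rects: [None; MAX_DIRTY_RECTS] } }

    pub fn rects(&self) -> impl Iterator<Item = Rect> + '_ {
        self.rects.iter().flatten().copied()
    }

    pub fn mark_rect_dirty(&mut self, rect: Rect) {
        let mut merged = rect;
        // Absorb every tracked rect that overlaps or lies within the threshold.
        // Repeat, because the grown rect may now reach rects it missed before.
        loop {
            let mut absorbed = false;
            for slot in self.rects.iter_mut() {
                if let Some(existing) = *slot {
                    if existing.is_near(merged) {
                        merged = merged.union(existing);
                        *slot = None;
                        absorbed = true;
                    }
                }
            }
            if !absorbed { break; }
        }
        if let Some(free) = self.rects.iter_mut().find(|s| s.is_none()) {
            *free = Some(merged);
            return;
        }
        // All slots are full: merge the closest pair among the five candidates.
        let mut all = [merged; MAX_DIRTY_RECTS + 1];
        for (i, slot) in self.rects.iter().enumerate() {
            all[i] = slot.unwrap(); // every slot is Some on this path
        }
        let (mut best_i, mut best_j, mut best_d) = (0, 1, usize::MAX);
        for i in 0..all.len() {
            for j in (i + 1)..all.len() {
                let d = all[i].distance(all[j]);
                if d < best_d { best_i = i; best_j = j; best_d = d; }
            }
        }
        all[best_i] = all[best_i].union(all[best_j]);
        let mut k = 0;
        for (i, r) in all.iter().enumerate() {
            if i != best_j { self.rects[k] = Some(*r); k += 1; }
        }
    }
}
```

A fixed-size array keeps the tracker allocation-free, which seems a reasonable fit for kernel code; whether the real tracker is laid out this way is not stated in the PR.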

Additional optimizations for framebuffer rendering:

1. Add mark_rect_dirty() to DirtyRegionTracker - marks entire rectangle in one operation instead of per-row calls. For a 16-pixel character, this is 16 merge operations → 1.

2. Add mark_region_dirty_rect() to DoubleBufferedFrameBuffer as wrapper for the new efficient method.

3. Add fast path in flush() for full-width dirty rectangles - if x_start == 0 && x_end == stride, copy entire vertical block in one memcpy instead of per-row copies.

4. Update mark_dirty_rect() in logger.rs to use the new batch method.

Expected improvements:
- Character dirty marking: 16x fewer operations
- Scroll/clear flush: Nx fewer memcpy calls (N = number of rows)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
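
The full-width fast path in item 3 can be sketched like this. `flush_rect` and its parameter layout are hypothetical (here `stride`, `x_start`, and `x_end` are byte offsets within a row, and the function returns the number of copy calls so the difference is visible); the real flush() in double_buffer.rs may be structured differently.

```rust
/// Copies rows [y_start, y_end) and bytes [x_start, x_end) of each row from
/// the back buffer to the front buffer. Returns how many copies were issued.
pub fn flush_rect(
    back: &[u8],
    front: &mut [u8],
    stride: usize,
    x_start: usize,
    x_end: usize,
    y_start: usize,
    y_end: usize,
) -> usize {
    if x_start == 0 && x_end == stride {
        // Full-width rows are contiguous in memory: one block copy suffices.
        let begin = y_start * stride;
        let end = y_end * stride;
        front[begin..end].copy_from_slice(&back[begin..end]);
        1
    } else {
        // Partial-width rows are separated by untouched bytes: copy per row.
        let mut copies = 0;
        for y in y_start..y_end {
            let begin = y * stride + x_start;
            let end = y * stride + x_end;
            front[begin..end].copy_from_slice(&back[begin..end]);
            copies += 1;
        }
        copies
    }
}
```

The fast path works only because consecutive full-width rows form one contiguous byte range, so it applies naturally to scrolls and screen clears.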

Critical performance fixes for graphics rendering:

1. Create write_bytes_to_shell_internal() that doesn't flush
   - Render thread uses this non-flushing version
   - Render thread handles flush once after all batches
   - Before: render_batch -> write_bytes_to_shell [FLUSH] -> flush_framebuffer [FLUSH AGAIN]
   - After: render_batch -> write_bytes_to_shell_internal [NO FLUSH] -> flush_framebuffer [FLUSH ONCE]

2. Batch cursor operations in write_bytes_to_shell()
   - Hide cursor ONCE at start of batch
   - Write all bytes
   - Show cursor ONCE at end of batch
   - Before: 80 chars = 240 cursor ops (hide/draw/show per char)
   - After: 80 chars = 2 cursor ops (hide once, show once)

3. Update run-interactive.sh to kill existing containers
   - Prevents port conflict errors
   - Cleaner startup experience

Expected improvement: 10-50x for batch text output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CRITICAL FIX: The Canvas::set_pixel implementation was calling mark_region_dirty for EVERY SINGLE PIXEL, completely defeating all our batching optimizations.

For a character with ~100 drawn pixels:
- Before: 100 dirty region operations (with merge logic each time)
- After: 1 dirty region operation for the entire glyph

Changes:

1. Remove mark_region_dirty from set_pixel in logger.rs
   - Pixel data still written, just not marked dirty per-pixel

2. Add mark_dirty_region at end of fill_rect in primitives.rs
   - Marks entire filled rectangle once

3. Add mark_dirty_region at end of draw_glyph in primitives.rs
   - Marks entire glyph bounding box once

This was the root cause of the slow rendering - O(n) dirty marking per character where n = pixels in glyph.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Change flush_full() to flush() in switch_terminal - only flush dirty regions instead of copying entire 8MB framebuffer
- Reduce LOG_BUFFER_SIZE from 200 to 50 lines for faster tab switching
- Skip offscreen log lines during replay - only render visible rows, eliminating expensive scroll operations entirely

Previously switching to Logs tab would render all 200 lines causing ~160 scroll operations. Now we calculate visible rows and skip lines that would scroll off, reducing to zero scroll operations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ryanbreen and others added 5 commits January 23, 2026 08:22

ryanbreen changed the title ~~perf(graphics): optimize dirty region tracking and character rendering~~ perf(graphics): comprehensive framebuffer rendering optimizations Jan 23, 2026

ryanbreen merged commit 0ab515c into main Jan 23, 2026
2 checks passed
