@bigFin
Contributor

@bigFin bigFin commented Nov 29, 2025

Description

Papertoy but on multiple displays

  • independent configuration of frame rate, resolution, and scaling

  • handles display rotation

  • tested on Niri and Hyprland, on both NixOS and Arch

  • Per-Output Paper Instances: Each output is managed independently. A new Paper struct encapsulates all the state needed to render on a single output (including its WlrSurface, Shader, and OpenGL context). I later tried a shared OpenGL context to see whether it brought any performance benefit; if it did, it wasn't major.

  • Respect Compositor Configuration: The application now correctly processes zwlr_layer_surface_v1.configure events, which provide the authoritative logical dimensions from the compositor. This ensures the application always resizes correctly based on compositor instructions.

  • Output Orientation Support: The application now handles wl_output.geometry events, which communicate display rotation. It swaps width and height when a display is rotated by 90 or 270 degrees, tested on 3440x1440 vs 1440x3440.

  • wlr_layer_surface Anchoring: Layer surfaces are now anchored to all four edges (top, bottom, left, right). This explicitly tells the compositor to stretch the wallpaper to fill the entire output area. This was needed on Niri, where the surface might otherwise not cover the whole display in vertical orientation.

  • Display Scaling: Added a scale parameter that adjusts the render resolution uniformly in both dimensions (for example, scale=0.5 renders a 4K output at 1080p).
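
As a rough illustration of the orientation and scaling bullets above, the sketch below computes a render size from an output's mode size, its wl_output transform, and a scale factor. It is written in Python for brevity and is not papertoy's code: the function names `oriented_size` and `scaled_size` are hypothetical, and treating the flipped 90/270 transforms like their unflipped counterparts is an assumption that goes beyond what this PR describes. The integer values are those of the `wl_output.transform` enum.

```python
# wl_output.transform values that include a quarter turn:
# 1 = 90, 3 = 270, 5 = flipped-90, 7 = flipped-270.
# Handling the flipped variants here is an assumption, not taken from the PR.
QUARTER_TURN_TRANSFORMS = {1, 3, 5, 7}


def oriented_size(width, height, transform):
    """Swap width and height when the output is rotated by 90 or 270 degrees."""
    if transform in QUARTER_TURN_TRANSFORMS:
        return height, width
    return width, height


def scaled_size(width, height, scale):
    """Apply a uniform render scale, never going below 1x1 pixels."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With these helpers, `oriented_size(3440, 1440, 1)` yields `(1440, 3440)` and `scaled_size(3840, 2160, 0.5)` yields `(1920, 1080)`.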

How to Test

1. Performance Mode (Recommended for 4K)
Render at half resolution (1080p on 4K) to save GPU power, but keep the window fullscreen.

$ papertoy shader.glsl --output "id=HDMI-A-1,scale=0.5"

2. Multi-Monitor Setup
Configure a high-refresh main monitor and a slower secondary monitor.

$ papertoy shader.glsl \
    --output "id=DP-1,frame-rate=144" \
    --output "id=HDMI-A-1,scale=0.5,frame-rate=30"

3. Custom Resolution
Force a specific aspect ratio or size.

$ papertoy shader.glsl --output "id=DP-1,resolution=800x600"
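
The `--output` values used in these examples are comma-separated `key=value` lists. The sketch below shows one way such a value could be parsed. It is a Python illustration only, not papertoy's parser: `OutputConfig` and `parse_output_spec` are hypothetical names, and the choices to require `id`, to reject unknown keys, and to read `frame-rate` as an integer are assumptions. Only the four keys that appear in the commands above are handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OutputConfig:
    """Per-output settings; None means 'use the output's native value'."""
    id: str
    frame_rate: Optional[int] = None
    scale: Optional[float] = None
    resolution: Optional[Tuple[int, int]] = None


def parse_output_spec(spec: str) -> OutputConfig:
    """Parse a value such as 'id=DP-1,frame-rate=144' into an OutputConfig."""
    fields = {}
    for part in spec.split(","):
        key, sep, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {part!r}")
        fields[key] = value

    if "id" not in fields:
        raise ValueError("missing 'id' key")  # assumption: id is mandatory
    config = OutputConfig(id=fields.pop("id"))

    for key, value in fields.items():
        if key == "frame-rate":
            config.frame_rate = int(value)
        elif key == "scale":
            config.scale = float(value)
        elif key == "resolution":
            width, sep, height = value.partition("x")
            if not sep:
                raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
            config.resolution = (int(width), int(height))
        else:
            raise ValueError(f"unknown key {key!r}")
    return config
```

Each repeated `--output` flag would then produce one `OutputConfig`, which can serve as the starting point for one per-output instance.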

4. All displays at native settings
Automatically render papertoy on all displays.

$ papertoy shader.glsl

Verify that animations are smooth on all displays and that the wallpaper correctly fills the entire screen area, including any fractionally scaled or rotated displays.

Owner

@sin-ack sin-ack left a comment

Thanks for the contribution. From what I can tell you're getting help from an LLM for this change. That's fine, but:

  • The PR description and commit message seem to be generated too. Please describe the change with your own words. LLMs are good at explaining what happened but they're not good at explaining why.
  • Unfortunately LLM providers really like to RL their coding models into doing a single commit for all changes, resulting in a hard-to-review PR because it's more difficult to tell which line of code belongs to which change. Please use atomic commits for each separate change being made. I recommend looking at this repo's history to see how I expect things to be structured. You can ask the LLM about how to modify Git history, or make it do the split by itself (although I wouldn't really trust an LLM to run commands on my behalf, git reflog is your friend here).
  • As I detailed below, one of the reasons I kinda postponed working on this was due to not being able to figure out how to handle multiple frame-rate options within the same loop (I had initially planned to spawn multiple threads and run a queue on each). The changes below seem to mostly address that nicely, but they use the same frame-rate for all outputs. Instead, make the --output flag configure individual outputs which you can then use as the basis for creating Paper objects.
  • Since as part of the changes the command-line options will change, please make sure to update the README too.
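
One way to picture several frame rates sharing a single loop is a deadline queue: each output carries its own next-frame time, and the loop always services whichever output is due first. The Python sketch below only simulates that ordering over a fixed time window. It is not the approach taken in this PR or in papertoy, and `plan_frames` is a hypothetical name. Deadlines are derived from the frame index rather than by repeatedly adding an interval, so rounding error does not accumulate.

```python
import heapq
from typing import Dict, List, Tuple

NS_PER_SECOND = 1_000_000_000


def plan_frames(frame_rates: Dict[str, int], duration_ns: int) -> List[Tuple[int, str]]:
    """Return (due_time_ns, output_id) pairs in the order one loop would render them."""
    # Heap entries are (due_ns, output_id, frame_index); every output starts at t=0.
    heap = [(0, output_id, 0) for output_id in frame_rates]
    heapq.heapify(heap)

    plan = []
    while heap:
        due_ns, output_id, index = heapq.heappop(heap)
        if due_ns >= duration_ns:
            continue  # this output has no more frames inside the window
        plan.append((due_ns, output_id))
        # Compute the next deadline from the frame index to avoid drift.
        next_index = index + 1
        next_due = next_index * NS_PER_SECOND // frame_rates[output_id]
        heapq.heappush(heap, (next_due, output_id, next_index))
    return plan
```

Calling `plan_frames({"DP-1": 144, "HDMI-A-1": 30}, NS_PER_SECOND)` interleaves 144 frames for DP-1 with 30 frames for HDMI-A-1 in time order. A real loop would wait until each due time (or for the compositor's frame callback) instead of building the plan up front.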

@bigFin bigFin force-pushed the multi-output branch 4 times, most recently from 73cf679 to e69a7f0 Compare December 1, 2025 02:17
@bigFin
Contributor Author

bigFin commented Dec 2, 2025

TYVM for the review, and many thanks for taking the time to help me clean up the slop. I split the chungus squash commit into atomic commits. Independently configurable frame rates now work. I tested this on Niri and Hyprland, and both work for me.

@bigFin
Contributor Author

bigFin commented Dec 4, 2025

I tried a few things to help the user manage GPU utilization:

  • Add a render scaling parameter so the user can choose to render at a lower res. Some combination of this and a lower fps can significantly reduce GPU use.
  • shared OpenGL context for multi output rendering. I'm not sure if this really makes a big impact in practice.

@bigFin bigFin requested a review from sin-ack December 4, 2025 03:21
@bigFin bigFin force-pushed the multi-output branch 2 times, most recently from a48baac to 9322251 Compare December 5, 2025 23:45
@bigFin bigFin force-pushed the multi-output branch 3 times, most recently from 47d0801 to f07eb8b Compare December 6, 2025 00:34
- Integrates `nixgl` via overlay to ensure pure evaluation.
- Adds `nixGLIntel` wrapper for x86 Linux, enabling execution on non-NixOS distros (like Arch).
- Drops the generic `nixGL` auto-detect wrapper as it relies on `builtins.currentTime`, which is impure and fails `nix flake check` in CI.
- Updates `flake.lock` with `nixgl` input.
@sin-ack
Owner

sin-ack commented Dec 6, 2025

Sorry for the lack of a review, I was on a trip. I'll take a closer look at this on Sunday.

> Add a render scaling parameter so the user can choose to render at a lower res.

The --resolution argument already does this. The result will be scaled to the output's size but the surface size is set to the argument's value so that rendering at a smaller resolution is possible.

> shared OpenGL context for multi output rendering.

Some benchmarks for render time would be cool!

@bigFin
Contributor Author

bigFin commented Dec 7, 2025

Hope you had a good trip! No rush. As you can see the scope of this PR has extended a bit over the week as I tinkered and tried a few ideas, which has been fun and a good learning experience. Thanks for the feedback and for working with me on this!

Good point about the resolution argument, this type of optimisation was always possible. The uniform scale is just a convenience.

Benchmarking the shared OpenGL context was technically interesting, but not very impactful in my case, and probably not for most desktop users. Sharing the context reduces the cost of switching surfaces on the driver side: a ~5% reduction in eglMakeCurrent (-2.4 us). It shifts some work to the render step, which now has to swap the viewport: +8% (+1.5 us). So maybe we save 1 us per frame with the shared context on my system. That seems negligible to me, but maybe it adds up to something when using this to drive a lot of outputs.

Commit: 58d7610 (Shared Context + Telemetry) vs Reverted HEAD (Independent Contexts + Telemetry)
Methodology:

  • Control: Independent GLContext per Output.
  • Experiment: Single Shared GLContext across all Outputs.
  • Metric: Internal Telemetry (average nanoseconds per frame) + perf stat.
  • Workload: shaders/stochastic-asym-quads.glsl on 2 outputs (4K@119Hz, UWQHD@100Hz).
  • Duration: Fixed 600 frames per run (approx 3 seconds).

Results (Average of 3 runs, Normalized per Render)

| Metric | Independent Contexts (Control) | Shared Context (Experiment) | Diff | % Diff |
|---|---|---|---|---|
| Total System CPU Time (ms) | ~704 | ~713 | +9 | +1.3% |
| MakeCurrent (Context Switch) (ns) | ~47,978 | ~45,601 | -2,377 | -5.0% |
| Render (GPU Submission) (ns) | ~17,517 | ~18,985 | +1,468 | +8.4% |
| SwapBuffers (Present) (ns) | ~1,780,501 | ~1,563,170 | -217,331 | -12.2% |
| OS Context Switches (total) | ~8,790 | ~9,007 | +217 | +2.5% |
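
The "maybe we save 1 us per frame" estimate can be reproduced from the MakeCurrent and Render rows of this short run. The Python snippet below is just that arithmetic. The dictionary keys and the helper name `net_delta_ns` are made up for illustration, and SwapBuffers is left out, as it is in the prose estimate.

```python
# Average nanoseconds per render from the 600-frame run.
control_ns = {"make_current": 47_978, "render": 17_517}
shared_ns = {"make_current": 45_601, "render": 18_985}


def net_delta_ns(control, experiment):
    """Sum of per-metric differences (experiment minus control)."""
    return sum(experiment[name] - control[name] for name in control)


# Negative means the shared context was faster on these two metrics combined.
print(net_delta_ns(control_ns, shared_ns))  # -909
```

A net change of -909 ns is a bit under 1 us per render.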

Edit:
I ran the perf test a bit longer and also added GPU metrics. Variance was whack on the short runs; now it's <1% across all metrics. Lower VRAM use is also nice.

  • Duration: Fixed 12000 frames per run (approx 60 seconds) for maximum stability.

Results (Average of 3 runs, Normalized per Render)

| Metric | Independent Contexts (Control) | Shared Context (Experiment) | Diff | % Diff |
|---|---|---|---|---|
| VRAM Usage (Delta) | 405 MiB | 333 MiB | -72 MiB | -17.8% |
| Avg MakeCurrent (ns) | ~44,943 | ~42,587 | -2,356 | -5.2% |
| Avg Render (ns) | ~15,983 | ~15,709 | -274 | -1.7% |
| Avg SwapBuffers (ns) | ~2,649,089 | ~2,841,274 | +192,185 | +7.3% |
| Total System CPU Time (ms) | ~9,431 | ~9,305 | -126 | -1.3% |
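
For anyone re-checking the Diff and % Diff columns, they follow from the control and experiment values as (experiment - control) and that difference divided by control. A minimal Python sketch, with `percent_diff` as a hypothetical helper name and the values copied from the 12000-frame run:

```python
def percent_diff(control, experiment):
    """Relative change of the experiment versus the control, in percent."""
    return (experiment - control) / control * 100.0


# (control, experiment) pairs from the 12000-frame run.
long_run = {
    "VRAM Usage (MiB)": (405, 333),
    "Avg MakeCurrent (ns)": (44_943, 42_587),
    "Avg Render (ns)": (15_983, 15_709),
    "Avg SwapBuffers (ns)": (2_649_089, 2_841_274),
    "Total System CPU Time (ms)": (9_431, 9_305),
}

for metric, (control, experiment) in long_run.items():
    diff = experiment - control
    print(f"{metric}: {diff:+,} ({percent_diff(control, experiment):+.1f}%)")
```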

Seems like a win to me
