@aevyrie
Member

@aevyrie aevyrie commented Dec 29, 2025

Objective

  • Speed up collect_meshes_for_gpu_building, a bottleneck for scenes with many moving meshes.

Solution

  • Parallelize the gather step for mesh collection.
  • Immediately start up a task for serial collection of meshes, which cannot be parallelized.
  • Spawn many tasks for gathering meshes, and send batches of gathered meshes to the collection task.
  • This allows the serial collection step to start immediately, instead of being delayed until all gathering has finished.
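
For readers unfamiliar with the pattern, here is a minimal sketch of the gather/collect pipeline described above. It uses only the Rust standard library (OS threads and `std::sync::mpsc`) rather than Bevy's task pools, and `GatheredMesh`, `gather_and_collect`, and the placeholder "gather" computation are hypothetical names made up for illustration. It is not the code in this PR.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the per-mesh data produced by the gather step.
#[derive(Debug, Clone, PartialEq)]
struct GatheredMesh {
    entity: u32,
    transform: f32,
}

/// Gathers in parallel and collects serially, returning everything collected.
/// Batches arrive in whatever order the gather tasks produce them.
fn gather_and_collect(entities: Vec<u32>, workers: usize, batch_size: usize) -> Vec<GatheredMesh> {
    let (tx, rx) = mpsc::channel::<Vec<GatheredMesh>>();

    // Start the serial collection task first, so it can consume batches
    // while the gather tasks are still running.
    let collector = thread::spawn(move || {
        let mut collected = Vec::new();
        // This loop ends once every sender has been dropped.
        for batch in rx {
            collected.extend(batch);
        }
        collected
    });

    // Split the input into roughly one chunk per gather task.
    let chunk_len = entities.len().div_ceil(workers.max(1)).max(1);
    let gatherers: Vec<_> = entities
        .chunks(chunk_len)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let tx = tx.clone();
            thread::spawn(move || {
                for batch in chunk.chunks(batch_size.max(1)) {
                    // Placeholder for the real (parallelizable) gather work.
                    let gathered: Vec<GatheredMesh> = batch
                        .iter()
                        .map(|&entity| GatheredMesh { entity, transform: entity as f32 * 0.5 })
                        .collect();
                    // Sending fails only if the collector has already exited.
                    tx.send(gathered).expect("collector hung up");
                }
            })
        })
        .collect();

    // Drop the original sender so the channel closes when the gather tasks finish.
    drop(tx);
    for gatherer in gatherers {
        gatherer.join().expect("gather task panicked");
    }
    collector.join().expect("collector panicked")
}
```

Calling `gather_and_collect((0..1000).collect(), 4, 64)` returns all 1000 items, though not necessarily in entity order. The point of the shape is that the collector overlaps with the gather tasks instead of waiting for all of them.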

Testing

  • Built a new bevymark_3d stress test for benchmarking dynamic 3D mesh scenes, which our existing stress tests do not currently cover (Bevymark 3D #22298).
  • With 200k meshes, this drops total frame times from 16.4ms to 12.3ms (-4.1ms)
image
  • Mesh collection itself drops from 7.9ms to 3.6ms (-4.3ms)
image

@aevyrie aevyrie mentioned this pull request Dec 29, 2025
@IceSentry IceSentry added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 29, 2025
@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 29, 2025
@alice-i-cecile
Member

CI failures are real, but should be easy to fix.

@tychedelia tychedelia self-requested a review December 29, 2025 21:27
github-merge-queue bot pushed a commit that referenced this pull request Dec 30, 2025
# Objective

- Add a stress test that exercises the 3d mesh pipeline for dynamic
scenes.
- Large static scenes like caldera hotel don't expose performance issues
when many meshes are moving.
- Give us a way to benchmark PRs like
   - #22297
   - #22281
   - #22226

## Solution

- Make a 3d version of `bevymark`, sticking to the existing patterns as
closely as possible.

## Testing

<img width="1072" height="684" alt="image"
src="https://github.com/user-attachments/assets/41214ba9-ffad-471d-a320-1f007490dead"
/>

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@aevyrie
Member Author

aevyrie commented Dec 30, 2025

@alice-i-cecile g2g now

@aevyrie aevyrie added this to the 0.18 milestone Dec 30, 2025
@aevyrie
Member Author

aevyrie commented Dec 30, 2025

Added to the milestone, as it seems about equivalent to my other perf PRs that were also added.

@aevyrie
Member Author

aevyrie commented Dec 30, 2025

This PR needs more thorough testing before I'd feel comfortable merging. Parallelizing isn't always a speedup and can increase the total amount of CPU work needed even if throughput increases.

So far, things are still looking promising after my latest round of commits:

cargo rer bevymark_3d --features=debug,trace_tracy -- --benchmark --waves 250 --per-wave 1000

comparing this branch to main

frametime

image

collect_meshes_for_gpu_building

image

@alice-i-cecile alice-i-cecile removed this from the 0.18 milestone Jan 1, 2026
Contributor

@pcwalton pcwalton left a comment

The logic looks fine, but I think with some different factoring this would be easier to maintain and check. I'm not 100% sure the refactoring is viable, but I'd like to see if we can try.

Contributor

@pcwalton pcwalton left a comment

OK, I love this. This will help so much with making other parts of the system parallel, and addons and apps should be able to use this for increased parallelism too. In fact, it's essentially a big upgrade for the ECS, allowing easy parallelism in situations where par_iter() on a query isn't enough.

Thanks a bunch for taking the time to refactor it!

@aevyrie
Member Author

aevyrie commented Jan 2, 2026

I revisited the benchmarks after the visibility optimizations merged. The improvements are still reproducible, and overall frametimes are improved thanks to the optimizations on main.

image

@alice-i-cecile alice-i-cecile added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Jan 3, 2026
@alice-i-cecile alice-i-cecile added this to the 0.19 milestone Jan 14, 2026
@alice-i-cecile alice-i-cecile added A-Tasks Tools for parallel and async work X-Uncontroversial This work is generally agreed upon D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes labels Jan 14, 2026
@Aceeri
Member

Aceeri commented Jan 16, 2026

Screenshot 2026-01-16 at 5 16 28 AM

Profiled the changes here vs not and got some decent results on my project. The main important peaks are those 2 at the end which this PR reduces from ~22ms to ~10ms.

The faster portions are a result of flying out to lower LODs, so they are not super representative, but they may show a bit of the overhead for smaller numbers of meshes (the left 2 peaks are 18 meshes), where the time increases from 25µs to ~35-50µs.

Member

@Aceeri Aceeri left a comment

I like this. The buffered channel seems useful for my own parallel code, and the recycling is something I have been doing as well, but making it generic is very nice.

Solid performance improvement on my project as well: 22ms -> 10ms in the expected case.
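
For context, the "buffered channel with recycling" idea mentioned above might look roughly like the following. This is only a sketch under assumptions: `BufferedSender` and `buffered_channel` are hypothetical names built on `std::sync::mpsc`, and the API actually added in this PR may differ in both shape and naming.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical buffered sender: accumulates items locally and sends whole batches,
/// so the channel is touched once per batch instead of once per item.
struct BufferedSender<T> {
    buffer: Vec<T>,
    capacity: usize,
    batches: Sender<Vec<T>>,
    /// Emptied vectors handed back by the consumer for reuse.
    recycled: Receiver<Vec<T>>,
}

impl<T> BufferedSender<T> {
    /// Buffers one item, sending the batch once it reaches `capacity`.
    fn push(&mut self, item: T) {
        self.buffer.push(item);
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Sends the current buffer (if non-empty) and swaps in a fresh or recycled one.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        // Prefer an allocation returned by the consumer; otherwise allocate a new one.
        let next = self
            .recycled
            .try_recv()
            .unwrap_or_else(|_| Vec::with_capacity(self.capacity));
        let full = std::mem::replace(&mut self.buffer, next);
        // If the consumer is gone there is nobody to deliver to, so ignore the error.
        let _ = self.batches.send(full);
    }
}

impl<T> Drop for BufferedSender<T> {
    /// Flushes any partially filled batch when the sender goes away.
    fn drop(&mut self) {
        self.flush();
    }
}

/// Returns the buffered sender, the batch receiver, and a handle for returning
/// emptied vectors to the sender.
fn buffered_channel<T>(capacity: usize) -> (BufferedSender<T>, Receiver<Vec<T>>, Sender<Vec<T>>) {
    let capacity = capacity.max(1);
    let (batch_tx, batch_rx) = mpsc::channel();
    let (recycle_tx, recycle_rx) = mpsc::channel();
    let sender = BufferedSender {
        buffer: Vec::with_capacity(capacity),
        capacity,
        batches: batch_tx,
        recycled: recycle_rx,
    };
    (sender, batch_rx, recycle_tx)
}
```

The consumer drains each received batch and sends the emptied `Vec` back through the recycle handle, so steady-state operation can avoid allocating a new vector per batch.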

@aevyrie
Member Author

aevyrie commented Jan 17, 2026

Thanks for testing in your project - that gives me way more confidence this is a positive change. I'm pretty happy with how the buffered channel turned out!

@github-actions
Contributor

You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.

@aevyrie aevyrie force-pushed the par-mesh-collection branch from b3cb07b to ae85d1f Compare January 17, 2026 02:00
@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jan 17, 2026

Labels

A-Rendering Drawing game state to the screen A-Tasks Tools for parallel and async work C-Performance A change motivated by improving speed, memory usage or compile times D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it X-Uncontroversial This work is generally agreed upon

5 participants