Add torch.cuda.synchronize() before empty_cache() to fix Blackwell crash #116

Open
cuzelac wants to merge 1 commit into visualbruno:main from cuzelac:fix/sync-before-empty-cache

@cuzelac cuzelac commented Mar 8, 2026

Summary

torch.cuda.empty_cache() can free GPU memory that still has pending async work, causing CUDA error: illegal memory access on Blackwell GPUs (RTX 5090, sm_120). Adding torch.cuda.synchronize() before each empty_cache() call ensures all GPU work completes before memory is released.

Related: CuMesh also needs stream-awareness fixes — see visualbruno/CuMesh#2. Both fixes are required; neither alone is sufficient.

Why this is needed

torch.cuda.empty_cache() releases unused cached memory from PyTorch's caching allocator back to the CUDA driver. However, it does not synchronize pending GPU operations first. On Blackwell GPUs, where PyTorch uses cudaStreamNonBlocking streams, the release can race with in-flight kernels that are still using that memory.

Changes

  • trellis2/pipelines/trellis2_image_to_3d.py — add torch.cuda.synchronize() before all torch.cuda.empty_cache() calls in the pipeline
  • trellis2/models/sc_vaes/sparse_unet_vae.py — add torch.cuda.synchronize() before torch.cuda.empty_cache() in the VAE decoder
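
For illustration, the pattern applied at each call site can be sketched as a small helper. This is a minimal sketch, not the code in this PR's diff: the name `sync_then_empty_cache` and its `cuda` parameter are hypothetical, introduced here so the call order can be checked with a stand-in object on a machine without a GPU. Only `torch.cuda.is_available()`, `torch.cuda.synchronize()`, and `torch.cuda.empty_cache()` are real PyTorch calls.

```python
from types import SimpleNamespace


def sync_then_empty_cache(cuda=None):
    """Hypothetical helper: wait for queued GPU work, then release cached blocks.

    `cuda` defaults to `torch.cuda`; the parameter exists only so the
    ordering can be exercised with a stand-in object.
    """
    if cuda is None:
        import torch  # imported lazily so the sketch also runs without PyTorch

        cuda = torch.cuda
    if not cuda.is_available():
        return False  # nothing to synchronize or free on a CPU-only machine
    cuda.synchronize()  # block until all queued work on the current device finishes
    cuda.empty_cache()  # only then release unused cached memory
    return True


# Stand-in that records the call order instead of touching a GPU.
calls = []
fake_cuda = SimpleNamespace(
    is_available=lambda: True,
    synchronize=lambda: calls.append("synchronize"),
    empty_cache=lambda: calls.append("empty_cache"),
)
ran = sync_then_empty_cache(fake_cuda)
```

In real code the helper would be called with no arguments, so it operates on `torch.cuda`. Note that `torch.cuda.synchronize()` stalls the host until the device is idle, so the pattern trades some throughput for safety at each cache flush.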

Testing

  • Tested on RTX 5090 (sm_120), PyTorch 2.10.0+cu130, CUDA 13.0, Windows
  • Full Trellis2 image-to-3D pipeline including mesh generation, refinement, and texturing
  • Verified this fix is independently necessary: reverting it (with CuMesh fix in place) reproduces the crash

torch.cuda.empty_cache() does not synchronize — it can free memory
that in-flight async kernels are still using, causing "illegal memory
access" errors on high-concurrency GPUs (e.g. RTX 5090 / Blackwell).

Add synchronize() before each empty_cache() to ensure all GPU work
completes before memory is released.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>