@nikhilJain17
Contributor

This diff frees wgpu::Buffers, including those held in buffer pools, on shutdown to prevent memory leaks on the GPU. It also fixes memory leaks on the heap: backend, backend_ctx, buffer_ctx, and decisions were allocated on the heap but never deleted. These are now either explicitly deleted (following the ggml lifecycle) or changed to smart pointers.

We implement destructors for our buffer pool structs, the webgpu_context struct, and the webgpu_global_context struct. Since webgpu_global_context is held by a refcounted smart pointer, it is destroyed automatically once all thread contexts have been destroyed.
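
To illustrate the refcounting behaviour described above, here is a minimal sketch. It is not the code in this PR: global_context_sketch and thread_context_sketch are hypothetical stand-ins for webgpu_global_context and webgpu_context, and the destructor only flips a flag where the real one would release GPU resources.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for webgpu_global_context.
struct global_context_sketch {
    bool * destroyed;  // set in the destructor so the caller can observe it
    explicit global_context_sketch(bool * flag) : destroyed(flag) {}
    ~global_context_sketch() { *destroyed = true; }  // real code would free GPU resources here
};

// Hypothetical stand-in for a per-thread webgpu_context.
struct thread_context_sketch {
    std::shared_ptr<global_context_sketch> global;  // each thread context holds one reference
};

// Returns true if the global context survives the first thread context
// being destroyed and is destroyed together with the last one.
inline bool global_context_dies_with_last_owner() {
    bool destroyed = false;
    auto global = std::make_shared<global_context_sketch>(&destroyed);

    auto a = std::make_unique<thread_context_sketch>();
    auto b = std::make_unique<thread_context_sketch>();
    a->global = global;
    b->global = global;
    global.reset();  // drop the local reference; the two thread contexts remain as owners

    a.reset();
    const bool alive_after_first = !destroyed;
    b.reset();
    const bool dead_after_last = destroyed;
    return alive_after_first && dead_after_last;
}
```

Calling global_context_dies_with_last_owner() should return true: no explicit teardown call is needed for the shared object, only for its owners.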

Screenshot 2026-02-03 at 3 56 11 PM

We call free on every buffer we allocate, and we explicitly free our buffer pools and the debug/error/staging buffers.
Also, since we already wait explicitly on all our callbacks, there are no outstanding callbacks to wait for during shutdown.

Screenshot 2026-02-03 at 8 14 21 PM

Memory leak on the heap before.

Screenshot 2026-02-03 at 8 15 30 PM

No memory leak on the heap after.

ggml_webgpu_processed_shader result;
result.wgsl = preprocessor.preprocess(shader_src, defines);
result.variant = variant;
ggml_webgpu_flash_attn_shader_decisions * decisions = new ggml_webgpu_flash_attn_shader_decisions();

I changed this into a shared_ptr: it was allocated with new and never deallocated, so it leaked on the heap.
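
A minimal sketch of the idea, assuming simplified types: decisions_sketch and processed_shader_sketch are hypothetical stand-ins for ggml_webgpu_flash_attn_shader_decisions and ggml_webgpu_processed_shader, and the counter exists only so the lifetime can be observed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the decisions struct; counts live instances.
struct decisions_sketch {
    int * live;
    explicit decisions_sketch(int * counter) : live(counter) { ++*live; }
    ~decisions_sketch() { --*live; }
};

// Hypothetical stand-in for the processed shader result.
struct processed_shader_sketch {
    std::string                       wgsl;
    std::shared_ptr<decisions_sketch> decisions;  // owned; freed with the last copy
};

inline processed_shader_sketch make_shader_sketch(int * counter) {
    processed_shader_sketch result;
    result.wgsl      = "/* preprocessed WGSL would go here */";
    result.decisions = std::make_shared<decisions_sketch>(counter);  // instead of a raw new
    return result;
}

// Returns the number of decisions objects still alive after all results are gone.
inline int live_decisions_after_use() {
    int live = 0;
    {
        processed_shader_sketch a = make_shader_sketch(&live);
        processed_shader_sketch b = a;  // copies share one decisions object
        assert(live == 1);
        (void) b;
    }
    return live;
}
```

Here live_decisions_after_use() returns 0 without any explicit delete, because the decisions object is released when the last result holding it is destroyed.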

this->get_tensor_staging_buf.Destroy();
#ifdef GGML_WEBGPU_DEBUG
debug_host_buf.Destroy();
debug_dev_buf.Destroy();

I believe other wgpu members, like Instance, Device, Queue, and Pipeline, are refcounted and are released automatically once all references to them are gone. Buffers, however, need to be explicitly destroyed.

Also, since webgpu_global_context is a shared_ptr, its destructor is called automatically once all references to it are released.

#endif

delete ctx;
delete backend;

These are both allocated on the heap and leak if we don't delete them. We could maybe turn them into shared_ptr, but I don't know how that would behave once the pointer is passed around in the ggml lifecycle.
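
For reference, a sketch of the explicit-delete pattern with hypothetical types: backend_sketch carries its context as an untyped pointer (an assumption made for illustration, standing in for how the context travels through a C-style lifecycle), so the free function casts it back to its real type before deleting it.

```cpp
#include <cassert>

// Hypothetical stand-in for the backend context; counts live objects.
struct backend_ctx_sketch {
    int * live;
    explicit backend_ctx_sketch(int * counter) : live(counter) { ++*live; }
    ~backend_ctx_sketch() { --*live; }
};

// Hypothetical stand-in for the backend object, which stores its context untyped.
struct backend_sketch {
    int *  live;
    void * context;
    backend_sketch(int * counter, void * ctx) : live(counter), context(ctx) { ++*live; }
    ~backend_sketch() { --*live; }
};

// Both objects are heap-allocated at init time.
inline backend_sketch * backend_init_sketch(int * counter) {
    auto * ctx = new backend_ctx_sketch(counter);
    return new backend_sketch(counter, ctx);
}

// The free function must release both, in this order: the context is read
// from the backend, so the backend has to be deleted last.
inline void backend_free_sketch(backend_sketch * backend) {
    // Cast back to the concrete type first; deleting through void * would
    // not run the destructor.
    auto * ctx = static_cast<backend_ctx_sketch *>(backend->context);
    delete ctx;
    delete backend;
}

// Returns the number of objects still alive after the backend is freed.
inline int live_objects_after_free() {
    int live = 0;
    backend_sketch * backend = backend_init_sketch(&live);
    assert(live == 2);  // context + backend
    backend_free_sketch(backend);
    return live;
}
```

live_objects_after_free() returns 0 here. A shared_ptr would not fit this shape directly, since ownership is handed across an interface that only carries raw pointers, which is the concern raised in the comment above.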

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 4, 2026