-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[WebGPU] Plug memory leaks and free resources on shutdown #19315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[WebGPU] Plug memory leaks and free resources on shutdown #19315
Conversation
| ggml_webgpu_processed_shader result; | ||
| result.wgsl = preprocessor.preprocess(shader_src, defines); | ||
| result.variant = variant; | ||
| ggml_webgpu_flash_attn_shader_decisions * decisions = new ggml_webgpu_flash_attn_shader_decisions(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this into a shared_ptr because this was leaking on the heap since we never deallocated it.
| this->get_tensor_staging_buf.Destroy(); | ||
| #ifdef GGML_WEBGPU_DEBUG | ||
| debug_host_buf.Destroy(); | ||
| debug_dev_buf.Destroy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe other wgpu members, like Instance, Device, Queue, and Pipeline, are refcounted and delete automatically when all references are deleted. But Buffers need to be explicitly destroyed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also since webgpu_global_context is a shared_ptr, its destructor is automatically called once all references to it are deleted.
| #endif | ||
|
|
||
| delete ctx; | ||
| delete backend; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are both allocated on the heap and leak if we don't delete them. Maybe we could turn them into shared_ptr but I don't know how it would behave once the pointer is passed around in the ggml lifecycle.
This diff frees
wgpu::Buffers and in buffer pools on shutdown to prevent memory leaks on GPU. It also fixes memory leaks on the heap, where we allocatebackend,backend_ctx,buffer_ctx, anddecisionson the heap but never delete them. These are either explicitly deleted (wrt ggml lifecycle) or changed to be smart pointers.We implement destructors for our buffer pool structs,
webgpu_contextstruct andwebgpu_global_contextstruct. Sincewebgpu_global_contextis a refcounted smart pointer, it will destruct automatically when all thread contexts have been destroyed.We call free on all the buffers we allocate, and we explicitly free our buffer pools and debug/error/staging buffers.
Also, since we explicitly wait on all our callbacks, we do not have to worry about waiting for callbacks while shutting down.
Memory leak on the heap before.
No memory leak on the heap after.