Skip to content

Conversation

@Andy-Jost
Copy link
Contributor

@Andy-Jost Andy-Jost commented Jan 14, 2026

Summary

  • Remove noexcept from C++ resource handle functions that can throw
  • Update Cython declarations to use nogil except+ for proper exception translation
  • Ensure all custom deleters are exception-safe

Changes

1. Remove incorrect noexcept from throwing functions

Functions that allocate memory (via new, std::make_shared, or container operations) can throw std::bad_alloc. Marking them noexcept causes std::terminate on OOM instead of allowing graceful error handling.

Remove noexcept from 22 functions:

  • All create_* functions (context, stream, event, mempool, deviceptr)
  • All deviceptr_alloc* functions
  • get_primary_context, get_current_context, get_device_mempool
  • get_legacy_stream, get_per_thread_stream

Keep noexcept on 5 functions that truly cannot throw:

  • get_last_error, peek_last_error, clear_last_error
  • deallocation_stream, set_deallocation_stream

2. Update Cython declarations

Change from noexcept nogil to nogil except+ for all throwing functions, allowing Cython to translate C++ exceptions to Python exceptions.

3. Fix exception-unsafe deleters

Custom shared_ptr deleters must not throw (undefined behavior per C++ standard). Two deleters had potential issues:

  • clear_mempool_peer_access: Uses std::vector which can throw std::bad_alloc. Wrapped in try-catch.
  • IPC cache deleter: Uses std::lock_guard which can throw std::system_error. Wrapped cache operations in try-catch.

@Andy-Jost Andy-Jost added bug Something isn't working cuda.core Everything related to the cuda.core module labels Jan 14, 2026
@Andy-Jost Andy-Jost self-assigned this Jan 14, 2026
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Jan 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Copy link
Contributor Author

/ok to test 4e70324

@Andy-Jost Andy-Jost marked this pull request as draft January 14, 2026 18:31
@github-actions

This comment has been minimized.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I did not scrutinize the case-by-case keep/remove decisions for the noexcepts. I assume an LLM was used to figure out those details.

@cpcloud
Copy link
Contributor

cpcloud commented Jan 14, 2026

Should we add some cpp linting now that we have a significant amount of code to maintain there?

Not in this PR of course :)

Functions that allocate memory (via new, std::make_shared, or container
operations) can throw std::bad_alloc. Marking them noexcept causes
std::terminate on OOM instead of allowing graceful error handling.

Changes:
- Remove noexcept from 22 C++ functions that can throw
- Keep noexcept on 5 functions that truly cannot throw:
  get_last_error, peek_last_error, clear_last_error,
  deallocation_stream, set_deallocation_stream
- Update Cython .pxd/.pyx to use "nogil except+" for throwing functions
- Wrap clear_mempool_peer_access in try-catch (std::vector can throw)
- Wrap IPC cache deleter operations in try-catch (std::lock_guard can throw)

This allows C++ exceptions to be translated to Python exceptions by
Cython, enabling proper error handling instead of process termination.

Based on feedback from Leo Fang, Lawrence Mitchell, and Vyas Ramasubramani.
@Andy-Jost Andy-Jost force-pushed the fix-noexcept-resource-handles branch from 4e70324 to 2e1a8dc Compare January 14, 2026 18:48
@Andy-Jost
Copy link
Contributor Author

Should we add some cpp linting now that we have a significant amount of code to maintain there?

Not in this PR of course :)

yes!

@Andy-Jost
Copy link
Contributor Author

/ok to test 2e1a8dc

@Andy-Jost Andy-Jost marked this pull request as ready for review January 14, 2026 18:53
@Andy-Jost Andy-Jost merged commit b8261e5 into NVIDIA:main Jan 14, 2026
80 checks passed
@Andy-Jost Andy-Jost deleted the fix-noexcept-resource-handles branch January 14, 2026 20:29
@github-actions
Copy link

Doc Preview CI
Preview removed because the pull request was closed or merged.

@leofang leofang added this to the cuda.core beta 12 milestone Jan 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cuda.core Everything related to the cuda.core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants