Cythonize _module.py #1520
/ok to test 82d92c9
/ok to test fde13ae
/ok to test e9f2275
e9f2275 to 46056e6
/ok to test 46056e6
Convert Kernel, ObjectCode, and KernelOccupancy to cdef classes with proper .pxd declarations. This phase establishes the Cython structure while maintaining Python driver module usage.

Changes:
- Rename _module.py to _module.pyx
- Create _module.pxd with cdef class declarations
- Convert Kernel, ObjectCode, KernelOccupancy to cdef class
- Remove _backend dict in favor of direct driver calls
- Add _init_py() Python-accessible factory for ObjectCode
- Update _program.py and _linker.py to use _init_py()
- Fix test to handle cdef class property descriptors

Phase 2b will convert driver calls to cydriver with nogil blocks. Phase 2c will add RAII handles to resource_handles.

- Use strong types in .pxd (ObjectCode, KernelOccupancy)
- Remove cdef public - attributes now private to C level
- Add Kernel.handle property for external access
- Add ObjectCode.symbol_mapping property (symmetric with input)
- Update _launcher.pyx, _linker.py, tests to use public APIs

- Module globals: _inited, _py_major_ver, _py_minor_ver, _driver_ver, _kernel_ctypes, _paraminfo_supported -> cdef typed
- Module functions: _lazy_init, _get_py_major_ver, _get_py_minor_ver, _get_driver_ver, _get_kernel_ctypes, _is_paraminfo_supported, _make_dummy_library_handle -> cdef inline with exception specs
- Module constant: _supported_code_type -> cdef tuple
- Kernel._get_arguments_info -> cdef tuple

Note: KernelAttributes remains a regular Python class due to segfaults when converted to cdef class (likely due to weakref interaction with cdef class properties).

Follow the _MemPoolAttributes pattern:
- cdef class with inline cdef attributes (_kernel_weakref, _cache)
- _init as @classmethod (not @staticmethod cdef)
- _get_cached_attribute and _resolve_device_id use except? -1
- Explicit cast when dereferencing weakref

Extends the RAII handle system to support CUlibrary and CUkernel driver objects used in _module.pyx. This provides automatic lifetime management and proper cleanup for library and kernel handles.

Changes:
- Add LibraryHandle/KernelHandle types with factory functions
- Update Kernel, ObjectCode, KernelOccupancy to use typed handles
- Move KernelAttributes cdef block to .pxd for strong typing
- Update _launcher.pyx to access kernel handle directly via cdef

Replaces Python-level driver API calls with low-level cydriver calls wrapped in nogil blocks for improved performance. This allows the GIL to be released during CUDA driver operations.

Changes:
- cuDriverGetVersion, cuKernelGetAttribute, cuKernelGetParamInfo
- cuOccupancy* functions (with appropriate GIL handling for callbacks)
- cuKernelGetLibrary
- Update KernelAttributes._get_cached_attribute to use cydriver types

Remove type annotation from handle parameter to prevent Cython's automatic float-to-int coercion, which caused a segmentation fault. The manual isinstance check properly validates all non-int types.
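The manual check this commit describes can be sketched in plain Python. This is an illustration only: `check_handle` is a hypothetical name, not a function from cuda.core, and the real validation lives in the Cython source.

```python
def check_handle(handle):
    """Return `handle` unchanged if it is an int; raise TypeError otherwise.

    Hypothetical helper for illustration; not part of cuda.core.
    """
    # The parameter is left untyped, so no implicit conversion runs before
    # this check. Floats, strings, and None are all rejected here.
    if not isinstance(handle, int):
        raise TypeError(f"handle must be an int, got {type(handle).__name__}")
    return handle
```

Note that `bool` is a subclass of `int` in Python, so this sketch would accept `True` and `False`; whether the real check should reject them is a separate decision.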
- Change ObjectCode._init from cdef to @classmethod def, matching the pattern used by Buffer, Stream, Graph, and other classes
- Remove _init_py wrapper (no longer needed)
- Update callers in _program.py and _linker.py
- Add test_kernel_keeps_library_alive to verify that a Kernel keeps its underlying library alive after ObjectCode goes out of scope

- Remove Kernel._module (ObjectCode reference no longer needed since KernelHandle keeps library alive via LibraryHandle dependency)
- Simplify Kernel._from_obj signature (remove unused ObjectCode param)
- KernelAttributes: store KernelHandle instead of weakref to Kernel
- Rename get_kernel_from_library to create_kernel_handle for consistency
- Remove fragile annotation introspection from test_saxpy_arguments

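The lifetime dependency described above (a kernel handle keeping its library loaded) can be modelled with ordinary Python references. This is a rough sketch, not the C++ implementation: `DemoLibraryHandle`, `DemoKernelHandle`, and `run_lifetime_demo` are hypothetical stand-ins, and `weakref.finalize` plays the role of the unload that would run when the last owner releases the library.

```python
import gc
import weakref


class DemoLibraryHandle:
    """Hypothetical stand-in for a shared handle that unloads a library."""

    def __init__(self, name, events):
        self.name = name
        # Record the "unload" once no strong references to this object remain.
        weakref.finalize(self, events.append, f"unload {name}")


class DemoKernelHandle:
    """Hypothetical stand-in for a kernel handle that depends on its library."""

    def __init__(self, library, symbol):
        self._library = library  # strong reference keeps the library loaded
        self.symbol = symbol


def run_lifetime_demo():
    """Return the event log after each owner is dropped."""
    events = []
    library = DemoLibraryHandle("saxpy", events)  # as held by an ObjectCode
    kernel = DemoKernelHandle(library, "saxpy_kernel")

    del library  # the ObjectCode goes out of scope
    gc.collect()
    after_object_code = list(events)

    del kernel  # the last user of the library goes away
    gc.collect()
    after_kernel = list(events)
    return after_object_code, after_kernel
```

In this model the library is released only after the kernel is dropped, which is the property test_kernel_keeps_library_alive is described as verifying.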
46056e6 to 8053ee5
/ok to test 8053ee5
Replace weakref pattern with direct MemoryPoolHandle storage in _MemPoolAttributes. The handle's shared_ptr keeps the underlying pool alive, so attributes remain accessible after the MR is deleted. Note: _MemPool retains __weakref__ because the IPC subsystem uses WeakValueDictionary to track memory resources across processes.
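The difference between the two patterns can be sketched in plain Python. All class names below are hypothetical stand-ins, and the `RuntimeError` raised by the weak-reference variant is an assumption made for the illustration, not the behavior of the previous cuda.core code.

```python
import gc
import weakref


class DemoPool:
    """Hypothetical stand-in for the underlying memory pool."""

    def __init__(self):
        self.release_threshold = 0


class DemoMemoryResource:
    """Hypothetical stand-in for the memory resource (MR) owning a pool."""

    def __init__(self):
        self.pool = DemoPool()


class WeakAttributes:
    """Weakref pattern: reach the pool through a weak reference to the MR."""

    def __init__(self, mr):
        self._mr_ref = weakref.ref(mr)

    def release_threshold(self):
        mr = self._mr_ref()
        if mr is None:
            raise RuntimeError("memory resource has been destroyed")
        return mr.pool.release_threshold


class HandleAttributes:
    """Handle pattern: hold the pool directly, sharing ownership of it."""

    def __init__(self, mr):
        self._pool = mr.pool  # strong reference keeps the pool alive

    def release_threshold(self):
        return self._pool.release_threshold


def compare_patterns():
    """Return (weak pattern still works, value read via the handle pattern)."""
    mr = DemoMemoryResource()
    weak_attrs = WeakAttributes(mr)
    handle_attrs = HandleAttributes(mr)
    del mr
    gc.collect()
    try:
        weak_attrs.release_threshold()
        weak_ok = True
    except RuntimeError:
        weak_ok = False
    return weak_ok, handle_attrs.release_threshold()
```

After the MR is deleted, only the handle-based object can still read the attribute.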
Zero-initialize CUlaunchConfig struct to prevent garbage values in hStream field when no stream is provided. The driver dereferences hStream even when querying occupancy, causing access violations on some platforms (observed on Windows with RTX Pro 6000).
/ok to test be74127
    return h ? *h : 0;
}

inline CUlibrary as_cu(const LibraryHandle& h) noexcept {
Nit: Taking a LibraryHandle by const-ref feels wrong to me. My expectation is that a LibraryHandle should map to a cheap-to-copy type, so pass-by-const-ref wouldn't be necessary, or could even be a regression, as it would copy a larger pointer value.
Perhaps we could have a "handle type" be an index into a shared_ptr registry. That would enable you to do the cast operation to an object (e.g., to_cu) from the handle type.
I’m not sure there’s much I can really do here. LibraryHandle is a std::shared_ptr. We went through a lot of design iteration on the resource handles and ended up with this implementation. It seems like the real concern is around what the term “Handle” (as opposed to something like “Holder”) implies to different people, but I don’t think we should make substantive implementation changes based on that alone.
The const & here avoids two unnecessary atomic operations, which is strictly better with no downsides. There’s no change from the cuda.core developer’s perspective (as_cu is used exactly the same way). This function is an internal implementation detail, invisible to end users, and something we can modify at any time — it doesn’t paint us into a corner.
Adding another level of indirection through a table of std::shared_ptrs seems like unnecessary complexity. How would we even modify that table? We’d just end up needing another level of locks. One of the nice things about std::shared_ptr is that it already handles everything cleanly; i.e., it works well right out of the box.
rparolin
left a comment
generally lgtm, just added the 1 nit comment about the "handle type".
cdef class _MemPoolAttributes:
    cdef:
        MemoryPoolHandle _h_pool

    @staticmethod
    cdef _MemPoolAttributes _init(MemoryPoolHandle h_pool)

    cdef int _getattribute(self, cydriver.CUmemPool_attribute attr_enum, void* value) except? -1
The changes to _MemPoolAttributes follow a pattern set in the change to KernelAttributes. We no longer need to rely on weak references to share ownership with an "associated" class, as handle classes can be used instead.
rwgk
left a comment
Looks good to me.
This change made me realize that the resource_handles design creates a dependency aggregation hot spot (all the dependencies of Stream, Context, etc. are forced together). But maybe that's OK in this case, assuming the number of cuStream etc. types is small-ish and will never expand by a lot? And they are probably highly dependent on each other already? — This is just to log the observation.
/ok to test e492d54
This is a really good observation, and it actually reflects an intentional design choice, even though we didn’t explicitly discuss it at the time. The cost you’re pointing out is real: these files do become dependency hot spots. The upside is that it becomes much easier to introduce changes that apply uniformly to all handles. For example, we could enable logging whenever a handle is created or destroyed, or track handle operations to generate resource-leak reports. Those kinds of features operate over handles in a generic way and can share machinery (classes, macros, etc.), so having all the handles in one place helps. Like any design decision, it comes with trade-offs, but in this case the benefits seemed to outweigh the costs.
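To make the idea of generic handle tracking concrete, here is a small Python sketch. `HandleTracker` and its methods are hypothetical; nothing like it is claimed to exist in resource_handles today, and a real version would live in the C++ layer.

```python
class HandleTracker:
    """Hypothetical registry recording creation and destruction of handles."""

    def __init__(self):
        self.live = {}  # handle id -> kind, for handles not yet destroyed
        self.log = []  # chronological (event, kind, handle id) records
        self._next_id = 0

    def created(self, kind):
        """Register a new handle of the given kind and return its id."""
        handle_id = self._next_id
        self._next_id += 1
        self.live[handle_id] = kind
        self.log.append(("create", kind, handle_id))
        return handle_id

    def destroyed(self, handle_id):
        """Mark a handle as destroyed and log the event."""
        kind = self.live.pop(handle_id)
        self.log.append(("destroy", kind, handle_id))

    def leak_report(self):
        """Return the kinds of all handles that were never destroyed."""
        return sorted(self.live.values())
```

Because every handle kind would go through the same two calls, logging and leak reporting need to be written only once, which is the benefit the comment above attributes to keeping the handles together.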
Summary
Converts _module.py to Cython (_module.pyx) for improved performance, adding RAII-based resource handle management for CUkernel and CUlibrary driver objects.
- Kernel, ObjectCode, KernelOccupancy, and KernelAttributes to cdef class
- LibraryHandle and KernelHandle to the resource_handles C++ infrastructure
- cydriver calls wrapped in nogil blocks
- .pxd file for cross-module cimport support

Changes
- _module.py → _module.pyx with cdef class definitions
- _module.pxd with typed attribute and method declarations
- _cpp/resource_handles.{hpp,cpp} with library/kernel handle types
- _resource_handles.{pxd,pyx} with new handle functions
- _launcher.pyx to directly access kernel handles via cimport
- _linker.py, _program.py to use new factory methods

Test plan
- test_module.py tests pass
- test_program.py tests pass