
Add cudf table packing utilities with memory type support#843

Open

nirandaperera wants to merge 10 commits into rapidsai:main from nirandaperera:better_cudf_pack_usage2

Conversation

Contributor

@nirandaperera nirandaperera commented Feb 3, 2026

Introduces pack.hpp and pack.cpp with utilities for packing cudf tables into contiguous buffers with explicit memory type control. This enables packing directly to device, pinned host, or host memory using the rapidsmpf memory reservation system.

Depends on #840

Changes

New files:

  • cpp/include/rapidsmpf/integrations/cudf/pack.hpp
  • cpp/src/integrations/cudf/pack.cpp
  • cpp/tests/test_cudf_pack.cpp

API:

  • detail::pack<MemoryType>(): templated pack with a compile-time memory type
  • pack(): runtime dispatch based on the reservation's memory type
  • chunked_pack(): uses cudf::chunked_pack with a bounce buffer for memory-efficient packing to host

Refactored:

  • TableChunk::copy() now uses detail::pack<> for host memory destinations
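The split between a compile-time `detail::pack<MemoryType>()` and a runtime `pack()` described above can be sketched as follows. This is a minimal, self-contained illustration of the dispatch pattern only; the enum values and function names are simplified stand-ins for the real rapidsmpf API, which also takes a cudf table, a stream, and a memory reservation.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for rapidsmpf's memory type enum (assumed shape, not the real header).
enum class MemoryType { DEVICE, PINNED_HOST, HOST };

namespace detail {
// Compile-time memory type, mirroring detail::pack<MemoryType>() in the PR.
// Returns a tag string here instead of a PackedData, purely for illustration.
template <MemoryType M>
std::string pack() {
    if constexpr (M == MemoryType::DEVICE) {
        return "device";
    } else if constexpr (M == MemoryType::PINNED_HOST) {
        return "pinned_host";
    } else {
        return "host";
    }
}
}  // namespace detail

// Runtime dispatch based on the reservation's memory type, mirroring pack():
// each branch forwards to the statically specialized implementation.
std::string pack(MemoryType mem_type) {
    switch (mem_type) {
    case MemoryType::DEVICE:
        return detail::pack<MemoryType::DEVICE>();
    case MemoryType::PINNED_HOST:
        return detail::pack<MemoryType::PINNED_HOST>();
    case MemoryType::HOST:
        return detail::pack<MemoryType::HOST>();
    }
    throw std::invalid_argument{"unknown memory type"};
}
```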

@nirandaperera nirandaperera requested review from a team as code owners February 3, 2026 23:41
@nirandaperera nirandaperera marked this pull request as draft February 3, 2026 23:41

copy-pr-bot bot commented Feb 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: niranda perera <niranda.perera@gmail.com>
Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera nirandaperera force-pushed the better_cudf_pack_usage2 branch from 7dc13db to bdf915a Compare February 5, 2026 18:52
@nirandaperera nirandaperera added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Feb 5, 2026
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Signed-off-by: niranda perera <niranda.perera@gmail.com>
@nirandaperera nirandaperera marked this pull request as ready for review February 6, 2026 16:00
Contributor

@wence- wence- left a comment


I am a bit concerned that the logic here is more complicated than it needs to be.

Do we really need special case code for tables less than 1MiB in size, for example?

Comment on lines +322 to +323
* @note If the @p data is empty, the resulting Buffer will be DEVICE memory type.
*
Contributor

question: Is this an important property of the implementation?

Contributor Author

This came from the cuda::is_host_accessible check, which was returning true for empty rmm::device_buffers.
I have replaced it with a new move for Host buffers, so this will no longer be relevant.

Comment on lines +312 to +316
* @brief Move rmm::device_buffer (resides in device or pinned host memory) into a
* Buffer.
*
* This operation is cheap; no copy is performed. The resulting Buffer resides in
* device memory.
* device memory or pinned host memory.
Contributor

question: Why do we need to document the possible locations of the device_buffer? This is also slightly a lie, right? If the buffer was allocated with a managed memory resource then it might physically be located in host memory.

Contributor Author

I moved this to a separate method. So, this is not relevant now.

Comment on lines +71 to +82
/**
* @brief Get the memory types that are device accessible.
*
* @return A span of memory types that are device accessible.
*/
constexpr std::span<MemoryType const> device_accessible_memory_types() noexcept {
return std::span{MEMORY_TYPES}.first<2>();
}

static_assert(std::ranges::equal(
device_accessible_memory_types(),
std::array{MemoryType::DEVICE, MemoryType::PINNED_HOST}
Contributor

question: Why not just return a std::array of the two memory types?

Contributor Author

I wanted to make sure that if we meddle with the MEMORY_TYPES array in the future, we fail statically if something's not right.

Comment on lines +116 to +123
cudf::chunked_pack cpack(
table, cudf_chunked_pack_min_buffer_size, stream, br->device_mr()
);
RAPIDSMPF_EXPECTS(
cpack.get_total_contiguous_size() == 0,
"empty table should have 0 contiguous size",
std::runtime_error
);
Contributor

Use cudf::pack_metadata

auto const est_size = std::max(raw_est_size, cudf_chunked_pack_min_buffer_size);

// max available memory for bounce buffer
auto max_availble = [](size_t res_sz, size_t ob) -> size_t {
Contributor

Typo: max_availble should be max_available.

* 7. If all attempts fail, throw an error.
*/
template <>
std::unique_ptr<PackedData> pack<MemoryType::HOST>(
Contributor

There's no code sharing between the different templated pack functions, so I don't think it is worth adding templates. Let's just call these pack_device, pack_pinned, pack_host.

Or, find ways of sharing code (e.g. above for pinned and device).

Comment on lines +176 to +177

if (br->is_pinned_memory_available()) { // try pinned memory as fallback
Contributor

Suggested change
if (br->is_pinned_memory_available()) { // try pinned memory as fallback
if (br->is_pinned_memory_available()) {

dev_res.clear();
return chunked_pack(table, stream, bounce_buffer, br->device_mr(), reservation);
}
dev_res.clear(); // Release unused device reservation.
Contributor

You could use a scoped block and not have to do this manually.
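A minimal sketch of the scoped-block suggestion, under the assumption that the reservation releases its hold on destruction (the stand-in `Reservation` below is hypothetical, not the real rapidsmpf MemoryReservation): letting the object fall out of a nested scope replaces the manual `dev_res.clear()` call.

```cpp
// Hypothetical stand-in for a memory reservation that releases itself
// when destroyed; the flag lets us observe the release in a test.
struct Reservation {
    bool* released;
    ~Reservation() { *released = true; }
};

bool pack_with_scoped_reservation() {
    bool released = false;
    {
        Reservation dev_res{&released};
        // ... allocate the bounce buffer from dev_res here ...
    }  // dev_res destroyed here: no manual dev_res.clear() needed
    // Continue with the chunked pack; the device reservation is already gone.
    return released;
}
```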

auto dev_avail = max_availble(dev_res.size(), dev_ob);
if (dev_avail >= cudf_chunked_pack_min_buffer_size) {
rmm::device_buffer bounce_buffer(dev_avail, stream, br->device_mr());
dev_res.clear();
Contributor

This will happen at the end of the block on the next line, or you could use br->allocate with your reservation and then get the device buffer out I suppose.

Contributor Author

br->allocate gives a Buffer, so I'd have to move that again. That felt too verbose.
