Support multi-threaded decompression in the GPU shuffle reader #11779

@marin-ma

Description

The shuffle read process consists of data fetching (Java) followed by decompression and deserialization (native). Currently, GPU shuffle reading can be blocked by a global GPU lock. Reading the shuffle streams and decompressing them on multiple threads can speed up the shuffle read.

The asynchronous execution in the native shuffle reader parallelizes only decompression and deserialization. The timing of data fetching still depends on when the Velox pipeline triggers the shuffle read.
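
To make the idea concrete, here is a minimal sketch of decoding already-fetched shuffle blocks on a thread pool while still handing batches to the consumer in stream order. It is written in Python purely for illustration and is not Gluten's native reader: `make_block`, `decode_block`, `read_shuffle`, and `num_threads` are hypothetical names, and `zlib` plus `pickle` stand in for whatever codec and serialization format the real shuffle uses.

```python
import pickle
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_block(rows):
    """Serialize and compress one block (hypothetical stand-in for the shuffle writer)."""
    return zlib.compress(pickle.dumps(rows))


def decode_block(block):
    """Decompress and deserialize one block; this is the work moved onto worker threads."""
    return pickle.loads(zlib.decompress(block))


def read_shuffle(blocks, num_threads=4):
    """Yield decoded batches in stream order, decoding blocks on a thread pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() submits every block up front and yields results in input order,
        # so later blocks are decoded while the consumer handles earlier ones.
        yield from pool.map(decode_block, blocks)


# Four small blocks that are assumed to have been fetched already.
blocks = [make_block(list(range(i * 3, i * 3 + 3))) for i in range(4)]
batches = list(read_shuffle(blocks))
```

Because `read_shuffle` is a generator, no decoding starts until the consumer pulls the first batch. That loosely mirrors the point above: the worker threads overlap decompression and deserialization, but when reading begins is still decided by the consumer, which in the real system is the Velox pipeline.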

Gluten version

None

Labels: enhancement