Enable NUMA-aware GPU worker binding, config updates after fixes by karthikeyann · Pull Request #271 · rapidsai/velox-testing

karthikeyann · 2026-03-13T04:25:37Z

Optimizes multi-GPU Presto worker performance with NUMA-aware process pinning, cuDF batching tuning, and AUTOMATIC join distribution.

NUMA-aware GPU binding

Rewrote launch_presto_servers.sh to auto-detect each GPU's closest NUMA node via nvidia-smi topo and pin the corresponding worker to it using numactl.
Installed numactl in the Docker image, set privileged: true on the containers, and mounted /sys/devices/system/node so the NUMA topology is visible inside them.
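
The pinning step can be illustrated with a minimal sketch. It is not the code in launch_presto_servers.sh; it only shows how a worker command might be prefixed with numactl once a GPU-to-NUMA mapping is known. The mapping `gpu_to_numa`, the `worker_binary` name, and the choice to bind both CPUs and memory to the same node are assumptions made for illustration.

```python
import shlex


def numactl_command(numa_node, worker_cmd):
    """Prefix worker_cmd so its CPUs and memory are bound to one NUMA node.

    A negative numa_node is treated as "affinity unknown" and the command
    is returned unpinned (an assumption of this sketch).
    """
    if numa_node < 0:
        return list(worker_cmd)
    return [
        "numactl",
        f"--cpunodebind={numa_node}",  # run only on CPUs of this node
        f"--membind={numa_node}",      # allocate memory only from this node
        *worker_cmd,
    ]


# Hypothetical mapping of GPU index -> closest NUMA node; in the PR this
# information comes from nvidia-smi topo.
gpu_to_numa = {0: 0, 1: 0, 2: 1, 3: 1}
worker_cmd = ["worker_binary"]  # placeholder for the real worker launch command

for gpu, node in sorted(gpu_to_numa.items()):
    print(f"GPU {gpu}:", shlex.join(numactl_command(node, worker_cmd)))
```

Binding memory as well as CPUs keeps host-side buffers on the node nearest the GPU, which is presumably the intent of the "update numa memory bind" commit.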

cuDF performance tuning

Added to config_native.properties:

cudf.jit_expression_enabled=false — avoids JIT warmup penalty across workers.
cudf.intra_node_exchange=true — required for UCX to use NVLink.
cudf.partitioned_output_batch_rows=100000000 — reduces exchange overhead.
cudf.concat_optimization_enabled=true / cudf.batch_size_min_threshold=100000000 — enables rebatching before aggregations.
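
As a rough way to confirm that a generated config_native.properties carries these settings, the sketch below parses properties text and reports any key whose value differs from the list above. The helper names `parse_properties` and `missing_or_different` are hypothetical, and the parser handles only simple `key=value` lines and `#` comments.

```python
# Settings listed in this PR description for config_native.properties.
EXPECTED = {
    "cudf.jit_expression_enabled": "false",
    "cudf.intra_node_exchange": "true",
    "cudf.partitioned_output_batch_rows": "100000000",
    "cudf.concat_optimization_enabled": "true",
    "cudf.batch_size_min_threshold": "100000000",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_or_different(props, expected=EXPECTED):
    """Return {key: (expected, actual)} for every setting that does not match."""
    return {
        key: (want, props.get(key))
        for key, want in expected.items()
        if props.get(key) != want
    }


# Usage: build a sample file body from the expected settings and check it.
sample = "\n".join(f"{key}={value}" for key, value in EXPECTED.items())
print(missing_or_different(parse_properties(sample)))
```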

Join distribution

Removed the forced PARTITIONED override in generate_presto_config.sh. UCX exchange now supports BROADCAST joins, so the optimizer can choose the join distribution automatically.

Bug fix

Fixed start_presto_helper.sh to only set GPU_WORKER_SERVICE when SINGLE_CONTAINER=false.

Test Plan

Verify NUMA binding for single-GPU and multi-GPU deployments (check worker logs)
Run TPC-H/TPC-DS at SF1K+ — no regressions
Confirm AUTOMATIC join distribution selects BROADCAST where beneficial (EXPLAIN plans)
Validate batching settings don't cause OOM on large queries
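
For the EXPLAIN check in the test plan, a small helper can tally join distributions in a plan. This sketch assumes the text plan marks each join with a `Distribution:` line and that broadcast joins are reported as REPLICATED; the sample plan text is illustrative, not output from a real query.

```python
import re
from collections import Counter

# Matches lines such as "Distribution: REPLICATED" in a text plan.
_DISTRIBUTION = re.compile(r"Distribution:\s*(\w+)")


def join_distributions(plan_text):
    """Count how often each join distribution appears in an EXPLAIN plan."""
    return Counter(_DISTRIBUTION.findall(plan_text))


# Illustrative plan fragment (hypothetical, heavily abbreviated).
sample_plan = """
- InnerJoin[...]
        Distribution: REPLICATED
    - InnerJoin[...]
            Distribution: PARTITIONED
"""

counts = join_distributions(sample_plan)
print(dict(counts))
```

A plan that shows only PARTITIONED joins on queries with small build sides would suggest the AUTOMATIC setting is not taking effect.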

karthikeyann added 4 commits March 12, 2026 00:26

80bad87 batching, join-distribution=AUTOMATIC
2841690 Enable NUMA binding
f2be0cb fix build service for single container
744f4ed update numa memory bind

karthikeyann requested review from GregoryKimball, devavret, misiugodfrey and quasiben March 13, 2026 04:26

devavret reviewed Mar 13, 2026

presto/scripts/generate_presto_config.sh (outdated, resolved)

devavret approved these changes Mar 13, 2026

6ce40e4 remove join-distribution-type editing in multi-workers config.

karthikeyann wants to merge 5 commits into main from GTC_2026
