Add benchmark configuration output to support posting to DB #239
misiugodfrey wants to merge 26 commits into main from
Conversation
TomAugspurger
left a comment
The output sample you shared looks good to me.
paul-aiyedun
left a comment
The overall idea makes sense to me. However, I had questions about some of the implementation details, Slurm specific logic, and assumptions being made.
Also, please update the PR title to reflect this update.
| coordinator=false
| # Worker REST/HTTP port for internal and admin endpoints.
| http-server.http.port=8080
| http-server.http.port=10000
Is this an intentional update?
Yes. The worker's default port should not be the same as the coordinator's port. This change makes things clearer, and the defaults are in sync with both the multi-worker case and the Slurm case (where these values will be overwritten).
Without this change, you can't access the worker node's API for single-worker sessions, which can still be necessary to get some information.
After our discussion, I've reverted this change, although I may need to make further changes once testing can resume on Slurm clusters.
presto/scripts/run_benchmark.sh
Outdated
| -m, --metrics Collect detailed metrics from Presto REST API after each query.
| Metrics are stored in query-specific directories.
|
| ENVIRONMENT:
Can we use command line arguments instead of environment variables?
I've only left the DEBUG option as it would be tedious to pass that through the entire chain of scripts. If we want to change that further, we can, but I think this is cleaner.
What is the chain of scripts in this case (I believe run_benchmark.sh is a top-level script)? Besides the debug variable, there is also LOGS below.
run_benchmark.sh is the top-level script, but the PRESTO_BENCHMARK_DEBUG context is used by the python scripts that are called by run_benchmark.sh.
As for LOGS, this value is supposed to reflect the hard-coded directory structure that the docker containers mount (although it can be changed in the Slurm context). For now I'm removing this comment, because it should not be changeable unless we can also change it in the docker templates. Easier to just have one hard-coded path for now.
presto/scripts/run_benchmark.sh
Outdated
| PRESTO_BENCHMARK_DEBUG Set to 1 to print debug logs for worker/engine detection
| (e.g. node URIs, reachability, metrics, Docker containers).
| Use when engine is misdetected or the run fails.
| Docker In Docker setups, engine is inferred from running worker
Can you please expand on this?
What's happening here is that some of the details of the run are inferred from the name of the image that is running. If presto-native-worker-gpu is running, then it infers that this is a velox-gpu run. If docker is not running, then it checks whether it's in a Slurm environment, in which case it uses other techniques based on how the Slurm scripts work.
To clarify, this isn't an environment variable, just a note that the environment in which it is running will affect this path. I can make the comments clearer.
| @@ -32,3 +32,12 @@ class BenchmarkKeys(str, Enum):
| CONTEXT_KEY = "context"
| ITERATIONS_COUNT_KEY = "iterations_count"
| SCHEMA_NAME_KEY = "schema_name"
| SCALE_FACTOR_KEY = "scale_factor"
Can we copy over the metadata.json file from the dataset instead of duplicating some of the details here?
Do you mean copying the metadata.json file directly, or referencing the data inside it? Right now the scale factor can be overridden in our options when running benchmarks, so we either take the override if provided, or read the scale factor from the metadata.json.
I've removed this key since it is no longer used. The code that extracts the scale factor is in run_context.py:get_scale_factor_from_schema()
| "benchmark": benchmark_types[0] if len(benchmark_types) == 1 else benchmark_types,
| **run_config,
| }
| with open(f"{bench_output_dir}/benchmark_config.json", "w") as file:
The existing JSON file has a "context" field. Can we extend that instead of writing to a new file?
| def get_gpu_name_from_slurm_logs() -> str | None:
| """
| When running under SLURM, workers run nvidia-smi -L and write to LOGS/worker_<id>.log.
If there is a need to log the output of nvidia-smi, then we should probably do this consistently (and not just for slurm).
The current approach gets the GPU details by running nvidia-smi in both contexts; it's just that in the Docker context we run it on the host, whereas in the Slurm context the login nodes don't have nvidia-smi installed, so you need to run it on the worker node (i.e. in the worker's container).
We could potentially try to consolidate this, but it's more an issue of how the Slurm clusters encapsulate things, which required extra steps in that context.
| def get_engine_from_slurm() -> str | None:
| """
| Infer engine when running under SLURM from nvidia-smi -L output in LOGS/worker_0.log.
Can we standardize how this is done instead of having a slurm specific solution?
| def get_gpu_name() -> str | None:
| """
| Return GPU model name. Under SLURM, read from LOGS/worker_<id>.log if LOGS is set;
Same comment as above.
| def get_worker_image() -> str | None:
| """Return worker image name from env (set by cluster/container setup)."""
| return os.environ.get("WORKER_IMAGE")
This is only set in the Slurm scripts. I'm removing this section since it doesn't make much sense to specify a WORKER_IMAGE name when we are detecting the engine/variant through other means.
| return os.environ.get("WORKER_IMAGE")
| def _current_username() -> str:
We plan on adding the ability to run presto with existing images. I don't think we should make assumptions about image/tag names.
I've changed this to determine the engine/variant through the presto API, so this is all removed now.
| - ../../config/generated/gpu/etc_worker/config_native.properties:/opt/presto-server/etc/config.properties
| - ../../config/generated/gpu/etc_worker/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
| - ../../worker_logs:/opt/presto-server/logs
| - ../../launch_presto_servers.sh:/opt/launch_presto_servers.sh:ro
Mounting this file was necessary so that older images can still benefit from the new version.
TomAugspurger
left a comment
The changes to post_results.py look good.
paul-aiyedun
left a comment
Changes overall look good to me. However, I had a few questions, code cleanup comments, and comments about logging.
| └── result_dir
| └── benchmark_result.json
| └── logs # optional
| └── slurm-4575179.out
Nit: Should this still be Slurm specific?
| SCHEMA_NAME_KEY = "schema_name"
| # Run configuration (from run context; written to context in benchmark_result.json)
| TIMESTAMP_KEY = "timestamp"
| NUM_WORKERS_KEY = "n_workers"
Nit: Can this be worker_count (consistent with other keys)?
presto/scripts/run_benchmark.sh
Outdated
| -m, --metrics Collect detailed metrics from Presto REST API after each query.
| Metrics are stored in query-specific directories.
|
| ENVIRONMENT:
What is the chain of scripts in this case (I believe run_benchmark.sh is a top-level script)? Besides the debug variable, there is also LOGS below.
| - ../../config/generated/gpu/etc_worker/node.properties:/opt/presto-server/etc/node.properties
| - ../../config/generated/gpu/etc_worker/config_native.properties:/opt/presto-server/etc/config.properties
| - ../../config/generated/gpu/etc_worker/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
| - ../../worker_logs:/opt/presto-server/logs
Is there a reason for limiting this to just worker logs?
| - ../../config/generated/gpu/etc_worker_{{ gpu_id }}/node.properties:/opt/presto-server/etc/node.properties
| - ../../config/generated/gpu/etc_worker_{{ gpu_id }}/config_native.properties:/opt/presto-server/etc/config.properties
| - ../../config/generated/gpu/etc_worker_{{ gpu_id }}/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
| - ../../worker_logs:/opt/presto-server/logs
Instead of duplicating this update in all the variant docker-compose files, can this be set in docker-compose.common.yml?
I've moved this to presto-base-volumes so that it is shared now.
| pass
| def _parse_gpu_name_from_text(line: str) -> str | None:
Can we avoid this by directly printing the GPU name i.e. nvidia-smi --query-gpu=name --format=csv,noheader?
We could do this, but keep in mind that this needs to be run in the container, so the docker logs would still need to be parsed. With the above suggestion it would be less obvious that we are getting nvidia-smi output (no "GPU" anchor), and we would have to assume that the first line gives us this information.
I think it's cleaner to use the nvidia-smi -L output, as it's clear what that is, and if it's missing, the parse will fail, rather than assuming the first line of the logs will be a GPU name.
| }
| def _get_num_drivers(engine: str) -> int | None:
Can we also get this from the logs?
| if n_workers is not None:
|     ctx["n_workers"] = n_workers
|     ctx["kind"] = "single-node" if n_workers == 1 else f"{n_workers}-node"
Out of curiosity, what is the purpose of the kind field?
It is one of the columns specified in the database. I believe the intention is to separate single-node from multi-node runs. With the information we are providing, I think it is a little redundant (since we have num_workers), but may be more necessary for other platforms.
| return ctx
| def pytest_sessionfinish(session, exitstatus):
Is this duplicating logic in ?
| @@ -15,27 +15,28 @@
| expected directory structure is:
| ../benchmark-root/
What script is creating a directory with this structure?
This occurs in conftest.py:pytest_sessionfinish(). The top-level script would just be run_benchmark.sh
paul-aiyedun
left a comment
I had a couple of comments, but changes look good to me.
| if hasattr(session, "benchmark_results"):
|     benchmark_types = list(session.benchmark_results.keys())
|     json_result[BenchmarkKeys.CONTEXT_KEY]["benchmark"] = (
|         benchmark_types[0] if len(benchmark_types) == 1 else benchmark_types
Is there a reason not to have the value be consistently a list?
| hostname = session.config.getoption("--hostname")
| port = session.config.getoption("--port")
| user = session.config.getoption("--user")
| schema_name = session.config.getoption("--schema-name")
|
| ctx = gather_run_context(
|     hostname=hostname,
|     port=port,
|     user=user,
|     schema_name=schema_name,
| )
These are presto specific options. I think we can set run_context in a presto specific fixture (similar to
This PR consolidates benchmark result reporting, unifies engine/GPU detection across Docker and Slurm environments, and refactors shared test infrastructure into a common module. The goal is to provide an interface for automatically posting benchmark results to the benchmarking DB via the post_results.py script.
Run context collection (run_context.py):
Automatically gathers engine type, GPU model, scale factor, worker count, and num_drivers at benchmark time and writes them into the "context" section of benchmark_result.json, which can then be parsed as part of post_results.py. E.g.:
Unified engine and GPU detection:
It was necessary to gather information about a running cluster to put into the benchmark context. To that end, some changes were needed to provide the appropriate information at runtime (rather than trying to predict context from things like image names), as well as to unify how we present this information across the Docker and Slurm scripts.
Engine detection via cluster-tag: All variants (GPU, CPU, Java) now consistently set cluster-tag in the coordinator config. run_context.py queries the Presto /v1/cluster API to determine the engine type, replacing the previous approach of parsing Docker image names or SLURM logs.
GPU model detection via worker logs: Docker worker containers now run nvidia-smi -L at startup and write output to worker_logs/worker_.log, matching the existing SLURM behavior. run_context.py reads these logs uniformly regardless of the deployment environment.
launch_presto_servers.sh was rewritten to capture nvidia-smi output and redirect presto_server stdout/stderr to per-worker log files. It is mounted as a read-only volume so changes don't require image rebuilds.
worker_logs volume mount added to all worker services (GPU, CPU, Java) in Docker Compose files.