
Add script for posting nightly benchmark results#203

Merged
paul-aiyedun merged 32 commits into rapidsai:main from TomAugspurger:tom/nightly-benchmark
Mar 2, 2026

Conversation

@TomAugspurger
Contributor

This adds a script for posting the results of a benchmark run to a database. Each run would result in

  1. A new "QueryEngine" configuration, which identifies the exact software versions used to run the benchmark
  2. A new "BenchmarkRun", which identifies the hardware, query engine, storage configuration, benchmark definition, query engine configuration, and more
  3. Many new "QueryLogs", one per individual query iteration executed

Here's a sample of the document we'd post:

{
  "sku_name": "pdx-h100",
  "storage_configuration_name": "test",
  "benchmark_definition_name": "tpch-100",
  "cache_state": "warm",
  "query_engine": {
    "engine_name": "velox",
    "identifier_hash": "121e1e62b5426195",
    "version": "unknown",
    "commit_hash": "unknown"
  },
  "run_at": "2026-01-28T18:38:02+00:00",
  "node_count": 1,
  "query_logs": [
    {
      "query_name": "1",
      "execution_order": 0,
      "runtime_ms": 3982.0,
      "status": "success",
      "extra_info": {
        "execution_number": 1
      }
    },
    {
      "query_name": "1",
      "execution_order": 1,
      "runtime_ms": 1509.0,
      "status": "success",
      "extra_info": {
        "execution_number": 2
      }
      // repeated for each execution of each query
    }
  ],
  "concurrency_streams": 1,
  "engine_config": {
    "coordinator": {
      "coordinator": "true",
      "node-scheduler.include-coordinator": "false",
      "http-server.http.port": "9200",
      "discovery-server.enabled": "true",
      "discovery.uri": "http://gpu-h100-0017:9200"
      // ...
    },
    "worker": {
      "coordinator": "false",
      "http-server.http.port": "9000",
      "discovery.uri": "http://gpu-h100-0017:9200",
      "presto.version": "testversion",
      "system-memory-gb": "240",
      "query-memory-gb": "228",
      // ...
    }
  },
  "extra_info": {
    "kind": "single-node",
    "gpu_count": 1,
    "gpu_name": "H100",
    "num_drivers": 2,
    "worker_image": "presto-native-worker-gpu",
    "execution_number": 1
  },
  "is_official": false
}
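The posting flow described above can be sketched as a small client. This is a minimal illustration, not the PR's actual implementation: the required-key set is taken from the sample document, the `BENCHMARK_API_URL` and `BENCHMARK_API_KEY` environment variables match the ones used later in this thread, and the `/benchmark-runs` path is a placeholder.

```python
import json
import os
import urllib.request

# Top-level keys present in the sample document above.
REQUIRED_KEYS = {
    "sku_name",
    "storage_configuration_name",
    "benchmark_definition_name",
    "cache_state",
    "query_engine",
    "run_at",
    "query_logs",
}


def validate_payload(payload: dict) -> list[str]:
    """Return any required top-level keys missing from the payload."""
    return sorted(REQUIRED_KEYS - payload.keys())


def post_results(payload: dict) -> None:
    """POST the benchmark document to the results API (endpoint path is hypothetical)."""
    missing = validate_payload(payload)
    if missing:
        raise ValueError(f"payload is missing required keys: {missing}")
    request = urllib.request.Request(
        os.environ["BENCHMARK_API_URL"] + "/benchmark-runs",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['BENCHMARK_API_KEY']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()
```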

Contributor

@misiugodfrey misiugodfrey left a comment

Thanks for putting this together, Tom.

I think most of this should work as-is. The only issue is that we haven't standardized some of this expected output in upstream main yet (the output this is based on is from an experimental set of branches). This should be easy now that we have an upstream schema to store in.

I think we just need to standardize our output and then sync this on top.


Expects coordinator.config and worker.config files.
"""
coordinator_config = parse_config_file(configs_dir / "coordinator.config")
Contributor

In our presto environments these files are going to be in a configs directory under <configs>/etc_worker/config_native.properties and <configs>/etc_coordinator/config_native.properties
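For reference, a minimal sketch of what `parse_config_file` could look like for these Presto properties-style files (`key=value` lines, `#` comments); the PR's actual implementation may differ.

```python
from pathlib import Path


def parse_config_file(path: Path) -> dict[str, str]:
    """Parse a Presto-style properties file (key=value lines, '#' comments) into a dict."""
    config: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```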

@TomAugspurger

This comment was marked as outdated.

@TomAugspurger
Contributor Author

This is updated after #211 was changed to write these results to the benchmark_result.json file under the raw_times_ms key.

@TomAugspurger TomAugspurger marked this pull request as ready for review February 12, 2026 14:46
@TomAugspurger
Contributor Author

This should just be waiting on #239 and possibly #240, which will generate the benchmark.json and configs with some information we'd like to store.

After manually running those, we can post the results:

export BENCHMARK_API_URL=...
export BENCHMARK_API_KEY=...

uv run benchmark_data_tools/post_results.py \
	/mnt/data/toaugspurger/velox-testing/ \
	--sku-name coreweave-gb200-nvl72 \
	--storage-configuration-name coreweave-use13a-slurm-data-tpch-rs \
	--benchmark-definition-name tpch-rs-1000 \
	--cache-state warm

@TomAugspurger TomAugspurger changed the title from "Add script for posting nigthly benchmark results" to "Add script for posting nightly benchmark results" on Feb 17, 2026
@TomAugspurger
Contributor Author

This should be good to go, at least as far as getting basic timings & status posted.

# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# /// script
Contributor Author

I'd recommend running this with uv run benchmark_data_tools/post_results.py and the dependencies will be automatically satisfied. But LMK if there's a spot I should add this dep (I didn't see any groups in https://github.com/rapidsai/velox-testing/blob/main/pyproject.toml).

Contributor

@paul-aiyedun paul-aiyedun left a comment

Changes overall look good to me. Although, I think there is a mismatch between the expected benchmark output and what is currently generated.

This script operates on the parsed output of the benchmark runner. The
expected directory structure is:

../benchmark-root/
Contributor

The current benchmarking script only creates a benchmark_result.json file, so this expectation does not match what is implemented.

Contributor Author

I was looking at `}' > ${OUTPUT_PREFIX}/benchmark.json`. You're saying that's not always available?

If not, I'll add CLI options and make the benchmark.json optional.

Contributor

No, I think we need to consolidate some of the updates in that script (cc @misiugodfrey).

help="Storage configuration name",
)
parser.add_argument(
"--cache-state",
Contributor

Can you please expand on this?

Contributor Author

It's part of the results schema currently, but I don't think it's well defined. At least it isn't clear to me :/

Comment on lines +196 to +205
parser.add_argument(
"--identifier-hash",
default=None,
help="Unique identifier hash for the query engine version",
)
parser.add_argument(
"--version",
default=None,
help="Version string for the query engine",
)
Contributor

What is the difference between these two parameters?

Contributor Author

Oops I messed up the description of identifier-hash. It's supposed to be a description of the entire software environment, something like the container image digest. I'll get that fixed, thanks.

Contributor

Got it. Thanks.

@TomAugspurger
Contributor Author

I've added CLI options for all the values we hope to get from #239. In general, I'd strongly recommend that we derive those from the values recorded by the actual benchmark execution. That way there's no possibility of accidentally passing one thing to the benchmark runner and something else to the tool recording the results.

Here's an example using those new flags:

uv run benchmark_reporting_tools/post_results.py data \
	--sku-name h100 \
	--storage-configuration-name storage \
	--benchmark-name=benchmark \
	--identifier-hash=idhash \
	--dry-run \
	--kind single \
	--benchmark tpch \
	--timestamp 2026-01-01T00:00:00 \
	--n-workers=1 \
	--scale-factor=1 \
	--gpu-count=1 \
	--gpu-name=h100 \
	--num-drivers=1 \
	--engine-name=velox-cudf \
	--cache-state=warm

This successfully validates against the JSON Schema for our benchmark submit request.


if not benchmark_json_path.exists():
missing_args = []
if kind is None:
Contributor Author

Kinda verbose :/ but I'm hoping we can delete this once the benchmark runner records all this information.
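One way to cut the verbosity is a small helper that collects the missing values in one pass and reports them as flag names (the helper and its name are illustrative, not part of the PR):

```python
def find_missing_args(**values) -> list[str]:
    """Map each missing (None) value to its CLI flag name, e.g. gpu_name -> --gpu-name."""
    return [
        "--" + name.replace("_", "-")
        for name, value in values.items()
        if value is None
    ]
```

A call site can then error once with the whole list, e.g. `missing = find_missing_args(kind=kind, gpu_count=gpu_count, gpu_name=gpu_name)`.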

@TomAugspurger
Contributor Author

@paul-aiyedun thanks for the review. I think that I've addressed everything.

Contributor

@mattgara mattgara left a comment

Overall looks good to me 👍

Thanks @TomAugspurger for working on this.

My only serious feedback is to look at the one indentation issue (likely a bug) that could really mess things up. Other than that, I just had a few comments.

benchmark_metadata = BenchmarkMetadata.from_file(benchmark_dir / "benchmark.json")
except (ValueError, json.JSONDecodeError, FileNotFoundError) as e:
print(f" Error loading metadata: {e}", file=sys.stderr)
return 1
Contributor

@mattgara mattgara Feb 27, 2026

I believe this return 1 should be inside the except block? This looks to be the pattern below. ⬇️

Maybe it's worth refactoring this pattern into a function call :)
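The suggested refactor could look something like the following: a loader wrapper that prints the error and returns None, so every call site handles failure the same way and the `return 1` placement bug can't recur. Names here are assumptions, not the PR's code.

```python
import json
import sys
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_or_report(loader: Callable[[Path], T], path: Path, label: str) -> Optional[T]:
    """Run a from_file-style loader; print the error and return None on failure."""
    try:
        return loader(path)
    except (ValueError, json.JSONDecodeError, FileNotFoundError) as e:
        print(f"  Error loading {label}: {e}", file=sys.stderr)
        return None
```

A caller then writes `metadata = load_or_report(BenchmarkMetadata.from_file, path, "metadata")` followed by `if metadata is None: return 1`, keeping the early return unambiguous.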

if query_name not in raw_times:
query_logs.append(
{
"query_name": query_name_stripped,
Contributor

Hmmm, I think query_name_stripped is stale here and at risk being the wrong value.

Contributor Author

Yep, I messed that up (and my test data didn't have any failures).
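The fix is to derive the stripped name from the loop variable itself, so it can never carry over a value from an earlier iteration. A minimal sketch of the corrected loop shape (the surrounding names and log fields are assumptions based on the sample document above):

```python
def build_query_logs(query_names: list[str], raw_times: dict[str, list[float]]) -> list[dict]:
    """Build query-log entries, marking queries with no recorded times as failed."""
    query_logs = []
    for query_name in query_names:
        # Computed inside the loop, so it is never stale.
        stripped = query_name.strip()
        if query_name not in raw_times:
            query_logs.append({"query_name": stripped, "status": "failure"})
            continue
        for i, runtime_ms in enumerate(raw_times[query_name]):
            query_logs.append({
                "query_name": stripped,
                "execution_order": i,
                "runtime_ms": runtime_ms,
                "status": "success",
            })
    return query_logs
```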

help="Upload *.log files from the benchmark directory as assets (default: True). Use --no-upload-logs to skip.",
)
parser.add_argument(
"--benchmark-name",
Contributor

@mattgara mattgara Feb 27, 2026

Are we okay with None for benchmark name? If not, perhaps we should set a default, check for None, or set it as required=True.

Contributor Author

Nope, it needs to be provided. I'll set it to required.
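With argparse that's just marking the flag required, so a missing value fails fast with a clear message (a minimal sketch; the help text is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--benchmark-name",
    required=True,  # argparse exits with a usage error if the flag is omitted
    help="Name of the benchmark definition to post results under",
)
```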

└── benchmark_result.json

Usage:
python benchmark_data_tools/post_results.py /path/to/benchmark/dir \
Contributor

Hmm it looks like this reference to benchmark_data_tools may now be stale?

def from_file(cls, file_path: Path) -> "BenchmarkResults":
data = json.loads(file_path.read_text())

if "tpch" not in data.keys():
Contributor

Out of curiosity, why do we enforce tpch benchmark results only? AFAICT this seems general enough to support other types of benchmarks (such as tpcds?)

Contributor Author

Hmm somewhere I thought I saw the enum this was coming from only had "tpch". But now I'm not finding it. I can try threading through a "benchmark-type" option that defaults to tpch.
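A sketch of what threading such an option through could look like; the flag name, choices, and helper are assumptions for illustration, not the PR's final shape:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--benchmark-type",
    default="tpch",  # keep tpch as the default for backwards compatibility
    choices=["tpch", "tpcds"],
    help="Top-level key to read results from in benchmark_result.json",
)


def extract_results(data: dict, benchmark_type: str) -> dict:
    """Pull the per-benchmark results block instead of hardcoding 'tpch'."""
    if benchmark_type not in data:
        raise ValueError(f"expected a {benchmark_type!r} key in benchmark_result.json")
    return data[benchmark_type]
```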

Contributor Author

Done in 55ab534.

"run_at": benchmark_metadata.timestamp.isoformat(),
"node_count": benchmark_metadata.n_workers,
"query_logs": query_logs,
"concurrency_streams": 1,
Contributor

nit: Hardcoded value, is this expected?

Contributor

@mattgara mattgara left a comment

LGTM

@TomAugspurger
Contributor Author

Thanks for the approvals. I don't have write permissions here, so someone else will need to /merge it.

@paul-aiyedun paul-aiyedun merged commit 47ae102 into rapidsai:main Mar 2, 2026
5 checks passed