
Add script for posting nightly benchmark results#203

Merged
paul-aiyedun merged 32 commits into rapidsai:main from TomAugspurger:tom/nightly-benchmark
Mar 2, 2026

Conversation

@TomAugspurger
Contributor

This adds a script for posting the results of a benchmark run to a database. Each run would result in

  1. A new "QueryEngine" configuration, which identifies the exact software versions used to run the benchmark
  2. A new "BenchmarkRun", which identifies the hardware, query engine, storage configuration, benchmark definition, query engine configuration, and more
  3. Many new "QueryLogs", one per individual query iteration executed

Here's a sample of the document we'd post:

{
  "sku_name": "pdx-h100",
  "storage_configuration_name": "test",
  "benchmark_definition_name": "tpch-100",
  "cache_state": "warm",
  "query_engine": {
    "engine_name": "velox",
    "identifier_hash": "121e1e62b5426195",
    "version": "unknown",
    "commit_hash": "unknown"
  },
  "run_at": "2026-01-28T18:38:02+00:00",
  "node_count": 1,
  "query_logs": [
    {
      "query_name": "1",
      "execution_order": 0,
      "runtime_ms": 3982.0,
      "status": "success",
      "extra_info": {
        "execution_number": 1
      }
    },
    {
      "query_name": "1",
      "execution_order": 1,
      "runtime_ms": 1509.0,
      "status": "success",
      "extra_info": {
        "execution_number": 2
      }
      // repeated for each execution of each query
    }
  ],
  "concurrency_streams": 1,
  "engine_config": {
    "coordinator": {
      "coordinator": "true",
      "node-scheduler.include-coordinator": "false",
      "http-server.http.port": "9200",
      "discovery-server.enabled": "true",
      "discovery.uri": "http://gpu-h100-0017:9200"
      // ...
    },
    "worker": {
      "coordinator": "false",
      "http-server.http.port": "9000",
      "discovery.uri": "http://gpu-h100-0017:9200",
      "presto.version": "testversion",
      "system-memory-gb": "240",
      "query-memory-gb": "228",
      // ...
    }
  },
  "extra_info": {
    "kind": "single-node",
    "gpu_count": 1,
    "gpu_name": "H100",
    "num_drivers": 2,
    "worker_image": "presto-native-worker-gpu",
    "execution_number": 1
  },
  "is_official": false
}
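The posting flow described above can be sketched as a small client. This is a minimal illustration, not the PR's actual implementation: the required-key set is taken from the sample document, the `BENCHMARK_API_URL` and `BENCHMARK_API_KEY` environment variables match the ones used later in this thread, and the `/benchmark-runs` path is a placeholder.

```python
import json
import os
import urllib.request

# Top-level keys present in the sample document above.
REQUIRED_KEYS = {
    "sku_name",
    "storage_configuration_name",
    "benchmark_definition_name",
    "cache_state",
    "query_engine",
    "run_at",
    "query_logs",
}


def validate_payload(payload: dict) -> list[str]:
    """Return any required top-level keys missing from the payload."""
    return sorted(REQUIRED_KEYS - payload.keys())


def post_results(payload: dict) -> None:
    """POST the benchmark document to the results API (endpoint path is hypothetical)."""
    missing = validate_payload(payload)
    if missing:
        raise ValueError(f"payload is missing required keys: {missing}")
    request = urllib.request.Request(
        os.environ["BENCHMARK_API_URL"] + "/benchmark-runs",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['BENCHMARK_API_KEY']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()
```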

Contributor

@misiugodfrey misiugodfrey left a comment

Thanks for putting this together, Tom.

I think most of this should work as-is. The only issue is that we haven't standardized some of this expected output in upstream main yet (the output this is based on is from an experimental set of branches). This should be easy now that we have an upstream schema to store in.

I think we just need to standardize our output and then sync this on top.


Expects coordinator.config and worker.config files.
"""
coordinator_config = parse_config_file(configs_dir / "coordinator.config")
Contributor

In our presto environments these files are going to be in a configs directory under <configs>/etc_worker/config_native.properties and <configs>/etc_coordinator/config_native.properties
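For reference, a minimal sketch of what `parse_config_file` could look like for these Presto properties-style files (`key=value` lines, `#` comments); the PR's actual implementation may differ.

```python
from pathlib import Path


def parse_config_file(path: Path) -> dict[str, str]:
    """Parse a Presto-style properties file (key=value lines, '#' comments) into a dict."""
    config: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```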

@TomAugspurger

This comment was marked as outdated.

@TomAugspurger
Contributor Author

This is updated after #211 was changed to write these results to the benchmark_result.json file under the raw_times_ms key.

@TomAugspurger TomAugspurger marked this pull request as ready for review February 12, 2026 14:46
@TomAugspurger
Contributor Author

This should just be waiting on #239 and possibly #240, which will generate the benchmark.json and configs with some information we'd like to store.

After manually running those, we can post the results:

export BENCHMARK_API_URL=...
export BENCHMARK_API_KEY=...

uv run benchmark_data_tools/post_results.py \
	/mnt/data/toaugspurger/velox-testing/ \
	--sku-name coreweave-gb200-nvl72 \
	--storage-configuration-name coreweave-use13a-slurm-data-tpch-rs \
	--benchmark-definition-name tpch-rs-1000 \
	--cache-state warm

@TomAugspurger TomAugspurger changed the title from "Add script for posting nigthly benchmark results" to "Add script for posting nightly benchmark results" on Feb 17, 2026
@TomAugspurger
Contributor Author

This should be good to go, at least as far as getting basic timings & status posted.

# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# /// script
Contributor Author

I'd recommend running this with uv run benchmark_data_tools/post_results.py and the dependencies will be automatically satisfied. But LMK if there's a spot I should add this dep (I didn't see any groups in https://github.com/rapidsai/velox-testing/blob/main/pyproject.toml).

Contributor

@paul-aiyedun paul-aiyedun left a comment

Changes overall look good to me. Although, I think there is a mismatch between the expected benchmark output and what is currently generated.

This script operates on the parsed output of the benchmark runner. The
expected directory structure is:

../benchmark-root/
Contributor

The current benchmarking script only creates a benchmark_result.json file, so this expectation does not match what is implemented.

Contributor Author

I was looking at `}' > ${OUTPUT_PREFIX}/benchmark.json`. You're saying that's not always available?

If not, I'll add CLI options and make the benchmark.json optional.

Contributor

No, I think we need to consolidate some of the updates in that script (cc @misiugodfrey).

help="Storage configuration name",
)
parser.add_argument(
"--cache-state",
Contributor

Can you please expand on this?

Contributor Author

It's part of the results schema currently, but I don't think it's well defined. At least it isn't clear to me :/

Comment on lines +196 to +205
parser.add_argument(
"--identifier-hash",
default=None,
help="Unique identifier hash for the query engine version",
)
parser.add_argument(
"--version",
default=None,
help="Version string for the query engine",
)
Contributor

What is the difference between these two parameters?

Contributor Author

Oops I messed up the description of identifier-hash. It's supposed to be a description of the entire software environment, something like the container image digest. I'll get that fixed, thanks.

Contributor

Got it. Thanks.

@TomAugspurger
Contributor Author

I've added CLI options for all the values we hope to get from #239. In general, I'd strongly recommend that we derive those from the values recorded by the actual benchmark execution. That way there's no possibility of accidentally passing one thing to the benchmark runner and something else to the tool recording the results.

Here's an example using those new flags:

uv run benchmark_reporting_tools/post_results.py data \
	--sku-name h100 \
	--storage-configuration-name storage \
	--benchmark-name=benchmark \
	--identifier-hash=idhash \
	--dry-run \
	--kind single \
	--benchmark tpch \
	--timestamp 2026-01-01T00:00:00 \
	--n-workers=1 \
	--scale-factor=1 \
	--gpu-count=1 \
	--gpu-name=h100 \
	--num-drivers=1 \
	--engine-name=velox-cudf \
	--cache-state=warm

This successfully validates against the JSON Schema for our benchmark submit request.


if not benchmark_json_path.exists():
missing_args = []
if kind is None:
Contributor Author

Kinda verbose :/ but I'm hoping we can delete this once the benchmark runner records all this information.
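One way to cut the verbosity is a small helper that collects the missing values in one pass and reports them as flag names (the helper and its name are illustrative, not part of the PR):

```python
def find_missing_args(**values) -> list[str]:
    """Map each missing (None) value to its CLI flag name, e.g. gpu_name -> --gpu-name."""
    return [
        "--" + name.replace("_", "-")
        for name, value in values.items()
        if value is None
    ]
```

A call site can then error once with the whole list, e.g. `missing = find_missing_args(kind=kind, gpu_count=gpu_count, gpu_name=gpu_name)`.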

@TomAugspurger
Contributor Author

@paul-aiyedun thanks for the review. I think that I've addressed everything.

Contributor

@mattgara mattgara left a comment

Overall looks good to me 👍

Thanks @TomAugspurger for working on this.

My only serious feedback is to look at the one indentation issue (likely a bug) that could really mess things up. Other than that, I just had a few comments.

benchmark_metadata = BenchmarkMetadata.from_file(benchmark_dir / "benchmark.json")
except (ValueError, json.JSONDecodeError, FileNotFoundError) as e:
print(f" Error loading metadata: {e}", file=sys.stderr)
return 1
Contributor

@mattgara mattgara Feb 27, 2026

I believe this return 1 should be inside the except block? This looks to be the pattern below. ⬇️

Maybe it's worth refactoring this pattern into a function call :)
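The suggested refactor could look something like the following: a loader wrapper that prints the error and returns None, so every call site handles failure the same way and the `return 1` placement bug can't recur. Names here are assumptions, not the PR's code.

```python
import json
import sys
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_or_report(loader: Callable[[Path], T], path: Path, label: str) -> Optional[T]:
    """Run a from_file-style loader; print the error and return None on failure."""
    try:
        return loader(path)
    except (ValueError, json.JSONDecodeError, FileNotFoundError) as e:
        print(f"  Error loading {label}: {e}", file=sys.stderr)
        return None
```

A caller then writes `metadata = load_or_report(BenchmarkMetadata.from_file, path, "metadata")` followed by `if metadata is None: return 1`, keeping the early return unambiguous.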

if query_name not in raw_times:
query_logs.append(
{
"query_name": query_name_stripped,
Contributor

Hmmm, I think query_name_stripped is stale here and at risk being the wrong value.

Contributor Author

Yep, I messed that up (and my test data didn't have any failures).
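The fix is to derive the stripped name from the loop variable itself, so it can never carry over a value from an earlier iteration. A minimal sketch of the corrected loop shape (the surrounding names and log fields are assumptions based on the sample document above):

```python
def build_query_logs(query_names: list[str], raw_times: dict[str, list[float]]) -> list[dict]:
    """Build query-log entries, marking queries with no recorded times as failed."""
    query_logs = []
    for query_name in query_names:
        # Computed inside the loop, so it is never stale.
        stripped = query_name.strip()
        if query_name not in raw_times:
            query_logs.append({"query_name": stripped, "status": "failure"})
            continue
        for i, runtime_ms in enumerate(raw_times[query_name]):
            query_logs.append({
                "query_name": stripped,
                "execution_order": i,
                "runtime_ms": runtime_ms,
                "status": "success",
            })
    return query_logs
```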

help="Upload *.log files from the benchmark directory as assets (default: True). Use --no-upload-logs to skip.",
)
parser.add_argument(
"--benchmark-name",
Contributor

@mattgara mattgara Feb 27, 2026

Are we okay with None for benchmark name? If not, perhaps we should set a default, check for None, or set it as required=True.

Contributor Author

Nope, it needs to be provided. I'll set it to required.
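With argparse that's just marking the flag required, so a missing value fails fast with a clear message (a minimal sketch; the help text is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--benchmark-name",
    required=True,  # argparse exits with a usage error if the flag is omitted
    help="Name of the benchmark definition to post results under",
)
```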

└── benchmark_result.json

Usage:
python benchmark_data_tools/post_results.py /path/to/benchmark/dir \
Contributor

Hmm it looks like this reference to benchmark_data_tools may now be stale?

def from_file(cls, file_path: Path) -> "BenchmarkResults":
data = json.loads(file_path.read_text())

if "tpch" not in data.keys():
Contributor

Out of curiosity, why do we enforce tpch benchmark results only? AFAICT this seems general enough to support other types of benchmarks (such as tpcds?)

Contributor Author

Hmm somewhere I thought I saw the enum this was coming from only had "tpch". But now I'm not finding it. I can try threading through a "benchmark-type" option that defaults to tpch.
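A sketch of what threading such an option through could look like; the flag name, choices, and helper are assumptions for illustration, not the PR's final shape:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--benchmark-type",
    default="tpch",  # keep tpch as the default for backwards compatibility
    choices=["tpch", "tpcds"],
    help="Top-level key to read results from in benchmark_result.json",
)


def extract_results(data: dict, benchmark_type: str) -> dict:
    """Pull the per-benchmark results block instead of hardcoding 'tpch'."""
    if benchmark_type not in data:
        raise ValueError(f"expected a {benchmark_type!r} key in benchmark_result.json")
    return data[benchmark_type]
```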

Contributor Author

Done in 55ab534.

"run_at": benchmark_metadata.timestamp.isoformat(),
"node_count": benchmark_metadata.n_workers,
"query_logs": query_logs,
"concurrency_streams": 1,
Contributor

nit: Hardcoded value, is this expected?

Contributor

@mattgara mattgara left a comment

LGTM

@TomAugspurger
Contributor Author

Thanks for the approvals. I don't have write permissions here, so someone else will need to /merge it.

@paul-aiyedun paul-aiyedun merged commit 47ae102 into rapidsai:main Mar 2, 2026
5 checks passed