Add script for posting nightly benchmark results #203

paul-aiyedun merged 32 commits into rapidsai:main from
Conversation
misiugodfrey
left a comment
Thanks for putting this together, Tom.
I think most of this should work as-is. The only issue is that we haven't standardized some of this expected output in upstream main yet (the output this is based on is from an experimental set of branches). This should be easy now that we have an upstream schema to store in.
I think we just need to standardize our output and then sync this on top.
| Expects coordinator.config and worker.config files.
| """
| coordinator_config = parse_config_file(configs_dir / "coordinator.config")
In our presto environments these files are going to be in a configs directory under <configs>/etc_worker/config_native.properties and <configs>/etc_coordinator/config_native.properties
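For reference, a minimal sketch of what a `parse_config_file` helper for those Java-style `.properties` files might look like (the actual helper in this PR may differ; the comment-handling rules here are assumptions):

```python
from pathlib import Path


def parse_config_file(path: Path) -> dict[str, str]:
    """Parse a Java-style key=value .properties file into a dict.

    Blank lines and '#' comment lines are skipped; whitespace
    around keys and values is stripped.
    """
    props: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props
```

Pointing this at `<configs>/etc_coordinator/config_native.properties` and `<configs>/etc_worker/config_native.properties` would then just be a matter of which path the caller passes in.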
This comment was marked as outdated.
This is updated after #211 was changed to write these results to the
This should just be waiting on #239 and possibly #240, which will generate the

After manually running those, we can post the results:
This should be good to go, at least as far as getting basic timings & status posted.
| # SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
| # SPDX-License-Identifier: Apache-2.0
|
| # /// script
I'd recommend running this with uv run benchmark_data_tools/post_results.py and the dependencies will be automatically satisfied. But LMK if there's a spot I should add this dep (I didn't see any groups in https://github.com/rapidsai/velox-testing/blob/main/pyproject.toml).
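For context, `uv run` resolves dependencies from PEP 723 inline script metadata, so a header along these lines is what makes that work (the dependency list below is illustrative, not the script's actual deps):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []  # illustrative; the script's real deps would be listed here
# ///
"""Sketch: `uv run post_results.py` reads the comment header above
and installs the listed dependencies into an ephemeral environment."""


def main() -> str:
    return "ok"


print(main())
```

With the header in place, no dependency group in `pyproject.toml` is strictly required for ad-hoc runs.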
paul-aiyedun
left a comment
Changes overall look good to me, although I think there is a mismatch between the expected benchmark output and what is currently generated.
| This script operates on the parsed output of the benchmark runner. The
| expected directory structure is:
|
| ../benchmark-root/
The current benchmarking script only creates a benchmark_result.json file, so this expectation does not match what is implemented.
I was looking at
. You're saying that's not always available? If not, I'll add CLI options and make the benchmark.json optional.
No, I think we need to consolidate some of the updates in that script (cc @misiugodfrey).
| help="Storage configuration name",
| )
| parser.add_argument(
| "--cache-state",
Can you please expand on this?
It's part of the results schema currently, but I don't think it's well defined. At least it isn't clear to me :/
| parser.add_argument(
| "--identifier-hash",
| default=None,
| help="Unique identifier hash for the query engine version",
| )
| parser.add_argument(
| "--version",
| default=None,
| help="Version string for the query engine",
| )
What is the difference between these two parameters?
Oops I messed up the description of identifier-hash. It's supposed to be a description of the entire software environment, something like the container image digest. I'll get that fixed, thanks.
I've added CLI options for all the stuff we would hope to get from #239. In general, I'd strongly recommend that we get those out of the values recorded by the actual benchmark execution. That way there's no possibility of accidentally passing one thing to the benchmark runner and something else to the thing recording the results. Here's an example using those new flags:

Which successfully validates against the jsonschema for our benchmark submit request.
| if not benchmark_json_path.exists():
| missing_args = []
| if kind is None:
Kinda verbose :/ but I'm hoping we can delete this once the benchmark runner records all this information.
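One way to keep that fallback validation compact while it has to exist (helper name and flag names here are hypothetical, not from the PR):

```python
import argparse


def missing_fallback_args(args: argparse.Namespace, names: list[str]) -> list[str]:
    """Return the CLI flags (e.g. '--kind') whose values are still None on `args`."""
    return ["--" + n.replace("_", "-") for n in names if getattr(args, n) is None]
```

If `benchmark.json` is absent, the script could then report the whole list of missing flags in one error instead of one `if` per flag.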
@paul-aiyedun thanks for the review. I think I've addressed everything.
mattgara
left a comment
Overall looks good to me 👍
Thanks @TomAugspurger for working on this.
My only serious feedback would be to look at the one indentation issue (likely a bug) that may really mess things up. Other than that, I just had a few comments.
| benchmark_metadata = BenchmarkMetadata.from_file(benchmark_dir / "benchmark.json")
| except (ValueError, json.JSONDecodeError, FileNotFoundError) as e:
| print(f" Error loading metadata: {e}", file=sys.stderr)
| return 1
I believe this `return 1` should be inside the `except` block? That looks to be the pattern below. ⬇️
Maybe it's worth refactoring this pattern into a function call :)
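Something like this hypothetical helper would collapse the repeated try/except blocks (names are illustrative, not from the PR):

```python
import json
import sys
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_or_none(loader: Callable[[Path], T], path: Path, label: str) -> Optional[T]:
    """Run `loader(path)`, printing the error and returning None on failure."""
    try:
        return loader(path)
    except (ValueError, json.JSONDecodeError, FileNotFoundError) as e:
        print(f"  Error loading {label}: {e}", file=sys.stderr)
        return None
```

Each call site then becomes a single load plus `if result is None: return 1`, so the early-return can't drift outside the error path.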
| if query_name not in raw_times:
| query_logs.append(
| {
| "query_name": query_name_stripped,
Hmmm, I think `query_name_stripped` is stale here and at risk of being the wrong value.
Yep, I messed that up (and my test data didn't have any failures).
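One way to avoid the stale variable entirely is to strip at the point of use; a sketch (the fields other than `query_name` are guessed from the sample document below, and the helper itself is hypothetical):

```python
def make_failed_log(query_name: str, execution_order: int) -> dict:
    """Build a query_logs entry for a query with no recorded runtime."""
    return {
        "query_name": query_name.strip(),  # strip inline; no separate *_stripped variable to go stale
        "execution_order": execution_order,
        "runtime_ms": None,
        "status": "failed",
    }
```

A test fixture with at least one failing query would have caught the original mix-up.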
| help="Upload *.log files from the benchmark directory as assets (default: True). Use --no-upload-logs to skip.",
| )
| parser.add_argument(
| "--benchmark-name",
Are we okay with None for benchmark name? If not, perhaps we should set a default, check for None, or set it as required=True.
Nope, it needs to be provided. I'll set it to required.
| └── benchmark_result.json
|
| Usage:
| python benchmark_data_tools/post_results.py /path/to/benchmark/dir \
Hmm it looks like this reference to benchmark_data_tools may now be stale?
| def from_file(cls, file_path: Path) -> "BenchmarkResults":
| data = json.loads(file_path.read_text())
|
| if "tpch" not in data.keys():
Out of curiosity, why do we enforce tpch benchmark results only? AFAICT this seems general enough to support other types of benchmarks (such as tpcds?)
Hmm, somewhere I thought I saw that the enum this was coming from only had "tpch". But now I'm not finding it. I can try threading through a "benchmark-type" option that defaults to tpch.
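A possible shape for that generalization (the supported set and function name here are guesses, not part of the PR):

```python
SUPPORTED_BENCHMARKS = {"tpch", "tpcds"}  # assumed set; extend as more benchmarks land


def detect_benchmark_type(data: dict) -> str:
    """Return which supported benchmark key is present in the parsed results."""
    found = SUPPORTED_BENCHMARKS & data.keys()
    if len(found) != 1:
        raise ValueError(
            f"expected exactly one of {sorted(SUPPORTED_BENCHMARKS)} "
            f"in results, found {sorted(found) or sorted(data)}"
        )
    return found.pop()
```

That keeps the tpch default behavior while letting a tpcds results file pass through without a hard-coded key check.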
| "run_at": benchmark_metadata.timestamp.isoformat(),
| "node_count": benchmark_metadata.n_workers,
| "query_logs": query_logs,
| "concurrency_streams": 1,
nit: Hardcoded value, is this expected?
Thanks for the approvals. I don't have write permissions here, so someone else will need to
This adds a script for posting the results of a benchmark run to a database. Each run would result in
Here's a sample of the document we'd post:
```json
{
  "sku_name": "pdx-h100",
  "storage_configuration_name": "test",
  "benchmark_definition_name": "tpch-100",
  "cache_state": "warm",
  "query_engine": {
    "engine_name": "velox",
    "identifier_hash": "121e1e62b5426195",
    "version": "unknown",
    "commit_hash": "unknown"
  },
  "run_at": "2026-01-28T18:38:02+00:00",
  "node_count": 1,
  "query_logs": [
    {
      "query_name": "1",
      "execution_order": 0,
      "runtime_ms": 3982.0,
      "status": "success",
      "extra_info": { "execution_number": 1 }
    },
    {
      "query_name": "1",
      "execution_order": 1,
      "runtime_ms": 1509.0,
      "status": "success",
      "extra_info": { "execution_number": 2 }
    }
    // repeated for each execution of each query
  ],
  "concurrency_streams": 1,
  "engine_config": {
    "coordinator": {
      "coordinator": "true",
      "node-scheduler.include-coordinator": "false",
      "http-server.http.port": "9200",
      "discovery-server.enabled": "true",
      "discovery.uri": "http://gpu-h100-0017:9200"
      // ...
    },
    "worker": {
      "coordinator": "false",
      "http-server.http.port": "9000",
      "discovery.uri": "http://gpu-h100-0017:9200",
      "presto.version": "testversion",
      "system-memory-gb": "240",
      "query-memory-gb": "228"
      // ...
    }
  },
  "extra_info": {
    "kind": "single-node",
    "gpu_count": 1,
    "gpu_name": "H100",
    "num_drivers": 2,
    "worker_image": "presto-native-worker-gpu",
    "execution_number": 1
  },
  "is_official": false
}
```