Add benchmark configuration output to support posting to DB #239
misiugodfrey wants to merge 26 commits into main from
Conversation
TomAugspurger
left a comment
The output sample you shared looks good to me.
paul-aiyedun
left a comment
The overall idea makes sense to me. However, I had questions about some of the implementation details, Slurm specific logic, and assumptions being made.
Also, please update the PR title to reflect this update.
| coordinator=false
| # Worker REST/HTTP port for internal and admin endpoints.
| http-server.http.port=8080
| http-server.http.port=10000
Is this an intentional update?
Yes. The worker's default port should not be the same as the coordinator's port. This change makes things clearer, and the defaults are in sync with both the multi-worker case and the Slurm case (where these values will be overwritten).
Without this change, you can't access the worker node's API for single-worker sessions, which can still be necessary to get some information.
After our discussion, I've reverted this change, although I may need to make further changes once testing can resume on Slurm clusters.
presto/scripts/run_benchmark.sh
Outdated
| -m, --metrics Collect detailed metrics from Presto REST API after each query.
| Metrics are stored in query-specific directories.
|
| ENVIRONMENT:
Can we use command line arguments instead of environment variables?
I've only left the DEBUG option as it would be tedious to pass that through the entire chain of scripts. If we want to change that further, we can, but I think this is cleaner.
What is the chain of scripts in this case (I believe run_benchmark.sh is a top-level script)? Besides the debug variable, there is also LOGS below.
run_benchmark.sh is the top-level script, but the PRESTO_BENCHMARK_DEBUG context is used by the python scripts that are called by run_benchmark.sh.
As for LOGS, this value is supposed to reflect the hard-coded directory structure that the docker containers mount (although it can be changed in the Slurm context). For now I'm removing this comment, because it should not be changeable unless we can also change it in the docker templates. Easier to just have one hard-coded path for now.
presto/scripts/run_benchmark.sh
Outdated
| PRESTO_BENCHMARK_DEBUG Set to 1 to print debug logs for worker/engine detection
| (e.g. node URIs, reachability, metrics, Docker containers).
| Use when engine is misdetected or the run fails.
| Docker In Docker setups, engine is inferred from running worker
Can you please expand on this?
What's happening here is that some of the details of the run are inferred from the name of the image that is running. If presto-native-worker-gpu is running, then it infers that this is a velox-gpu run. If docker is not running, then it checks whether it's in a Slurm environment, in which case it uses other techniques based on how the Slurm scripts work.
To clarify, this isn't an environment variable, just a note that the environment in which it is running will affect this path. I can make the comments clearer.
| @@ -32,3 +32,12 @@ class BenchmarkKeys(str, Enum):
| CONTEXT_KEY = "context"
| ITERATIONS_COUNT_KEY = "iterations_count"
| SCHEMA_NAME_KEY = "schema_name"
| SCALE_FACTOR_KEY = "scale_factor"
Can we copy over the metadata.json file from the dataset instead of duplicating some of the details here?
Do you mean copying the metadata.json file directly, or referencing the data inside it? Right now the scale factor can be overridden in our options when running benchmarks, so we either take the override if provided, or read the scale factor from the metadata.json.
I've removed this key since it is no longer used. The code that extracts the scale factor is in run_context.py:get_scale_factor_from_schema()
| "benchmark": benchmark_types[0] if len(benchmark_types) == 1 else benchmark_types,
| **run_config,
| }
| with open(f"{bench_output_dir}/benchmark_config.json", "w") as file:
The existing JSON file has a "context" field. Can we extend that instead of writing to a new file?
| def get_gpu_name_from_slurm_logs() -> str | None:
| """
| When running under SLURM, workers run nvidia-smi -L and write to LOGS/worker_<id>.log.
If there is a need to log the output of nvidia-smi, then we should probably do this consistently (and not just for slurm).
The current approach gets the GPU details by running nvidia-smi in both contexts; it's just that in the Docker context we run it on the host, whereas in the Slurm context the login nodes don't have nvidia-smi installed, so you need to run it on the worker node (i.e. in the worker's container).
We could potentially try to consolidate this, but it's more an issue of how the Slurm clusters encapsulate things, which required extra steps in that context.
| def get_engine_from_slurm() -> str | None:
| """
| Infer engine when running under SLURM from nvidia-smi -L output in LOGS/worker_0.log.
Can we standardize how this is done instead of having a slurm specific solution?
| def get_gpu_name() -> str | None:
| """
| Return GPU model name. Under SLURM, read from LOGS/worker_<id>.log if LOGS is set;
Same comment as above.
| def get_worker_image() -> str | None:
| """Return worker image name from env (set by cluster/container setup)."""
| return os.environ.get("WORKER_IMAGE")
This is only set in the Slurm scripts. I'm removing this section since it doesn't make much sense to specify a WORKER_IMAGE name when we are detecting the engine/variant through other means.
| return os.environ.get("WORKER_IMAGE")
| def _current_username() -> str:
We plan on adding the ability to run presto with existing images. I don't think we should make assumptions about image/tag names.
I've changed this to determine the engine/variant through the presto API, so this is all removed now.
| - ../../config/generated/gpu/etc_worker/config_native.properties:/opt/presto-server/etc/config.properties
| - ../../config/generated/gpu/etc_worker/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
| - ../../worker_logs:/opt/presto-server/logs
| - ../../launch_presto_servers.sh:/opt/launch_presto_servers.sh:ro
Mounting this file was necessary so that older images can still benefit from the new version.
TomAugspurger
left a comment
The changes to post_results.py look good.
paul-aiyedun
left a comment
Changes overall look good to me. However, I had a few questions, code cleanup comments, and comments about logging.
| └── result_dir
| └── benchmark_result.json
| └── logs # optional
| └── slurm-4575179.out
Nit: Should this still be Slurm specific?
| SCHEMA_NAME_KEY = "schema_name"
| # Run configuration (from run context; written to context in benchmark_result.json)
| TIMESTAMP_KEY = "timestamp"
| NUM_WORKERS_KEY = "n_workers"
Nit: Can this be worker_count (consistent with other keys)?
presto/scripts/run_benchmark.sh
Outdated
| -m, --metrics Collect detailed metrics from Presto REST API after each query.
| Metrics are stored in query-specific directories.
|
| ENVIRONMENT:
What is the chain of scripts in this case (I believe run_benchmark.sh is a top-level script)? Besides the debug variable, there is also LOGS below.
| - ../../config/generated/gpu/etc_worker/node.properties:/opt/presto-server/etc/node.properties
| - ../../config/generated/gpu/etc_worker/config_native.properties:/opt/presto-server/etc/config.properties
| - ../../config/generated/gpu/etc_worker/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
| - ../../worker_logs:/opt/presto-server/logs
Is there a reason for limiting this to just worker logs?
| - ../../config/generated/gpu/etc_worker_{{ gpu_id }}/node.properties:/opt/presto-server/etc/node.properties
| - ../../config/generated/gpu/etc_worker_{{ gpu_id }}/config_native.properties:/opt/presto-server/etc/config.properties
| - ../../config/generated/gpu/etc_worker_{{ gpu_id }}/catalog/hive.properties:/opt/presto-server/etc/catalog/hive.properties
| - ../../worker_logs:/opt/presto-server/logs
Instead of duplicating this update in all the variant docker-compose files, can this be set in docker-compose.common.yml?
I've moved this to presto-base-volumes so that it is shared now.
| pass
| def _parse_gpu_name_from_text(line: str) -> str | None:
Can we avoid this by directly printing the GPU name i.e. nvidia-smi --query-gpu=name --format=csv,noheader?
We could do this, but keep in mind that this needs to be run in the container, so the docker logs would still need to be parsed. With the above suggestion it would be less obvious that we are getting nvidia-smi output (no "GPU" anchor), and we would have to assume that the first line gives us this information.
I think it's cleaner to use the nvidia-smi -L output, as it's clear what that is, and if it's missing, the parse will fail, rather than assuming the first line of the logs will be a GPU name.
| }
| def _get_num_drivers(engine: str) -> int | None:
Can we also get this from the logs?
| if n_workers is not None:
|     ctx["n_workers"] = n_workers
|     ctx["kind"] = "single-node" if n_workers == 1 else f"{n_workers}-node"
Out of curiosity, what is the purpose of the kind field?
It is one of the columns specified in the database. I believe the intention is to separate single-node from multi-node runs. With the information we are providing, I think it is a little redundant (since we have num_workers), but may be more necessary for other platforms.
| return ctx
| def pytest_sessionfinish(session, exitstatus):
Is this duplicating logic in ?
| @@ -15,27 +15,28 @@
| expected directory structure is:
| ../benchmark-root/
What script is creating a directory with this structure?
This occurs in conftest.py:pytest_sessionfinish(). The top-level script would just be run_benchmark.sh
paul-aiyedun
left a comment
I had a couple of comments, but changes look good to me.
| if hasattr(session, "benchmark_results"):
|     benchmark_types = list(session.benchmark_results.keys())
|     json_result[BenchmarkKeys.CONTEXT_KEY]["benchmark"] = (
|         benchmark_types[0] if len(benchmark_types) == 1 else benchmark_types
Is there a reason not to have the value be consistently a list?
| hostname = session.config.getoption("--hostname")
| port = session.config.getoption("--port")
| user = session.config.getoption("--user")
| schema_name = session.config.getoption("--schema-name")
|
| ctx = gather_run_context(
|     hostname=hostname,
|     port=port,
|     user=user,
|     schema_name=schema_name,
| )
These are presto specific options. I think we can set run_context in a presto specific fixture (similar to
This PR consolidates benchmark result reporting, unifies engine/GPU detection across Docker and Slurm environments, and refactors shared test infrastructure into a common module. The goal is to provide an interface for automatically posting benchmark results to the benchmarking DB via the post_results.py script.
Run context collection (run_context.py):
Automatically gathers engine type, GPU model, scale factor, worker count, and num_drivers at benchmark time and writes them into the "context" section of benchmark_result.json, which can then be parsed as part of post_results.py. E.g.:
Unified engine and GPU detection:
It was necessary to gather information about a running cluster to put into the benchmark context. To that end, some changes were needed to provide the appropriate information at runtime (rather than trying to predict context from things like image names), as well as to unify how we present this information across the Docker and Slurm scripts.
Engine detection via cluster-tag: All variants (GPU, CPU, Java) now consistently set cluster-tag in the coordinator config. run_context.py queries the Presto /v1/cluster API to determine the engine type, replacing the previous approach of parsing Docker image names or SLURM logs.
GPU model detection via worker logs: Docker worker containers now run nvidia-smi -L at startup and write output to worker_logs/worker_.log, matching the existing SLURM behavior. run_context.py reads these logs uniformly regardless of the deployment environment.
launch_presto_servers.sh was rewritten to capture nvidia-smi output and redirect presto_server stdout/stderr to per-worker log files. It is mounted as a read-only volume so changes don't require image rebuilds.
worker_logs volume mount added to all worker services (GPU, CPU, Java) in Docker Compose files.