From 88dce4fec492fbec64bb153cfaba3637a721e720 Mon Sep 17 00:00:00 2001 From: Samuel Bohman Date: Fri, 25 Jul 2025 23:23:34 +0200 Subject: [PATCH] Update tags in docs + some proofreading --- README.md | 66 ++++++++++++++++++-------------------- docs/how-to-quick-start.md | 10 +++--- 2 files changed, 36 insertions(+), 40 deletions(-) diff --git a/README.md b/README.md index 28907fcb..dda4cc9c 100644 --- a/README.md +++ b/README.md @@ -3,17 +3,17 @@ Set of interfaces to run various AI models from public MAD. # What is madengine? -An AI Models automation and dashboarding command-line tool to run LLMs and Deep Learning models locally or remotelly with CI. +An AI Models automation and dashboarding command-line tool to run LLMs and Deep Learning models locally or remotely with CI. The madengine library is to support AI automation having following features: - AI Models run reliably on supported platforms and drive software quality -- Simple, minimalistic, out-of-the-box solution that enable confidence on hardware and software stack +- Simple, minimalistic, out-of-the-box solution that enables confidence in the hardware and software stack - Real-time, audience-relevant AI Models performance metrics tracking, presented in clear, intuitive manner -- Best-practices for handling internal projects and external open-source projects +- Best practices for handling internal projects and external open-source projects # Installation -madengine is meant to be used in conjunction with [MAD](https://github.com/ROCm/MAD). Below are the steps to set it up and run it using the command line interface (CLI). +madengine is meant to be used in conjunction with [MAD](https://github.com/ROCm/MAD). Below are the steps to set it up and run it using the command-line interface (CLI). 
## Clone MAD ``` @@ -29,7 +29,7 @@ cd MAD # Create virtual environment if necessary python3 -m venv venv -# Active the virtual environment venv +# Activate the virtual environment venv source venv/bin/activate # Clone madengine @@ -51,8 +51,6 @@ You can also install the madengine library directly from the Github repository. pip install git+https://github.com/ROCm/madengine.git@main ``` -## Clone - # Run madengine CLI How to run madengine CLI on your local machine. @@ -79,7 +77,7 @@ Commands: ## Run models locally -Command to run LLMs and Deep Learning Models on container. +Command to run LLMs and deep learning models in a container. ``` # An example CLI command to run a model @@ -98,13 +96,13 @@ Run LLMs and Deep Learning models on container optional arguments: -h, --help show this help message and exit --tags TAGS [TAGS ...] - tags to run (can be multiple). + tags to run (space-separated). --timeout TIMEOUT time out for model run in seconds; Overrides per-model timeout if specified or default timeout of 7200 (2 hrs). Timeout of 0 will never timeout. --live-output prints output in real-time directly on STDOUT --clean-docker-cache rebuild docker image without using cache --additional-context-file ADDITIONAL_CONTEXT_FILE - additonal context, as json file, to filter behavior of workloads. Overrides detected contexts. + additional context, as json file, to filter behavior of workloads. Overrides detected contexts. --additional-context ADDITIONAL_CONTEXT additional context, as string representation of python dict, to filter behavior of workloads. Overrides detected contexts and additional- context-file. @@ -116,26 +114,24 @@ optional arguments: generate system config env details by default --force-mirror-local FORCE_MIRROR_LOCAL Path to force all relevant dataproviders to mirror data locally on. 
- --keep-alive keep Docker container alive after run; will keep model directory after run + --keep-alive keep Docker container alive after run; keeps model directory after run --keep-model-dir keep model directory after run --skip-model-run skips running the model; will not keep model directory after run unless specified through keep-alive or keep-model-dir --disable-skip-gpu-arch - disables skipping model based on gpu architecture + disables skipping model based on GPU architecture -o OUTPUT, --output OUTPUT output file ``` -For each model in models.json, the script -- builds docker images associated with each model. The images are named 'ci-$(model_name)', and are not removed after the script completes. -- starts the docker container, with name, 'container_$(model_name)'. The container should automatically be stopped and removed whenever the script exits. +For each model in models.json, the script: +- builds Docker images associated with each model. The images are named 'ci-$(model_name)', and are not removed after the script completes. +- starts the Docker container, named 'container_$(model_name)'. The container should automatically be stopped and removed whenever the script exits. - clones the git 'url', and runs the 'script' -- compiles the final perf.csv and perf.html +- generates the final perf.csv and perf.html ### Tag functionality for running model -With the tag functionality, the user can select a subset of the models, that have the corresponding tags matching user specified tags, to be run. User specified tags can be specified with the `--tags` argument. If multiple tags are specified, all models that match any tag is selected. -Each model name in models.json is automatically a tag that can be used to run that model. Tags are also supported in comma-separated form as a Jenkins parameter. - +The tag functionality enables users to select and run specific models whose tags match those provided with the `--tags` argument. 
Multiple tags can be specified in a space-separated format, and any model matching at least one tag will be selected (logical OR). Each model name in `models.json` is automatically available as a tag. #### Search models with tags @@ -161,36 +157,36 @@ Place Model Files: Copy the model files into the appropriate directory within th (venv) test-node:~/MAD$ madengine run --tags dummy3:dummy_3:batch_size=512:in=32:out=16 --live-output ``` -The configs of batch_size512:in32:out16 will be pass to environment variables and build arguments of docker. +The configs of batch_size=512:in=32:out=16 will be passed to environment variables and build arguments of Docker. ### Custom timeouts -The default timeout for model run is 2 hrs. This can be overridden if the model in models.json contains a `'timeout' : TIMEOUT` entry. Both the default timeout and/or timeout specified in models.json can be overridden using `--timeout TIMEOUT` command line argument. Having `TIMEOUT` set to 0 means that the model run will never timeout. +The default timeout for model run is 2 hours. This can be overridden if the model in models.json contains a `'timeout' : TIMEOUT` entry. Both the default timeout and the timeout specified in models.json can be overridden using the `--timeout TIMEOUT` command-line argument. Setting `TIMEOUT` to 0 means that the model run will never time out. ### Live output functionality -By default, `madengine` is silent. The output is piped into log files. By specifying `--live-output`, the output is printed in real-time to STDOUT. +By default, `madengine` is silent. The output is piped to log files. By specifying `--live-output`, the output is printed in real-time to STDOUT. ### Contexts -Contexts are run-time parameters that change how the model is executed. Some contexts are auto-detected. Detected contexts may be over-ridden. Contexts are also used to filter Dockerfile used in model. +Contexts are run-time parameters that change how the model is executed. 
Some contexts are auto-detected. Detected contexts may be overridden. Contexts are also used to filter the Dockerfile used for a model. -For more details, see [How to provide contexts](docs/how-to-provide-contexts.md) +For more details, see [How to provide contexts](docs/how-to-provide-contexts.md). ### Credentials -Credentials to clone model git urls are provided in a centralized `credential.json` file. Models that require special credentials for cloning have a special `cred` field in the model definition in `models.json`. This field denotes the specific credential in `credential.json` to use. Public models repositories can skip the `cred` field. +Credentials to clone model git URLs are provided in a centralized `credential.json` file. Models that require special credentials for cloning have a special `cred` field in the model definition in `models.json`. This field denotes the specific credential in `credential.json` to use. Public model repositories can skip the `cred` field. There are several types of credentials supported. -1. For HTTP/HTTPS git urls, `username` and `password` should be provided in the credential. For Source Code Management(SCM) systems that support Access Tokens, the token can be substituted for the `password` field. The `username` and `password` will be passed as a docker build argument and a container environment variable in the docker build and run steps. Fore example, for `"cred":"AMD_GITHUB"` field in `models.json` and entry `"AMD_GITHUB": { "username": "github_username", "password":"pass" }` in `credential.json` the following docker build arguments and container environment variables will be added: `AMD_GITHUB_USERNAME="github_username"` and `AMD_GITHUB_PASSWORD="pass"`. +1. For HTTP/HTTPS git URLs, `username` and `password` should be provided in the credential. For Source Code Management (SCM) systems that support Access Tokens, the token can be substituted for the `password` field. 
The `username` and `password` will be passed as a Docker build argument and a container environment variable in the Docker build and run steps. For example, for the `"cred":"AMD_GITHUB"` field in `models.json` and the entry `"AMD_GITHUB": { "username": "github_username", "password":"pass" }` in `credential.json`, the following Docker build arguments and container environment variables will be added: `AMD_GITHUB_USERNAME="github_username"` and `AMD_GITHUB_PASSWORD="pass"`. -2. For SSH git urls, `username` and `ssh_key_file` should be provided in the credential. The `username` is the SSH username, and `ssh_key_file` is the private ssh key, that has been registed with the SCM system. -Due to legal requirements, the Credentials to access all models is not provided by default in DLM. Please contact the model owner if you wish to access and run the model. +2. For SSH git URLs, `username` and `ssh_key_file` should be provided in the credential. The `username` is the SSH username, and `ssh_key_file` is the private SSH key that has been registered with the SCM system. +Due to legal requirements, credentials to access all models are not provided by default in DLM. Please contact the model owner if you wish to access and run the model. -3. For NAS urls, `HOST`, `PORT`, `USERNAME`, and `PASSWORD` should be provided in the credential. Please check env variables starting with NAS in [Environment Variables] (https://github.com/ROCm/madengine/blob/main/README.md#environment-variables) +3. For NAS URLs, `HOST`, `PORT`, `USERNAME`, and `PASSWORD` should be provided in the credential. Please check the environment variables starting with NAS in [Environment Variables](https://github.com/ROCm/madengine/blob/main/README.md#environment-variables). -3. For AWS S3 urls, `USERNAME`, and `PASSWORD` should be provided in the credential with var name as MAD_AWS_S3 as mentioned in [Environment Variables] (https://github.com/ROCm/madengine/blob/main/README.md#environment-variables) +4. 
For AWS S3 URLs, `USERNAME` and `PASSWORD` should be provided in the credential under the variable name `MAD_AWS_S3`, as mentioned in [Environment Variables](https://github.com/ROCm/madengine/blob/main/README.md#environment-variables). ### Local data provider -The DLM user may wish to run a model locally multiple times, with the input data downloaded once, and reused subsquently. This functionality is only supported on models that support the Data Provider functionality. That is, the model specification in `models.json` have the `data` field, which points to a data specification in `data.json`. +The DLM user may wish to run a model locally multiple times, with the input data downloaded once and reused subsequently. This functionality is only supported on models that support the Data Provider functionality. That is, the model specification in `models.json` has the `data` field, which points to a data specification in `data.json`. To use existing data on a local path, add to the data specification, using a `local` field within `data.json`. By default, this path is mounted read-only. To change this path to read-write, specify the `readwrite` field to `'true'` in the data configuration. @@ -392,7 +388,7 @@ Profile GPU usage of running LLMs and Deep Learning models. ### Tools - Trace Libraries of ROCm -Trace library usage of running LLMs and Deep Learning models. A demo of running model with tracing rocBlas. +Trace library usage of running LLMs and Deep Learning models. A demo of running a model with rocBLAS tracing. ``` (venv) test-node:~/MAD$ madengine run --tags pyt_huggingface_bert --additional-context "{'guest_os': 'UBUNTU','tools': [{'name':'rocblas_trace'}]}" ``` ## Environment Variables -Madengine also exposes environment variables to allow for models location setting or data loading at DLM/MAD runtime. 
+Madengine also exposes environment variables to allow for model location setting or data loading at DLM/MAD runtime. | Field | Description | |-----------------------------| ----------------------------------------------------------------------------------| @@ -414,12 +410,12 @@ Examples for running models using environment variables. # Apply AWS S3 MAD_AWS_S3='{"USERNAME":"username","PASSWORD":"password"}' madengine run --tags dummy_data_aws --live-output -# Apply customized NAS +# Apply custom NAS NAS_NODES=[{"HOST":"hostname","PORT":"22","USERNAME":"username","PASSWORD":"password"}] madengine run --tags dummy_data_austin_nas --live-output ``` ## Unit Test -Run pytest to validate unit tests of MAD Engine. +Run pytest to validate unit tests of MAD engine. ``` pytest -v -s diff --git a/docs/how-to-quick-start.md b/docs/how-to-quick-start.md index 241c048b..28adb561 100644 --- a/docs/how-to-quick-start.md +++ b/docs/how-to-quick-start.md @@ -38,13 +38,13 @@ Run LLMs and Deep Learning models on container optional arguments: -h, --help show this help message and exit --tags TAGS [TAGS ...] - tags to run (can be multiple). + tags to run (space-separated). --timeout TIMEOUT time out for model run in seconds; Overrides per-model timeout if specified or default timeout of 7200 (2 hrs). Timeout of 0 will never timeout. --live-output prints output in real-time directly on STDOUT --clean-docker-cache rebuild docker image without using cache --additional-context-file ADDITIONAL_CONTEXT_FILE - additonal context, as json file, to filter behavior of workloads. Overrides detected contexts. + additional context, as json file, to filter behavior of workloads. Overrides detected contexts. --additional-context ADDITIONAL_CONTEXT additional context, as string representation of python dict, to filter behavior of workloads. Overrides detected contexts and additional- context-file. @@ -78,13 +78,13 @@ Commands for generating reports. 
```shell (venv) test-node:~/MAD$ madengine report --help usage: madengine report [-h] {update-perf,to-html,to-email} ... - + optional arguments: -h, --help show this help message and exit - + Report Commands: Available commands for generating reports. - + {update-perf,to-html,to-email} update-perf Update perf.csv to database to-html Convert CSV to HTML report of models
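
The credential types described in the README hunks above (HTTP/HTTPS with `username`/`password`, SSH with `username`/`ssh_key_file`, NAS with `HOST`/`PORT`/`USERNAME`/`PASSWORD`) can be sketched as a single `credential.json`. This is a minimal illustration, not a file from the repository: the entry names `AMD_GITHUB`, `MY_SSH`, and `MY_NAS` and the paths/values are hypothetical; only the field names come from the text above.

```shell
# Hypothetical credential.json covering the three credential shapes the
# README describes. Entry names and values are made up for illustration.
cat > credential.json <<'EOF'
{
  "AMD_GITHUB": { "username": "github_username", "password": "pass" },
  "MY_SSH":     { "username": "git", "ssh_key_file": "/home/user/.ssh/id_rsa" },
  "MY_NAS":     { "HOST": "hostname", "PORT": "22", "USERNAME": "username", "PASSWORD": "password" }
}
EOF

# Sanity-check that the file is well-formed JSON before running madengine
python3 -m json.tool credential.json > /dev/null && echo "credential.json is valid JSON"
```

Per the credentials section above, a model whose `models.json` entry carries `"cred":"AMD_GITHUB"` would then receive `AMD_GITHUB_USERNAME` and `AMD_GITHUB_PASSWORD` as Docker build arguments and container environment variables.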