Downgrade NumPy < 2.0 in pyt_huggingface.ubuntu.amd.Dockerfile #73
the-matrix/Matrix - arch = 'gfx908'/models/pyt_huggingface_bert-gfx908: error in 'error' step
the-matrix / Matrix - arch = 'gfx908' / Matrix - arch = 'gfx908' / models / pyt_huggingface_gpt2-gfx908 / pyt_huggingface_gpt2-gfx908 / Shell Script
Error in sh step, with arguments:
    madengine run --tags pyt_huggingface_gpt2 --live-output -o perf_gfx908.csv 2>&1 | tee madengine.run.log
    if grep -i -e '= EXCEPTION =' -e 'unrecognized arguments:' -e 'RuntimeError:' madengine.run.log 1>/dev/null; then
        echo Found error/exception during madengine command run
        exit 1
    fi
script returned exit code 1
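The failure check in the sh step is a case-insensitive search of madengine.run.log for three fixed markers. A minimal Python sketch of the same check follows, for reproducing it locally; the function name `log_has_error` is hypothetical and is not part of madengine or the CI scripts.

```python
import re

# Markers copied from the grep call in the sh step; grep -i makes the match
# case-insensitive, mirrored here with re.IGNORECASE.
ERROR_PATTERNS = ("= EXCEPTION =", "unrecognized arguments:", "RuntimeError:")
_ERROR_RE = re.compile("|".join(re.escape(p) for p in ERROR_PATTERNS), re.IGNORECASE)


def log_has_error(log_text: str) -> bool:
    """Return True if any line of the log contains one of the error markers."""
    return any(_ERROR_RE.search(line) for line in log_text.splitlines())
```

Because the check matches the literal text "RuntimeError:", any RuntimeError printed by madengine fails the stage even when the model itself never started, which is what happens in the build logs here.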
Build log
[2026-02-06T17:17:36.314Z] + madengine run --tags pyt_huggingface_gpt2 --live-output -o perf_gfx908.csv
[2026-02-06T17:17:36.314Z] + tee madengine.run.log
[2026-02-06T17:17:36.759Z] MAD_MINIO environment variable is not set.
[2026-02-06T17:17:36.759Z] MAD_MINIO is using default values.
[2026-02-06T17:17:36.759Z] Running models on container
[2026-02-06T17:17:36.759Z] > if [ -f 'ctx_test' ]; then cat ctx_test; else echo 'None'; fi || true
[2026-02-06T17:17:36.759Z] > if [ -f "$(which apt)" ]; then echo 'HOST_UBUNTU'; elif [ -f "$(which yum)" ]; then echo 'HOST_CENTOS'; elif [ -f "$(which zypper)" ]; then echo 'HOST_SLES'; elif [ -f "$(which tdnf)" ]; then echo 'HOST_AZURE'; else echo 'Unable to detect Host OS'; fi || true
[2026-02-06T17:17:36.759Z] > cat /proc/sys/kernel/numa_balancing || true
[2026-02-06T17:17:36.759Z] Warning: numa balancing is OFF ...
[2026-02-06T17:17:36.759Z] > bash -c 'if [[ -f /usr/bin/nvidia-smi ]] && $(/usr/bin/nvidia-smi > /dev/null 2>&1); then echo "NVIDIA"; elif [[ -f /opt/rocm/bin/amd-smi ]]; then echo "AMD"; elif [[ -f /usr/local/bin/amd-smi ]]; then echo "AMD"; else echo "Unable to detect GPU vendor"; fi || true'
[2026-02-06T17:17:36.759Z] > amd-smi list --csv | tail -n +3 | wc -l
[2026-02-06T17:17:36.759Z] > /opt/rocm/bin/rocminfo |grep -o -m 1 'gfx.*'
[2026-02-06T17:17:36.759Z] > amd-smi static -g 0 | grep MARKET_NAME: | cut -d ':' -f 2
[2026-02-06T17:17:37.199Z] > hipconfig --version | cut -d'.' -f1,2
[2026-02-06T17:17:37.199Z] > /opt/rocm/bin/rocminfo |grep -o -m 1 'gfx.*'
[2026-02-06T17:17:37.199Z] > amd-smi static -g 0 | grep MARKET_NAME: | cut -d ':' -f 2
[2026-02-06T17:17:37.632Z] > cat /opt/rocm/.info/version | cut -d'-' -f1
[2026-02-06T17:17:37.632Z] > grep -r drm_render_minor /sys/devices/virtual/kfd/kfd/topology/nodes
[2026-02-06T17:17:37.632Z] > grep -r unique_id /sys/devices/virtual/kfd/kfd/topology/nodes
[2026-02-06T17:17:37.632Z] > rocm-smi --showuniqueid | grep 'Unique.*:'
[2026-02-06T17:17:37.632Z] Traceback (most recent call last):
[2026-02-06T17:17:37.632Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 462, in get_gpu_renderD_nodes
[2026-02-06T17:17:37.632Z]     raise KeyError(f"Unique ID '{unique_id}' from rocm-smi not found in KFD mapping")
[2026-02-06T17:17:37.632Z] KeyError: "Unique ID 'N/A' from rocm-smi not found in KFD mapping"
[2026-02-06T17:17:37.633Z]
[2026-02-06T17:17:37.633Z] During handling of the above exception, another exception occurred:
[2026-02-06T17:17:37.633Z]
[2026-02-06T17:17:37.633Z] Traceback (most recent call last):
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 465, in get_gpu_renderD_nodes
[2026-02-06T17:17:37.633Z]     raise RuntimeError(f"Failed to map unique ID from line '{line}': {e}")
[2026-02-06T17:17:37.633Z] RuntimeError: Failed to map unique ID from line 'GPU[0] : Unique ID: N/A': "Unique ID 'N/A' from rocm-smi not found in KFD mapping"
[2026-02-06T17:17:37.633Z]
[2026-02-06T17:17:37.633Z] The above exception was the direct cause of the following exception:
[2026-02-06T17:17:37.633Z]
[2026-02-06T17:17:37.633Z] Traceback (most recent call last):
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/bin/madengine", line 6, in <module>
[2026-02-06T17:17:37.633Z]     sys.exit(main())
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/mad.py", line 283, in main
[2026-02-06T17:17:37.633Z]     result = args.func(args)
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/mad.py", line 37, in run_models
[2026-02-06T17:17:37.633Z]     run_models = RunModels(args=args)
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/tools/run_models.py", line 157, in __init__
[2026-02-06T17:17:37.633Z]     self.context = Context(
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 123, in __init__
[2026-02-06T17:17:37.633Z]     self.ctx["gpu_renderDs"] = self.get_gpu_renderD_nodes()
[2026-02-06T17:17:37.633Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 524, in get_gpu_renderD_nodes
[2026-02-06T17:17:37.633Z]     raise RuntimeError(f"Error in get_gpu_renderD_nodes: {e}") from e
[2026-02-06T17:17:37.633Z] RuntimeError: Error in get_gpu_renderD_nodes: Failed to map unique ID from line 'GPU[0] : Unique ID: N/A': "Unique ID 'N/A' from rocm-smi not found in KFD mapping"
[2026-02-06T17:17:37.633Z] + grep -i -e = EXCEPTION = -e unrecognized arguments: -e RuntimeError: madengine.run.log
[2026-02-06T17:17:37.633Z] + echo Found error/exception during madengine command run
[2026-02-06T17:17:37.633Z] Found error/exception during madengine command run
[2026-02-06T17:17:37.633Z] + exit 1
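The traceback shows the run aborting because rocm-smi reported 'Unique ID: N/A' for GPU[0], and that value has no entry in the KFD mapping. The sketch below illustrates one possible way such a lookup could tolerate a missing unique ID instead of raising. It is not madengine's implementation: the function name `map_render_nodes`, the shape of `kfd_map` (unique-ID string to drm_render_minor), and the choice to record None for 'N/A' are all assumptions made for illustration. Only the line format 'GPU[0] : Unique ID: N/A' and the KeyError message are taken from the log.

```python
import re
from typing import Dict, List, Optional

# Matches lines shaped like "GPU[0] : Unique ID: N/A" (format taken from the log).
_LINE_RE = re.compile(r"GPU\[(\d+)\]\s*:\s*Unique ID:\s*(\S+)")


def map_render_nodes(lines: List[str], kfd_map: Dict[str, int]) -> Dict[int, Optional[int]]:
    """Map GPU index to render minor; None where rocm-smi reports no unique ID.

    kfd_map is a hypothetical dict from unique-ID string to drm_render_minor.
    """
    result: Dict[int, Optional[int]] = {}
    for line in lines:
        match = _LINE_RE.search(line)
        if match is None:
            continue  # not a unique-ID line
        gpu_index, unique_id = int(match.group(1)), match.group(2)
        if unique_id == "N/A":
            # Assumption: treat a missing unique ID as "unknown" rather than fatal.
            result[gpu_index] = None
            continue
        if unique_id not in kfd_map:
            raise KeyError(f"Unique ID '{unique_id}' from rocm-smi not found in KFD mapping")
        result[gpu_index] = kfd_map[unique_id]
    return result
```

Whether returning None is acceptable depends on how the render node is used later (for example, when selecting devices for the container), so a real fix may need a different fallback, such as another source for the render minor.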
the-matrix / Matrix - arch = 'gfx908' / Matrix - arch = 'gfx908' / models / pyt_huggingface_gpt2-gfx908 / pyt_huggingface_gpt2-gfx908 / Error signal
Error in error step, with arguments pyt_huggingface_gpt2-gfx908 threw "hudson.AbortException: script returned exit code 1".
pyt_huggingface_gpt2-gfx908 threw "hudson.AbortException: script returned exit code 1".
the-matrix / Matrix - arch = 'gfx908' / Matrix - arch = 'gfx908' / models / pyt_huggingface_bert-gfx908 / pyt_huggingface_bert-gfx908 / Shell Script
Error in sh step, with arguments:
    madengine run --tags pyt_huggingface_bert --live-output -o perf_gfx908.csv 2>&1 | tee madengine.run.log
    if grep -i -e '= EXCEPTION =' -e 'unrecognized arguments:' -e 'RuntimeError:' madengine.run.log 1>/dev/null; then
        echo Found error/exception during madengine command run
        exit 1
    fi
script returned exit code 1
Build log
[2026-02-06T17:17:48.663Z] + madengine run --tags pyt_huggingface_bert --live-output -o perf_gfx908.csv
[2026-02-06T17:17:48.663Z] + tee madengine.run.log
[2026-02-06T17:17:49.094Z] MAD_MINIO environment variable is not set.
[2026-02-06T17:17:49.094Z] MAD_MINIO is using default values.
[2026-02-06T17:17:49.094Z] Running models on container
[2026-02-06T17:17:49.094Z] > if [ -f 'ctx_test' ]; then cat ctx_test; else echo 'None'; fi || true
[2026-02-06T17:17:49.094Z] > if [ -f "$(which apt)" ]; then echo 'HOST_UBUNTU'; elif [ -f "$(which yum)" ]; then echo 'HOST_CENTOS'; elif [ -f "$(which zypper)" ]; then echo 'HOST_SLES'; elif [ -f "$(which tdnf)" ]; then echo 'HOST_AZURE'; else echo 'Unable to detect Host OS'; fi || true
[2026-02-06T17:17:49.094Z] > cat /proc/sys/kernel/numa_balancing || true
[2026-02-06T17:17:49.094Z] Warning: numa balancing is OFF ...
[2026-02-06T17:17:49.094Z] > bash -c 'if [[ -f /usr/bin/nvidia-smi ]] && $(/usr/bin/nvidia-smi > /dev/null 2>&1); then echo "NVIDIA"; elif [[ -f /opt/rocm/bin/amd-smi ]]; then echo "AMD"; elif [[ -f /usr/local/bin/amd-smi ]]; then echo "AMD"; else echo "Unable to detect GPU vendor"; fi || true'
[2026-02-06T17:17:49.094Z] > amd-smi list --csv | tail -n +3 | wc -l
[2026-02-06T17:17:49.094Z] > /opt/rocm/bin/rocminfo |grep -o -m 1 'gfx.*'
[2026-02-06T17:17:49.094Z] > amd-smi static -g 0 | grep MARKET_NAME: | cut -d ':' -f 2
[2026-02-06T17:17:49.532Z] > hipconfig --version | cut -d'.' -f1,2
[2026-02-06T17:17:49.532Z] > /opt/rocm/bin/rocminfo |grep -o -m 1 'gfx.*'
[2026-02-06T17:17:49.532Z] > amd-smi static -g 0 | grep MARKET_NAME: | cut -d ':' -f 2
[2026-02-06T17:17:49.962Z] > cat /opt/rocm/.info/version | cut -d'-' -f1
[2026-02-06T17:17:49.962Z] > grep -r drm_render_minor /sys/devices/virtual/kfd/kfd/topology/nodes
[2026-02-06T17:17:49.962Z] > grep -r unique_id /sys/devices/virtual/kfd/kfd/topology/nodes
[2026-02-06T17:17:49.962Z] > rocm-smi --showuniqueid | grep 'Unique.*:'
[2026-02-06T17:17:49.962Z] Traceback (most recent call last):
[2026-02-06T17:17:49.962Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 462, in get_gpu_renderD_nodes
[2026-02-06T17:17:49.962Z]     raise KeyError(f"Unique ID '{unique_id}' from rocm-smi not found in KFD mapping")
[2026-02-06T17:17:49.962Z] KeyError: "Unique ID 'N/A' from rocm-smi not found in KFD mapping"
[2026-02-06T17:17:49.962Z]
[2026-02-06T17:17:49.962Z] During handling of the above exception, another exception occurred:
[2026-02-06T17:17:49.962Z]
[2026-02-06T17:17:49.962Z] Traceback (most recent call last):
[2026-02-06T17:17:49.962Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 465, in get_gpu_renderD_nodes
[2026-02-06T17:17:49.962Z]     raise RuntimeError(f"Failed to map unique ID from line '{line}': {e}")
[2026-02-06T17:17:49.962Z] RuntimeError: Failed to map unique ID from line 'GPU[0] : Unique ID: N/A': "Unique ID 'N/A' from rocm-smi not found in KFD mapping"
[2026-02-06T17:17:49.962Z]
[2026-02-06T17:17:49.962Z] The above exception was the direct cause of the following exception:
[2026-02-06T17:17:49.962Z]
[2026-02-06T17:17:49.962Z] Traceback (most recent call last):
[2026-02-06T17:17:49.962Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/bin/madengine", line 6, in <module>
[2026-02-06T17:17:49.963Z]     sys.exit(main())
[2026-02-06T17:17:49.963Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/mad.py", line 283, in main
[2026-02-06T17:17:49.963Z]     result = args.func(args)
[2026-02-06T17:17:49.963Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/mad.py", line 37, in run_models
[2026-02-06T17:17:49.963Z]     run_models = RunModels(args=args)
[2026-02-06T17:17:49.963Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/tools/run_models.py", line 157, in __init__
[2026-02-06T17:17:49.963Z]     self.context = Context(
[2026-02-06T17:17:49.963Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 123, in __init__
[2026-02-06T17:17:49.963Z]     self.ctx["gpu_renderDs"] = self.get_gpu_renderD_nodes()
[2026-02-06T17:17:49.963Z]   File "/home/jenkins/workspace/DLM_Public-MAD-CI_PR-73/venv/lib/python3.10/site-packages/madengine/core/context.py", line 524, in get_gpu_renderD_nodes
[2026-02-06T17:17:49.963Z]     raise RuntimeError(f"Error in get_gpu_renderD_nodes: {e}") from e
[2026-02-06T17:17:49.963Z] RuntimeError: Error in get_gpu_renderD_nodes: Failed to map unique ID from line 'GPU[0] : Unique ID: N/A': "Unique ID 'N/A' from rocm-smi not found in KFD mapping"
[2026-02-06T17:17:50.098Z] + grep -i -e = EXCEPTION = -e unrecognized arguments: -e RuntimeError: madengine.run.log
[2026-02-06T17:17:50.098Z] + echo Found error/exception during madengine command run
[2026-02-06T17:17:50.098Z] Found error/exception during madengine command run
[2026-02-06T17:17:50.098Z] + exit 1
the-matrix / Matrix - arch = 'gfx908' / Matrix - arch = 'gfx908' / models / pyt_huggingface_bert-gfx908 / pyt_huggingface_bert-gfx908 / Error signal
Error in error step, with arguments pyt_huggingface_bert-gfx908 threw "hudson.AbortException: script returned exit code 1".
pyt_huggingface_bert-gfx908 threw "hudson.AbortException: script returned exit code 1".
Details
- Declarative: Checkout SCM (30 sec)
- resetbuild (1.5 sec)
- the-matrix (2 min 45 sec)
  - Matrix - arch = 'gfx908' (6 ms)
    - Matrix - arch = 'gfx908' (2 min 28 sec)
      - models (1 min 56 sec)
        - pyt_huggingface_gpt2-gfx908 (8 ms)
        - pyt_huggingface_bert-gfx908 (45 sec)
  - Matrix - arch = 'MI250' (6 ms)
    - Matrix - arch = 'MI250' (34 sec)
      - models (14 sec)
  - Matrix - arch = 'MI250_CA' (8 ms)
    - Matrix - arch = 'MI250_CA' (34 sec)
      - models (14 sec)
  - Matrix - arch = 'MI250X-A1' (5 ms)
    - Matrix - arch = 'MI250X-A1' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'MI300X_BANFF' (6 ms)
    - Matrix - arch = 'MI300X_BANFF' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'MI300X_GT' (6 ms)
    - Matrix - arch = 'MI300X_GT' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'gfx1100' (7 ms)
    - Matrix - arch = 'gfx1100' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'A100' (7 ms)
    - Matrix - arch = 'A100' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'H100' (8 ms)
    - Matrix - arch = 'H100' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'dlmodels' (7 ms)
    - Matrix - arch = 'dlmodels' (33 sec)
      - models (14 sec)
  - Matrix - arch = 'oci-64' (50 sec)
    - Matrix - arch = 'oci-64' (33 sec)
      - models (14 sec)