RuntimeError: operator torchvision::nms does not exist #9435

@mgiessing

🐛 Describe the bug

I built PyTorch and torchvision from source against CUDA 12.4.1, because that is the latest CUDA version available for IBM Power9 (ppc64le) with V100 GPUs.

python3 -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA version: {torch.version.cuda}'); print(f'CUDA available: {torch.cuda.is_available()}')"

PyTorch version: 2.10.0
CUDA version: 12.4
CUDA available: True

How to build the torchvision wheel isn't clearly documented, so I tried to recreate the steps from the GitHub Actions logs:

Inside a manylinux_2_28_ppc64le container with CUDA 12.4.1 installed:

curl -L https://micro.mamba.pm/api/micromamba/linux-ppc64le/latest | tar -xvj
export MAMBA_ROOT_PREFIX=$HOME/.micromamba  # optional, defaults to ~/micromamba
eval "$(./bin/micromamba shell hook -s posix)"

yum install -y libjpeg-turbo-devel libwebp-devel freetype gnutls zip

export rel=v0.25.0
export ver=3.12  # Python version; ${ver//./} below expands to 312

export BUILD_VERSION=${rel:1}
export PYTORCH_BUILD_NUMBER=0
export PYTORCH_BUILD_VERSION=${rel:1}
export FORCE_CUDA=1

git clone --depth 1 -b ${rel} --recursive https://github.com/pytorch/vision.git && cd vision
curl -LO https://raw.githubusercontent.com/pytorch/test-infra/refs/heads/main/.github/scripts/repair_manylinux_2_28.sh
chmod +x repair_manylinux_2_28.sh
sed -i "s/aarch64/ppc64le/g" packaging/post_build_script.sh

micromamba create -n py-${ver} -c conda-forge python=${ver} conda libwebp libjpeg-turbo -y
micromamba activate py-${ver}
bash packaging/pre_build_script.sh
pip3 install Cython "auditwheel<6.3" numpy future ninja pyyaml http://10.x.x.x/whl/torch/cu124/torch-2.10.0-cp${ver//./}-cp${ver//./}-manylinux_2_28_ppc64le.whl --upgrade setuptools==72.1.0 

python3 setup.py clean
python3 setup.py bdist_wheel
./repair_manylinux_2_28.sh /vision/$(ls dist/*whl)

bash packaging/post_build_script.sh

The wheel is then uploaded to the 10.x.x.x server, from which I install it.

When I install it into a python:3.12-slim container and import torchvision, I get this error:

export TORCH_VER=2.10.0
export PY_VER=cp312-cp312

pip3 install numpy \
  http://10.x.x.x/whl/torch/cu124/torch-${TORCH_VER}-${PY_VER}-manylinux_2_28_ppc64le.whl \
  http://10.x.x.x/whl/torchvision/cu124/torchvision-0.25.0-${PY_VER}-manylinux_2_28_ppc64le.whl

python3 -c "import torchvision"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
    @torch.library.register_fake("torchvision::nms")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/library.py", line 1073, in register
    use_lib._register_fake(
  File "/usr/local/lib/python3.12/site-packages/torch/library.py", line 203, in _register_fake
    handle = entry.fake_impl.register(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 50, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator torchvision::nms does not exist
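
One way to narrow this down is to check whether the installed wheel actually contains torchvision's compiled ops library. The sketch below was not run as part of this report. It assumes, without confirmation here, that the error comes from the `_C` extension being absent from the wheel or failing to load. The helper name `list_extension_files` is hypothetical, and it only lists files, so it works even when `import torchvision` fails.

```python
import importlib.util
from pathlib import Path


def list_extension_files(package_name: str) -> list[str]:
    """List compiled extension files (*.so, *.pyd) in a package's top-level
    directory without importing the package itself."""
    spec = importlib.util.find_spec(package_name)
    # Not installed, or a plain module rather than a package directory.
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        directory = Path(location)
        if not directory.is_dir():
            continue
        for path in directory.iterdir():
            if path.suffix in {".so", ".pyd"}:
                found.append(path.name)
    return sorted(found)


if __name__ == "__main__":
    files = list_extension_files("torchvision")
    print("compiled extensions:", files)
    # Assumption: the ops library is shipped as a file whose name starts with "_C."
    print("_C present:", any(name.startswith("_C.") for name in files))
```

If no `_C` file is listed, the build or the wheel repair step likely dropped it. If it is listed, the next thing to check would be whether it loads, for example whether its shared-library dependencies resolve inside the slim container.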

Versions

python collect_env.py
Collecting environment information...
PyTorch version: 2.10.0
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 13 (trixie) (ppc64le)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.41

Python version: 3.12.13 (main, Mar  3 2026, 20:38:43) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-553.36.1.el8_10.ppc64le-ppc64le-with-glibc2.41
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: 
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 550.54.15
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False
Caching allocator config: N/A

CPU:
Architecture:                         ppc64le
Byte Order:                           Little Endian
CPU(s):                               160
On-line CPU(s) list:                  0-159
Model name:                           POWER9, altivec supported
Model:                                2.3 (pvr 004e 1203)
Thread(s) per core:                   4
Core(s) per socket:                   20
Socket(s):                            2
Frequency boost:                      enabled
CPU(s) scaling MHz:                   100%
CPU max MHz:                          3800.0000
CPU min MHz:                          2300.0000
L1d cache:                            1.3 MiB (40 instances)
L1i cache:                            1.3 MiB (40 instances)
L2 cache:                             10 MiB (20 instances)
L3 cache:                             200 MiB (20 instances)
NUMA node(s):                         6
NUMA node0 CPU(s):                    0-79
NUMA node8 CPU(s):                    80-159
NUMA node252 CPU(s):                  
NUMA node253 CPU(s):                  
NUMA node254 CPU(s):                  
NUMA node255 CPU(s):                  
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Mitigation; RFI Flush, L1D private per thread
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:             Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==2.4.3
[pip3] torch==2.10.0
[pip3] torchvision==0.25.0
[conda] Could not collect
