
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' #208

@paklui

Description


Describe the Bug


With the latest git revision 656b66c, I run into "TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'".
With the earlier revision a4dbc72, PARAM comms ran fine.
I suspect the Python version on my CentOS Stream 9 system is the cause. The traceback points at the annotation "device: str | None = None", and the "X | None" union syntax (PEP 604) can only be evaluated at runtime on Python 3.10 and newer, while this system runs Python 3.9.
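A minimal way to check this suspicion independently of PARAM: evaluating `str | None` eagerly raises exactly this TypeError on Python 3.9, because `type` only gained support for the `|` operator in Python 3.10. The helper name `pep604_union_supported` is hypothetical and not part of PARAM.

```python
import sys


def pep604_union_supported() -> bool:
    """Return True if `str | None` can be evaluated at runtime (PEP 604)."""
    try:
        # Evaluated eagerly, the same way a function annotation is evaluated
        # at definition time when annotations are not deferred.
        _ = str | None
    except TypeError as exc:
        # On Python 3.9 and older this is:
        # unsupported operand type(s) for |: 'type' and 'NoneType'
        print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {exc}")
        return False
    return True


print(pep604_union_supported())
```

Run inside the same venv used for mpirun, this should report False on the Python 3.9.21 interpreter shown below.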

(venv-param) [amd@hostname-1e707-b05-2 PARAMcomms]$ python --version
Python 3.9.21

Steps to Reproduce


Install steps:

cd param-656b66c/
cd train/compute/python/
pip install .
cd ../../comms/pt/
pip install .
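After these `pip install .` steps, the traceback shows that `comms.py` from the source tree imports `param_bench` from the copy installed into the venv's `site-packages`, not from the checkout. The sketch below checks which copy the interpreter will pick up; `module_origin` is a hypothetical helper built on `importlib.util.find_spec`, which does not execute a top-level module when locating it.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return where a module would be imported from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return None
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin file; report their first search path.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


print(module_origin("param_bench"))
```

If the printed path is under `site-packages`, edits made in the checkout only take effect after reinstalling (or after an editable install with `pip install -e .`).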

Environment setup:

ROCM_PATH=${ROCM_PATH:-/opt/rocm}
NFS_PATH=/share2/amd-share
OMPI_INSTALL_DIR=${NFS_PATH}/ompi4-install
RCCL_INSTALL_DIR=${NFS_PATH}/rccl_develop/build/release
RCCL_TESTS_INSTALL_DIR=${NFS_PATH}/rccl-tests/build
export PATH=${OMPI_INSTALL_DIR}/bin:$PATH
export LD_LIBRARY_PATH=${RCCL_INSTALL_DIR}:${OMPI_INSTALL_DIR}/lib:$LD_LIBRARY_PATH
source /share2/PARAMcomms/venv-param/bin/activate

To run:

mpirun --allow-run-as-root -np 8 -x NCCL_DEBUG=INFO -x PYTHONPATH=/usr/bin/python3 -host hostname-1e707-b05-2:8 -map-by ppr:8:node --bind-to none --mca pml ucx --mca btl ^openib -x PATH=${PATH} -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} -x NCCL_IB_GID_INDEX=3 -x RCCL_ENABLE_INTRANET=1 -x NCCL_IB_HCA=bnxt_re0,bnxt_re1,bnxt_re2,bnxt_re3,bnxt_re4,bnxt_re5,bnxt_re6,bnxt_re7 -x NCCL_IGNORE_CPU_AFFINITY=1 /share2/PARAMcomms/param/train/comms/pt/comms.py --device rocm --master-ip hostname-1e707-b05-2 -b 1 -e 1G -n 10 -f 2 -z 0 --collective all_reduce --data-type float32 

Python version:

(venv-param) [amd@hostname-1e707-b05-2 PARAMcomms]$ python --version
Python 3.9.21

Installed packages (pip list):

(venv-param) [amd@hostname-1e707-b05-2 PARAMcomms]$ pip list
Package                  Version
------------------------ ----------------------------
apex                     1.6.0+rocm6.5.0.git004991b6
fbgemm_gpu               1.2.0
filelock                 3.18.0
fsspec                   2025.3.2
future                   1.0.0
gitdb                    4.0.12
GitPython                3.1.44
Jinja2                   3.1.6
MarkupSafe               3.0.2
mpmath                   1.3.0
networkx                 3.2.1
numpy                    2.0.2
parambench-train-comms   0.0.0
parambench-train-compute 1.0.0+git.1747955991
pillow                   11.2.1
pip                      25.1.1
pydot                    4.0.0
pyparsing                3.2.3
pytorch-triton-rocm      3.2.0+rocm6.5.0.git6da9e660
scipy                    1.13.1
setuptools               53.0.0
smmap                    5.0.2
sympy                    1.13.1
torch                    2.6.0+rocm6.5.0.gitcf65c6f2
torchaudio               2.6.0+rocm6.5.0.gitd8831425
torchvision              0.21.0+rocm6.5.0.git7af69879
typing_extensions        4.13.2

Expected Behavior


I expect the benchmark to run. With the older revision a4dbc72 it runs and produces the following output:


+ mpirun --allow-run-as-root -np 8 -x NCCL_DEBUG=VERSION -x PYTHONPATH=/usr/bin/python3 -host hostname-1e707-b05-2:8 -map-by ppr:8:node --bind-to none --mca pml ucx --mca btl '^openib' -x PATH=/share2/PARAMcomms/venv-param/bin:/share2/amd-share/ompi4-install/bin:/share2/PARAMcomms/venv-param/bin:/home/amd/.local/bin:/home/amd/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin -x LD_LIBRARY_PATH=/share2/amd-share/rccl_develop/build/release:/share2/amd-share/ompi4-install/lib: -x NCCL_IB_GID_INDEX=3 -x RCCL_ENABLE_INTRANET=1 -x NCCL_IB_HCA=bnxt_re0,bnxt_re1,bnxt_re2,bnxt_re3,bnxt_re4,bnxt_re5,bnxt_re6,bnxt_re7 -x NCCL_IGNORE_CPU_AFFINITY=1 /share2/PARAMcomms/param/train/comms/pt/comms.py --device rocm --master-ip hostname-1e707-b05-2 --b 4G --e 4G --n 100 --f 2 --z 0 --collective all_reduce --data-type float32
         PARAM COMM environment: {'world_size': 8, 'local_size': 8, 'global_rank': 0, 'local_rank': 0}
         backend: nccl nw-stack: pytorch-dist args.data_types: ['float32'] args.b: 4G args.e: 4G args.f: 2 args.z: 0 args.master_ip: hostname-1e707-b05-2
Hello from Rank 0: [Rank   0] host hostname-1e707-b05-2, device: cuda:0, local_rank: 0 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 1: [Rank   1] host hostname-1e707-b05-2, device: cuda:1, local_rank: 1 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 2: [Rank   2] host hostname-1e707-b05-2, device: cuda:2, local_rank: 2 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 3: [Rank   3] host hostname-1e707-b05-2, device: cuda:3, local_rank: 3 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 4: [Rank   4] host hostname-1e707-b05-2, device: cuda:4, local_rank: 4 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 5: [Rank   5] host hostname-1e707-b05-2, device: cuda:5, local_rank: 5 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 6: [Rank   6] host hostname-1e707-b05-2, device: cuda:6, local_rank: 6 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 7: [Rank   7] host hostname-1e707-b05-2, device: cuda:7, local_rank: 7 world_size: 8, master_ip: hostname-1e707-b05-2
RCCL version : 2.24.3-HEAD:2c0eecf
HIP version  : 6.5.50421-a90f5536a
ROCm version : 6.5.0.0-990-de37842
Hostname     : hostname-1e707-b05-2
Librccl path : /share2/PARAMcomms/venv-param/lib64/python3.9/site-packages/torch/lib/librccl.so
[Rank   0] allSizes: [4294967296] element_size: 4 local_rank: 0, num_pg 1, groupSize 8
         collective=all_reduce, src_ranks=None, dst_ranks=None

        COMMS-RES                          total-size (B)  nElementsPerRank  nElementsPairPerRank   Latency(us):p50         p75         p95         Min         Max    AlgBW(GB/s) BusBW(GB/s)
        COMMS-RES-all_reduce-float32        4294967296        1073741824           ...
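The sizes in this log are internally consistent: `--b 4G` appears as `allSizes: [4294967296]`, i.e. 4 × 1024³ bytes, and with `element_size: 4` for float32 that gives `nElementsPerRank` 1073741824. The arithmetic is sketched below; `parse_size` is a hypothetical helper that assumes K/M/G are binary multiples (which the logged 4294967296 matches) and is not PARAM's own argument parser.

```python
# Hypothetical size parser: K/M/G treated as powers of 1024.
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as '4G' or '1' into a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


FLOAT32_BYTES = 4  # element_size reported in the log for --data-type float32

total_bytes = parse_size("4G")                   # 4 * 1024**3 = 4294967296
elements_per_rank = total_bytes // FLOAT32_BYTES  # 4294967296 / 4 = 1073741824
print(total_bytes, elements_per_rank)
```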

Error output with the latest revision 656b66c:

+ mpirun --allow-run-as-root -np 8 -x NCCL_DEBUG=INFO -x PYTHONPATH=/usr/bin/python3 -host 1e707-b05-2:8 -map-by ppr:8:node --bind-to none --mca pml ucx --mca btl '^openib' -x PATH=/share2/PARAMcomms/venv-param/bin:/share2/amd-share/ompi4-install/bin:/share2/PARAMcomms/venv-param/bin:/home/amd/.local/bin:/home/amd/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin -x LD_LIBRARY_PATH=/share2/amd-share/rccl_develop/build/release:/share2/amd-share/ompi4-install/lib: -x NCCL_IGNORE_CPU_AFFINITY=1 /share2/PARAMcomms/param/train/comms/pt/comms.py --device rocm --backend nccl --master-ip hostname-1e707-b05-2 -b 1 -e 1G -n 10 -f 2 -z 0 --collective all_reduce --data-type float32
CollectiveArgsMixin does not exist or module not found. Default to empty class.
Traceback (most recent call last):
  File "/share2/PARAMcomms/param/train/comms/pt/comms.py", line 19, in <module>
    from param_bench.train.comms.pt import comms_utils
  File "/share2/PARAMcomms/venv-param/lib64/python3.9/site-packages/param_bench/train/comms/pt/comms_utils.py", line 25, in <module>
    from param_bench.train.comms.pt.pytorch_backend_utils import (
  File "/share2/PARAMcomms/venv-param/lib64/python3.9/site-packages/param_bench/train/comms/pt/pytorch_backend_utils.py", line 392, in <module>
    device: str | None = None,
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[503,1],0]
  Exit code:    1
--------------------------------------------------------------------------
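Two common ways to make an annotation like the one at `pytorch_backend_utils.py` line 392 import cleanly on Python 3.9 are sketched here, assuming the affected code only uses `str | None` inside annotations. The function names are hypothetical stand-ins, not the real functions in that file.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings, not evaluated

from typing import Optional


def example_with_future_import(device: str | None = None) -> str | None:
    # With the __future__ import, `str | None` is never evaluated when the
    # function is defined, so the module also imports on Python 3.7-3.9.
    return device


def example_with_optional(device: Optional[str] = None) -> Optional[str]:
    # Equivalent spelling that works on every Python 3 version,
    # with or without the __future__ import.
    return device


print(example_with_future_import.__annotations__)
```

The `__future__` import only defers annotations: any `str | None` evaluated outside an annotation (for example in an `isinstance` call or a module-level type alias), or annotations later resolved with `typing.get_type_hints`, would still fail on 3.9. Running under Python 3.10 or newer avoids the issue without code changes.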
