
Unable to run tests #19

@IgorZhiltsoff


I tried running

./tools/dist_test_map.sh ./projects/configs/hrmapnet/hrmapnet_maptrv2_nusc_r50_110ep.py ./ckpts/hrmapnet_maptrv2_nuscenes_ep110.pth 1

(the checkpoint was downloaded from the link in the repo's README)

and got the following error:

Traceback (most recent call last):
  File "./tools/test.py", line 264, in <module>
    main()
  File "./tools/test.py", line 229, in main
    model = MMDistributedDataParallel(
  File "/root/miniconda3/envs/smth/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 496, in __init__
    dist._verify_model_across_ranks(self.process_group, parameters)
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled system error, NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.

Any ideas how to fix this?


My package versions differ from the ones specified in the installation guide; namely, I:

  1. Downgraded av2 to the minimum version,
  2. Downgraded numpy to 1.23.0,
  3. Installed gcc-multilib,
  4. Upgraded gcc to 7 (https://anaconda.org/gouarin/gcc-7),
  5. Upgraded networkx to 3.1
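To compare the environment against the installation guide precisely, the installed versions can be collected with a short script. This is a minimal sketch using only the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_versions` is hypothetical, and the distribution names `av2`, `numpy`, `networkx` and `torch` are assumed to match what pip reports. gcc is a toolchain rather than a Python distribution, so it is not covered.

```python
# Hypothetical helper: report installed versions of the packages changed above.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed distribution names for the packages whose versions were changed.
    for pkg, ver in installed_versions(["av2", "numpy", "networkx", "torch"]).items():
        print(f"{pkg}: {ver if ver is not None else 'not installed'}")
```

Running it inside the conda environment used for `dist_test_map.sh` prints one `name: version` line per package.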
