Description
My system doesn't support Docker, so I had to configure the conda environment manually. I'm not sure whether the error is caused by the environment. The error message is as follows:
-------------------- end of arguments ---------------------
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2025-12-31 18:04:10] initialize.py:70 - > setting random seeds to 1234 ...
building HuggingFaceTokenizer tokenizer ...
[2025-12-31 18:04:10] megatron_tokenizer.py:27 - You’re using the legacy tokenizer system, which is deprecated and will be removed in a future release. Please migrate to the new tokenizer system (megatron.core.tokenizers.MegatronTokenizer).
[2025-12-31 18:04:10] num_microbatches_calculator.py:228 - setting number of microbatches to constant 1
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/zzli/zxz/slime/tools/convert_hf_to_torch_dist.py", line 137, in <module>
[rank0]: main()
[rank0]: File "/home/zzli/zxz/slime/tools/convert_hf_to_torch_dist.py", line 109, in main
[rank0]: model = get_model(get_model_provider_func(args), ModelType.encoder_or_decoder, wrap_with_ddp=False)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/zzli/zxz/slime/Megatron-LM/megatron/training/training.py", line 967, in get_model
[rank0]: model = build_model()
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/home/zzli/zxz/slime/Megatron-LM/megatron/training/training.py", line 954, in build_model
[rank0]: model = model_provider_func(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: get_model_provider_func.<locals>.model_provider() got an unexpected keyword argument 'config'
[rank0]:[W1231 18:04:10.131537144 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
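To illustrate what the `TypeError` means, here is a minimal sketch that reproduces the same class of error outside Megatron-LM. Everything in it is a hypothetical stand-in: the `model_provider` closure, its `pre_process`/`post_process` parameters, and the `unsupported_kwargs` helper are assumptions for illustration, not the actual slime or Megatron-LM code. My guess (not confirmed by the log) is that the caller in `training.py` passes a `config` keyword that the provider function in my checkout does not declare, which would point to a version mismatch rather than a conda problem.

```python
import inspect


def unsupported_kwargs(func, kwargs):
    """Return the keyword names in `kwargs` that `func` cannot accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unsupported.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    return [name for name in kwargs if name not in accepted]


def get_model_provider_func():
    # Hypothetical stand-in for the closure named in the traceback.
    # Its signature deliberately has no `config` parameter.
    def model_provider(pre_process=True, post_process=True):
        return {"pre_process": pre_process, "post_process": post_process}

    return model_provider


provider = get_model_provider_func()
call_kwargs = {"pre_process": True, "post_process": True, "config": object()}

# Diagnose the mismatch before calling.
mismatch = unsupported_kwargs(provider, call_kwargs)
print("keywords the provider does not accept:", mismatch)

# Calling anyway raises the same kind of TypeError as in the log above.
error_message = None
try:
    provider(**call_kwargs)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

If this is what is happening, comparing the provider's signature in slime with the `model_provider_func(...)` call in the Megatron-LM checkout should show which side is out of date.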
My conda environment is:
absl-py 2.3.1
accelerate 1.12.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiohttp-cors 0.8.1
aiosignal 1.4.0
airportsdata 20250909
annotated-doc 0.0.4
annotated-types 0.7.0
anthropic 0.75.0
antlr4-python3-runtime 4.9.3
anyio 4.12.0
apache-tvm-ffi 0.1.7
asttokens 3.0.1
attrs 25.4.0
av 16.0.1
blobfile 3.0.0
build 1.3.0
cachetools 6.2.4
certifi 2025.11.12
cffi 2.0.0
charset-normalizer 3.4.4
click 8.3.1
cloudpickle 3.1.2
cmake 4.2.1
colorful 0.5.8
compressed-tensors 0.13.0
cryptography 46.0.3
cuda-bindings 13.1.1
cuda-pathfinder 1.3.3
cuda-python 13.1.1
datasets 4.4.2
decorator 5.2.1
decord2 3.0.0
dill 0.4.0
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
docstring_parser 0.17.0
einops 0.8.1
executing 2.2.1
fastapi 0.128.0
filelock 3.20.0
flash_attn 2.8.1
flashinfer-cubin 0.5.3
flashinfer-python 0.5.3
frozenlist 1.8.0
fsspec 2025.10.0
gguf 0.17.1
gitdb 4.0.12
GitPython 3.1.45
google-api-core 2.28.1
google-auth 2.45.0
googleapis-common-protos 1.72.0
grpcio 1.75.1
grpcio-health-checking 1.75.1
grpcio-reflection 1.75.1
grpcio-tools 1.75.1
h11 0.16.0
h2 4.3.0
hf_transfer 0.1.9
hf-xet 1.2.0
hpack 4.1.0
httpcore 1.0.9
httpx 0.28.1
httpx-sse 0.4.3
huggingface-hub 0.36.0
hyperframe 6.1.0
idna 3.11
importlib_metadata 8.7.1
interegular 0.3.3
ipython 9.8.0
ipython_pygments_lexers 1.1.1
jedi 0.19.2
Jinja2 3.1.6
jiter 0.12.0
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lark 1.3.1
linkify-it-py 2.0.3
llguidance 0.7.30
loguru 0.7.3
lxml 6.0.2
Markdown 3.10
markdown-it-py 4.0.0
MarkupSafe 2.1.5
matplotlib-inline 0.2.1
mbridge 0.15.1
mcp 1.25.0
mdit-py-plugins 0.5.0
mdurl 0.1.2
memray 1.19.1
ml_dtypes 0.5.4
modelscope 1.33.0
mpmath 1.3.0
msgpack 1.1.2
msgspec 0.20.0
multidict 6.7.0
multiprocess 0.70.18
nest-asyncio 1.6.0
networkx 3.6.1
ninja 1.13.0
numpy 1.26.4
nvidia-cublas-cu12 12.9.1.4
nvidia-cuda-cupti-cu12 12.9.79
nvidia-cuda-nvrtc-cu12 12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.17.0
nvidia-cufft-cu12 11.4.1.4
nvidia-cufile-cu12 1.14.1.1
nvidia-curand-cu12 10.3.10.19
nvidia-cusolver-cu12 11.7.5.82
nvidia-cusparse-cu12 12.5.10.65
nvidia-cusparselt-cu12 0.7.1
nvidia-cutlass-dsl 4.2.1
nvidia-ml-py 13.590.44
nvidia-nccl-cu12 2.27.5
nvidia-nvjitlink-cu12 12.9.86
nvidia-nvshmem-cu12 3.3.20
nvidia-nvtx-cu12 12.9.79
omegaconf 2.3.0
onnx 1.20.0
onnx-ir 0.1.13
onnxscript 0.5.7
openai 2.6.1
openai-harmony 0.0.4
opencensus 0.11.4
opencensus-context 0.1.3
opentelemetry-api 1.39.1
opentelemetry-exporter-prometheus 0.60b1
opentelemetry-proto 1.39.1
opentelemetry-sdk 1.39.1
opentelemetry-semantic-conventions 0.60b1
orjson 3.11.5
outlines 0.1.11
outlines_core 0.1.26
packaging 25.0
pandas 2.3.3
parso 0.8.5
partial-json-parser 0.2.1.1.post7
pexpect 4.9.0
pillow 12.0.0
pip 25.3
platformdirs 4.5.1
prometheus_client 0.23.1
prompt_toolkit 3.0.52
propcache 0.4.1
proto-plus 1.27.0
protobuf 6.33.2
psutil 7.2.1
ptyprocess 0.7.0
pure_eval 0.2.3
py-spy 0.4.1
pyarrow 22.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.2
pybase64 1.4.3
pycountry 24.6.1
pycparser 2.23
pycryptodomex 3.23.0
pydantic 2.12.5
pydantic_core 2.41.5
pydantic-settings 2.12.0
Pygments 2.19.2
PyJWT 2.10.1
pylatexenc 2.10
pyproject_hooks 1.2.0
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-multipart 0.0.21
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
qwen-vl-utils 0.0.14
ray 2.53.0
referencing 0.37.0
regex 2025.11.3
requests 2.32.5
rich 14.2.0
ring-flash-attn 0.1.8
rpds-py 0.30.0
rsa 4.9.1
safetensors 0.7.0
scipy 1.16.3
sentencepiece 0.2.1
sentry-sdk 2.48.0
setproctitle 1.3.7
setuptools 80.9.0
sgl-kernel 0.3.19
sglang 0.5.6.post2
sglang-router 0.3.0
shellingham 1.5.4
six 1.17.0
slime 0.2.1
smart_open 7.5.0
smmap 5.0.2
sniffio 1.3.1
soundfile 0.13.1
sse-starlette 3.1.2
stack-data 0.6.3
starlette 0.50.0
sympy 1.14.0
tabulate 0.9.0
tensorboard 2.20.0
tensorboard-data-server 0.7.2
textual 6.11.0
tiktoken 0.12.0
timm 1.0.16
tokenizers 0.22.1
torch 2.9.1+cu129
torch_memory_saver 0.0.9
torchao 0.9.0
torchaudio 2.9.1+cu129
torchcodec 0.8.0
torchvision 0.24.1+cu129
tqdm 4.67.1
traitlets 5.14.3
transformer_engine 2.10.0
transformer_engine_cu12 2.10.0
transformer_engine_torch 2.10.0
transformers 4.57.1
triton 3.5.1
typer 0.21.0
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.3
uc-micro-py 1.0.3
urllib3 2.6.2
uvicorn 0.40.0
uvloop 0.22.1
virtualenv 20.35.4
wandb 0.23.1
wcwidth 0.2.14
Werkzeug 3.1.4
wheel 0.45.1
wrapt 2.0.1
xgrammar 0.1.27
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0