
Error when converting a Hugging Face checkpoint to a format that Megatron can load #1301

@SmileShaun

My system doesn't support Docker, so I configured the conda environment manually. The error below occurs when I run tools/convert_hf_to_torch_dist.py, and I'm not sure whether it is caused by the environment. The error message is as follows:

-------------------- end of arguments ---------------------
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[2025-12-31 18:04:10] initialize.py:70 - > setting random seeds to 1234 ...
building HuggingFaceTokenizer tokenizer ...
[2025-12-31 18:04:10] megatron_tokenizer.py:27 - You’re using the legacy tokenizer system, which is deprecated and will be removed in a future release. Please migrate to the new tokenizer system (megatron.core.tokenizers.MegatronTokenizer).
[2025-12-31 18:04:10] num_microbatches_calculator.py:228 - setting number of microbatches to constant 1
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/zzli/zxz/slime/tools/convert_hf_to_torch_dist.py", line 137, in <module>
[rank0]: main()
[rank0]: File "/home/zzli/zxz/slime/tools/convert_hf_to_torch_dist.py", line 109, in main
[rank0]: model = get_model(get_model_provider_func(args), ModelType.encoder_or_decoder, wrap_with_ddp=False)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/zzli/zxz/slime/Megatron-LM/megatron/training/training.py", line 967, in get_model
[rank0]: model = build_model()
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/home/zzli/zxz/slime/Megatron-LM/megatron/training/training.py", line 954, in build_model
[rank0]: model = model_provider_func(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: get_model_provider_func.<locals>.model_provider() got an unexpected keyword argument 'config'
[rank0]:[W1231 18:04:10.131537144 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
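
The TypeError above reads like a signature mismatch: build_model in the Megatron-LM checkout passes a config keyword, while the model_provider closure returned by get_model_provider_func does not accept one. One possible explanation is that the Megatron-LM revision differs from the one slime expects, but the log alone does not confirm that. Below is a minimal sketch of the failure mode and a way to check for it. The function bodies and the accepts_keyword helper are hypothetical stand-ins, not code from slime or Megatron-LM.

```python
import inspect


def get_model_provider_func(args):
    # Hypothetical stand-in for the factory named in the traceback; the real
    # signature in slime is not shown in this report.
    def model_provider(pre_process=True, post_process=True):
        return {"pre_process": pre_process, "post_process": post_process}

    return model_provider


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


provider = get_model_provider_func(args=None)
print(accepts_keyword(provider, "config"))  # False

# Calling the provider the way the traceback suggests the caller does
# (with an extra `config` keyword) raises the same kind of TypeError.
try:
    provider(pre_process=True, post_process=True, config=None)
except TypeError as exc:
    message = str(exc)
    print(message)
```

Running the same check against the real provider (for example from a debugger at line 109 of convert_hf_to_torch_dist.py) would show whether it accepts config before Megatron calls it.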

My conda environment is as follows:

absl-py 2.3.1
accelerate 1.12.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiohttp-cors 0.8.1
aiosignal 1.4.0
airportsdata 20250909
annotated-doc 0.0.4
annotated-types 0.7.0
anthropic 0.75.0
antlr4-python3-runtime 4.9.3
anyio 4.12.0
apache-tvm-ffi 0.1.7
asttokens 3.0.1
attrs 25.4.0
av 16.0.1
blobfile 3.0.0
build 1.3.0
cachetools 6.2.4
certifi 2025.11.12
cffi 2.0.0
charset-normalizer 3.4.4
click 8.3.1
cloudpickle 3.1.2
cmake 4.2.1
colorful 0.5.8
compressed-tensors 0.13.0
cryptography 46.0.3
cuda-bindings 13.1.1
cuda-pathfinder 1.3.3
cuda-python 13.1.1
datasets 4.4.2
decorator 5.2.1
decord2 3.0.0
dill 0.4.0
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
docstring_parser 0.17.0
einops 0.8.1
executing 2.2.1
fastapi 0.128.0
filelock 3.20.0
flash_attn 2.8.1
flashinfer-cubin 0.5.3
flashinfer-python 0.5.3
frozenlist 1.8.0
fsspec 2025.10.0
gguf 0.17.1
gitdb 4.0.12
GitPython 3.1.45
google-api-core 2.28.1
google-auth 2.45.0
googleapis-common-protos 1.72.0
grpcio 1.75.1
grpcio-health-checking 1.75.1
grpcio-reflection 1.75.1
grpcio-tools 1.75.1
h11 0.16.0
h2 4.3.0
hf_transfer 0.1.9
hf-xet 1.2.0
hpack 4.1.0
httpcore 1.0.9
httpx 0.28.1
httpx-sse 0.4.3
huggingface-hub 0.36.0
hyperframe 6.1.0
idna 3.11
importlib_metadata 8.7.1
interegular 0.3.3
ipython 9.8.0
ipython_pygments_lexers 1.1.1
jedi 0.19.2
Jinja2 3.1.6
jiter 0.12.0
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lark 1.3.1
linkify-it-py 2.0.3
llguidance 0.7.30
loguru 0.7.3
lxml 6.0.2
Markdown 3.10
markdown-it-py 4.0.0
MarkupSafe 2.1.5
matplotlib-inline 0.2.1
mbridge 0.15.1
mcp 1.25.0
mdit-py-plugins 0.5.0
mdurl 0.1.2
memray 1.19.1
ml_dtypes 0.5.4
modelscope 1.33.0
mpmath 1.3.0
msgpack 1.1.2
msgspec 0.20.0
multidict 6.7.0
multiprocess 0.70.18
nest-asyncio 1.6.0
networkx 3.6.1
ninja 1.13.0
numpy 1.26.4
nvidia-cublas-cu12 12.9.1.4
nvidia-cuda-cupti-cu12 12.9.79
nvidia-cuda-nvrtc-cu12 12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.17.0
nvidia-cufft-cu12 11.4.1.4
nvidia-cufile-cu12 1.14.1.1
nvidia-curand-cu12 10.3.10.19
nvidia-cusolver-cu12 11.7.5.82
nvidia-cusparse-cu12 12.5.10.65
nvidia-cusparselt-cu12 0.7.1
nvidia-cutlass-dsl 4.2.1
nvidia-ml-py 13.590.44
nvidia-nccl-cu12 2.27.5
nvidia-nvjitlink-cu12 12.9.86
nvidia-nvshmem-cu12 3.3.20
nvidia-nvtx-cu12 12.9.79
omegaconf 2.3.0
onnx 1.20.0
onnx-ir 0.1.13
onnxscript 0.5.7
openai 2.6.1
openai-harmony 0.0.4
opencensus 0.11.4
opencensus-context 0.1.3
opentelemetry-api 1.39.1
opentelemetry-exporter-prometheus 0.60b1
opentelemetry-proto 1.39.1
opentelemetry-sdk 1.39.1
opentelemetry-semantic-conventions 0.60b1
orjson 3.11.5
outlines 0.1.11
outlines_core 0.1.26
packaging 25.0
pandas 2.3.3
parso 0.8.5
partial-json-parser 0.2.1.1.post7
pexpect 4.9.0
pillow 12.0.0
pip 25.3
platformdirs 4.5.1
prometheus_client 0.23.1
prompt_toolkit 3.0.52
propcache 0.4.1
proto-plus 1.27.0
protobuf 6.33.2
psutil 7.2.1
ptyprocess 0.7.0
pure_eval 0.2.3
py-spy 0.4.1
pyarrow 22.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.2
pybase64 1.4.3
pycountry 24.6.1
pycparser 2.23
pycryptodomex 3.23.0
pydantic 2.12.5
pydantic_core 2.41.5
pydantic-settings 2.12.0
Pygments 2.19.2
PyJWT 2.10.1
pylatexenc 2.10
pyproject_hooks 1.2.0
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-multipart 0.0.21
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
qwen-vl-utils 0.0.14
ray 2.53.0
referencing 0.37.0
regex 2025.11.3
requests 2.32.5
rich 14.2.0
ring-flash-attn 0.1.8
rpds-py 0.30.0
rsa 4.9.1
safetensors 0.7.0
scipy 1.16.3
sentencepiece 0.2.1
sentry-sdk 2.48.0
setproctitle 1.3.7
setuptools 80.9.0
sgl-kernel 0.3.19
sglang 0.5.6.post2
sglang-router 0.3.0
shellingham 1.5.4
six 1.17.0
slime 0.2.1
smart_open 7.5.0
smmap 5.0.2
sniffio 1.3.1
soundfile 0.13.1
sse-starlette 3.1.2
stack-data 0.6.3
starlette 0.50.0
sympy 1.14.0
tabulate 0.9.0
tensorboard 2.20.0
tensorboard-data-server 0.7.2
textual 6.11.0
tiktoken 0.12.0
timm 1.0.16
tokenizers 0.22.1
torch 2.9.1+cu129
torch_memory_saver 0.0.9
torchao 0.9.0
torchaudio 2.9.1+cu129
torchcodec 0.8.0
torchvision 0.24.1+cu129
tqdm 4.67.1
traitlets 5.14.3
transformer_engine 2.10.0
transformer_engine_cu12 2.10.0
transformer_engine_torch 2.10.0
transformers 4.57.1
triton 3.5.1
typer 0.21.0
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.3
uc-micro-py 1.0.3
urllib3 2.6.2
uvicorn 0.40.0
uvloop 0.22.1
virtualenv 20.35.4
wandb 0.23.1
wcwidth 0.2.14
Werkzeug 3.1.4
wheel 0.45.1
wrapt 2.0.1
xgrammar 0.1.27
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0
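
Note that the list above has no Megatron entry, while the traceback imports Megatron from /home/zzli/zxz/slime/Megatron-LM, so the pip list cannot say which Megatron revision is in use. A small sketch for reporting both the installed version and the import location of a package is shown below. The describe_package helper is hypothetical, and the distribution name megatron-core is an assumption about how Megatron would appear if it were pip-installed.

```python
import importlib.util
from importlib import metadata


def describe_package(dist_name, module_name):
    """Report the installed version (if any) and where the module is found."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Not installed as a distribution, e.g. used from a source checkout.
        version = None

    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None

    location = None
    if spec is not None:
        # Namespace packages have no origin, only search locations.
        location = spec.origin or ", ".join(spec.submodule_search_locations or [])

    return {"dist": dist_name, "version": version, "location": location}


if __name__ == "__main__":
    # Distribution and module names here are assumptions for illustration.
    for dist, module in [("megatron-core", "megatron"), ("slime", "slime")]:
        print(describe_package(dist, module))
```

For a source checkout, the commit in use can then be read with git in the reported directory.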
