Open
Description
I am using an NVIDIA H20 GPU (Hopper architecture).
When I try to run the evaluation script streamvln_eval_multi_gpu.sh, during the model.generate phase, the program crashes with a Fatal Python error: Floating point exception.
The traceback indicates the crash occurs within a Linear layer forward pass inside the Qwen2 model. I suspect this might be related to bfloat16/fp16 numerical instability or overflow issues specific to the Hopper architecture (sm_90).
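One way to check whether the crash comes from the Linear matmul itself rather than from the rest of the model is to run a single Linear forward pass per dtype in a child process, so that a fatal signal kills only the child. The sketch below is an assumption-laden illustration, not part of StreamVLN: `PROBE_TEMPLATE`, `run_isolated`, and the layer size 3584 are hypothetical names and values chosen for the example.

```python
import signal
import subprocess
import sys

# Hypothetical probe: one Linear forward pass on the GPU in a given dtype,
# mirroring the layer type where the traceback ends. Sizes are arbitrary.
PROBE_TEMPLATE = """
import torch
layer = torch.nn.Linear(3584, 3584).to(device="cuda", dtype=torch.{dtype})
x = torch.randn(1, 16, 3584, device="cuda", dtype=torch.{dtype})
with torch.no_grad():
    y = layer(x)
torch.cuda.synchronize()
print(y.float().abs().max().item())
"""


def run_isolated(code: str, timeout: float = 300.0) -> str:
    """Run `code` in a child interpreter and classify how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    return f"python error (exit code {proc.returncode})"


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16", "float16"):
        print(dtype, "->", run_isolated(PROBE_TEMPLATE.format(dtype=dtype)))
```

If the half-precision probes report `killed by SIGFPE` while float32 reports `ok`, the problem can likely be reproduced without the model, which would point at the PyTorch/CUDA library stack rather than at the StreamVLN code.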
As a workaround, I loaded the model in float32 and disabled flash_attn (which only supports fp16 and bf16). This works, but it runs very slowly. Do you have any suggestions?
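The workaround, plus one untested middle ground, can be sketched as a set of load options. This assumes the model is loaded through a Hugging Face style `from_pretrained` call that accepts `torch_dtype` and `attn_implementation`; the helper names `load_kwargs` and `resolve_dtype` and the mode labels are hypothetical, and whether the `bf16_sdpa` option avoids the crash on H20 has not been verified.

```python
from typing import Any, Dict


def load_kwargs(mode: str) -> Dict[str, Any]:
    """Return keyword arguments for a from_pretrained-style call (dtype as a name)."""
    options = {
        # The workaround described above: full precision, no flash_attn.
        "fp32_eager": {"torch_dtype": "float32", "attn_implementation": "eager"},
        # Untested middle ground: half precision with PyTorch's built-in SDPA attention.
        "bf16_sdpa": {"torch_dtype": "bfloat16", "attn_implementation": "sdpa"},
        # Half precision with flash_attn, kept only for comparison.
        "bf16_flash": {"torch_dtype": "bfloat16", "attn_implementation": "flash_attention_2"},
    }
    if mode not in options:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(options)}")
    return dict(options[mode])


def resolve_dtype(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the dtype name with the torch dtype object (requires torch)."""
    import torch  # imported lazily so load_kwargs works without torch installed

    resolved = dict(kwargs)
    resolved["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    return resolved
```

The result would be passed as `**resolve_dtype(load_kwargs("bf16_sdpa"))` to whichever `from_pretrained` call the evaluation script uses. If the crash persists with SDPA in bf16, flash_attn is probably not the cause, and float32 remains the only confirmed working configuration.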
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
Fatal Python error: Floating point exception
Thread 0x00007f89797fa700 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 312 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_timeout.py", line 43 in _on_run
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__
Thread 0x00007f8e72bb3700 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__
Thread 0x00007f8fe7d01700 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__
Thread 0x00007f9af671e700 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 325 in _on_run
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
Thread 0x00007f9af6f1f700 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 279 in _on_run
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
Thread 0x00007f9af77a0700 (most recent call first):
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 227 in _read_line
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 245 in _on_run
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
Thread 0x00007f9af7fa1700 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
File "/root/miniconda3/envs/streamvln/lib/python3.9/queue.py", line 180 in get
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 390 in _on_run
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
Current thread 0x00007f9af97d14c0 (most recent call first):
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114 in forward
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 430 in forward
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 702 in forward
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 976 in forward
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1167 in forward
File "/data/stream_vln/StreamVLN/streamvln/model/stream_video_vln.py", line 340 in forward
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/generation/utils.py", line 3008 in _sample
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/generation/utils.py", line 2048 in generate
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/data/stream_vln/StreamVLN/streamvln/model/stream_video_vln.py", line 402 in generate
File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 342 in eval_action
File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 563 in evaluate
File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 544 in eval
File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 594 in <module>
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118 in _run_code
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127 in _run_module_code
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310 in run_path
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 358 in run_file
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 508 in main
File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71 in <module>
File "/root/miniconda3/envs/streamvln/lib/python3.9/runpy.py", line 87 in _run_code
File "/root/miniconda3/envs/streamvln/lib/python3.9/runpy.py", line 197 in _run_module_as_main