[Error] Floating point exception #70

@ChillTerry

I am using an NVIDIA H20 GPU (Hopper architecture).

When I run the evaluation script streamvln_eval_multi_gpu.sh, the program crashes during model.generate with "Fatal Python error: Floating point exception".

The traceback shows that the crash occurs in the forward pass of a Linear layer (torch/nn/modules/linear.py) inside the Qwen2 model (modeling_qwen2.py). I suspect it may be related to bfloat16/fp16 numerical instability or overflow specific to the Hopper architecture (sm_90).
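To make the precision and range differences between these formats concrete, here is a minimal stdlib-only sketch. The names to_bf16 and FP16_MAX are hypothetical helpers introduced for illustration. The sketch emulates bfloat16 by keeping the upper 16 bits of a float32 with round-to-nearest-even and decodes the largest finite fp16 value. It only illustrates the number formats and does not reproduce the crash. Under default IEEE-754 settings an overflow produces inf or NaN rather than a signal, so overflow alone may not explain a SIGFPE.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    bfloat16 keeps the upper 16 bits of an IEEE-754 float32
    (1 sign bit, 8 exponent bits, 7 mantissa bits).
    struct first rounds x to float32. NaN inputs are not handled.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that get dropped.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Largest finite fp16 value: bit pattern 0x7BFF (little-endian bytes ff 7b).
FP16_MAX = struct.unpack("<e", b"\xff\x7b")[0]

if __name__ == "__main__":
    print(to_bf16(1.0 + 2**-8))  # 1.0: below bf16 resolution near 1.0 (eps = 2**-7)
    print(to_bf16(1.0 + 2**-7))  # 1.0078125
    print(to_bf16(3.0e38))       # still finite: bf16 shares float32's exponent range
    print(FP16_MAX)              # 65504.0: larger fp16 results overflow to inf
```

In short, bf16 rarely overflows but loses precision quickly, while fp16 keeps more mantissa bits but overflows above 65504.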

So I manually loaded the model in float32 and disabled flash_attn (which only supports fp16 and bf16). This works, but inference is very slow. Do you have any suggestions?
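The workaround above pairs a dtype choice with an attention backend. Here is a minimal sketch of that choice, assuming the model is loaded through transformers' from_pretrained, which accepts attn_implementation values such as "flash_attention_2", "sdpa" and "eager" (v4.36 and later). The helper pick_attn_implementation is hypothetical. Whether StreamVLN's own loading code forwards this argument is also an assumption.

```python
# Hypothetical helper: pick an attention backend that matches the load dtype.
FLASH_ATTN_DTYPES = {"float16", "bfloat16"}

def pick_attn_implementation(dtype_name: str, flash_attn_installed: bool) -> str:
    """Return an attn_implementation string for transformers' from_pretrained().

    FlashAttention-2 only runs in fp16/bf16, so other dtypes (e.g. float32)
    fall back to PyTorch SDPA (torch.nn.functional.scaled_dot_product_attention),
    which also accepts float32 inputs.
    """
    if flash_attn_installed and dtype_name in FLASH_ATTN_DTYPES:
        return "flash_attention_2"
    return "sdpa"

if __name__ == "__main__":
    # Assumed usage (model class and path are placeholders):
    # model = ModelClass.from_pretrained(
    #     model_path,
    #     torch_dtype=torch.float32,
    #     attn_implementation=pick_attn_implementation("float32", True),
    # )
    print(pick_attn_implementation("float32", flash_attn_installed=True))  # sdpa
```

If the float32 fallback used eager attention, SDPA may recover some speed. Separately, torch.set_float32_matmul_precision("high") lets float32 matmuls use TF32 tensor cores on Ampere and Hopper GPUs, trading some precision for speed. Whether that precision avoids the original crash is untested.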

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
Fatal Python error: Floating point exception

Thread 0x00007f89797fa700 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 312 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_timeout.py", line 43 in _on_run
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__

Thread 0x00007f8e72bb3700 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__

Thread 0x00007f8fe7d01700 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/tqdm/_monitor.py", line 60 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 1134 in __call__

Thread 0x00007f9af671e700 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 325 in _on_run
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007f9af6f1f700 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 581 in wait
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 279 in _on_run
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007f9af77a0700 (most recent call first):
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 227 in _read_line
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 245 in _on_run
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007f9af7fa1700 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 316 in wait
  File "/root/miniconda3/envs/streamvln/lib/python3.9/queue.py", line 180 in get
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 390 in _on_run
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py", line 53 in run
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/root/miniconda3/envs/streamvln/lib/python3.9/threading.py", line 937 in _bootstrap

Current thread 0x00007f9af97d14c0 (most recent call first):
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114 in forward
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 430 in forward
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 702 in forward
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 976 in forward
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1167 in forward
  File "/data/stream_vln/StreamVLN/streamvln/model/stream_video_vln.py", line 340 in forward
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/generation/utils.py", line 3008 in _sample
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/transformers/generation/utils.py", line 2048 in generate
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/data/stream_vln/StreamVLN/streamvln/model/stream_video_vln.py", line 402 in generate
  File "/root/miniconda3/envs/streamvln/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 342 in eval_action
  File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 563 in evaluate
  File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 544 in eval
  File "/data/stream_vln/StreamVLN/streamvln/streamvln_eval.py", line 594 in <module>
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118 in _run_code
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127 in _run_module_code
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310 in run_path
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 358 in run_file
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 508 in main
  File "/root/.cursor-server/extensions/ms-python.debugpy-2025.14.1-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71 in <module>
  File "/root/miniconda3/envs/streamvln/lib/python3.9/runpy.py", line 87 in _run_code
  File "/root/miniconda3/envs/streamvln/lib/python3.9/runpy.py", line 197 in _run_module_as_main
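The first warning in the log says the attention mask cannot be inferred because the pad token is the same as the eos token. Independently of the crash, this can be addressed by building the mask from the known prompt lengths instead of comparing token ids against pad_token_id. Below is a minimal list-based sketch. build_attention_mask is a hypothetical helper, and whether StreamVLN's generate wrapper (stream_video_vln.py) forwards attention_mask to the underlying Qwen2 model is an assumption.

```python
from typing import List

def build_attention_mask(seq_lens: List[int], padded_len: int,
                         padding_side: str = "left") -> List[List[int]]:
    """Return one 0/1 mask per sequence: 1 marks a real token, 0 marks padding.

    The mask is built from known sequence lengths, so it stays correct even
    when pad_token_id == eos_token_id and token ids cannot tell them apart.
    """
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    masks = []
    for n in seq_lens:
        if n > padded_len:
            raise ValueError(f"sequence length {n} exceeds padded length {padded_len}")
        pad, real = [0] * (padded_len - n), [1] * n
        # Decoder-only generation usually pads on the left.
        masks.append(pad + real if padding_side == "left" else real + pad)
    return masks

if __name__ == "__main__":
    print(build_attention_mask([3, 5], padded_len=5))  # [[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]
```

In practice the nested list would be converted with torch.tensor(mask, device=input_ids.device) and passed as the attention_mask keyword to model.generate.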
