Intel Compute Runtime Bug Report: Bindless Heaps Crash on Linux Kernel 6.18+
Date: 2026-01-17
Reporter: Dallas Marlow
Status: Workaround found, upstream fix needed
Summary
Intel Compute Runtime 25.48.36300.8 aborts in bindless_heaps_helper.cpp when used with Linux kernel 6.18+ and the xe driver on Intel Lunar Lake GPUs. With bindless mode disabled, it fails instead with UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY. Both failures occur during PyTorch XPU tensor allocation and prevent GPU-accelerated workloads from running.
Environment
Hardware
| Component | Details |
|---|---|
| CPU | Intel Core Ultra (Lunar Lake) |
| GPU | Intel Arc Graphics 130V/140V (Lunar Lake integrated) |
| Device ID | 8086:64A0 |
| RAM | 32GB |
Software Versions
| Component | Version |
|---|---|
| OS | Fedora 43 |
| Kernel (broken) | 6.18.5-200.fc43.x86_64 |
| Kernel (working) | 6.17.12-300.fc43.x86_64 |
| Intel Compute Runtime | 25.48.36300.8 |
| GPU Driver | xe (not i915) |
| PyTorch | 2.9.1+xpu |
| Python | 3.13.x |
| Level Zero | (installed via oneapi-level-zero) |
Installed Intel Packages
intel-compute-runtime-25.48.36300.8
intel-level-zero
intel-opencl
oneapi-level-zero
Symptoms
Primary Error: Abort in bindless_heaps_helper.cpp
When starting any application that uses PyTorch XPU (Intel GPU acceleration), the process immediately crashes with:
Abort was called at 70 line in file:
/builddir/build/BUILD/intel-compute-runtime-25.48.36300.8-build/compute-runtime-25.48.36300.8/shared/source/helpers/bindless_heaps_helper.cpp
This error occurs before any user code executes, during the PyTorch/Level Zero initialization phase.
Secondary Error: Out of Device Memory
With the UseBindlessMode=0 workaround applied, a different error surfaces:
RuntimeError: Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)
This occurs when PyTorch attempts to move model tensors to the XPU device via model.to("xpu:0").
Key observation: The GPU shows zero memory usage when this error occurs, which suggests the allocation fails in the driver or runtime before any GPU memory is actually consumed.
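Because the two failures look different in the process output, it can help to tell them apart automatically when testing across kernels. The following is a minimal sketch, not part of the runtime or PyTorch: `classify_failure` and its return labels are hypothetical names, and the patterns are taken only from the two messages quoted in this report.

```python
import re

# Signatures copied from the two error messages quoted in this report.
# Assumption: other runtime builds may word these messages differently.
BINDLESS_ABORT = re.compile(
    r"Abort was called at \d+ line in file:\s*\S*bindless_heaps_helper\.cpp"
)
OOM_ERROR = "UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY"


def classify_failure(output):
    """Return a short label for the failure mode found in captured output."""
    if BINDLESS_ABORT.search(output):
        return "bindless_abort"
    if OOM_ERROR in output:
        return "out_of_device_memory"
    return "unknown"
```

Feeding it the combined stdout and stderr of a test run gives one label per kernel/runtime combination, which makes bisecting results easier to tabulate.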
Kernel Log Context
[drm] Xe DRM-xe kernel driver loaded
xe 0000:00:02.0: [drm] Using HuC firmware from xe/lnl_huc.bin
xe 0000:00:02.0: [drm] Using GuC firmware from xe/lnl_guc_80.bin
xe 0000:00:02.0: [drm] Using GSC firmware from xe/lnl_gsc_1.bin
The system correctly uses the xe driver (not i915) for Lunar Lake.
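The same check can be scripted by extracting the firmware lines from the kernel log. This is an illustrative sketch only: `parse_xe_firmware` is a hypothetical helper, and the regular expression assumes the log lines have exactly the format shown in the excerpt above.

```python
import re

# Assumption: lines look like the excerpt in this report, e.g.
# "xe 0000:00:02.0: [drm] Using GuC firmware from xe/lnl_guc_80.bin"
FIRMWARE_LINE = re.compile(r"xe \S+: \[drm\] Using (\w+) firmware from (\S+)")


def parse_xe_firmware(log_text):
    """Map firmware type (HuC, GuC, GSC) to the blob path the xe driver reported."""
    return {kind: path for kind, path in FIRMWARE_LINE.findall(log_text)}
```

Passing it the output of `journalctl -k --no-pager` returns an empty dict when no xe firmware lines are present, which would point to the xe driver not having loaded.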
Reproduction Steps
Minimal Reproduction
# test_xpu.py
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"XPU available: {torch.xpu.is_available()}")
if torch.xpu.is_available():
    print(f"Device: {torch.xpu.get_device_name(0)}")
    # This line triggers the crash:
    tensor = torch.zeros(1).to("xpu:0")
    print("Success!")
Execution
# On kernel 6.18.5 - crashes immediately
python test_xpu.py
# On kernel 6.17.12 - works correctly
python test_xpu.py
Full Reproduction with Embedding Model
# test_embeddings.py
from sentence_transformers import SentenceTransformer
# Crashes during model.to(device) call
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", device="xpu:0")
Investigation Timeline
Step 1: Initial Crash
After a system update to kernel 6.18.5, an MCP documentation server (which uses PyTorch XPU for embeddings) failed to start, aborting in bindless_heaps_helper.cpp.
Step 2: Driver Analysis
Confirmed the xe driver is in use (expected for Lunar Lake):
$ journalctl -k --no-pager | grep -i "xe\|drm"
xe 0000:00:02.0: [drm] Using HuC firmware from xe/lnl_huc.bin
Step 3: Workaround Attempt - Disable Bindless Mode
Applied Intel NEO debug environment variables:
export NEOReadDebugKeys=1
export UseBindlessMode=0
Result: The bindless_heaps_helper.cpp crash was resolved, but a new error appeared: UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY.
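When the failing code is launched from a Python test harness rather than a shell, the same two variables can be passed through the child process environment. The sketch below is a minimal illustration: `neo_debug_env` and `run_with_bindless_disabled` are hypothetical helper names, and only the variable names and values come from this report. Running the script in a child interpreter avoids depending on exactly when the runtime reads its debug keys.

```python
import os
import subprocess
import sys


def neo_debug_env(base=None):
    """Return a copy of the environment with the two NEO debug keys set."""
    env = dict(os.environ if base is None else base)
    env["NEOReadDebugKeys"] = "1"  # variable names as used in this report
    env["UseBindlessMode"] = "0"
    return env


def run_with_bindless_disabled(script):
    """Run `script` in a child interpreter with bindless mode disabled."""
    return subprocess.run(
        [sys.executable, script],
        env=neo_debug_env(),
        capture_output=True,
        text=True,
    )
```

For example, `run_with_bindless_disabled("test_xpu.py")` returns a `CompletedProcess` whose `stderr` can be inspected for the out-of-memory message.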
Step 4: Memory Analysis
Checked the GPU's runtime power state while the error occurred:
$ cat /sys/class/drm/card0/device/power/runtime_status
active
GPU was active but reporting no memory usage when the out-of-memory error occurred. This indicates the failure is in the memory allocation path within the compute runtime, not actual memory exhaustion.
Step 5: Kernel Rollback
Rolled back to kernel 6.17.12-300.fc43.x86_64:
$ sudo grubby --set-default /boot/vmlinuz-6.17.12-300.fc43.x86_64
$ reboot
Result: All functionality restored. XPU works correctly without any workarounds.
Root Cause Analysis
Hypothesis
Linux kernel 6.18 introduced changes to the xe driver's bindless heaps handling that are incompatible with Intel compute runtime 25.48.x. Specifically:
- The xe driver's memory management or heap allocation interface changed
- The compute runtime's bindless_heaps_helper.cpp code assumes behavior that no longer holds
- When bindless mode is disabled, a fallback memory allocation path is used, but this path also fails with incorrect error reporting (claims OOM when GPU memory is unused)
Evidence
- Same hardware, same compute runtime - only the kernel changed
- xe driver firmware loading succeeds - driver itself initializes correctly
- Crash in compute runtime, not driver - the abort is in userspace compute-runtime code
- Workaround partially works - disabling bindless mode bypasses the first crash but reveals a second issue
- Clean rollback - kernel 6.17.12 works perfectly with identical userspace stack
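Since the kernel version is the only variable that changed, a script can warn early when it runs on a kernel in the affected range. This is a rough sketch under one loud assumption: the boundary (6.18 and later affected, 6.17 working) comes solely from the two kernels tested in this report, and `is_affected_kernel` is a hypothetical helper name.

```python
import platform
import re

# Assumption: boundary inferred from this report only
# (6.17.12 works, 6.18.5 fails); earlier 6.18 builds were not tested.
FIRST_AFFECTED = (6, 18)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '6.18.5-200.fc43.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected_kernel(release):
    """Return True if the release falls in the range this report found broken."""
    return parse_kernel_release(release) >= FIRST_AFFECTED


if __name__ == "__main__":
    current = platform.release()
    print(f"{current}: affected={is_affected_kernel(current)}")
```

The comparison uses tuples rather than strings so that a future release such as 6.100 would still sort after 6.18.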
Likely Kernel Commits
No specific commit has been identified. The issue is likely related to xe driver changes in the 6.18 merge window, in one of these areas:
- xe driver memory management
- Bindless heap allocation
- Level Zero integration
- VRAM/GTT allocation paths
Workaround
Temporary: Kernel Pinning
Pin to kernel 6.17.x until upstream fix is available:
# Set default kernel
sudo grubby --set-default /boot/vmlinuz-6.17.12-300.fc43.x86_64
# Prevent kernel updates
sudo dnf versionlock add kernel kernel-core kernel-modules
Partial: Disable Bindless Mode (NOT RECOMMENDED)
These environment variables bypass the first crash but cause memory allocation failures:
export NEOReadDebugKeys=1
export UseBindlessMode=0
Warning: This workaround causes UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY errors and is not a viable solution.
Requested Fix
- Investigate xe driver compatibility with compute runtime 25.48.x on kernel 6.18+
- Fix bindless_heaps_helper.cpp to handle new xe driver behavior
- Fix memory allocation fallback when bindless mode is disabled
- Test with Lunar Lake hardware specifically, as this is an integrated GPU with shared memory
Files to Reference
Crash Location
/builddir/build/BUILD/intel-compute-runtime-25.48.36300.8-build/
compute-runtime-25.48.36300.8/shared/source/helpers/bindless_heaps_helper.cpp
Line: 70
Related Components
- Intel Compute Runtime: https://github.com/intel/compute-runtime
- Level Zero: https://github.com/oneapi-src/level-zero
- xe driver (Linux kernel): drivers/gpu/drm/xe/
System Information Commands
For reproducing or gathering additional information:
# Kernel version
uname -r
# Intel GPU info
lspci | grep -i "intel.*graphics"
# Compute runtime version
rpm -qa | grep intel-compute-runtime
# xe driver in use
journalctl -k | grep -i "xe\|drm" | head -20
# GPU memory (if sysfs available)
cat /sys/class/drm/card0/device/mem_info_vram_used 2>/dev/null
# PyTorch XPU detection
python -c "import torch; print(torch.__version__); print(torch.xpu.is_available())"
# Level Zero devices
# (requires level-zero tools)
ze_info 2>/dev/null || echo "ze_info not available"
Contact
- GitHub: [Your GitHub handle]
- Email: [Your email]
Appendix: Full Error Logs
Bindless Heaps Abort (Kernel 6.18.5)
Abort was called at 70 line in file:
/builddir/build/BUILD/intel-compute-runtime-25.48.36300.8-build/compute-runtime-25.48.36300.8/shared/source/helpers/bindless_heaps_helper.cpp
Out of Device Memory (with UseBindlessMode=0)
Traceback (most recent call last):
File "<string>", line 9, in <module>
File ".../vector_store.py", line 272, in __init__
self.embeddings = LocalEmbeddings(model_name=model_name)
File ".../vector_store.py", line 69, in __init__
self.model = SentenceTransformer(...)
File ".../SentenceTransformer.py", line 367, in __init__
self.to(device)
File ".../module.py", line 1371, in to
return self._apply(convert)
...
File ".../module.py", line 1357, in convert
return t.to(device, ...)
RuntimeError: Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)
Successful Run (Kernel 6.17.12)
PyTorch version: 2.9.1+xpu
Has XPU attr: True
XPU available: True
Device count: 1
Device name: Intel(R) Arc(TM) Graphics