Skip to content

Fix GC stack scanning bugs: memcpy direction in gc_save_context and volatile stopping_world#909

Closed
Copilot wants to merge 4 commits intomasterfrom
copilot/check-hxcoro-hl-segfaults
Closed

Fix GC stack scanning bugs: memcpy direction in gc_save_context and volatile stopping_world#909
Copilot wants to merge 4 commits intomasterfrom
copilot/check-hxcoro-hl-segfaults

Conversation

Copy link
Contributor

Copilot AI commented Mar 12, 2026

Intermittent segfaults in multithreaded workloads (reproduced via hxcoro tests, ~40% failure rate on multi-core, 0% on single core). Crash is always in hl_to_virtual accessing a GC-collected object on a worker thread.

Changes

  • src/gc.c — Fix gc_save_context memcpy source address: The memcpy copied from prev_stack (caller's variable address) upward, capturing data above the gap that's already covered by the main gc_mark_stack(stack_cur, stack_top) scan. The actual callee-saved register spill area lives between stack_cur and prev_stack — that's what needs preserving in extra_stack_data before it gets overwritten by subsequent calls (e.g. hl_mutex_acquire). Also added a size < 0 guard.
// Before: copies wrong region (above the gap, redundant with main stack scan)
memcpy(t->extra_stack_data, prev_stack, size*sizeof(void*));

// After: copies the actual gap (callee-saved register area)
memcpy(t->extra_stack_data, stack_cur, size*sizeof(void*));
  • src/hl.h — Make stopping_world volatile: This flag is written by the GC thread and read by worker threads in hl_blocking(false) and hl_gc_safepoint(). Without volatile, the compiler is free to cache the value.

Known remaining issue

There is a deeper race in hl_blocking(false): a thread can transition gc_blocking from 1→0 after gc_stop_world has already passed it in its sequential check. In that window the thread runs JIT code with GC roots only in scratch registers, invisible to the stale gc_regs/stack_cur. The proposed fix is to spin-wait in hl_blocking(false) while stopping_world is true before allowing gc_blocking to reach 0, but this is not yet implemented in this PR.


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Copilot AI and others added 3 commits March 12, 2026 19:13
Co-authored-by: Simn <634365+Simn@users.noreply.github.com>
Co-authored-by: Simn <634365+Simn@users.noreply.github.com>
Co-authored-by: Simn <634365+Simn@users.noreply.github.com>
Copilot AI changed the title [WIP] Investigate HL segfaults in hxcoro tests Fix GC stack scanning bugs: memcpy direction in gc_save_context and volatile stopping_world Mar 12, 2026
Copilot AI requested a review from Simn March 12, 2026 19:49
Copilot finished work on behalf of Simn March 12, 2026 19:49
@Simn
Copy link
Member

Simn commented Mar 12, 2026

I have pushed the volatile stopping_world change to master because that already came up in #899 and was greenlighted there.

Comment on lines +242 to +245
if( size < 0 ) size = 0;
if( size > HL_MAX_EXTRA_STACK ) hl_fatal("GC_SAVE_CONTEXT");
t->extra_stack_size = size;
memcpy(t->extra_stack_data, prev_stack, size*sizeof(void*));
memcpy(t->extra_stack_data, stack_cur, size*sizeof(void*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncannasse Please check this part!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll stop with the AI stuff for HL because it doesn't appear to actually help here. Thanks for checking!

@Simn Simn closed this Mar 14, 2026
@Simn Simn deleted the copilot/check-hxcoro-hl-segfaults branch March 14, 2026 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants