Skip to content

Conversation

@ThinkerDreamer
Copy link
Member

Summary

This PR fixes a race condition in the background receive task that caused measure_many to hang indefinitely when using tokio's current_thread runtime.

Closes #1

Problem

The original implementation had a race condition:

  1. The background task would process pending subscriptions via try_recv()
  2. Then block on socket.recv().await
  3. If a subscription arrived while blocked on socket recv, it wouldn't be processed
  4. The ICMP reply would arrive and be dropped because the subscriber wasn't registered yet

With multi_thread runtime this worked by coincidence (concurrent execution), but with current_thread it caused a deadlock.

Solution

Replaced the sequential polling approach with a unified poll_fn that polls both the subscription channel and socket receive in the same waker context. This ensures the task wakes on either event, which is required for single-threaded runtimes.

Key changes:

  • Use poll_fn to combine subscription and socket polling into one future
  • Keep try_recv() as a fast-path optimization (~2x faster when messages are queued)
  • Set socket to non-blocking mode to enable proper polling
  • Maintain buffer outside poll_fn to persist across polls

Changes

  • src/pinger.rs - Refactored background receive loop to use unified polling
  • src/socket/base.rs - Set socket to non-blocking
  • tests/runtime_compatibility.rs - Added regression tests for both runtime flavors

Testing

This was previously fixed in
bf98c06
but accidentally reverted during a refactor.
@ThinkerDreamer ThinkerDreamer force-pushed the fix/current-thread-runtime-hang branch from f4c70c9 to 6553498 Compare January 20, 2026 17:57
@ThinkerDreamer
Copy link
Member Author

Apparently the tests I added don't run on CI.

@ThinkerDreamer ThinkerDreamer force-pushed the fix/current-thread-runtime-hang branch from d070f46 to 08c4ac1 Compare January 20, 2026 18:15
@ThinkerDreamer
Copy link
Member Author

I also included a fix for hardcoded stable Rust branch in the cargo test CI.

@ThinkerDreamer ThinkerDreamer force-pushed the fix/current-thread-runtime-hang branch 2 times, most recently from 3b2e491 to bc700c2 Compare January 21, 2026 13:48
@ThinkerDreamer ThinkerDreamer force-pushed the fix/current-thread-runtime-hang branch 2 times, most recently from 41ff7f9 to 731d430 Compare January 21, 2026 18:21
@ThinkerDreamer ThinkerDreamer force-pushed the fix/current-thread-runtime-hang branch from 731d430 to 4f0a6d1 Compare January 21, 2026 18:25
The background receive task previously had a sequential flow:
1. Process pending subscriptions via try_recv() loop
2. Block on socket recv().await

In a single-threaded runtime, if a subscription message arrived while
the task was blocked on socket recv(), it would never be processed.
When the ICMP reply arrived, it was discarded because the subscriber
wasn't registered yet.

This fix uses poll_fn to poll both the subscription channel and socket
within the same waker context, ensuring we wake on either event. This
is required for single-threaded runtimes where we can't rely on
concurrent execution of the background task.

Closes M4SS-Code#1
Add integration tests to prevent regression of issue M4SS-Code#1, where
measure_many would hang with single-threaded Tokio runtimes.

Tests:
- ping_localhost_current_thread: Core regression test for issue M4SS-Code#1
- ping_localhost_multi_thread: Baseline test for comparison
- ping_sequential_current_thread: Multiple sequential ping operations

Also adds rt-multi-thread to dev-dependencies for the baseline test.

Note: These tests require ICMP socket access, which depends on the
net.ipv4.ping_group_range sysctl setting. Tests gracefully skip if
ICMP sockets are unavailable, allowing CI to pass while still
providing coverage on systems that support rootless ping.
The test job was hardcoded to use @stable instead of respecting
the matrix-defined Rust versions (stable, beta, nightly, 1.85).
…e on invalid data

Invalid data including truncated packets, wrong ICMP type, and destination unreachable.
Also includes a test asserting a valid echo reply is correctly parsed.
@ThinkerDreamer ThinkerDreamer force-pushed the fix/current-thread-runtime-hang branch from 4f0a6d1 to 9e781bd Compare January 22, 2026 14:27
@paolobarbolini paolobarbolini merged commit 13698ba into M4SS-Code:main Jan 23, 2026
6 checks passed
@ThinkerDreamer ThinkerDreamer deleted the fix/current-thread-runtime-hang branch January 23, 2026 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

measure_many hangs with current_thread tokio runtime

3 participants