Skip to content

Conversation

@sebsto
Copy link
Collaborator

@sebsto sebsto commented Jan 15, 2026

This PR fixes a race condition in LambdaRuntimeClient that causes a fatal crash when an old channel's closeFuture callback fires after a new connection has been established. The fix adds proper channel lifecycle tracking and replaces the fatal error with graceful handling.

Problem

Crash Location: LambdaRuntimeClient.swift:270 in channelClosed()

Error Message:

Fatal error: Invalid state: connected(SocketChannel { ... }), closed

Root Cause: Race condition where:

  1. An old channel's closeFuture callback fires
  2. AFTER a new connection has been established (connectionState = .connected)
  3. BUT closingState is .closed from a previous close operation
  4. The code asserted this was impossible and crashed with fatalError

This can occur when:

  • Network conditions cause delayed channel cleanup
  • Connection is recycled quickly (old channel still closing while new one connects)
  • Timing issues between channel close callbacks and new connection establishment

Solution

Key Changes

  1. Added channel identity tracking:

    private var channelsBeingClosed: Set<ObjectIdentifier> = []

    Tracks which channels are in the process of closing to distinguish old channels from the current one.

  2. Enhanced connectionWillClose():

    • Marks channels as "being closed" using ObjectIdentifier
    • Adds logging when old channels close while new connection is active
  3. Rewrote channelClosed() with defensive logic:

    • Early return for tracked old channels: Handles them gracefully without affecting current connection
    • Replaced fatalError with warning log: The (_, .closed) case now logs a warning instead of crashing
    • Channel identity checks: Only transitions state if the closing channel is the CURRENT channel
    • Removed unconditional state change: Previously set connectionState = .disconnected for ANY channel close, now only for the current channel

Why This Fixes the Bug

The fix addresses the race condition by:

  • Distinguishing between "current channel closing" vs "old channel closing"
  • Handling old channel closes gracefully without crashing or corrupting state
  • Not overwriting connection state when old channels close
  • Providing visibility through logging when the race condition occurs

Changes

Modified Files

  • Sources/AWSLambdaRuntime/HTTPClient/LambdaRuntimeClient.swift
    • Added channelsBeingClosed: Set<ObjectIdentifier> property
    • Enhanced connectionWillClose() with channel tracking
    • Rewrote channelClosed() with defensive logic and identity checks
    • Replaced fatalError with warning log for unexpected states
    • Removed unconditional state change in closeFuture callback

Lines Changed: ~150 lines modified/added

Backward Compatibility: ✅ Fully compatible, no API changes

Testing

✅ All Existing Tests Pass

swift test
# Result: 91 tests passed in 14 suites

All original functionality is preserved with no regressions.

⚠️ Note on Test Coverage

While we cannot reproduce the exact race condition from bug #624 in a deterministic test (it requires specific network timing), the fix:

  • Is logically sound for the described race condition
  • Improves defensive programming around channel lifecycle
  • Replaces a fatal crash with graceful handling + logging
  • Should prevent the crash by properly tracking channel identity

Related Issues

Fixes #624

@sebsto sebsto added the 🔨 semver/patch No public API change. label Jan 15, 2026
@sebsto sebsto self-assigned this Jan 15, 2026
@sebsto
Copy link
Collaborator Author

sebsto commented Jan 15, 2026

@fabianfett I believe this fixes the problem #624 you reported

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

Sources/AWSLambdaRuntime/HTTPClient/LambdaRuntimeClient.swift:152

  • The close() method checks only closingConnections.isEmpty but doesn't check channelsBeingClosed.isEmpty. This could cause the continuation to resume prematurely when there are still channels being closed, leading to incomplete cleanup. The condition should be if self.closingConnections.isEmpty && channelsBeingClosed.isEmpty to be consistent with the other close completion checks.
                if self.closingConnections.isEmpty {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sebsto sebsto requested a review from adam-fowler January 18, 2026 19:50

// Track channels that are in the process of closing to handle race conditions
// where an old channel's closeFuture fires after a new connection is established
private var channelsBeingClosed: Set<ObjectIdentifier> = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between closingConnections and channelsBeingClosed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmh, in my view, channelsBeingClosed provides early detection at the top of channelClosed() to identify old channels and handle them gracefully without touching the main state machine. closingConnections ensures we wait for all channels (both current and old) to fully close before completing shutdown.

But you're question makes me think more about this (thank you) and maybe, these can be merged to keep only one source of truth.

If we keep just closingConnections, we will write

// Instead of:
if channelsBeingClosed.contains(channelID) { ... }

// We could do:
if self.closingConnections.contains(where: { $0 === channel }) { ... }

Let me test that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam-fowler Thank you for the question. I simplified and now use only closingConnections to keep track of closingc onnections. One single source of truth and a simpler mental model

@sebsto sebsto merged commit 102f92a into main Jan 27, 2026
44 checks passed
@sebsto sebsto deleted the sebsto/fix624 branch January 27, 2026 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🔨 semver/patch No public API change.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash in channelClosed

2 participants