Fix Test hangs in Lambda+LocalServer (#630) #631
Merged
Fix test hangs caused by Pool cancellation race conditions
Summary
This PR fixes two related race conditions in Lambda+LocalServer+Pool.swift that were causing the test suite to hang approximately 10% of the time.
Problem
The test suite exhibited intermittent hangs (~10% frequency) due to two bugs in the Pool implementation:
1. Individual task cancellation bug: When one task waiting for a specific requestId was cancelled, the cancellation handler would incorrectly cancel ALL waiting tasks instead of just the cancelled one.
2. Server shutdown hang: When the server shut down, waiting continuations in the pools were never cancelled, causing handlers to wait indefinitely for responses that would never arrive.
Root Causes
Root Cause #1: Cancellation Handler Removes ALL Continuations
The onCancel handler in Pool._next() was removing all continuations from the waitingForSpecific dictionary when any single task was cancelled.
This caused unrelated concurrent invocations to fail with CancellationError when one client cancelled their request.
Root Cause #2: No Pool Cleanup During Server Shutdown
When the server shut down (e.g., when a test completes), the task group was cancelled but the pools' waiting continuations were never notified. The /invoke endpoint handlers would continue waiting for responses that would never arrive because the Lambda function had stopped.
Solution
Fix #1: Only Remove Specific Continuation on Cancellation
Modified the cancellation handler to remove only the continuation that belongs to the cancelled task, leaving all other waiting tasks registered.
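A minimal sketch of that targeted removal, assuming a lock-protected dictionary keyed by requestId. SketchPool, next(for:), push(_:for:) and waitingCount are hypothetical names for illustration, not the project's API, and request ids are assumed to be unique among waiters.

```swift
import Foundation

// Hypothetical stand-in for the pool; names are illustrative, not the project's API.
// Assumption: at most one task waits per requestId.
final class SketchPool: @unchecked Sendable {
    private let lock = NSLock()
    private var waitingForSpecific: [String: CheckedContinuation<String, Error>] = [:]

    // Suspends until a value is pushed for requestId or the calling task is cancelled.
    func next(for requestId: String) async throws -> String {
        try await withTaskCancellationHandler {
            try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
                lock.lock()
                if Task.isCancelled {
                    // Cancelled before registration: fail right away instead of storing.
                    lock.unlock()
                    continuation.resume(throwing: CancellationError())
                } else {
                    waitingForSpecific[requestId] = continuation
                    lock.unlock()
                }
            }
        } onCancel: {
            // Remove only this task's continuation. The buggy shape described above
            // would instead empty the whole dictionary and fail every waiter.
            lock.lock()
            let continuation = waitingForSpecific.removeValue(forKey: requestId)
            lock.unlock()
            continuation?.resume(throwing: CancellationError())
        }
    }

    // Delivers a value to the waiter for requestId. Returns false when nobody waits.
    @discardableResult
    func push(_ value: String, for requestId: String) -> Bool {
        lock.lock()
        let continuation = waitingForSpecific.removeValue(forKey: requestId)
        lock.unlock()
        continuation?.resume(returning: value)
        return continuation != nil
    }

    // Number of tasks currently suspended in next(for:).
    var waitingCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return waitingForSpecific.count
    }
}
```

With two tasks waiting on different ids, cancelling one should leave the other registered, so a later push(_:for:) for the second id still resumes it normally.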
Fix #2: Add Pool Cleanup During Server Shutdown
Added a cancelAll() method to the Pool class, which is called during server shutdown.
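One way such a cleanup could look, sketched under assumptions: ShutdownSketchPool, shutDown(invocationPool:responsePool:) and the pool parameter names are hypothetical, and the isShutDown flag is an extra guard added for this sketch, not something the PR describes.

```swift
import Foundation

// Hypothetical stand-in for the pool; only the shutdown path is sketched here.
final class ShutdownSketchPool: @unchecked Sendable {
    private let lock = NSLock()
    private var waiting: [String: CheckedContinuation<String, Error>] = [:]
    private var isShutDown = false  // assumption: late callers fail fast after shutdown

    // Suspends until cancelAll() runs (value delivery is omitted from this sketch).
    func next(for requestId: String) async throws -> String {
        try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
            lock.lock()
            if isShutDown {
                lock.unlock()
                continuation.resume(throwing: CancellationError())
            } else {
                waiting[requestId] = continuation
                lock.unlock()
            }
        }
    }

    // Fails every pending waiter so that no handler keeps waiting after shutdown.
    func cancelAll() {
        lock.lock()
        let pending = waiting
        waiting.removeAll()
        isShutDown = true
        lock.unlock()
        // Resume outside the lock so resumed tasks never contend with it.
        for continuation in pending.values {
            continuation.resume(throwing: CancellationError())
        }
    }

    // Number of tasks currently suspended in next(for:).
    var waitingCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return waiting.count
    }
}

// Hypothetical shutdown hook: drain both pools once the server stops.
func shutDown(invocationPool: ShutdownSketchPool, responsePool: ShutdownSketchPool) {
    invocationPool.cancelAll()
    responsePool.cancelAll()
}
```

Copying the dictionary and resuming outside the lock is a design choice of this sketch: it keeps the critical section short and ensures each continuation is resumed exactly once.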
Changes
Modified Files
Sources/AWSLambdaRuntime/HTTPServer/Lambda+LocalServer+Pool.swift
- Changed _next() to remove only the specific continuation
- Added cancelAll() method for server shutdown cleanup
Sources/AWSLambdaRuntime/HTTPServer/Lambda+LocalServer.swift
- Call cancelAll() on both pools during server shutdown
New Files
- testCancellationOnlyAffectsOwnTask: Verifies only the cancelled task receives CancellationError
- testConcurrentInvocationsWithCancellation: Tests a real-world scenario with 5 concurrent invocations
- testFIFOModeCancellation: Ensures FIFO mode cancellation works correctly
Testing
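Before Fix
- The test suite hung approximately 10% of the time
- Unrelated waiting tasks failed with CancellationError when a single task was cancelled
After Fix
- Only the cancelled task receives CancellationError
Test Coverage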
The new test suite reproduces both bugs and verifies the fixes: