pgconn: fix panic on Pipeline.Close when server sends multiple FATAL errors #2501
veeceey wants to merge 2 commits into jackc:master
Conversation
…errors

When the server sends multiple FATAL ErrorResponse messages that are buffered together (as PgBouncer can do when terminating idle-in-transaction connections), Pipeline.Close() would panic with "close of closed channel".

The root cause is that PgConn.receiveMessage() unconditionally calls close(cleanupDone) when OnPgError returns false, without checking whether the channel was already closed by a previous error on the same connection. When two FATAL errors are in the chunkReader buffer, processing the first FATAL closes cleanupDone, and processing the second FATAL attempts to close it again, causing the panic.

The fix:

1. Guard close(cleanupDone) in receiveMessage() and CopyFrom() with a status check to prevent double-close of the channel.
2. Add a nil-results check in Pipeline.Close() to break out of the result-draining loop when no more results are available but ExpectedReadyForQuery > 0 (preventing a potential infinite loop).

Fixes jackc#2470

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
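For illustration, here is a minimal, self-contained sketch of why the unguarded close panics on the second FATAL and how a status check prevents it. This is not the pgconn source; the `conn` type and the `connStatus*` constants below are stand-ins named after the description above.

```go
package main

import "fmt"

// Connection status values; connStatusClosed mirrors the name used in the
// commit message above but is re-declared here for the sketch.
const (
	connStatusBusy = iota
	connStatusClosed
)

type conn struct {
	status      int
	cleanupDone chan struct{}
}

// handleFatal models the error path in receiveMessage(): without the status
// guard, the second FATAL read from the same buffered chunk would close
// cleanupDone a second time and panic with "close of closed channel".
func (c *conn) handleFatal() {
	if c.status != connStatusClosed { // the guard described above
		c.status = connStatusClosed
		close(c.cleanupDone)
	}
}

func main() {
	c := &conn{status: connStatusBusy, cleanupDone: make(chan struct{})}
	c.handleFatal() // first FATAL: closes cleanupDone
	c.handleFatal() // second FATAL: now a no-op instead of a panic
	fmt.Println("no panic")
}
```

A `sync.Once` around the close would also prevent the double-close; the status check is what the description above names, and it reuses the connection-state bookkeeping that already exists.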
```go
			p.conn.asyncClose()
			break
		}
	} else if results == nil {
```
How can this path be executed? I thought that `getResults` would always either return results or an error. But this would only happen if it returns `nil, nil`.
If any change is to happen around here I would expect there to be more error cases that trigger `asyncClose()` and `break`.
Good question. `getResults()` does return `(nil, nil)` -- it happens at the `pipelineNil` case (line 2514-2515) when `ExtractFrontRequestType()` finds the request event queue empty.

Here's the exact sequence in the double-FATAL scenario:

- Iteration 1 -- pops `Prepare` from the queue, `getResultsPrepare()` reads ParseComplete + ParameterDescription + RowDescription, returns `(psd, nil)`. Loop continues.
- Iteration 2 -- pops `Prepare` from the queue, `getResultsPrepare()` calls `receiveMessage()`, which reads the first FATAL. `OnPgError` returns false, the connection is closed, and `cleanupDone` is closed. `Pipeline.receiveMessage()` returns the error. Back in `Close()`: it's a `PgError`, so the loop continues (doesn't break).
- Iteration 3 -- pops `Sync` from the queue, `getResultsSync()` calls `receiveMessage()`, which reads the second FATAL. The guard prevents the double-close of `cleanupDone`. `Pipeline.receiveMessage()` returns the error. Back in `Close()`: it's again a `PgError`, so the loop continues.
- Iteration 4 -- the queue is now empty. `ExtractFrontRequestType()` returns `pipelineNil`, so `getResults()` returns `(nil, nil)`. But `ExpectedReadyForQuery()` is still 1 because `ReadyForQuery` was never received (the server sent FATALs instead). Without the `nil` check, this spins forever.

The key insight is that FATAL `PgError`s don't cause `Close()` to break (only non-`PgError` errors do), and FATAL errors consume queued request slots without the server ever sending `ReadyForQuery`, so `expectedReadyForQueryCount` is never decremented.
I've updated the comment in the latest push to explain this flow explicitly.
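To make the loop shape concrete, here is a rough, runnable stand-in for the drain loop in `Pipeline.Close()` with the `results == nil` guard. The types are hypothetical mocks, not the actual pgconn code; only the control flow mirrors the walkthrough above.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the real pgconn types: a shape sketch of the Pipeline.Close()
// drain loop described above, not the actual implementation.
type PgError struct{ Severity string }

func (e *PgError) Error() string { return "pg error: " + e.Severity }

type pipeline struct {
	queue       []string // pending requests (Prepare, Sync, ...)
	expectedRFQ int      // stands in for expectedReadyForQueryCount
	closed      bool     // set by asyncClose
}

// getResults pops the next queued request and pretends the server answered it
// with a FATAL; once the queue is empty it returns (nil, nil), mirroring the
// pipelineNil case described above.
func (p *pipeline) getResults() (any, error) {
	if len(p.queue) == 0 {
		return nil, nil
	}
	p.queue = p.queue[1:]
	return nil, &PgError{Severity: "FATAL"}
}

func (p *pipeline) asyncClose() { p.closed = true }

// drain mirrors the result-draining loop in Close().
func (p *pipeline) drain() {
	for p.expectedRFQ > 0 {
		results, err := p.getResults()
		if err != nil {
			var pgErr *PgError
			if !errors.As(err, &pgErr) {
				p.asyncClose() // non-PgError: the connection is broken
				break
			}
			continue // a FATAL PgError does not break the loop
		}
		if results == nil {
			// Queue exhausted but ReadyForQuery never arrived: without this
			// guard the loop would spin forever.
			p.asyncClose()
			break
		}
		// the real code records results and waits for ReadyForQuery here
	}
}

func main() {
	p := &pipeline{queue: []string{"Prepare", "Sync"}, expectedRFQ: 1}
	p.drain()
	fmt.Println("closed:", p.closed) // closed: true -- the loop terminated
}
```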
Expand the comment on the results == nil guard in Pipeline.Close() to explain exactly when and why getResults() returns (nil, nil): When the server sends FATAL errors that consume queued request slots (e.g. Prepare, Sync) without ever sending ReadyForQuery, the request event queue is exhausted while expectedReadyForQueryCount remains > 0. ExtractFrontRequestType() returns pipelineNil, getResults() returns (nil, nil), and without this guard the loop would spin indefinitely.
After looking at this a little more, I wonder if there might be a more fundamental issue (and solution). The underlying issue is calling Normally,
Summary
Fixes #2470
When the server sends multiple FATAL `ErrorResponse` messages that are buffered together in a single TCP segment (as PgBouncer can do when terminating idle-in-transaction connections), `Pipeline.Close()` panics with `close of closed channel`.

Root cause: `PgConn.receiveMessage()` unconditionally calls `close(cleanupDone)` when `OnPgError` returns false for a FATAL error. When two FATAL errors are both present in the `chunkReader`'s internal buffer (read from the network in a single syscall), processing the first FATAL closes `cleanupDone`, and processing the second FATAL attempts to close it again -- causing the panic.

The fix has three parts:

1. Guard `close(cleanupDone)` in `receiveMessage()` -- check `pgConn.status != connStatusClosed` before closing the channel, so that if a previous error path (or `asyncClose()`) already closed the connection, we don't double-close the channel.
2. Apply the same guard in `CopyFrom()` -- the same unguarded `close(cleanupDone)` pattern exists in the CopyFrom error-handling path (line ~1416).
3. Break out of the `Pipeline.Close()` loop on nil results -- when `getResults()` returns `(nil, nil)` but `ExpectedReadyForQuery > 0`, the connection is in a bad state (the server closed without sending `ReadyForQuery`). Previously this could spin indefinitely; now it calls `asyncClose()` and breaks.

Test plan
- Added `TestPipelineCloseDoesNotPanicOnMultipleFatalErrors`, which uses a mock server that sends all responses in a single `Flush()` (a single TCP write) to guarantee both FATAL errors are in the `chunkReader` buffer simultaneously.
- Without the fix, the test panics with `close of closed channel` at `PgConn.receiveMessage` -- matching the exact stack trace from #2470 (Panic on Tx rollback when connection has been closed by PgBouncer/Postgres (close of closed channel)).
- `TestFatalErrorReceivedInPipelineMode` (from the related #1920 fix, Panic in (*Pool).SendBatch: close of closed channel) continues to pass.

🤖 Generated with Claude Code
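For reference, a sketch of the mock-server trick the test plan describes, assuming the pgx v5 `pgproto3` API in which `Backend.Send` only buffers and `Flush` performs the write. `sendTwoFatals` is a hypothetical helper for illustration, not code from this PR.

```go
package mockserver

import (
	"net"

	"github.com/jackc/pgx/v5/pgproto3"
)

// sendTwoFatals shows how a mock server can get two FATAL ErrorResponses into
// the client's chunkReader buffer in a single read: Send only buffers each
// message, and the single Flush performs one TCP write containing both.
func sendTwoFatals(conn net.Conn) error {
	backend := pgproto3.NewBackend(conn, conn)
	fatal := &pgproto3.ErrorResponse{
		Severity: "FATAL",
		Code:     "57P01",
		Message:  "terminating connection due to administrator command",
	}
	backend.Send(fatal)    // buffered, not yet written
	backend.Send(fatal)    // buffered, not yet written
	return backend.Flush() // one write containing both ErrorResponses
}
```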