
pgconn: fix panic on Pipeline.Close when server sends multiple FATAL errors#2501

Open
veeceey wants to merge 2 commits into jackc:master from veeceey:fix-pipeline-close-panic

Conversation

@veeceey commented Feb 10, 2026

Summary

Fixes #2470

When the server sends multiple FATAL ErrorResponse messages that are buffered together in a single TCP segment (as PgBouncer can do when terminating idle-in-transaction connections), Pipeline.Close() panics with close of closed channel.

Root cause: PgConn.receiveMessage() unconditionally calls close(cleanupDone) when OnPgError returns false for a FATAL error. When two FATAL errors are both present in the chunkReader's internal buffer (read from the network in a single syscall), processing the first FATAL closes cleanupDone, and processing the second FATAL attempts to close it again -- causing the panic.

The fix has three parts:

  1. Guard close(cleanupDone) in receiveMessage() -- check pgConn.status != connStatusClosed before closing the channel, so that if a previous error path (or asyncClose()) already closed the connection, we don't double-close the channel (see the sketch after this list).

  2. Apply the same guard in CopyFrom() -- the same unguarded close(cleanupDone) pattern exists in the CopyFrom error handling path (line ~1416).

  3. Break out of Pipeline.Close() loop on nil results -- when getResults() returns (nil, nil) but ExpectedReadyForQuery > 0, the connection is in a bad state (server closed without sending ReadyForQuery). Previously this could spin indefinitely; now it calls asyncClose() and breaks.
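
The guard in parts 1 and 2 boils down to checking the connection status before closing the channel. The following is a minimal, self-contained Go sketch of that idea; conn, connStatus, cleanupDone, and handleFatal are stand-ins for pgconn's internals, not the actual source:

package main

import "fmt"

type connStatus int

const (
	connStatusIdle connStatus = iota
	connStatusClosed
)

type conn struct {
	status      connStatus
	cleanupDone chan struct{}
}

// handleFatal models the FATAL ErrorResponse path in receiveMessage().
// Without the status check, a second FATAL sitting in the same read
// buffer would close cleanupDone twice and panic with
// "close of closed channel".
func (c *conn) handleFatal() {
	if c.status != connStatusClosed {
		c.status = connStatusClosed
		close(c.cleanupDone)
	}
}

func main() {
	c := &conn{cleanupDone: make(chan struct{})}
	c.handleFatal() // first FATAL closes cleanupDone
	c.handleFatal() // second FATAL is now a no-op instead of a panic
	fmt.Println("closed once, no panic:", c.status == connStatusClosed)
}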

Test plan

--- PASS: TestFatalErrorReceivedAfterCommandComplete (0.00s)
--- PASS: TestFatalErrorReceivedInPipelineMode (0.00s)
--- PASS: TestPipelineCloseDoesNotPanicOnMultipleFatalErrors (0.00s)

🤖 Generated with Claude Code

…errors

When the server sends multiple FATAL ErrorResponse messages that are
buffered together (as PgBouncer can do when terminating idle-in-transaction
connections), Pipeline.Close() would panic with "close of closed channel".

The root cause is that PgConn.receiveMessage() unconditionally calls
close(cleanupDone) when OnPgError returns false, without checking if the
channel was already closed by a previous error on the same connection.
When two FATAL errors are in the chunkReader buffer, processing the first
FATAL closes cleanupDone, and processing the second FATAL attempts to
close it again, causing the panic.

The fix:
1. Guard close(cleanupDone) in receiveMessage() and CopyFrom() with a
   status check to prevent double-close of the channel.
2. Add a nil-results check in Pipeline.Close() to break out of the
   result-draining loop when no more results are available but
   ExpectedReadyForQuery > 0 (preventing a potential infinite loop).

Fixes jackc#2470

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
} else if results == nil {
	p.conn.asyncClose()
	break
}
@jackc (Owner) commented Feb 11, 2026


How can this path be executed? I thought that getResults would always either return results or an error. But this would only happen if it returns nil, nil.

If any change is to happen around here I would expect there to be more error cases that trigger asyncClose() and break.

@veeceey (Author)

Good question. getResults() does return (nil, nil) -- it happens at the pipelineNil case (lines 2514-2515) when ExtractFrontRequestType() finds the request event queue empty.

Here's the exact sequence in the double-FATAL scenario:

  1. Iteration 1 -- pops Prepare from queue, getResultsPrepare() reads ParseComplete + ParameterDescription + RowDescription, returns (psd, nil). Loop continues.

  2. Iteration 2 -- pops Prepare from queue, getResultsPrepare() calls receiveMessage() which reads the first FATAL. OnPgError returns false, connection is closed, cleanupDone is closed. Pipeline.receiveMessage() returns the error. Back in Close(): it's a PgError, so the loop continues (doesn't break).

  3. Iteration 3 -- pops Sync from queue, getResultsSync() calls receiveMessage() which reads the second FATAL. The guard prevents the double-close of cleanupDone. Pipeline.receiveMessage() returns the error. Back in Close(): it's again a PgError, so the loop continues.

  4. Iteration 4 -- queue is now empty. ExtractFrontRequestType() returns pipelineNil. getResults() returns (nil, nil). But ExpectedReadyForQuery() is still 1 because ReadyForQuery was never received (the server sent FATALs instead). Without the nil check, this spins forever.

The key insight is that FATAL PgErrors don't cause Close() to break (only non-PgError errors do), and FATAL errors consume queued request slots without the server ever sending ReadyForQuery, so expectedReadyForQueryCount is never decremented.
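
To make that concrete, here is a toy, self-contained model of the drain loop in Close(); the pending queue, the error value, and the field names are stand-ins for the real pipeline state, not pgconn's implementation:

package main

import (
	"errors"
	"fmt"
)

// errFatal stands in for the *pgconn.PgError produced by each buffered FATAL.
var errFatal = errors.New("FATAL: server closed the connection")

type pipeline struct {
	pending               []string // queued requests, e.g. "prepare", "sync"
	expectedReadyForQuery int
	closed                bool
}

// getResults pops one pending request. Once the queue is empty it returns
// (nil, nil), mirroring the pipelineNil case described above.
func (p *pipeline) getResults() (any, error) {
	if len(p.pending) == 0 {
		return nil, nil
	}
	p.pending = p.pending[1:]
	return nil, errFatal // each queued request fails with the buffered FATAL
}

// close models the result-draining loop in Pipeline.Close().
func (p *pipeline) close() {
	for p.expectedReadyForQuery > 0 {
		results, err := p.getResults()
		if err != nil {
			// A PgError alone does not end the loop; keep draining.
			continue
		}
		if results == nil {
			// Queue exhausted but ReadyForQuery never arrived: without
			// this break the loop would spin forever.
			p.closed = true // stands in for p.conn.asyncClose()
			break
		}
		p.expectedReadyForQuery-- // a real ReadyForQuery would decrement this
	}
}

func main() {
	p := &pipeline{pending: []string{"prepare", "sync"}, expectedReadyForQuery: 1}
	p.close()
	fmt.Println("terminated without spinning:", p.closed)
}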

I've updated the comment in the latest push to explain this flow explicitly.

Expand the comment on the results == nil guard in Pipeline.Close() to
explain exactly when and why getResults() returns (nil, nil):

When the server sends FATAL errors that consume queued request slots
(e.g. Prepare, Sync) without ever sending ReadyForQuery, the request
event queue is exhausted while expectedReadyForQueryCount remains > 0.
ExtractFrontRequestType() returns pipelineNil, getResults() returns
(nil, nil), and without this guard the loop would spin indefinitely.
@jackc (Owner) commented Feb 11, 2026

After looking at this a little more, I wonder if there might be a more fundamental issue (and solution). The underlying issue is calling PgConn.receiveMessage() on a connection that is closed. Multiple FATAL errors in pipeline mode are one way of inducing that, but they may not be the root issue.

Normally, receiveMessage() is only called by methods that have already called lock(). lock() returns an error if the connection is already closed. And as far as I can tell, all of these methods return when an error is encountered. It's only pipeline mode that can keep going after an error. I think the fix should be for pipeline mode not to call receiveMessage() when the connection is dead. Potentially, this would be handled in pipeline.receiveMessage() or in pipeline.getResults(). Or maybe PgConn.receiveMessage() should have a guard clause to check if the connection is still open.
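
Sketched as a guard clause, that suggestion might look roughly like the following; the types and the errConnClosed sentinel are stand-ins for illustration, not the actual pgconn API:

package main

import (
	"errors"
	"fmt"
)

// errConnClosed is a stand-in sentinel; pgconn's real error type differs.
var errConnClosed = errors.New("conn closed")

type connStatus int

const (
	connStatusIdle connStatus = iota
	connStatusClosed
)

type pgConn struct {
	status connStatus
}

// receiveMessage with a guard clause: callers that keep draining results
// after a FATAL (as pipeline mode does) get a plain error back instead of
// re-running the FATAL cleanup path and double-closing cleanupDone.
func (c *pgConn) receiveMessage() (string, error) {
	if c.status == connStatusClosed {
		return "", errConnClosed
	}
	// ... read and decode the next backend message ...
	return "ReadyForQuery", nil
}

func main() {
	c := &pgConn{status: connStatusClosed}
	_, err := c.receiveMessage()
	fmt.Println(err) // conn closed
}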



Development

Successfully merging this pull request may close these issues.

Panic on Tx rollback when connection has been closed by PgBouncer/Postgres (close of closed channel)
