Skip to content

Fix: Concurrency rule violation on replays#2926

Open
mrkaye97 wants to merge 3 commits intomainfrom
mk/fix-concurrency-ordering-bug
Open

Fix: Concurrency rule violation on replays#2926
mrkaye97 wants to merge 3 commits intomainfrom
mk/fix-concurrency-ordering-bug

Conversation

@mrkaye97
Copy link
Contributor

@mrkaye97 mrkaye97 commented Feb 3, 2026

Description

Fixes a bug where concurrency isn't respected on a replay because of the ordering by priority first by first checking the is_filled in the lateral join where we find candidates to assign

Type of change

  • Bug fix (non-breaking change which fixes an issue)

@vercel
Copy link

vercel bot commented Feb 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hatchet-docs Ready Ready Preview, Comment Mar 10, 2026 7:04pm

Request Review

@mrkaye97 mrkaye97 force-pushed the mk/fix-concurrency-ordering-bug branch from c84915e to 3b3fba0 Compare March 9, 2026 21:58
Copilot AI review requested due to automatic review settings March 9, 2026 21:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an edge case in the v1 concurrency scheduler where replayed/running (already-filled) slots were not being accounted for correctly due to candidate selection ordering inside lateral joins, allowing concurrency limits to be violated on replays.

Changes:

  • Updates lateral-join candidate selection to sort by is_filled first, so already-filled slots “consume” concurrency capacity before selecting new work.
  • Applies the same ordering fix across round-robin scheduling and “cancel” workflows, and syncs the generated sqlc Go output.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/repository/sqlcv1/concurrency.sql.go Regenerates/updates embedded SQL strings to match the corrected ORDER BY semantics.
pkg/repository/sqlcv1/concurrency.sql Changes lateral-join ORDER BY clauses to prioritize is_filled before priority/sort_id, preventing concurrency oversubscription on replays.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

AND wcs_all.tenant_id = @tenantId::uuid
AND wcs_all.strategy_id = @strategyId::bigint
ORDER BY wcs_all.priority DESC, wcs_all.sort_id ASC
ORDER BY wcs_all.is_filled DESC, wcs_all.priority DESC, wcs_all.sort_id ASC
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding is_filled to the leading ORDER BY means Postgres can no longer satisfy this ordering using the existing v1_workflow_concurrency_slot_query_idx (it starts with tenant_id, strategy_id, key, priority, sort_id but not is_filled). This can introduce an extra sort per key inside the lateral subquery. Consider adding an index that includes is_filled (e.g., tenant_id, strategy_id, key, is_filled, priority, sort_id) or verifying with EXPLAIN that the plan stays efficient under expected slot counts.

Suggested change
ORDER BY wcs_all.is_filled DESC, wcs_all.priority DESC, wcs_all.sort_id ASC
AND wcs_all.is_filled = FALSE
ORDER BY wcs_all.priority DESC, wcs_all.sort_id ASC

Copilot uses AI. Check for mistakes.
AND wcs_all.tenant_id = @tenantId::uuid
AND wcs_all.strategy_id = @strategyId::bigint
ORDER BY wcs_all.sort_id ASC
ORDER BY wcs_all.is_filled DESC, wcs_all.sort_id ASC
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new ORDER BY ... is_filled is not supported by the current v1_concurrency_slot_query_idx (tenant_id, strategy_id, key, sort_id). Depending on how many rows exist per key, Postgres may have to sort to apply the limit in the lateral join. Consider extending the index to include is_filled (tenant_id, strategy_id, key, is_filled, sort_id) or validate the plan with EXPLAIN/ANALYZE in a realistic dataset.

Suggested change
ORDER BY wcs_all.is_filled DESC, wcs_all.sort_id ASC
ORDER BY wcs_all.sort_id ASC

Copilot uses AI. Check for mistakes.
AND wcs_all.tenant_id = @tenantId::uuid
AND wcs_all.strategy_id = @strategyId::bigint
ORDER BY wcs_all.sort_id DESC
ORDER BY wcs_all.is_filled DESC, wcs_all.sort_id DESC
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because is_filled is now the first sort key, the planner can’t use the existing ordering of locked_workflow_concurrency_slots / underlying index to avoid a sort in this lateral join. If this query is on a hot path, consider adding an index that includes is_filled for v1_workflow_concurrency_slot (or confirm via EXPLAIN that row counts per key are small enough that sorting isn’t a concern).

Copilot uses AI. Check for mistakes.
AND wcs_all.tenant_id = @tenantId::uuid
AND wcs_all.strategy_id = @strategyId::bigint
ORDER BY wcs_all.sort_id ASC
ORDER BY wcs_all.is_filled DESC, wcs_all.sort_id ASC
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the other lateral joins: ordering by is_filled first won’t be supported by the current v1_workflow_concurrency_slot_query_idx ordering, which can introduce a sort before applying LIMIT @maxRuns. If this runs frequently, consider an index that includes is_filled or validate the query plan/perf impact.

Suggested change
ORDER BY wcs_all.is_filled DESC, wcs_all.sort_id ASC
ORDER BY wcs_all.sort_id ASC, wcs_all.is_filled DESC

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants