feat: dual mode transaction scheduler #1004
bmuddha wants to merge 3 commits into bmuddha/transaction/original-bincode
Conversation
📝 Walkthrough

The PR adds runtime coordinator modes (Primary and Replica) and a tokio::sync::Notify-based mode_switcher threaded from MagicValidator through TransactionSchedulerState into the scheduler. It introduces a ReplicationMode config, changes TransactionProcessingMode::Replay to carry a persist flag, adapts executor replay/commit logic to an optional persist flag, and updates the scheduler/coordinator to enforce mode-aware queuing, readiness, transaction rejection in Replica mode, and a switch_to_primary transition. Tests and test-kit helpers for replica-mode ordering and mode switching were added.

Assessment against linked issues
Out-of-scope changes
Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
magicblock-core/src/link/transactions.rs (1)
78-89: ⚠️ Potential issue | 🟡 Minor

Fix stale `TransactionProcessingMode` documentation.

The enum-level comment still says each variant carries a one-shot sender, but `Replay(bool)` no longer does. This is now misleading for API consumers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-core/src/link/transactions.rs` around lines 78 - 89, The enum-level doc for TransactionProcessingMode is stale: it claims "each variant also carries the one-shot sender" but Replay(bool) no longer carries a sender; update the documentation to accurately state which variants carry result senders (Simulation and Execution) and that Replay only carries a bool controlling ledger persistence. Also adjust the Replay variant doc to clearly explain the bool semantics (true = record to ledger, false = no recording) and remove any reference to one-shot senders from the enum-level comment.

magicblock-processor/tests/replay.rs (1)
93-113: ⚠️ Potential issue | 🟡 Minor

Validate “no notifications” after replay application is confirmed.
Because Line 94 only confirms enqueue success, the emptiness checks can run too early and miss delayed notifications. Assert post-replay state first, then check channels stay empty over the timeout window.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/tests/replay.rs` around lines 93 - 113, The test currently checks that notifications channels are empty immediately after enqueueing a replay (env.replay_transaction(false, txn)) which can race with delayed notifications; instead, first verify post-replay state by checking accounts via env.accountsdb.get_account for each pubkey (assert the data byte equals 42) and only after confirming the replay applied, assert that env.dispatch.transaction_status.recv_timeout(TIMEOUT) and env.dispatch.account_update.try_recv() remain empty; reorder the checks so the loop that inspects account state runs before the channel emptiness assertions (references: replay_transaction, env.accountsdb.get_account, env.dispatch.transaction_status, env.dispatch.account_update, TIMEOUT).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-config/src/config/validator.rs`:
- Around line 22-29: Add equality derives to the ReplicationMode enum so it can
be compared in tests and config code: update the derive on ReplicationMode to
include PartialEq and Eq alongside Deserialize, Serialize, Debug, Clone; no
other API changes needed because Url already implements PartialEq/Eq, so
consumers can now use ==/!= with ReplicationMode.
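The derive fix above can be sketched as follows. This is a hedged, std-only illustration: the real config enum also derives serde's Deserialize/Serialize, and the variant names here are assumptions rather than the crate's actual ones.

```rust
// Sketch of the derive fix the prompt describes; the real ReplicationMode in
// magicblock-config also derives serde traits, omitted here to stay std-only.
// Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReplicationMode {
    Primary,
    Replica,
}

fn main() {
    // With PartialEq/Eq derived, config code and tests can compare modes directly.
    let configured = ReplicationMode::Replica;
    assert_eq!(configured, ReplicationMode::Replica);
    assert_ne!(configured, ReplicationMode::Primary);
    println!("{:?}", configured);
}
```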
In `@magicblock-core/src/link/transactions.rs`:
- Around line 269-275: The replay path is dropping any pre-encoded bytes by
calling txn.sanitize(true) and hardcoding ProcessableTransaction.encoded: None;
update the replay branch to call txn.sanitize_with_encoded(true) (or the
equivalent method that returns WithEncoded) and set
ProcessableTransaction.encoded to the encoded bytes returned by that call so
that TransactionProcessingMode::Replay preserves optional pre-encoded bytes;
adjust the assignment where ProcessableTransaction { transaction, mode, encoded
} is built to use the encoded value from sanitize_with_encoded.
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 92-99: Replica readiness currently returns true if
pending.is_none(), which allows non-conflicting replays to run concurrently and
breaks primary ordering; change the Replica branch in is_ready (and the same
logic at the other occurrence) to require both pending.is_none() AND "no replay
in flight" (e.g., check m.in_flight_replay/is_empty or m.active_replay_count ==
0 or whatever field tracks current replay activity) so only a single replay can
be active at once; update Replica's readiness checks to reference the concrete
field names (pending and the in-flight/active replay tracker) in
ReplicaMode/Replica struct to enforce strict single‑flight replay ordering.
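The single-flight readiness rule described in that prompt might look like the following sketch. The struct and field names (`pending`, `in_flight_replays`) are stand-ins for whatever the coordinator actually tracks, not the PR's real types.

```rust
// Illustrative sketch of strict single-flight replay readiness; field and
// type names are stand-ins, not the coordinator's actual ones.
struct ReplicaState {
    pending: Option<u64>,     // queued replay waiting on this executor
    in_flight_replays: usize, // replays currently executing
}

fn is_ready(state: &ReplicaState) -> bool {
    // Ready only when nothing is queued AND nothing is executing, so
    // replays apply one at a time in primary order.
    state.pending.is_none() && state.in_flight_replays == 0
}

fn main() {
    assert!(is_ready(&ReplicaState { pending: None, in_flight_replays: 0 }));
    // A replay in flight blocks the next one even with an empty queue.
    assert!(!is_ready(&ReplicaState { pending: None, in_flight_replays: 1 }));
    assert!(!is_ready(&ReplicaState { pending: Some(7), in_flight_replays: 0 }));
    println!("single-flight readiness checks hold");
}
```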
- Around line 217-219: The decrement of
CoordinationMode::Primary.p.blocked_txn_count using p.blocked_txn_count -=
txn.is_some() as usize can underflow after a Replica→Primary switch; change the
decrement to be safe by either checking p.blocked_txn_count > 0 before
subtracting or using a saturating/checked subtraction (e.g., saturating_sub)
whenever blocked_txn_count is modified (the same fix should be applied to the
other occurrences in the same block/branch around the 228–239 region) so that
blocked_txn_count never wraps below zero.
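The safe-decrement pattern the prompt recommends is simple to demonstrate in isolation; the variable names below mirror the finding but the snippet is a standalone sketch, not the coordinator's code.

```rust
fn main() {
    let mut blocked_txn_count: usize = 0;
    let txn: Option<u64> = Some(1);
    // Unsafe pattern from the finding: this would panic in debug builds and
    // wrap to usize::MAX in release builds when the count is already zero.
    // blocked_txn_count -= txn.is_some() as usize;

    // Safe decrement: clamps at zero after a Replica→Primary switch.
    blocked_txn_count = blocked_txn_count.saturating_sub(txn.is_some() as usize);
    assert_eq!(blocked_txn_count, 0);
    println!("no underflow: {blocked_txn_count}");
}
```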
- Around line 245-255: The match in is_transaction_allowed currently blocks
Execution in Replica mode but still allows Simulation; tighten the Replica
branch so non-replay modes are rejected by matching both Execution and
Simulation. Update the match arm for Replica to something like matching
(Replica(_), Execution(_) | Simulation(_)) => false so Replica(_) only permits
Replay modes, leaving the Primary/Replay rule unchanged; reference function
is_transaction_allowed, CoordinationMode::Replica, and
TransactionProcessingMode::{Execution, Simulation}.
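The tightened match arm that prompt describes can be sketched with simplified stand-in enums (the real variants carry payloads such as result senders, which are omitted here):

```rust
// Simplified stand-ins for the coordinator's real types, to show the
// tightened Replica arm the prompt asks for.
enum CoordinationMode { Primary, Replica }
enum TransactionProcessingMode { Execution, Simulation, Replay }

fn is_transaction_allowed(
    mode: &CoordinationMode,
    txn: &TransactionProcessingMode,
) -> bool {
    use CoordinationMode::*;
    use TransactionProcessingMode::*;
    match (mode, txn) {
        // Replica only applies transactions replayed from the primary.
        (Replica, Execution | Simulation) => false,
        // Primary never accepts replays.
        (Primary, Replay) => false,
        _ => true,
    }
}

fn main() {
    assert!(!is_transaction_allowed(&CoordinationMode::Replica, &TransactionProcessingMode::Simulation));
    assert!(is_transaction_allowed(&CoordinationMode::Replica, &TransactionProcessingMode::Replay));
    assert!(!is_transaction_allowed(&CoordinationMode::Primary, &TransactionProcessingMode::Replay));
    println!("mode gating checks hold");
}
```

Note that a later review comment observes the actual code intentionally permits Simulation in both modes; this sketch implements the stricter policy this particular prompt requests.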
In `@magicblock-processor/tests/replica_ordering.rs`:
- Around line 120-140: The helper submit_all_and_start currently spawns
concurrent tasks that call scheduler.replay, making send order nondeterministic
and risking deadlock on bounded channels; change it to submit sequentially by
iterating txs and awaiting scheduler.replay(true, tx).await directly (no
tokio::spawn) so the order matches sigs, and if the scheduler consumes via
run_scheduler(), ensure run_scheduler() is started before or concurrently with
submission to avoid blocking—refer to submit_all_and_start and scheduler.replay
to locate and update the code.
---
Outside diff comments:
In `@magicblock-core/src/link/transactions.rs`:
- Around line 78-89: The enum-level doc for TransactionProcessingMode is stale:
it claims "each variant also carries the one-shot sender" but Replay(bool) no
longer carries a sender; update the documentation to accurately state which
variants carry result senders (Simulation and Execution) and that Replay only
carries a bool controlling ledger persistence. Also adjust the Replay variant
doc to clearly explain the bool semantics (true = record to ledger, false = no
recording) and remove any reference to one-shot senders from the enum-level
comment.
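A corrected doc comment along the lines this prompt asks for might read as below. The `TxnResultSender` alias is a hypothetical stand-in; the crate's real sender type differs.

```rust
// Hypothetical stand-in for the crate's actual one-shot result sender type.
type TxnResultSender = std::sync::mpsc::Sender<Result<(), String>>;

/// How a transaction should be processed.
///
/// `Execution` and `Simulation` each carry a one-shot sender used to report
/// the transaction result back to the submitter. `Replay` carries no sender;
/// its `bool` controls ledger persistence: `true` records the replayed
/// transaction to the ledger, `false` does not.
enum TransactionProcessingMode {
    Execution(TxnResultSender),
    Simulation(TxnResultSender),
    Replay(bool),
}

fn describe(mode: &TransactionProcessingMode) -> &'static str {
    match mode {
        TransactionProcessingMode::Execution(_) => "execution (has sender)",
        TransactionProcessingMode::Simulation(_) => "simulation (has sender)",
        TransactionProcessingMode::Replay(true) => "replay, persisted",
        TransactionProcessingMode::Replay(false) => "replay, not persisted",
    }
}

fn main() {
    assert_eq!(describe(&TransactionProcessingMode::Replay(true)), "replay, persisted");
    println!("{}", describe(&TransactionProcessingMode::Replay(false)));
}
```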
In `@magicblock-processor/tests/replay.rs`:
- Around line 93-113: The test currently checks that notifications channels are
empty immediately after enqueueing a replay (env.replay_transaction(false, txn))
which can race with delayed notifications; instead, first verify post-replay
state by checking accounts via env.accountsdb.get_account for each pubkey
(assert the data byte equals 42) and only after confirming the replay applied,
assert that env.dispatch.transaction_status.recv_timeout(TIMEOUT) and
env.dispatch.account_update.try_recv() remain empty; reorder the checks so the
loop that inspects account state runs before the channel emptiness assertions
(references: replay_transaction, env.accountsdb.get_account,
env.dispatch.transaction_status, env.dispatch.account_update, TIMEOUT).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
magicblock-api/src/magic_validator.rs
magicblock-config/src/config/validator.rs
magicblock-core/src/link/transactions.rs
magicblock-ledger/src/blockstore_processor/mod.rs
magicblock-processor/src/executor/mod.rs
magicblock-processor/src/executor/processing.rs
magicblock-processor/src/scheduler/coordinator.rs
magicblock-processor/src/scheduler/mod.rs
magicblock-processor/src/scheduler/state.rs
magicblock-processor/src/scheduler/tests.rs
magicblock-processor/tests/replay.rs
magicblock-processor/tests/replica_ordering.rs
test-kit/Cargo.toml
test-kit/src/lib.rs
```rust
let mode = TransactionProcessingMode::Replay(persist);
let transaction = txn.sanitize(true)?;
let txn = ProcessableTransaction {
    transaction,
    mode,
    encoded: None,
};
```
🧹 Nitpick | 🔵 Trivial
Replay path currently discards optional pre-encoded bytes.
Line 270 uses `sanitize(true)` and Line 274 hardcodes `encoded: None`, so `WithEncoded<T>` cannot carry bytes through replay. Prefer `sanitize_with_encoded(true)` here too.
♻️ Proposed refactor
```diff
 pub async fn replay(
     &self,
     persist: bool,
     txn: impl SanitizeableTransaction,
 ) -> TransactionResult {
     let mode = TransactionProcessingMode::Replay(persist);
-    let transaction = txn.sanitize(true)?;
+    let (transaction, encoded) = txn.sanitize_with_encoded(true)?;
     let txn = ProcessableTransaction {
         transaction,
         mode,
-        encoded: None,
+        encoded,
     };
     self.0
         .send(txn)
         .await
         .map_err(|_| TransactionError::ClusterMaintenance)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-core/src/link/transactions.rs` around lines 269 - 275, The replay
path is dropping any pre-encoded bytes by calling txn.sanitize(true) and
hardcoding ProcessableTransaction.encoded: None; update the replay branch to
call txn.sanitize_with_encoded(true) (or the equivalent method that returns
WithEncoded) and set ProcessableTransaction.encoded to the encoded bytes
returned by that call so that TransactionProcessingMode::Replay preserves
optional pre-encoded bytes; adjust the assignment where ProcessableTransaction {
transaction, mode, encoded } is built to use the encoded value from
sanitize_with_encoded.
Actionable comments posted: 2
♻️ Duplicate comments (3)
magicblock-processor/tests/replica_ordering.rs (1)
120-136: ⚠️ Potential issue | 🟠 Major

Potential pre-start enqueue hang in `submit_all_and_start`.

On Lines 127-132, all `replay()` sends are awaited before `run_scheduler()` (Line 135). If replay uses a bounded queue and the consumer starts only after `run_scheduler()`, this can block the helper under stress.

Suggested adjustment
```diff
 async fn submit_all_and_start(
     env: &mut ExecutionTestEnv,
     txs: Vec<Transaction>,
 ) -> Vec<Signature> {
     let sigs: Vec<Signature> = txs.iter().map(|tx| tx.signatures[0]).collect();
-    // Submit all transactions sequentially to preserve order
+    // Start consumer first to avoid potential bounded-queue backpressure deadlock.
+    env.run_scheduler();
+    env.advance_slot();
+
+    // Submit sequentially to preserve deterministic order.
     for tx in txs {
         env.transaction_scheduler
             .replay(true, tx)
             .await
             .expect("Failed to submit transaction");
     }
-
-    // Now start the scheduler
-    env.run_scheduler();
-    env.advance_slot();
     sigs
 }
```

```shell
#!/bin/bash
set -euo pipefail
# Verify whether replay submission can block before scheduler startup.
# 1) Locate replay method definitions and inspect enqueue behavior.
ast-grep --pattern $'async fn replay($$$) -> $_ { $$$ }' || true
ast-grep --pattern $'fn replay($$$) -> $_ { $$$ }' || true
# 2) Find channel types/capacities used by scheduler/dispatch paths.
rg -n -C3 --type rust 'async_channel::bounded|async_channel::unbounded|tokio::sync::mpsc::channel|crossbeam_channel::bounded|crossbeam_channel::unbounded|flume::bounded|flume::unbounded'
# 3) Check where scheduler consumer is started relative to replay calls.
rg -n -C3 --type rust '\brun_scheduler\s*\('
rg -n -C3 --type rust '\.replay\s*\('
```

Expected verification outcome: if replay ultimately sends to a bounded channel and consumer activation occurs in/after `run_scheduler()`, this helper pattern remains vulnerable to blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/tests/replica_ordering.rs` around lines 120 - 136, The helper submit_all_and_start awaits transaction_scheduler.replay for each tx before calling env.run_scheduler, which can block if replay enqueues onto a bounded queue; fix by starting the consumer before flooding the queue or making enqueues non-blocking: either call env.run_scheduler() (and env.advance_slot() if needed) prior to looping over txs, or submit replays without awaiting (spawn each replay future or use a try_send-style non-blocking API) so that transaction_scheduler.replay calls cannot backpressure the test helper; update submit_all_and_start accordingly to use run_scheduler or non-blocking replay submission.

magicblock-processor/src/scheduler/coordinator.rs (1)
234-238: ⚠️ Potential issue | 🟠 Major

Initialize `blocked_txn_count` from existing queues during Replica→Primary switch.

Line 235 sets `blocked_txn_count` to 0. If any blocked entries exist at switch time, Primary backpressure accounting starts undercounted.

Suggested fix
```diff
 let mode = PrimaryMode {
-    blocked_txn_count: 0,
+    blocked_txn_count: self
+        .blocked_transactions
+        .iter()
+        .map(|queue| queue.len())
+        .sum(),
     max_blocked_txn: self.blocked_transactions.len() * BLOCKED_TXN_MULTIPLIER,
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-processor/src/scheduler/coordinator.rs` around lines 234 - 238, When constructing PrimaryMode during the Replica→Primary transition, initialize blocked_txn_count from the existing blocked queue instead of zero: set PrimaryMode.blocked_txn_count to reflect the current number of blocked entries (e.g. based on self.blocked_transactions.len() or the appropriate aggregated count of entries) so backpressure accounting starts correctly; leave max_blocked_txn computed with BLOCKED_TXN_MULTIPLIER as-is. Ensure you update the PrimaryMode initialization where PrimaryMode { blocked_txn_count: ..., max_blocked_txn: ... } is created.

magicblock-core/src/link/transactions.rs (1)
273-279: 🧹 Nitpick | 🔵 Trivial

Preserve pre-encoded bytes in `replay()` submissions.

Line 274 uses `sanitize(true)` and Line 278 hardcodes `encoded: None`, so replay drops `WithEncoded<T>` bytes and forces redundant serialization later.

Suggested fix
```diff
 pub async fn replay(
     &self,
     persist: bool,
     txn: impl SanitizeableTransaction,
 ) -> TransactionResult {
     let mode = TransactionProcessingMode::Replay(persist);
-    let transaction = txn.sanitize(true)?;
+    let (transaction, encoded) = txn.sanitize_with_encoded(true)?;
     let txn = ProcessableTransaction {
         transaction,
         mode,
-        encoded: None,
+        encoded,
     };
     self.0
         .send(txn)
         .await
         .map_err(|_| TransactionError::ClusterMaintenance)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-core/src/link/transactions.rs` around lines 273 - 279, The replay branch currently calls txn.sanitize(true) which strips pre-encoded bytes and then hardcodes encoded: None on the new ProcessableTransaction, causing replay to drop WithEncoded<T> bytes; instead preserve any existing encoded bytes by calling sanitize in a way that does not remove the encoded payload (e.g., txn.sanitize(false)) and set ProcessableTransaction.encoded from the original txn's encoded field (or clone/retain it via the original WithEncoded<T> accessor) when creating the ProcessableTransaction for TransactionProcessingMode::Replay so redundant serialization is avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 242-256: The doc comment for is_transaction_allowed incorrectly
states that Primary accepts only Execution and Replica accepts only Replay, but
the match implementation also permits Simulation in both modes; update either
the comment or the logic to match: either revise the comment above
is_transaction_allowed to explicitly mention that Simulation transactions are
allowed in both CoordinationMode::Primary and CoordinationMode::Replica, or
change the match arms in is_transaction_allowed (referencing
CoordinationMode::Primary, CoordinationMode::Replica and
TransactionProcessingMode::{Execution, Replay, Simulation}) to disallow
Simulation where appropriate so policy and code are consistent.
In `@magicblock-processor/tests/replica_ordering.rs`:
- Around line 237-246: The "Independent" branch reuses the same independent
account for both non-conflict transactions in each 4-item batch, causing
unintended conflicts; in the loop (for i in 0..count) where tx_write is called
for group_a/group_b/independent, change how you compute idx for independent so
the two independent branches in a batch select distinct accounts (e.g., derive
idx from the batch number rather than i directly: use batch = i/4 and an offset
for the first vs second independent slot, or multiply batch by 2 and add 0/1
based on (i % 4) to pick different independent[(batch*2 + offset) %
independent.len()]); update the selection logic used in the tx_write calls so
independent accesses are unique per non-conflict transaction.
---
Duplicate comments:
In `@magicblock-core/src/link/transactions.rs`:
- Around line 273-279: The replay branch currently calls txn.sanitize(true)
which strips pre-encoded bytes and then hardcodes encoded: None on the new
ProcessableTransaction, causing replay to drop WithEncoded<T> bytes; instead
preserve any existing encoded bytes by calling sanitize in a way that does not
remove the encoded payload (e.g., txn.sanitize(false)) and set
ProcessableTransaction.encoded from the original txn's encoded field (or
clone/retain it via the original WithEncoded<T> accessor) when creating the
ProcessableTransaction for TransactionProcessingMode::Replay so redundant
serialization is avoided.
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 234-238: When constructing PrimaryMode during the Replica→Primary
transition, initialize blocked_txn_count from the existing blocked queue instead
of zero: set PrimaryMode.blocked_txn_count to reflect the current number of
blocked entries (e.g. based on self.blocked_transactions.len() or the
appropriate aggregated count of entries) so backpressure accounting starts
correctly; leave max_blocked_txn computed with BLOCKED_TXN_MULTIPLIER as-is.
Ensure you update the PrimaryMode initialization where PrimaryMode {
blocked_txn_count: ..., max_blocked_txn: ... } is created.
In `@magicblock-processor/tests/replica_ordering.rs`:
- Around line 120-136: The helper submit_all_and_start awaits
transaction_scheduler.replay for each tx before calling env.run_scheduler, which
can block if replay enqueues onto a bounded queue; fix by starting the consumer
before flooding the queue or making enqueues non-blocking: either call
env.run_scheduler() (and env.advance_slot() if needed) prior to looping over
txs, or submit replays without awaiting (spawn each replay future or use a
try_send-style non-blocking API) so that transaction_scheduler.replay calls
cannot backpressure the test helper; update submit_all_and_start accordingly to
use run_scheduler or non-blocking replay submission.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
magicblock-core/src/link/transactions.rs
magicblock-processor/src/scheduler/coordinator.rs
magicblock-processor/tests/replica_ordering.rs
```rust
for i in 0..count {
    env.advance_slot();
    match i % 4 {
        0 => txs.push(tx_write(&mut env, group_a[0], i as u8)),
        1 => txs.push(tx_write(&mut env, group_b[0], i as u8)),
        _ => {
            // Independent writes to unique accounts
            let idx = (i / 4) as usize % independent.len();
            txs.push(tx_write(&mut env, independent[idx], i as u8));
        }
```
“Independent” branch currently creates avoidable conflicts.
Line 244 maps both non-conflict branch transactions in each 4-item batch to the same account, so they are not unique writes as the comment states. This weakens the intended independent-workload coverage.
Suggested fix
```diff
 let mut txs = Vec::with_capacity(count);
+let mut independent_idx = 0usize;
 for i in 0..count {
     env.advance_slot();
     match i % 4 {
         0 => txs.push(tx_write(&mut env, group_a[0], i as u8)),
         1 => txs.push(tx_write(&mut env, group_b[0], i as u8)),
         _ => {
             // Independent writes to unique accounts
-            let idx = (i / 4) as usize % independent.len();
+            let idx = independent_idx % independent.len();
+            independent_idx += 1;
             txs.push(tx_write(&mut env, independent[idx], i as u8));
         }
     }
 }
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```rust
let mut txs = Vec::with_capacity(count);
let mut independent_idx = 0usize;
for i in 0..count {
    env.advance_slot();
    match i % 4 {
        0 => txs.push(tx_write(&mut env, group_a[0], i as u8)),
        1 => txs.push(tx_write(&mut env, group_b[0], i as u8)),
        _ => {
            // Independent writes to unique accounts
            let idx = independent_idx % independent.len();
            independent_idx += 1;
            txs.push(tx_write(&mut env, independent[idx], i as u8));
        }
    }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-processor/tests/replica_ordering.rs` around lines 237 - 246, The
"Independent" branch reuses the same independent account for both non-conflict
transactions in each 4-item batch, causing unintended conflicts; in the loop
(for i in 0..count) where tx_write is called for group_a/group_b/independent,
change how you compute idx for independent so the two independent branches in a
batch select distinct accounts (e.g., derive idx from the batch number rather
than i directly: use batch = i/4 and an offset for the first vs second
independent slot, or multiply batch by 2 and add 0/1 based on (i % 4) to pick
different independent[(batch*2 + offset) % independent.len()]); update the
selection logic used in the tx_write calls so independent accesses are unique
per non-conflict transaction.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 234-235: The initialization of PrimaryMode with blocked_txn_count:
0 assumes all executor blocked queues are empty at the Replica→Primary
transition; update this to be defensive by either (a) computing
blocked_txn_count from the actual queue lengths across executors (use the
executor/queue APIs available in the coordinator to sum blocked counts) or (b)
adding a runtime assertion that verifies all blocked queues are empty before
constructing PrimaryMode, or (c) expanding the doc comment on the transition
function to explicitly state the empty-queue invariant; reference PrimaryMode
and blocked_txn_count and the coordinator's executor queue accessors when making
the change so the check or computed initialization is placed at the mode-switch
site right after ledger replay completes.
```rust
let mode = PrimaryMode {
    blocked_txn_count: 0,
```
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
rg -n -C5 'switch_to_primary_mode' magicblock-processor/src/scheduler/Repository: magicblock-labs/magicblock-validator
Length of output: 3078
🏁 Script executed:
# Get more context around switch_to_primary_mode definition and PrimaryMode initialization
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '220,260p'Repository: magicblock-labs/magicblock-validator
Length of output: 1697
🏁 Script executed:
# Search for queue initialization and clearing logic
rg -n 'blocked_txns|blocked_queue|replay|clear' magicblock-processor/src/scheduler/coordinator.rs | head -30Repository: magicblock-labs/magicblock-validator
Length of output: 478
🏁 Script executed:
# Check the CoordinationMode enum and its variants to see queue structure
rg -n -B5 -A10 'enum CoordinationMode\|struct PrimaryMode\|struct ReplicaMode' magicblock-processor/src/scheduler/coordinator.rs | head -60Repository: magicblock-labs/magicblock-validator
Length of output: 62
🏁 Script executed:
# Get the enum and struct definitions for coordination modes
rg -n -A15 'enum CoordinationMode' magicblock-processor/src/scheduler/coordinator.rsRepository: magicblock-labs/magicblock-validator
Length of output: 658
🏁 Script executed:
# Get the full struct definitions
rg -n -A8 'struct PrimaryMode\|struct ReplicaMode' magicblock-processor/src/scheduler/coordinator.rsRepository: magicblock-labs/magicblock-validator
Length of output: 62
🏁 Script executed:
# Check what blocked_transactions is
rg -n 'blocked_transactions' magicblock-processor/src/scheduler/coordinator.rs | head -20Repository: magicblock-labs/magicblock-validator
Length of output: 405
🏁 Script executed:
# Check how blocked_txn_count is modified throughout the code
rg -n 'blocked_txn_count' magicblock-processor/src/scheduler/coordinator.rsRepository: magicblock-labs/magicblock-validator
Length of output: 398
🏁 Script executed:
# Get the full context of PrimaryMode struct and find replay clearing logic
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '40,120p'Repository: magicblock-labs/magicblock-validator
Length of output: 3672
🏁 Script executed:
# Search for where queues might be cleared or checked during mode transition
rg -n 'push|pop|clear|drain' magicblock-processor/src/scheduler/coordinator.rs | head -20Repository: magicblock-labs/magicblock-validator
Length of output: 410
🏁 Script executed:
# Check the try_schedule and dequeue_blocked methods to understand queue usage context
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '180,220p'Repository: magicblock-labs/magicblock-validator
Length of output: 2010
🏁 Script executed:
# Verify when transactions are added to blocked_transactions (should only be in Primary mode context)
rg -n -B10 'heap.push' magicblock-processor/src/scheduler/coordinator.rsRepository: magicblock-labs/magicblock-validator
Length of output: 550
🏁 Script executed:
# Check if there's any explicit clearing or verification of queue state during mode transitions
rg -n 'transition_to_new_slot|replay|Replica|Primary' magicblock-processor/src/scheduler/mod.rs | head -30Repository: magicblock-labs/magicblock-validator
Length of output: 297
🏁 Script executed:
# Look at the full flow of mode_switcher to understand when switch_to_primary_mode is called
cat -n magicblock-processor/src/scheduler/mod.rs | sed -n '120,160p'Repository: magicblock-labs/magicblock-validator
Length of output: 2123
🏁 Script executed:
# Check reschedule_blocked_transactions to understand the full lifecycle
rg -n -A15 'fn reschedule_blocked_transactions' magicblock-processor/src/scheduler/mod.rsRepository: magicblock-labs/magicblock-validator
Length of output: 865
🏁 Script executed:
# Verify: are there any tests that show queue state during mode transitions?
rg -n -B5 -A10 'switch_to_primary_mode' magicblock-processor/src/scheduler/tests.rsRepository: magicblock-labs/magicblock-validator
Length of output: 702
🏁 Script executed:
# Check transition_to_new_slot to see if it clears state or what happens at slot boundaries
cat -n magicblock-processor/src/scheduler/mod.rs | sed -n '200,230p'Repository: magicblock-labs/magicblock-validator
Length of output: 1488
🏁 Script executed:
# Look for any assertions or checks about queue state
rg -n 'assert|debug_assert|unwrap|expect' magicblock-processor/src/scheduler/coordinator.rs | grep -i 'queue\|blocked\|empty'Repository: magicblock-labs/magicblock-validator
Length of output: 170
🏁 Script executed:
# Check what BLOCKED_TXN_MULTIPLIER is
rg -n 'BLOCKED_TXN_MULTIPLIER' magicblock-processor/src/scheduler/Repository: magicblock-labs/magicblock-validator
Length of output: 269
🏁 Script executed:
# Look at the mode_switcher initialization to understand when it's triggered
rg -n -B10 -A10 'mode_switcher' magicblock-processor/src/scheduler/mod.rs | head -50Repository: magicblock-labs/magicblock-validator
Length of output: 2220
🏁 Script executed:
# Verify Replica mode doesn't queue in blocked_transactions heaps
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '180,205p'Repository: magicblock-labs/magicblock-validator
Length of output: 1289
Consider documenting the invariant that queues are empty at mode transition.
The initialization `blocked_txn_count: 0` assumes all blocked queues are empty when switching from Replica to Primary. While the doc comment states this is called "after ledger replay completes," explicitly verifying or documenting the invariant would be more defensive. Either:

- Initialize `blocked_txn_count` from actual queue lengths across executors, or
- Add a runtime check/assertion that queues are empty, or
- Expand the existing doc comment to explicitly state the empty-queue guarantee
This would prevent subtle bugs if the invariant is ever violated by future changes to the replay/mode-switch flow.
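Options (a) and (b) above can be combined in a small sketch. The queue model below (a slice of Vecs) and the function names are illustrative stand-ins for the coordinator's real structures.

```rust
// Sketch of computing the count from the actual queues and asserting the
// expected invariant; `blocked_transactions` is modeled as a slice of Vecs,
// not the coordinator's real per-executor queues.
fn blocked_count(blocked_transactions: &[Vec<u64>]) -> usize {
    blocked_transactions.iter().map(|q| q.len()).sum()
}

fn switch_to_primary(blocked_transactions: &[Vec<u64>]) -> usize {
    let count = blocked_count(blocked_transactions);
    // Defensive check: replay completion should have drained every queue.
    debug_assert_eq!(count, 0, "blocked queues must be empty at Replica→Primary switch");
    // Initializing from the measured count stays correct even if the
    // invariant is ever relaxed by future changes.
    count
}

fn main() {
    let queues: Vec<Vec<u64>> = vec![Vec::new(); 4];
    assert_eq!(switch_to_primary(&queues), 0);
    println!("empty-queue invariant holds at mode switch");
}
```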
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-processor/src/scheduler/coordinator.rs` around lines 234 - 235,
The initialization of PrimaryMode with blocked_txn_count: 0 assumes all executor
blocked queues are empty at the Replica→Primary transition; update this to be
defensive by either (a) computing blocked_txn_count from the actual queue
lengths across executors (use the executor/queue APIs available in the
coordinator to sum blocked counts) or (b) adding a runtime assertion that
verifies all blocked queues are empty before constructing PrimaryMode, or (c)
expanding the doc comment on the transition function to explicitly state the
empty-queue invariant; reference PrimaryMode and blocked_txn_count and the
coordinator's executor queue accessors when making the change so the check or
computed initialization is placed at the mode-switch site right after ledger
replay completes.
Summary
Adds two modes to the transaction scheduler, primary and replica. These modes affect how state is persisted and which transaction sources are allowed.
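The primary/replica switch signal can be sketched roughly as below. This is a std-only analog: a flag-plus-Condvar pair stands in for the tokio::sync::Notify the PR actually threads through, and the enum and function names are illustrative, not the scheduler's real API.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CoordinationMode {
    Replica, // ledger replay only
    Primary, // live transaction execution
}

// Spawns a "scheduler" thread that waits for the switch signal, fires the
// signal from the "validator" side, and returns the scheduler's final mode.
fn run_switch_demo() -> CoordinationMode {
    // Flag + Condvar pair standing in for tokio::sync::Notify.
    let switcher = Arc::new((Mutex::new(false), Condvar::new()));
    let mode = Arc::new(Mutex::new(CoordinationMode::Replica));

    let scheduler = {
        let switcher = Arc::clone(&switcher);
        let mode = Arc::clone(&mode);
        thread::spawn(move || {
            let (lock, cvar) = &*switcher;
            let mut notified = lock.lock().unwrap();
            while !*notified {
                notified = cvar.wait(notified).unwrap();
            }
            // switch_to_primary: start accepting live transactions.
            *mode.lock().unwrap() = CoordinationMode::Primary;
        })
    };

    // Validator side: ledger replay finished, trigger the mode switch.
    let (lock, cvar) = &*switcher;
    *lock.lock().unwrap() = true;
    cvar.notify_one();
    scheduler.join().unwrap();

    let final_mode = *mode.lock().unwrap();
    final_mode
}

fn main() {
    assert_eq!(run_switch_demo(), CoordinationMode::Primary);
    println!("switched to Primary");
}
```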
Compatibility
Testing
Checklist
Summary by CodeRabbit
New Features
Tests