feat: dual mode transaction scheduler#1004

Draft
bmuddha wants to merge 3 commits into bmuddha/transaction/original-bincode from bmuddha/scheduler/dual-mode

Conversation


@bmuddha bmuddha commented Feb 26, 2026

Summary

Adds two modes to the transaction scheduler: primary and replica. These modes affect how state is persisted and which transaction sources are allowed.

Compatibility

  • There are now two scheduling modes; primary is the default, matching the previous behavior
  • Config change: the configuration has an extra field to specify the replication mode
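As a rough illustration, the new setting could look something like this in the validator config. The key name, section, and value spellings here are hypothetical; the PR description only states that a ReplicationMode field selects Primary or Replica:

```toml
# Hypothetical key name and section; the PR only states that a
# ReplicationMode setting selects Primary or Replica.
[validator]
replication-mode = "replica"   # default: "primary"
```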

Testing

  • added more tests for the replication mode in the scheduler

Summary by CodeRabbit

  • New Features

    • Validators now support Primary and Replica replication roles driven by an async mode switch notifier; schedulers can switch modes at runtime and drop transactions not allowed in the current mode.
    • Replay API and test helpers gained a persist flag to control whether replays produce side effects.
    • Config adds a ReplicationMode setting to select Primary or Replica.
  • Tests

    • Added extensive Replica-mode ordering and coordination tests and replica-mode test helpers with explicit mode switching.


coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

The PR adds runtime coordinator modes (Primary and Replica) and a tokio::sync::Notify-based mode_switcher threaded from MagicValidator through TransactionSchedulerState into the scheduler. It introduces a ReplicationMode config, changes TransactionProcessingMode::Replay to carry a persist flag, adapts executor replay/commit logic to an optional persist, and updates the scheduler/coordinator to enforce mode-aware queuing, readiness, transaction rejection in Replica, and a switch_to_primary transition. Tests and test-kit helpers for replica-mode ordering and mode switching were added.
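The core pattern described above — a mode switcher threaded into the scheduler so it can transition from Replica to Primary at runtime — can be sketched as follows. This is a dependency-free stand-in using std primitives in place of tokio::sync::Notify, and the names (CoordinationMode, ModeSwitcher, switch_to_primary) mirror the PR's vocabulary but are not the actual definitions:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical names mirroring the PR: the scheduler's coordinator mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CoordinationMode {
    Primary,
    Replica,
}

// Stand-in for the tokio::sync::Notify-based mode_switcher: the scheduler
// waits on a Condvar until the validator flips the shared mode to Primary.
struct ModeSwitcher {
    mode: Mutex<CoordinationMode>,
    cvar: Condvar,
}

impl ModeSwitcher {
    fn new(initial: CoordinationMode) -> Self {
        Self { mode: Mutex::new(initial), cvar: Condvar::new() }
    }

    // Called by the validator once ledger replay is done.
    fn switch_to_primary(&self) {
        *self.mode.lock().unwrap() = CoordinationMode::Primary;
        self.cvar.notify_all();
    }

    // Called by the scheduler loop; blocks until the switch happens.
    fn wait_for_primary(&self) -> CoordinationMode {
        let mut mode = self.mode.lock().unwrap();
        while *mode != CoordinationMode::Primary {
            mode = self.cvar.wait(mode).unwrap();
        }
        *mode
    }
}

fn main() {
    let switcher = Arc::new(ModeSwitcher::new(CoordinationMode::Replica));
    let scheduler = {
        let s = Arc::clone(&switcher);
        thread::spawn(move || s.wait_for_primary())
    };
    switcher.switch_to_primary();
    assert_eq!(scheduler.join().unwrap(), CoordinationMode::Primary);
}
```

The real implementation awaits the notifier asynchronously inside the scheduler's select loop rather than blocking a thread, but the one-way Replica→Primary transition is the same.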

Assessment against linked issues

Objective Addressed Explanation
Scheduler mode enum and runtime mode switch method; runtime mode switching without restart [#958]
Replay mode enforces strict ordering [#958] Implementation enforces single-pending semantics in Replica but does not show index-based ordering or sequence-number checks from primary.
Non-replay transactions rejected with warning in replay mode [#958]
Execution mode behavior unchanged [#958]

Out-of-scope changes

Code Change Explanation
Logging format changed from structured fields to positional form (magicblock-ledger/src/blockstore_processor/mod.rs) Formatting/logging change unrelated to scheduler mode or replay ordering objectives.
Added tokio dependency to test-kit workspace Cargo.toml (test-kit/Cargo.toml) Test infra dependency addition not required by the scheduler mode feature itself.

Suggested reviewers

  • thlorenz
  • GabrielePicco


bmuddha commented Feb 26, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
magicblock-core/src/link/transactions.rs (1)

78-89: ⚠️ Potential issue | 🟡 Minor

Fix stale TransactionProcessingMode documentation.

The enum-level comment still says each variant carries a one-shot sender, but Replay(bool) no longer does. This is now misleading for API consumers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-core/src/link/transactions.rs` around lines 78 - 89, The
enum-level doc for TransactionProcessingMode is stale: it claims "each variant
also carries the one-shot sender" but Replay(bool) no longer carries a sender;
update the documentation to accurately state which variants carry result senders
(Simulation and Execution) and that Replay only carries a bool controlling
ledger persistence. Also adjust the Replay variant doc to clearly explain the
bool semantics (true = record to ledger, false = no recording) and remove any
reference to one-shot senders from the enum-level comment.
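The corrected documentation the prompt asks for might look like the sketch below. This is an illustrative shape, not the crate's actual definition: std's mpsc Sender and a String error stand in for the real one-shot sender and result types:

```rust
use std::sync::mpsc::Sender;

// Placeholder for the crate's real result type.
type TransactionResult = Result<(), String>;

/// How a transaction is processed by the scheduler. Only `Simulation` and
/// `Execution` carry a one-shot result sender; `Replay` does not.
enum TransactionProcessingMode {
    /// Simulated only; the result is reported back via the sender.
    Simulation(Sender<TransactionResult>),
    /// Executed and committed; the result is reported back via the sender.
    Execution(Sender<TransactionResult>),
    /// Replayed from the ledger. `true` = record to the ledger,
    /// `false` = apply state without recording.
    Replay(bool),
}

fn main() {
    let mode = TransactionProcessingMode::Replay(false);
    // A replay carries only the persist flag, never a sender.
    assert!(matches!(mode, TransactionProcessingMode::Replay(false)));
}
```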
magicblock-processor/tests/replay.rs (1)

93-113: ⚠️ Potential issue | 🟡 Minor

Validate “no notifications” after replay application is confirmed.

Because Line 94 only confirms enqueue success, the emptiness checks can run too early and miss delayed notifications. Assert post-replay state first, then check channels stay empty over the timeout window.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/tests/replay.rs` around lines 93 - 113, The test
currently checks that notifications channels are empty immediately after
enqueueing a replay (env.replay_transaction(false, txn)) which can race with
delayed notifications; instead, first verify post-replay state by checking
accounts via env.accountsdb.get_account for each pubkey (assert the data byte
equals 42) and only after confirming the replay applied, assert that
env.dispatch.transaction_status.recv_timeout(TIMEOUT) and
env.dispatch.account_update.try_recv() remain empty; reorder the checks so the
loop that inspects account state runs before the channel emptiness assertions
(references: replay_transaction, env.accountsdb.get_account,
env.dispatch.transaction_status, env.dispatch.account_update, TIMEOUT).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@magicblock-config/src/config/validator.rs`:
- Around line 22-29: Add equality derives to the ReplicationMode enum so it can
be compared in tests and config code: update the derive on ReplicationMode to
include PartialEq and Eq alongside Deserialize, Serialize, Debug, Clone; no
other API changes needed because Url already implements PartialEq/Eq, so
consumers can now use ==/!= with ReplicationMode.
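A minimal sketch of the suggested derive set, compiled standalone (the serde derives from the real code are omitted, and the Copy/Default derives are extras beyond the review's suggestion; Primary is assumed to be the default per the PR summary):

```rust
// Sketch only: the real enum also derives Deserialize and Serialize.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ReplicationMode {
    #[default]
    Primary,
    Replica,
}

fn main() {
    // With PartialEq/Eq derived, config code and tests can compare directly.
    let configured = ReplicationMode::Replica;
    assert_ne!(configured, ReplicationMode::default());
    assert_eq!(ReplicationMode::default(), ReplicationMode::Primary);
}
```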

In `@magicblock-core/src/link/transactions.rs`:
- Around line 269-275: The replay path is dropping any pre-encoded bytes by
calling txn.sanitize(true) and hardcoding ProcessableTransaction.encoded: None;
update the replay branch to call txn.sanitize_with_encoded(true) (or the
equivalent method that returns WithEncoded) and set
ProcessableTransaction.encoded to the encoded bytes returned by that call so
that TransactionProcessingMode::Replay preserves optional pre-encoded bytes;
adjust the assignment where ProcessableTransaction { transaction, mode, encoded
} is built to use the encoded value from sanitize_with_encoded.

In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 92-99: Replica readiness currently returns true if
pending.is_none(), which allows non-conflicting replays to run concurrently and
breaks primary ordering; change the Replica branch in is_ready (and the same
logic at the other occurrence) to require both pending.is_none() AND "no replay
in flight" (e.g., check m.in_flight_replay/is_empty or m.active_replay_count ==
0 or whatever field tracks current replay activity) so only a single replay can
be active at once; update Replica's readiness checks to reference the concrete
field names (pending and the in-flight/active replay tracker) in
ReplicaMode/Replica struct to enforce strict single‑flight replay ordering.
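The single-flight readiness check the prompt describes can be sketched like this. Field names (pending, in_flight_replays) are hypothetical simplifications of whatever the real ReplicaMode struct tracks:

```rust
// Hypothetical, simplified fields; the real coordinator tracks more state.
struct ReplicaMode {
    pending: Option<u64>,     // the single queued replay, if any (txn id)
    in_flight_replays: usize, // replays currently executing
}

impl ReplicaMode {
    // Strict single-flight: ready only when nothing is queued AND nothing
    // is executing, so replays apply one at a time in primary order.
    fn is_ready(&self) -> bool {
        self.pending.is_none() && self.in_flight_replays == 0
    }
}

fn main() {
    let busy = ReplicaMode { pending: None, in_flight_replays: 1 };
    assert!(!busy.is_ready()); // nothing queued, but a replay is in flight
    let idle = ReplicaMode { pending: None, in_flight_replays: 0 };
    assert!(idle.is_ready());
}
```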
- Around line 217-219: The decrement of
CoordinationMode::Primary.p.blocked_txn_count using p.blocked_txn_count -=
txn.is_some() as usize can underflow after a Replica→Primary switch; change the
decrement to be safe by either checking p.blocked_txn_count > 0 before
subtracting or using a saturating/checked subtraction (e.g., saturating_sub)
whenever blocked_txn_count is modified (the same fix should be applied to the
other occurrences in the same block/branch around the 228–239 region) so that
blocked_txn_count never wraps below zero.
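The underflow-safe decrement is a one-line change; a self-contained illustration (struct and method names are simplified stand-ins for the real coordinator code):

```rust
struct PrimaryMode {
    blocked_txn_count: usize,
}

impl PrimaryMode {
    fn on_unblocked(&mut self, had_txn: bool) {
        // saturating_sub cannot wrap below zero even if the counter was
        // reset (e.g. after a Replica→Primary switch) while entries were
        // still queued; a plain `-=` would panic in debug builds and wrap
        // in release builds.
        self.blocked_txn_count =
            self.blocked_txn_count.saturating_sub(had_txn as usize);
    }
}

fn main() {
    let mut p = PrimaryMode { blocked_txn_count: 0 };
    p.on_unblocked(true); // would underflow with `-=`
    assert_eq!(p.blocked_txn_count, 0);
}
```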
- Around line 245-255: The match in is_transaction_allowed currently blocks
Execution in Replica mode but still allows Simulation; tighten the Replica
branch so non-replay modes are rejected by matching both Execution and
Simulation. Update the match arm for Replica to something like matching
(Replica(_), Execution(_) | Simulation(_)) => false so Replica(_) only permits
Replay modes, leaving the Primary/Replay rule unchanged; reference function
is_transaction_allowed, CoordinationMode::Replica, and
TransactionProcessingMode::{Execution, Simulation}.
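The tightened Replica rule can be expressed with an or-pattern as the prompt suggests. This sketch uses simplified unit variants and leaves Primary fully permissive, since the review says the Primary/Replay rule is unchanged and does not spell it out:

```rust
enum CoordinationMode {
    Primary,
    Replica,
}

enum TxnMode {
    Execution,
    Simulation,
    Replay,
}

// Replica permits only Replay; Primary is left permissive in this sketch.
fn is_transaction_allowed(c: &CoordinationMode, t: &TxnMode) -> bool {
    !matches!(
        (c, t),
        (CoordinationMode::Replica, TxnMode::Execution | TxnMode::Simulation)
    )
}

fn main() {
    assert!(!is_transaction_allowed(&CoordinationMode::Replica, &TxnMode::Execution));
    assert!(!is_transaction_allowed(&CoordinationMode::Replica, &TxnMode::Simulation));
    assert!(is_transaction_allowed(&CoordinationMode::Replica, &TxnMode::Replay));
    assert!(is_transaction_allowed(&CoordinationMode::Primary, &TxnMode::Execution));
}
```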

In `@magicblock-processor/tests/replica_ordering.rs`:
- Around line 120-140: The helper submit_all_and_start currently spawns
concurrent tasks that call scheduler.replay, making send order nondeterministic
and risking deadlock on bounded channels; change it to submit sequentially by
iterating txs and awaiting scheduler.replay(true, tx).await directly (no
tokio::spawn) so the order matches sigs, and if the scheduler consumes via
run_scheduler(), ensure run_scheduler() is started before or concurrently with
submission to avoid blocking—refer to submit_all_and_start and scheduler.replay
to locate and update the code.

---

Outside diff comments:
In `@magicblock-core/src/link/transactions.rs`:
- Around line 78-89: The enum-level doc for TransactionProcessingMode is stale:
it claims "each variant also carries the one-shot sender" but Replay(bool) no
longer carries a sender; update the documentation to accurately state which
variants carry result senders (Simulation and Execution) and that Replay only
carries a bool controlling ledger persistence. Also adjust the Replay variant
doc to clearly explain the bool semantics (true = record to ledger, false = no
recording) and remove any reference to one-shot senders from the enum-level
comment.

In `@magicblock-processor/tests/replay.rs`:
- Around line 93-113: The test currently checks that notifications channels are
empty immediately after enqueueing a replay (env.replay_transaction(false, txn))
which can race with delayed notifications; instead, first verify post-replay
state by checking accounts via env.accountsdb.get_account for each pubkey
(assert the data byte equals 42) and only after confirming the replay applied,
assert that env.dispatch.transaction_status.recv_timeout(TIMEOUT) and
env.dispatch.account_update.try_recv() remain empty; reorder the checks so the
loop that inspects account state runs before the channel emptiness assertions
(references: replay_transaction, env.accountsdb.get_account,
env.dispatch.transaction_status, env.dispatch.account_update, TIMEOUT).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9edc6c4 and 5237fb4.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • magicblock-api/src/magic_validator.rs
  • magicblock-config/src/config/validator.rs
  • magicblock-core/src/link/transactions.rs
  • magicblock-ledger/src/blockstore_processor/mod.rs
  • magicblock-processor/src/executor/mod.rs
  • magicblock-processor/src/executor/processing.rs
  • magicblock-processor/src/scheduler/coordinator.rs
  • magicblock-processor/src/scheduler/mod.rs
  • magicblock-processor/src/scheduler/state.rs
  • magicblock-processor/src/scheduler/tests.rs
  • magicblock-processor/tests/replay.rs
  • magicblock-processor/tests/replica_ordering.rs
  • test-kit/Cargo.toml
  • test-kit/src/lib.rs

Comment on lines +269 to +275
let mode = TransactionProcessingMode::Replay(persist);
let transaction = txn.sanitize(true)?;
let txn = ProcessableTransaction {
transaction,
mode,
encoded: None,
};

🧹 Nitpick | 🔵 Trivial

Replay path currently discards optional pre-encoded bytes.

Line 270 uses sanitize(true) and Line 274 hardcodes encoded: None, so WithEncoded<T> cannot carry bytes through replay. Prefer sanitize_with_encoded(true) here too.

♻️ Proposed refactor
     pub async fn replay(
         &self,
         persist: bool,
         txn: impl SanitizeableTransaction,
     ) -> TransactionResult {
         let mode = TransactionProcessingMode::Replay(persist);
-        let transaction = txn.sanitize(true)?;
+        let (transaction, encoded) = txn.sanitize_with_encoded(true)?;
         let txn = ProcessableTransaction {
             transaction,
             mode,
-            encoded: None,
+            encoded,
         };
         self.0
             .send(txn)
             .await
             .map_err(|_| TransactionError::ClusterMaintenance)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-core/src/link/transactions.rs` around lines 269 - 275, The replay
path is dropping any pre-encoded bytes by calling txn.sanitize(true) and
hardcoding ProcessableTransaction.encoded: None; update the replay branch to
call txn.sanitize_with_encoded(true) (or the equivalent method that returns
WithEncoded) and set ProcessableTransaction.encoded to the encoded bytes
returned by that call so that TransactionProcessingMode::Replay preserves
optional pre-encoded bytes; adjust the assignment where ProcessableTransaction {
transaction, mode, encoded } is built to use the encoded value from
sanitize_with_encoded.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
magicblock-processor/tests/replica_ordering.rs (1)

120-136: ⚠️ Potential issue | 🟠 Major

Potential pre-start enqueue hang in submit_all_and_start.

On Line 127-Line 132, all replay() sends are awaited before run_scheduler() (Line 135). If replay uses a bounded queue and the consumer starts only after run_scheduler(), this can block the helper under stress.

Suggested adjustment
 async fn submit_all_and_start(
     env: &mut ExecutionTestEnv,
     txs: Vec<Transaction>,
 ) -> Vec<Signature> {
     let sigs: Vec<Signature> = txs.iter().map(|tx| tx.signatures[0]).collect();

-    // Submit all transactions sequentially to preserve order
+    // Start consumer first to avoid potential bounded-queue backpressure deadlock.
+    env.run_scheduler();
+    env.advance_slot();
+
+    // Submit sequentially to preserve deterministic order.
     for tx in txs {
         env.transaction_scheduler
             .replay(true, tx)
             .await
             .expect("Failed to submit transaction");
     }
-
-    // Now start the scheduler
-    env.run_scheduler();
-    env.advance_slot();

     sigs
 }
#!/bin/bash
set -euo pipefail

# Verify whether replay submission can block before scheduler startup.

# 1) Locate replay method definitions and inspect enqueue behavior.
ast-grep --pattern $'async fn replay($$$) -> $_ { $$$ }' || true
ast-grep --pattern $'fn replay($$$) -> $_ { $$$ }' || true

# 2) Find channel types/capacities used by scheduler/dispatch paths.
rg -n -C3 --type rust 'async_channel::bounded|async_channel::unbounded|tokio::sync::mpsc::channel|crossbeam_channel::bounded|crossbeam_channel::unbounded|flume::bounded|flume::unbounded'

# 3) Check where scheduler consumer is started relative to replay calls.
rg -n -C3 --type rust '\brun_scheduler\s*\('
rg -n -C3 --type rust '\.replay\s*\('

Expected verification outcome: if replay ultimately sends to a bounded channel and consumer activation occurs in/after run_scheduler(), this helper pattern remains vulnerable to blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/tests/replica_ordering.rs` around lines 120 - 136, The
helper submit_all_and_start awaits transaction_scheduler.replay for each tx
before calling env.run_scheduler, which can block if replay enqueues onto a
bounded queue; fix by starting the consumer before flooding the queue or making
enqueues non-blocking: either call env.run_scheduler() (and env.advance_slot()
if needed) prior to looping over txs, or submit replays without awaiting (spawn
each replay future or use a try_send-style non-blocking API) so that
transaction_scheduler.replay calls cannot backpressure the test helper; update
submit_all_and_start accordingly to use run_scheduler or non-blocking replay
submission.
magicblock-processor/src/scheduler/coordinator.rs (1)

234-238: ⚠️ Potential issue | 🟠 Major

Initialize blocked_txn_count from existing queues during Replica→Primary switch.

Line [235] sets blocked_txn_count to 0. If any blocked entries exist at switch time, Primary backpressure accounting starts undercounted.

Suggested fix
         let mode = PrimaryMode {
-            blocked_txn_count: 0,
+            blocked_txn_count: self
+                .blocked_transactions
+                .iter()
+                .map(|queue| queue.len())
+                .sum(),
             max_blocked_txn: self.blocked_transactions.len()
                 * BLOCKED_TXN_MULTIPLIER,
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/src/scheduler/coordinator.rs` around lines 234 - 238,
When constructing PrimaryMode during the Replica→Primary transition, initialize
blocked_txn_count from the existing blocked queue instead of zero: set
PrimaryMode.blocked_txn_count to reflect the current number of blocked entries
(e.g. based on self.blocked_transactions.len() or the appropriate aggregated
count of entries) so backpressure accounting starts correctly; leave
max_blocked_txn computed with BLOCKED_TXN_MULTIPLIER as-is. Ensure you update
the PrimaryMode initialization where PrimaryMode { blocked_txn_count: ...,
max_blocked_txn: ... } is created.
magicblock-core/src/link/transactions.rs (1)

273-279: 🧹 Nitpick | 🔵 Trivial

Preserve pre-encoded bytes in replay() submissions.

Line [274] uses sanitize(true) and Line [278] hardcodes encoded: None, so replay drops WithEncoded<T> bytes and forces redundant serialization later.

Suggested fix
     pub async fn replay(
         &self,
         persist: bool,
         txn: impl SanitizeableTransaction,
     ) -> TransactionResult {
         let mode = TransactionProcessingMode::Replay(persist);
-        let transaction = txn.sanitize(true)?;
+        let (transaction, encoded) = txn.sanitize_with_encoded(true)?;
         let txn = ProcessableTransaction {
             transaction,
             mode,
-            encoded: None,
+            encoded,
         };
         self.0
             .send(txn)
             .await
             .map_err(|_| TransactionError::ClusterMaintenance)
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-core/src/link/transactions.rs` around lines 273 - 279, The replay
branch currently calls txn.sanitize(true) which strips pre-encoded bytes and
then hardcodes encoded: None on the new ProcessableTransaction, causing replay
to drop WithEncoded<T> bytes; instead preserve any existing encoded bytes by
calling sanitize in a way that does not remove the encoded payload (e.g.,
txn.sanitize(false)) and set ProcessableTransaction.encoded from the original
txn's encoded field (or clone/retain it via the original WithEncoded<T>
accessor) when creating the ProcessableTransaction for
TransactionProcessingMode::Replay so redundant serialization is avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 242-256: The doc comment for is_transaction_allowed incorrectly
states that Primary accepts only Execution and Replica accepts only Replay, but
the match implementation also permits Simulation in both modes; update either
the comment or the logic to match: either revise the comment above
is_transaction_allowed to explicitly mention that Simulation transactions are
allowed in both CoordinationMode::Primary and CoordinationMode::Replica, or
change the match arms in is_transaction_allowed (referencing
CoordinationMode::Primary, CoordinationMode::Replica and
TransactionProcessingMode::{Execution, Replay, Simulation}) to disallow
Simulation where appropriate so policy and code are consistent.

In `@magicblock-processor/tests/replica_ordering.rs`:
- Around line 237-246: The "Independent" branch reuses the same independent
account for both non-conflict transactions in each 4-item batch, causing
unintended conflicts; in the loop (for i in 0..count) where tx_write is called
for group_a/group_b/independent, change how you compute idx for independent so
the two independent branches in a batch select distinct accounts (e.g., derive
idx from the batch number rather than i directly: use batch = i/4 and an offset
for the first vs second independent slot, or multiply batch by 2 and add 0/1
based on (i % 4) to pick different independent[(batch*2 + offset) %
independent.len()]); update the selection logic used in the tx_write calls so
independent accesses are unique per non-conflict transaction.

---

Duplicate comments:
In `@magicblock-core/src/link/transactions.rs`:
- Around line 273-279: The replay branch currently calls txn.sanitize(true)
which strips pre-encoded bytes and then hardcodes encoded: None on the new
ProcessableTransaction, causing replay to drop WithEncoded<T> bytes; instead
preserve any existing encoded bytes by calling sanitize in a way that does not
remove the encoded payload (e.g., txn.sanitize(false)) and set
ProcessableTransaction.encoded from the original txn's encoded field (or
clone/retain it via the original WithEncoded<T> accessor) when creating the
ProcessableTransaction for TransactionProcessingMode::Replay so redundant
serialization is avoided.

In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 234-238: When constructing PrimaryMode during the Replica→Primary
transition, initialize blocked_txn_count from the existing blocked queue instead
of zero: set PrimaryMode.blocked_txn_count to reflect the current number of
blocked entries (e.g. based on self.blocked_transactions.len() or the
appropriate aggregated count of entries) so backpressure accounting starts
correctly; leave max_blocked_txn computed with BLOCKED_TXN_MULTIPLIER as-is.
Ensure you update the PrimaryMode initialization where PrimaryMode {
blocked_txn_count: ..., max_blocked_txn: ... } is created.

In `@magicblock-processor/tests/replica_ordering.rs`:
- Around line 120-136: The helper submit_all_and_start awaits
transaction_scheduler.replay for each tx before calling env.run_scheduler, which
can block if replay enqueues onto a bounded queue; fix by starting the consumer
before flooding the queue or making enqueues non-blocking: either call
env.run_scheduler() (and env.advance_slot() if needed) prior to looping over
txs, or submit replays without awaiting (spawn each replay future or use a
try_send-style non-blocking API) so that transaction_scheduler.replay calls
cannot backpressure the test helper; update submit_all_and_start accordingly to
use run_scheduler or non-blocking replay submission.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5237fb4 and ce34538.

📒 Files selected for processing (3)
  • magicblock-core/src/link/transactions.rs
  • magicblock-processor/src/scheduler/coordinator.rs
  • magicblock-processor/tests/replica_ordering.rs

Comment on lines +237 to +246
for i in 0..count {
env.advance_slot();
match i % 4 {
0 => txs.push(tx_write(&mut env, group_a[0], i as u8)),
1 => txs.push(tx_write(&mut env, group_b[0], i as u8)),
_ => {
// Independent writes to unique accounts
let idx = (i / 4) as usize % independent.len();
txs.push(tx_write(&mut env, independent[idx], i as u8));
}

⚠️ Potential issue | 🟡 Minor

“Independent” branch currently creates avoidable conflicts.

Line 244 maps both non-conflict branch transactions in each 4-item batch to the same account, so they are not unique writes as the comment states. This weakens the intended independent-workload coverage.

Suggested fix
     let mut txs = Vec::with_capacity(count);
+    let mut independent_idx = 0usize;
     for i in 0..count {
         env.advance_slot();
         match i % 4 {
             0 => txs.push(tx_write(&mut env, group_a[0], i as u8)),
             1 => txs.push(tx_write(&mut env, group_b[0], i as u8)),
             _ => {
                 // Independent writes to unique accounts
-                let idx = (i / 4) as usize % independent.len();
+                let idx = independent_idx % independent.len();
+                independent_idx += 1;
                 txs.push(tx_write(&mut env, independent[idx], i as u8));
             }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/tests/replica_ordering.rs` around lines 237 - 246, The
"Independent" branch reuses the same independent account for both non-conflict
transactions in each 4-item batch, causing unintended conflicts; in the loop
(for i in 0..count) where tx_write is called for group_a/group_b/independent,
change how you compute idx for independent so the two independent branches in a
batch select distinct accounts (e.g., derive idx from the batch number rather
than i directly: use batch = i/4 and an offset for the first vs second
independent slot, or multiply batch by 2 and add 0/1 based on (i % 4) to pick
different independent[(batch*2 + offset) % independent.len()]); update the
selection logic used in the tx_write calls so independent accesses are unique
per non-conflict transaction.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@magicblock-processor/src/scheduler/coordinator.rs`:
- Around line 234-235: The initialization of PrimaryMode with blocked_txn_count:
0 assumes all executor blocked queues are empty at the Replica→Primary
transition; update this to be defensive by either (a) computing
blocked_txn_count from the actual queue lengths across executors (use the
executor/queue APIs available in the coordinator to sum blocked counts) or (b)
adding a runtime assertion that verifies all blocked queues are empty before
constructing PrimaryMode, or (c) expanding the doc comment on the transition
function to explicitly state the empty-queue invariant; reference PrimaryMode
and blocked_txn_count and the coordinator's executor queue accessors when making
the change so the check or computed initialization is placed at the mode-switch
site right after ledger replay completes.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce34538 and 1acdff2.

📒 Files selected for processing (1)
  • magicblock-processor/src/scheduler/coordinator.rs

Comment on lines +234 to +235
let mode = PrimaryMode {
blocked_txn_count: 0,

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

rg -n -C5 'switch_to_primary_mode' magicblock-processor/src/scheduler/

Repository: magicblock-labs/magicblock-validator

Length of output: 3078


🏁 Script executed:

# Get more context around switch_to_primary_mode definition and PrimaryMode initialization
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '220,260p'

Repository: magicblock-labs/magicblock-validator

Length of output: 1697


🏁 Script executed:

# Search for queue initialization and clearing logic
rg -n 'blocked_txns|blocked_queue|replay|clear' magicblock-processor/src/scheduler/coordinator.rs | head -30

Repository: magicblock-labs/magicblock-validator

Length of output: 478


🏁 Script executed:

# Check the CoordinationMode enum and its variants to see queue structure
rg -n -B5 -A10 'enum CoordinationMode\|struct PrimaryMode\|struct ReplicaMode' magicblock-processor/src/scheduler/coordinator.rs | head -60

Repository: magicblock-labs/magicblock-validator

Length of output: 62


🏁 Script executed:

# Get the enum and struct definitions for coordination modes
rg -n -A15 'enum CoordinationMode' magicblock-processor/src/scheduler/coordinator.rs

Repository: magicblock-labs/magicblock-validator

Length of output: 658


🏁 Script executed:

# Get the full struct definitions
rg -n -A8 'struct PrimaryMode\|struct ReplicaMode' magicblock-processor/src/scheduler/coordinator.rs

Repository: magicblock-labs/magicblock-validator

Length of output: 62


🏁 Script executed:

# Check what blocked_transactions is
rg -n 'blocked_transactions' magicblock-processor/src/scheduler/coordinator.rs | head -20

Repository: magicblock-labs/magicblock-validator

Length of output: 405


🏁 Script executed:

# Check how blocked_txn_count is modified throughout the code
rg -n 'blocked_txn_count' magicblock-processor/src/scheduler/coordinator.rs

Repository: magicblock-labs/magicblock-validator

Length of output: 398


🏁 Script executed:

# Get the full context of PrimaryMode struct and find replay clearing logic
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '40,120p'

Repository: magicblock-labs/magicblock-validator

Length of output: 3672


🏁 Script executed:

```shell
# Search for where queues might be cleared or checked during mode transition
rg -n 'push|pop|clear|drain' magicblock-processor/src/scheduler/coordinator.rs | head -20
```

Length of output: 410


🏁 Script executed:

```shell
# Check the try_schedule and dequeue_blocked methods to understand queue usage context
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '180,220p'
```

Length of output: 2010


🏁 Script executed:

```shell
# Verify when transactions are added to blocked_transactions (should only be in Primary mode context)
rg -n -B10 'heap.push' magicblock-processor/src/scheduler/coordinator.rs
```

Length of output: 550


🏁 Script executed:

```shell
# Check if there's any explicit clearing or verification of queue state during mode transitions
rg -n 'transition_to_new_slot|replay|Replica|Primary' magicblock-processor/src/scheduler/mod.rs | head -30
```

Length of output: 297


🏁 Script executed:

```shell
# Look at the full flow of mode_switcher to understand when switch_to_primary_mode is called
cat -n magicblock-processor/src/scheduler/mod.rs | sed -n '120,160p'
```

Length of output: 2123


🏁 Script executed:

```shell
# Check reschedule_blocked_transactions to understand the full lifecycle
rg -n -A15 'fn reschedule_blocked_transactions' magicblock-processor/src/scheduler/mod.rs
```

Length of output: 865


🏁 Script executed:

```shell
# Verify: are there any tests that show queue state during mode transitions?
rg -n -B5 -A10 'switch_to_primary_mode' magicblock-processor/src/scheduler/tests.rs
```

Length of output: 702


🏁 Script executed:

```shell
# Check transition_to_new_slot to see if it clears state or what happens at slot boundaries
cat -n magicblock-processor/src/scheduler/mod.rs | sed -n '200,230p'
```

Length of output: 1488


🏁 Script executed:

```shell
# Look for any assertions or checks about queue state
rg -n 'assert|debug_assert|unwrap|expect' magicblock-processor/src/scheduler/coordinator.rs | grep -i 'queue\|blocked\|empty'
```

Length of output: 170


🏁 Script executed:

```shell
# Check what BLOCKED_TXN_MULTIPLIER is
rg -n 'BLOCKED_TXN_MULTIPLIER' magicblock-processor/src/scheduler/
```

Length of output: 269


🏁 Script executed:

```shell
# Look at the mode_switcher initialization to understand when it's triggered
rg -n -B10 -A10 'mode_switcher' magicblock-processor/src/scheduler/mod.rs | head -50
```

Length of output: 2220


🏁 Script executed:

```shell
# Verify Replica mode doesn't queue in blocked_transactions heaps
cat -n magicblock-processor/src/scheduler/coordinator.rs | sed -n '180,205p'
```

Length of output: 1289


**Consider documenting the invariant that queues are empty at mode transition.**

The initialization `blocked_txn_count: 0` assumes all blocked queues are empty when switching from Replica to Primary. While the doc comment states this is called "after ledger replay completes," explicitly verifying or documenting the invariant would be more defensive. Consider one of the following:

1. Initialize `blocked_txn_count` from the actual queue lengths across executors, or
2. Add a runtime check/assertion that the queues are empty, or
3. Expand the existing doc comment to explicitly state the empty-queue guarantee.

This would prevent subtle bugs if the invariant is ever violated by future changes to the replay/mode-switch flow.
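Options 1 and 2 above can be combined: derive the count from the real queues and assert the post-replay invariant in debug builds. A minimal Rust sketch follows; `ExecutorQueue`, `PrimaryMode`, and `switch_to_primary_mode` here are simplified illustrative stand-ins, not the actual coordinator definitions:

```rust
use std::collections::BinaryHeap;

// Simplified stand-ins for the coordinator's per-executor state; the real
// definitions live in magicblock-processor/src/scheduler/coordinator.rs.
struct ExecutorQueue {
    blocked_transactions: BinaryHeap<u64>, // the real heap holds transactions
}

struct PrimaryMode {
    blocked_txn_count: usize,
}

/// Defensive Replica -> Primary transition: compute `blocked_txn_count` from
/// the actual queue lengths (option 1) and assert the post-replay empty-queue
/// invariant in debug builds (option 2) instead of hardcoding zero.
fn switch_to_primary_mode(executors: &[ExecutorQueue]) -> PrimaryMode {
    let blocked: usize = executors
        .iter()
        .map(|e| e.blocked_transactions.len())
        .sum();
    // Ledger replay is expected to have drained every blocked queue.
    debug_assert_eq!(blocked, 0, "blocked queues must be empty after replay");
    PrimaryMode { blocked_txn_count: blocked }
}
```

Computing the count rather than asserting alone keeps the transition correct even in release builds, where `debug_assert_eq!` compiles away.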

🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/src/scheduler/coordinator.rs` around lines 234-235,
the initialization of PrimaryMode with blocked_txn_count: 0 assumes all
executor blocked queues are empty at the Replica→Primary transition; update
this to be defensive by (a) computing blocked_txn_count from the actual queue
lengths across executors (use the executor/queue APIs available in the
coordinator to sum blocked counts), (b) adding a runtime assertion that
verifies all blocked queues are empty before constructing PrimaryMode, or (c)
expanding the doc comment on the transition function to explicitly state the
empty-queue invariant. Reference PrimaryMode, blocked_txn_count, and the
coordinator's executor queue accessors when making the change, so the check or
computed initialization is placed at the mode-switch site right after ledger
replay completes.
```
