op-node: add shadow engine support for fire-and-forget Engine API replication #5

Open
cody-wang-cb wants to merge 3 commits into develop from cody/shadow_l2


Summary

Adds support for op-node to replicate Engine API calls (ForkchoiceUpdate, NewPayload, GetPayload) to additional "shadow" EL endpoints in a fire-and-forget manner. This enables running shadow block builders (e.g., reth-based) alongside the primary EL for comparison, testing, or redundancy purposes.

Key features:

  • Async, non-blocking replication to one or more shadow EL endpoints
  • Per-endpoint JWT secret configuration
  • Conductor-aware: only forwards when node is the active sequencer leader (prevents conflicting calls in HA setups)
  • Full observability via Prometheus metrics
  • Configurable buffer size with drop-on-full backpressure handling

Motivation

In an HA sequencer setup with multiple nodes and a shadow block builder, we need:

  1. Only the active leader to send Engine API calls to the shadow builder
  2. Fire-and-forget semantics so shadow EL latency doesn't affect block production
  3. Observability into shadow engine health and performance

Configuration

op-node \
  --l2=http://primary:8551 \
  --l2.jwt-secret=/path/to/primary.jwt \
  --l2.shadow-engines=http://shadow1:8551,http://shadow2:8551 \
  --l2.shadow-engines.jwt-secrets=/path/to/shadow1.jwt,/path/to/shadow2.jwt \
  --l2.shadow-engines.timeout=5s \
  --l2.shadow-engines.buffer=100
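
The example above lists two endpoints and two JWT secrets in the same order, which suggests the values are paired by position. Assuming that is the case (the description does not state it, nor what happens when the counts differ), the pairing could be validated along these lines; `shadowEndpoint`, `splitList`, and `pairShadowEndpoints` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// shadowEndpoint pairs one shadow EL URL with its JWT secret file.
type shadowEndpoint struct {
	URL           string
	JWTSecretPath string
}

// splitList splits a comma-separated flag value, ignoring empty entries.
func splitList(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// pairShadowEndpoints matches URLs and secrets by position.
// Assumption: a count mismatch is treated as a configuration error.
func pairShadowEndpoints(urls, secrets string) ([]shadowEndpoint, error) {
	u, s := splitList(urls), splitList(secrets)
	if len(u) != len(s) {
		return nil, fmt.Errorf("got %d shadow engines but %d JWT secrets", len(u), len(s))
	}
	eps := make([]shadowEndpoint, len(u))
	for i := range u {
		eps[i] = shadowEndpoint{URL: u[i], JWTSecretPath: s[i]}
	}
	return eps, nil
}
```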

Metrics

Metric                                        Description
op_node_shadow_engine_calls_total             Total calls by endpoint, method, and success
op_node_shadow_engine_call_duration_seconds   Call latency histogram
op_node_shadow_engine_calls_dropped_total     Dropped calls (queue full)
op_node_shadow_engine_queue_depth             Current queue depth

Test plan

  • Unit tests for shadow engine lifecycle, execution, failure handling
  • Unit tests for conductor-aware forwarding (skips when not leader)
  • go build ./op-node/... passes
  • Manual test with shadow reth node

🤖 Generated with https://claude.com/claude-code
