feat(worker): enhance error recovery and panic handling across worker routines #61

Open
vietddude wants to merge 1 commit into main from fix/rpc-panic

vietddude commented Mar 10, 2026

Summary

Fix a regular-worker crash on EVM/BTC when a batch contains a failed/missing block followed by a valid block, and add panic containment for long-lived worker goroutines.

Problem

The regular worker could panic on this path:

  • GetBlocks(...) returns a batch where block N is missing or errored
  • block N+1 is still present in the same batch
  • the loop skips N because res.Block == nil || res.Error != nil
  • the continuity check then compares the next result (N+1) against results[i-1], which is the skipped entry for N
  • checkContinuity(results[i-1], res) dereferences the nil previous block and crashes the worker

This was especially visible during transient RPC issues such as rate-limit or quota errors.

Separately, long-lived worker goroutines did not consistently recover from panics, so a panic could crash the whole indexer process.

Changes

Regular worker

  • Refactor regular batch processing for reorg-checked chains (EVM/BTC)
  • Stop trusting the rest of a batch once we hit:
    • a block-level error
    • a nil block
    • an out-of-order result
    • a continuity mismatch
  • Immediately switch to same-tick single-block recovery via Indexer.GetBlock(...)
  • Keep recovery bounded with fixed internal retries:
    • 2 attempts
    • 1s delay between attempts
  • Only advance currentBlock across a contiguous recovered prefix
  • Mark only the first unresolved block as failed if recovery still cannot complete
  • Keep rescanner as the fallback path instead of skipping past gaps
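
The batch handling described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Block`, `BlockResult`, `fetchFunc` (standing in for `Indexer.GetBlock(...)`, whose real signature is not shown here), `fetchWithRetry`, and `processBatch` are hypothetical names. The sketch also treats a continuity mismatch after single-block recovery as an unresolved block, which may differ from the worker's actual reorg handling.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Simplified stand-ins for the indexer's block types (assumed shapes).
type Block struct {
	Number     uint64
	Hash       string
	ParentHash string
}

type BlockResult struct {
	Number uint64
	Block  *Block
	Error  error
}

// fetchFunc stands in for Indexer.GetBlock(...); hypothetical signature.
type fetchFunc func(number uint64) (*Block, error)

const recoveryAttempts = 2 // fixed retry count named in the PR description

var errUnavailable = errors.New("block unavailable")

// fetchWithRetry tries a single-block fetch a bounded number of times,
// sleeping for delay between attempts.
func fetchWithRetry(fetch fetchFunc, number uint64, delay time.Duration) (*Block, error) {
	var lastErr error
	for i := 0; i < recoveryAttempts; i++ {
		if i > 0 {
			time.Sleep(delay)
		}
		b, err := fetch(number)
		if err == nil && b != nil {
			return b, nil
		}
		if err == nil {
			err = errors.New("nil block")
		}
		lastErr = err
	}
	return nil, lastErr
}

// processBatch accepts a contiguous prefix starting at start. Once any batch
// entry is errored, nil, out of order, or discontinuous, the rest of the batch
// is no longer trusted and every remaining block is fetched individually.
// It returns the accepted blocks and, if ok is false, the first unresolved block.
func processBatch(results []BlockResult, prevHash string, start uint64,
	fetch fetchFunc, delay time.Duration) (accepted []*Block, failedAt uint64, ok bool) {
	trustBatch := true
	for i := range results {
		want := start + uint64(i)
		var blk *Block
		if trustBatch {
			r := results[i]
			if r.Error == nil && r.Block != nil &&
				r.Block.Number == want && r.Block.ParentHash == prevHash {
				blk = r.Block
			} else {
				trustBatch = false
			}
		}
		if blk == nil {
			b, err := fetchWithRetry(fetch, want, delay)
			if err != nil || b.Number != want || b.ParentHash != prevHash {
				return accepted, want, false // never index past the gap
			}
			blk = b
		}
		accepted = append(accepted, blk)
		prevHash = blk.Hash
	}
	return accepted, 0, true
}

func main() {
	chain := map[uint64]*Block{
		10: {Number: 10, Hash: "h10", ParentHash: "h9"},
		11: {Number: 11, Hash: "h11", ParentHash: "h10"},
		12: {Number: 12, Hash: "h12", ParentHash: "h11"},
	}
	batch := []BlockResult{
		{Number: 10, Block: chain[10]},
		{Number: 11, Error: errUnavailable}, // failed entry in the batch
		{Number: 12, Block: chain[12]},
	}
	fetch := func(n uint64) (*Block, error) { return chain[n], nil }
	accepted, failedAt, ok := processBatch(batch, "h9", 10, fetch, 10*time.Millisecond)
	fmt.Println(len(accepted), failedAt, ok)
}
```

In this sketch the caller would advance currentBlock by `len(accepted)` and, when `ok` is false, persist `failedAt` as the single failed block for the rescanner to pick up.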

Continuity safety

  • Make continuity checks nil-safe
  • Avoid dereferencing prev.Block when the previous batch entry is missing/errored
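
A nil-safe continuity check might look like the sketch below. The type shapes and the `(bool, string)` return value are assumptions made for illustration; only the name `checkContinuity` and the nil/error conditions come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the indexer's block types (assumed shapes).
type Block struct {
	Number     uint64
	Hash       string
	ParentHash string
}

type BlockResult struct {
	Number uint64
	Block  *Block
	Error  error
}

// checkContinuity reports whether cur directly extends prev.
// Missing or errored entries yield false with a reason instead of
// dereferencing a nil block.
func checkContinuity(prev, cur BlockResult) (bool, string) {
	if prev.Error != nil || prev.Block == nil {
		return false, "previous entry is missing or errored"
	}
	if cur.Error != nil || cur.Block == nil {
		return false, "current entry is missing or errored"
	}
	if cur.Block.Number != prev.Block.Number+1 {
		return false, "block numbers are not consecutive"
	}
	if cur.Block.ParentHash != prev.Block.Hash {
		return false, "parent hash mismatch"
	}
	return true, ""
}

func main() {
	// Block N failed (for example, rate limited); block N+1 is present.
	missing := BlockResult{Number: 100, Error: errors.New("rate limited")}
	next := BlockResult{
		Number: 101,
		Block:  &Block{Number: 101, Hash: "0xb", ParentHash: "0xa"},
	}
	ok, reason := checkContinuity(missing, next)
	fmt.Println(ok, reason)
}
```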

Panic containment

  • Add shared panic recovery helpers for workers
  • Convert panics inside BaseWorker.run() jobs into returned errors so they flow through the existing retry path
  • Add panic recovery to:
    • catchup loop
    • catchup range workers
    • manual worker background goroutines
    • rescanner background goroutines
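
As a rough illustration of the two containment patterns listed above, the sketch below converts a panic inside a job into a returned error and wraps a background goroutine so that a panic is logged rather than fatal. `runSafely` and `goSafe` are hypothetical helper names; the actual shared helpers in this PR may be named and shaped differently.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// runSafely runs job and converts a panic into an ordinary error, so the
// caller's existing retry path can handle it like any other failure.
func runSafely(name string, job func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%s: recovered panic: %v\n%s", name, r, debug.Stack())
		}
	}()
	return job()
}

// goSafe starts fn in a goroutine and logs a panic instead of letting it
// crash the process. The returned channel is closed when fn has finished.
func goSafe(name string, fn func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last, after the recover below
		defer func() {
			if r := recover(); r != nil {
				log.Printf("%s: recovered panic: %v", name, r)
			}
		}()
		fn()
	}()
	return done
}

type block struct{ hash string }

// hashOf panics with a nil pointer dereference when b is nil.
func hashOf(b *block) string { return b.hash }

func main() {
	err := runSafely("regular-worker", func() error {
		_ = hashOf(nil) // same class of failure as the original crash
		return nil
	})
	fmt.Println("job returned an error:", err != nil)

	<-goSafe("rescanner", func() { panic("boom") })
	fmt.Println("process still running")
}
```

Capturing the stack in the returned error keeps the original panic site visible in logs even though the process no longer crashes.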

Why this approach

  • Preserves correctness for EVM/BTC by never indexing past an unresolved gap
  • Improves liveness by retrying the bad block immediately in the same tick
  • Reuses the existing failover-aware GetBlock(...) path for provider rotation instead of adding custom RPC failover logic in the worker
  • Prevents one worker panic from taking down the entire process

Tests

Added/updated worker tests to cover:

  • gap recovery via single-block fetch after a failed batch entry
  • unresolved gap handling and failed-block persistence
  • nil-safe continuity checks
  • panic-to-error conversion in worker execution paths

Notes

  • No config changes
  • No API changes
  • Rescanner behavior remains the final safety net for blocks that still cannot be recovered immediately
