feat(worker): enhance error recovery and panic handling across worker routines by vietddude · Pull Request #61 · fystack/multichain-indexer

vietddude · 2026-03-10T04:02:14Z

Summary

Fix a regular-worker crash on EVM/BTC when a batch contains a failed/missing block followed by a valid block, and add panic containment for long-lived worker goroutines.

Problem

The regular worker could panic on this path:

- GetBlocks(...) returns a batch where block N is missing or errored
- block N+1 is still present in the same batch
- the loop skips N because res.Block == nil || res.Error != nil
- continuity check then evaluates the next result against results[i-1]
- checkContinuity(results[i-1], res) dereferences a nil previous block and crashes the worker

This was especially visible with transient RPC issues such as rate limits / quota errors.
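The crash path can be reproduced in isolation. The following is a minimal sketch, not the repository's code: `Block`, `BlockResult`, `unsafeContinuity`, and `scanBatch` are hypothetical stand-ins, and the `recover` in `scanBatch` exists only so the demo can report the panic instead of dying.

```go
package main

import (
	"errors"
	"fmt"
)

// Block and BlockResult are hypothetical stand-ins for the indexer's types.
type Block struct {
	Number           uint64
	Hash, ParentHash string
}

type BlockResult struct {
	Block *Block
	Error error
}

var errRateLimited = errors.New("rate limited") // simulated transient RPC error

// unsafeContinuity assumes prev.Block is non-nil, like the pre-fix check.
func unsafeContinuity(prev, cur BlockResult) bool {
	return cur.Block.ParentHash == prev.Block.Hash
}

// scanBatch skips bad entries, then compares each good entry with
// results[i-1], which may be one of the entries that was just skipped.
func scanBatch(results []BlockResult) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r) // demo only: surface the crash
		}
	}()
	for i, res := range results {
		if res.Block == nil || res.Error != nil {
			continue
		}
		if i > 0 && !unsafeContinuity(results[i-1], res) {
			return errors.New("continuity mismatch")
		}
	}
	return nil
}

func main() {
	batch := []BlockResult{
		{Error: errRateLimited}, // block N failed
		{Block: &Block{Number: 101, Hash: "h101", ParentHash: "h100"}}, // N+1 is fine
	}
	fmt.Println(scanBatch(batch)) // reports a nil pointer dereference
}
```

Without the deferred `recover`, the same input would terminate the process, which is the behavior described above.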

Separately, long-lived worker goroutines did not consistently recover from panics, so a panic could crash the whole indexer process.

Changes

Regular worker

- Refactor regular batch processing for reorg-checked chains (EVM/BTC)
- Stop trusting the rest of a batch once we hit:
  - a block-level error
  - a nil block
  - an out-of-order result
  - a continuity mismatch
- Immediately switch to same-tick single-block recovery via Indexer.GetBlock(...)
- Keep recovery bounded with fixed internal retries:
  - 2 attempts
  - 1s delay between attempts
- Only advance currentBlock across a contiguous recovered prefix
- Mark only the first unresolved block as failed if recovery still cannot complete
- Keep rescanner as the fallback path instead of skipping past gaps
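The flow in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `fetchFunc` stands in for `Indexer.GetBlock(...)`, and `extends`, `recoverBlock`, and `processBatch` are hypothetical names. Reorg handling is left out; a block that still fails the continuity check after recovery is simply treated as unresolved.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Block and BlockResult are hypothetical stand-ins for the indexer's types.
type Block struct {
	Number           uint64
	Hash, ParentHash string
}

type BlockResult struct {
	Block *Block
	Error error
}

// fetchFunc stands in for the failover-aware single-block path.
type fetchFunc func(number uint64) (*Block, error)

const recoveryAttempts = 2 // fixed internal retry count

var errUnavailable = errors.New("rpc unavailable") // used by the demo only

// extends reports whether b is block number want and links to lastHash.
// An empty lastHash means there is no previous block to compare against.
func extends(b *Block, want uint64, lastHash string) bool {
	return b != nil && b.Number == want && (lastHash == "" || b.ParentHash == lastHash)
}

// recoverBlock retries one block with a fixed attempt count and delay.
func recoverBlock(fetch fetchFunc, number uint64, delay time.Duration) (*Block, error) {
	var lastErr error
	for attempt := 1; attempt <= recoveryAttempts; attempt++ {
		b, err := fetch(number)
		if err == nil && b != nil {
			return b, nil
		}
		if err == nil {
			err = errors.New("nil block")
		}
		lastErr = err
		if attempt < recoveryAttempts {
			time.Sleep(delay)
		}
	}
	return nil, fmt.Errorf("block %d unresolved after %d attempts: %w", number, recoveryAttempts, lastErr)
}

// processBatch trusts batch entries only up to the first bad one, then
// fetches the remaining blocks one by one. It returns the contiguous blocks
// and the new currentBlock. When ok is false, current is the first
// unresolved block, which the caller would mark as failed.
func processBatch(results []BlockResult, start uint64, lastHash string,
	fetch fetchFunc, delay time.Duration) (indexed []*Block, current uint64, ok bool) {
	current = start
	trusted := true
	for _, res := range results {
		b := res.Block
		if trusted && (res.Error != nil || !extends(b, current, lastHash)) {
			trusted = false // stop trusting the remainder of the batch
		}
		if !trusted {
			var err error
			b, err = recoverBlock(fetch, current, delay)
			if err != nil || !extends(b, current, lastHash) {
				return indexed, current, false // leave the gap to the rescanner
			}
		}
		indexed = append(indexed, b)
		lastHash = b.Hash
		current++
	}
	return indexed, current, true
}

func main() {
	chain := map[uint64]*Block{
		100: {Number: 100, Hash: "h100", ParentHash: "h99"},
		101: {Number: 101, Hash: "h101", ParentHash: "h100"},
		102: {Number: 102, Hash: "h102", ParentHash: "h101"},
	}
	fetch := func(n uint64) (*Block, error) { return chain[n], nil }
	batch := []BlockResult{{Block: chain[100]}, {Error: errUnavailable}, {Block: chain[102]}}
	indexed, current, ok := processBatch(batch, 100, "h99", fetch, time.Second)
	fmt.Println(len(indexed), current, ok) // 3 103 true
}
```

Passing the delay as a parameter is a choice made for this sketch so that tests can run without sleeping; the PR describes the delay as a fixed internal value.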

Continuity safety

- Make continuity checks nil-safe
- Avoid dereferencing prev.Block when the previous batch entry is missing/errored
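A nil-safe check might look like the sketch below. The function name `continuous` and the types are hypothetical, and the choice to report "not continuous" when either side is missing or errored is an assumption made here so that the caller falls through to recovery; the PR's actual return convention is not shown in this description.

```go
package main

import "fmt"

// Block and BlockResult are hypothetical stand-ins for the indexer's types.
type Block struct {
	Number           uint64
	Hash, ParentHash string
}

type BlockResult struct {
	Block *Block
	Error error
}

// continuous reports whether cur directly extends prev. It returns false,
// rather than panicking, when either entry is missing or errored.
func continuous(prev, cur BlockResult) bool {
	if prev.Error != nil || cur.Error != nil || prev.Block == nil || cur.Block == nil {
		return false
	}
	return cur.Block.Number == prev.Block.Number+1 &&
		cur.Block.ParentHash == prev.Block.Hash
}

func main() {
	missing := BlockResult{} // previous batch entry has no block
	b100 := BlockResult{Block: &Block{Number: 100, Hash: "h100", ParentHash: "h99"}}
	b101 := BlockResult{Block: &Block{Number: 101, Hash: "h101", ParentHash: "h100"}}

	fmt.Println(continuous(missing, b101)) // false, and no panic
	fmt.Println(continuous(b100, b101))    // true
}
```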

Panic containment

- Add shared panic recovery helpers for workers
- Convert panics inside BaseWorker.run() jobs into returned errors so they flow through the existing retry path
- Add panic recovery to:
  - catchup loop
  - catchup range workers
  - manual worker background goroutines
  - rescanner background goroutines
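Helpers of this kind are usually built on `defer` plus `recover`. The sketch below shows one possible shape, with hypothetical names (`safeRun`, `goSafe`) that are not taken from the PR: one helper turns a panic into a returned error for retryable jobs, the other logs and contains a panic in a background goroutine.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// safeRun executes job and converts a panic into a returned error, so the
// caller's normal error/retry handling applies.
func safeRun(name string, job func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("%s: panic stack:\n%s", name, debug.Stack())
			err = fmt.Errorf("%s: recovered panic: %v", name, r)
		}
	}()
	return job()
}

// goSafe runs fn in a new goroutine and logs a panic instead of letting it
// crash the process. The returned channel is closed when fn has finished.
func goSafe(name string, fn func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last, after the recover below
		defer func() {
			if r := recover(); r != nil {
				log.Printf("%s: recovered panic: %v\n%s", name, r, debug.Stack())
			}
		}()
		fn()
	}()
	return done
}

func main() {
	err := safeRun("regular-worker", func() error { panic("boom") })
	fmt.Println(err) // regular-worker: recovered panic: boom

	<-goSafe("rescanner", func() { panic("boom") })
	fmt.Println("process still alive")
}
```

Note that `recover` only has an effect when called directly inside a deferred function in the panicking goroutine, which is why each long-lived goroutine needs its own guard.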

Why this approach

- Preserves correctness for EVM/BTC by never indexing past an unresolved gap
- Improves liveness by retrying the bad block immediately in the same tick
- Reuses the existing failover-aware GetBlock(...) path for provider rotation instead of adding custom RPC failover logic in the worker
- Prevents one worker panic from taking down the entire process

Tests

Added/updated worker tests to cover:

- gap recovery via single-block fetch after a failed batch entry
- unresolved gap handling and failed-block persistence
- nil-safe continuity checks
- panic-to-error conversion in worker execution paths

Notes

- No config changes
- No API changes
- Rescanner behavior remains the final safety net for blocks that still cannot be recovered immediately

Commit c699e27: feat(worker): enhance error recovery and panic handling across worker routines

vietddude requested a review from anhthii March 10, 2026 04:02

vietddude mentioned this pull request Mar 10, 2026

fix: nil pointer dereference panic in RegularWorker.checkContinuity + add panic recovery to worker goroutines #60

Open

vietddude wants to merge 1 commit into main from fix/rpc-panic
