(This is a discussion issue, not a concrete feature plan yet, arising out of discussion with @timfong888)
Assuming functionality that exists now in #593
Currently, provider selection is stateless across `upload()` calls. If an SP fails during store or commit, the error is reported to the caller with its `providerId`, and the caller can retry with `excludeProviderIds`. Ping failures are already handled internally (auto-excluded, invisible to the caller).
This works, but it puts the retry burden entirely on the developer. The SDK could track recent failures and factor them into selection.
What this would look like:
- SDK maintains a short-lived record of SP failures (provider ID + timestamp + failure type)
- `selectProviders`/`smartSelect` deprioritises (not excludes) recently-failed SPs
- Failure memory decays over time (e.g. 10-30 minutes)
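The failure memory above could be sketched roughly like this (`FailureTracker` and its methods are illustrative names, not existing SDK API; the TTL and linear decay are placeholder choices):

```typescript
// Hypothetical sketch of a short-lived SP failure memory.
// Names and decay curve are illustrative, not part of the current SDK.
type FailureType = "store" | "commit" | "ping";

interface FailureRecord {
  providerId: string;
  type: FailureType;
  at: number; // epoch ms
}

class FailureTracker {
  private records = new Map<string, FailureRecord>();

  constructor(private ttlMs: number = 20 * 60 * 1000) {} // e.g. 20 minutes

  recordFailure(providerId: string, type: FailureType): void {
    this.records.set(providerId, { providerId, type, at: Date.now() });
  }

  // Returns a penalty in [0, 1]: ~1 right after a failure, decaying
  // linearly to 0 at the TTL. Used to deprioritise, never to exclude.
  penaltyFor(providerId: string, now: number = Date.now()): number {
    const rec = this.records.get(providerId);
    if (!rec) return 0;
    const age = now - rec.at;
    if (age >= this.ttlMs) {
      this.records.delete(rec.providerId); // decayed out; forget it
      return 0;
    }
    return 1 - age / this.ttlMs;
  }
}
```

Deleting expired records lazily at read time keeps the structure self-cleaning without a timer.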
Open design questions:
- How to balance failure history against dataset-matching preference: if SP-A has your dataset but failed 5 minutes ago, do we skip it in favour of a healthy SP that requires a new dataset + payment rail?
- When all endorsed SPs have recent failures, they're effectively equal again: is time-decayed deprioritisation enough, or do we need a distinct "all failed" fallback?
- Scope: store failures, commit failures, or both? Commit failures may be chain-related (gas, nonce) rather than SP health.
- Where does the state live? In-memory on the `Synapse` instance is simplest (it's the stateful bit, after all) but lost on restart. Probably fine: this is a session-level optimisation, not durable state.
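On the first question, one option is a soft score rather than a hard skip, so dataset affinity and recent failures trade off continuously. A minimal sketch (all weights purely illustrative; tuning them is exactly the open question):

```typescript
// Hypothetical scoring that weighs dataset affinity against recent
// failures, rather than hard-excluding a failed SP.
interface Candidate {
  providerId: string;
  hasDataset: boolean;    // SP already holds the caller's dataset
  failurePenalty: number; // 0..1, from the (hypothetical) failure memory
}

function score(c: Candidate): number {
  // Bonus for avoiding a new dataset + payment rail; illustrative weight.
  const datasetBonus = c.hasDataset ? 0.5 : 0;
  return 1 + datasetBonus - c.failurePenalty;
}

function rank(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

With these numbers, an SP-A that failed 5 minutes ago (penalty still high) loses to a healthy dataset-less SP-B, but once the penalty decays, SP-A's dataset bonus puts it back on top; no distinct "all failed" branch is needed because equal penalties cancel out in the ranking.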
---

**Update by @timfong888, Feb 24 2026**
The core trade-off: Simpler SDK = more work for developers. Smarter SDK = more edge cases we own.
The matrix below maps each pipeline stage (split into success/fail) to behaviour per selection tier:

| Tier | Ping success | Ping fail | Request from SP success | Request from SP fail | Store success | Store fail | Commit success | Commit fail | Flags |
|---|---|---|---|---|---|---|---|---|---|
| Primary 0 | Store | Ping other (All Endorsed − failed SP) randomly | N/A | N/A | Commit | Currently: throw | Go to Secondary | Throw | Developer has the burden of checking the message and retrying |
| Primary 1 - diff | | | | | | | | Retry before payload is GC'd (24-hour period); if that fails, throw | Limbo state: what happens if commit still fails after 24 hours? Means 24 hours until we have a success on Endorsed. |
| Primary 2 - diff | | If all Endorsed SPs have failed, throw; else try (All Endorsed − failed SPs) | | | | Retry from Ping; exclude failed ID | | Retry from Ping; exclude failed ID; no retry of `commit()` | New SP provider means an additional floor price. If chain-state slowness delayed the commit, it may commit the original SP AND the successful retry SP: now 3 copies. Retrying store costs the client additional bandwidth to re-upload to the new SP. |
| Secondary | Request | Ping Approved SPs randomly | Store | Fails | Commit | Failure message | Success | Failure message | |
| Option | What It Does | Complexity | Edge Case Risk | Developer Burden |
|---|---|---|---|---|
| A. Keep Primary 0 | Throw on failure, no retry, no state | Low | Low — developer handles all retries | High — they build retry, failover, partial-commit handling |
| B. Add Primary 1-diff (stateful retry) | SDK tracks failed SPs, retries excluding them, returns detailed result objects | Medium-High | High — see edge cases below | Low — SDK handles most failure paths |
| C. Hybrid — simple SDK + documented patterns | Keep SDK at Primary 0, ship retry recipes and error-handling guides | Low | Low (same as A) | Medium — guided but still developer-owned |
| Edge Case | What Happens | User Impact |
|---|---|---|
| Partial commit | Primary commits on-chain, secondary fails | User pays for 1 copy, thinks they have 2 (or thinks it all failed) |
| Stale exclusion | SP recovers mid-retry but is still in the failed set | Unnecessarily narrow SP pool, may exhaust all options |
| Duplicate commit on retry | Developer retries after partial success → 3 copies on-chain | User overpays |
| Concurrent calls sharing state | Two parallel store calls mutate the same failed-SP list | Unpredictable routing within TTL window |
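Option C's "retry recipes" could include a developer-side loop like the following. This is a sketch only: `upload()`, `UploadError`, and the `excludeProviderIds` shapes are assumptions based on the behaviour described at the top of this issue, not a confirmed SDK API.

```typescript
// Sketch of a developer-owned retry loop (option C).
// The upload()/UploadError shapes are assumed, not confirmed SDK API.
interface UploadOptions { excludeProviderIds?: string[] }
interface UploadResult { providerId: string }

class UploadError extends Error {
  constructor(public providerId: string, message: string) { super(message); }
}

async function uploadWithRetry(
  upload: (data: Uint8Array, opts: UploadOptions) => Promise<UploadResult>,
  data: Uint8Array,
  maxAttempts = 3,
): Promise<UploadResult> {
  const failed: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await upload(data, { excludeProviderIds: failed });
    } catch (err) {
      if (err instanceof UploadError && attempt < maxAttempts) {
        failed.push(err.providerId); // exclude this SP on the next try
        continue;
      }
      throw err; // out of attempts, or not an SP failure
    }
  }
  throw new Error("unreachable");
}
```

Note this recipe deliberately does not retry after a successful commit, which sidesteps the "duplicate commit on retry" edge case above; the partial-commit and concurrency cases would still need their own guidance.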