tonic clients, blocking and retries
#795
Closed
Mirko-von-Leipzig
started this conversation in
General
Replies: 1 comment 1 reply
-
|
Was this closed by #785 - or is something still outstanding? |
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
My take on blocking vs native tonic client.
TL;DR: use native client, blocking is rare.
Desired behavior imo:
RPC
User gRPC requests should always fail immedietely, no internal retries. This ideally means that transitively that
block-producer::add_transactionshould fail immedietely if theblock-producer -> Storeconnection is broken.Error response must be
internal error, and should be reported aserror!internally.Currently has no startup calls, but one can imagine its possible to cache or mirror some basic state for efficiency reasons. This should loop/retry before spinning up.
Store
Not applicable, no gRPC clients.
Block-producer
add_transactionshould fail immedietely as discussed in### RPC.Start-up should block/retry in a loop. We do want to capture this failure in a span + event for monitoring purposes. This means it probably should not block internally as we cannot capture the error unless we add callbacks etc.
get_batch_inputsandget_block_inputs- this can actually go either way. These are already on a periodic timer so simply failing once-off is actually fine. We expect connection failures to be rare/extraordinary events, so under that assumption simply failing the entire build and "retrying" next time the natural way seems fine to me. Counter-argument is that building is expensive and we should do our utmost to preserve it - fine enough, but this isn't required now imo so let's YAGNI and KISS.Network transaction builder
Not entirely clear yet as this does not exist. Component has a client connection to the
storefor startup andget_tx_inputs, and is a server for theblock-producer.The component is effectively desync'd if either connection is broken, and probably needs to reset in that scenario but I'll argue below that this isn't really a now issue.
On startup, this componment must load all existing network notes from the store, and the associated network accounts' state. This should retry/block.
Further requests to the
storeare all fetching tx inputs. Sidebar: is it possible we have all the required information locally and don't need thestoreIO?block-producersends all network account delta's, new notes and transaction status' to the builder.This must succeed in order for the state to remain in-sync between the two. An issue here is that this will block production of blocks and submission of new txs. Dirty short term solution:
General statements
This implies to me that reconnection can be safely delegated to tonic client. Though we may want a utility to retry with backoff at the outermost level e.g. loop startup code if the error is a retriable error. The shape of this is unclear to me, might be best to see some code here first before deciding on a design. I know there are also crates that implement this already, but it seems like a trivial thing tbh.
A concern I do have with this is that this means connection health is only determinable via external drivers. e.g. the RPC component will only check connection health when driven by a gRPC request, similarly the block-producer only checks when requiring inputs. This is "solved" by having a periodic health check task, which we probably want in any case. My main concern is unknown unknowns here since we haven't actually built it.
We've also only touched on network issues. What's important to note here is that a client component cannot really distinguish between a network issue, and the server component restarting. This is a broader question which we shouldn't really attempt to figure out here. aka when must child components reset if the parent resets.. this is a problem and maybe we need to perform a desync check whenever we reconnect.. but this is an "external" issue and not a tonic client problem so let's just brush that into a different corner and avoid eye contact :) aka let's tackle that another day.
Beta Was this translation helpful? Give feedback.
All reactions