-
Notifications
You must be signed in to change notification settings - Fork 247
Add support for block-until-commit responses #7562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds support for endpoints that defer their HTTP responses until the transaction is globally committed (or invalidated) by the consensus protocol. The implementation introduces a commit callback subsystem, modifies the task system to use shared ownership for resumable tasks, and integrates these changes throughout the RPC stack.
- Introduces
CommitCallbackSubsystemto manage callbacks triggered when transactions reach terminal consensus states - Changes
Resumablefromunique_ptrtoshared_ptrto support capturing in lambda callbacks - Adds
set_consensus_committed_functionto endpoint configuration API allowing endpoints to defer responses
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/e2e_logging.py | Adds test comparing response timing between blocking and non-blocking endpoints |
| src/tasks/task_system.cpp | Updates function signature to accept Resumable by value instead of rvalue reference |
| src/tasks/resumable.h | Changes Resumable from unique_ptr to shared_ptr to enable shared ownership |
| src/node/rpc_context_impl.h | Adds optional respond_on_commit field to track deferred responses |
| src/node/commit_callback_subsystem.h | New subsystem managing callbacks for committed/invalidated transactions |
| src/node/commit_callback_interface.h | Interface definition for commit callback subsystem |
| src/http/http_session.h | Implements response deferral by pausing tasks and registering commit callbacks |
| src/endpoints/endpoint_registry.cpp | Adds default handler for commit responses that reports invalidated transactions |
| src/endpoints/endpoint.cpp | Implements setter for consensus_committed_func |
| src/node/node_state.h | Integrates commit_callbacks subsystem into node state |
| src/enclave/rpc_sessions.h | Passes commit callbacks subsystem to HTTP sessions |
| src/enclave/enclave.h | Creates and installs commit callback subsystem during initialization |
| src/consensus/aft/raft.h | Integrates callback execution when transactions commit |
| src/node/rpc/frontend.h | Sets respond_on_commit when endpoint has consensus_committed_func |
| samples/apps/logging/logging.cpp | Demonstrates blocking endpoints with example /log/blocking/private endpoints |
| include/ccf/tx_status.h | Adds FinalTxStatus enum containing only terminal transaction states |
| include/ccf/endpoint_registry.h | Declares default_respond_on_commit_func in public API |
| include/ccf/endpoint_context.h | Defines ConsensusCommittedEndpointFunction type |
| include/ccf/endpoint.h | Adds consensus_committed_func field and setter to Endpoint |
| CHANGELOG.md | Documents the new feature and adds formatting changes to older sections |
Comments suppressed due to low confidence (5)
CHANGELOG.md:2160
- This adds an extra blank line that appears unintentional. This section already had proper formatting and this change doesn't improve readability.
- The new `command.start.service_configuration.maximum_node_certificate_validity_days` (defaults to 365 days) configuration entry sets the maximum validity period allowed for node certificates.
CHANGELOG.md:1874
- This adds an extra blank line that appears unintentional. This section already had proper formatting and this change doesn't improve readability.
- The new `node_certificate.initial_validity_days` (defaults to 1 day) configuration entry lets operators set the initial validity period for the node certificate (valid from the current system time).
CHANGELOG.md:2908
- This adds an extra blank line that appears unintentional. The surrounding sections don't use blank lines to separate items in this way.
CHANGELOG.md:2913 - This adds an extra blank line that appears unintentional. The surrounding sections don't use blank lines to separate items in this way.
- Governance
CHANGELOG.md:2918
- This adds an extra blank line that appears unintentional. The surrounding sections don't use blank lines to separate items in this way.
src/node/commit_callback_subsystem.h
Outdated
| tx_id.seqno, | ||
| committed.view, | ||
| committed.view, | ||
| committed.seqno); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check the semantics here of evaluate_tx_status.
We have a tx, 5.15 proposed and then there is an election which calls this with committed = 6.17.
From what I can see of the evaluate_tx_status this would return as invalid as is_committed=true and views_match=false.
I think that this is not quite the right semantics here?
Is there a guarantee that trigger_callbacks would be called with commit index of 5.15 first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good spot. We should not be passing committed.view twice here, the first instance should instead be consensus->get_view(seqno) (as it is in other calls to evaluate_tx_status). local_view is "what view does this node's view-history say target_seqno occured in?"
In your example, we'd then either call with (5, 15, 5, 6, 17) (we're now committed to 6.17, we think seqno 15 happened in view 5, so 5.15 is Committed), or (if 5.15 had been replaced by some 6.15 on this committing node) with (5, 15, 6, 6, 17) (we're now committed to 6.17, we think seqno 15 happened in view 6, so 5.15 is now known to be Invalid).
I'll fix this, and probably add a evaluate_tx_status method on the consensus to avoid this error-prone calling pattern in future.
| return network | ||
|
|
||
|
|
||
| def test_blocking_calls(network, args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worthwhile bumping up the signature interval, as that would decrease the impact of noise here.
Unfortunately it would require restarting the network (or adding a new node and forcing it to become primary etc).
| ccf::json_adapter(record), | ||
| auth_policies) | ||
| .set_auto_schema<LoggingRecord::In, bool>() | ||
| .set_consensus_committed_function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intended interface going forward that applications enable return on commit by providing this callback? Are we thinking of providing make_endpoint_v2() that does this by default?
| tx_id.to_str())); | ||
| } | ||
|
|
||
| // Else leave the original response untouched, and return it now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case, is this a TODO?
| } | ||
|
|
||
| // Resume processing work for this session | ||
| ccf::tasks::resume_task(paused_task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud.
IIIRC, each task belongs to a session. If a callback throws, shall we force-resume current task to unblock the session? Or does each request spawn a task for itself, so the session will keep going? And in that case, is there any garbage collection for never-resumed tasks needed?
…into respond_at_commit
This has taken a few different forms as I tried to work out what the right API was. I think this is a neat approach, and roughly minimal for apps to implement - a new modifier when installing an endpoint that defers the response until commit, and a default callback that inserts an error response if the TxID is invalidated.
A handful of outstanding items, probably for future PRs but up-for-discussion:
CommittedorInvalid) is not reached for a long time, the caller may still want to know the assignedTxID, rather than simply getting a client HTTP timeout.