Skip to content

Conversation

@eddyashton
Copy link
Member

This has taken a few different forms as I tried to work out what the right API was. I think this is a neat approach, and roughly minimal for apps to implement - a new modifier when installing an endpoint that defers the response until commit, and a default callback that inserts an error response if the TxID is invalidated.

A handful of outstanding items, probably for future PRs but up-for-discussion:

  • Better testing, including one of an invalidated (straddling an election) request.
  • Support for a timeout response. If the terminal state (Committed or Invalid) is not reached for a long time, the caller may still want to know the assigned TxID, rather than simply getting a client HTTP timeout.
  • Support for inserting receipts with the committed response. There's both semantic churn here - I want to lock down how any request works, separately from how some might wait further for a receipt - and some decisions about how we ensure fast-path retention of recent receipts.
  • Variants of more default/built-in endpoints that are respond-on-commit.
  • Naming.

@eddyashton eddyashton requested a review from a team as a code owner January 8, 2026 11:54
Copilot AI review requested due to automatic review settings January 8, 2026 11:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds support for endpoints that defer their HTTP responses until the transaction is globally committed (or invalidated) by the consensus protocol. The implementation introduces a commit callback subsystem, modifies the task system to use shared ownership for resumable tasks, and integrates these changes throughout the RPC stack.

  • Introduces CommitCallbackSubsystem to manage callbacks triggered when transactions reach terminal consensus states
  • Changes Resumable from unique_ptr to shared_ptr to support capturing in lambda callbacks
  • Adds set_consensus_committed_function to endpoint configuration API allowing endpoints to defer responses

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/e2e_logging.py Adds test comparing response timing between blocking and non-blocking endpoints
src/tasks/task_system.cpp Updates function signature to accept Resumable by value instead of rvalue reference
src/tasks/resumable.h Changes Resumable from unique_ptr to shared_ptr to enable shared ownership
src/node/rpc_context_impl.h Adds optional respond_on_commit field to track deferred responses
src/node/commit_callback_subsystem.h New subsystem managing callbacks for committed/invalidated transactions
src/node/commit_callback_interface.h Interface definition for commit callback subsystem
src/http/http_session.h Implements response deferral by pausing tasks and registering commit callbacks
src/endpoints/endpoint_registry.cpp Adds default handler for commit responses that reports invalidated transactions
src/endpoints/endpoint.cpp Implements setter for consensus_committed_func
src/node/node_state.h Integrates commit_callbacks subsystem into node state
src/enclave/rpc_sessions.h Passes commit callbacks subsystem to HTTP sessions
src/enclave/enclave.h Creates and installs commit callback subsystem during initialization
src/consensus/aft/raft.h Integrates callback execution when transactions commit
src/node/rpc/frontend.h Sets respond_on_commit when endpoint has consensus_committed_func
samples/apps/logging/logging.cpp Demonstrates blocking endpoints with example /log/blocking/private endpoints
include/ccf/tx_status.h Adds FinalTxStatus enum containing only terminal transaction states
include/ccf/endpoint_registry.h Declares default_respond_on_commit_func in public API
include/ccf/endpoint_context.h Defines ConsensusCommittedEndpointFunction type
include/ccf/endpoint.h Adds consensus_committed_func field and setter to Endpoint
CHANGELOG.md Documents the new feature and adds formatting changes to older sections
Comments suppressed due to low confidence (5)

CHANGELOG.md:2160

  • This adds an extra blank line that appears unintentional. This section already had proper formatting and this change doesn't improve readability.
  - The new `command.start.service_configuration.maximum_node_certificate_validity_days` (defaults to 365 days) configuration entry sets the maximum validity period allowed for node certificates.

CHANGELOG.md:1874

  • This adds an extra blank line that appears unintentional. This section already had proper formatting and this change doesn't improve readability.
  - The new `node_certificate.initial_validity_days` (defaults to 1 day) configuration entry lets operators set the initial validity period for the node certificate (valid from the current system time).

CHANGELOG.md:2908

  • This adds an extra blank line that appears unintentional. The surrounding sections don't use blank lines to separate items in this way.
    CHANGELOG.md:2913
  • This adds an extra blank line that appears unintentional. The surrounding sections don't use blank lines to separate items in this way.
- Governance

CHANGELOG.md:2918

  • This adds an extra blank line that appears unintentional. The surrounding sections don't use blank lines to separate items in this way.

tx_id.seqno,
committed.view,
committed.view,
committed.seqno);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check the semantics here of evaluate_tx_status.
We have a tx, 5.15 proposed and then there is an election which calls this with committed = 6.17.
From what I can see of the evaluate_tx_status this would return as invalid as is_committed=true and views_match=false.
I think that this is not quite the right semantics here?

Is there a guarantee that trigger_callbacks would be called with commit index of 5.15 first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good spot. We should not be passing committed.view twice here, the first instance should instead be consensus->get_view(seqno) (as it is in other calls to evaluate_tx_status). local_view is "what view does this node's view-history say target_seqno occured in?"

In your example, we'd then either call with (5, 15, 5, 6, 17) (we're now committed to 6.17, we think seqno 15 happened in view 5, so 5.15 is Committed), or (if 5.15 had been replaced by some 6.15 on this committing node) with (5, 15, 6, 6, 17) (we're now committed to 6.17, we think seqno 15 happened in view 6, so 5.15 is now known to be Invalid).

I'll fix this, and probably add a evaluate_tx_status method on the consensus to avoid this error-prone calling pattern in future.

return network


def test_blocking_calls(network, args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile bumping up the signature interval, as that would decrease the impact of noise here.
Unfortunately it would require restarting the network (or adding a new node and forcing it to become primary etc).

ccf::json_adapter(record),
auth_policies)
.set_auto_schema<LoggingRecord::In, bool>()
.set_consensus_committed_function(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intended interface going forward that applications enable return on commit by providing this callback? Are we thinking of providing make_endpoint_v2() that does this by default?

@eddyashton eddyashton marked this pull request as draft January 12, 2026 13:14
tx_id.to_str()));
}

// Else leave the original response untouched, and return it now
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, is this a TODO?

}

// Resume processing work for this session
ccf::tasks::resume_task(paused_task);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud.

IIIRC, each task belongs to a session. If a callback throws, shall we force-resume current task to unblock the session? Or does each request spawn a task for itself, so the session will keep going? And in that case, is there any garbage collection for never-resumed tasks needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants