Skip to content

Integrate the zone signer with the new zone storage#500

Merged
bal-e merged 10 commits intomainfrom
signer-storage
Mar 12, 2026
Merged

Integrate the zone signer with the new zone storage#500
bal-e merged 10 commits intomainfrom
signer-storage

Conversation

@bal-e
Copy link
Contributor

@bal-e bal-e commented Mar 4, 2026

Building on top of #499, this PR makes the core sign_zone() function use the new zone storage. It adjusts the rest of Cascade to call into the new signer flow functions (from #499) since they are now properly implemented.

For reviewers: please check the signing flow, e.g. that the signer correctly launches signed review when it finishes.

NOTE: While developing this PR, I noticed a subtle logic error in our parallelized zone signing code: it arbitrarily splits the unsigned records into segments to pass to sign_sorted_zone_records(), but that function assumes it has received the full zone in order to locate zone cuts. I have retained this bug in the new code; we need to copy over domain's signing code and explicitly parallelize it. This effort would be greatly simplified with better zone data structures, so perhaps that should be addressed first.

@bal-e bal-e added this to the 0.1.0-rc1 milestone Mar 4, 2026
@bal-e bal-e requested review from tertsdiepraam and ximon18 March 4, 2026 07:40
@bal-e bal-e self-assigned this Mar 4, 2026
Copy link
Member

@ximon18 ximon18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to compare the old code to the new code but LGTM.

However, if I try and run it then after generating and reloading a default policy, and then adding a zone, I get:

$ target/debug/cascade --server 127.0.0.1:2006 zone add --source ./example.com.zone --policy default example.com
Added zone example.com

thread 'cascade-worker' (1351986) panicked at crates/zonedata/src/viewer.rs:274:9:
a signed component cannot be provided without a loaded one
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'cascade-worker' (1351991) panicked at src/zone/mod.rs:468:51:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }

@bal-e
Copy link
Contributor Author

bal-e commented Mar 9, 2026

Oh, that's a bug in the state machine! I still have a pending local branch to test it exhaustively, I should pick that up again soon. Fixing.

bal-e added 10 commits March 12, 2026 11:55
- Take '&Arc<Zone>' instead of the zone name.
- Use 'domain::new::base::Serial' where possible.
- Remove a duplicate retrieval of zone policy.
This simplifies the control flow in 'sign_zone()' so that it can become
a fully synchronous function. I will restore parallelization after the
switch to the new zone storage.
'join_sign_zone_queue()' waits for permission to sign the zone, and the
performs the actual signing. The waiting part needs to be 'async' (at
least, until it is replaced with a synchronous stateful queue), but the
signing part should be done on a blocking Tokio task. By splitting the
function into the waiting and the signing, it becomes easier to make the
signing synchronous (esp. wrt. argument passing and 'static lifetimes).
Now that all 'static lifetime restrictions are gone from within
'sign_zone()', it is possible to use the new zone storage's
'SignedZoneReplacer' for accumulating records.
As signing finishes, it can result in 'on_idle()' and
'SignerZoneHandle::start_pending()' being called. This tripped up an
assertion. To prevent races, 'SignerState::ongoing' has been updated to
use 'BackgroundTasks'.
Base automatically changed from signer-flow to main March 12, 2026 11:01
self.data.clone(),
!self.curr_loaded_index,
// Iff there is a loaded diff, use '!curr_loaded_index'.
self.curr_loaded_index ^ self.loaded_diff.is_some(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to get this merged so I won't say this is blocking, but to keep in mind for later: I find this very confusing. new() takes a boolean that is named index, to me an index is an unsigned integer, not a boolean, and then inside new() it even treats the bool as a usize. And the Rust Docs on new() don't say anything about what the loaded_index or signed_index function arguments are, and at the call site here it looks like an index is XOR'd with a boolean which is weird, plus since it isn't assigned to a named local variable before being passed as an argument to new() one has to look at the new() function docs to work out what the meaning of the result of the XOR value is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be improved later by using a dedicated enum for the "current buffer" of the "double buffer" system, rather than usize and bool.

.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "CASCADE-DEBUG" "1" "Feb 25, 2026" "0.1.0-alpha5" "Cascade"
.TH "CASCADE-DEBUG" "1" "Nov 21, 2025" "0.1.0-alpha5" "Cascade"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps re-generate the man pages, you seem to be replacing newer ones with older ones?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or don't include this file in the PR?

Ok(()) => {
let built = builder.finish().unwrap_or_else(|_| unreachable!());
handle.storage().finish_sign(built);
status.status.finish(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible improvement here (can be later): currently one could mistakenly set, or forget to set, status.current_action = "Aborted", I wonder if that could be incorporated into the line above somehow that calls status.status.finish()?

record_zone_event(
&center,
&zone.name,
handle.state.record_event(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I asked this before but what was the reason for removing the call to halt_zone() here?

@bal-e bal-e merged commit c1e8505 into main Mar 12, 2026
1 check passed
@bal-e bal-e deleted the signer-storage branch March 12, 2026 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants