Integrate the zone signer with the new zone storage#500
Conversation
There was a problem hiding this comment.
Took me a while to compare the old code to the new code but LGTM.
However, if I try and run it then after generating and reloading a default policy, and then adding a zone, I get:
$ target/debug/cascade --server 127.0.0.1:2006 zone add --source ./example.com.zone --policy default example.com
Added zone example.com
thread 'cascade-worker' (1351986) panicked at crates/zonedata/src/viewer.rs:274:9:
a signed component cannot be provided without a loaded one
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'cascade-worker' (1351991) panicked at src/zone/mod.rs:468:51:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
|
Oh, that's a bug in the state machine! I still have a pending local branch to test it exhaustively, I should pick that up again soon. Fixing. |
- Take '&Arc<Zone>' instead of the zone name. - Use 'domain::new::base::Serial' where possible. - Remove a duplicate retrieval of zone policy.
This simplifies the control flow in 'sign_zone()' so that it can become a fully synchronous function. I will restore parallelization after the switch to the new zone storage.
'join_sign_zone_queue()' waits for permission to sign the zone, and the performs the actual signing. The waiting part needs to be 'async' (at least, until it is replaced with a synchronous stateful queue), but the signing part should be done on a blocking Tokio task. By splitting the function into the waiting and the signing, it becomes easier to make the signing synchronous (esp. wrt. argument passing and 'static lifetimes).
Now that all 'static lifetime restrictions are gone from within 'sign_zone()', it is possible to use the new zone storage's 'SignedZoneReplacer' for accumulating records.
As signing finishes, it can result in 'on_idle()' and 'SignerZoneHandle::start_pending()' being called. This tripped up an assertion. To prevent races, 'SignerState::ongoing' has been updated to use 'BackgroundTasks'.
| self.data.clone(), | ||
| !self.curr_loaded_index, | ||
| // Iff there is a loaded diff, use '!curr_loaded_index'. | ||
| self.curr_loaded_index ^ self.loaded_diff.is_some(), |
There was a problem hiding this comment.
We need to get this merged so I won't say this is blocking, but to keep in mind for later: I find this very confusing. new() takes a boolean that is named index, to me an index is an unsigned integer, not a boolean, and then inside new() it even treats the bool as a usize. And the Rust Docs on new() don't say anything about what the loaded_index or signed_index function arguments are, and at the call site here it looks like an index is XOR'd with a boolean which is weird, plus since it isn't assigned to a named local variable before being passed as an argument to new() one has to look at the new() function docs to work out what the meaning of the result of the XOR value is.
There was a problem hiding this comment.
Maybe this could be improved later by using a dedicated enum for the "current buffer" of the "double buffer" system, rather than usize and bool.
| .in \\n[rst2man-indent\\n[rst2man-indent-level]]u | ||
| .. | ||
| .TH "CASCADE-DEBUG" "1" "Feb 25, 2026" "0.1.0-alpha5" "Cascade" | ||
| .TH "CASCADE-DEBUG" "1" "Nov 21, 2025" "0.1.0-alpha5" "Cascade" |
There was a problem hiding this comment.
Perhaps re-generate the man pages, you seem to be replacing newer ones with older ones?
There was a problem hiding this comment.
Or don't include this file in the PR?
| Ok(()) => { | ||
| let built = builder.finish().unwrap_or_else(|_| unreachable!()); | ||
| handle.storage().finish_sign(built); | ||
| status.status.finish(true); |
There was a problem hiding this comment.
Possible improvement here (can be later): currently one could mistakenly set, or forget to set, status.current_action = "Aborted", I wonder if that could be incorporated into the line above somehow that calls status.status.finish()?
| record_zone_event( | ||
| ¢er, | ||
| &zone.name, | ||
| handle.state.record_event( |
There was a problem hiding this comment.
Apologies if I asked this before but what was the reason for removing the call to halt_zone() here?
Building on top of #499, this PR makes the core
sign_zone()function use the new zone storage. It adjusts the rest of Cascade to call into the new signer flow functions (from #499) since they are now properly implemented.For reviewers: please check the signing flow, e.g. that the signer correctly launches signed review when it finishes.
NOTE: While developing this PR, I noticed a subtle logic error in our parallelized zone signing code: it arbitrarily splits the unsigned records into segments to pass to
sign_sorted_zone_records(), but that function assumes it has received the full zone in order to locate zone cuts. I have retained this bug in the new code; we need to copy overdomain's signing code and explicitly parallelize it. This effort would be greatly simplified with better zone data structures, so perhaps that should be addressed first.