Integrate the zone signer with the new zone storage by bal-e · Pull Request #500 · NLnetLabs/cascade

bal-e · 2026-03-04T07:40:03Z

Building on top of #499, this PR makes the core sign_zone() function use the new zone storage. It adjusts the rest of Cascade to call into the new signer flow functions (from #499) since they are now properly implemented.

For reviewers: please check the signing flow, e.g. that the signer correctly launches signed review when it finishes.

NOTE: While developing this PR, I noticed a subtle logic error in our parallelized zone signing code: it arbitrarily splits the unsigned records into segments to pass to sign_sorted_zone_records(), but that function assumes it has received the full zone in order to locate zone cuts. I have retained this bug in the new code; we need to copy over domain's signing code and explicitly parallelize it. This effort would be greatly simplified with better zone data structures, so perhaps that should be addressed first.

src/units/zone_signer.rs

src/signer/mod.rs

src/units/zone_signer.rs

ximon18

Took me a while to compare the old code to the new code but LGTM.

However, if I try and run it then after generating and reloading a default policy, and then adding a zone, I get:

$ target/debug/cascade --server 127.0.0.1:2006 zone add --source ./example.com.zone --policy default example.com
Added zone example.com

thread 'cascade-worker' (1351986) panicked at crates/zonedata/src/viewer.rs:274:9:
a signed component cannot be provided without a loaded one
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'cascade-worker' (1351991) panicked at src/zone/mod.rs:468:51:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }

bal-e · 2026-03-09T13:50:25Z

Oh, that's a bug in the state machine! I still have a pending local branch to test it exhaustively, I should pick that up again soon. Fixing.

- Take '&Arc<Zone>' instead of the zone name. - Use 'domain::new::base::Serial' where possible. - Remove a duplicate retrieval of zone policy.

This simplifies the control flow in 'sign_zone()' so that it can become a fully synchronous function. I will restore parallelization after the switch to the new zone storage.

'join_sign_zone_queue()' waits for permission to sign the zone, and the performs the actual signing. The waiting part needs to be 'async' (at least, until it is replaced with a synchronous stateful queue), but the signing part should be done on a blocking Tokio task. By splitting the function into the waiting and the signing, it becomes easier to make the signing synchronous (esp. wrt. argument passing and 'static lifetimes).

Now that all 'static lifetime restrictions are gone from within 'sign_zone()', it is possible to use the new zone storage's 'SignedZoneReplacer' for accumulating records.

As signing finishes, it can result in 'on_idle()' and 'SignerZoneHandle::start_pending()' being called. This tripped up an assertion. To prevent races, 'SignerState::ongoing' has been updated to use 'BackgroundTasks'.

ximon18 · 2026-03-12T12:00:37Z

crates/zonedata/src/storage/transitions.rs

                self.data.clone(),
-                !self.curr_loaded_index,
+                // Iff there is a loaded diff, use '!curr_loaded_index'.
+                self.curr_loaded_index ^ self.loaded_diff.is_some(),


We need to get this merged so I won't say this is blocking, but to keep in mind for later: I find this very confusing. new() takes a boolean that is named index, to me an index is an unsigned integer, not a boolean, and then inside new() it even treats the bool as a usize. And the Rust Docs on new() don't say anything about what the loaded_index or signed_index function arguments are, and at the call site here it looks like an index is XOR'd with a boolean which is weird, plus since it isn't assigned to a named local variable before being passed as an argument to new() one has to look at the new() function docs to work out what the meaning of the result of the XOR value is.

Maybe this could be improved later by using a dedicated enum for the "current buffer" of the "double buffer" system, rather than usize and bool.

ximon18 · 2026-03-12T12:10:06Z

doc/manual/build/man/cascade-debug.1

 .in \\n[rst2man-indent\\n[rst2man-indent-level]]u
 ..
-.TH "CASCADE-DEBUG" "1" "Feb 25, 2026" "0.1.0-alpha5" "Cascade"
+.TH "CASCADE-DEBUG" "1" "Nov 21, 2025" "0.1.0-alpha5" "Cascade"


Perhaps re-generate the man pages, you seem to be replacing newer ones with older ones?

Or don't include this file in the PR?

ximon18 · 2026-03-12T12:17:01Z

src/signer/mod.rs

+        Ok(()) => {
+            let built = builder.finish().unwrap_or_else(|_| unreachable!());
+            handle.storage().finish_sign(built);
+            status.status.finish(true);


Possible improvement here (can be later): currently one could mistakenly set, or forget to set, status.current_action = "Aborted", I wonder if that could be incorporated into the line above somehow that calls status.status.finish()?

ximon18 · 2026-03-12T12:21:24Z

src/signer/mod.rs

-            record_zone_event(
-                &center,
-                &zone.name,
+            handle.state.record_event(


Apologies if I asked this before but what was the reason for removing the call to halt_zone() here?

bal-e added this to the 0.1.0-rc1 milestone Mar 4, 2026

bal-e requested review from tertsdiepraam and ximon18 March 4, 2026 07:40

bal-e self-assigned this Mar 4, 2026

bal-e force-pushed the signer-storage branch from 2050b8d to 0b21714 Compare March 4, 2026 18:22