Daemon: map unsuccessful reports to failed terminal state#16
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the daemon API and persistence model to better match the void-control orchestration contract, including stricter terminal-state mapping (unsuccessful reports → failed), richer event envelopes, idempotency semantics, and standardized structured API errors.
Changes:
- Add structured API errors (
{code, message, retryable}) via newApiError/ApiErrorCode. - Extend run/event state with v2 orchestration fields (event IDs, seq, timestamps, attempt IDs), add run listing + filtering,
from_event_idresume, andapi_version=v2event-type formatting. - Introduce run policy plumbing into the runtime and add a new orchestration contract integration test; add
uuid+humantimedependencies.
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/daemon.rs |
Implements idempotent run creation/cancel, event envelope fields, terminal mapping, run listing/filtering, from_event_id, and structured errors. |
src/error.rs |
Adds ApiError / ApiErrorCode for consistent JSON error responses. |
src/persistence.rs |
Adds RunStatus updates, run/event v2 fields, policy model, and helpers for timestamps/event IDs/type mapping. |
src/runtime.rs |
Threads optional run policy through execution to support policy-driven timeouts. |
src/bin/voidbox.rs |
Updates CLI path to new run_file(..., policy) signature. |
tests/orchestration_contract.rs |
New integration contract test validating the daemon HTTP API behavior. |
tests/oci_integration.rs |
Updates run_file call site for new signature. |
Cargo.toml |
Adds uuid (v7) and humantime; registers new test target. |
Cargo.lock |
Locks new dependency versions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| async fn list_runs(query: Option<&str>, state: AppState) -> (String, String) { | ||
| let runs = state.runs.lock().await; | ||
| let state_filter = parse_query_param(query, "state"); | ||
| let filtered: Vec<RunState> = runs | ||
| .values() | ||
| .filter(|r| match state_filter.as_deref() { | ||
| Some("active") => r.status.is_active(), | ||
| Some("terminal") => r.status.is_terminal(), | ||
| _ => true, | ||
| }) | ||
| .cloned() | ||
| .collect(); | ||
| ( | ||
| "200 OK".to_string(), | ||
| serde_json::to_string(&ListRunsResponse { runs: filtered }) | ||
| .unwrap_or_else(|_| r#"{"runs":[]}"#.to_string()), | ||
| ) |
There was a problem hiding this comment.
list_runs ignores the api_version=v2 query param, so in v2 mode it still returns runs whose embedded events[*].event_type values are dotted (unless clients use event_type_v2 manually). This is inconsistent with get_run/get_events, and with the stated behavior that v2 returns PascalCase event types. Consider applying the same apply_v2_event_names transformation (or returning event-less summaries) when api_version=v2 is requested.
| // Set state dir to temp to avoid polluting real state | ||
| let dir = tempfile::tempdir().unwrap(); | ||
| std::env::set_var("VOIDBOX_STATE_DIR", dir.path()); | ||
| let _ = void_box::daemon::serve(addr).await; |
There was a problem hiding this comment.
start_daemon() sets VOIDBOX_STATE_DIR via std::env::set_var, but integration tests run in parallel by default and environment variables are process-global. This can cause flaky tests where one daemon instance reads another test’s state dir. Consider serializing the env-var mutation (e.g., global mutex/OnceLock around set_var+serve startup) or changing the daemon/provider to accept an explicit state directory for tests instead of using a global env var.
| let reason = serde_json::from_str::<CancelRunRequest>(body) | ||
| .ok() | ||
| .and_then(|r| r.reason); |
There was a problem hiding this comment.
cancel_run silently ignores invalid JSON bodies (from_str(...).ok()) and proceeds to cancel the run. Given the new structured error handling elsewhere, this makes it easy for clients to send malformed requests without noticing. Consider returning 400 Bad Request with ApiError::invalid_spec(...) when the cancel body is non-empty but fails to parse.
| let reason = serde_json::from_str::<CancelRunRequest>(body) | |
| .ok() | |
| .and_then(|r| r.reason); | |
| // If the body is non-empty, it must be valid JSON matching CancelRunRequest. | |
| // Empty or whitespace-only bodies are allowed and treated as having no explicit reason. | |
| let reason = if body.trim().is_empty() { | |
| None | |
| } else { | |
| match serde_json::from_str::<CancelRunRequest>(body) { | |
| Ok(r) => r.reason, | |
| Err(e) => { | |
| return ( | |
| "400 Bad Request".to_string(), | |
| ApiError::invalid_spec(format!("invalid cancel_run request body: {e}")) | |
| .to_json(), | |
| ); | |
| } | |
| } | |
| }; |
This pull request introduces significant improvements and new features to the API server in
src/daemon.rs, focusing on run management, idempotency, event handling, and API error consistency. It also adds new dependencies for UUID generation and time formatting, and introduces a new orchestration contract test. The most important changes are grouped below:Run Management & Idempotency Improvements
run_idand runpolicyin theCreateRunRequest, enabling clients to create runs with custom IDs and execution policies. The API now enforces idempotency for run creation: if a run with the same ID is already active, it returns the existing run; if terminal, it returns a conflict error. (src/daemon.rs, [1] [2]src/daemon.rs, src/daemon.rsR303-R580)API Features & Filtering
GET /v1/runs) with optional filtering by state (activeorterminal), and introduced support for querying events from a specific event ID (from_event_id). (src/daemon.rs, [1] [2]api_version=v2): in v2 mode, event types are returned in PascalCase, improving client compatibility and future-proofing the API. (src/daemon.rs, src/daemon.rsR303-R580)Event Handling & Structure
src/daemon.rs, src/daemon.rsR637-R656)API Error Handling Consistency
ApiErrortype, replacing ad-hoc JSON error generation for better consistency and easier client parsing. (src/daemon.rs, [1] [2] [3] [4]Dependency & Test Additions
uuidandhumantimecrates for event ID generation and time formatting, and introduced a new orchestration contract test to the test suite. (Cargo.toml, [1] [2]