
P4-W4: Metadata and observability for export jobs#125

Merged
user1303836 merged 1 commit into main from p4-w4-metadata-observability
Mar 9, 2026

Conversation

@user1303836
Owner

Summary

Attach provenance metadata and status introspection to export and materialization jobs so downstream consumers can assess data lineage and completeness.

  • ExportJobStatus enrichment — added dataset_version_id, completeness_status, started_at / completed_at wall-clock timestamps, and last_ingestion_run_id to ExportJobStatus
  • run_export_job provenance lookup — enriched the export job runner to look up the active DatasetVersion and DatasetCompleteness records and propagate them into the job status
  • DeliveryMetadata provenance — propagated provenance fields into DeliveryMetadata so sinks receive complete lineage information
  • GET /v1/datasets/{name}/status — new materialization introspection endpoint returning dataset status, active version, completeness, and recent export job history
  • Comprehensive tests — 8 new tests covering serde round-trips, backward compatibility, defaults, provenance fields, and response shapes (780 total, 0 failures)

Changed files

  • core/src/materializer.rs: New provenance fields on ExportJobStatus and DeliveryMetadata
  • adapters/src/v2_repo.rs: Provenance lookup helpers for dataset version and completeness
  • api/src/main.rs: GET /v1/datasets/{name}/status endpoint; provenance wiring in export runner
  • README.md: Document new endpoint

Validation

All CI-equivalent checks pass locally:

  • cargo fmt --all --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace (780 tests, 0 failures)
  • 8 new P4-W4 tests cover metadata serialization, backward compatibility, defaults, delivery provenance, and dataset status response shape

Test plan

  • Verify cargo test --workspace passes in CI
  • Verify cargo clippy and cargo fmt pass in CI
  • Review provenance fields on ExportJobStatus for completeness
  • Review /v1/datasets/{name}/status response shape
  • Confirm backward compatibility — old clients without new fields still deserialize correctly

🤖 Generated with Claude Code

Enrich ExportJobStatus with provenance metadata: dataset_version_id,
dataset_version, completeness_status, completeness_coverage, started_at,
completed_at, and last_ingestion_run_id. These fields use
skip_serializing_if for backward compatibility.

Enhance run_export_job to look up the active DatasetVersion and
DatasetCompleteness records, aggregating coverage bounds and status
across matching targets/networks.

Enrich DeliveryMetadata with dataset_version_id and completeness_status
so sink consumers receive provenance context.

Add GET /v1/datasets/{name}/status endpoint for materialization
introspection: returns active version, all versions, and completeness
records for downstream consumers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 6a7e5fd842e4
@user1303836
Owner Author

P4-W4 Remote Review: Metadata and Observability

Verdict: No blocking issues. Review pass is clear.

Reviewed the full diff against the 6 review criteria:

✅ Backward compatibility

All 7 new fields on ExportJobStatus use skip_serializing_if = "Option::is_none". Tests (test_export_job_status_backward_compat_no_metadata, test_export_job_status_metadata_fields_omitted_when_none) explicitly verify that None metadata fields are omitted from JSON, preserving the pre-P4-W4 response shape. DeliveryMetadata provenance fields also correctly use skip_serializing_if.

✅ Query correctness

get_active_dataset_version uses fetch_optional and the caller applies .ok().flatten(); list_completeness_filtered uses fetch_all and the caller applies .unwrap_or_default(). Both handle missing records gracefully with no panic paths. Dynamic SQL parameter binding in list_completeness_filtered correctly tracks param_idx to match bind order across all filter combinations.
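The degradation behavior of those two call sites can be shown with a stdlib-only sketch. `fake_fetch_optional` is a hypothetical stand-in for sqlx's `fetch_optional`, which yields `Result<Option<Row>, Error>`; the names and types here are illustrative, not the project's actual API.

```rust
// Stand-in for a DB lookup: Ok(Some(_)) = row found, Ok(None) = no row,
// Err(_) = query failure.
fn fake_fetch_optional(found: bool, fail: bool) -> Result<Option<&'static str>, &'static str> {
    if fail {
        Err("db error")
    } else if found {
        Ok(Some("v42"))
    } else {
        Ok(None)
    }
}

fn main() {
    // .ok().flatten(): both a query error and a missing row collapse to None,
    // so the export job proceeds without provenance instead of panicking.
    assert_eq!(fake_fetch_optional(true, false).ok().flatten(), Some("v42"));
    assert_eq!(fake_fetch_optional(false, false).ok().flatten(), None);
    assert_eq!(fake_fetch_optional(true, true).ok().flatten(), None);

    // .unwrap_or_default(): a failed list query degrades to an empty Vec.
    let rows: Result<Vec<&str>, &'static str> = Err("db error");
    let completeness: Vec<&str> = rows.unwrap_or_default();
    assert!(completeness.is_empty());
    println!("ok");
}
```

The trade-off of both patterns is that a transient DB error is indistinguishable from genuinely missing provenance, which is acceptable here because the fields are optional metadata.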

✅ No wallet-only assumptions

Export code uses generic target_id: Option<Uuid> and network: Option<&str> throughout. Completeness filtering, provenance lookups, and the new status endpoint are all target-type agnostic.

✅ Performance

Metadata lookups (one version query + one completeness list) execute once per export job at start time, not on status polls. The /v1/datasets/{name}/status endpoint makes two straightforward queries. No N+1 patterns or heavy joins.

✅ No migration required

All enrichment is in-memory. ExportMetadata, DatasetStatus, and DatasetCompletenessInfo are API-layer types only. Queries use existing dataset_versions and dataset_completeness tables.

⚠️ Non-blocking: gap_ranges omission

gap_ranges from DatasetCompleteness is fetched from the DB but not propagated to either completeness_coverage in the export response or DatasetCompletenessInfo in the status endpoint. The presence of gaps is captured via the aggregated status string (e.g., "gap"), and the full gap_ranges remain available through the existing /v1/datasets/{name}/completeness endpoint. Worth adding in a follow-up for completeness, but not a blocker since the semantics are not reduced to a boolean.

Non-blocking notes

  1. last_ingestion_run_id aggregation uses .rev().find_map() based on query ordering (ORDER BY target_id, network), not temporal ordering. Consider selecting the run ID from the record with the most recent updated_at in a follow-up.
  2. DatasetCompletenessInfo omits block_start/block_end from the per-record status view (they are included in aggregated export coverage). Minor omission.
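The follow-up suggested in note 1 can be sketched as follows. The record type and field names here are hypothetical placeholders, not the project's actual types; the point is only the aggregation: select by greatest `updated_at` rather than by query order.

```rust
// Hypothetical per-(target, network) completeness record.
struct CompletenessRecord {
    updated_at: u64, // e.g. unix seconds
    last_ingestion_run_id: Option<String>,
}

// Pick the run ID from the most recently updated record, ignoring records
// with no run ID, instead of relying on (target_id, network) sort order.
fn latest_run_id(records: &[CompletenessRecord]) -> Option<String> {
    records
        .iter()
        .filter(|r| r.last_ingestion_run_id.is_some())
        .max_by_key(|r| r.updated_at)
        .and_then(|r| r.last_ingestion_run_id.clone())
}

fn main() {
    let records = vec![
        CompletenessRecord { updated_at: 100, last_ingestion_run_id: Some("run-a".into()) },
        CompletenessRecord { updated_at: 300, last_ingestion_run_id: Some("run-c".into()) },
        CompletenessRecord { updated_at: 200, last_ingestion_run_id: Some("run-b".into()) },
    ];
    assert_eq!(latest_run_id(&records), Some("run-c".to_string()));
    println!("{:?}", latest_run_id(&records));
}
```

An equivalent fix could instead add `ORDER BY updated_at` to the query, at the cost of changing the order the status endpoint reports records in.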

This packet is ready to merge from the PR-comment review perspective.

@user1303836 user1303836 merged commit f76539b into main Mar 9, 2026
4 checks passed
@user1303836 user1303836 deleted the p4-w4-metadata-observability branch March 9, 2026 00:21