P4-W2: Export jobs — async dataset export with JSONL and CSV support#123

Merged
user1303836 merged 1 commit into main from p4-w2-export-jobs
Mar 8, 2026

@user1303836
Owner

Summary

  • Adds async export job system for Silver datasets with 3 new API endpoints:
    • POST /v1/export/dataset — create an export job by dataset name, target, and format (JSONL or CSV)
    • GET /v1/export/jobs/:job_id — poll job status and metadata
    • GET /v1/export/jobs/:job_id/download — download completed export file
  • Introduces ExportFormat enum in core with JSONL and CSV variants
  • Adds 6 export query methods in adapters (one per Silver dataset)
  • Full job lifecycle in api: creation → background processing → download
  • Fixed a String::leak() memory leak in the download handler
  • Preserves existing GET /v1/export/:wallet wallet export endpoint unchanged
  • 24 new tests covering format selection, job lifecycle, dataset routing, and error cases
  • README.md and CLAUDE.md route lists updated to reflect new endpoints
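
For readers without the diff open, here is a minimal sketch of what an ExportFormat enum with JSONL and CSV variants could look like. The variant names, the lowercase string forms, and the String error type are assumptions for illustration; the actual definition in core may differ.

```rust
use std::fmt;
use std::str::FromStr;

/// Output format for a dataset export.
/// Sketch only: variant names and string forms are assumed, not taken from the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExportFormat {
    Jsonl,
    Csv,
}

impl fmt::Display for ExportFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lowercase names, assumed to match what the request body accepts.
        let name = match self {
            ExportFormat::Jsonl => "jsonl",
            ExportFormat::Csv => "csv",
        };
        f.write_str(name)
    }
}

impl FromStr for ExportFormat {
    type Err = String;

    /// Case-insensitive parse; unknown formats are rejected with a message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "jsonl" => Ok(ExportFormat::Jsonl),
            "csv" => Ok(ExportFormat::Csv),
            other => Err(format!("unsupported export format: {other}")),
        }
    }
}
```

With this shape, adding another variant later touches only the enum and its two match arms, so callers that parse the format from a string keep working unchanged.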

Validation

All 3 required checks pass:

  • cargo fmt --all --check
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace — 737 tests pass ✓

No migration changes. No breaking API changes. No wallet-assumption regressions.

Phase

Phase 4: ETL-First Delivery — Work Packet P4-W2

Test plan

  • Verify cargo test --workspace passes (737 tests)
  • Verify cargo clippy --workspace --all-targets -- -D warnings clean
  • Review export job lifecycle: create → poll → download
  • Confirm existing /v1/export/:wallet endpoint is unaffected
  • Confirm JSONL and CSV format outputs are correct
  • Verify no memory leak from String::leak() pattern
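
When reviewing the create → poll → download lifecycle, the key invariant is that a download only succeeds for a completed job. The sketch below illustrates that gating with hypothetical types (ExportJobStatus, DownloadOutcome, and download are names invented here, not the ones in the PR).

```rust
/// Hypothetical job states; the PR's actual type may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExportJobStatus {
    Pending,
    Running,
    Completed { body: Vec<u8> },
    Failed { error: String },
}

/// Result of a download attempt against a job in a given state.
#[derive(Debug, PartialEq, Eq)]
pub enum DownloadOutcome {
    /// Job finished; the caller receives an owned copy of the export bytes.
    Ready(Vec<u8>),
    /// Job is still pending or running; the client should keep polling.
    NotReady,
    /// Job failed; the stored error message is surfaced to the client.
    Failed(String),
}

/// Decide what a download request should return for the given job status.
pub fn download(status: &ExportJobStatus) -> DownloadOutcome {
    match status {
        ExportJobStatus::Completed { body } => DownloadOutcome::Ready(body.clone()),
        ExportJobStatus::Pending | ExportJobStatus::Running => DownloadOutcome::NotReady,
        ExportJobStatus::Failed { error } => DownloadOutcome::Failed(error.clone()),
    }
}
```

The clone in the Completed arm mirrors the constraint that the response needs owned data; how each outcome maps to an HTTP status code is left open here.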

🤖 Generated with Claude Code

Adds export job endpoints for dataset-oriented bulk exports with JSONL
and CSV format support. Export jobs run asynchronously with status polling
and download on completion. Supports all 6 Silver datasets (token_transfers,
native_balance_deltas, decoded_events, hl_fills, hl_funding, positions)
with optional target_id, network, and time range filters. Preserves the
existing wallet export endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: fac1a771d287
@user1303836
Owner Author

Remote Review — P4-W2: Export Jobs

Verdict: Review pass is clear. No blocking issues found.

GHA Status

All 4 checks pass: Formatting ✓, Clippy ✓, Tests ✓, Build (release) ✓

Review Focus Assessment

  1. Memory management (in-memory storage, 100k record cap): EXPORT_MAX_RECORDS = 100_000 provides a reasonable bound. TTL pruning via prune_stale_export_jobs cleans up completed jobs. The download_export handler clones the body bytes while holding the read lock, which is necessary since the response needs owned data; acceptable given the bounded record count.

  2. Content-Disposition header: Uses HeaderValue::from_str(&disposition) with proper error handling. No String::leak. Clean.

  3. CSV serializers: csv_escape correctly handles commas, double-quotes, and newlines. Optional fields like token_symbol and direction use unwrap_or("") without escaping, which is safe for their typical value domains (short identifiers, enum-like values). decoded_fields (JSON map) is intentionally omitted from CSV; JSONL covers the full schema. Acceptable design choice.

  4. Parquet deferred: ExportFormat enum in core is extensible. Adding a Parquet variant later won't require request model changes.

  5. Shared job semaphore: Export jobs acquire from job_semaphore (10 slots shared with ingestion). This controls total concurrency but means a burst of exports could starve ingestion or vice versa. Acceptable for current scale; splitting into separate semaphores is a natural follow-up if contention appears.

  6. Legacy /v1/export/:wallet preserved: Not touched in the diff. Confirmed unchanged.
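
To make the CSV point in item 3 concrete, this is a rough sketch of the quoting rule being checked: a field is quoted when it contains a comma, double quote, or line break, and embedded double quotes are doubled. The function body is an assumption written for illustration and is not copied from the PR; csv_row is a hypothetical helper.

```rust
/// Quote a CSV field when it contains a comma, double quote, CR, or LF.
/// Embedded double quotes are doubled, following the usual CSV convention.
/// Sketch only; the PR's csv_escape may differ in signature and details.
pub fn csv_escape(field: &str) -> String {
    let needs_quoting = field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    if needs_quoting {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Hypothetical helper: join already-stringified fields into one CSV line,
/// escaping every field on the way.
pub fn csv_row(fields: &[&str]) -> String {
    fields
        .iter()
        .map(|f| csv_escape(f))
        .collect::<Vec<_>>()
        .join(",")
}
```

If the optional fields ever start carrying free-form text, passing them through the same helper, for example csv_escape(symbol.unwrap_or("")), would be a small hardening step.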

Additional Observations (non-blocking)

  • The 6 export query methods in v2_repo.rs delegate cleanly to existing query_* methods with EXPORT_MAX_RECORDS and offset=0. Good reuse.
  • 24 new tests cover the full lifecycle: auth, validation, format selection, job status, download (completed/pending/failed), pruning, serialization, all 6 datasets, and non-exportable dataset rejection. Thorough.
  • ExportFormat in core/materializer.rs includes serde roundtrip, FromStr, and Display tests.
  • README and CLAUDE.md route lists are updated.
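
For context on the TTL pruning mentioned under memory management, a sketch of how stale finished jobs could be dropped from an in-memory map follows. The ExportJob struct, its fields, and the function signature are hypothetical; only the name prune_stale_export_jobs comes from the PR.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory job record with only the fields pruning needs.
pub struct ExportJob {
    /// Set when the job completes or fails; None while pending or running.
    pub finished_at: Option<Instant>,
    /// Serialized export payload held in memory until download or expiry.
    pub body: Vec<u8>,
}

/// Remove finished jobs whose age has reached `ttl`.
/// Unfinished jobs are always kept. Returns the number of jobs removed.
pub fn prune_stale_export_jobs(
    jobs: &mut HashMap<String, ExportJob>,
    now: Instant,
    ttl: Duration,
) -> usize {
    let before = jobs.len();
    jobs.retain(|_, job| match job.finished_at {
        Some(done) => now.saturating_duration_since(done) < ttl,
        None => true,
    });
    before - jobs.len()
}
```

Passing `now` in as a parameter keeps the function deterministic in tests; combined with a per-job record cap, it bounds how much export data stays resident.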

This packet is ready to merge from the PR-comment review perspective.

@user1303836 user1303836 merged commit c2baa46 into main Mar 8, 2026
4 checks passed
@user1303836 user1303836 deleted the p4-w2-export-jobs branch March 8, 2026 23:36