feat: snapshot CLI + Jira/Salesforce with smart seeding#16

Closed
zozo123 wants to merge 5 commits into main from feat/airbyte-services

@zozo123 zozo123 commented Feb 15, 2026

Summary

Adds doubleagent snapshot CLI + Jira/Salesforce fake services + the snapshot_pull tool for Airbyte connectors. No SDK dependency — each service is self-contained.

What's in this PR

1. Rust CLI: doubleagent snapshot (5 subcommands)

  • pull <service> — auto-detects connector type from service.yaml:
    • type: airbyte → delegates to snapshot_pull (PyAirbyte, no Docker)
    • type: native → runs service's connector/pull.py
  • list, inspect, delete, push — manage snapshot profiles

2. snapshot_pull tool (services/_lib/src/snapshot_pull/)

Standalone Python package the CLI calls for Airbyte-backed services:

  • PyAirbyte backend — pip-installed connectors, no Docker needed
  • Smart relational filtering — BFS traversal: 3 projects → 10 issues each → 5 comments per issue. Small, relationally-consistent datasets instead of brute-force dumps
  • PII redaction — deterministic anonymization before writing to disk (emails → user-N@doubleagent.local)
  • Snapshot storage — manifest + per-resource JSON files at ~/.doubleagent/snapshots/
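
The smart relational filtering bullet above can be sketched roughly as follows. This is a minimal illustration, not the code in `snapshot_pull`: the `RULES` table, the `smart_filter` function, and the `project_id` / `issue_id` foreign-key field names are all assumptions made for the example.

```python
from collections import deque

# Hypothetical seeding rules (not the PR's actual config schema):
# child stream -> (parent stream, foreign-key field, max children per parent).
RULES = {
    "issues": ("projects", "project_id", 10),
    "comments": ("issues", "issue_id", 5),
}


def smart_filter(records, seed_stream, seed_limit, rules):
    """Keep `seed_limit` seed records, then walk child streams breadth-first."""
    kept = {seed_stream: records.get(seed_stream, [])[:seed_limit]}
    queue = deque([seed_stream])
    while queue:
        parent = queue.popleft()
        parent_ids = {rec["id"] for rec in kept[parent]}
        for child, (rule_parent, fk, per_parent) in rules.items():
            if rule_parent != parent or child in kept:
                continue
            counts = {}  # parent id -> number of children kept so far
            selected = []
            for rec in records.get(child, []):
                pid = rec.get(fk)
                if pid in parent_ids and counts.get(pid, 0) < per_parent:
                    counts[pid] = counts.get(pid, 0) + 1
                    selected.append(rec)
            kept[child] = selected
            queue.append(child)  # visit this stream's own children next
    return kept
```

Calling `smart_filter(records, "projects", 3, RULES)` keeps only records reachable from the three seed projects, so every kept comment points at a kept issue.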

3. Jira + Salesforce services (read-only, snapshot-backed)

Self-contained FastAPI servers with inline COW state + namespace isolation:

  • /resources — list all resource types with counts
  • /resources/{type} — list/search/filter resources (supports ?q=field=value)
  • /resources/{type}/{id} — get by ID
  • Full control plane (/_doubleagent/health, reset, seed, bootstrap, info, namespaces)

Jira: 8 streams (projects, issues, comments, sprints, boards, users, fields, workflows)
Salesforce: 8 streams (accounts, contacts, leads, opportunities, cases, users, tasks, events)
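
For illustration, the `?q=field=value` filter on `/resources/{type}` might behave like the sketch below. The function names and the exact matching rule (string equality on a single field) are assumptions; the PR does not spell out the filter semantics.

```python
def parse_query(q):
    """Split 'field=value' into (field, value); only the first '=' separates them."""
    field, sep, value = q.partition("=")
    if not sep or not field:
        raise ValueError(f"expected field=value, got {q!r}")
    return field, value


def list_resources(state, resource_type, q=None):
    """Return all records of a type, optionally filtered by one field."""
    items = state.get(resource_type, [])
    if q is None:
        return items
    field, value = parse_query(q)
    # Compare as strings because query parameters arrive as text.
    return [item for item in items if field in item and str(item[field]) == value]
```

A request such as `/resources/issues?q=status=open` would then map to `list_resources(state, "issues", "status=open")`.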

Architecture

doubleagent snapshot pull jira
  → Rust CLI reads service.yaml → type: airbyte
  → uv run python -m snapshot_pull --service jira --image airbyte/source-jira:latest \
      --streams projects,issues,... --config-env JIRA_API_TOKEN=api_token \
      --seeding-json '{"seed_streams": [...]}'
  → PyAirbyte pulls 8 streams
  → Smart filter: 3 projects → issues/sprints/boards per project → comments per issue
  → PII redacted → saved to ~/.doubleagent/snapshots/jira/default/

doubleagent start jira
  → FastAPI server loads snapshot as read-only baseline
  → /resources/projects, /resources/issues, etc.
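
The "PII redacted" step in the pull flow above could look something like this. It is a sketch under assumptions: the `Redactor` class, the email regex, and the choice to number aliases in first-seen order are illustrative, and only the `user-N@doubleagent.local` output format comes from the PR description.

```python
import re

# Deliberately simple email pattern; a real redactor may need something stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Redactor:
    """Replace each distinct email with a stable user-N@doubleagent.local alias."""

    def __init__(self):
        self._aliases = {}  # lowercased email -> alias

    def _alias(self, email):
        key = email.lower()
        if key not in self._aliases:
            self._aliases[key] = f"user-{len(self._aliases) + 1}@doubleagent.local"
        return self._aliases[key]

    def redact(self, value):
        """Recursively redact emails inside strings, lists, and dicts."""
        if isinstance(value, str):
            return EMAIL_RE.sub(lambda m: self._alias(m.group(0)), value)
        if isinstance(value, list):
            return [self.redact(v) for v in value]
        if isinstance(value, dict):
            return {k: self.redact(v) for k, v in value.items()}
        return value
```

Because the same address always maps to the same alias within one run, references between records (an issue's reporter and a comment's author, say) stay consistent after redaction.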

Test plan

  • cargo check — zero warnings
  • Jira smoke test — health, bootstrap, resource CRUD, namespace isolation, reset
  • Salesforce smoke test — health, bootstrap, resource CRUD, filtering
  • python -m snapshot_pull --help — entry point works
  • Python imports resolve (all 5 modules)

🤖 Generated with Claude Code

@tomerezer

Not sure we want to integrate with Airbyte automatically?

@zozo123 zozo123 force-pushed the feat/airbyte-services branch from 02dd1f0 to a6bf78f Compare February 16, 2026 13:20
@zozo123 zozo123 changed the base branch from feat/sdk-library to main February 16, 2026 13:20
zozo123 and others added 3 commits February 16, 2026 15:51
…eeding

Adds the `doubleagent snapshot` CLI command and wires up Jira and
Salesforce as zero-code Airbyte-backed services with smart seeding.

Rust CLI:
- snapshot pull/list/inspect/delete/push subcommands
- ConnectorConfig struct with type/image/streams/config_env/stream_mapping
- seeding + backend fields for smart relational filtering
- Passes --seeding-json and --backend to Python subprocess

Services (zero custom code — use generic_server.py from SDK):
- Jira: 8 streams, smart seeding (3 projects → issues/sprints/boards/comments)
- Salesforce: 8 streams, smart seeding (5 accounts → contacts/opportunities/cases)

Smart seeding means: pull 3 Jira projects, follow foreign keys to get
only their issues (max 10 each), then follow to comments (max 5 per issue).
Result: small, relationally-consistent datasets instead of brute-force dumps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Write self-contained main.py for Jira and Salesforce services
  (inline StateOverlay, NamespaceRouter, read-only resource endpoints)
- Replace SDK generic_server with per-service FastAPI servers
- Add crates/core/src/snapshot.rs for Rust-side snapshot manifest I/O
- Simplify snapshot CLI pull command to delegate to connector/pull.py
- Remove doubleagent-sdk dependency from all service pyproject.toml
- Fix seed.rs for Option<String> file argument

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… pull

Adds the `snapshot_pull` Python package in services/_lib/ that handles
pulling data from Airbyte source connectors (PyAirbyte backend, no Docker
needed). Includes smart relational filtering, PII redaction, and snapshot
storage.

Restores Airbyte routing in the Rust CLI — `doubleagent snapshot pull jira`
now detects `type: airbyte` in service.yaml and delegates to
`python -m snapshot_pull` with the connector config.

This is infrastructure tooling (called by the CLI), not a service SDK.
Services remain fully self-contained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zozo123 zozo123 force-pushed the feat/airbyte-services branch from 76f95a4 to b186670 Compare February 16, 2026 13:52
@zozo123 zozo123 requested a review from tomerezer February 16, 2026 14:44

zozo123 commented Feb 16, 2026

Needs revision per feedback:

  • Snapshot should work with the existing /_doubleagent/seed endpoint, not a separate /_doubleagent/bootstrap with immutable baselines
  • Remove namespace isolation from Jira/Salesforce services
  • Remove COW state / inlined SDK classes — use simple dict state
  • The doubleagent snapshot pull + PyAirbyte infrastructure is useful, but the output should just be seed-compatible YAML/JSON files

The snapshot CLI and Airbyte connector approach are good — just need to feed into seed instead of a separate bootstrap concept.
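
One way the "seed-compatible" output could be produced from a snapshot directory is sketched below. The layout is assumed, not confirmed by the PR: one `<resource>.json` file per resource type holding a list of records, a manifest named `manifest.json`, and a seed payload shaped as `{resource_type: [records]}`.

```python
import json
from pathlib import Path


def snapshot_to_seed(snapshot_dir, manifest_name="manifest.json"):
    """Collect per-resource JSON files into one {resource_type: records} dict."""
    seed = {}
    for path in sorted(Path(snapshot_dir).glob("*.json")):
        if path.name == manifest_name:
            continue  # the manifest describes the snapshot, it is not a resource
        seed[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return seed
```

The resulting dict could be written to a single JSON file and passed to whatever the existing seed mechanism accepts, so no separate bootstrap concept is needed.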

…es, COW state

Strip StateOverlay/NamespaceRouter from Jira and Salesforce snapshot servers,
replace with simple dict state. Remove bootstrap/info/namespaces endpoints.
Remove --hard, --namespace, --validate-against-real CLI flags. Remove httpx
dependency from both services.

322 → 122 lines per service. Plain Python, zero framework overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zozo123 commented Feb 16, 2026

Revised — Simplified Jira/Salesforce services and CLI per feedback:

Python services:

  • Removed StateOverlay/NamespaceRouter from both Jira and Salesforce servers — ~200 lines deleted each
  • Replaced with simple global dict state
  • Removed bootstrap/info/namespaces control plane endpoints
  • Removed httpx dependency from both services
  • 322 → 122 lines per service

CLI:

  • Removed --hard flag from doubleagent reset
  • Removed --namespace flag from doubleagent start and doubleagent run
  • Removed --validate-against-real flag from doubleagent contract
  • Kept snapshot subcommand (pull/list/inspect/delete/push) and --snapshot on start/run

cargo check passes clean.
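
As a rough picture of what "simple global dict state" means for the services, here is a framework-free sketch. The function names and the `id` lookup field are assumptions for illustration; the real servers wrap equivalent logic in FastAPI routes.

```python
# Module-level state: resource type -> list of records (plain dicts).
STATE = {}


def seed(payload):
    """Merge seed records into STATE and report how many were added per type."""
    added = {}
    for resource_type, items in payload.items():
        STATE.setdefault(resource_type, []).extend(items)
        added[resource_type] = len(items)
    return added


def reset():
    """Drop all seeded data."""
    STATE.clear()


def resource_counts():
    """Counts per resource type, as a /resources listing might report them."""
    return {resource_type: len(items) for resource_type, items in STATE.items()}


def get_resource(resource_type, resource_id):
    """Look up one record by id; ids are compared as strings."""
    for item in STATE.get(resource_type, []):
        if str(item.get("id")) == str(resource_id):
            return item
    return None
```

With no overlay or namespace layer, reset is a single `clear()` and seeding is a dict merge, which is where most of the line-count reduction comes from.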

…tracts

- Run cargo fmt on snapshot.rs
- Fix clippy unnecessary_unwrap: use if-let instead of is_some + unwrap
- Update CI discovery to only include services with a contracts/ dir
  (Jira/Salesforce are snapshot-only, no contract tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@tomerezer tomerezer left a comment

I'd separate the snapshot work into its own PR.

zozo123 added a commit that referenced this pull request Feb 23, 2026
Cherry-picked from feat/airbyte-services (#16):
- Jira fake service (8 streams, snapshot-backed)
- Salesforce fake service (8 streams, snapshot-backed)
- CI: discover only services with contracts/ dir
- .gitignore: snapshot and venv patterns
- _lib/pyproject.toml for snapshot_pull package

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zozo123 commented Feb 23, 2026

Superseded by #33 (consolidated PR)

@zozo123 zozo123 closed this Feb 23, 2026