Security testing framework for RAG systems. Measures information leakage, detects corpus poisoning, and gates CI pipelines, all deterministically.
Read the Whitepaper (PDF): threat model, methodology, and evaluation results.
- Features
- Quickstart
- CLI Commands
- CI Integration
- Configuration
- Benchmark Bundles
- External Results
- Updating Baseline
- Adoption
- Documentation
- Project Governance
- Development
- Local Gates
- Project Structure
| Threat | Pack | Description |
|---|---|---|
| Canary Extraction | `canary-basic` | Detects planted secret tokens in outputs |
| Verbatim Extraction | `verbatim-basic` | Measures direct text reproduction from corpus |
| Membership Inference | `membership-basic` | Detects if specific documents were in corpus |
| Semantic Leakage | `semantic-basic` | Detects sensitive claims (financial, medical, legal) |
| Cross-Document | `crossdoc-basic` | Detects information combined from multiple documents |
| Corpus Poisoning | `sentinel-takeover-safe` | Detects backdoor triggers and sentinel injections |
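As a rough illustration of the canary extraction check listed above, the sketch below scans a model output for planted tokens. The `CANARY-` token format and the `find_canaries` helper are assumptions made for this example, not RAGLeakLab's actual implementation.

```python
def find_canaries(output: str, canaries: list[str]) -> list[str]:
    """Return the planted canary tokens that appear verbatim in a model output."""
    return [token for token in canaries if token in output]


# Hypothetical canary tokens planted in the corpus before the run.
planted = ["CANARY-7f3a9c", "CANARY-12be40"]

leaked = find_canaries("The internal code is CANARY-7f3a9c.", planted)
canary_extracted = bool(leaked)  # mirrors the boolean metric name used in this README
```

A plain substring match keeps the check deterministic: the same output always yields the same verdict.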
- Regression gates: `diff` command exits non-zero on metric regression
- Delta ingestion gates: detect leakage regressions when corpus changes
- SARIF + JUnit output: findings in GitHub Security tab and test reporters
- Deterministic by default: `verify determinism` command validates reproducibility
- Cassette record/replay: network-free CI with recorded HTTP responses
- Report summarizer: `report summarize` with top findings, attribution, remediation
- GitHub annotations: `report annotate` emits `::error::`/`::warning::` in PRs
- Threshold calibration: `calibrate` finds optimal pass/fail thresholds
- Benchmark bundles: `bench bundle` runs all packs, produces leaderboard results
- Secret redaction: emails, API keys, canary tokens scrubbed from all outputs
- Query minimization: `--minimize-on-fail` reduces leaking queries to minimal form
- Parallel execution: `--jobs N` for multi-core speedup
- Plugin system: entry-point plugins for custom metrics, attacks, and targets
- HTTP target adapter: test any RAG API with SSRF protection and domain allowlisting
- Asset validation: `assets validate` checks pack manifests and corpora
- Config validation: `config validate` with JSON Schema export
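To make the secret redaction feature concrete, here is a minimal sketch of regex-based scrubbing. The three patterns (including the `sk-` key prefix and the `CANARY-` token format) and the `redact` helper are illustrative assumptions; the rules RAGLeakLab actually applies may differ.

```python
import re

# Illustrative patterns only; these are assumptions, not the tool's real rules.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    "CANARY": re.compile(r"\bCANARY-[0-9a-f]{6,}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


print(redact("Contact alice@example.com with key sk-abcdefghijklmnop1234"))
```

Labelled placeholders keep reports readable while showing which kind of secret was removed.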
# Install
uv sync --all-extras
# Run a built-in pack
uv run ragleaklab run --pack canary-basic --out out/canary
# Run against your corpus
uv run ragleaklab run \
--corpus data/corpus_private_canary \
--attacks data/attacks \
--out out/
# Compare against baseline (for CI)
uv run ragleaklab diff \
--baseline baselines/v1/report.json \
--current out/report.json

| File | Purpose |
|---|---|
| `out/report.json` | Summary metrics, pass/fail verdict |
| `out/runs.jsonl` | Per-case results (1 JSON per line) |
| `out/junit.xml` | JUnit test results (with `--format junit`) |
| `out/results.sarif` | SARIF findings (with `--format sarif`) |
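Because `out/runs.jsonl` holds one JSON object per line, it can be streamed without loading the whole file. The sketch below is one way to do that; the per-case record fields are not documented in this README, so the `iter_runs` helper (a hypothetical name) makes no assumption about them.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_runs(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `records = list(iter_runs(Path("out/runs.jsonl")))` collects every per-case result from a finished run.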
ragleaklab run Run attack test cases against a target
ragleaklab diff Compare reports for regressions
ragleaklab calibrate Calibrate pack thresholds
ragleaklab bench Benchmark bundle / time / publish / validate
ragleaklab delta Delta ingestion gate
ragleaklab report Summarize / annotate findings
ragleaklab verify Verify determinism
ragleaklab attacks Attack coverage analysis
ragleaklab assets Asset build / validate
ragleaklab config Config validate / export
ragleaklab version Show version info
RAGLeakLab is designed for CI pipelines. The diff command exits with code 1
on regression:
# .github/workflows/security-audit.yml
- name: Security audit
  run: |
    uv run ragleaklab run \
      --corpus data/corpus_private_canary \
      --attacks data/attacks \
      --out out/ \
      --format junit \
      --format sarif
- name: Upload test results
  uses: dorny/test-reporter@v1
  with:
    name: RAGLeakLab Results
    path: out/junit.xml
    reporter: java-junit
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: out/results.sarif
- name: Regression gate
  run: |
    uv run ragleaklab diff \
      --baseline baselines/v1/report.json \
      --current out/report.json

| Format | File | Purpose |
|---|---|---|
| `--format json` | `report.json` | Machine-readable summary |
| `--format junit` | `junit.xml` | Test results in CI UI |
| `--format sarif` | `results.sarif` | GitHub Security alerts |
| `--format md` | `summary.md` | Human-readable Markdown |
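For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF 2.1.0 log of the general shape GitHub's Security tab accepts. The `to_sarif` helper, the finding dictionary keys, and the rule ID are assumptions for illustration; they do not describe the exact contents of `results.sarif`.

```python
import json


def to_sarif(findings: list[dict]) -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log with a single run."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": "RAGLeakLab"}},
                "results": [
                    {
                        "ruleId": f["rule_id"],
                        "level": f["level"],  # "error", "warning", or "note"
                        "message": {"text": f["message"]},
                    }
                    for f in findings
                ],
            }
        ],
    }


# Hypothetical finding; the rule ID and message are made up for this example.
log = to_sarif([
    {"rule_id": "canary-extraction", "level": "error",
     "message": "Canary token found in model output"},
])
print(json.dumps(log, indent=2))
```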
| Metric | Fail Condition |
|---|---|
| `canary_extracted` | `false` → `true` |
| `verbatim_leakage_rate` | Increase > 1% |
| `membership_confidence` | Increase > 5% |
See docs/CI.md for anti-patterns and best practices.
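The fail conditions above can be read as a simple comparison between two reports. The following sketch assumes a flat report layout keyed by metric name and treats the percentages as absolute deltas (consistent with `verbatim_delta: 0.01` and `membership_delta: 0.05` in the example config); both are assumptions, and `find_regressions` is a hypothetical helper rather than the `diff` command's real logic.

```python
# Thresholds from the fail conditions above, read as absolute deltas (an assumption).
MAX_VERBATIM_INCREASE = 0.01
MAX_MEMBERSHIP_INCREASE = 0.05


def find_regressions(baseline: dict, current: dict) -> list[str]:
    """Return a human-readable reason for each metric that regressed."""
    reasons = []
    if not baseline["canary_extracted"] and current["canary_extracted"]:
        reasons.append("canary_extracted: false -> true")
    if current["verbatim_leakage_rate"] - baseline["verbatim_leakage_rate"] > MAX_VERBATIM_INCREASE:
        reasons.append("verbatim_leakage_rate increased by more than 1%")
    if current["membership_confidence"] - baseline["membership_confidence"] > MAX_MEMBERSHIP_INCREASE:
        reasons.append("membership_confidence increased by more than 5%")
    return reasons


# Hypothetical report values for illustration.
baseline = {"canary_extracted": False, "verbatim_leakage_rate": 0.02, "membership_confidence": 0.50}
current = {"canary_extracted": False, "verbatim_leakage_rate": 0.05, "membership_confidence": 0.52}

regressions = find_regressions(baseline, current)
exit_code = 1 if regressions else 0  # a CI wrapper would pass this to sys.exit()
```

Only increases count as regressions here, so an improved metric never fails the gate.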
Use the summarizer to understand what leaked:
# Summarize findings from the output directory
uv run ragleaklab report summarize --in out/ --top 20
# For markdown output (good for PR comments)
uv run ragleaklab report summarize --in out/ --format md
# Emit GitHub-style annotations (::error::, ::warning::)
uv run ragleaklab report annotate --in out/

See docs/TRIAGE.md for the complete triage guide.
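GitHub renders `::error::` and `::warning::` lines printed to the job log as annotations. The sketch below shows how such lines can be formatted, including the escaping GitHub expects for `%` and newlines; the `annotation` helper and the sample messages are hypothetical and not taken from `report annotate`.

```python
def annotation(level: str, message: str) -> str:
    """Format a GitHub Actions workflow command such as ::error::message."""
    if level not in {"error", "warning", "notice"}:
        raise ValueError(f"unsupported level: {level}")
    # Workflow commands are line-based, so escape %, CR, and LF in the message.
    escaped = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return f"::{level}::{escaped}"


print(annotation("error", "canary token leaked in case 12"))
print(annotation("warning", "verbatim leakage rate rose"))
```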
Use --config for full configuration including HTTP targets:
uv run ragleaklab run --config ragleaklab.yaml --out out/

Example config (see examples/ragleaklab.yaml):
corpus:
  path: data/corpus_private_canary
attacks:
  path: data/attacks
thresholds:
  verbatim_delta: 0.01
  membership_delta: 0.05
# Built-in pipeline (default)
target:
  type: inprocess
  top_k: 3
# OR: External HTTP RAG service
# target:
#   type: http
#   url: http://localhost:8000/ask
#   method: POST
#   request_json:
#     question: "{{query}}"
#   response:
#     answer_field: "answer"
#   headers:
#     Authorization: "Bearer ${API_TOKEN}"
#   timeout_sec: 30
#   allowed_domains: [rag.example.com]

Warning: Do not use HTTP targets in CI without cassette record/replay. Live calls are non-deterministic and may incur costs.
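The HTTP target options in the example config suggest three steps: check the URL against `allowed_domains`, fill the `{{query}}` placeholder in `request_json`, and read the reply from `answer_field`. The sketch below walks through those steps without any network call; the helper names and the exact matching rules are assumptions, not the adapter's real behavior.

```python
from urllib.parse import urlparse


def check_allowed(url: str, allowed_domains: list[str]) -> None:
    """Raise ValueError unless the URL's host is on the allowlist."""
    host = urlparse(url).hostname
    if host not in allowed_domains:
        raise ValueError(f"host {host!r} is not in allowed_domains")


def render_request(template: dict, query: str) -> dict:
    """Substitute the {{query}} placeholder in each string value of the body template."""
    return {
        key: value.replace("{{query}}", query) if isinstance(value, str) else value
        for key, value in template.items()
    }


def extract_answer(response_json: dict, answer_field: str) -> str:
    """Pull the answer text out of a decoded JSON response."""
    return response_json[answer_field]


check_allowed("https://rag.example.com/ask", ["rag.example.com"])
body = render_request({"question": "{{query}}"}, "What is the refund policy?")
answer = extract_answer({"answer": "30 days"}, "answer")  # stand-in for a real response
```

Matching on the parsed hostname rather than the raw URL string avoids being fooled by a permitted domain appearing in the path or query.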
Run all packs as a benchmark suite:
# Run benchmark bundle
uv run ragleaklab bench bundle \
--bundle benchmarks/ragleakbench_v1/bundle.yaml \
--out out/bench
# Publish results
uv run ragleaklab bench publish \
--in out/bench \
--bundle benchmarks/ragleakbench_v1/bundle.yaml \
--out results/v1_results.json

Community benchmark results from third-party RAG systems live in
external_results/. All published results are
redacted and secret-scanned: no emails, tokens, or API keys.
# Publish after running a benchmark bundle
uv run ragleaklab bench publish-external \
--bench out/bench \
--system-name "My RAG System" \
--system-type oss \
--out external_results/my_system.json
# Validate before submitting a PR
uv run ragleaklab bench validate-external \
--file external_results/my_system.json

See external_results/README.md for the full schema, safety guarantees, and contribution guide.

Leaderboard: external_results/TABLE.md (auto-generated; regenerate with
uv run ragleaklab results build-table --in external_results/ --out external_results/TABLE.md)
Baselines are updated manually to ensure human review:
# Generate new baseline
uv run ragleaklab run \
--corpus data/corpus_private_canary \
--attacks data/attacks \
--out baselines/v1/
# Review and commit
git diff baselines/v1/report.json
git add baselines/v1/report.json
git commit -m "baseline: update after [reason]"

New to RAGLeakLab? Start here → docs/ADOPTION.md
Covers 30-minute quick integration, delta ingestion gates, failure triage, baseline updates, security posture, and a phased rollout plan (dry-run → warn-only → block merges).
See also: docs/SELL_SHEET.md, a one-page feature overview.
| Document | Description |
|---|---|
| docs/ADOPTION.md | Enterprise adoption guide |
| docs/SELL_SHEET.md | One-page feature overview |
| docs/threat_model.md | Formal threat model |
| docs/ARCHITECTURE.md | Module structure and data flow |
| docs/CONFIG.md | Configuration reference and schema |
| docs/REPORT_SCHEMA.md | Report field descriptions |
| docs/V1_CONTRACTS.md | V1 public contract catalogue |
| docs/V1_PREFLIGHT.md | V1 release preflight checklist |
| docs/STABILITY.md | Stability policy and versioning |
| docs/BASELINE_POLICY.md | Baseline update policy |
| docs/EXTENDING.md | Writing plugins |
| docs/PLUGIN_COOKBOOK.md | Plugin development cookbook |
| docs/CI.md | CI integration guide |
| docs/CI_PARITY.md | CI parity between local and remote |
| docs/DOCKER.md | Container build and run |
| docs/ACTION.md | GitHub Action usage |
| docs/INTEGRATIONS.md | HTTP target examples |
| docs/INTEGRITY_TESTING.md | Integrity and poisoning testing |
| docs/RECORD_REPLAY.md | Cassette record/replay for HTTP |
| docs/CALIBRATION.md | Threshold calibration guide |
| docs/BENCHMARKS.md | Benchmark bundle reference |
| docs/DELTA_GATE.md | Delta ingestion gate |
| docs/WORKFLOWS.md | GitHub Actions workflow patterns |
| docs/TRIAGE.md | Failure triage guide |
| docs/PERFORMANCE.md | Performance tuning |
| docs/SUPPRESSIONS.md | Finding suppression system |
| docs/ROADMAP.md | Future roadmap |
| docs/poisoning.md | Corpus poisoning detection |
| docs/SECURITY_TOOLING.md | Security tooling overview |
| docs/RFC.md | RFC governance process |
| docs/GOOD_FIRST_ISSUES.md | Beginner-friendly tasks |
| docs/RELEASE.md | Release process |
| docs/ASSETS.md | Asset build and validation |
| docs/CASE_STUDIES.md | Security case studies |
| docs/REPO_HEALTH.md | Repository health overview |
| docs/threats/ | Individual threat specifications |
| CONTRIBUTING.md | How to contribute |
| SECURITY.md | Security policy |
| CHANGELOG.md | Version history |
RAGLeakLab uses lightweight governance to keep the project cohesive as it grows:
| Process | What it covers | Document |
|---|---|---|
| RFC | New threat classes, core metrics, breaking changes | docs/RFC.md |
| Baseline Policy | When and how baselines can be updated | docs/BASELINE_POLICY.md |
| Stability Contracts | What constitutes a breaking change | docs/STABILITY.md |
| Contributing | Code style, testing, commit conventions | CONTRIBUTING.md |
| Security | Vulnerability reporting | SECURITY.md |
An RFC is required for:
- Adding a new threat class (e.g. prompt injection, model extraction)
- Adding a new core metric (shipped with the main package)
- Making a breaking change to report schemas, CLI flags, or contracts
- Adding a new claim type (e.g. `attribution`, `privacy`)

Everything else (bug fixes, docs, integration recipes, external plugins) just needs a regular PR.
See docs/RFC.md for the full process.
uv run ruff format . # Format
uv run ruff check . # Lint
uv run pytest -q # Test

Local CI gates ensure you don't push broken code. Set up once per clone:
# Install pre-commit and pre-push hooks
uv run pre-commit install
uv run pre-commit install --hook-type pre-push

Run the full CI check manually:
./scripts/ci_smoke.sh
⚠️ Anti-pattern: `git push --no-verify` bypasses the pre-push hook. Use only in emergencies.
src/ragleaklab/ # Main package
├── cli/ # CLI commands (run, diff, bench, calibrate, ...)
├── core/ # Contracts, determinism, version, plugin system
├── config/ # YAML config loading and validation
├── attacks/ # Test case schema, strategy catalog, runner
├── packs/ # Built-in threat packs (canary, verbatim, membership, ...)
├── corpus/ # Document loading and chunking
├── metrics/ # Leakage measurement (canary, verbatim, membership, semantic)
├── rag/ # Reference pipeline (TF-IDF retrieval, mock generation)
├── targets/ # Target adapters (in-process, HTTP, mock)
├── reporting/ # Output schemas (JSON, SARIF, JUnit)
├── regression/ # Baseline comparison for CI gates
├── bench/ # Benchmark bundles and results
├── calibration/ # Threshold calibration
├── poisoning/ # Corpus poisoning detection
├── analysis/ # Attack coverage analysis
├── assets/ # Asset generation and validation
├── ci/ # CI policy checks (baseline policy)
└── suppressions/ # Finding suppression system
tests/ # Test files (995+ tests)
docs/ # Documentation (40+ documents)
data/ # Test data and corpora
baselines/ # CI baselines
benchmarks/ # Benchmark bundles
integrations/ # Framework integration recipes
examples/ # Sample files
scripts/ # CI smoke, SBOM generation
templates/ # Plugin development templates
external_results/ # Community benchmark results