Deterministic security testing for RAG pipelines: measure retrieval-induced data leakage with CI-ready metrics.

License

Notifications You must be signed in to change notification settings

mishabar410/RAGLeakLab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

105 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

# RAGLeakLab

CI Release Python 3.12+ v1.0.0

Security testing framework for RAG systems. Measures information leakage, detects corpus poisoning, and gates CI pipelines — all deterministically.

📄 Read the Whitepaper (PDF) — threat model, methodology, and evaluation results.




## Features

### Security Testing

| Threat | Pack | Description |
| --- | --- | --- |
| Canary Extraction | `canary-basic` | Detects planted secret tokens in outputs |
| Verbatim Extraction | `verbatim-basic` | Measures direct text reproduction from corpus |
| Membership Inference | `membership-basic` | Detects if specific documents were in corpus |
| Semantic Leakage | `semantic-basic` | Detects sensitive claims (financial, medical, legal) |
| Cross-Document | `crossdoc-basic` | Detects information combined from multiple documents |
| Corpus Poisoning | `sentinel-takeover-safe` | Detects backdoor triggers and sentinel injections |

### CI & Automation

- **Regression gates** — `diff` command exits non-zero on metric regression
- **Delta ingestion gates** — detect leakage regressions when corpus changes
- **SARIF + JUnit output** — findings in GitHub Security tab and test reporters
- **Deterministic by default** — `verify determinism` command validates reproducibility
- **Cassette record/replay** — network-free CI with recorded HTTP responses

### Analysis & Reporting

- **Report summarizer** — `report summarize` with top findings, attribution, remediation
- **GitHub annotations** — `report annotate` emits `::error::` / `::warning::` in PRs
- **Threshold calibration** — `calibrate` finds optimal pass/fail thresholds
- **Benchmark bundles** — `bench bundle` runs all packs, produces leaderboard results
- **Secret redaction** — emails, API keys, canary tokens scrubbed from all outputs
- **Query minimization** — `--minimize-on-fail` reduces leaking queries to minimal form
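
Conceptually, secret redaction is a pattern scrub applied to every output string before it is written. A minimal sketch of the idea (the pattern set and function name are illustrative assumptions, not RAGLeakLab's internals):

```python
import re

# Illustrative patterns only; the real redactor's rule set is internal to RAGLeakLab.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    "CANARY": re.compile(r"\bCANARY-[A-F0-9]{8}\b"),
}

def redact(text: str) -> str:
    """Replace every match with a [REDACTED:<kind>] placeholder."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text

print(redact("contact alice@example.com, token sk-abcdef1234567890XY"))
# contact [REDACTED:EMAIL], token [REDACTED:API_KEY]
```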

### Developer Experience

- **Parallel execution** — `--jobs N` for multi-core speedup
- **Plugin system** — entry-point plugins for custom metrics, attacks, and targets
- **HTTP target adapter** — test any RAG API with SSRF protection and domain allowlisting
- **Asset validation** — `assets validate` checks pack manifests and corpora
- **Config validation** — `config validate` with JSON Schema export

## Quickstart

```bash
# Install
uv sync --all-extras

# Run a built-in pack
uv run ragleaklab run --pack canary-basic --out out/canary

# Run against your corpus
uv run ragleaklab run \
  --corpus data/corpus_private_canary \
  --attacks data/attacks \
  --out out/

# Compare against baseline (for CI)
uv run ragleaklab diff \
  --baseline baselines/v1/report.json \
  --current out/report.json
```

### Output Files

| File | Purpose |
| --- | --- |
| `out/report.json` | Summary metrics, pass/fail verdict |
| `out/runs.jsonl` | Per-case results (1 JSON object per line) |
| `out/junit.xml` | JUnit test results (with `--format junit`) |
| `out/results.sarif` | SARIF findings (with `--format sarif`) |
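
Because `out/runs.jsonl` stores one JSON object per line, per-case results are easy to post-process with a few lines of Python. A sketch (the `id` and `passed` field names are assumptions; docs/REPORT_SCHEMA.md documents the real schema):

```python
import json

def failed_cases(jsonl_text: str, verdict_field: str = "passed"):
    """Yield per-case records whose verdict field is falsy."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        case = json.loads(line)  # one JSON object per line
        if not case.get(verdict_field, True):
            yield case

sample = '{"id": "c1", "passed": true}\n{"id": "c2", "passed": false}'
print([c["id"] for c in failed_cases(sample)])  # ['c2']
```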

## CLI Commands

```
ragleaklab run          Run attack test cases against a target
ragleaklab diff         Compare reports for regressions
ragleaklab calibrate    Calibrate pack thresholds
ragleaklab bench        Benchmark bundle / time / publish / validate
ragleaklab delta        Delta ingestion gate
ragleaklab report       Summarize / annotate findings
ragleaklab verify       Verify determinism
ragleaklab attacks      Attack coverage analysis
ragleaklab assets       Asset build / validate
ragleaklab config       Config validate / export
ragleaklab version      Show version info
```

## CI Integration

RAGLeakLab is designed for CI pipelines. The `diff` command exits with code 1 on regression:

```yaml
# .github/workflows/security-audit.yml
- name: Security audit
  run: |
    uv run ragleaklab run \
      --corpus data/corpus_private_canary \
      --attacks data/attacks \
      --out out/ \
      --format junit \
      --format sarif

- name: Upload test results
  uses: dorny/test-reporter@v1
  with:
    name: RAGLeakLab Results
    path: out/junit.xml
    reporter: java-junit

- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: out/results.sarif

- name: Regression gate
  run: |
    uv run ragleaklab diff \
      --baseline baselines/v1/report.json \
      --current out/report.json
```

### Output Formats

| Format | File | Purpose |
| --- | --- | --- |
| `--format json` | `report.json` | Machine-readable summary |
| `--format junit` | `junit.xml` | Test results in CI UI |
| `--format sarif` | `results.sarif` | GitHub Security alerts |
| `--format md` | `summary.md` | Human-readable Markdown |

### Regression Rules

| Metric | Fail Condition |
| --- | --- |
| `canary_extracted` | `false` → `true` |
| `verbatim_leakage_rate` | Increase > 1% |
| `membership_confidence` | Increase > 5% |
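
The fail conditions above amount to a small comparison over the two reports. The sketch below illustrates the logic only (metric names come from the table; the function itself is not RAGLeakLab's implementation):

```python
def regressed(baseline: dict, current: dict) -> bool:
    """True if the current report violates any documented regression rule."""
    if not baseline["canary_extracted"] and current["canary_extracted"]:
        return True  # canary flipped false -> true
    if current["verbatim_leakage_rate"] - baseline["verbatim_leakage_rate"] > 0.01:
        return True  # verbatim leakage rose by more than 1%
    if current["membership_confidence"] - baseline["membership_confidence"] > 0.05:
        return True  # membership confidence rose by more than 5%
    return False
```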

See docs/CI.md for anti-patterns and best practices.

### If CI Fails

Use the summarizer to understand what leaked:

# Summarize findings from the output directory
uv run ragleaklab report summarize --in out/ --top 20

# For markdown output (good for PR comments)
uv run ragleaklab report summarize --in out/ --format md

# Emit GitHub-style annotations (::error::, ::warning::)
uv run ragleaklab report annotate --in out/

See docs/TRIAGE.md for the complete triage guide.


## Configuration

Use `--config` for full configuration including HTTP targets:

```bash
uv run ragleaklab run --config ragleaklab.yaml --out out/
```

Example config (see examples/ragleaklab.yaml):

```yaml
corpus:
  path: data/corpus_private_canary
attacks:
  path: data/attacks
thresholds:
  verbatim_delta: 0.01
  membership_delta: 0.05

# Built-in pipeline (default)
target:
  type: inprocess
  top_k: 3

# OR: External HTTP RAG service
# target:
#   type: http
#   url: http://localhost:8000/ask
#   method: POST
#   request_json:
#     question: "{{query}}"
#   response:
#     answer_field: "answer"
#   headers:
#     Authorization: "Bearer ${API_TOKEN}"
#   timeout_sec: 30
#   allowed_domains: [rag.example.com]
```
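
The `{{query}}` placeholder in `request_json` implies a substitution step when each attack query is sent. A minimal sketch of that idea, with JSON-safe escaping (an illustration only, not the adapter's actual templating rules):

```python
import json

def render_request(template: dict, query: str) -> dict:
    """Substitute {{query}} into every string value of the request body."""
    # Escape via json.dumps so quotes/backslashes in the query stay valid JSON.
    rendered = json.dumps(template).replace("{{query}}", json.dumps(query)[1:-1])
    return json.loads(rendered)

print(render_request({"question": "{{query}}"}, "What is in doc 7?"))
# {'question': 'What is in doc 7?'}
```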

> **Warning**
> Do not use HTTP targets in CI without cassette record/replay — live calls are non-deterministic and may incur costs.
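
Conceptually, a cassette keys each outgoing request and replays the stored response, so CI runs never touch the network. A minimal sketch of the mechanism (class and method names are hypothetical; the real format is described in docs/RECORD_REPLAY.md):

```python
import hashlib
import json

class Cassette:
    """Toy record/replay store keyed by a hash of the request."""

    def __init__(self):
        self._tape = {}

    @staticmethod
    def _key(method: str, url: str, body: dict) -> str:
        raw = json.dumps([method, url, body], sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def record(self, method: str, url: str, body: dict, response: dict) -> None:
        self._tape[self._key(method, url, body)] = response

    def replay(self, method: str, url: str, body: dict) -> dict:
        # KeyError here means the test made a request that was never recorded.
        return self._tape[self._key(method, url, body)]
```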


## Benchmark Bundles

Run all packs as a benchmark suite:

# Run benchmark bundle
uv run ragleaklab bench bundle \
  --bundle benchmarks/ragleakbench_v1/bundle.yaml \
  --out out/bench

# Publish results
uv run ragleaklab bench publish \
  --in out/bench \
  --bundle benchmarks/ragleakbench_v1/bundle.yaml \
  --out results/v1_results.json

### External Results

Community benchmark results from third-party RAG systems live in `external_results/`. All published results are redacted and secret-scanned — no emails, tokens, or API keys.

```bash
# Publish after running a benchmark bundle
uv run ragleaklab bench publish-external \
  --bench out/bench \
  --system-name "My RAG System" \
  --system-type oss \
  --out external_results/my_system.json

# Validate before submitting a PR
uv run ragleaklab bench validate-external \
  --file external_results/my_system.json
```

See external_results/README.md for the full schema, safety guarantees, and contribution guide.

📊 Leaderboard: external_results/TABLE.md (auto-generated — regenerate with `uv run ragleaklab results build-table --in external_results/ --out external_results/TABLE.md`)


## Updating Baseline

Baselines are updated manually to ensure human review:

```bash
# Generate new baseline
uv run ragleaklab run \
  --corpus data/corpus_private_canary \
  --attacks data/attacks \
  --out baselines/v1/

# Review and commit
git diff baselines/v1/report.json
git add baselines/v1/report.json
git commit -m "baseline: update after [reason]"
```

## Adoption

New to RAGLeakLab? Start here → docs/ADOPTION.md

Covers 30-minute quick integration, delta ingestion gates, failure triage, baseline updates, security posture, and a phased rollout plan (dry-run → warn-only → block merges).
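
The phased rollout maps naturally onto a small exit-code policy in the CI job. A sketch of the idea only (function and mode names are assumptions; docs/ADOPTION.md has the recommended mechanics):

```python
def gate_exit_code(mode: str, regression_found: bool) -> int:
    """Translate rollout mode + gate result into a CI exit code."""
    if not regression_found or mode == "dry-run":
        return 0  # dry-run never fails the job
    if mode == "warn-only":
        print("::warning::leakage regression detected")
        return 0  # surface the finding, but do not block the merge
    return 1      # block mode: fail the job
```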

See also: docs/SELL_SHEET.md — one-page feature overview.


## Documentation

| Document | Description |
| --- | --- |
| docs/ADOPTION.md | Enterprise adoption guide |
| docs/SELL_SHEET.md | One-page feature overview |
| docs/threat_model.md | Formal threat model |
| docs/ARCHITECTURE.md | Module structure and data flow |
| docs/CONFIG.md | Configuration reference and schema |
| docs/REPORT_SCHEMA.md | Report field descriptions |
| docs/V1_CONTRACTS.md | V1 public contract catalogue |
| docs/V1_PREFLIGHT.md | V1 release preflight checklist |
| docs/STABILITY.md | Stability policy and versioning |
| docs/BASELINE_POLICY.md | Baseline update policy |
| docs/EXTENDING.md | Writing plugins |
| docs/PLUGIN_COOKBOOK.md | Plugin development cookbook |
| docs/CI.md | CI integration guide |
| docs/CI_PARITY.md | CI parity between local and remote |
| docs/DOCKER.md | Container build and run |
| docs/ACTION.md | GitHub Action usage |
| docs/INTEGRATIONS.md | HTTP target examples |
| docs/INTEGRITY_TESTING.md | Integrity and poisoning testing |
| docs/RECORD_REPLAY.md | Cassette record/replay for HTTP |
| docs/CALIBRATION.md | Threshold calibration guide |
| docs/BENCHMARKS.md | Benchmark bundle reference |
| docs/DELTA_GATE.md | Delta ingestion gate |
| docs/WORKFLOWS.md | GitHub Actions workflow patterns |
| docs/TRIAGE.md | Failure triage guide |
| docs/PERFORMANCE.md | Performance tuning |
| docs/SUPPRESSIONS.md | Finding suppression system |
| docs/ROADMAP.md | Future roadmap |
| docs/poisoning.md | Corpus poisoning detection |
| docs/SECURITY_TOOLING.md | Security tooling overview |
| docs/RFC.md | RFC governance process |
| docs/GOOD_FIRST_ISSUES.md | Beginner-friendly tasks |
| docs/RELEASE.md | Release process |
| docs/ASSETS.md | Asset build and validation |
| docs/CASE_STUDIES.md | Security case studies |
| docs/REPO_HEALTH.md | Repository health overview |
| docs/threats/ | Individual threat specifications |
| CONTRIBUTING.md | How to contribute |
| SECURITY.md | Security policy |
| CHANGELOG.md | Version history |

## Project Governance

RAGLeakLab uses lightweight governance to keep the project cohesive as it grows:

| Process | What it covers | Document |
| --- | --- | --- |
| RFC | New threat classes, core metrics, breaking changes | docs/RFC.md |
| Baseline Policy | When and how baselines can be updated | docs/BASELINE_POLICY.md |
| Stability Contracts | What constitutes a breaking change | docs/STABILITY.md |
| Contributing | Code style, testing, commit conventions | CONTRIBUTING.md |
| Security | Vulnerability reporting | SECURITY.md |

### When do I need an RFC?

- Adding a new threat class (e.g. prompt injection, model extraction)
- Adding a new core metric (shipped with the main package)
- Making a breaking change to report schemas, CLI flags, or contracts
- Adding a new claim type (e.g. attribution, privacy)

Everything else — bug fixes, docs, integration recipes, external plugins — goes through a regular PR.

See docs/RFC.md for the full process.


## Development

```bash
uv run ruff format .   # Format
uv run ruff check .    # Lint
uv run pytest -q       # Test
```

### Local Gates

Local CI gates ensure you don't push broken code. Set up once per clone:

```bash
# Install pre-commit and pre-push hooks
uv run pre-commit install
uv run pre-commit install --hook-type pre-push
```

Run the full CI check manually:

```bash
./scripts/ci_smoke.sh
```

⚠️ Anti-pattern: `git push --no-verify` bypasses the pre-push hook. Use only in emergencies.


## Project Structure

```
src/ragleaklab/        # Main package
├── cli/               # CLI commands (run, diff, bench, calibrate, ...)
├── core/              # Contracts, determinism, version, plugin system
├── config/            # YAML config loading and validation
├── attacks/           # Test case schema, strategy catalog, runner
├── packs/             # Built-in threat packs (canary, verbatim, membership, ...)
├── corpus/            # Document loading and chunking
├── metrics/           # Leakage measurement (canary, verbatim, membership, semantic)
├── rag/               # Reference pipeline (TF-IDF retrieval, mock generation)
├── targets/           # Target adapters (in-process, HTTP, mock)
├── reporting/         # Output schemas (JSON, SARIF, JUnit)
├── regression/        # Baseline comparison for CI gates
├── bench/             # Benchmark bundles and results
├── calibration/       # Threshold calibration
├── poisoning/         # Corpus poisoning detection
├── analysis/          # Attack coverage analysis
├── assets/            # Asset generation and validation
├── ci/                # CI policy checks (baseline policy)
└── suppressions/      # Finding suppression system
tests/                 # Test files (995+ tests)
docs/                  # Documentation (40+ documents)
data/                  # Test data and corpora
baselines/             # CI baselines
benchmarks/            # Benchmark bundles
integrations/          # Framework integration recipes
examples/              # Sample files
scripts/               # CI smoke, SBOM generation
templates/             # Plugin development templates
external_results/      # Community benchmark results
```
