Releases: arsbr/Veritensor
Veritensor v1.6.0
🚀 Veritensor v1.6.0: The Native RAG Firewall & Ecosystem Update
Version 1.6.0 introduces Native Python Integrations, allowing you to embed Veritensor directly into your RAG pipelines as an active firewall. We've also drastically improved the Developer Experience (UX) with ignore files and automated GitHub PR reviews.
Stop scanning data after it's ingested. Block it before it hits your Vector DB.
🔥 Major Features
🧱 Native RAG Integrations (In-Memory Firewall)
You can now wrap your favorite data loaders and vector databases with Veritensor. It scans raw text and extracted elements in-memory, physically blocking Prompt Injections, Data Poisoning, and PII leaks at runtime.
- LangChain: Wrap any loader with `SecureLangChainLoader`.
- LlamaIndex: Wrap any reader with `SecureLlamaIndexReader`.
- Unstructured.io: Sanitize extracted elements using `SecureUnstructuredScanner`.
- ChromaDB: Intercept `.add()` and `.upsert()` calls directly at the database level using `SecureChromaCollection`.
Example (LangChain):
from langchain_community.document_loaders import PyPDFLoader
from veritensor.integrations.langchain_guard import SecureLangChainLoader
unsafe_loader = PyPDFLoader("user_upload.pdf")
secure_loader = SecureLangChainLoader(file_path="user_upload.pdf", base_loader=unsafe_loader)
# Raises VeritensorSecurityError if prompt injections or PII are detected!
docs = secure_loader.load()
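The "scan first, then delegate" idea behind these wrappers can be sketched in a few lines. This is not Veritensor's implementation: `GuardedCollection`, `looks_malicious`, `BlockedDocumentError`, `InMemoryStore`, and the single regex rule are all hypothetical stand-ins, and the wrapped object only needs an `add(documents=..., ids=...)` method.

```python
import re

# Hypothetical single rule; a real scanner would apply many more checks.
INJECTION_RE = re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE)


class BlockedDocumentError(Exception):
    """Raised when a document fails the pre-ingestion check (hypothetical name)."""


def looks_malicious(text: str) -> bool:
    return bool(INJECTION_RE.search(text))


class GuardedCollection:
    """Wraps any object exposing add(documents=..., ids=...) and scans first."""

    def __init__(self, inner):
        self._inner = inner

    def add(self, documents, ids):
        for doc_id, text in zip(ids, documents):
            if looks_malicious(text):
                raise BlockedDocumentError(f"blocked document {doc_id!r}")
        # Only reached when every document in the batch passed the check.
        return self._inner.add(documents=documents, ids=ids)


class InMemoryStore:
    """Stand-in for a vector store so the sketch runs without a database."""

    def __init__(self):
        self.rows = {}

    def add(self, documents, ids):
        self.rows.update(zip(ids, documents))
```

Because the check runs before the inner `add()` is called, a single bad document rejects the whole batch and nothing reaches the store.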
🙈 Smart Filtering with .veritensorignore
No more false positives on your dummy test data! Veritensor now natively supports .veritensorignore files.
- Works exactly like `.gitignore`.
- Supports standard glob patterns (e.g., `tests/dummy_data/*`, `*.dev.env`).
- Keeps your CI/CD pipelines green while maintaining strict security on real assets.
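As a rough illustration of how glob-based ignore rules behave, the sketch below matches paths against the two example patterns using Python's standard `fnmatch` module. The helper names `load_patterns` and `is_ignored` are hypothetical, and full `.gitignore` semantics (negation with `!`, directory anchoring) are deliberately left out.

```python
from fnmatch import fnmatchcase


def load_patterns(text: str) -> list[str]:
    """Parse ignore-file text, skipping blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if the path matches any glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


IGNORE_FILE = """\
# test fixtures
tests/dummy_data/*

*.dev.env
"""

patterns = load_patterns(IGNORE_FILE)
for candidate in ("tests/dummy_data/fake.csv", "local.dev.env", "data/train.csv"):
    print(candidate, "ignored" if is_ignored(candidate, patterns) else "scanned")
```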
🤖 GitHub App Support (Automated PR Reviews)
Veritensor can now be deployed as a fully-fledged GitHub App Backend.
- Automatically scans files in new Pull Requests.
- Posts beautiful, detailed Markdown tables directly into PR comments.
- Sets Commit Statuses (✅ Success / ❌ Failure) to block malicious merges automatically.
🌪️ Data Engineering (Apache Airflow)
We've added official documentation and patterns for securing ETL pipelines. You can now easily integrate Veritensor into your Airflow DAGs using the standard BashOperator to quarantine poisoned datasets before they enter your Data Lake.
🛠️ Improvements & Fixes
- Refactored Core Engine: Separated file I/O from text scanning (`scan_text`), enabling lightning-fast in-memory analysis for our new integrations.
- Smart Noise Reduction: Improved the CLI output to automatically hide "noisy" data science practices (like `!pip install` or `import os`) unless the `--verbose` flag is passed, focusing your attention only on real threats.
- Bug Fixes: Resolved an issue where the CLI would crash when attempting to parse S3 URIs as local `Path` objects on certain OS environments.
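To show why S3 URIs and `pathlib` do not mix well, here is a small sketch. The release notes do not describe the actual fix, so `split_target` is a hypothetical helper; it demonstrates one general approach, which is to check the URI scheme before ever constructing a path object.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def split_target(target: str):
    """Classify a scan target as an S3 object or a local path."""
    parsed = urlparse(target)
    if parsed.scheme == "s3":
        # netloc is the bucket, the rest is the object key.
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", None, str(PurePosixPath(target)))


# pathlib collapses the double slash, so the URI no longer round-trips.
mangled = str(PurePosixPath("s3://bucket/models/a.pkl"))
print(mangled)
print(split_target("s3://bucket/models/a.pkl"))
```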
🔄 How to Upgrade
To get all the new features and engines:
pip install --upgrade "veritensor[all]"
Veritensor v1.5.1
🚀 Veritensor v1.5.1: The Anti-Virus for AI Artifacts
We are introducing Parallel Scanning, Supply Chain Security, and Advanced Stealth Detection.
Veritensor is now a comprehensive firewall for your RAG pipelines, Data Lakes, and Development Environments. We secure the artifacts that traditional scanners miss: Models, Datasets, Notebooks, and Documents.
🔥 Major Features
⚡ High-Performance Parallel Scanning & Caching
We have rewritten the core engine to support Multiprocessing. Veritensor now utilizes all CPU cores to scan thousands of files in seconds.
- Process Pool Executor: Scans are distributed across workers for maximum throughput.
- SQLite Caching: Implemented a robust SQLite database (WAL mode) to cache scan results. Re-scanning a 100GB dataset now takes milliseconds if files haven't changed.
- Usage: Use the `--jobs` flag to control concurrency (defaults to CPU count).
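The caching idea can be sketched with the standard `sqlite3` module. The schema, the function names, and the choice of a SHA-256 content hash as the cache key are assumptions made for illustration; the release notes only state that results are cached in SQLite with WAL mode. A production cache would likely use a cheaper key (for example size plus modification time) to avoid re-reading large files.

```python
import hashlib
import sqlite3


def open_cache(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while another connection is writing.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scans (sha256 TEXT PRIMARY KEY, verdict TEXT NOT NULL)"
    )
    return conn


def file_digest(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_scan(conn: sqlite3.Connection, path: str, scan_fn):
    """Return (verdict, was_cached); scan_fn runs only on a cache miss."""
    key = file_digest(path)
    row = conn.execute("SELECT verdict FROM scans WHERE sha256 = ?", (key,)).fetchone()
    if row is not None:
        return row[0], True
    verdict = scan_fn(path)
    conn.execute("INSERT INTO scans (sha256, verdict) VALUES (?, ?)", (key, verdict))
    conn.commit()
    return verdict, False
```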
🔗 Supply Chain Security (Dependency Scanning)
Veritensor now audits your Python environment for compromised packages.
- Supported Files: `requirements.txt`, `pyproject.toml`, `poetry.lock`, `Pipfile.lock`.
- Typosquatting Detection: Detects malicious package impersonators (e.g., `tourch` instead of `torch`) using Levenshtein distance algorithms.
- Vulnerability Check: Integrates with Google OSV.dev API to detect known CVEs in your pinned dependencies.
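The typosquatting check can be illustrated with a plain Levenshtein distance. The list of known packages, the threshold of 1, and the function names below are assumptions for the sketch, not Veritensor's actual rules.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical list of well-known package names.
KNOWN_PACKAGES = ("torch", "numpy", "requests")


def possible_typosquat(name: str, known=KNOWN_PACKAGES, max_distance: int = 1):
    """Return the known package a name imitates, or None."""
    name = name.lower()
    if name in known:
        return None
    for candidate in known:
        if levenshtein(name, candidate) <= max_distance:
            return candidate
    return None


print(possible_typosquat("tourch"))
```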
🕵️♂️ Advanced RAG & Stealth Detection
Veritensor v1.5.0 includes a new Stealth Engine to detect attacks hidden from human eyes but visible to LLMs.
- CSS/HTML Hiding: Detects "Invisible Text" attacks (e.g., `font-size: 0`, `color: white`, `display: none`) hidden inside PDF and HTML documents.
- Base64 De-obfuscation: Automatically detects and decodes Base64 strings to find obfuscated prompt injections (e.g., `SWdub3Jl...` -> `Ignore previous instructions`).
- Unicode Normalization: Applies NFKC normalization to prevent Unicode bypass attacks.
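The Base64 and Unicode steps above can be sketched with the standard library alone. The regex, the minimum length of 16 characters, the single target phrase, and the function names are assumptions chosen for the example.

```python
import base64
import binascii
import re
import unicodedata

# Runs of Base64 alphabet characters, optionally padded (length threshold is arbitrary).
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
PHRASE = "ignore previous instructions"


def expand(text: str):
    """Yield the NFKC-normalized text plus any Base64 payloads that decode to UTF-8."""
    normalized = unicodedata.normalize("NFKC", text)
    yield normalized
    for match in B64_RE.finditer(normalized):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text
        yield unicodedata.normalize("NFKC", decoded)


def has_hidden_injection(text: str) -> bool:
    return any(PHRASE in view.lower() for view in expand(text))
```

NFKC folds compatibility characters such as fullwidth Latin letters back to their ASCII forms, which is why the same phrase check works on both views.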
📊 Expanded Format Support (Excel & Archives)
We have expanded our coverage to include critical business data formats.
- Excel Security: Scans `.xlsx` and `.csv` for Formula Injections (CSV Injection) and malicious macros.
- Recursive Archives: Safely scans inside `.zip`, `.tar.gz`, and `.whl` files without extracting them to disk. Includes protection against Zip Bombs.
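Reading archive members in memory with a Zip Bomb guard might look like the sketch below. The limits, `ArchiveRejectedError`, and `read_members` are hypothetical; the sizes checked are the ones declared in the archive headers, so a hardened scanner would also cap the bytes actually read.

```python
import io
import zipfile

MAX_TOTAL_BYTES = 50 * 1024 * 1024  # hypothetical cap on total uncompressed size
MAX_RATIO = 100                     # hypothetical cap on per-file compression ratio


class ArchiveRejectedError(Exception):
    """Raised when an archive looks like a decompression bomb (hypothetical name)."""


def read_members(data: bytes, max_total=MAX_TOTAL_BYTES, max_ratio=MAX_RATIO):
    """Return [(name, content)] for a zip held in memory, without touching disk."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = [info for info in zf.infolist() if not info.is_dir()]
        if sum(info.file_size for info in infos) > max_total:
            raise ArchiveRejectedError("declared uncompressed size exceeds limit")
        for info in infos:
            if info.compress_size and info.file_size / info.compress_size > max_ratio:
                raise ArchiveRejectedError(f"suspicious compression ratio: {info.filename}")
        return [(info.filename, zf.read(info)) for info in infos]
```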
📜 Data Governance & Manifests
- Provenance Manifest: New command `veritensor manifest .` generates a signed JSON snapshot of your data artifacts (hashes, status, threats). Essential for compliance (EU AI Act, SOC2).
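To make the idea of a signed snapshot concrete, here is a minimal sketch. The manifest layout, the field names, and the use of HMAC-SHA256 as the signature are all assumptions; the release notes do not specify the real manifest format or signing scheme.

```python
import hashlib
import hmac
import json


def _canonical(entries) -> bytes:
    # Stable serialization so the same entries always produce the same signature.
    return json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(artifacts: dict, key: bytes) -> dict:
    """artifacts maps a path to its bytes; returns hashes plus a signature."""
    entries = [
        {"path": path, "sha256": hashlib.sha256(data).hexdigest()}
        for path, data in sorted(artifacts.items())
    ]
    signature = hmac.new(key, _canonical(entries), hashlib.sha256).hexdigest()
    return {"artifacts": entries, "signature": signature}


def verify_manifest(manifest: dict, key: bytes) -> bool:
    expected = hmac.new(key, _canonical(manifest["artifacts"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])
```

Any change to a recorded hash invalidates the signature, which is what lets an auditor trust the snapshot later.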
🛡️ Security Hardening
- Magic Number Validation: Detects malware masquerading as safe files (e.g., an `.exe` renamed to `.pdf`).
- Smart Filtering: Drastically reduced false positives in Jupyter Notebooks by filtering out common "noise" (like `!pip install`) and using Entropy Analysis to find real API keys (e.g., WandB, Pinecone) even without known signatures.
- Hybrid PII Detection: Added support for Microsoft Presidio (optional) for context-aware PII detection, alongside high-speed Regex checks.
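Magic number validation compares a file's leading bytes with what its extension promises. The signature table below covers only a few well-known formats, and `classify` with its return labels is a hypothetical helper for illustration.

```python
import os

# Leading bytes for a few common formats.
EXPECTED_MAGIC = {
    ".pdf": (b"%PDF-",),
    ".zip": (b"PK\x03\x04",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
}
# Windows PE ("MZ") and Linux ELF executables.
EXECUTABLE_MAGIC = (b"MZ", b"\x7fELF")


def classify(filename: str, head: bytes) -> str:
    """Compare the first bytes of a file with its extension."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXPECTED_MAGIC.get(ext)
    if expected is None:
        return "unknown"
    if head.startswith(expected):
        return "ok"
    if head.startswith(EXECUTABLE_MAGIC):
        return "executable-in-disguise"
    return "mismatch"


print(classify("report.pdf", b"MZ\x90\x00"))
```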
📦 Modular Installation
To keep the core tool lightweight (~50MB), Veritensor uses a modular installation strategy. You only download the heavy ML libraries if you need them.
🚀 Recommended (Full Suite)
Installs support for Models, RAG Documents, Datasets, PII, and S3.
pip install "veritensor[all]"
🛠️ Custom Installation
Mix and match dependencies based on your pipeline needs:
| Feature Set | Command | Description | Dependencies Added |
|---|---|---|---|
| Core | `pip install veritensor` | Base Scanner. Scans Models (Pickle, Keras, PyTorch), Notebooks, and Dependencies. | Lightweight (No heavy deps) |
| Data | `pip install "veritensor[data]"` | Dataset Security. Adds support for streaming scan of Parquet, CSV, and Excel files. | pyarrow, pandas, openpyxl |
| RAG | `pip install "veritensor[rag]"` | Document Security. Adds support for PDF, DOCX, PPTX scanning. Includes PII support automatically. | pypdf, python-docx, python-pptx |
| PII | `pip install "veritensor[pii]"` | Privacy. Adds Microsoft Presidio for ML-based PII detection (Names, Locations, etc.). | presidio-analyzer, spacy |
| AWS | `pip install "veritensor[aws]"` | Cloud. Adds support for scanning directly from S3 buckets (`s3://...`). | boto3 |
Note for PII Users: If you install `[pii]` or `[rag]`, you must download the spaCy model once:
python -m spacy download en_core_web_lg
🔄 How to Upgrade
To get all the new features and engines:
pip install --upgrade "veritensor[all]"
Veritensor v1.4.1
🚀 Veritensor v1.4.1 release
Minor bug fixes.
📦 Upgrade
pip install --upgrade veritensor
Veritensor v1.4.0
🚀 Veritensor v1.4.0: The AI Security & Trust Platform
This major release transforms Veritensor from a model scanner into a holistic AI Security Platform. We are moving beyond simple model checks to secure the entire AI life cycle: Models, Datasets, Notebooks, and RAG knowledge bases.
🔥 New Features
- 📊 High-Speed Dataset Scanning (Data Poisoning Protection)
  Veritensor now supports streaming analysis for massive datasets. Scan Parquet, CSV, TSV, and JSONL files (up to 100GB+) without memory overflows.
  - Threat Detection: Identifies malicious URLs, PII, and "Data Poisoning" patterns (e.g., "Ignore previous instructions") hidden in training data.
  - Optimization: Uses Column Pruning to scan only string-based columns, making it up to 10x faster than raw text search.
  - Requires: `pip install veritensor[data]`
- 📓 Jupyter Notebook Hardening (.ipynb)
  Security for the researcher's primary tool. Veritensor now inspects:
  - Code Cells: For malicious execution and backdoors.
  - Markdown: For XSS and phishing links.
  - Outputs: For leaked API keys or credentials saved in execution results.
- 📚 RAG Knowledge Base Security (PDF, DOCX, PPTX)
  Protect your LLM applications before ingestion. Veritensor extracts text from Office documents and PDFs to block prompt injections and sensitive data leaks.
  - Requires: `pip install veritensor[rag]`
- ☁️ Cloud-Native Amazon S3 Support
  Scan models and assets directly from your S3 buckets. No more manual downloads for security audits.
  - Requires: `pip install veritensor[aws]`
- 🧩 Modular Installation
  To keep the core tool lightweight, we've introduced optional dependency groups:
  - `pip install veritensor[data]`: for Dataset scanning.
  - `pip install veritensor[rag]`: for Office/PDF documents.
  - `pip install veritensor[aws]`: for AWS S3.
  - `pip install veritensor[all]`: for the full security suite.
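The three notebook areas listed above (code cells, Markdown, outputs) map directly onto the `.ipynb` JSON structure. The sketch below walks that structure with the standard `json` module; the three regex rules and the `scan_notebook` function are illustrative assumptions, not Veritensor's rule set.

```python
import json
import re

# Illustrative rules only.
CODE_RE = re.compile(r"\b(os\.system|subprocess\.|eval\(|exec\()")
SCRIPT_RE = re.compile(r"<script\b", re.IGNORECASE)
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def _as_text(value) -> str:
    # Notebook fields may be a string or a list of lines.
    return "".join(value) if isinstance(value, list) else (value or "")


def scan_notebook(raw: str):
    """Return (cell_index, area, reason) findings for a notebook JSON string."""
    findings = []
    for idx, cell in enumerate(json.loads(raw).get("cells", [])):
        source = _as_text(cell.get("source"))
        kind = cell.get("cell_type")
        if kind == "code":
            if CODE_RE.search(source):
                findings.append((idx, "code", "suspicious call"))
            for out in cell.get("outputs", []):
                text = _as_text(out.get("text")) + _as_text(out.get("data", {}).get("text/plain"))
                if AWS_KEY_RE.search(text):
                    findings.append((idx, "output", "possible AWS access key"))
        elif kind == "markdown" and SCRIPT_RE.search(source):
            findings.append((idx, "markdown", "embedded script tag"))
    return findings
```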
🛠️ Improvements
- Sampling Strategy: Introduced a 10k-row sampling default for datasets to ensure instant feedback, with a `--full-scan` flag for deep audits.
- Recursive Discovery: The scanner now automatically identifies and routes all supported formats within a directory.
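A streaming scan with a row limit can be sketched using only `csv` and `itertools`. Taking the first N rows is an assumption (the notes do not say how the sample is drawn), and `scan_rows` with `limit=None` merely stands in for the behavior of `--full-scan`.

```python
import csv
import io
import itertools
import re

PATTERN = re.compile(r"ignore previous instructions", re.IGNORECASE)


def scan_rows(fh, limit=10_000):
    """Stream a CSV and return (row_number, column_index) hits.

    limit=None scans every row; otherwise only the first `limit` data rows.
    """
    reader = csv.reader(fh)
    next(reader, None)  # skip the header row
    rows = reader if limit is None else itertools.islice(reader, limit)
    hits = []
    for row_number, row in enumerate(rows, start=1):
        for col, cell in enumerate(row):
            if PATTERN.search(cell):
                hits.append((row_number, col))
    return hits


SAMPLE = "id,text\n1,hello\n2,world\n3,fine\n4,Ignore previous instructions\n5,bye\n"
print(scan_rows(io.StringIO(SAMPLE), limit=3))
print(scan_rows(io.StringIO(SAMPLE), limit=None))
```

The example also shows the trade-off: a small sample returns quickly but can miss a poisoned row that a full scan would catch.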
📦 Full upgrade (includes RAG, S3, dataset scanning)
pip install --upgrade veritensor[all]
Veritensor v1.3.1
🚀 Veritensor v1.3.1: Python Wheels & Granular Control
This release focuses on expanding format support and improving CI/CD flexibility based on community feedback.
🔥 New Features
- 📦 Python Wheel Support (.whl)
  Veritensor now scans `.whl` packages. It inspects `setup.py` and internal scripts for suspicious patterns (secrets, obfuscation) and recursively scans any embedded Pickle files (Thanks to u/ResponsibleTruck4717 for the suggestion!).
- 🎛️ Granular CLI Overrides
  Replaced the blunt `--force` flag with precise controls for CI/CD pipelines:
  - `--ignore-license`: Allows deployment of models with restrictive licenses.
  - `--ignore-malware`: Forces deployment even if threats are found (Use with caution) (Thanks to @patrakov for the suggestion!).
- 📝 Externalized Signatures & Expanded Rules
  - Security rules are now decoupled into `signatures.yaml` for easier updates.
  - New Heuristics: Added detection for modern SSH keys (`ed25519`, `ecdsa`) and Windows PuTTY keys (`.ppk`) to catch credential theft attempts (Thanks to @patrakov for the suggestion!).
📦 Upgrade
pip install --upgrade veritensor
Veritensor v1.3.0
🚀 Veritensor v1.3.0: Deep Scanning, Hybrid Compliance & Enterprise Reporting
This major release hardens the detection engine against obfuscated attacks and introduces industry-standard reporting for enterprise compliance.
🔥 New Features
- 🔍 Deep Archive Inspection (PyTorch Fix)
  Fixed a critical blind spot in PyTorch model analysis.
  - The Problem: PyTorch models (`.bin`, `.pt`) are often Zip archives containing a `data.pkl` file. Previously, scanners treated them as raw streams, missing nested malware.
  - The Fix: The engine now automatically detects Zip headers, extracts contents in memory, and recursively scans internal Pickle files.
  - Validated: Successfully detects hidden malware in nested archives (tested against known malicious repos like `star23/baller13`).
- 🧠 Hybrid License Check (File + API)
  Veritensor now uses a smart fallback mechanism for license verification.
  - Zero-Trust: First, it inspects embedded file metadata (Safetensors/GGUF).
  - API Fallback: If metadata is missing, it automatically queries the Hugging Face Hub API to fetch the license from the Model Card (requires `--repo`).

  Benefit: Drastically reduces "License not found" warnings for valid PyTorch models while maintaining security.
- 📊 Enterprise Reporting (SBOM & SARIF)
  - SARIF Support: Native integration with GitHub Security Tab (`--sarif`).
  - CycloneDX SBOM: Generate software bill of materials for compliance audits (`--sbom`).
- ⚡ UX Improvements
  - Added `veritensor init` command to quickly generate a default configuration file.
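The deep archive inspection described above (detect a Zip container, then inspect the nested Pickle without loading it) can be sketched with `zipfile` and `pickletools`. The `DANGEROUS` set and both function names are hypothetical, and the `STACK_GLOBAL` handling is a simple heuristic that pairs the two most recent string arguments.

```python
import io
import pickletools
import zipfile

# Hypothetical denylist of (module, name) imports.
DANGEROUS = {("os", "system"), ("posix", "system"), ("nt", "system"),
             ("subprocess", "Popen"), ("builtins", "eval"), ("builtins", "exec")}


def pickle_imports(data: bytes):
    """List (module, name) pairs a pickle would import, without unpickling it."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


def scan_model_bytes(data: bytes):
    """Scan a raw pickle, or every .pkl member if the bytes are a zip archive."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        hits = []
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                if name.endswith(".pkl"):
                    hits += [(name, imp) for imp in pickle_imports(zf.read(name))
                             if imp in DANGEROUS]
        return hits
    return [("<raw>", imp) for imp in pickle_imports(data) if imp in DANGEROUS]
```

Walking opcodes with `pickletools.genops` never executes the payload, which is the property that makes this kind of static inspection safe.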
📦 Upgrade
pip install --upgrade veritensor
Veritensor v1.2.2
🚀 Veritensor v1.2.2 release
Minor bug fixes.
📦 Upgrade
pip install --upgrade veritensor
Veritensor v1.2.1
🚀 Veritensor v1.2.1 release
🔥 New Features
- 📄 CycloneDX SBOM Support
  Veritensor can now generate a Software Bill of Materials (SBOM) for your AI models in the CycloneDX v1.5 standard.
  - Usage: `veritensor scan ./models --sbom > sbom.json`
- 🛡️ SARIF Reporting (GitHub Security)
  Added native support for SARIF (Static Analysis Results Interchange Format).
  - Usage: `veritensor scan ./models --sarif > results.sarif`
- 🤖 Machine-Readable JSON
  Improved raw JSON output for custom CI/CD pipelines and SOAR integrations.
  - Usage: `veritensor scan ./models --json`
📦 Upgrade
pip install --upgrade veritensor
Veritensor v1.2.0
🚀 Veritensor v1.2.0 release
We are introducing heuristic analysis for secrets, decoupled signature logic, and patched critical vulnerabilities in the core engine.
🔥 New Features
- 🧠 Heuristic Analysis (Secret Detection)
  The engine now scans string constants for Indicators of Compromise (IOCs). It detects hardcoded secrets (`AWS_ACCESS_KEY`, `OPENAI_API_KEY`), system paths (`/etc/passwd`), and internal IPs inside model files.
- 🛡️ Core Hardening (DoS & SSRF Protection)
  Patched critical vulnerabilities in the scanner itself:
  - Zip Bomb Protection: Limits memory usage when parsing malicious archives to prevent OOM crashes.
  - SSRF Protection: Restricts network calls to whitelisted domains (Hugging Face) during remote scanning.
- 🕒 Signed Timestamps (Time Drift Protection)
  Added `scan_date` (ISO 8601) to Cosign signature annotations. This allows downstream admission controllers to enforce TTL policies (e.g., "reject scans older than 24h").
- 📝 Externalized Signatures & Regex
  - Threat logic moved to `src/veritensor/engines/static/signatures.yaml`, allowing updates without recompiling.
  - Added support for Regex in configuration (e.g., `regex:^google/.*`) for flexible allowlisting.
- 🚫 Anti-Exfiltration Rules
  Added blocking rules for MLOps tools (`wandb`, `mlflow`, `dvc`) and Cloud SDKs (`boto3`, `google.cloud`) within model weights to prevent silent data theft.
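The `regex:` prefix convention mentioned above can be illustrated as follows. Only the `regex:^google/.*` entry comes from the release notes; the `is_allowed` helper, the idea that entries are matched against a repository ID, and the literal entry in the example are assumptions.

```python
import re

REGEX_PREFIX = "regex:"


def is_allowed(repo_id: str, entries) -> bool:
    """Entries starting with 'regex:' are patterns; all others must match exactly."""
    for entry in entries:
        if entry.startswith(REGEX_PREFIX):
            if re.search(entry[len(REGEX_PREFIX):], repo_id):
                return True
        elif entry == repo_id:
            return True
    return False


# The second entry is a made-up literal for the example.
ALLOWLIST = ["regex:^google/.*", "example-org/example-model"]

for repo in ("google/flan-t5-base", "evil/google/flan", "example-org/example-model"):
    print(repo, is_allowed(repo, ALLOWLIST))
```

Anchoring the pattern with `^` matters here: without it, `evil/google/flan` would also match.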
📦 Upgrade
pip install --upgrade veritensor
Veritensor v1.1.2
⚙️ Minor bug fixes
📦 Installation:
pip install veritensor==1.1.2
🛡️ GitHub Action:
uses: ArseniiBrazhnyk/Veritensor@v1.1.2