Releases: arsbr/Veritensor

Veritensor v1.6.0

23 Feb 13:17
943f508

🚀 Veritensor v1.6.0: The Native RAG Firewall & Ecosystem Update

Version 1.6.0 introduces Native Python Integrations, allowing you to embed Veritensor directly into your RAG pipelines as an active firewall. We've also improved the developer experience with ignore files and automated GitHub PR reviews.

Stop scanning data after it's ingested. Block it before it hits your Vector DB.

🔥 Major Features

🧱 Native RAG Integrations (In-Memory Firewall)

You can now wrap your favorite data loaders and vector databases with Veritensor. It scans raw text and extracted elements in-memory, physically blocking Prompt Injections, Data Poisoning, and PII leaks at runtime.

  • LangChain: Wrap any loader with SecureLangChainLoader.
  • LlamaIndex: Wrap any reader with SecureLlamaIndexReader.
  • Unstructured.io: Sanitize extracted elements using SecureUnstructuredScanner.
  • ChromaDB: Intercept .add() and .upsert() calls directly at the database level using SecureChromaCollection.

Example (LangChain):

from langchain_community.document_loaders import PyPDFLoader
from veritensor.integrations.langchain_guard import SecureLangChainLoader

unsafe_loader = PyPDFLoader("user_upload.pdf")
secure_loader = SecureLangChainLoader(file_path="user_upload.pdf", base_loader=unsafe_loader)

# Raises VeritensorSecurityError if prompt injections or PII are detected!
docs = secure_loader.load()
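
The interception idea behind wrappers such as SecureChromaCollection can be sketched generically. This is not Veritensor's implementation: GuardedCollection, BlockedDocumentError, looks_malicious, and InMemoryStore are hypothetical names, and the single regex stands in for a real scanner. It only shows how .add() and .upsert() calls can be checked before they reach the underlying store.

```python
import re

# Hypothetical stand-in for a real scanner: one naive prompt-injection pattern.
INJECTION_RE = re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE)


class BlockedDocumentError(Exception):
    """Raised when a document is rejected before it reaches the store."""


def looks_malicious(text: str) -> bool:
    return bool(INJECTION_RE.search(text))


class GuardedCollection:
    """Wraps any object exposing add()/upsert() that accept a `documents` list."""

    def __init__(self, inner):
        self._inner = inner

    def _check(self, documents):
        for doc in documents:
            if looks_malicious(doc):
                raise BlockedDocumentError(f"blocked document: {doc[:40]!r}")

    def add(self, ids, documents, **kwargs):
        self._check(documents)
        return self._inner.add(ids=ids, documents=documents, **kwargs)

    def upsert(self, ids, documents, **kwargs):
        self._check(documents)
        return self._inner.upsert(ids=ids, documents=documents, **kwargs)


class InMemoryStore:
    """Toy store used only to make the sketch runnable without a vector DB."""

    def __init__(self):
        self.rows = {}

    def add(self, ids, documents, **kwargs):
        self.rows.update(zip(ids, documents))

    upsert = add
```

Because the check runs before delegation, a rejected batch never touches the store.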

🙈 Smart Filtering with .veritensorignore

No more false positives on your dummy test data! Veritensor now natively supports .veritensorignore files.

  • Works exactly like .gitignore.
  • Supports standard glob patterns (e.g., tests/dummy_data/*, *.dev.env).
  • Keeps your CI/CD pipelines green while maintaining strict security on real assets.
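
To illustrate how glob-based ignore rules behave, here is a minimal sketch using Python's fnmatch. It is an approximation, not Veritensor's matcher: load_patterns and is_ignored are hypothetical helpers, and fnmatch does not cover every .gitignore rule (no negation, no `**`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(text: str) -> list[str]:
    """Parse ignore-file text: one glob per line, skipping blanks and # comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Match each pattern against the full relative path and the bare file name."""
    posix = PurePosixPath(path)
    return any(
        fnmatchcase(str(posix), pattern) or fnmatchcase(posix.name, pattern)
        for pattern in patterns
    )


# Example ignore-file content using the two patterns mentioned above.
IGNORE_FILE = """
# dummy fixtures used by unit tests
tests/dummy_data/*
*.dev.env
"""

patterns = load_patterns(IGNORE_FILE)
print(is_ignored("tests/dummy_data/poisoned.csv", patterns))
print(is_ignored("data/train.parquet", patterns))
```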

🤖 GitHub App Support (Automated PR Reviews)

Veritensor can now be deployed as a fully-fledged GitHub App Backend.

  • Automatically scans files in new Pull Requests.
  • Posts beautiful, detailed Markdown tables directly into PR comments.
  • Sets Commit Statuses (✅ Success / ❌ Failure) to block malicious merges automatically.
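
As a rough illustration of the reporting step, the sketch below turns a list of findings into a Markdown table and a commit status state. The Finding shape and both function names are hypothetical; only the "success" and "failure" state names come from GitHub's commit status API.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical shape of a scan finding; not Veritensor's actual schema.
    path: str
    severity: str
    message: str


def render_comment(findings: list[Finding]) -> str:
    """Build a Markdown table suitable for a pull request comment."""
    if not findings:
        return "No threats found."
    rows = ["| File | Severity | Finding |", "| --- | --- | --- |"]
    for f in findings:
        # Escape pipes so a finding message cannot break the table layout.
        message = f.message.replace("|", "\\|")
        rows.append(f"| `{f.path}` | {f.severity} | {message} |")
    return "\n".join(rows)


def commit_state(findings: list[Finding]) -> str:
    """Map findings to a commit status state: any finding fails the check."""
    return "failure" if findings else "success"
```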

🌪️ Data Engineering (Apache Airflow)

We've added official documentation and patterns for securing ETL pipelines. You can now easily integrate Veritensor into your Airflow DAGs using the standard BashOperator to quarantine poisoned datasets before they enter your Data Lake.
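
The quarantine pattern can be sketched in plain Python. This is an illustration under a stated assumption, namely that the scanner exits with a non-zero code when it finds a threat; scan_or_quarantine is a hypothetical helper, not part of Veritensor or Airflow.

```python
import shutil
import subprocess
from pathlib import Path


def scan_or_quarantine(dataset: Path, quarantine_dir: Path, scan_cmd: list[str]) -> bool:
    """Run scan_cmd on dataset; move the file to quarantine_dir if the scan fails.

    Assumption: the scanner signals a detection with a non-zero exit code.
    Returns True when the dataset is clean and may continue down the pipeline.
    """
    result = subprocess.run([*scan_cmd, str(dataset)], capture_output=True, text=True)
    if result.returncode == 0:
        return True
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(dataset), str(quarantine_dir / dataset.name))
    return False
```

In a DAG, the same logic could live in a BashOperator shell command or a Python callable, with scan_cmd set to something like ["veritensor", "scan"].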


🛠️ Improvements & Fixes

  • Refactored Core Engine: Separated file I/O from text scanning (scan_text), enabling lightning-fast in-memory analysis for our new integrations.
  • Smart Noise Reduction: Improved the CLI output to automatically hide "noisy" data science practices (like !pip install or import os) unless the --verbose flag is passed, focusing your attention only on real threats.
  • Bug Fixes: Resolved an issue where the CLI would crash when attempting to parse S3 URIs as local Path objects on certain OS environments.
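
One plausible way this class of bug arises is that path objects rewrite a URI's double slash. The sketch below shows that effect and a scheme check that routes S3 URIs away from path handling. split_source is a hypothetical helper, not the actual fix.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def split_source(source: str):
    """Return ("s3", bucket, key) for S3 URIs, else ("local", path)."""
    parsed = urlparse(source)
    if parsed.scheme == "s3":
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", PurePosixPath(source))


# Treating the URI as a path silently collapses the double slash:
print(PurePosixPath("s3://bucket/models/a.pt"))  # s3:/bucket/models/a.pt
print(split_source("s3://bucket/models/a.pt"))
```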

🔄 How to Upgrade

To get all the new features and engines:

pip install --upgrade "veritensor[all]"

Veritensor v1.5.1

18 Feb 16:11
486cc5e

🚀 Veritensor v1.5.1: The Anti-Virus for AI Artifacts

We are introducing Parallel Scanning, Supply Chain Security, and Advanced Stealth Detection.

Veritensor is now a comprehensive firewall for your RAG pipelines, Data Lakes, and Development Environments. We secure the artifacts that traditional scanners miss: Models, Datasets, Notebooks, and Documents.

🔥 Major Features

⚡ High-Performance Parallel Scanning & Caching

We have rewritten the core engine to support Multiprocessing. Veritensor now utilizes all CPU cores to scan thousands of files in seconds.

  • Process Pool Executor: Scans are distributed across workers for maximum throughput.
  • SQLite Caching: Implemented a robust SQLite database (WAL mode) to cache scan results. Re-scanning a 100GB dataset now takes milliseconds if files haven't changed.
  • Usage: Use the --jobs flag to control concurrency (defaults to CPU count).
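
A result cache of this kind can be sketched with the standard sqlite3 module. ScanCache is a hypothetical class, not Veritensor's schema. For simplicity it keys on a SHA-256 of the file contents, which still reads the file; a cache keyed on cheaper metadata such as size and modification time would avoid that.

```python
import hashlib
import sqlite3
from pathlib import Path


class ScanCache:
    """Caches scan verdicts keyed by the SHA-256 of the file contents."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        # WAL lets readers proceed while another connection is writing.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (digest TEXT PRIMARY KEY, verdict TEXT)"
        )

    def scan(self, path: Path, scanner) -> str:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        row = self.conn.execute(
            "SELECT verdict FROM results WHERE digest = ?", (digest,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the scanner is not invoked
        verdict = scanner(path)
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?)", (digest, verdict)
            )
        return verdict
```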

🔗 Supply Chain Security (Dependency Scanning)

Veritensor now audits your Python environment for compromised packages.

  • Supported Files: requirements.txt, pyproject.toml, poetry.lock, Pipfile.lock.
  • Typosquatting Detection: Detects malicious package impersonators (e.g., tourch instead of torch) using Levenshtein distance algorithms.
  • Vulnerability Check: Integrates with Google OSV.dev API to detect known CVEs in your pinned dependencies.
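
The typosquatting check can be illustrated with a textbook Levenshtein distance. This is a minimal sketch: the POPULAR set and the max_distance threshold are hypothetical, and a real detector would compare against a much larger package index.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical allowlist of popular package names.
POPULAR = {"torch", "numpy", "requests", "pandas"}


def typosquat_of(name: str, max_distance: int = 1):
    """Return the popular package that `name` closely imitates, or None."""
    if name in POPULAR:
        return None
    for known in sorted(POPULAR):
        if levenshtein(name, known) <= max_distance:
            return known
    return None


print(typosquat_of("tourch"))  # torch
```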

🕵️‍♂️ Advanced RAG & Stealth Detection

Veritensor v1.5.0 includes a new Stealth Engine to detect attacks hidden from human eyes but visible to LLMs.

  • CSS/HTML Hiding: Detects "Invisible Text" attacks (e.g., font-size: 0, color: white, display: none) hidden inside PDF and HTML documents.
  • Base64 De-obfuscation: Automatically detects and decodes Base64 strings to find obfuscated prompt injections (e.g., SWdub3Jl... -> Ignore previous instructions).
  • Unicode Normalization: Applies NFKC normalization to prevent Unicode bypass attacks.
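
The Base64 and Unicode steps can be sketched with the standard library alone. This is an illustration rather than the Stealth Engine itself: expand and is_injection are hypothetical helpers, and the single regex stands in for a real signature set.

```python
import base64
import binascii
import re
import unicodedata

B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
# Hypothetical single rule standing in for a real signature set.
INJECTION_RE = re.compile(r"ignore previous instructions", re.IGNORECASE)


def expand(text: str) -> list[str]:
    """Return the NFKC-normalized text plus any decodable Base64 payloads."""
    views = [unicodedata.normalize("NFKC", text)]
    for candidate in B64_RE.findall(text):
        try:
            decoded = base64.b64decode(candidate, validate=True)
            views.append(decoded.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text; ignore
    return views


def is_injection(text: str) -> bool:
    """Apply the rule to every view of the text, not just the raw string."""
    return any(INJECTION_RE.search(view) for view in expand(text))
```

NFKC folds look-alike forms such as full-width letters into their plain equivalents, so the same rule matches both the disguised and the plain spelling.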

📊 Expanded Format Support (Excel & Archives)

We have expanded our coverage to include critical business data formats.

  • Excel Security: Scans .xlsx and .csv for Formula Injections (CSV Injection) and malicious macros.
  • Recursive Archives: Safely scans inside .zip, .tar.gz, and .whl files without extracting them to disk. Includes protection against Zip Bombs.
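
In-memory archive scanning with a Zip Bomb guard might look like the sketch below. The limits and the ArchiveRejected and iter_members names are hypothetical. The idea is to inspect the declared sizes in the archive's directory before decompressing anything.

```python
import io
import zipfile

MAX_TOTAL_BYTES = 50 * 1024 * 1024  # hypothetical budget for one archive
MAX_RATIO = 100                     # hypothetical compression-ratio ceiling


class ArchiveRejected(Exception):
    """Raised when an archive looks like a decompression bomb."""


def iter_members(data: bytes):
    """Yield (name, bytes) for each member, reading the archive from memory only."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = [info for info in archive.infolist() if not info.is_dir()]
        total = sum(info.file_size for info in infos)
        if total > MAX_TOTAL_BYTES:
            raise ArchiveRejected(f"declared size {total} exceeds budget")
        for info in infos:
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ArchiveRejected(f"{info.filename}: suspicious compression ratio")
        for info in infos:
            with archive.open(info) as member:
                # Read at most the declared size of this member.
                yield info.filename, member.read(info.file_size)
```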

📜 Data Governance & Manifests

  • Provenance Manifest: New command veritensor manifest . generates a signed JSON snapshot of your data artifacts (hashes, status, threats). Essential for compliance (EU AI Act, SOC2).

🛡️ Security Hardening

  • Magic Number Validation: Detects malware masquerading as safe files (e.g., an .exe renamed to .pdf).
  • Smart Filtering: Drastically reduced false positives in Jupyter Notebooks by filtering out common "noise" (like !pip install) and using Entropy Analysis to find real API keys (e.g., WandB, Pinecone) even without known signatures.
  • Hybrid PII Detection: Added support for Microsoft Presidio (optional) for context-aware PII detection, alongside high-speed Regex checks.
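
Magic Number Validation, listed above, compares a file's leading bytes with what its extension claims. The sketch below uses a few well-known file signatures; check_magic and the two small tables are hypothetical and far from complete.

```python
from pathlib import Path

# Leading bytes for a few common formats (well-known file signatures).
MAGIC = {
    ".pdf": b"%PDF-",
    ".zip": b"PK\x03\x04",
    ".png": b"\x89PNG\r\n\x1a\n",
}
EXECUTABLE_MAGIC = {
    b"MZ": "Windows PE executable",
    b"\x7fELF": "ELF executable",
}


def check_magic(name: str, head: bytes) -> str:
    """Compare a file's leading bytes with what its extension promises."""
    suffix = Path(name).suffix.lower()
    expected = MAGIC.get(suffix)
    if expected is None:
        return "unknown extension"
    if head.startswith(expected):
        return "ok"
    for magic, label in EXECUTABLE_MAGIC.items():
        if head.startswith(magic):
            return f"mismatch: {label} named {suffix}"
    return f"mismatch: not a valid {suffix} header"


print(check_magic("report.pdf", b"MZ\x90\x00"))
```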

📦 Modular Installation

To keep the core tool lightweight (~50MB), Veritensor uses a modular installation strategy. You only download the heavy ML libraries if you need them.

🚀 Recommended (Full Suite)

Installs support for Models, RAG Documents, Datasets, PII, and S3.

pip install "veritensor[all]"

🛠️ Custom Installation

Mix and match dependencies based on your pipeline needs:

| Feature Set | Command | Description | Dependencies Added |
| --- | --- | --- | --- |
| Core | pip install veritensor | Base Scanner. Scans Models (Pickle, Keras, PyTorch), Notebooks, and Dependencies. | Lightweight (No heavy deps) |
| Data | pip install "veritensor[data]" | Dataset Security. Adds support for streaming scan of Parquet, CSV, and Excel files. | pyarrow, pandas, openpyxl |
| RAG | pip install "veritensor[rag]" | Document Security. Adds support for PDF, DOCX, PPTX scanning. Includes PII support automatically. | pypdf, python-docx, python-pptx |
| PII | pip install "veritensor[pii]" | Privacy. Adds Microsoft Presidio for ML-based PII detection (Names, Locations, etc). | presidio-analyzer, spacy |
| AWS | pip install "veritensor[aws]" | Cloud. Adds support for scanning directly from S3 buckets (s3://...). | boto3 |

Note for PII Users: If you install [pii] or [rag], you must download the spaCy model once:

python -m spacy download en_core_web_lg

🔄 How to Upgrade

To get all the new features and engines:

pip install --upgrade "veritensor[all]"

Veritensor v1.4.1

07 Feb 10:34
7e3edc8

🚀 Veritensor v1.4.1 release

Minor bug fixes.


📦 Upgrade

pip install --upgrade veritensor

Veritensor v1.4.0

05 Feb 14:04
f2bedff

🚀 Veritensor v1.4.0: The AI Security & Trust Platform

This major release transforms Veritensor from a model scanner into a holistic AI Security Platform. We are moving beyond simple model checks to secure the entire AI life cycle: Models, Datasets, Notebooks, and RAG knowledge bases.

🔥 New Features

  • 📊 High-Speed Dataset Scanning (Data Poisoning Protection)
    Veritensor now supports streaming analysis for massive datasets. Scan Parquet, CSV, TSV, and JSONL files (up to 100GB+) without memory overflows.

    • Threat Detection: Identifies malicious URLs, PII, and "Data Poisoning" patterns (e.g., "Ignore previous instructions") hidden in training data.
    • Optimization: Uses Column Pruning to scan only string-based columns, making it up to 10x faster than raw text search.
    • Requires: pip install veritensor[data]
  • 📓 Jupyter Notebook Hardening (.ipynb)
    Security for the researcher's primary tool. Veritensor now inspects:

    • Code Cells: For malicious execution and backdoors.
    • Markdown: For XSS and phishing links.
    • Outputs: For leaked API keys or credentials saved in execution results.
  • 📚 RAG Knowledge Base Security (PDF, DOCX, PPTX)
    Protect your LLM applications before ingestion. Veritensor extracts text from Office documents and PDFs to block prompt injections and sensitive data leaks.

    • Requires: pip install veritensor[rag]
  • ☁️ Cloud-Native Amazon S3 Support
    Scan models and assets directly from your S3 buckets. No more manual downloads for security audits.

    • Requires: pip install veritensor[aws]
  • 🧩 Modular Installation
    To keep the core tool lightweight, we've introduced optional dependency groups:

    • pip install veritensor[data] — for Dataset scanning.
    • pip install veritensor[rag] — for Office/PDF documents.
    • pip install veritensor[aws] — for AWS S3.
    • pip install veritensor[all] — for the full security suite.

🛠️ Improvements

  • Sampling Strategy: Introduced a 10k-row sampling default for datasets to ensure instant feedback, with a --full-scan flag for deep audits.
  • Recursive Discovery: The scanner now automatically identifies and routes all supported formats within a directory.

📦 Full upgrade (includes RAG, S3, dataset scanning)

pip install --upgrade "veritensor[all]"

Veritensor v1.3.1

22 Jan 18:41
a43fe67

🚀 Veritensor v1.3.1: Python Wheels & Granular Control

This release focuses on expanding format support and improving CI/CD flexibility based on community feedback.

🔥 New Features

  • 📦 Python Wheel Support (.whl)
    Veritensor now scans .whl packages. It inspects setup.py and internal scripts for suspicious patterns (secrets, obfuscation) and recursively scans any embedded Pickle files (Thanks to u/ResponsibleTruck4717 for the suggestion!).

  • 🎛️ Granular CLI Overrides
    Replaced the blunt --force flag with precise controls for CI/CD pipelines:

    • --ignore-license: Allows deployment of models with restrictive licenses.
    • --ignore-malware: Forces deployment even if threats are found (Use with caution) (Thanks to @patrakov for the suggestion!).
  • 📝 Externalized Signatures & Expanded Rules

    • Security rules are now decoupled into signatures.yaml for easier updates.
    • New Heuristics: Added detection for modern SSH keys (ed25519, ecdsa) and Windows PuTTY keys (.ppk) to catch credential theft attempts (Thanks to @patrakov for the suggestion!).

📦 Upgrade

pip install --upgrade veritensor

Veritensor v1.3.0

15 Jan 23:24
a656429

🚀 Veritensor v1.3.0: Deep Scanning, Hybrid Compliance & Enterprise Reporting

This major release hardens the detection engine against obfuscated attacks and introduces industry-standard reporting for enterprise compliance.

🔥 New Features

  • 🔍 Deep Archive Inspection (PyTorch Fix)
    Fixed a critical blind spot in PyTorch model analysis.
    • The Problem: PyTorch models (.bin, .pt) are often Zip archives containing a data.pkl file. Previously, scanners treated them as raw streams, missing nested malware.
    • The Fix: The engine now automatically detects Zip headers, extracts contents in memory, and recursively scans internal Pickle files.
    • Validated: Successfully detects hidden malware in nested archives (tested against known malicious repos like star23/baller13).
  • 🧠 Hybrid License Check (File + API)
    Veritensor now uses a smart fallback mechanism for license verification.
    • Zero-Trust: First, it inspects embedded file metadata (Safetensors/GGUF).
    • API Fallback: If metadata is missing, it automatically queries the Hugging Face Hub API to fetch the license from the Model Card (requires --repo).
    • Benefit: Drastically reduces "License not found" warnings for valid PyTorch models while maintaining security.
  • 📊 Enterprise Reporting (SBOM & SARIF)
    • SARIF Support: Native integration with GitHub Security Tab (--sarif).
    • CycloneDX SBOM: Generate software bill of materials for compliance audits (--sbom).
  • ⚡ UX Improvements
    • Added veritensor init command to quickly generate a default configuration file.
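
The Deep Archive Inspection fix described above can be sketched with zipfile and pickletools. This is not Veritensor's engine: pickle_imports and scan_model are hypothetical helpers, and the sketch only inspects GLOBAL opcodes, so references encoded with STACK_GLOBAL (pickle protocol 4 and later) would need extra handling. Nothing is unpickled, so no payload is executed.

```python
import io
import pickletools
import zipfile


def pickle_imports(payload: bytes) -> set[tuple[str, str]]:
    """List (module, name) pairs referenced by GLOBAL opcodes, without unpickling."""
    found = set()
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.add((module, name))
    return found


def scan_model(data: bytes) -> set[tuple[str, str]]:
    """Scan a raw pickle, or every .pkl member if the bytes are a Zip archive."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return pickle_imports(data)
    found = set()
    with zipfile.ZipFile(stream) as archive:
        for member in archive.namelist():
            if member.endswith(".pkl"):
                found |= pickle_imports(archive.read(member))
    return found
```

A caller would then compare the returned pairs against a denylist such as ("os", "system") or ("subprocess", "Popen").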

📦 Upgrade

pip install --upgrade veritensor

Veritensor v1.2.2

10 Jan 22:37
1d9905a

🚀 Veritensor v1.2.2 release

Minor bug fixes.


📦 Upgrade

pip install --upgrade veritensor

Veritensor v1.2.1

10 Jan 22:23
58dbd13

🚀 Veritensor v1.2.1 release

🔥 New Features

  • 📄 CycloneDX SBOM Support
    Veritensor can now generate a Software Bill of Materials (SBOM) for your AI models in the CycloneDX v1.5 standard.

    • Usage: veritensor scan ./models --sbom > sbom.json
  • 🛡️ SARIF Reporting (GitHub Security)
    Added native support for SARIF (Static Analysis Results Interchange Format).

    • Usage: veritensor scan ./models --sarif > results.sarif
  • 🤖 Machine-Readable JSON
    Improved raw JSON output for custom CI/CD pipelines and SOAR integrations.

    • Usage: veritensor scan ./models --json

📦 Upgrade

pip install --upgrade veritensor

Veritensor v1.2.0

10 Jan 20:01
f9641a5

🚀 Veritensor v1.2.0 release

We are introducing heuristic analysis for secrets, decoupled signature logic, and patched critical vulnerabilities in the core engine.

🔥 New Features

  • 🧠 Heuristic Analysis (Secret Detection)
    The engine now scans string constants for Indicators of Compromise (IOCs). It detects hardcoded secrets (AWS_ACCESS_KEY, OPENAI_API_KEY), system paths (/etc/passwd), and internal IPs inside model files.

  • 🛡️ Core Hardening (DoS & SSRF Protection)
    Patched critical vulnerabilities in the scanner itself:

    • Zip Bomb Protection: Limits memory usage when parsing malicious archives to prevent OOM crashes.
    • SSRF Protection: Restricts network calls to whitelisted domains (Hugging Face) during remote scanning.
  • 🕒 Signed Timestamps (Time Drift Protection)
    Added scan_date (ISO 8601) to Cosign signature annotations. This allows downstream admission controllers to enforce TTL policies (e.g., "reject scans older than 24h").

  • 📝 Externalized Signatures & Regex

    • Threat logic moved to src/veritensor/engines/static/signatures.yaml, allowing updates without recompiling.
    • Added support for Regex in configuration (e.g., regex:^google/.*) for flexible allowlisting.
  • 🚫 Anti-Exfiltration Rules
    Added blocking rules for MLOps tools (wandb, mlflow, dvc) and Cloud SDKs (boto3, google.cloud) within model weights to prevent silent data theft.
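
The TTL policy enabled by the signed scan_date annotation can be sketched as a freshness check. scan_is_fresh is a hypothetical helper showing what an admission controller might do once it has read the annotation; the 24-hour default mirrors the example above.

```python
from datetime import datetime, timedelta, timezone


def scan_is_fresh(
    scan_date: str,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> bool:
    """Accept a scan only if its ISO 8601 timestamp is recent and not in the future."""
    # Note: a trailing "Z" is only parsed by fromisoformat on Python 3.11+.
    scanned_at = datetime.fromisoformat(scan_date)
    if scanned_at.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        scanned_at = scanned_at.replace(tzinfo=timezone.utc)
    age = now - scanned_at
    return timedelta(0) <= age <= max_age
```

Rejecting future timestamps as well as stale ones guards against clock drift in either direction.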


📦 Upgrade

pip install --upgrade veritensor

Veritensor v1.1.2

05 Jan 21:40
819a2ce

⚙️ Minor bug fixes

📦 Installation:

pip install veritensor==1.1.2

🛡️ GitHub Action:

uses: ArseniiBrazhnyk/Veritensor@v1.1.2