Data silently rots.
- Facebook reports hundreds of CPUs showing silent data corruption across hundreds of thousands of machines over 18+ months. (Source: Silent Data Corruptions at Scale)
- Drives are allowed to return unrecoverable read errors at measurable rates (e.g., <1 in 10^15 bits read for Seagate Exos X24). At that ceiling, that's ~1 error per 125 TB read. (Source: Seagate Exos X24 Data Sheet)
- Even SSDs cite nonzero uncorrectable bit error rates (e.g., <1 in 10^17 bits read for an Oracle NVMe SSD). (Source: Oracle NVMe SSD Reliability Specs)
- HDD cold-storage limits are explicit: WD Ultrastar DC HC650 specs list storage temperature 0-70C (non-operating -40 to 70C) and say a drive may not remain inoperative for more than one year. (Source: WD Ultrastar DC HC650 OEM Specification, PDF)
- Power-off retention is limited: JEDEC-based guidance for SSDs calls for client-class drives to retain data for 1 year at 30C after power-off, while enterprise-class drives need only retain it for 3 months at 40C. Drives left unpowered for a year or more are outside spec for enterprise SSDs and at the edge of spec for client SSDs. (Source: ATP, How Temperature Affects Data Retention for SSDs)
- Cosmic rays can flip bits in electronics. If errors exceed ECC capability, drives can surface uncorrectable read errors. (Sources: IBM Research on cosmic-ray soft errors; Oracle NVMe SSD Product Notes)
- Temperature strongly accelerates bitrot risk: JEDEC specs cited by Curtiss-Wright note that client-class SSD retention at BER <= 1e-15 drops to 500 hours at 52C or 96 hours at 66C. (Source: Curtiss-Wright, The Effects of Extended Temperatures on Flash Endurance and Data Retention)
- Capacity keeps rising (24 TB HDDs are shipping), which means full reads and scrubs touch more bits and are more likely to run into those error rates. (Source: Seagate Exos X24 Data Sheet)
If you aren't actively validating, you likely already have corrupt files that are being quietly re-copied to the cloud or your NAS as "good" backups. Family photos, legal documents, old projects, and cherished media are exactly the kind of files that get silently damaged and then preserved in that damaged state.
Drive failures are obvious. Silent sector failures, copy errors, and transmission errors are not. That's why validate exists: deterministic, byte-level validation across a wide range of file formats (100+, see FORMAT_VERIFICATIONS.md).
- Zig library (core validation)
- C FFI (stable enough for integration, but not yet 1.0)
- C CLI wrapper: validate
The C FFI mirrors the current Zig validation API for ease of integration. It is expected to evolve before a 1.0 release.
    ./build

Runs ./test first. When DEBUG is unset or 0, dependencies build in ReleaseFast and ./build defaults to -Doptimize=ReleaseFast.
    ./zig-out/bin/validate <path> [--jobs N]

--jobs 0 (default) uses all available cores (logical CPU count).
MAX_FILES limits the number of files scanned when validating a directory.
MAX_VIDEO_SIZE limits deep video validation to files under N MB (unset = no limit).
MEM_TELEMETRY=1 logs per-file RSS memory samples (use MEM_TELEMETRY_PATH to log to a file, MEM_TELEMETRY_EVERY=N to sample every N files).
UNKNOWN_OUT=/path writes UNKNOWN entries to that path instead of stdout (supports /dev/null, /dev/fd/1, /dev/fd/2).
ZIP_TELEMETRY=1 logs slow ZIP entry validation details to stderr (adjust threshold with ZIP_SLOW_SECONDS).
PDF_TELEMETRY=1 logs slow PDF deep-validation breakdowns to stderr (adjust threshold with PDF_SLOW_SECONDS).
    ./test
    ./test-windows

./test-windows requires a CrossOver bottle named windows-dev-test (or set CROSSOVER_BOTTLE).
Note: this is a temporary external dependency; we plan to make the runner self-contained via flake.nix.