A high-performance, lightweight PDF parsing library for Go, forked from rsc/pdf.
This library has been extensively refactored to support modern PDF standards and high-throughput production environments with a focus on memory efficiency and security.
The internal Abstract Syntax Tree (AST) has been rewritten to use a fixed `Object` union struct instead of `interface{}`. This eliminates interface boxing for every PDF object (integers, names, strings, etc.), which sharply reduces memory allocations and GC pressure (about 96% fewer allocations per parse in the benchmark table below).
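
As a rough illustration of the idea, a tagged union struct might look like the sketch below. The `Kind` constants, field names, and constructor helpers are hypothetical and are not taken from this library's source; the point is only that one fixed-size struct can hold any scalar PDF value without wrapping it in an interface.

```go
package main

import "fmt"

// Kind says which field of Object is meaningful (hypothetical names).
type Kind uint8

const (
	KindNull Kind = iota
	KindInteger
	KindReal
	KindName
	KindString
)

// Object is a tagged union: a single struct type stores every scalar value,
// so a []Object or map[string]Object holds values inline rather than boxed.
type Object struct {
	Kind Kind
	Int  int64   // valid when Kind == KindInteger
	Real float64 // valid when Kind == KindReal
	Str  string  // valid when Kind == KindName or KindString
}

// Int builds an integer object.
func Int(v int64) Object { return Object{Kind: KindInteger, Int: v} }

// NameOf builds a name object such as /Root.
func NameOf(s string) Object { return Object{Kind: KindName, Str: s} }

// String renders the object roughly as it would appear in a PDF file.
func (o Object) String() string {
	switch o.Kind {
	case KindInteger:
		return fmt.Sprintf("%d", o.Int)
	case KindReal:
		return fmt.Sprintf("%g", o.Real)
	case KindName:
		return "/" + o.Str
	case KindString:
		return "(" + o.Str + ")"
	default:
		return "null"
	}
}

func main() {
	objs := []Object{Int(42), NameOf("Root"), {}}
	for _, o := range objs {
		fmt.Println(o)
	}
}
```

The trade-off is that every `Object` pays for all fields even when only one is used, in exchange for avoiding a heap allocation per value.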
Added comprehensive support for encrypted PDFs:
- AES-128 (v4): Full implementation of AES-CBC decryption for strings and streams.
- AES-256 (v5): Support for PDF 2.0 / Extension Level 3 security handlers, including SHA-256 based Key Derivation (KDK) and File Encryption Key (FEK) retrieval.
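
The AES-CBC step can be sketched with the standard library alone. In AES-encrypted PDF strings and streams, the first 16 bytes of the data are the initialization vector and the plaintext is padded to a whole number of blocks. The helper names `decryptAESCBC` and `encryptAESCBC` below are hypothetical, not this library's API, and the sketch assumes the key has already been derived; key derivation is not shown.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"
)

// decryptAESCBC expects data laid out as IV (16 bytes) followed by the
// ciphertext, and strips the block padding after decryption.
func decryptAESCBC(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	if len(data) < 2*aes.BlockSize || len(data)%aes.BlockSize != 0 {
		return nil, errors.New("aes: data is not IV plus whole blocks")
	}
	iv, ct := data[:aes.BlockSize], data[aes.BlockSize:]
	out := make([]byte, len(ct))
	cipher.NewCBCDecrypter(block, iv).CryptBlocks(out, ct)

	// The last byte gives the padding length; every padding byte must match.
	pad := int(out[len(out)-1])
	if pad == 0 || pad > aes.BlockSize {
		return nil, errors.New("aes: invalid padding")
	}
	for _, b := range out[len(out)-pad:] {
		if int(b) != pad {
			return nil, errors.New("aes: invalid padding")
		}
	}
	return out[:len(out)-pad], nil
}

// encryptAESCBC is the inverse, included only so the demo can round-trip.
// iv must be exactly 16 bytes.
func encryptAESCBC(key, iv, plain []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	pad := aes.BlockSize - len(plain)%aes.BlockSize
	padded := append(append([]byte{}, plain...), bytes.Repeat([]byte{byte(pad)}, pad)...)
	out := make([]byte, aes.BlockSize+len(padded))
	copy(out, iv)
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(out[aes.BlockSize:], padded)
	return out, nil
}

func main() {
	key := []byte("0123456789abcdef") // a 16-byte key selects AES-128
	iv := make([]byte, aes.BlockSize) // fixed IV for the demo only
	enc, err := encryptAESCBC(key, iv, []byte("hello PDF"))
	if err != nil {
		fmt.Println("encrypt:", err)
		return
	}
	dec, err := decryptAESCBC(key, enc)
	fmt.Println(string(dec), err)
}
```

Passing a 32-byte key to the same functions selects AES-256, since `aes.NewCipher` picks the variant from the key length.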
- Panic-Free Design: Removed legacy `panic` calls in favor of proper Go error propagation.
- Safe Method Chaining: The `Value` struct now carries error state, allowing safe nested calls like `doc.Trailer().Key("Root").Key("Pages").Count()`.
- Robustness: Improved recovery from malformed PDF structures and strict parsing errors.
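
The chaining pattern can be sketched as follows. This is a simplified, hypothetical `Value` type written for illustration, not the library's actual definition: once a lookup fails, the error sticks to the value and every later call in the chain passes it along instead of panicking.

```go
package main

import "fmt"

// Value is a stand-in type: it holds a dictionary, a number, or a sticky error.
type Value struct {
	dict map[string]Value
	num  int64
	err  error
}

// Key looks up a dictionary entry. If v already carries an error, or the key
// is missing, the result carries an error rather than causing a panic.
func (v Value) Key(name string) Value {
	if v.err != nil {
		return v
	}
	child, ok := v.dict[name]
	if !ok {
		return Value{err: fmt.Errorf("key %q not found", name)}
	}
	return child
}

// Int64 returns the numeric value, or 0 if v carries an error.
func (v Value) Int64() int64 {
	if v.err != nil {
		return 0
	}
	return v.num
}

// Err reports the first error hit anywhere along the chain.
func (v Value) Err() error { return v.err }

// sampleDoc builds a tiny trailer-like tree for the demo.
func sampleDoc() Value {
	return Value{dict: map[string]Value{
		"Root": {dict: map[string]Value{
			"Pages": {dict: map[string]Value{"Count": {num: 3}}},
		}},
	}}
}

func main() {
	doc := sampleDoc()
	fmt.Println(doc.Key("Root").Key("Pages").Key("Count").Int64())

	// "Outlines" is missing, so the chain short-circuits on that error.
	bad := doc.Key("Root").Key("Outlines").Key("First")
	fmt.Println(bad.Int64(), bad.Err())
}
```

With this design the caller checks `Err()` once at the end of a chain instead of after every step.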
- Buffer Pooling: Implemented `sync.Pool` for parsing buffers.
- Bulk Scanning: Optimized `lex.go` with specialized bulk scanners for Names, Keywords, and Strings, drastically reducing per-byte overhead.
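
Both techniques can be combined in a small sketch. The code below is not taken from `lex.go`; `bufPool`, `nameEnd`, `parseNames`, and `namesIn` are hypothetical names, and the scanner ignores `#xx` escapes in names for brevity. It reuses a pooled `bytes.Buffer` for input and locates each name token with slice-level searches rather than a per-byte read loop.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufPool recycles input buffers so repeated parses do not reallocate them.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// nameEnd returns the length of the name at the start of b, which runs up to
// the first PDF whitespace or delimiter byte (or to the end of b).
func nameEnd(b []byte) int {
	if i := bytes.IndexAny(b, "\x00\t\n\f\r ()<>[]{}/%"); i >= 0 {
		return i
	}
	return len(b)
}

// parseNames reads all of r into a pooled buffer and collects every /Name.
func parseNames(r io.Reader) ([]string, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	if _, err := buf.ReadFrom(r); err != nil {
		return nil, err
	}
	var names []string
	src := buf.Bytes()
	for {
		i := bytes.IndexByte(src, '/') // jump straight to the next name
		if i < 0 {
			break
		}
		start := i + 1
		end := start + nameEnd(src[start:])
		// string(...) copies the bytes, so the result stays valid after the
		// buffer goes back to the pool.
		names = append(names, string(src[start:end]))
		src = src[end:]
	}
	return names, nil
}

// namesIn is a convenience wrapper for string input.
func namesIn(s string) ([]string, error) {
	return parseNames(strings.NewReader(s))
}

func main() {
	names, err := namesIn("<< /Type /Catalog /Pages 2 0 R >>")
	fmt.Println(names, err)
}
```

Anything that must outlive the call has to be copied out before `Put`, because the pool may hand the same buffer to another goroutine.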
Throughput comparison against the original library (parsing standard documents):
| Metric | Upstream Library | This Version | Change |
|---|---|---|---|
| Parsing Speed | 79,526 ns/op | 66,925 ns/op | ~16% Faster |
| Allocations | 2,517 allocs/op | 97 allocs/op | 96% Reduction |
| Memory usage | 113,712 B/op | 87,226 B/op | 23% Lower |

    import "github.com/digitorus/pdf"

    r, err := pdf.NewReader(file, size)
    if err != nil {
    	return err
    }

    // Fluent, error-safe access
    root := r.Trailer().Key("Root")
    if err := root.Err(); err != nil {
    	return err
    }