PDF Parser for Go

A high-performance, lightweight PDF parsing library for Go, forked from rsc/pdf.

This library has been extensively refactored to support modern PDF standards and high-throughput production environments with a focus on memory efficiency and security.

Key Improvements

1. High-Performance Zero-Allocation AST

The internal Abstract Syntax Tree (AST) has been rewritten to use a rigid Object union struct instead of interface{}. This eliminates interface boxing for every PDF object (integers, names, strings, etc.), which cuts heap allocations and GC pressure: the benchmarks below show roughly 96% fewer allocations per operation.
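To illustrate the idea, here is a minimal sketch of a tagged-union struct. The names Kind, Object, Int, Name and AsInt are hypothetical and chosen for this example only; the library's actual Object layout may differ.

```go
package main

// Kind tags which field of Object is meaningful.
type Kind uint8

const (
	KindNull Kind = iota
	KindInt
	KindReal
	KindName
	KindString
)

// Object is a hypothetical tagged union. Scalar payloads live inline in the
// struct, so creating one does not box a value behind an interface.
type Object struct {
	Kind Kind
	Int  int64   // valid when Kind == KindInt
	Real float64 // valid when Kind == KindReal
	Str  string  // valid when Kind is KindName or KindString
}

// Int builds an integer object.
func Int(v int64) Object { return Object{Kind: KindInt, Int: v} }

// Name builds a name object.
func Name(s string) Object { return Object{Kind: KindName, Str: s} }

// AsInt returns the integer payload and reports whether the object holds one.
func (o Object) AsInt() (int64, bool) {
	return o.Int, o.Kind == KindInt
}
```

A caller switches on Kind (or uses an accessor such as AsInt) instead of a type assertion, so reading a value never touches the heap.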

2. Modern Security Support

Added comprehensive support for encrypted PDFs:

AES-128 (v4): Full implementation of AES-CBC decryption for strings and streams.
AES-256 (v5): Support for PDF 2.0 / Extension Level 3 security handlers, including SHA-256 based Key Derivation (KDK) and File Encryption Key (FEK) retrieval.

3. Stability & Error Handling

Panic-Free Design: Removed legacy panic calls in favor of proper Go error propagation.
Safe Method Chaining: The Value struct now carries error state, allowing safe nested calls like doc.Trailer().Key("Root").Key("Pages").Count().
Robustness: Improved recovery from malformed PDF structures and strict parsing errors.
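The chaining pattern described above can be sketched with a small stand-in type. This Value is hypothetical and much simpler than the library's; it only shows how an error recorded on one lookup flows through the rest of the chain without a panic.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for an error-carrying PDF value.
type Value struct {
	dict map[string]Value
	n    int64
	err  error
}

// Key looks up a dictionary entry. If v already carries an error, the
// lookup is skipped and the same error is passed along.
func (v Value) Key(name string) Value {
	if v.err != nil {
		return v
	}
	child, ok := v.dict[name]
	if !ok {
		return Value{err: fmt.Errorf("missing key %q", name)}
	}
	return child
}

// Int64 returns the integer payload, or 0 if v carries an error.
func (v Value) Int64() int64 {
	if v.err != nil {
		return 0
	}
	return v.n
}

// Err reports the first error hit anywhere in the chain.
func (v Value) Err() error { return v.err }
```

With this shape, a call such as doc.Key("Root").Key("Pages") needs a single Err() check at the end of the chain instead of one after every step.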

4. Memory Efficiency

Buffer Pooling: Implemented sync.Pool for parsing buffers.
Bulk Scanning: Optimized lex.go with specialized bulk scanners for Names, Keywords, and Strings, drastically reducing per-byte overhead.
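Both techniques can be sketched together for name tokens. This is an illustration under assumptions, not the code in lex.go: bufPool, nameEnd, scanName and unhex are hypothetical names, and the scanner handles only the simple cases.

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool recycles scratch buffers so that escaped names do not allocate a
// fresh buffer on every token.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// nameEnd lists the bytes that end a PDF name: whitespace and delimiters.
const nameEnd = " \t\r\n\f\x00()<>[]{}/%"

// scanName reads the name starting at src[0] == '/' and returns it together
// with the unread remainder. The token end is found with one bulk search
// rather than a byte-at-a-time read loop.
func scanName(src []byte) (string, []byte) {
	if len(src) == 0 || src[0] != '/' {
		return "", src
	}
	body := src[1:]
	end := bytes.IndexAny(body, nameEnd)
	if end < 0 {
		end = len(body)
	}
	raw, rest := body[:end], body[end:]
	if bytes.IndexByte(raw, '#') < 0 {
		return string(raw), rest // fast path: no #XX escapes to decode
	}

	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	for i := 0; i < len(raw); i++ {
		if raw[i] == '#' && i+2 < len(raw) {
			hi, ok1 := unhex(raw[i+1])
			lo, ok2 := unhex(raw[i+2])
			if ok1 && ok2 {
				buf.WriteByte(hi<<4 | lo)
				i += 2
				continue
			}
		}
		buf.WriteByte(raw[i])
	}
	return buf.String(), rest
}

// unhex converts one hexadecimal digit to its value.
func unhex(c byte) (byte, bool) {
	switch {
	case '0' <= c && c <= '9':
		return c - '0', true
	case 'a' <= c && c <= 'f':
		return c - 'a' + 10, true
	case 'A' <= c && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}
```

The pooled buffer is reset before use and returned after the result has been copied into a string, so no caller ever holds a reference to recycled memory.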

Benchmarks

Per-operation cost compared with the upstream library (parsing standard documents):

Metric	Upstream Library	This Version	Change
Parsing Speed	79,526 ns/op	66,925 ns/op	~16% Faster
Allocations	2,517 allocs/op	97 allocs/op	96% Reduction
Memory usage	113,712 B/op	87,226 B/op	23% Lower
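The Change column can be reproduced from the raw figures as a relative reduction, (upstream - current) / upstream. The helper name reductionPercent is hypothetical and exists only for this check.

```go
package main

// reductionPercent returns how much smaller current is than upstream,
// expressed as a percentage of upstream.
func reductionPercent(upstream, current float64) float64 {
	return (upstream - current) / upstream * 100
}
```

Applied to the table, reductionPercent(79526, 66925) is about 15.8, reductionPercent(2517, 97) is about 96.1, and reductionPercent(113712, 87226) is about 23.3, which matches the rounded values shown.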

Usage

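import (
    "io"

    "github.com/digitorus/pdf"
)

// checkRoot parses a PDF and confirms that its document catalog is reachable.
// Assumes NewReader keeps the upstream rsc/pdf signature (io.ReaderAt, int64).
func checkRoot(file io.ReaderAt, size int64) error {
    r, err := pdf.NewReader(file, size)
    if err != nil {
        return err
    }

    // Fluent, error-safe access: a failed lookup is carried on the Value
    // and reported by Err() instead of panicking.
    root := r.Trailer().Key("Root")
    return root.Err()
}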
