Bump kreuzberg from 4.2.9 to 4.3.1 #69

Open
dependabot[bot] wants to merge 1 commit into master from dependabot/uv/kreuzberg-4.3.1

dependabot bot commented on behalf of github on Feb 13, 2026

Bumps kreuzberg from 4.2.9 to 4.3.1.

Release notes

Sourced from kreuzberg's releases.

Release v4.3.1

Fixed

Elixir Package Checksums (#383)

  • Fixed checksum mismatch for Elixir 4.3.0 Hex package: Updated checksum-Elixir.Kreuzberg.Native.exs with correct SHA256 checksums for all 8 precompiled NIF binaries (NIF 2.16/2.17 across aarch64-apple-darwin, aarch64-unknown-linux-gnu, x86_64-unknown-linux-gnu, x86_64-pc-windows-gnu). The 4.3.0 release shipped with outdated 4.2.10 checksums, causing installation failures.
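
To illustrate why stale checksums cause installation failures, here is a minimal Python sketch of the general verification step: hash each downloaded artifact with SHA256 and compare it with the recorded digest. The function names and the file-name-to-digest mapping are hypothetical and are not taken from kreuzberg or from the Elixir precompiled-NIF tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected: dict[str, str], directory: Path) -> list[str]:
    """List file names that are missing or whose digest differs from the expected one.

    `expected` maps a file name to its recorded hex digest (hypothetical layout).
    """
    bad = []
    for name, want in expected.items():
        target = directory / name
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(name)
    return bad
```

An installer that applies this kind of check rejects every binary whose digest was recorded for a different build, which matches the failure mode described above.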

Dependency Updates

  • Updated all dependencies across 10 language ecosystems: Rust, Python, Node/TypeScript, Ruby, PHP, Go, Java, C#, Elixir, WASM, and pre-commit hooks all updated to latest compatible versions.
  • Enhanced dependency update tasks: All language-specific `task update` commands now upgrade to latest major versions (not just respecting version constraints). PHP, Ruby, C#, Elixir, and Python update tasks enhanced with major version upgrade support.

WASM Compatibility

  • Fixed WASM build failures: Added explicit `getrandom 0.3.4` dependency with `wasm_js` feature to `kreuzberg-wasm` crate to ensure transitive dependencies (ahash, lopdf, rand_core) have WebAssembly support enabled.

Dependency Pins

  • Pinned lzma-rust2 to 0.15.7: The 0.16.1 upgrade is incompatible with crc 3.4.0. Keeping 0.15.7 until upstream compatibility is restored.

See CHANGELOG.md for full details.

v4.3.0

Added

Blank Page Detection

  • is_blank field on PageInfo and PageContent: Pages with fewer than 3 non-whitespace characters and no tables or images are flagged as blank. Detection uses a two-phase approach: text-only analysis during extraction, then refinement after table/image assignment. Available across all 9 language bindings (Python, TypeScript, Ruby, Java, Go, C#, PHP, Elixir, WASM). Closes #378.
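
The two-phase rule described above can be sketched in Python as follows. This is an illustration of the stated heuristic only: the `Page` dataclass and both function names are hypothetical and do not reproduce kreuzberg's `PageInfo` or `PageContent` types.

```python
from dataclasses import dataclass

# "fewer than 3 non-whitespace characters", per the release note
BLANK_CHAR_THRESHOLD = 3


@dataclass
class Page:
    """Hypothetical page record, not kreuzberg's PageInfo/PageContent."""
    text: str
    table_count: int = 0
    image_count: int = 0
    is_blank: bool = False


def text_looks_blank(text: str) -> bool:
    """Phase 1: text-only check performed during extraction."""
    non_ws = sum(1 for ch in text if not ch.isspace())
    return non_ws < BLANK_CHAR_THRESHOLD


def refine_blank(page: Page) -> Page:
    """Phase 2: after tables and images are assigned, a page holding either is not blank."""
    page.is_blank = (
        text_looks_blank(page.text)
        and page.table_count == 0
        and page.image_count == 0
    )
    return page
```

Under this sketch, a scanned page with no text layer but one embedded image passes the text-only check and is then cleared in the refinement step.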

PaddleOCR Backend

  • PaddleOCR backend via ONNX Runtime: New OCR backend (kreuzberg-paddle-ocr) using PaddlePaddle's PP-OCRv4 models converted to ONNX format, run via ONNX Runtime. Supports 6 languages (English, Chinese, Japanese, Korean, German, French) with automatic model downloading and caching. Provides superior CJK recognition compared to Tesseract.
  • PaddleOCR support in all bindings: Available across Python, Rust, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, and Elixir bindings via the paddle-ocr feature flag.
  • PaddleOCR CLI support: The kreuzberg-cli binary supports --ocr-backend paddle-ocr for PaddleOCR extraction.

Unified OCR Element Output

  • Structured OCR element data: Extraction results now include OcrElement data with bounding geometry (rectangles and quadrilaterals), per-element confidence scores, rotation information, and hierarchical levels (word, line, block, page). Available from both PaddleOCR and Tesseract backends.

Shared ONNX Runtime Discovery

  • ort_discovery module: Finds ONNX Runtime shared libraries across platforms, shared between PaddleOCR and future ONNX-based backends.

Document Structure Output

  • DocumentStructure support across all bindings: Added structured document output with include_document_structure configuration option across Python, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, Elixir, and WASM bindings.

Native DOC/PPT Extraction

  • OLE/CFB-based extraction: Added native DOC and PPT extraction via OLE/CFB binary parsing. Legacy Office formats no longer require any external tools.

musl Linux Support

  • Re-enabled musl targets: Added x86_64-unknown-linux-musl and aarch64-unknown-linux-musl targets for CLI binaries, Python wheels (musllinux), and Node.js native bindings. Resolves glibc 2.38+ requirement for prebuilt CLI binaries on older distros like Ubuntu 22.04 (#364).

Fixed

MSG Extraction Hang on Large Attachments (#372)

  • Fixed .msg (Outlook) extraction hanging indefinitely on files with large attachments. Replaced the msg_parser crate with direct OLE/CFB parsing using the cfb crate — attachment binary data is now read directly without hex-encoding overhead.
  • Added lenient FAT padding for MSG files with truncated sector tables produced by some Outlook versions.
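
As a general illustration of the hex-encoding overhead mentioned above, the Python sketch below shows that a hex representation takes twice the space of the raw bytes. It says nothing about how the msg_parser crate works internally; the helper name is hypothetical.

```python
def hex_overhead(payload: bytes) -> tuple[int, int]:
    """Return (raw size, hex-encoded size) for a payload."""
    encoded = payload.hex()  # two ASCII characters per input byte
    return len(payload), len(encoded)


raw_size, hex_size = hex_overhead(bytes(1024))
# The hex form is exactly twice as large as the raw form.
print(raw_size, hex_size)
```

For a large attachment, this doubling applies to the whole payload, in addition to the time spent encoding and decoding it.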

... (truncated)

Changelog

Sourced from kreuzberg's changelog.

[4.3.1] - 2026-02-12

Fixed

Elixir Package Checksums (#383)

  • Fixed checksum mismatch for Elixir 4.3.0 Hex package: Updated checksum-Elixir.Kreuzberg.Native.exs with correct SHA256 checksums for all 8 precompiled NIF binaries (NIF 2.16/2.17 across aarch64-apple-darwin, aarch64-unknown-linux-gnu, x86_64-unknown-linux-gnu, x86_64-pc-windows-gnu). The 4.3.0 release shipped with outdated 4.2.10 checksums, causing installation failures.

Dependency Updates

  • Updated all dependencies across 10 language ecosystems: Rust, Python, Node/TypeScript, Ruby, PHP, Go, Java, C#, Elixir, WASM, and pre-commit hooks all updated to latest compatible versions.
  • Enhanced dependency update tasks: All language-specific task update commands now upgrade to latest major versions (not just respecting version constraints). PHP, Ruby, C#, Elixir, and Python update tasks enhanced with major version upgrade support.

WASM Compatibility

  • Fixed WASM build failures: Added explicit getrandom 0.3.4 dependency with wasm_js feature to kreuzberg-wasm crate to ensure transitive dependencies (ahash, lopdf, rand_core) have WebAssembly support enabled.

Dependency Pins

  • Pinned lzma-rust2 to 0.15.7: The 0.16.1 upgrade is incompatible with crc 3.4.0. Keeping 0.15.7 until upstream compatibility is restored.

[4.3.0] - 2026-02-11

Added

Blank Page Detection

  • is_blank field on PageInfo and PageContent: Pages with fewer than 3 non-whitespace characters and no tables or images are flagged as blank. Detection uses a two-phase approach: text-only analysis during extraction, then refinement after table/image assignment. Available across all 9 language bindings (Python, TypeScript, Ruby, Java, Go, C#, PHP, Elixir, WASM). Closes #378.

PaddleOCR Backend

  • PaddleOCR backend via ONNX Runtime: New OCR backend (kreuzberg-paddle-ocr) using PaddlePaddle's PP-OCRv4 models converted to ONNX format, run via ONNX Runtime. Supports 6 languages (English, Chinese, Japanese, Korean, German, French) with automatic model downloading and caching. Provides superior CJK recognition compared to Tesseract.
  • PaddleOCR support in all bindings: Available across Python, Rust, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, and Elixir bindings via the paddle-ocr feature flag.
  • PaddleOCR CLI support: The kreuzberg-cli binary supports --ocr-backend paddle-ocr for PaddleOCR extraction.

Unified OCR Element Output

  • Structured OCR element data: Extraction results now include OcrElement data with bounding geometry (rectangles and quadrilaterals), per-element confidence scores, rotation information, and hierarchical levels (word, line, block, page). Available from both PaddleOCR and Tesseract backends.

Shared ONNX Runtime Discovery

  • ort_discovery module: Finds ONNX Runtime shared libraries across platforms, shared between PaddleOCR and future ONNX-based backends.

Document Structure Output

  • DocumentStructure support across all bindings: Added structured document output with include_document_structure configuration option across Python, TypeScript/Node.js, Go, Java, PHP, Ruby, C#, Elixir, and WASM bindings.

Native DOC/PPT Extraction

  • OLE/CFB-based extraction: Added native DOC and PPT extraction via OLE/CFB binary parsing. Legacy Office formats no longer require any external tools.

musl Linux Support

  • Re-enabled musl targets: Added x86_64-unknown-linux-musl and aarch64-unknown-linux-musl targets for CLI binaries, Python wheels (musllinux), and Node.js native bindings. Resolves glibc 2.38+ requirement for prebuilt CLI binaries on older distros like Ubuntu 22.04 (#364).

Fixed

MSG Extraction Hang on Large Attachments (#372)

  • Fixed .msg (Outlook) extraction hanging indefinitely on files with large attachments. Replaced the msg_parser crate with direct OLE/CFB parsing using the cfb crate — attachment binary data is now read directly without hex-encoding overhead.

... (truncated)

Commits
  • 990a4ac chore: release v4.3.1
  • fa4d2e4 fix(elixir): update checksums for 4.3.0 precompiled NIF binaries
  • 7e67b9a fix: enable wasm_js feature for getrandom 0.3.4 transitive deps
  • a51ea30 fix: pin lzma-rust2 to 0.15.7 for crc compatibility
  • 29b38f5 chore: update all dependencies across all language ecosystems
  • 0df39d5 fix: correct uv sync syntax for Python update task
  • f197bf2 fix: correct YAML syntax in elixir update task
  • dce2db8 feat: enhance all language update tasks to upgrade to latest major versions
  • 4888236 feat: enhance PHP update task to upgrade to latest major versions
  • 5385fdf fix: upgrade PHPUnit to ^12.5 to match e2e/php tests
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) from 4.2.9 to 4.3.1.
- [Release notes](https://github.com/kreuzberg-dev/kreuzberg/releases)
- [Changelog](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CHANGELOG.md)
- [Commits](kreuzberg-dev/kreuzberg@v4.2.9...v4.3.1)

---
updated-dependencies:
- dependency-name: kreuzberg
  dependency-version: 4.3.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies and python labels on Feb 13, 2026