Bump marker-pdf from 1.7.5 to 1.8.0 #302

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/marker-pdf-1.8.0

dependabot bot commented on behalf of github Jun 30, 2025

Bumps marker-pdf from 1.7.5 to 1.8.0.

Release notes

Sourced from marker-pdf's releases.

Chunk output; custom prompts; structured extraction improvements

Marker 1.8.0

  • Marker will now output a flat list of blocks with associated html, which is useful for RAG
  • Structured extraction beta is significantly improved, with better performance/accuracy
  • New LLM sectionheader processor will correctly label section header levels
  • You can pass a prompt to marker in LLM mode to adjust the output
  • Marker batch conversion script has somewhat better performance, closer to our inference container - email us at hi@datalab.to if you want to get set up with our inference container (used on prem at top AI research orgs)
  • Add an option to filter out blank white page images from output
  • Enable keeping pageheader/pagefooter.

Chunking/RAG improvements

  • Add chunk output format which is a flat list of chunks with full html in each
  • Add an llm sectionheader processor that will redo all the header levels against each other properly

Use the sectionheaderprocessor by setting --use_llm, and the chunk output by setting --output_format chunks.
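As a minimal sketch of the two flags above, the following helper assembles a command line that enables both. The flags `--use_llm` and `--output_format chunks` come from the release notes; the `marker_single` entry point, the file name `report.pdf`, and the helper `build_marker_command` are assumptions made for illustration and should be checked against the installed marker-pdf version.

```python
import shlex


def build_marker_command(pdf_path, use_llm=False, output_format=None):
    """Assemble a marker invocation as an argument list (hypothetical helper).

    Assumes the single-file CLI entry point is named `marker_single`.
    """
    cmd = ["marker_single", pdf_path]
    if use_llm:
        # Per the release notes, this enables the LLM sectionheader processor.
        cmd.append("--use_llm")
    if output_format is not None:
        # "chunks" is the flat chunk output format added in 1.8.0.
        cmd += ["--output_format", output_format]
    return cmd


command = build_marker_command("report.pdf", use_llm=True, output_format="chunks")
print(shlex.join(command))
```

The argument list can be handed to `subprocess.run(command, check=True)` once marker-pdf and an LLM service are configured; building a list rather than a shell string avoids quoting problems with paths that contain spaces.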

Structured extraction

  • Fix structured extraction, so it works much better than before (requires llm)
  • Improve structured extraction test app

You can try it with the streamlit app by running python extraction_app.py.

Promptability/customization!

  • Add promptability via block_correction_prompt, which can be used to create custom behavior (requires llm)

Try it by setting the block_correction_prompt config key to a specific prompt.

Misc

  • Get the marker script to perform a bit closer to our inference container by default (inference container gets 10-25 pages/s on H100). Will auto-configure worker count to available VRAM.
  • Fix where marker would output blank pages as images
  • Enable keeping pageheader/pagefooter in the output
  • Adjust llm services to enable text-only input
  • Add html field to almost every block type

Test pageheader/pagefooter by setting keep_pagefooter_in_output and keep_pageheader_in_output.
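The config keys named in these notes could be collected into a single dictionary, roughly as sketched below. The key names `block_correction_prompt`, `keep_pageheader_in_output`, and `keep_pagefooter_in_output` come from the release notes; the `use_llm` key, the boolean values, the example prompt, and the helper `build_marker_config` are assumptions, and how the dictionary is handed to marker is deliberately left out.

```python
def build_marker_config(block_correction_prompt=None, keep_headers_and_footers=False):
    """Collect the 1.8.0 config keys into a dict (hypothetical helper)."""
    config = {}
    if block_correction_prompt:
        # The release notes say promptability requires LLM mode;
        # "use_llm" as a config key is an assumption.
        config["use_llm"] = True
        config["block_correction_prompt"] = block_correction_prompt
    if keep_headers_and_footers:
        # Assumed to be boolean switches.
        config["keep_pageheader_in_output"] = True
        config["keep_pagefooter_in_output"] = True
    return config


config = build_marker_config(
    block_correction_prompt="Expand all abbreviations.",  # illustrative prompt
    keep_headers_and_footers=True,
)
print(config)
```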

What's Changed

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot bot added the dependencies and python labels Jun 30, 2025
github-actions bot enabled auto-merge June 30, 2025 19:11
dependabot bot force-pushed the dependabot/pip/marker-pdf-1.8.0 branch from 61a3f0a to ef79da0 June 30, 2025 19:51
dependabot bot force-pushed the dependabot/pip/marker-pdf-1.8.0 branch from ef79da0 to c9f25c4 June 30, 2025 20:04
Bumps [marker-pdf](https://github.com/VikParuchuri/marker) from 1.7.5 to 1.8.0.
- [Release notes](https://github.com/VikParuchuri/marker/releases)
- [Commits](datalab-to/marker@v1.7.5...v1.8.0)

---
updated-dependencies:
- dependency-name: marker-pdf
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
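
The `update-type: version-update:semver-minor` value in the commit metadata follows from comparing the two version strings. A minimal sketch of that classification, assuming plain numeric `MAJOR.MINOR.PATCH` versions (the helper `update_type` is hypothetical and is not Dependabot's own implementation):

```python
def update_type(old, new):
    """Classify a version bump by the first component that differs."""
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if new_parts[0] != old_parts[0]:
        return "version-update:semver-major"
    if new_parts[1] != old_parts[1]:
        return "version-update:semver-minor"
    return "version-update:semver-patch"


print(update_type("1.7.5", "1.8.0"))
```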
dependabot bot force-pushed the dependabot/pip/marker-pdf-1.8.0 branch from c9f25c4 to 45edfc3 June 30, 2025 20:19

dependabot bot commented on behalf of github Jul 7, 2025

Superseded by #306.

dependabot bot closed this Jul 7, 2025
auto-merge was automatically disabled July 7, 2025 18:51

Pull request was closed

dependabot bot deleted the dependabot/pip/marker-pdf-1.8.0 branch July 7, 2025 18:51
Labels

dependencies (Pull requests that update a dependency file), python (Pull requests that update Python code)
