Replace PyPDF2 with pypdfium2 by yiwei-ang · Pull Request #38 · alejandro-ao/ask-multiple-pdfs

yiwei-ang · 2023-08-23T05:06:38Z

Many thanks to @alejandro-ao for creating a good video demonstrating the perfect blend of OpenAI, PDF readers, and Streamlit!

I tried the tool on several PDFs and found a problem with the quality of text extraction using PyPDF2: the contents of a PDF are not always extracted fully.

After looking into https://github.com/py-pdf/benchmarks, it seems we can switch to pypdfium2, which offers similar functionality with better text extraction quality and faster extraction (verified on my end!).
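
For readers who have not used pypdfium2, the swap could look roughly like the sketch below. This is not the diff from this PR: the function name `get_pdf_text` and the `open_pdf` parameter are hypothetical names chosen for illustration, and the sketch assumes each uploaded PDF is available as raw bytes (for example from an uploaded file's `.read()`). It uses `PdfDocument`, `get_textpage()`, and `get_text_range()` from pypdfium2 version 4 or later.

```python
from typing import Callable, Iterable, Optional


def get_pdf_text(pdf_blobs: Iterable[bytes], open_pdf: Optional[Callable] = None) -> str:
    """Concatenate the text of every page of every PDF in pdf_blobs.

    `open_pdf` is a hypothetical hook (not part of pypdfium2) that lets a
    stub document class stand in for pypdfium2.PdfDocument.
    """
    if open_pdf is None:
        # Imported lazily so the helper can also be exercised with a stub opener.
        import pypdfium2 as pdfium
        open_pdf = pdfium.PdfDocument

    chunks = []
    for blob in pdf_blobs:
        pdf = open_pdf(blob)  # PdfDocument accepts raw bytes as input
        for index in range(len(pdf)):
            page = pdf[index]
            textpage = page.get_textpage()
            # With no arguments, get_text_range() returns the whole page's text.
            chunks.append(textpage.get_text_range())
    return "\n".join(chunks)
```

With PyPDF2 the equivalent loop would call `extract_text()` on each page of a `PdfReader`; here the text comes from a separate text-page object instead, which is the main structural difference to account for when switching.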

IlianP · 2023-09-08T14:20:04Z

As a side note, LangChain also supports pypdfium2 as a document loader:
https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf#using-pypdfium2

costabm · 2023-11-02T14:26:15Z

I have added this important feature to my larger pull request (my first one ever). I gave you credit there, but I'm not sure this is the right way to do it.

yiwei-ang added 3 commits August 23, 2023 12:45

replace PyPDF2 with pypdfium2

0a88e85

replace PyPDF2 with pypdfium2

3557fae

cleanup

f89146b

yiwei-ang changed the title ~~Replace pypdfium2 with~~ Replace PyPDF2 with pypdfium2 Aug 23, 2023

yiwei-ang wants to merge 3 commits into alejandro-ao:main from yiwei-ang:feature/pdfium
