Ignore Form as well as Image XObjects when assembling the text array for a PDFObject. #783

rupertj · 2025-10-30T11:58:50Z

Fix for #782

…for a PDFObject.

k00ni · 2025-11-03T07:45:58Z

Thank you for your PR.

Is it still a work in progress?

If not, there are a few tasks left to solve before I take a closer look. Please read https://github.com/smalot/pdfparser/blob/master/CONTRIBUTING.md for more information.

rupertj · 2025-11-03T09:37:52Z

Thanks for the reminder @k00ni. I've added test coverage for the change.

k00ni

Thank you for taking the time to extend the test coverage for your changes!

However, I noticed that the changes in this PR modify existing test data (see diff here). This could introduce unintended side effects for the existing test logic.

To avoid this risk, could you please:

Move your test code into a separate test method
If helpful, you can copy an existing test as a template and adapt it for your new functionality.

This way, we ensure that the existing tests remain unaffected and your new functionality is tested in isolation.

Once that's done, your PR is good to go. By the way, your description in #782 was very helpful and provided good insights into the issue 👍

rupertj · 2025-11-07T10:35:39Z

That change from "Imo" to "Im0" was just correcting a typo in the existing test. I didn't spot that I got that wrong when I wrote it.

I could revert that line and submit it as a separate PR if you like? I think keeping the new test coverage in the same method as the existing coverage makes sense, as they're testing the same bit of code.

rupertj · 2025-11-07T10:42:22Z

Also, to clarify: when the command in the test data is "/Imo Do", the test passes, but for the wrong reason. We're checking for no result for that XObject, and we get no result because it can't find an object called Imo.

When the command is "/Im0 Do", we still get no result, but we're getting it for the right reason. The code finds the XObject, sees that it's an image and then decides not to include it in the text array.
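
To illustrate the difference, here is a minimal sketch in Python. It is a simplified model, not the library's PHP code. The names `XOBJECTS`, `SKIPPED_SUBTYPES`, `text_for_do` and the `Fm0` entry are hypothetical and exist only for this example. It shows how a misspelled XObject name and a correctly named Image XObject both produce an empty result, but for different reasons.

```python
# Simplified model of resolving the operand of a "Do" command against a
# page's XObjects. All names are hypothetical; this is not pdfparser code.

XOBJECTS = {
    "Im0": {"subtype": "Image", "text": []},
    "Fm0": {"subtype": "Form", "text": ["form text"]},
}

# Subtypes whose content is left out of the text array.
SKIPPED_SUBTYPES = {"Image", "Form"}


def text_for_do(name, xobjects=XOBJECTS):
    """Return (text_items, reason) for a '/<name> Do' command."""
    xobject = xobjects.get(name)
    if xobject is None:
        # The name does not resolve to any XObject (e.g. the "Imo" typo).
        return [], "not found"
    if xobject["subtype"] in SKIPPED_SUBTYPES:
        # The XObject exists but is deliberately ignored.
        return [], "skipped " + xobject["subtype"]
    return list(xobject["text"]), "included"


for name in ("Imo", "Im0", "Fm0"):
    print(name, text_for_do(name))
```

In this model, "Imo" and "Im0" both return an empty list, so an assertion on the result alone cannot tell the failed lookup apart from the deliberate skip. Only the correctly spelled name exercises the subtype check.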

k00ni · 2025-11-24T07:46:17Z

Sorry for the delayed response.

I follow your arguments, and it looks good to me. The documentation provided in #782 was very helpful.

rupertj · 2025-11-24T09:33:57Z

Thank you!

Ignore Form as well as Image XObjects when assembling the text array …

a23e53a

…for a PDFObject.

k00ni added the tests required label Nov 3, 2025

Add test coverage for the change.

7aa477a

k00ni requested changes Nov 7, 2025

k00ni added needs work fix and removed tests required labels Nov 7, 2025

k00ni linked an issue Nov 7, 2025 that may be closed by this pull request

PDFObject::getTextArray() shouldn't include XObject Forms. #782

Closed

k00ni merged commit 6b52c6b into smalot:master Nov 24, 2025
36 checks passed

k00ni mentioned this pull request Jan 19, 2026

Page getText() returns blank contents #788

Open
