Skip to content

Conversation

@rupertj
Copy link
Contributor

@rupertj rupertj commented Oct 30, 2025

Fix for #782

@k00ni
Copy link
Collaborator

k00ni commented Nov 3, 2025

Thank you for your PR.

Is it still work in progress?

If not, there are a few tasks left to solve before I take a closer look. Please read https://github.com/smalot/pdfparser/blob/master/CONTRIBUTING.md for more information.

@rupertj
Copy link
Contributor Author

rupertj commented Nov 3, 2025

Thanks for the reminder @k00ni. I've added test coverage for the change.

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking the time to extend the test coverage for your changes!

However, I noticed that the changes in this PR modify existing test data (see diff here). This could introduce unintended side effects for the existing test logic.

To avoid this risk, could you please:

  • Move your test code into a separate test method
  • If helpful, you can copy an existing test as a template and adapt it for your new functionality.

This way, we ensure that the existing tests remain unaffected and your new functionality is tested in isolation.

Once that's done, your PR is good to go. By the way, your description in #782 was very helpful and provided good insights into the issue 👍

@k00ni k00ni linked an issue Nov 7, 2025 that may be closed by this pull request
@rupertj
Copy link
Contributor Author

rupertj commented Nov 7, 2025

That change from "Imo" to "Im0" was just correcting a typo in the existing test. I didn't spot that I got that wrong when I wrote it.

I could revert that line and submit it as a separate PR if you like? I think keeping the new test coverage in the same method as the existing coverage makes sense, as they're testing the same bit of code.

@rupertj
Copy link
Contributor Author

rupertj commented Nov 7, 2025

Also, to clarify: when the command in the test data is "/Imo Do", the test passes, but for the wrong reason. We're checking for no result for that XObject, and we get no result because it can't find an object called Imo.

When the command is "/Im0 Do", we still get no result, but we're getting it for the right reason. The code finds the XObject, sees that it's an image and then decides not to include it in the text array.

@k00ni
Copy link
Collaborator

k00ni commented Nov 24, 2025

Sorry for the delayed response.

I follow your arguments, it looks good to me. The documentation provided in #782 was very helpful.

@k00ni k00ni merged commit 6b52c6b into smalot:master Nov 24, 2025
36 checks passed
@rupertj
Copy link
Contributor Author

rupertj commented Nov 24, 2025

Thankyou!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PDFObject::getTextArray() shouldn't include XObject Forms.

2 participants