Image extraction does not handle occlusions (images partially overlapped by text)

## Issue: Image extraction does not handle occlusions (images overlapped by text)

I’m working with textbook-style PDFs (e.g. educational/science books) where diagrams and figures are often **partially overlapped by text, labels, or callouts**. While PyMuPDF is excellent at extracting embedded images and layout information, I’m running into consistent issues when images are **occluded by text** or drawn as part of the page content.

### Observed behavior

In these cases:

- `page.get_images(full=True)` does not return the diagram as a single, complete image  
- `page.get_text("dict")` correctly reports text blocks overlapping the diagram  
- The diagram itself ends up in one of these states:
  - split into multiple fragments,
  - not extractable as a single image asset, or
  - only recoverable via full-page rendering

This makes it difficult to reliably extract diagrams as **standalone images** when text overlaps them.
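To make the observation reproducible, here is a minimal sketch of how the overlap can be detected. It assumes that `page.get_image_info(xrefs=True)` and `page.get_text("dict")` report bounding boxes in the same page coordinates; the helper names `rects_intersect` and `find_occluded_images` are my own and not part of PyMuPDF.

```python
def rects_intersect(a, b):
    """True if two (x0, y0, x1, y1) rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_occluded_images(page):
    """List embedded images whose bbox intersects at least one text block.

    `page` is expected to be a PyMuPDF Page (hypothetical helper, not a
    library function).
    """
    # Block type 0 is text; type 1 would be an image block.
    text_boxes = [
        tuple(block["bbox"])
        for block in page.get_text("dict")["blocks"]
        if block["type"] == 0
    ]
    report = []
    for info in page.get_image_info(xrefs=True):
        img_box = tuple(info["bbox"])
        hits = [t for t in text_boxes if rects_intersect(img_box, t)]
        if hits:
            report.append({"xref": info["xref"], "bbox": img_box, "text_boxes": hits})
    return report


# Usage (file name is a placeholder):
#   import pymupdf  # "import fitz" on older releases
#   page = pymupdf.open("textbook.pdf")[0]
#   for item in find_occluded_images(page):
#       print(item["xref"], item["bbox"], len(item["text_boxes"]))
```

This only covers diagrams stored as image objects. Diagrams drawn as part of the page content do not appear in `get_image_info()` at all, which matches the behavior described above.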

---

### Expected behavior

Ideally, there would be a way to:

- Extract **visual image regions** even when they are partially covered by text  
- Identify a **figure-level bounding box** that includes occluded content  
- Or have clearer guidance on whether this is intentionally unsupported due to PDF format limitations  

I understand that PDFs are geometry-based and may not encode semantic “figure” concepts, but from a user perspective it’s unclear whether this limitation is fundamental or if there are recommended PyMuPDF approaches to mitigate it.
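To illustrate what I mean by a figure-level bounding box, here is a rough sketch of the kind of heuristic I would expect to need. It merges the rectangles of embedded images (`page.get_image_info()`) and vector drawings (`page.get_drawings()`) that lie within a small gap of each other, and deliberately ignores text so that overlapping labels do not split the region. The function names and the `gap` default are assumptions of mine, not PyMuPDF API.

```python
def _near(a, b, gap):
    """True if rectangles a and b overlap or lie within `gap` points of each other."""
    return (
        a[0] - gap <= b[2] and b[0] - gap <= a[2]
        and a[1] - gap <= b[3] and b[1] - gap <= a[3]
    )


def merge_rects(rects, gap=5.0):
    """Repeatedly union nearby (x0, y0, x1, y1) rectangles until nothing changes."""
    merged = [tuple(r) for r in rects]
    changed = True
    while changed:
        changed = False
        out = []
        for r in merged:
            for i, m in enumerate(out):
                if _near(r, m, gap):
                    out[i] = (min(r[0], m[0]), min(r[1], m[1]),
                              max(r[2], m[2]), max(r[3], m[3]))
                    changed = True
                    break
            else:
                out.append(r)
        merged = out
    return merged


def figure_candidates(page, gap=5.0):
    """Candidate figure regions built from image and drawing rectangles only."""
    rects = [tuple(info["bbox"]) for info in page.get_image_info()]
    rects += [tuple(path["rect"]) for path in page.get_drawings()]
    return merge_rects(rects, gap)
```

In practice this would need extra filtering: page-sized background rectangles, table rules, and decorative lines would all be merged into false figure regions, so I would not treat the output as more than a starting point.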

---

### What I’ve tried

- `page.get_images(full=True)`  
- `page.get_image_bbox(xref)`  
- `page.get_text("dict")` with manual bounding-box heuristics  
- Rendering full pages as a fallback (`page.get_pixmap`)  

Full-page rendering works, but it produces one raster per page, so **individual diagrams** are no longer available as separate assets.
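For completeness, this is roughly how the rendering fallback can be narrowed to a region, assuming the figure rectangles are already known from some other heuristic. It relies on the `clip` and `dpi` arguments of `page.get_pixmap()`; the function name `export_regions` and the output naming scheme are hypothetical.

```python
def export_regions(page, rects, prefix="figure", dpi=200):
    """Render each rect-like (x0, y0, x1, y1) region of a PyMuPDF page to a PNG.

    Returns the list of file names written. Everything visible inside the
    region is rasterized, including any text drawn over the diagram.
    """
    paths = []
    for i, rect in enumerate(rects, start=1):
        pix = page.get_pixmap(clip=rect, dpi=dpi)  # render only the clipped area
        path = f"{prefix}-{i}.png"
        pix.save(path)
        paths.append(path)
    return paths
```

This does yield per-figure files, but the overlapping text is baked into the pixels and the original image data is lost, which is why I am asking whether a better approach exists.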

---

### Questions

Is handling image occlusion by text:

1. A known limitation of PyMuPDF and/or the PDF format?  
2. Something that newer versions or APIs aim to improve?  
3. Out of scope by design (requiring vision-based post-processing)?  

Any clarification on the intended behavior or best practices would be really helpful.

Thanks, and I appreciate the work that’s gone into PyMuPDF.


Image extraction does not handle occlusions (images partially overlapped by text) #4860
