Fix CPU decode_jpeg error-path leak on malformed JPEGs (setjmp/longjmp) #9423

Merged
NicolasHug merged 5 commits into pytorch:main from MPSFuzz:fix/jpeg-decode-leak on Mar 10, 2026

@MPSFuzz (Contributor) commented Mar 6, 2026

Refs #9383

This PR fixes an error-path memory leak in the CPU JPEG decode implementation used by torchvision.io.image.decode_jpeg().

Root cause: libjpeg reports failures via setjmp/longjmp. longjmp does not unwind C++ stack frames, so when a decode error occurs, tensors allocated after the setjmp point (e.g., the output buffer and the optional CMYK temp buffer) never have their destructors run. Repeated calls on malformed JPEGs therefore accumulate leaked allocations, and RSS grows until the process runs out of memory (OOM).

Fix: declare the output tensor and the optional CMYK line tensor before setjmp, so that longjmp no longer jumps over their construction, and explicitly reset() them in the setjmp error branch before calling jpeg_destroy_decompress() and raising the error.
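The fix pattern can be sketched in plain C++ without libjpeg. This is a minimal sketch, not torchvision's code: `fake_read_scanlines` is a hypothetical stand-in for a libjpeg call whose custom error handler longjmps, and `Buffer` is a hypothetical stand-in for the output and CMYK line tensors, with a live-object counter so the error path can be checked. One deliberate deviation from declaring the tensors directly as locals: the buffers sit in a heap-allocated holder created before setjmp, because C leaves a non-volatile local of the setjmp function indeterminate after longjmp if it was modified in between. Here only the holder's contents change, never the local `unique_ptr` itself.

```cpp
#include <csetjmp>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a tensor: counts live buffers so a caller can
// check that the error path releases everything it allocated.
struct Buffer {
  static int live;
  std::vector<unsigned char> bytes;
  explicit Buffer(std::size_t n) : bytes(n) { ++live; }
  ~Buffer() { --live; }
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
};
int Buffer::live = 0;

// Hypothetical stand-in for a libjpeg call whose error handler longjmps.
// C-style code: no objects with non-trivial destructors in this frame.
static void fake_read_scanlines(std::jmp_buf* env, bool corrupt) {
  if (corrupt) {
    std::longjmp(*env, 1);  // like a custom error_exit on malformed input
  }
}

// Everything the error branch must release lives in this holder.
struct DecodeBuffers {
  std::unique_ptr<Buffer> output;     // like the output tensor
  std::unique_ptr<Buffer> cmyk_line;  // like the optional CMYK line tensor
};

std::size_t decode_sketch(std::size_t size, bool cmyk, bool corrupt) {
  std::jmp_buf env;
  // Declared before setjmp and never reassigned afterwards.
  auto bufs = std::make_unique<DecodeBuffers>();

  if (setjmp(env) != 0) {
    // Error branch: release the buffers explicitly, then (in the real code,
    // after jpeg_destroy_decompress) raise an ordinary C++ exception.
    bufs->output.reset();
    bufs->cmyk_line.reset();
    throw std::runtime_error("decode failed: malformed input");
  }

  bufs->output = std::make_unique<Buffer>(size);
  if (cmyk) {
    bufs->cmyk_line = std::make_unique<Buffer>(size / 4 + 1);
  }
  fake_read_scanlines(&env, corrupt);  // may longjmp back to the setjmp above
  return bufs->output->bytes.size();
}
```

Calling `decode_sketch(16, true, true)` throws `std::runtime_error`, and `Buffer::live` is back to zero afterwards. In C++, a longjmp that would skip non-trivial destructors (the situation this PR fixes) is undefined behavior, which is why the sketch only ever jumps across frames whose objects are trivially destructible.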

Repro (from #9383):

  • normal.jpg: RSS stays stable across repeated calls.
  • case1.jpg: RSS no longer grows linearly after this patch.
  • case2.jpg: RSS no longer accumulates across iterations. Note that peak RSS (HWM) may still spike because of a single large allocation attempt driven by a malformed header. This PR intentionally does not add a size limit, to avoid changing behavior for legitimately large images; it focuses on fixing the error-path leak.

@pytorch-bot commented Mar 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9423

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 4 Pending, 2 Unrelated Failures

As of commit 6d36758 with merge base 48956e0:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the cla signed label Mar 6, 2026

@NicolasHug (Member) previously approved these changes on Mar 6, 2026:

Thanks @MPSFuzz, I made some minor changes and also added the same fix to the PNG encoder/decoder.

@malfet (Contributor) previously approved these changes on Mar 6, 2026:

This is fine, but fragile.

A few suggestions:

  • Is it possible to add tests (using valgrind, for example) that check for leaks? We already have some broken JPEGs/PNGs to validate that those raise an error, don't we?
  • Since setjmp indeed does not support stack unwinding, we should refactor the functions into clearly nothrow ones that return an error and throwing ones (e.g. .permute() can potentially throw, but this change does not account for it).
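The second suggestion, splitting the code into a nothrow layer that owns setjmp and a throwing layer, can be sketched as follows. This is a hypothetical sketch, not torchvision's or libjpeg's API: `ErrorCtx`, `fail`, `parse_header`, `read_info_nothrow` and `read_info` are illustrative names, and the 6-byte "header" (SOI marker plus big-endian width and height) is a toy format, not real JPEG parsing. The nothrow function only touches trivially destructible C data, so a longjmp out of the parser never skips a destructor; allocation and anything that may throw (such as a `.permute()` call) belong in the wrapper, which no longjmp can cross.

```cpp
#include <csetjmp>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified image info (not libjpeg's real structs).
struct ImageInfo {
  int width;
  int height;
};

// C-style error context, in the spirit of a libjpeg custom error manager.
struct ErrorCtx {
  std::jmp_buf env;
  char message[128];
};

// Hypothetical error handler: records a message, then longjmps, as a custom
// libjpeg error_exit would. Only trivially destructible objects are in scope.
[[noreturn]] static void fail(ErrorCtx* ctx, const char* msg) {
  std::snprintf(ctx->message, sizeof(ctx->message), "%s", msg);
  std::longjmp(ctx->env, 1);
}

// Toy parser: expects 0xFF 0xD8 followed by 16-bit big-endian width/height.
static ImageInfo parse_header(ErrorCtx* ctx, const unsigned char* data,
                              std::size_t len) {
  if (len < 6 || data[0] != 0xFF || data[1] != 0xD8) {
    fail(ctx, "not a JPEG (missing SOI marker)");
  }
  ImageInfo info;
  info.width = (data[2] << 8) | data[3];
  info.height = (data[4] << 8) | data[5];
  if (info.width == 0 || info.height == 0) {
    fail(ctx, "invalid image dimensions");
  }
  return info;
}

// Nothrow layer: owns the setjmp, holds no C++ objects with non-trivial
// destructors, and reports failure through its return value.
bool read_info_nothrow(const unsigned char* data, std::size_t len,
                       ImageInfo* out, ErrorCtx* ctx) noexcept {
  if (setjmp(ctx->env) != 0) {
    return false;  // ctx->message describes the error
  }
  *out = parse_header(ctx, data, len);
  return true;
}

// Throwing layer: free to use RAII types and exceptions, because no longjmp
// can cross its frame.
ImageInfo read_info(const std::vector<unsigned char>& bytes) {
  ErrorCtx ctx;
  ImageInfo info{};
  if (!read_info_nothrow(bytes.data(), bytes.size(), &info, &ctx)) {
    throw std::runtime_error(std::string("decode error: ") + ctx.message);
  }
  return info;
}
```

With this split, `read_info({0xFF, 0xD8, 0x01, 0x00, 0x00, 0x80})` yields width 256 and height 128, while input without the SOI marker throws a `std::runtime_error` whose message contains "missing SOI marker". The design choice is that the only code a longjmp can cross is code that cannot leak.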

@MPSFuzz (Contributor, Author) commented Mar 9, 2026 via email

@MPSFuzz closed this Mar 9, 2026
@NicolasHug reopened this Mar 10, 2026
@pytorch-bot dismissed stale reviews from NicolasHug and malfet on Mar 10, 2026, 09:35

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

@NicolasHug (Member) left a comment:

Thanks @MPSFuzz, I'll reopen and merge this PR now, as I'll need to cherry-pick it as a bugfix into the release branch. Let's follow up on the suggested improvements in #9429. Would you mind also addressing the encoder and the PNG decoder/encoder changes there?

Thank you!

@NicolasHug merged commit 1cc5693 into pytorch:main on Mar 10, 2026
54 of 61 checks passed

@github-actions commented:

Hey @NicolasHug!

You merged this PR, but no labels were added.
The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

NicolasHug added a commit to NicolasHug/vision that referenced this pull request Mar 10, 2026
…p) (pytorch#9423)

Co-authored-by: MPSFuzz <2286770808@qq.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <nh.nicolas.hug@gmail.com>

@MPSFuzz (Contributor, Author) commented Mar 11, 2026 via email

@MPSFuzz (Contributor, Author) commented Mar 11, 2026 via email

@MPSFuzz (Contributor, Author) commented Mar 12, 2026

I’ve now addressed the PNG-side follow-up as well. The PNG decoder changes have been pushed to #9429, following the same refactoring/error-path handling approach. Details are in #9429.
