Fix CPU decode_jpeg error-path leak on malformed JPEGs (setjmp/longjmp) #9423
NicolasHug merged 5 commits into pytorch:main
NicolasHug left a comment:
Thanks @MPSFuzz, I made some minor changes and also added the same fix to the PNG encoder/decoder.
malfet left a comment:
This is fine, but fragile.
A few suggestions:
- Is it possible to add tests (using valgrind, for example) that check for leaks? We have some broken JPEGs/PNGs to validate that those will raise an error, don't we?
- Since setjmp indeed does not support stack unwinding, we should refactor the functions into clear nothrow ones that return an error, and throwing ones (e.g. .permute() can potentially throw, but this change does not account for it).
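A minimal sketch of the nothrow/throwing split suggested above, not torchvision's actual code. `ToyDecoder`, `toy_read_header`, `toy_read_pixels`, and the byte layout are hypothetical stand-ins for a C library that reports errors with longjmp. The functions that contain setjmp hold only trivially destructible locals and return a bool; the throwing layer owns the C++ objects and never sits between a setjmp and its longjmp.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical C-style decoder state: only a jump target for error reporting.
struct ToyDecoder {
  std::jmp_buf on_error;
};

// Hypothetical C-style routines that longjmp on malformed input.
// Toy format: 0xFF 0xD8, then a one-byte payload length, then the payload.
void toy_read_header(ToyDecoder* dec, const unsigned char* data,
                     std::size_t size, std::size_t* out_len) {
  if (size < 3 || data[0] != 0xFF || data[1] != 0xD8) {
    std::longjmp(dec->on_error, 1);
  }
  *out_len = data[2];
}

void toy_read_pixels(ToyDecoder* dec, const unsigned char* data,
                     std::size_t size, unsigned char* out,
                     std::size_t out_len) {
  if (size < 3 + out_len) {
    std::longjmp(dec->on_error, 2);
  }
  for (std::size_t i = 0; i < out_len; ++i) {
    out[i] = data[3 + i];
  }
}

// Nothrow layer: the only code that calls setjmp. No local here has a
// destructor, and nothing local is modified after setjmp.
bool read_header_nothrow(const unsigned char* data, std::size_t size,
                         std::size_t* out_len) noexcept {
  ToyDecoder dec;
  if (setjmp(dec.on_error)) {
    return false;  // reached via longjmp
  }
  toy_read_header(&dec, data, size, out_len);
  return true;
}

bool read_pixels_nothrow(const unsigned char* data, std::size_t size,
                         unsigned char* out, std::size_t out_len) noexcept {
  ToyDecoder dec;
  if (setjmp(dec.on_error)) {
    return false;  // reached via longjmp
  }
  toy_read_pixels(&dec, data, size, out, out_len);
  return true;
}

// Throwing layer: owns the buffer and turns error codes into exceptions.
// Normal stack unwinding destroys `out` if the second step fails.
std::vector<unsigned char> decode_split(const unsigned char* data,
                                        std::size_t size) {
  assert(data != nullptr || size == 0);
  std::size_t len = 0;
  if (!read_header_nothrow(data, size, &len)) {
    throw std::runtime_error("malformed header");
  }
  std::vector<unsigned char> out(len);
  if (!read_pixels_nothrow(data, size, out.data(), out.size())) {
    throw std::runtime_error("truncated pixel data");
  }
  return out;
}
```

With this shape, anything that may throw (allocation, tensor operations) stays in the outer function, where ordinary unwinding applies.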
Thank you for your reply and suggestions. We will try to write a more comprehensive error path handling.
…-----Original Message-----
From: "Nikita Shulga" ***@***.***>
Sent: 2026-03-07 01:41:41 (Saturday)
To: pytorch/vision ***@***.***>
Cc: MPSFuzz ***@***.***>, Mention ***@***.***>
Subject: Re: [pytorch/vision] Fix CPU decode_jpeg error-path leak on malformed JPEGs (setjmp/longjmp) (PR #9423)
@malfet approved this pull request.
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
Hey @NicolasHug! You merged this PR, but no labels were added.
…p) (pytorch#9423) Co-authored-by: MPSFuzz <2286770808@qq.com> Co-authored-by: Nicolas Hug <contact@nicolas-hug.com> Co-authored-by: Nicolas Hug <nh.nicolas.hug@gmail.com>
Thanks for reopening and merging the PR, and for accepting the fix.
I’ll continue following up in #9429 and work on the suggested improvements there.
Best,
SCUer
…-----Original Message-----
From: "Nicolas Hug" ***@***.***>
Sent: 2026-03-10 18:49:17 (Tuesday)
To: pytorch/vision ***@***.***>
Cc: MPSFuzz ***@***.***>, Mention ***@***.***>
Subject: Re: [pytorch/vision] Fix CPU decode_jpeg error-path leak on malformed JPEGs (setjmp/longjmp) (PR #9423)
@NicolasHug approved this pull request.
Thanks @MPSFuzz, I'll re-open and merge this PR now, as I'll need to cherry-pick it as a bugfix in the release branch. Let's follow up on the suggested improvements in #9429. Do you mind also addressing the encoder and PNG decoder/encoder changes over there?
Thank you!
Hi,
Thanks again for reopening and merging the fix.
I also wanted to ask whether the team would consider this issue eligible for a CVE or security advisory. I originally reported it through the security channel, and the fix has already required multiple rounds of work. Now that the scope is also extending beyond the JPEG decoder to the encoder and PNG decoder/encoder paths, this is turning into a fairly substantial fix rather than a very small patch.
I’m happy to continue following up on the improvements in #9429, but I wanted to ask whether you could help with the CVE process, or let me know if this is something the project would be willing to request.
Best,
SCUer
…-----Original Message-----
From: "Nicolas Hug" ***@***.***>
Sent: 2026-03-10 18:49:24 (Tuesday)
To: pytorch/vision ***@***.***>
Cc: MPSFuzz ***@***.***>, Mention ***@***.***>
Subject: Re: [pytorch/vision] Fix CPU decode_jpeg error-path leak on malformed JPEGs (setjmp/longjmp) (PR #9423)
Merged #9423 into main.
Refs #9383
This PR fixes an error-path memory leak in the CPU JPEG decode implementation used by torchvision.io.image.decode_jpeg().
Root cause: libjpeg reports failures via setjmp/longjmp. longjmp does not unwind C++ stack frames, so tensors allocated after the setjmp point (e.g., the output buffer and optional CMYK temp buffer) can skip destructors on decode errors. Repeated calls on malformed JPEGs would therefore accumulate leaked allocations and grow RSS until OOM.
Fix: declare the output tensor and optional CMYK line tensor before setjmp, and explicitly reset() them in the setjmp error branch before calling jpeg_destroy_decompress() and raising the error.
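The pattern can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `ToyDecoder`, `toy_decode`, `alloc_output`, and `free_output` are hypothetical, and a raw buffer with a live-allocation counter stands in for the output tensor. The owning pointer is volatile-qualified because the C and C++ standards leave the value of a non-volatile local indeterminate if it is modified between setjmp and longjmp.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a C decoder that reports errors with longjmp.
struct ToyDecoder {
  std::jmp_buf on_error;
};

// C-style routine: no objects with destructors on its stack, so jumping
// out of it skips nothing.
void toy_decode(ToyDecoder* dec, const unsigned char* data, std::size_t size,
                unsigned char* out) {
  assert(dec != nullptr && out != nullptr);
  if (size < 2 || data[0] != 0xFF || data[1] != 0xD8) {
    std::longjmp(dec->on_error, 1);  // malformed input
  }
  out[0] = data[0];
  out[1] = data[1];
}

// Counts live output buffers so that a leak would be observable.
int live_buffers = 0;

unsigned char* alloc_output(std::size_t n) {
  ++live_buffers;
  return new unsigned char[n];
}

void free_output(unsigned char* p) {
  if (p != nullptr) {
    --live_buffers;
    delete[] p;
  }
}

// Throws std::runtime_error on malformed input without leaking the buffer.
void decode_checked(const unsigned char* data, std::size_t size) {
  ToyDecoder dec;
  // Declared BEFORE setjmp so the error branch can still see it.
  unsigned char* volatile output = nullptr;

  if (setjmp(dec.on_error)) {
    // Error branch, reached via longjmp: release explicitly, then raise.
    free_output(output);
    output = nullptr;
    throw std::runtime_error("malformed input");
  }

  output = alloc_output(16);  // allocated after the setjmp point
  toy_decode(&dec, data, size, output);
  free_output(output);        // normal path
}
```

Calling `decode_checked` repeatedly on malformed input should leave `live_buffers` at zero after each exception, which is the toy analogue of RSS staying flat.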
Repro (from #9383):
- normal.jpg: RSS stays stable across repeated calls
- case1.jpg: RSS no longer grows linearly after this patch
- case2.jpg: RSS no longer accumulates across iterations. Note that peak RSS/HWM may still spike due to a single large allocation attempt on malformed headers. This PR intentionally does not add a size limit to avoid changing behavior for legitimately large images; it focuses on fixing the error-path leak.