Fix error-path leak in CPU JPEG decode (setjmp/longjmp)#9429
Open

MPSFuzz wants to merge 2 commits into pytorch:main from
Conversation
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9429

Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Contributor (Author):
Following the suggestion in #9423, I applied the same error-path handling/refactoring pattern to the PNG decode path as well.

I tested this with `python3 poc_png.py torchvision_png_leak_case --iters 100000 --report-every 10000`.
With the rebuilt patched `torchvision`, RSS stays flat and `delta_kb` no longer increases:
```
[INFO] input_file=torchvision_png_leak_case
[INFO] input_size=297
[INFO] iters=100000, report_every=10000
[INFO] start_rss_kb=627612
[REPORT] iter=1 rss_kb=627612 delta_kb=0 elapsed=0.00s
[REPORT] iter=10000 rss_kb=627612 delta_kb=0 elapsed=0.15s
[REPORT] iter=20000 rss_kb=627612 delta_kb=0 elapsed=0.31s
[REPORT] iter=30000 rss_kb=627612 delta_kb=0 elapsed=0.47s
[REPORT] iter=40000 rss_kb=627612 delta_kb=0 elapsed=0.63s
[REPORT] iter=50000 rss_kb=627612 delta_kb=0 elapsed=0.78s
[REPORT] iter=60000 rss_kb=627612 delta_kb=0 elapsed=0.93s
[REPORT] iter=70000 rss_kb=627612 delta_kb=0 elapsed=1.08s
[REPORT] iter=80000 rss_kb=627612 delta_kb=0 elapsed=1.24s
[REPORT] iter=90000 rss_kb=627612 delta_kb=0 elapsed=1.38s
[REPORT] iter=100000 rss_kb=627612 delta_kb=0 elapsed=1.53s
[DONE] end_rss_kb=627612 total_delta_kb=0
```
For comparison, in the other environment without the updated build, the same PoC still shows near-linear RSS growth:
```
[INFO] input_file=torchvision_png_leak_case
[INFO] input_size=297
[INFO] iters=100000, report_every=10000
[INFO] start_rss_kb=637760
[REPORT] iter=1 rss_kb=637760 delta_kb=0 elapsed=0.00s
[REPORT] iter=10000 rss_kb=654504 delta_kb=16744 elapsed=0.18s
[REPORT] iter=20000 rss_kb=670344 delta_kb=32584 elapsed=0.37s
[REPORT] iter=30000 rss_kb=686184 delta_kb=48424 elapsed=0.55s
[REPORT] iter=40000 rss_kb=701760 delta_kb=64000 elapsed=0.73s
[REPORT] iter=50000 rss_kb=717600 delta_kb=79840 elapsed=0.91s
[REPORT] iter=60000 rss_kb=733440 delta_kb=95680 elapsed=1.10s
[REPORT] iter=70000 rss_kb=749280 delta_kb=111520 elapsed=1.29s
[REPORT] iter=80000 rss_kb=764856 delta_kb=127096 elapsed=1.47s
[REPORT] iter=90000 rss_kb=780696 delta_kb=142936 elapsed=1.65s
[REPORT] iter=100000 rss_kb=796536 delta_kb=158776 elapsed=1.83s
[DONE] end_rss_kb=796536 total_delta_kb=158776
```

poc_png.py:

```python
#!/usr/bin/env python3
import sys
import time
import argparse

import torch
from torchvision.io import ImageReadMode
from torchvision.io.image import decode_png


def get_rss_kb():
    with open("/proc/self/status", "r", encoding="utf-8") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("input_file", help="path to input sample")
    parser.add_argument("--iters", type=int, default=100000, help="number of iterations")
    parser.add_argument(
        "--report-every", type=int, default=1000, help="report RSS every N iterations"
    )
    args = parser.parse_args()

    with open(args.input_file, "rb") as f:
        data = f.read()
    u8 = torch.frombuffer(data, dtype=torch.uint8)

    rss0 = get_rss_kb()
    t0 = time.time()
    print(f"[INFO] input_file={args.input_file}")
    print(f"[INFO] input_size={len(data)}")
    print(f"[INFO] iters={args.iters}, report_every={args.report_every}")
    print(f"[INFO] start_rss_kb={rss0}")

    for i in range(1, args.iters + 1):
        try:
            decode_png(u8, mode=ImageReadMode.UNCHANGED)
        except Exception:
            pass
        if i == 1 or i % args.report_every == 0 or i == args.iters:
            rss = get_rss_kb()
            print(
                f"[REPORT] iter={i} rss_kb={rss} delta_kb={rss - rss0} elapsed={time.time() - t0:.2f}s",
                flush=True,
            )

    rss1 = get_rss_kb()
    print(f"[DONE] end_rss_kb={rss1} total_delta_kb={rss1 - rss0}")


if __name__ == "__main__":
    main()
```
Refs #9383
Thanks for the feedback and suggestions! Following the guidance provided, I've refactored the code to improve its structure. The setjmp/longjmp (libjpeg) region is now separated from potentially-throwing Torch operations. Specifically, the libjpeg decoding path is now encapsulated in the helper function `decode_jpeg_hwc_impl`, which returns an HWC tensor and optionally reads the EXIF orientation. The outer `decode_jpeg` function performs the `permute()` and EXIF transformation after libjpeg has finished and the decompression object has been destroyed, ensuring that throwing operations stay outside the longjmp zone.

I have validated this fix with Valgrind on several broken JPEG files, including those from the PoC case mentioned in #9383 (https://github.com/MPSFuzz/images/tree/master/torch_poc) as well as additional files in `test/assets/damaged_jpeg`. Below are the leak summaries detected by Valgrind:

Case 1:

Case 2:

Official assert:

The "possibly lost" memory comes from allocations Valgrind cannot fully track (as expected); nothing is definitely lost.