Memory leak / unbounded RSS growth in torchvision.io.image.decode_jpeg() on malformed JPEG (CPU) → potential DoS #9383

@MPSFuzz

🐛 Describe the bug

Summary

Repeated calls to torchvision.io.image.decode_jpeg() on a malformed JPEG cause near-linear RSS growth until OOM. Normal JPEGs do not show this behavior. This looks like an error-path memory leak in the CPU JPEG decode path.

I checked past issues #3613 and #4378; those reports concern GPU/nvJPEG memory leaks. This report is CPU-only: the leak occurs on the error path when decoding malformed JPEGs, and RSS grows linearly even after gc.collect() and malloc_trim(0).

This issue mirrors a report I previously filed through the repo’s GitHub Security Advisory (private), including PoC and malformed JPEG samples. Since there has been no maintainer response for over 90 days, I’m posting a public issue to ensure the problem is visible and can be tracked.

For responsible disclosure, I will not publish the malformed JPEG samples here. I can provide them privately to maintainers, or they can review the samples already attached in the Security Advisory thread.

Reproduction
Command:
python poc.py case1.jpg --repeat 50 --mode RGB --quiet
Modes tested: UNCHANGED / RGB / GRAY (all leak to varying degrees)

PoC script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os, sys, argparse, contextlib, gc, ctypes

os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "")

import torch, torchvision
from torchvision.io import ImageReadMode
from torchvision.io.image import decode_jpeg

@contextlib.contextmanager
def swallow_stderr(enable=True):
    if not enable:
        yield
        return
    sys.stderr.flush()
    fd = sys.stderr.fileno()
    old = os.dup(fd)
    try:
        with open(os.devnull, "wb") as null:
            os.dup2(null.fileno(), fd)
        yield
    finally:
        # Restore the original stderr even if the wrapped call raised
        os.dup2(old, fd)
        os.close(old)

def rss_hwm_kb():
    rss = hwm = None
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss = int(line.split()[1])
            elif line.startswith("VmHWM:"):
                hwm = int(line.split()[1])
    return rss, hwm

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("unit", help="the case path")
    ap.add_argument("--repeat", type=int, default=50)
    ap.add_argument("--mode", choices=["UNCHANGED","RGB","GRAY"], default="RGB")
    ap.add_argument("--quiet", action="store_true")
    args = ap.parse_args()

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("cuda_available:", torch.cuda.is_available())

    with open(args.unit, "rb") as f:
        data = f.read()

    mode = {
        "UNCHANGED": ImageReadMode.UNCHANGED,
        "RGB":       ImageReadMode.RGB,
        "GRAY":      ImageReadMode.GRAY,
    }[args.mode]

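    # Build the input tensor once, outside the loop, to reduce measurement noise.
    # bytearray() gives torch.frombuffer a writable buffer.
    u8 = torch.frombuffer(bytearray(data), dtype=torch.uint8).contiguous()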

    libc = ctypes.CDLL("libc.so.6")
    torch.set_num_threads(1)
    print(f"[repro] unit={args.unit} bytes={len(data)} repeat={args.repeat} mode={args.mode}")

    for i in range(1, args.repeat + 1):
        try:
            with swallow_stderr(args.quiet):
                _ = decode_jpeg(u8, mode=mode)
        except Exception:
            # A malformed JPEG lands here; this is the error path being checked for a leak.
            pass

        # Reclaim everything that can be reclaimed, so any remaining growth reflects a real leak
        gc.collect()
        try:
            libc.malloc_trim(0)
        except Exception:
            pass

        rss, hwm = rss_hwm_kb()
        print(f"[{i}/{args.repeat}] VmRSS={rss/1024:.1f} MB  VmHWM={hwm/1024:.1f} MB", flush=True)

if __name__ == "__main__":
    main()
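
To help separate a native leak from Python-level garbage, the RSS readings in the PoC can be complemented with tracemalloc, which only tracks memory allocated through Python's allocators. The sketch below is not part of the original PoC; `python_heap_growth` is a hypothetical helper name. If it reports little or no growth while VmRSS keeps climbing, that points to allocations made by native code rather than leaked Python objects.

```python
import gc
import tracemalloc


def python_heap_growth(fn, repeat=50):
    """Return the net growth, in bytes, of Python-tracked allocations
    across `repeat` calls to `fn`. Exceptions raised by `fn` are ignored."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(repeat):
            try:
                fn()
            except Exception:
                pass  # malformed input is expected to raise
            gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical usage with the PoC's tensor and mode:
#   growth = python_heap_growth(lambda: decode_jpeg(u8, mode=mode))
#   print(f"Python heap growth: {growth / 1024:.1f} KiB")
```

Native tensor storage and buffers allocated inside the C++ decoder are not visible to tracemalloc, so this check only rules out the Python side.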

Observed results

Normal JPEG: RSS stabilizes around ~269 MB after repeated calls.

Malformed JPEG: RSS grows roughly linearly, by about 93.6 MB per call, reaching ~5 GB after 50 iterations (see logs below).

Normal case:
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu
cuda_available: False
...
[45/50] VmRSS=269.0 MB VmHWM=270.9 MB
[46/50] VmRSS=269.0 MB VmHWM=270.9 MB
[47/50] VmRSS=269.0 MB VmHWM=270.9 MB
[48/50] VmRSS=269.0 MB VmHWM=270.9 MB
[49/50] VmRSS=269.0 MB VmHWM=270.9 MB
[50/50] VmRSS=269.0 MB VmHWM=270.9 MB

Malformed case:
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu
cuda_available: False
[1/50] VmRSS=363.8 MB VmHWM=366.2 MB
[2/50] VmRSS=457.4 MB VmHWM=457.4 MB
[3/50] VmRSS=551.1 MB VmHWM=551.1 MB
[4/50] VmRSS=644.7 MB VmHWM=644.7 MB
[5/50] VmRSS=738.3 MB VmHWM=738.3 MB
[6/50] VmRSS=831.9 MB VmHWM=831.9 MB
[7/50] VmRSS=925.6 MB VmHWM=925.6 MB
...
[45/50] VmRSS=4483.3 MB VmHWM=4483.3 MB
[46/50] VmRSS=4576.9 MB VmHWM=4576.9 MB
[47/50] VmRSS=4670.6 MB VmHWM=4670.6 MB
[48/50] VmRSS=4764.2 MB VmHWM=4764.2 MB
[49/50] VmRSS=4857.8 MB VmHWM=4857.8 MB
[50/50] VmRSS=4951.4 MB VmHWM=4951.4 MB
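
The per-call growth can be read off the log above. As a minimal sketch (not part of the original PoC), the helper below parses lines in the PoC's output format and averages the growth between the first and last reading; `rss_growth_per_iter` is a hypothetical name, and the sample text is copied from the malformed-case log.

```python
import re

# Matches lines such as "[2/50] VmRSS=457.4 MB VmHWM=457.4 MB"
LINE_RE = re.compile(r"\[(\d+)/\d+\]\s+VmRSS=([\d.]+) MB")


def rss_growth_per_iter(log_text):
    """Average VmRSS growth in MB per iteration between the first and last log line."""
    points = [(int(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(log_text)]
    if len(points) < 2:
        raise ValueError("need at least two VmRSS readings")
    (i0, r0), (i1, r1) = points[0], points[-1]
    return (r1 - r0) / (i1 - i0)


sample = """[1/50] VmRSS=363.8 MB VmHWM=366.2 MB
[2/50] VmRSS=457.4 MB VmHWM=457.4 MB
[50/50] VmRSS=4951.4 MB VmHWM=4951.4 MB"""

print(f"{rss_growth_per_iter(sample):.1f} MB per call")  # prints "93.6 MB per call"
```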

Memory usage can also be monitored with htop.
For case 1, usage reaches about 5 GB; for case 2, it exceeds 100 GB.

Sample files
I can provide the malformed samples to maintainers privately.

Impact
If a service decodes untrusted user-provided JPEGs, an attacker could repeatedly submit crafted malformed images to exhaust memory and trigger DoS.
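
As a rough illustration of the scale, the figures from the malformed-case log (363.8 MB after the first call, about 93.6 MB of growth per call) can be extrapolated to a memory budget. This is a back-of-the-envelope sketch that assumes the growth stays linear beyond the 50 measured iterations; the 16 GiB limit and the function name `calls_to_reach` are hypothetical.

```python
import math


def calls_to_reach(limit_mb, first_call_mb=363.8, growth_mb=93.6):
    """Smallest call count n such that first_call_mb + growth_mb * (n - 1) >= limit_mb."""
    if limit_mb <= first_call_mb:
        return 1
    return 1 + math.ceil((limit_mb - first_call_mb) / growth_mb)


# Hypothetical budget: a worker process limited to 16 GiB
print(calls_to_reach(16 * 1024))
```

Under these assumptions, about 173 decode attempts with this sample would be enough to reach a 16 GiB limit in a single long-lived process.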

Versions

torch: 2.9.0+cpu

torchvision: 0.24.0+cpu (also reproduced on 0.25.0)
