BUG: kernprof executed code inconsistent with sys.modules['__main__'], causing pickling issues #422

@TTsangSC

The context

I was trying to restart my pet project of extending profiling into child Python processes (e.g. via subprocess, multiprocessing, and os.fork()) with kernprof (or whatever its successor in 6.0 will be called), which I've been gradually working towards in some previous PRs. It may need a bit of elbow grease and boilerplate to port all the stuff over, but it should be very doable: after all, I've already written a pytest plugin which does that for tests, and kernprof is kinda like pytest in that both run other Python code in a somewhat controlled environment.

The test

To gauge how much work is ahead of us and to demonstrate what exactly the change will be good for, I needed a test use-case. Say we have a script which goes

`multiproc-test.py`
from multiprocessing import Pool


def my_sum(x: list[int]) -> int:
    result: int = 0
    for item in x:
        result += item
    return result


def main(length: int = 1000, n: int = 4) -> None:
    my_list: list[int] = list(range(length))
    sublists: list[list[int]] = []
    subsums: list[int]
    sublength = length // n
    if sublength * n < length:
        sublength += 1
    while my_list:
        sublist, my_list = my_list[:sublength], my_list[sublength:]
        sublists.append(sublist)
    with Pool(n) as pool:
        subsums = pool.map(my_sum, sublists)
        pool.close()
        pool.join()
    print(my_sum(subsums))


if __name__ == '__main__':
    main()  # stdout: '499500\n'

... so just a very simple scenario where we distribute the workload of summing integers over multiple processes. What I expected to happen was that kernprof would only see the one call to my_sum() we made in the main process and report data thereon, but unfortunately...

The bug

... the code didn't even run.

$ kernprof -q multiproc-test.py
AttributeError: module '__main__' has no attribute 'my_sum'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
_pickle.PicklingError: Can't pickle <function my_sum at 0x105db62a0>: it's not found as __main__.my_sum
when serializing tuple item 0
when serializing tuple item 0
when serializing tuple item 3

Note that we hadn't even set up (line-)profiling, and yet the execution still failed. The crux of the issue is that in most of the modes under which kernprof runs code (all except kernprof --prof-mod=<some_explicit_target> --line-by-line -m <some_module>), the namespace in which the code is executed is not vars(sys.modules['__main__']). Since my_sum.__module__ is '__main__', pickle tries to resolve the function as an attribute of sys.modules['__main__'] when multiprocessing.Pool.map() serializes it for the child processes, and fails because the function isn't there.
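
The failure can be reproduced without kernprof or multiprocessing at all. The snippet below is only a minimal sketch of the mechanism, not kernprof's actual code path: it assumes the script body is run via exec() in a fresh dict that merely claims to be `__main__`. The names SOURCE and pickles_from_detached_namespace are made up for illustration.

```python
import pickle

# Stand-in for the script body; only the function definition matters here.
SOURCE = """
def my_sum(x):
    result = 0
    for item in x:
        result += item
    return result
"""


def pickles_from_detached_namespace(source: str) -> bool:
    """Run `source` in a dict posing as `__main__`; report whether `my_sum` pickles."""
    # Assumption: a plain dict, NOT vars(sys.modules['__main__']).
    namespace = {'__name__': '__main__'}
    exec(compile(source, '<script>', 'exec'), namespace)
    func = namespace['my_sum']
    # The function records the module name taken from its globals...
    assert func.__module__ == '__main__'
    try:
        # ... so pickle looks for sys.modules['__main__'].my_sum, which
        # does not exist because the real module never saw the definition.
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError):
        return False
    return True
```

Calling pickles_from_detached_namespace(SOURCE) should return False for the same reason Pool.map() fails above: functions are pickled by reference (module name plus qualified name), and the reference doesn't resolve.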

An easy fix is to do what line_profiler.autoprofile.autoprofile.run(..., as_module=True) already does: always use a dummy module object and its namespace, and monkey-patch it into sys.modules['__main__'], regardless of whether the code is run as a module.
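
As a rough sketch of that idea (not the actual line_profiler implementation, and with the hypothetical names SOURCE and roundtrip_with_dummy_main), one can create a module object with types.ModuleType, temporarily install it as sys.modules['__main__'], and execute the code in its namespace. The pickle lookup then resolves:

```python
import pickle
import sys
import types

# Stand-in for the script body; only the function definition matters here.
SOURCE = """
def my_sum(x):
    result = 0
    for item in x:
        result += item
    return result
"""


def roundtrip_with_dummy_main(source: str):
    """Run `source` in a dummy `__main__` module and pickle-roundtrip `my_sum`."""
    module = types.ModuleType('__main__')  # sets module.__name__ for us
    original = sys.modules['__main__']
    sys.modules['__main__'] = module  # the monkey-patch
    try:
        # Executed namespace and vars(sys.modules['__main__']) now agree.
        exec(compile(source, '<script>', 'exec'), vars(module))
        payload = pickle.dumps(module.my_sum)
        # Unpickle while the patch is still active, so that the reference
        # '__main__.my_sum' can be resolved again.
        return pickle.loads(payload)
    finally:
        # Restore the real main module so the caller is unaffected.
        sys.modules['__main__'] = original
```

The try/finally is a choice made for this sketch so that it cleans up after itself; whether and when kernprof should restore the original `__main__` is a separate design question that the sketch doesn't settle.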

Notes

This issue is mostly just for contextualization and documentation; I just need to whip up some tests, and then I'll push a PR to fix it.
