Description
The context
I was trying to restart my pet project of extending profiling into child Python processes (e.g. via subprocess, multiprocessing, and os.fork()) with kernprof (or whatever its successor in 6.0 will be called), which I've been gradually working towards in some previous PRs. It may need a bit of elbow grease and boilerplate to port everything over, but it should be very doable: after all, I've already written a pytest plugin which does that for tests, and kernprof is kinda like pytest in that both run other Python code in a somewhat controlled environment.
The test
To gauge how much work is ahead of us and to demonstrate what exactly the change will be good for, I needed a test use-case. Say we have a script which goes
`multiproc-test.py`
from multiprocessing import Pool


def my_sum(x: list[int]) -> int:
    result: int = 0
    for item in x:
        result += item
    return result


def main(length: int = 1000, n: int = 4) -> None:
    my_list: list[int] = list(range(length))
    sublists: list[list[int]] = []
    subsums: list[int]
    sublength = length // n
    if sublength * n < length:
        sublength += 1
    while my_list:
        sublist, my_list = my_list[:sublength], my_list[sublength:]
        sublists.append(sublist)
    with Pool(n) as pool:
        subsums = pool.map(my_sum, sublists)
        pool.close()
        pool.join()
    print(my_sum(subsums))


if __name__ == '__main__':
    main()  # stdout: '499500\n'

... so just a very simple scenario where we distribute the workload of summing integers over multiple processes. What I expected to happen was that kernprof would only see the one call to my_sum() we made in the main process and report data thereon, but unfortunately...
The bug
... the code didn't even run.
$ kernprof -q multiproc-test.py
AttributeError: module '__main__' has no attribute 'my_sum'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
_pickle.PicklingError: Can't pickle <function my_sum at 0x105db62a0>: it's not found as __main__.my_sum
when serializing tuple item 0
when serializing tuple item 0
when serializing tuple item 3

Note how we didn't even set up (line-)profiling and yet the execution still failed. The crux of the issue is that in most of the modes under which kernprof runs code (the exception being kernprof --prof-mod=<some_explicit_target> --line-by-line -m <some_module>), the namespace in which the code is executed is not vars(sys.modules['__main__']). multiprocessing.Pool.map() pickles the function by reference, so it looks the function up in sys.modules['__main__'] before sending it to the child processes, and that lookup fails.
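The failure mode can be reproduced without kernprof or multiprocessing at all. The following is a minimal sketch, assuming only that the script is executed in a private namespace whose `__name__` is `'__main__'`; the helper names `exec_in_private_namespace()` and `try_pickle()` are hypothetical and are not part of line_profiler.

```python
import pickle

# Stand-in for the user's script (hypothetical, trimmed down from the test).
SOURCE = '''
def my_sum(x):
    result = 0
    for item in x:
        result += item
    return result
'''


def exec_in_private_namespace(source):
    """Run `source` in a fresh dict that claims to be '__main__'.

    Assumption: this mimics code being executed in a namespace that
    is NOT vars(sys.modules['__main__']).
    """
    namespace = {'__name__': '__main__'}
    exec(compile(source, '<script>', 'exec'), namespace)
    return namespace


def try_pickle(func):
    """Return the name of the exception raised by pickling, or None."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError) as exc:
        return type(exc).__name__
    return None


namespace = exec_in_private_namespace(SOURCE)
func = namespace['my_sum']
# The function believes it lives in '__main__', but the real
# '__main__' module has no such attribute, so pickling by reference fails.
error_name = try_pickle(func)
print(func.__module__, error_name)
```

Since pickle serializes functions as a module name plus a qualified name, the mismatch between `func.__module__` and the actual contents of `sys.modules['__main__']` is enough to trigger the error.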
An easy fix for the bug is to do what line_profiler.autoprofile.autoprofile.run(..., as_module=True) already does: always execute the code in the namespace of a dummy module object, and monkey-patch that object into sys.modules['__main__'], regardless of whether the code is run as a module.
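As a rough illustration of that idea (not the actual line_profiler code), the sketch below executes a script in the namespace of a dummy module that is temporarily installed as `sys.modules['__main__']`. The names `patched_main()` and `run_in_dummy_main()` are hypothetical, and the example only checks the pickle lookup in the parent process; it does not spawn child processes.

```python
import pickle
import sys
import types
from contextlib import contextmanager

# Stand-in for the user's script (hypothetical).
SOURCE = '''
def my_sum(x):
    result = 0
    for item in x:
        result += item
    return result
'''


@contextmanager
def patched_main(module):
    """Temporarily install `module` as sys.modules['__main__']."""
    original = sys.modules.get('__main__')
    sys.modules['__main__'] = module
    try:
        yield module
    finally:
        # Restore whatever was there before, even on error.
        if original is None:
            del sys.modules['__main__']
        else:
            sys.modules['__main__'] = original


def run_in_dummy_main(source, name):
    """Execute `source` in a dummy '__main__' and round-trip `name`."""
    module = types.ModuleType('__main__')
    with patched_main(module):
        # The code runs in vars(module), which is now also
        # vars(sys.modules['__main__']), so pickle can find `name`.
        exec(compile(source, '<script>', 'exec'), vars(module))
        func = getattr(module, name)
        restored = pickle.loads(pickle.dumps(func))
    return func, restored


func, restored = run_in_dummy_main(SOURCE, 'my_sum')
print(restored is func)
```

Restoring the original `__main__` in a `finally` clause keeps the patch from leaking into the host process if the script raises.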
Notes
This issue is mostly just for contextualization and documentation; I just need to whip up some tests and then I'll push a PR to fix it.