Use joblib for robust parallel import scanning #209

seddonym merged 1 commit into python-grimp:joblib-multiprocessing
Conversation
  # This is an arbitrary number, but setting it too low slows down our functional tests considerably.
- MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPROCESSING = 50
+ MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPLE_PROCESSES = 64
🐼🐼🐼 My OCD => 64 is a power of two, just feels nicer
Not sure we should include this change here, at least not without something in the changelog as it could have an impact on end users. On balance I'd prefer to concentrate on fixing the issue without making another change at the same time.
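For context, this constant only gates whether multiple processes are used at all. Pieced together from the hunks quoted in this PR (so it may not match the merged code exactly), the decision looks roughly like this:

```python
import joblib

# Threshold under discussion: below it, process start-up overhead outweighs
# any speed-up from scanning the module files in parallel.
MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPLE_PROCESSES = 64


def _decide_number_of_processes(number_of_module_files: int) -> int:
    if number_of_module_files < MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPLE_PROCESSES:
        # Don't incur the overhead of multiple processes.
        return 1
    return min(joblib.cpu_count(), number_of_module_files)
```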
  # This is an arbitrary number, but setting it too low slows down our functional tests considerably.
  MIN_NUMBER_OF_MODULES_TO_SCAN_USING_MULTIPROCESSING = 50
MULTIPLE_PROCESSES instead of MULTIPROCESSING, to avoid confusion with the multiprocessing module.
  module_files_tuple = tuple(module_files)

  number_of_module_files = len(module_files_tuple)
  n_chunks = _decide_number_of_of_processes(number_of_module_files)
Typo (should really have put that in a separate commit...)
  # Don't incur the overhead of multiple processes.
  return 1
- return min(multiprocessing.cpu_count(), number_of_module_files)
+ return min(joblib.cpu_count(), number_of_module_files)
joblib.cpu_count() is a thin wrapper around loky.cpu_count(): https://github.com/joblib/loky/blob/d8bb877b94214883c2b69cf85ae375ea53d6cd17/loky/backend/context.py#L78
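A quick way to see the difference (a standalone illustration, not grimp code; the numbers printed depend on the environment):

```python
import os

import joblib

# joblib.cpu_count() delegates to loky, which accounts for CPU affinity and
# container CPU quotas rather than just reporting the machine's core count,
# so it is a more robust basis for sizing a pool of worker processes.
print("os.cpu_count():", os.cpu_count())
print("joblib.cpu_count():", joblib.cpu_count())
```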
- if number_of_processes == 1:
-     # No need to spawn a process if there's only one chunk.
-     [chunk] = chunks
-     return _scan_chunk(import_scanner, exclude_type_checking_imports, chunk)
No need for this special number_of_processes == 1 case anymore - joblib will do this automatically. https://joblib.readthedocs.io/en/stable/generated/joblib.Parallel.html#joblib.Parallel
From the docs: "If 1 is given, no parallel computing code is used at all, and the behavior amounts to a simple python for loop."
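A tiny standalone demonstration of that documented behaviour (not grimp code):

```python
import joblib


def square(value):
    return value * value


# With n_jobs=1, joblib skips the worker-process machinery and runs the tasks
# as a plain loop in the current process, so a hand-rolled single-chunk branch
# is no longer needed.
results = joblib.Parallel(n_jobs=1)(joblib.delayed(square)(i) for i in range(4))
assert results == [0, 1, 4, 9]
```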
CodSpeed Instrumentation Performance Report: merging #209 will not alter performance.
https://joblib.readthedocs.io/en/stable/parallel.html

Joblib takes care of some things for us. Relevant here:

* Robust calculation of the number of available CPUs.
* Sequential execution when n_jobs = 1.

And likely other minor things I don't even understand (joblib has put a lot of thought into multi-processing, so we don't have to).
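Putting those pieces together, the overall shape of the change is roughly the following sketch. The helpers (_create_chunks, _scan_chunk) and the flat return type are hypothetical stand-ins for grimp's internals, not the actual implementation:

```python
import joblib


def _create_chunks(module_files, n_chunks):
    # Hypothetical helper: split the files into n_chunks roughly equal chunks.
    return [module_files[i::n_chunks] for i in range(n_chunks)]


def _scan_chunk(chunk):
    # Hypothetical helper: scan one chunk of module files for imports.
    return [f"imports of {module_file}" for module_file in chunk]


def scan_for_imports(module_files, min_files_for_multiple_processes=64):
    if len(module_files) < min_files_for_multiple_processes:
        n_processes = 1  # Don't incur the overhead of multiple processes.
    else:
        n_processes = min(joblib.cpu_count(), len(module_files))
    chunks = _create_chunks(module_files, n_processes)
    # joblib runs this sequentially when n_jobs=1, and in separate worker
    # processes (via the default loky backend) otherwise.
    results = joblib.Parallel(n_jobs=n_processes)(
        joblib.delayed(_scan_chunk)(chunk) for chunk in chunks
    )
    return [imports for chunk_result in results for imports in chunk_result]
```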
417753f into python-grimp:joblib-multiprocessing

seddonym left a comment
Thanks for this - I've merged into a branch in my repo and will make a couple of tweaks before merging to master.