Parallel import scanning (python) #198
Conversation
CodSpeed Instrumentation Performance Report: merging #198 will improve performance by ×43.
Benchmarks breakdown
def __reduce__(self):
    return SourceSyntaxError, (self.filename, self.lineno, self.text)
Probably worth adding a comment to explain why this is necessary.
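For context, a minimal sketch of why `__reduce__` is needed (the class shape here is assumed for illustration, not grimp's actual implementation): by default, pickle rebuilds an exception by calling its class with `self.args`, which fails when `__init__` takes a different signature, so the exception can't cross a process boundary.

```python
import pickle

# Assumed minimal shape of an exception like SourceSyntaxError.
# Default pickling would call SourceSyntaxError(self.args[0]) - a single
# formatted message - and crash, because __init__ needs three arguments.
class SourceSyntaxError(Exception):
    def __init__(self, filename, lineno, text):
        super().__init__(f"Syntax error in {filename}, line {lineno}: {text}")
        self.filename = filename
        self.lineno = lineno
        self.text = text

    def __reduce__(self):
        # Tell pickle how to rebuild the exception: call the class again
        # with the original constructor arguments.
        return SourceSyntaxError, (self.filename, self.lineno, self.text)

err = SourceSyntaxError("mypackage/foo.py", 3, "import (")
restored = pickle.loads(pickle.dumps(err))
print(restored.filename, restored.lineno)  # mypackage/foo.py 3
```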
@seddonym This doesn't seem believable ☝️ - I think it may have something to do with https://docs.codspeed.io/instruments/cpu/#system-calls. Benchmarks on my machine look more believable:
seddonym left a comment:
This is fantastic, thank you! A real game changer.
I would like to address the degraded performance when the cache is warm - but it looks like that is really easily done. I'll open a separate PR to benchmark using the cache for small numbers of changes, and we can rebase off this to check there isn't degraded performance there too.
I'm not yet sold on introducing joblib - would prefer not to introduce third party dependencies unless there's a really strong reason. Could we try this using the standard library instead?
src/grimp/application/usecases.py
Outdated
for found_package in found_packages:
    for module_file in found_package.module_files:
        module = module_file.module
modules_files_to_scan = {
Should this be module_files_to_scan?
src/grimp/application/usecases.py
Outdated
)

def _create_chunks(module_files: List[ModuleFile], *, n_chunks: int) -> List[List[ModuleFile]]:
🐼 module_files could be Collection[ModuleFile] for more flexibility.
src/grimp/application/usecases.py
Outdated
)

def _create_chunks(module_files: List[ModuleFile], *, n_chunks: int) -> List[List[ModuleFile]]:
🐼 Always sceptical when I see a list 😉 .
Maybe tuple[tuple[ModuleFile], ...] would be a slightly better return type as it's immutable?
Also considered frozenset[frozenset[ModuleFile]] but I wonder if hashing might incur a penalty here.
src/grimp/application/usecases.py
Outdated
def _scan_imports(
    import_scanner: AbstractImportScanner,
    exclude_type_checking_imports: bool,
    module_files: List[ModuleFile],
Could be Iterable[ModuleFile] for greater flexibility.
src/grimp/application/usecases.py
Outdated
for module_file in modules_files_to_scan.difference(imports_by_module_file):
    imports_by_module_file[module_file] = import_scanner.scan_for_imports(
        module_file.module, exclude_type_checking_imports=exclude_type_checking_imports
remaining_modules_files_to_scan = list(
I think the _scan_packages function has tipped over into becoming too long now - would you mind breaking it up a bit?
src/grimp/application/usecases.py
Outdated
chunked_remaining_modules_files_to_scan = _create_chunks(
    remaining_modules_files_to_scan, n_chunks=N_CPUS
)
with parallel_config(n_jobs=N_CPUS):
When the cache is full, remaining_modules_files_to_scan is empty - if that's the case it's a waste of time to spin up these processes (and it noticeably degrades performance). At the very least we should wrap this in if remaining_modules_files_to_scan. I tried that locally and it recovered the degraded performance for a fully populated cache.
Also, n_jobs should be min(N_CPUS, len(remaining_module_files_to_scan)), right? I tried that locally and it improves the situation if I've only changed one file. It would be good to add a benchmark for that scenario as I don't think we currently measure it.
I'm not sure what the threshold is at which multiprocessing becomes worthwhile, but we could possibly improve the logic a little by having a non-parallel track for small numbers of changes too. That said, I tried the approach of spinning up only one extra process for one changed module and it wasn't noticeably slower, so it's possibly not worth it.
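The guard being suggested could look roughly like this sketch (names follow the PR, but `_scan_chunk` is a placeholder and the parallel fan-out is elided, shown serially):

```python
import multiprocessing

N_CPUS = multiprocessing.cpu_count()

def _scan_chunk(module_files):
    # Stand-in for the real per-chunk import scanning.
    return {module_file: set() for module_file in module_files}

def scan_remaining(remaining_module_files_to_scan):
    if not remaining_module_files_to_scan:
        # Fully populated cache: skip worker startup entirely.
        return {}
    # Never start more workers than there are files to scan.
    n_jobs = min(N_CPUS, len(remaining_module_files_to_scan))
    if n_jobs == 1:
        # A single changed file can be scanned in-process,
        # avoiding worker startup cost.
        return _scan_chunk(remaining_module_files_to_scan)
    # Otherwise chunk the files and fan them out over n_jobs workers
    # (parallel dispatch elided here).
    results = {}
    for module_file in remaining_module_files_to_scan:
        results.update(_scan_chunk([module_file]))
    return results

print(scan_remaining([]))        # {} - no processes spun up
print(scan_remaining(["a.py"]))  # {'a.py': set()}
```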
Oh - just remembered one thing, would you mind adding a line to the CHANGELOG too?
@seddonym I've added a pure multiprocessing version as a fixup now. It does seem to be a bit slower.
- When running tests locally I notice the difference in the overall time of the test suite.
- When running benchmarks I see that it's quite a bit slower than joblib. Benchmarks on my machine:
------- benchmark 'test_build_django_from_cache': --------------
Name (time in ms) Mean
-----------------------------------------------------------------
test_build_django_from_cache (joblib) 37.9565
test_build_django_from_cache (master) 38.1530
test_build_django_from_cache (multiprocessing) 39.2882
-----------------------------------------------------------------
-------- benchmark 'test_build_django_uncached': ----------------
Name (time in ms) Mean
------------------------------------------------------------------
test_build_django_uncached (joblib) 250.2343
test_build_django_uncached (multiprocessing) 421.2143
test_build_django_uncached (master) 1,066.4864
------------------------------------------------------------------
I think it's likely okay to use joblib here:
- joblib has 4,000 stars on GitHub, so isn't going anywhere soon.
- joblib has no dependencies beyond Python and emphasises robustness over features (see "design choices": https://joblib.readthedocs.io/en/stable/why.html#design-choices).
- In the absolute worst case, if this dependency breaks we can just remove it again and things get a bit slower.
So I'd suggest I remove that fixup commit and we go with the joblib version.
What do you think?
This is needed so that the error can be sent between processes, which keeps the test `test_syntax_error_includes_module` passing after the switch to multiprocessing.
This is helpful, to avoid having to pass the large cache object to the multiprocessing code.
seddonym left a comment:
Great stuff, thanks so much for this.
As discussed elsewhere, we should probably add an optimization for small numbers of modules where it's not worth multiprocessing. That will speed up the functional tests, if nothing else.
)

def _read_imports_from_cache(
Thanks for breaking this up, much easier to read.
from ..domain.valueobjects import DirectImport, Module
from .config import settings

N_CPUS = multiprocessing.cpu_count()
🤔 I wonder what the overhead of this is, probably tiny but it might make sense to calculate it at runtime, only if we need it.
Maybe worth tweaking next time we're in this file.
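One way to defer that computation, sketched with `functools.lru_cache` (an assumption about how the tweak might look, not necessarily how it would be done):

```python
import multiprocessing
from functools import lru_cache

# Compute the CPU count lazily, the first time parallel scanning
# actually needs it, and cache the result for subsequent calls.
@lru_cache(maxsize=1)
def n_cpus() -> int:
    return multiprocessing.cpu_count()

print(n_cpus() >= 1)  # True
```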

This PR uses CPU parallelism to accelerate import scanning. For large codebases this should make building the graph much faster (on machines with multiple CPU cores).
On my machine (8 cores):
We hope to move more and more of this graph building code into rust, so this change may not live very long. From my perspective the change can still be valuable though:
A previous iteration by @duzumaki is here: #142. Changes compared to that PR:
- Fixed `test_syntax_error_includes_module` by making `SourceSyntaxError` pickleable. Fixes the issue here: Speed up import scanning by 400-500% #142 (comment)
- Avoided pickling the `import_scanner` object many times, e.g. if we would create a job per module file. I believe this will fix the issue observed here: Speed up import scanning by 400-500% #142 (comment)
- Benchmarks suggest building without the cache is much faster, but building with the cache is a little slower: #198 (comment)