Rust Import Scanner by seddonym · Pull Request #221 · python-grimp/grimp

seddonym · 2025-06-09T14:09:17Z

Moves the ImportScanner class to Rust.

In order to do this we also need to define Rust-based swappable fake/real file systems to be passed in, so we can still unit test via Python. To reduce the amount of work involved, I've defined a narrower interface for the filesystem that only implements what is needed by ImportScanner. We can broaden it later when we come to do the same with caching / module finding.

I had trouble getting the ImportScanner to pickle/unpickle correctly, which is needed to do multiprocessing, so I've removed multiprocessing. This slows things down considerably (on a large graph, locally, building the graph goes from 5s -> 15s). So really we need to keep going and move the multiprocessing to threads before this is releasable.

There's also distinctly smelly Rust code that loads the Module and DirectImport dataclasses, rather than defining them in Rust - all in the interests of trying to limit how many changes I needed to make to keep the tests passing.

Parallelism next steps

The ImportScanner class currently requires the GIL, which means we still have a bit of work to do before we can move to multithreading. I think the best thing to do is abandon the ports-and-adapters approach for ImportScanner (we only have one of them anyway) and instead make a function that we can unit test from Python, that does them in bulk along the lines of #222. We should keep the unit tests of import scanner but just adapt them to call the bulk function instead - which at first could be in Python, but then we could push that down to Rust. That would allow us to turn the ImportScanner into a pure Rust class that doesn't need the GIL, and do any mapping to Python classes / exceptions in a wrapper function in Python.

Still to do

Address regression by reintroducing parallelism, either by making ImportScanner pickleable (temporarily) or by doing the work described above.

codspeed-hq · 2025-06-09T14:16:06Z

CodSpeed Instrumentation Performance Report

Merging #221 will degrade performances by 73.02%

_{Comparing rust-filesystem (0cfe293) with master (1d56065)}

Summary

❌ 6 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`test_deep_layers_large_graph_kept`	16.3 ms	19.2 ms	-15.17%
❌	`test_deep_layers_large_graph_violated`	10.4 ms	11.6 ms	-10.62%
❌	`test_no_chain`	1.1 ms	1.2 ms	-11.22%
❌	`test_no_chains`	1.1 ms	1.2 ms	-11.2%
❌	`test_build_django_from_cache_a_few_misses[350]`	298.1 ms	634.2 ms	-52.99%
❌	`test_build_django_uncached`	145.1 ms	537.8 ms	-73.02%

This necessitates us turning off multiprocessing, but we'll switch to multithreading in a subsequent commit.

seddonym · 2025-06-12T08:47:29Z

It occurred to me we could probably keep multiprocessing for the time being if, rather than passing the file system as an argument to the jobs, we pulled it from the settings object within each job. Might be worth a try so we can merge this.

seddonym · 2025-06-24T14:25:14Z

Replacing in favour of #229, which doesn't turn off the multiprocessing.

seddonym force-pushed the rust-filesystem branch from b068fbe to 5f90d48 Compare June 9, 2025 14:12

seddonym force-pushed the rust-filesystem branch 4 times, most recently from b420e06 to cf7672f Compare June 11, 2025 15:31

seddonym added 9 commits June 11, 2025 17:18

Add uv.lock to .gitignore

539901b

Define BasicFileSystem protocol

f88be04

Implement FakeBasicFileSystem

b79f5cc

Use FakeBasicFileSystem in unit test

7dd7b21

Pass BasicFileSystem to ImportScanner from use case

b51ccc8

Implement RealBasicFileSystem

781fcab

Use RealBasicFileSystem in prod

99729e6

This necessitates us turning off multiprocessing, but we'll switch to multithreading in a subsequent commit.

Strip out multiprocessing code

43cf3b2

Make AbstractImportScanner a Protocol

f17a172

seddonym force-pushed the rust-filesystem branch from cf7672f to bf68f26 Compare June 11, 2025 16:21

seddonym added 2 commits June 11, 2025 17:31

Define ImportScanner in Rust

a0564fe

Handle include_external_packages=None

0cfe293

seddonym force-pushed the rust-filesystem branch from bf68f26 to 0cfe293 Compare June 11, 2025 16:31

seddonym changed the title ~~Rust filesystem~~ Rust Import Scanner Jun 11, 2025

seddonym closed this Jun 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rust Import Scanner#221

Rust Import Scanner#221
seddonym wants to merge 11 commits intomasterfrom
rust-filesystem

seddonym commented Jun 9, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Jun 9, 2025 •

edited

Loading

Uh oh!

seddonym commented Jun 12, 2025

Uh oh!

seddonym commented Jun 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

seddonym commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Parallelism next steps

Still to do

Uh oh!

codspeed-hq bot commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Instrumentation Performance Report

Merging #221 will degrade performances by 73.02%

Summary

Benchmarks breakdown

Uh oh!

seddonym commented Jun 12, 2025

Uh oh!

seddonym commented Jun 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

seddonym commented Jun 9, 2025 •

edited

Loading

codspeed-hq bot commented Jun 9, 2025 •

edited

Loading