Skip to content

Intern import details/metadata line contents#185

Merged
seddonym merged 4 commits intopython-grimp:masterfrom
Peter554:faster-graph-clone
Feb 11, 2025
Merged

Intern import details/metadata line contents#185
seddonym merged 4 commits intopython-grimp:masterfrom
Peter554:faster-graph-clone

Conversation

@Peter554
Copy link
Collaborator

@Peter554 Peter554 commented Feb 7, 2025

In this PR:

  1. I've updated the large graph fixture to include line number and line contents. This makes the test_copy_graph benchmark much more realistic. This does lead to a regression in the benchmarks, but this regression is expected since now the graph object is much bigger.
  2. I've interned the line contents (similar to the interning of the module name). This gives a significant boost to the performance of copying the graph. This is helpful, since in import-linter we always copy the graph before passing it to a contract.
  3. I've also done the renaming that was suggested: name -> interned_name, name_as_string -> name.

Since change 1. gives a performance regression, as compared with the benchmark on master, here are the benchmark results run locally comparing the 1st commit (add line contents to large graph fixture) with the last commit (interned line contents). This shows that the changes in this PR gives ~2.5x improvement to the graph copy.

------- benchmark 'test_copy_graph': 2 tests -------
Name (time in ms)                     Mean
----------------------------------------------------
test_copy_graph (NOW)              11.1924 (1.0)
test_copy_graph (0158_4d701e8)     27.8061 (2.48)
----------------------------------------------------

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 7, 2025

CodSpeed Performance Report

Merging #185 will degrade performances by 46.5%

Comparing Peter554:faster-graph-clone (ae3ea3d) with master (830ce44)

Summary

❌ 1 (👁 1) regressions
✅ 16 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
👁 test_copy_graph 43.1 ms 80.5 ms -46.5%

@Peter554 Peter554 changed the title Faster graph clone Intern import details/metadata line contents Feb 7, 2025
@Peter554 Peter554 marked this pull request as ready for review February 7, 2025 21:34
Else the copy benchmark isn't really fair
I'd gone to all the trouble to remove strings from the graph
in the module names, but then forgot about all the strings
in the import details!

This gives a signifcant boost to the performance of
copying the graph.
Comment on lines +26 to +27
line_number=1,
line_contents=f"import {imported}",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces a regression in the benchmark for test_copy_graph, but this is reasonable since now the graph object is much bigger!

importer: ModuleToken,
imported: ModuleToken,
line_number: usize,
line_number: u16,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives a small but measurable performance benefit. I think people aren't really having import statements past line 65535 😅

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the change - but I can believe there are modules out there in the wild that get that big. How would Grimp behave if it did encounter such a module?

Ideally we should handle it gracefully, but as long as we have the new behaviour captured in a ticket, and the change mentioned in the CHANGELOG, I'm happy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would Grimp behave if it did encounter such a module?

Yes maybe it's worth a bit of thought.

It's quite a micro-optimisation anyways, so I think instead I'll just opt to use u32 here instead - I'm near certain that nobody will overflow that!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(With u32 I wouldn't update the changelog)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm near certain that nobody will overflow that!

Agreed 😄

Copy link
Collaborator

@seddonym seddonym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff - I think the change to line_number needs at least a mention in the CHANGELOG though, would you mind adding that?

importer: ModuleToken,
imported: ModuleToken,
line_number: usize,
line_number: u16,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the change - but I can believe there are modules out there in the wild that get that big. How would Grimp behave if it did encounter such a module?

Ideally we should handle it gracefully, but as long as we have the new behaviour captured in a ticket, and the change mentioned in the CHANGELOG, I'm happy.

@Peter554 Peter554 requested a review from seddonym February 10, 2025 20:54
@seddonym seddonym merged commit 758f28b into python-grimp:master Feb 11, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants