Conversation
I'm fine either way. |
|
I've also set this up as a separate repo. It works really well separately; the one issue is that testing a new version requires it to be pushed somewhere (a PR is fine). While the in-source version here allows you to make a commit but not push it anywhere. For #1082, I was curious about always stripping first, but didn't want to push that. But other than that, it's fine. This is the config file for the repo: {
"version": 1,
"project": "packaging",
"project_url": "https//github.com/pypa/packaging",
"show_commit_url": "https://github.com/pypa/packaging/commit/",
"repo": "https://github.com/pypa/packaging.git",
"environment_type": "virtualenv",
"build_command": ["python -m pip wheel -w {build_cache_dir} {build_dir}"],
"default_benchmark_timeout": 180,
"regressions_thresholds": {
".*": 0.01
},
"pythons": ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
}And you need a .gitignore, otherwise it's the same content pretty much. Oh, and this pyrpoject.toml, mostly generated by uv: [project]
name = "packaging-benchmark"
version = "0.1.0"
description = "Benchmark suite for packaging"
readme = "README.md"
requires-python = ">=3.8"
dependencies = [
"asv",
"pip",
] |
f1bab0b to
f7e2bae
Compare
1ad6ef8 to
d00139d
Compare
benchmarks/specifiers.py
Outdated
|
|
||
| @add_attributes(pretty_name="SpecifierSet filter") | ||
| def time_filter(self) -> None: | ||
| list(SpecifierSet(">5.0").filter(self.sample_versions)) |
There was a problem hiding this comment.
The specifier used in SpecifierSet can have a lot of different impacts on the performance, which operator is chosen, if it includes a pre-release in the version, etc.
I suggest we at least start with one simple specifier like this one, and one complex specifier like >=1,~=1.1,!=1.2.1,==1.*,<1.9, we can expect such complex specifiers as multiple requirements combined during resolving requirements.
There was a problem hiding this comment.
Do we also need to fill in or clear the cache?
There was a problem hiding this comment.
Yeah, in fact there are now multiple layers. The version layer:
- Filtering non-Version objects like strings
- Filtering Version objects without key computed
- Filtering Version objects with key computed
And the spec layer:
- Filtering with _spec_version empty
- Filtering with _spec_version not empty
For testing the performance of filtering I think it makes sense to test both with and without _spec_version populated.
To populate _spec_version:
for spec in self.specs_contains_warm:
_ = spec.contains(Version("0"))And to clear it out:
for spec in self.specs_contains_cold:
for s in spec._specs:
s._spec_version = NoneI'm less sure about the different types of versions, there's an argument for testing all three types of versions against the cold/warm specifiers. But if we're only testing one version type it should be warm versions as that is then most clearly testing specifier filtering performance, not version performance.
There was a problem hiding this comment.
Would you like to update this? Feel free to push to the branch, you can as a maintainer. Currently, time_filter_complex seems to be rather unstable, varying by +/- 2x comparing commits that don't change anything.
There was a problem hiding this comment.
@henryiii I played around with this and I think we should remove the complex specifier for now.
One improvement to variance I found was to reduce the number of versions, e.g. self.sample_versions = [Version(str(i / 10)) for i in range(1, 11)] and change the specifier appropriately, I think this allows asv to run more tests and remove outliers. But this didn't improve things enough for a complex specifier not to still have a lot of variance.
I do wonder if this is telling that SpecifierSet.filter could receive some improvements that makes it more predictable, I have a few draft implementations of different approaches I've been meaning to try, when I get some time I will see if they improve variation.
Btw, this was my final version (I was having some git issues pushing to your branch and don't have time to debug):
class TimeSpecSuite:
def setup(self) -> None:
with (DIR / "specs_sample.txt").open() as f:
self.spec_strs = [s.strip() for s in f.readlines()]
# Build and warm versions
self.single_version = Version("3.12")
self.sample_versions = [Version(str(i / 10)) for i in range(1, 11)]
self.single_version._key
for v in self.sample_versions:
v._key
# Build cold specifiers
self._single_cold_spec = SpecifierSet(">0.5")
self._cold_specs = [SpecifierSet(s) for s in self.spec_strs]
# Build warm specifiers
self._single_warm_spec = SpecifierSet(">0.5")
self._warm_specs = [SpecifierSet(s) for s in self.spec_strs]
for s in self._warm_specs:
for sp in s._specs:
sp.contains(self.single_version)
for sp in self._single_warm_spec._specs:
sp.contains(self.single_version)
@add_attributes(pretty_name="SpecifierSet constructor")
def time_constructor(self) -> None:
for s in self.spec_strs:
SpecifierSet(s)
@add_attributes(pretty_name="SpecifierSet contains (cold)")
def time_contains_cold(self) -> None:
for spec in self._cold_specs:
for sp in spec._specs:
sp._spec_version = None
for spec in self._cold_specs:
spec.contains(self.single_version)
@add_attributes(pretty_name="SpecifierSet contains (warm)")
def time_contains_warm(self) -> None:
for spec in self._warm_specs:
spec.contains(self.single_version)
@add_attributes(pretty_name="SpecifierSet filter (simple, cold)")
def time_filter_simple_cold(self) -> None:
for sp in self._single_cold_spec._specs:
sp._spec_version = None
list(self._single_cold_spec.filter(self.sample_versions))
@add_attributes(pretty_name="SpecifierSet filter (simple, warm)")
def time_filter_simple_warm(self) -> None:
list(self._single_warm_spec.filter(self.sample_versions))There was a problem hiding this comment.
I was playing around with a potential simple Specifiers optimization this morning (pre-computing or caching the operator look up), using these benchmarks showed that it didn't work out!
I've pushed these changed to this branch: 57d6f01. But I think I made a bit of a mess of the commit history by rebasing, sorry.
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
c4fefdc to
bbfadf9
Compare
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
bbfadf9 to
57d6f01
Compare
This is the benchmarking suite I've been using, such as for https://iscinumpy.dev/post/packaging-faster. It could be a separate repository; there are good arguments for both.
asvsupports running from a separate repository and from a branch. Since I wrote it in a branch, opening it up here first to see what others think.I haven't used asv before, so open to suggestions on best practices, like if it is usually run from the top level or from a dir, and where to put stuff.