-
Notifications
You must be signed in to change notification settings - Fork 248
Optimize runtime complexity of linter when passing in multiple snippets #725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize runtime complexity of linter when passing in multiple snippets #725
Conversation
johnbartholomew
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, apologies for the multi-year delay before anyone looked at this. Thank you very much for the contribution!
linter/linter.go
Outdated
| vars[importedPath] = info.VarAt | ||
| } | ||
| for importedPath, rootNode := range roots { | ||
| variablesInFile[importedPath] = *findVariables(nodeWithLocation{rootNode, importedPath}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of this is confusing now, because the function-scoped map variablesInFile is being updated gradually in each iteration of the loop over nodes, so we don't have either a complete set of variables collected before looping over nodes, nor fully separate minimal set of variables collected individually for each node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the PR changes the variablesInFile from function-scope to snippet-scope.
Previously, all variables for were evaluated for all snippets as a whole, using them when building the type graph for each snippet which is unnecessary, as most of the time they are not used.
so we try to collect only the vars that are actually used by the snippet to speed up the type graph creation, which is the slow part afaict from profiling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I made this more confusing by rebasing and adding a commit. But in 8893d8e it actually doesn't change the variablesInFile scope, which is why I commented on it. It still has variablesInFile at function scope, it just updates it in the snippet scope (adding to it each time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the code history, support for linting multiple files in a single invocation was added in #545, and the review at the time noted that it doesn't really take advantage of being able to collect and process the expression graphs for all the files together. But the review did not note that just partially processing the files together would itself create a performance problem.
It seems like from there we can go in one of two directions: either split processing apart more, so each listed input file is processed mostly independently of the others (which is the direction this PR goes in), or make a deeper change in the linter expression/type graph processing to more efficiently process all the input files with a single expression/type graph. Since the latter is obviously a more invasive change I'm happy to stick with this fix. It should be at least no worse than running the lint command entirely separately on each individual input file.
I do have a few smaller structural comments, added below.
|
Since this has been waiting for over 2 years, I will work under the assumption that the original author may not be available, so I'll rebase and add an adjustment and then merge this. |
ac93205 to
23a013c
Compare
|
It's a long time and I tried to get back into the PR to understand what I did there. I did some debugging at the time and some things got evaluated again and again, which the patch avoids though there are certainly better ways to do it. |
|
Sadly we don't have any automated benchmark checks that run as part of CI. However, the normal tests do report how long in total the linter test took to run. Here are recent results from the CI workflow on master (before this PR):
Here's the result for the latest CI run on this PR:
|
@netomi Oh, hi! Glad you're still around. Yeah, this is definitely an improvement and worth merging even if there are other ways. |
|
ah now I remember again what happens: in the original code we create a roots map of all snippets and when we evaluate each individual snippet, all roots are added to the type graph and prepared which takes the majority of time. With the patch, we try to be clever and only create a map of roots that are referenced via the variables that are found in the snippet. |
This fixes #557 .
Instead of calculating all roots at the start we delay that when evaluating each individual snippet.
The roots data structure is populated with all nodes of all imported jsonnet snippets, so we utilize that also in the importFunc where we try to resolve imports from the list of roots to avoid doing duplicate imports.
However this is a best effort only, as the mechanism is not smart, but is just able to resolve imports in the same path. This could be further improved, but for the test cases that is already sufficient, as all of them are in the same directory.
With this fix, the multiple tests run in around 0.23s compared to 9s before.