-
Notifications
You must be signed in to change notification settings - Fork 55
fix: simplify and clarify ubuntuIndex.fetch #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This prevents from submitting requests to URL containing any `../` that could trigger security proxies. Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
|
Signed-off-by: Paul Mars <paul.mars@canonical.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @upils, it is great to see your work so quickly. Just one question and a few minor comments. Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with and without a clean cache with strace might give us more reassurance.
EDIT: Thinking more about it, strace is a terrible tool for this job.
internal/archive/archive.go
Outdated
| suffix = "dists/" + index.suite + "/" + suffix | ||
| } | ||
|
|
||
| u, err := url.JoinPath(baseURL, suffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe another option would be to keep the original logic and then do url.JoinPath(baseURL, "") to signal that what is important here is to clean the URL of the possible ... What is your opinion?
Another option is to use cleanURL instead as a variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function). I am not too convinced by the cleanURL name because it makes sense when we think of it as a result of the cleaning but not when we think of it as how it will be consumed. I am proposing reqURL. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function)
Sure but I wouldn't know "why" only that the function does it. The why is actually only documented in the fact that the test fails if I changed it to +. I don't have a strong opinion here, but I would like to preserve somehow the intent if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. However the more I think about it, the more I think calling index.fetch("../../"+ suffix,...) is the real problem, especially since it seems to be to "remove" the dist/<suite>/ part added to the URL. I will try to see if we can make it cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ../../ was indeed to workaround the addition of dists/<suite>/ to the suffix. So I removed the ../../ and added a comment to clarify why fetch() manipulates the prefix.
I have tested running |
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Do not use `../../` in the path. It was trying to manipulate the behavior of `fetch()`, making assumptions on its implementation. Signed-off-by: Paul Mars <paul.mars@canonical.com>
letFunny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments. As we discussed offline changing the rest of the code introduces more risks. I know you fully tested the current implementation and looked at all usages but please remember to do that again before the final submission.
internal/archive/archive.go
Outdated
| // Scope content fetching with the suite unless fetching a package from the pool | ||
| if !strings.HasPrefix(suffix, "pool/") { | ||
| suffix = "dists/" + index.suite + "/" + suffix | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to move the magic one layer up, so that instead of:
index.fetch("InRelease") // relative to dist/.../
index.fetch(packagesPath + ...) // relative to dist/.../
index.fetch(suffix) // full path
We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.
If we instead keep the changes I would suggest the make the message a bit simpler, add punctuation and move it after the if statement, example:
if !strings.HasPrefix(suffix, "pool/") {
// If path is not a package then it is relative to the suite.
suffix = "dists/" + index.suite + "/" + suffix
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.
I think that is the other way around: fetching calls made by index.fetch() should by default be relative to the suite path. And fetch() takes care of removing this scope when fetching a package. As a method of ubuntuIndex (configured with a specific suite) it makes more sense to me than fetching from the root of the archive.
I agree with your suggestion for the comment.
Signed-off-by: Paul Mars <paul.mars@canonical.com>
internal/archive/archive.go
Outdated
| suffix := section.Get("Filename") | ||
| logf("Fetching %s...", suffix) | ||
| reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk) | ||
| reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Note to reviewer]: Adding ../../ to the suffix aimed at forcing fetch() to construct a path starting at the root of the archive and pointing at a package (not under the scope of a suite). This was acting against the implementation of fetch() that is also already handling suffixes of URLs pointing at packages (prefixed with pool/). There is no change in behavior after removing ../../ except the URL is now already clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:
These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy
Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@upils This comment is from Nov 26th, and it's still hanging in here unanswered. Can we please have a proper analysis and response here of what the situation is? We don't want to assume a particular magic path in our archives, and want instead to be able to handle these paths in the same way any other tool would. To be clear, I would much rather remove this suffix, as it's doing some implicit hackery on something else that isn't implied here, but we should do that knowingly, which is what Alberto is asking about here.
letFunny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Paul. I think the latest iteration is the least controversial. You are fixing the bug without doing refactors not strictly needed for the bugfix. And thank you for testing it with mitmproxy. Just a few questions.
internal/archive/archive.go
Outdated
| suffix := section.Get("Filename") | ||
| logf("Fetching %s...", suffix) | ||
| reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk) | ||
| reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:
These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy
Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives for example.
letFunny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with Paul, there are tradeoffs for both approaches of having absolute paths vs keeping it like it is. I see the point about trying to do the minimal change now because it is extremely unlikely that /pool/ is going to change. Anyway, I don't want to block the PR, it looks good to me. Thank you Paul!
internal/archive/archive.go
Outdated
| suffix := section.Get("Filename") | ||
| logf("Fetching %s...", suffix) | ||
| reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk) | ||
| reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@upils This comment is from Nov 26th, and it's still hanging in here unanswered. Can we please have a proper analysis and response here of what the situation is? We don't want to assume a particular magic path in our archives, and want instead to be able to handle these paths in the same way any other tool would. To be clear, I would much rather remove this suffix, as it's doing some implicit hackery on something else that isn't implied here, but we should do that knowingly, which is what Alberto is asking about here.
internal/archive/archive.go
Outdated
| url = baseURL + suffix | ||
| } else { | ||
| // If path is not a package then it is relative to the suite. | ||
| url = baseURL + "dists/" + index.suite + "/" + suffix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment assumes all we have inside pool are packages, but this logic has no idea. Other than that it is simply restating in English what the code says more clearly: if it's not in pool it's in dists. We don't need to repeat the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is also a bit misleading because the pool/ directory stores other kinds of artifacts.
letFunny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, almost there Paul
internal/archive/archive.go
Outdated
| } | ||
|
|
||
| func (index *ubuntuIndex) fetch(suffix, digest string, flags fetchFlags) (io.ReadSeekCloser, error) { | ||
| func (index *ubuntuIndex) relativePath(suffix string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a better name here, relative could be to the root of the page as well when thinking about URLs. Can you check the documentation to see if this directory has an established name or maybe what are the type of content do we expect here so that we can name it correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on https://wiki.debian.org/DebianRepository/Format a better name, both more precise but not specific to the way we use it, is distPath.
Simplify and clarify
ubuntuIndex.fetchimplementation. Makes it moreconsistent with existing tools and reduces assumptions on the structure
of the archive.
Added benefit: Avoid requesting URLs detected as potential
path-traversal attacks, triggering security proxies.
Fixes #255