Skip to content

Conversation

@upils
Copy link
Collaborator

@upils upils commented Nov 21, 2025

  • Have you signed the CLA?

Simplify and clarify ubuntuIndex.fetch implementation. Makes it more
consistent with existing tools and reduces assumptions on the structure
of the archive.
Added benefit: Avoid requesting URLs detected as potential
path-traversal attacks, triggering security proxies.


Fixes #255

This prevents from submitting requests to URL containing any `../` that could trigger security proxies.

Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
@github-actions
Copy link

github-actions bot commented Nov 21, 2025

Command Mean [s] Min [s] Max [s] Relative
BASE 11.779 ± 0.063 11.683 11.872 1.00 ± 0.01
HEAD 11.737 ± 0.021 11.700 11.767 1.00

Signed-off-by: Paul Mars <paul.mars@canonical.com>
@upils upils changed the title fix: Resolve URL path before fetching content fix: clean URL before fetching content Nov 21, 2025
@upils upils changed the title fix: clean URL before fetching content fix: clean URLs before fetching content Nov 21, 2025
Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @upils, it is great to see your work so quickly. Just one question and a few minor comments. Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with and without a clean cache with strace might give us more reassurance.

EDIT: Thinking more about it, strace is a terrible tool for this job.

suffix = "dists/" + index.suite + "/" + suffix
}

u, err := url.JoinPath(baseURL, suffix)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe another option would be to keep the original logic and then do url.JoinPath(baseURL, "") to signal that what is important here is to clean the URL of the possible ... What is your opinion?

Another option is to use cleanURL instead as a variable name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function). I am not too convinced by the cleanURL name because it makes sense when we think of it as a result of the cleaning but not when we think of it as how it will be consumed. I am proposing reqURL. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function)

Sure but I wouldn't know "why" only that the function does it. The why is actually only documented in the fact that the test fails if I changed it to +. I don't have a strong opinion here, but I would like to preserve somehow the intent if possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. However the more I think about it, the more I think calling index.fetch("../../"+ suffix,...) is the real problem, especially since it seems to be to "remove" the dist/<suite>/ part added to the URL. I will try to see if we can make it cleaner.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ../../ was indeed to workaround the addition of dists/<suite>/ to the suffix. So I removed the ../../ and added a comment to clarify why fetch() manipulates the prefix.

@upils
Copy link
Collaborator Author

upils commented Nov 24, 2025

Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with and without a clean cache with strace might give us more reassurance.

I have tested running chisel debug check-release-archives --release ubuntu-25.04 with mitmproxy, both before and after the fix, confirming URLs are now clean.

Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Signed-off-by: Paul Mars <paul.mars@canonical.com>
Do not use `../../` in the path. It was trying to manipulate the behavior of
`fetch()`, making assumptions on its implementation.

Signed-off-by: Paul Mars <paul.mars@canonical.com>
Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments. As we discussed offline changing the rest of the code introduces more risks. I know you fully tested the current implementation and looked at all usages but please remember to do that again before the final submission.

Comment on lines 382 to 385
// Scope content fetching with the suite unless fetching a package from the pool
if !strings.HasPrefix(suffix, "pool/") {
suffix = "dists/" + index.suite + "/" + suffix
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to move the magic one layer up, so that instead of:

index.fetch("InRelease") // relative to dist/.../
index.fetch(packagesPath + ...) // relative to dist/.../
index.fetch(suffix) // full path

We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.

If we instead keep the changes I would suggest the make the message a bit simpler, add punctuation and move it after the if statement, example:

	if !strings.HasPrefix(suffix, "pool/") {
		// If path is not a package then it is relative to the suite.
		suffix = "dists/" + index.suite + "/" + suffix
	}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.

I think that is the other way around: fetching calls made by index.fetch() should by default be relative to the suite path. And fetch() takes care of removing this scope when fetching a package. As a method of ubuntuIndex (configured with a specific suite) it makes more sense to me than fetching from the root of the archive.

I agree with your suggestion for the comment.

Signed-off-by: Paul Mars <paul.mars@canonical.com>
suffix := section.Get("Filename")
logf("Fetching %s...", suffix)
reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Note to reviewer]: Adding ../../ to the suffix aimed at forcing fetch() to construct a path starting at the root of the archive and pointing at a package (not under the scope of a suite). This was acting against the implementation of fetch() that is also already handling suffixes of URLs pointing at packages (prefixed with pool/). There is no change in behavior after removing ../../ except the URL is now already clean.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:

These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy

Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@upils This comment is from Nov 26th, and it's still hanging in here unanswered. Can we please have a proper analysis and response here of what the situation is? We don't want to assume a particular magic path in our archives, and want instead to be able to handle these paths in the same way any other tool would. To be clear, I would much rather remove this suffix, as it's doing some implicit hackery on something else that isn't implied here, but we should do that knowingly, which is what Alberto is asking about here.

@upils upils requested a review from letFunny November 25, 2025 12:50
Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Paul. I think the latest iteration is the least controversial. You are fixing the bug without doing refactors not strictly needed for the bugfix. And thank you for testing it with mitmproxy. Just a few questions.

suffix := section.Get("Filename")
logf("Fetching %s...", suffix)
reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:

These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy

Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives for example.

Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with Paul, there are tradeoffs for both approaches of having absolute paths vs keeping it like it is. I see the point about trying to do the minimal change now because it is extremely unlikely that /pool/ is going to change. Anyway, I don't want to block the PR, it looks good to me. Thank you Paul!

suffix := section.Get("Filename")
logf("Fetching %s...", suffix)
reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@upils This comment is from Nov 26th, and it's still hanging in here unanswered. Can we please have a proper analysis and response here of what the situation is? We don't want to assume a particular magic path in our archives, and want instead to be able to handle these paths in the same way any other tool would. To be clear, I would much rather remove this suffix, as it's doing some implicit hackery on something else that isn't implied here, but we should do that knowingly, which is what Alberto is asking about here.

url = baseURL + suffix
} else {
// If path is not a package then it is relative to the suite.
url = baseURL + "dists/" + index.suite + "/" + suffix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment assumes all we have inside pool are packages, but this logic has no idea. Other than that it is simply restating in English what the code says more clearly: if it's not in pool it's in dists. We don't need to repeat the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is also a bit misleading because the pool/ directory stores other kinds of artifacts.

@upils upils requested a review from letFunny January 13, 2026 10:55
Copy link
Collaborator

@letFunny letFunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, almost there Paul

}

func (index *ubuntuIndex) fetch(suffix, digest string, flags fetchFlags) (io.ReadSeekCloser, error) {
func (index *ubuntuIndex) relativePath(suffix string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a better name here, relative could be to the root of the page as well when thinking about URLs. Can you check the documentation to see if this directory has an established name or maybe what are the type of content do we expect here so that we can name it correctly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on https://wiki.debian.org/DebianRepository/Format a better name, both more precise but not specific to the way we use it, is distPath.

@upils upils requested a review from letFunny January 13, 2026 13:20
@upils upils changed the title fix: clean URLs before fetching content fix: simplify and clarify ubuntuIndex.fetch implementation Jan 13, 2026
@upils upils changed the title fix: simplify and clarify ubuntuIndex.fetch implementation fix: simplify and clarify ubuntuIndex.fetch Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP requests to Ubuntu archive with path traversal in the URL not working in restricted environment

3 participants