fix: simplify and clarify ubuntuIndex.fetch #257

upils · 2025-11-21T15:07:58Z

Have you signed the CLA?

Simplify and clarify ubuntuIndex.fetch implementation. Makes it more
consistent with existing tools and reduces assumptions on the structure
of the archive.
Added benefit: Avoid requesting URLs detected as potential
path-traversal attacks, triggering security proxies.

Fixes #255

github-actions · 2025-11-21T15:13:20Z

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `BASE` | 11.779 ± 0.063 | 11.683 | 11.872 | 1.00 ± 0.01 |
| `HEAD` | 11.737 ± 0.021 | 11.700 | 11.767 | 1.00 |

letFunny

Thank you @upils, it is great to see your work so quickly. Just one question and a few minor comments. Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with and without a clean cache with strace might give us more reassurance.

EDIT: Thinking more about it, strace is a terrible tool for this job.

letFunny · 2025-11-24T11:59:33Z

internal/archive/archive.go

+		suffix = "dists/" + index.suite + "/" + suffix
+	}
+
+	u, err := url.JoinPath(baseURL, suffix)


Maybe another option would be to keep the original logic and then do url.JoinPath(baseURL, "") to signal that what is important here is to clean the URL of any possible `..` elements. What is your opinion?

Another option is to use cleanURL instead as a variable name.

I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function). I am not too convinced by the cleanURL name because it makes sense when we think of it as a result of the cleaning but not when we think of it as how it will be consumed. I am proposing reqURL. WDYT?

> I think using url.JoinPath already conveys the intent of building a "clean" URL (it is clearly stated in the documentation of the function)

Sure but I wouldn't know "why" only that the function does it. The why is actually only documented in the fact that the test fails if I changed it to +. I don't have a strong opinion here, but I would like to preserve somehow the intent if possible.

Good point. However the more I think about it, the more I think calling index.fetch("../../"+ suffix,...) is the real problem, especially since it seems to be to "remove" the dist/<suite>/ part added to the URL. I will try to see if we can make it cleaner.

Using ../../ was indeed to workaround the addition of dists/<suite>/ to the suffix. So I removed the ../../ and added a comment to clarify why fetch() manipulates the prefix.
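For readers following the thread, here is a minimal sketch of the cleaning behavior being discussed. It relies only on the documented behavior of `url.JoinPath` (Go 1.19+), which removes `./` and `../` elements from the resulting path. The `requestURL` helper, the `archive.example.com` host, and the package path are illustrative assumptions, not code from this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// requestURL is a hypothetical helper: it joins suffix onto baseURL and
// relies on url.JoinPath to drop any "./" or "../" elements from the path.
func requestURL(baseURL, suffix string) (string, error) {
	return url.JoinPath(baseURL, suffix)
}

func main() {
	// Both values are made up for illustration.
	base := "http://archive.example.com/ubuntu/"
	suffix := "dists/jammy/../../pool/main/h/hello/hello_1.0_amd64.deb"

	// Plain concatenation keeps the "../" elements in the request URL.
	fmt.Println(base + suffix)

	// url.JoinPath resolves them before the request is made.
	u, err := requestURL(base, suffix)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

The second URL no longer contains `../`, which is the property the security proxies care about. As noted above, this cleans the symptom; dropping the `../../` prefix at the call site removes the cause.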

upils · 2025-11-24T14:13:00Z

> Have you validated using strace or similar that all the URLs in a normal installation are properly cleaned now? I don't want to miss any use, so running chisel cut with and without a clean cache with strace might give us more reassurance.

I have tested running chisel debug check-release-archives --release ubuntu-25.04 with mitmproxy, both before and after the fix, confirming URLs are now clean.

letFunny

Some more comments. As we discussed offline changing the rest of the code introduces more risks. I know you fully tested the current implementation and looked at all usages but please remember to do that again before the final submission.

letFunny · 2025-11-25T12:12:00Z

internal/archive/archive.go

+	// Scope content fetching with the suite unless fetching a package from the pool
+	if !strings.HasPrefix(suffix, "pool/") {
+		suffix = "dists/" + index.suite + "/" + suffix
+	}


Another option would be to move the magic one layer up, so that instead of:

	index.fetch("InRelease") // relative to dist/.../
	index.fetch(packagesPath + ...) // relative to dist/.../
	index.fetch(suffix) // full path

We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.

If we instead keep the changes, I would suggest making the message a bit simpler, adding punctuation and moving it after the if statement, for example:

	if !strings.HasPrefix(suffix, "pool/") {
		// If path is not a package then it is relative to the suite.
		suffix = "dists/" + index.suite + "/" + suffix
	}

> We make all of the fetching ask for the full path relative to the archive root which sounds cleaner to me.

I think that is the other way around: fetching calls made by index.fetch() should by default be relative to the suite path. And fetch() takes care of removing this scope when fetching a package. As a method of ubuntuIndex (configured with a specific suite) it makes more sense to me than fetching from the root of the archive.

I agree with your suggestion for the comment.
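To make the two positions concrete, here is a small sketch of the suite-scoping rule from the diff under review, isolated from the rest of `fetch()`. The trimmed `ubuntuIndex` struct, the `scopedSuffix` method name, and the example paths are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ubuntuIndex is trimmed down to the single field this sketch needs.
type ubuntuIndex struct {
	suite string
}

// scopedSuffix is a hypothetical helper mirroring the condition in the diff:
// anything that does not start with "pool/" is placed under dists/<suite>/.
func (index *ubuntuIndex) scopedSuffix(suffix string) string {
	if !strings.HasPrefix(suffix, "pool/") {
		suffix = "dists/" + index.suite + "/" + suffix
	}
	return suffix
}

func main() {
	index := &ubuntuIndex{suite: "jammy"}

	// Suite-relative paths get the dists/<suite>/ prefix.
	fmt.Println(index.scopedSuffix("InRelease"))
	fmt.Println(index.scopedSuffix("main/binary-amd64/Packages.gz"))

	// Paths under pool/ are passed through unchanged.
	fmt.Println(index.scopedSuffix("pool/main/h/hello/hello_1.0_amd64.deb"))
}
```

The sketch also shows the coupling in question: the caller's intent is inferred from the `pool/` prefix rather than stated explicitly.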

upils · 2025-11-25T12:47:21Z

internal/archive/archive.go

 	suffix := section.Get("Filename")
 	logf("Fetching %s...", suffix)
-	reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
+	reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)


[Note to reviewer]: Adding ../../ to the suffix aimed at forcing fetch() to construct a path starting at the root of the archive and pointing at a package (not under the scope of a suite). This was acting against the implementation of fetch() that is also already handling suffixes of URLs pointing at packages (prefixed with pool/). There is no change in behavior after removing ../../ except the URL is now already clean.

letFunny

Thanks for working on this Paul. I think the latest iteration is the least controversial. You are fixing the bug without doing refactors not strictly needed for the bugfix. And thank you for testing it with mitmproxy. Just a few questions.

letFunny · 2025-11-26T11:00:04Z

internal/archive/archive.go

 	suffix := section.Get("Filename")
 	logf("Fetching %s...", suffix)
-	reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
+	reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)


This is now subtly more brittle. I checked the Debian documentation for archives and the guarantee about Filename is that:

> These fields in Packages files give the filename(s) of (the parts of) a package in the distribution directories, relative to the root of the Debian hierarchy

Which means that we are relying on the implicit assumption that pool/ is the prefix. Maybe it is not important and it will never change but I do wonder if it would be better to decouple it as I suggested and always use absolute paths. This wouldn't break if the archive changes or if users were to use Debian archives for example.

letFunny

Discussed offline with Paul, there are tradeoffs for both approaches of having absolute paths vs keeping it like it is. I see the point about trying to do the minimal change now because it is extremely unlikely that /pool/ is going to change. Anyway, I don't want to block the PR, it looks good to me. Thank you Paul!

niemeyer · 2026-01-12T11:54:35Z

internal/archive/archive.go

 	suffix := section.Get("Filename")
 	logf("Fetching %s...", suffix)
-	reader, err := index.fetch("../../"+suffix, section.Get("SHA256"), fetchBulk)
+	reader, err := index.fetch(suffix, section.Get("SHA256"), fetchBulk)


@upils This comment is from Nov 26th, and it's still hanging in here unanswered. Can we please have a proper analysis and response here of what the situation is? We don't want to assume a particular magic path in our archives, and want instead to be able to handle these paths in the same way any other tool would. To be clear, I would much rather remove this suffix, as it's doing some implicit hackery on something else that isn't implied here, but we should do that knowingly, which is what Alberto is asking about here.

niemeyer · 2026-01-12T11:55:37Z

internal/archive/archive.go

 		url = baseURL + suffix
 	} else {
+		// If path is not a package then it is relative to the suite.
 		url = baseURL + "dists/" + index.suite + "/" + suffix


This comment assumes all we have inside pool are packages, but this logic has no idea. Other than that it is simply restating in English what the code says more clearly: if it's not in pool it's in dists. We don't need to repeat the code.

And this is also a bit misleading because the pool/ directory stores other kinds of artifacts.

letFunny

Minor comment, almost there Paul

letFunny · 2026-01-13T12:04:29Z

internal/archive/archive.go

 }

-func (index *ubuntuIndex) fetch(suffix, digest string, flags fetchFlags) (io.ReadSeekCloser, error) {
+func (index *ubuntuIndex) relativePath(suffix string) string {


I think we need a better name here; relative could be to the root of the page as well when thinking about URLs. Can you check the documentation to see if this directory has an established name, or what type of content we expect here, so that we can name it correctly?

Based on https://wiki.debian.org/DebianRepository/Format a better name, both more precise but not specific to the way we use it, is distPath.
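As a rough illustration of where this naming leads, the sketch below has callers state explicitly whether a path lives under `dists/<suite>/`, so that the URL-building step no longer inspects the `pool/` prefix. This is one possible shape, not the code merged in the PR: the trimmed struct, the `baseURL` field, the `fetchURL` stand-in, and the example values are all assumptions.

```go
package main

import "fmt"

// ubuntuIndex is trimmed down for this sketch; both fields are assumptions.
type ubuntuIndex struct {
	baseURL string
	suite   string
}

// distPath returns suffix as a path relative to the archive root,
// located under dists/<suite>/.
func (index *ubuntuIndex) distPath(suffix string) string {
	return "dists/" + index.suite + "/" + suffix
}

// fetchURL is a hypothetical stand-in for the URL-building part of fetch().
// It expects a path relative to the archive root and adds nothing implicit.
func (index *ubuntuIndex) fetchURL(path string) string {
	return index.baseURL + path
}

func main() {
	index := &ubuntuIndex{
		baseURL: "http://archive.example.com/ubuntu/",
		suite:   "jammy",
	}

	// Suite metadata: the caller asks for a dist path explicitly.
	fmt.Println(index.fetchURL(index.distPath("InRelease")))

	// Package: the Filename field is already relative to the archive root,
	// so it is used as is, whatever directory it starts with.
	fmt.Println(index.fetchURL("pool/main/h/hello/hello_1.0_amd64.deb"))
}
```

With this split, a `Filename` that does not start with `pool/` would still resolve against the archive root, which is the concern raised in the earlier review comments.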

upils added 2 commits November 21, 2025 16:05

fix: Resolve URL path before fetching content

638e4c8

This prevents from submitting requests to URL containing any `../` that could trigger security proxies. Signed-off-by: Paul Mars <paul.mars@canonical.com>

fix: missing import

24dc87f

Signed-off-by: Paul Mars <paul.mars@canonical.com>

letFunny mentioned this pull request Nov 21, 2025

HTTP requests to Ubuntu archive with path traversal in the URL not working in restricted environment #255

Open

test: URL cleaning

183f188

Signed-off-by: Paul Mars <paul.mars@canonical.com>

upils changed the title ~~fix: Resolve URL path before fetching content~~ fix: clean URL before fetching content Nov 21, 2025

upils changed the title ~~fix: clean URL before fetching content~~ fix: clean URLs before fetching content Nov 21, 2025

letFunny reviewed Nov 24, 2025

upils added 5 commits November 24, 2025 17:20

fix: mark error as internal

0c2633d

Signed-off-by: Paul Mars <paul.mars@canonical.com>

style: improve naming

7b84848

Signed-off-by: Paul Mars <paul.mars@canonical.com>

fix(test): improve error message formatting

1fbc902

Signed-off-by: Paul Mars <paul.mars@canonical.com>

docs: clarify suffix handling

57cef9f

Signed-off-by: Paul Mars <paul.mars@canonical.com>

refactor: simplify fetching package

31a78dc

Do not use `../../` in the path. It was trying to manipulate the behavior of `fetch()`, making assumptions on its implementation. Signed-off-by: Paul Mars <paul.mars@canonical.com>

letFunny reviewed Nov 25, 2025

refactor: revert some changes to minimize diff

a8a06ee

Signed-off-by: Paul Mars <paul.mars@canonical.com>

upils commented Nov 25, 2025

upils requested a review from letFunny November 25, 2025 12:50

letFunny reviewed Nov 26, 2025

letFunny approved these changes Nov 26, 2025

style: clearer test error message

c4e2c66

niemeyer requested changes Jan 12, 2026

upils added 2 commits January 12, 2026 15:05

style: remove redondant comment

f9f7a9e

refactor: do not assume the archive structure

9cd4437

upils requested a review from letFunny January 13, 2026 10:55

style: fix formatting

073443c

letFunny reviewed Jan 13, 2026

style: improving name to get the dist path

09c9b10

upils requested a review from letFunny January 13, 2026 13:20

upils changed the title ~~fix: clean URLs before fetching content~~ fix: simplify and clarify ubuntuIndex.fetch implementation Jan 13, 2026

upils changed the title ~~fix: simplify and clarify ubuntuIndex.fetch implementation~~ fix: simplify and clarify ubuntuIndex.fetch Jan 13, 2026
