
tar: fix etc remapping of paths with non-ASCII characters #2073

Merged
cgwalters merged 3 commits into bootc-dev:main from yeetypete:fix/tar-pax-etc-remap
Mar 17, 2026


Conversation

@yeetypete
Contributor

@yeetypete yeetypete commented Mar 16, 2026

PAX extended headers take precedence over basic tar header fields per POSIX. When a container layer contains PAX path or linkpath headers (e.g. for non-ASCII filenames), they override the remapped path written to the basic header, causing files that should land under /usr/etc to remain under /etc.

This PR filters out path and linkpath from PAX extensions in copy_entry before writing the output entry. The tar crate regenerates them from the remapped path passed to append_data/append_link. I also add a test to ensure that paths containing non-ASCII characters are remapped properly.
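
To make the fix concrete, here is a minimal, standalone Rust sketch of the idea: remap the path, then drop `path` and `linkpath` from the PAX records so nothing can override the remapped name. It is not the actual `copy_entry` code. `remap_etc` and `drop_path_records` are hypothetical helpers, and PAX records are modeled as plain `(key, value)` pairs instead of going through the tar crate.

```rust
/// Hypothetical helper: remap `etc` or `etc/...` to `usr/etc...`.
/// Any other path is returned unchanged.
pub fn remap_etc(path: &str) -> String {
    match path.strip_prefix("etc") {
        // Only remap when "etc" is a whole path component.
        Some(rest) if rest.is_empty() || rest.starts_with('/') => {
            format!("usr/etc{}", rest)
        }
        _ => path.to_string(),
    }
}

/// Hypothetical helper: drop the PAX records that carry a path.
/// Everything else (mtime, xattrs, ...) is passed through untouched.
pub fn drop_path_records<'a>(
    records: &[(&'a str, &'a [u8])],
) -> Vec<(&'a str, &'a [u8])> {
    records
        .iter()
        .copied()
        .filter(|(key, _)| !matches!(*key, "path" | "linkpath"))
        .collect()
}
```

With these helpers, `remap_etc("etc/ssl/certs/Főtanúsítvány.pem")` returns `usr/etc/ssl/certs/Főtanúsítvány.pem`, and filtering a record list of `path`, `mtime`, `linkpath` keeps only `mtime`.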

Fixes bootcrew/ubuntu-bootc#4 (I ran into the same issue building my own ubuntu-based bootc image).

Example error:

DEBUG Unpacking ce3df020c13aeeedc9f76a5c17ec16ad9cea07ebac153f28cc21de0df5a82d2f
DEBUG Unpacking 88a14deac5f276ea5cfafc9e60c0c8a23447c66ad06b0c8d4adbb15ee094e079
DEBUG Unpacking a87c799ffd61bc1624a71b732cc0ad4919c49cd924805e2f7d03b2d1883c67b0
DEBUG labeling from merged tree
DEBUG Removing usr/etc/resolv.conf
DEBUG Images found: 1
DEBUG Referenced layers: 23
DEBUG Found layers: 23
DEBUG pruned 0 layers
DEBUG Wrote merge commit 5f14c1364386257d9ca1ceff817c40c8b82393be4d8aec17beb06d269061f355
done (6 seconds)
error: Installing to disk: Creating ostree deployment: Performing deployment: Deploying tree: Initializing deployment: Preparing /etc: Tree contains both /etc and /usr/etc
Error: Process completed with exit code 1.

Disclaimer: This bug was quite hard to track down. I used Claude to help find the root cause.

PAX extended headers take precedence over basic tar header fields
per POSIX. When a container layer contains PAX `path` or `linkpath`
headers (e.g. for non-ASCII filenames), they override the remapped
path written to the basic header, causing files that should land
under /usr/etc to remain under /etc.

Filter out `path` and `linkpath` from PAX extensions before writing
the output entry. The tar crate regenerates them from the remapped
path passed to append_data/append_link.

Signed-off-by: Peter Siegel <psiegel2000@icloud.com>

Verifies that PAX `path` headers (as produced by Docker/BuildKit for
non-ASCII filenames) do not bypass the /etc -> /usr/etc remap. Checks
both that no unremapped /etc PAX headers remain in the output and that
the remapped file appears under usr/etc.

Signed-off-by: Peter Siegel <psiegel2000@icloud.com>
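
The kind of check this test describes can be sketched in standalone Rust as follows. This is not the test added in the PR: `parse_pax_records` and `has_unremapped_etc` are hypothetical names, and the parser only covers the basic `<len> <key>=<value>\n` record layout of a PAX extended header body.

```rust
/// Parse the body of a PAX extended header into (key, value) pairs.
/// Each record is "<len> <key>=<value>\n", where <len> is the decimal byte
/// length of the whole record, including the length digits and the newline.
/// Returns None on malformed input.
pub fn parse_pax_records(mut data: &[u8]) -> Option<Vec<(String, Vec<u8>)>> {
    let mut records = Vec::new();
    while !data.is_empty() {
        let space = data.iter().position(|&b| b == b' ')?;
        let len = std::str::from_utf8(&data[..space])
            .ok()?
            .parse::<usize>()
            .ok()?;
        // The record must extend past the length field and fit in the buffer.
        if len <= space + 1 || len > data.len() {
            return None;
        }
        let (record, rest) = data.split_at(len);
        // Strip "<len> " from the front and the newline from the back.
        let body = match record[space + 1..].split_last() {
            Some((&b'\n', body)) => body,
            _ => return None,
        };
        let eq = body.iter().position(|&b| b == b'=')?;
        let key = std::str::from_utf8(&body[..eq]).ok()?.to_owned();
        // Values stay as raw bytes: they are not guaranteed to be UTF-8.
        records.push((key, body[eq + 1..].to_vec()));
        data = rest;
    }
    Some(records)
}

/// Return true if any PAX `path` or `linkpath` value still points into `etc`.
pub fn has_unremapped_etc(records: &[(String, Vec<u8>)]) -> bool {
    records.iter().any(|(key, value)| {
        matches!(key.as_str(), "path" | "linkpath")
            && (value.as_slice() == b"etc" || value.starts_with(b"etc/"))
    })
}
```

A regression test along these lines would parse every PAX header in the output stream and assert that `has_unremapped_etc` is false for each of them.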
@github-actions github-actions bot added the area/ostree Issues related to ostree label Mar 16, 2026
@bootc-bot bootc-bot bot requested a review from gursewak1997 March 16, 2026 21:26
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug where PAX headers in tar archives were overriding path remapping, specifically for /etc to /usr/etc. The proposed solution correctly filters out the problematic path and linkpath headers. The changes include a new regression test to ensure the fix works as expected, including for paths with non-ASCII characters. My review focuses on improving the implementation's efficiency and making the new test more robust.

Signed-off-by: Peter Siegel <psiegel2000@icloud.com>
@yeetypete yeetypete changed the title from "Fix/tar pax etc remap" to "tar: fix etc remapping of paths with non-ASCII characters" Mar 16, 2026
Contributor

@gursewak1997 gursewak1997 left a comment


Would be nice to have just two commits (add gemini suggestions into one of the previous commits) but changes lgtm

@yeetypete
Contributor Author

Would be nice to have just two commits (add gemini suggestions into one of the previous commits) but changes lgtm

Sure, happy to do that. Thanks for the quick review.

@cgwalters
Collaborator

I have an agent running on this doing some research, and man: tar is just a giant mess. There are so many incompatible extensions and variants, and some codebases don't follow the spec (they can't, because they need to interoperate with things that don't honor it), etc.

In this specific case, from some agent research it looks like Go's archive/tar switches to PAX when it sees non-ASCII names, but the Rust tar crate instead prefers GNU.

And...hmmm when we do this filtering...I think tar-rs will see that even though the header is pax it will cut over to GNU for emitting? Let me see...

@cgwalters
Collaborator

On the positive side we don't do this tar filtering in the composefs path at least, so no bugs there.

@yeetypete
Contributor Author

On the positive side we don't do this tar filtering in the composefs path at least, so no bugs there.

Unfortunately I'm stuck on kernel 5.15 so I wasn't quite able to get the composefs pathway working for my use-case. My end goal is trying to get bootc working on ubuntu for nvidia jetson devices. Any tips would be welcome but I guess bootc on ubuntu + jetson is probably uncharted territory :)

@cgwalters
Collaborator

OK after some analysis...I think this is viable, though we could clearly do some more cleanup here. I did some in composefs/tar-core#19 (which will be used by composefs, though not yet by the ostree side).

Also on that topic we've got a lot of ongoing work on #20 which will eventually get us to the point where we may be able to completely rework the ostree-container storage as well such that it's based on the composefs storage, which would help avoid this (there's better stuff we can do there than filtering tar).

@cgwalters cgwalters merged commit 1fefce8 into bootc-dev:main Mar 17, 2026
57 of 61 checks passed
cgwalters added a commit to composefs/tar-core that referenced this pull request Mar 17, 2026
Motivated by bootc-dev/bootc#2073, where Go's archive/tar (used by
Docker/BuildKit) emits PAX path headers for non-ASCII filenames like
Főtanúsítvány.pem (valid UTF-8, but non-ASCII). PAX headers take
precedence over basic tar headers per POSIX, so code that remaps
paths by rewriting the basic header must also update or strip PAX
path/linkpath records.

tar-core already handles non-UTF-8 PAX path values correctly (raw
`&[u8]` throughout, matching Go archive/tar and Rust tar crate),
but this was untested. Add tests covering: parser acceptance of
non-UTF-8 PAX path bytes, lossy conversion, builder->parser roundtrip
with a >100 byte path (to actually trigger PAX emission), linkpath
preservation, and PaxExtension value_bytes() vs value() behavior.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
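
As a rough illustration of when a writer has to fall back to a PAX `path` record and what that record looks like on the wire, here is a standalone Rust sketch. Both function names are hypothetical. `needs_extended_path` is a deliberately simplified assumption that combines the two triggers mentioned in this thread (non-ASCII names and names over 100 bytes); real writers also consider things like the ustar prefix field or GNU long names.

```rust
/// Simplified assumption: a name needs an extended header if it is not
/// pure ASCII or does not fit the 100-byte name field of the basic header.
pub fn needs_extended_path(name: &[u8]) -> bool {
    !name.is_ascii() || name.len() > 100
}

/// Encode one PAX record as "<len> <key>=<value>\n".
/// <len> counts the whole record, including its own decimal digits.
pub fn encode_pax_record(key: &str, value: &[u8]) -> Vec<u8> {
    // Everything except the length digits: ' ' + key + '=' + value + '\n'.
    let rest = 1 + key.len() + 1 + value.len() + 1;
    // The length field counts itself, so iterate until it is consistent.
    let mut total = rest + 1;
    while total != rest + total.to_string().len() {
        total = rest + total.to_string().len();
    }
    let mut out = format!("{} {}=", total, key).into_bytes();
    // The value is written as raw bytes; it does not have to be UTF-8.
    out.extend_from_slice(value);
    out.push(b'\n');
    out
}
```

Under these assumptions, `Főtanúsítvány.pem` is valid UTF-8 but fails the ASCII check, so it needs an extended header even though it is short. `encode_pax_record("path", b"etc/a")` produces the 14-byte record `14 path=etc/a\n`.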
cgwalters added a commit to composefs/tar-core that referenced this pull request Mar 17, 2026
Test the PAX 'x' -> GNU 'L' -> real entry ordering, which is what
tar-rs's builder produces when you call append_pax_extensions() followed
by append_data() with a long path. This matters for ecosystem
compatibility -- bootc's copy_entry (bootc-dev/bootc#2073) generates
exactly this layout when filtering PAX extensions during path remapping.

The parser already handles this correctly via PendingMetadata
accumulation across recursive parse_header calls, but the reversed
ordering was untested. Also test that PAX path still wins over GNU
long name regardless of which comes first in the byte stream.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
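
The precedence rule described in this commit message can be sketched in standalone Rust like this. It is a toy model, not tar-core's `PendingMetadata` implementation: `Block` and `resolve_paths` are hypothetical names, and the stream is reduced to names only.

```rust
/// Simplified view of the blocks in a tar stream (names only, no file data).
pub enum Block<'a> {
    /// PAX 'x' extended header carrying a `path` record.
    PaxPath(&'a str),
    /// GNU 'L' long-name entry.
    GnuLongName(&'a str),
    /// The real entry, with the name stored in its basic header.
    Entry(&'a str),
}

/// Resolve the effective path of each real entry.
/// Precedence: PAX path, then GNU long name, then the basic header name,
/// regardless of the order in which the metadata blocks arrived.
pub fn resolve_paths(blocks: &[Block<'_>]) -> Vec<String> {
    let mut pax: Option<&str> = None;
    let mut gnu: Option<&str> = None;
    let mut out = Vec::new();
    for block in blocks {
        match block {
            Block::PaxPath(p) => pax = Some(*p),
            Block::GnuLongName(n) => gnu = Some(*n),
            Block::Entry(header_name) => {
                // Both take() calls run eagerly, so pending metadata is
                // cleared and applies only to the entry that follows it.
                let name = pax.take().or(gnu.take()).unwrap_or(*header_name);
                out.push(name.to_string());
            }
        }
    }
    out
}
```

Feeding this the sequence PAX, GNU, entry or the sequence GNU, PAX, entry yields the PAX path in both cases, which is the behavior the new tests are described as covering.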
cgwalters added a commit to composefs/tar-core that referenced this pull request Mar 17, 2026
@cgwalters
Collaborator

Unfortunately I'm stuck on kernel 5.15 so I wasn't quite able to get the composefs pathway working for my use-case.

Hmm rhel9 is 5.14+lots-of-patches, but I am not aware of a hard reason that 5.15 (to be clear: kernel.org?) wouldn't work. Would take some investigation of course, IIRC rhel9 did backport some EROFS work, and I think the new mount API too...

@yeetypete
Contributor Author

Hmm rhel9 is 5.14+lots-of-patches, but I am not aware of a hard reason that 5.15 (to be clear: kernel.org?) wouldn't work. Would take some investigation of course, IIRC rhel9 did backport some EROFS work, and I think the new mount API too...

I think that's exactly what I'm missing. I get this error testing in an aarch64 qemu vm:

[8.562690] erofs: (device loop0): mounted with root inode @ nid 36.
[8.616013] overlayfs: empty lowerdir
[8.614736] initramfs-setup[316]: Error: Setting up /sysroot
             0: Mounting composefs image
             1: Failed to mount composefs image
             3: Creating filesystem mount
             4: Invalid argument (os error 22)

Unfortunately this isn't vanilla 5.15 as nvidia requires some out of tree modules for jetson. I had hoped to be able to get away with using ubuntu's prebuilt linux-nvidia-tegra-jetson kernel metapackage but I guess if that would work it would be too easy 😅

I guess I'm in for some more investigation on whether I can use a newer version of that package (they seem to also have a 6.8 kernel build) or build a custom kernel which has the fixes I need.

@cgwalters
Collaborator

[8.614736] initramfs-setup[316]: Error: Setting up /sysroot
0: Mounting composefs image
1: Failed to mount composefs image
3: Creating filesystem mount
4: Invalid argument (os error 22)

This is an issue probably for composefs-rs; offhand it might have even been fixed by composefs/composefs-rs#265

I have no issues with trying to support 5.15 offhand, but that said we'd need to have a clean agreed reproducer environment.

@yeetypete
Contributor Author

This is an issue probably for composefs-rs; offhand it might have even been fixed by composefs/composefs-rs#265

Thanks for the tip. I'll take another look. The good thing is I have bootc working in an aarch64 vm and on real hw for the jetson orin nano with ubuntu 24.04 and the jetson flavor of kernel 6.8. If I run into any more issues with 5.15 that are relevant for bootc I'll create some github issues / PRs and include a reproducible example.


Labels

area/ostree Issues related to ostree


Development

Successfully merging this pull request may close these issues.

error Linkpath can't be converted from UTF-8 to current locale

3 participants