fix unshare infinite loop without CGO_ENABLED by lyp256 · Pull Request #590 · containers/container-libs

lyp256 · 2026-01-15T02:07:23Z

fix unshare.MaybeReexecUsingUserNamespace() infinite loop without CGO_ENABLED.

fixes: #160

podmanbot · 2026-01-15T02:08:36Z

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6637

mtrmac

Thanks!

I can’t see how this can possibly work. AFAICT the code with _Containers-pid-pipe and _Containers-continue-pipe exists in C because we need it to run (and give all the process environment changes in unshare.Cmd.Start an opportunity to happen) before the Go runtime starts creating any OS-level threads.

We can mimic the child process’ behavior in Go but I don’t see how we can reliably achieve the same outcome.

Cc: @giuseppe

giuseppe

how is this supposed to work?

Is this AI-generated?

giuseppe · 2026-01-15T15:12:28Z

storage/pkg/unshare/unshare_nocgo.go

+		attr.AmbientCaps = []uintptr{
+			unix.CAP_CHOWN,
+			unix.CAP_DAC_OVERRIDE,
+			unix.CAP_DAC_READ_SEARCH,
+			unix.CAP_FOWNER,
+			unix.CAP_FSETID,
+			unix.CAP_SYS_ADMIN,
+		}
+	}


I've not tested this PR but this won't work, we need all caps because we launch containers from this environment (and they don't need to be in the ambient set)

I've compared processes with the C code and those without, and you're right: the C process does include all capabilities. However, I'm not entirely sure that adding all privileges is the correct decision. I added these capabilities in my code because storage directly attempted to delete a folder with permissions r-xr-xr-x, which resulted in a "Permission denied" error. Adding the capabilities was my workaround for this permission issue. Based on your suggestion, I will proceed with adding all the capabilities.

I haven't found any alternative methods to add capabilities other than using the ambient set. Do you have any suggestions?

When you create a user namespace you automatically get them. Is Go dropping them?

Based on my testing, I found that child processes will not have capabilities unless the ambient method is used. I collected relevant child process information from the /proc/<pid>/status file under three scenarios: using CGO, not using CGO without setting ambient capabilities, and not using CGO with ambient capabilities set.

Using CGO:

CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

Not using CGO and not setting ambient capabilities:

CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

Not using CGO but setting ambient capabilities:

CapInh: 000001ffffffffff
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 000001ffffffffff
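For anyone wanting to repeat this comparison, the following is a minimal sketch of how the Cap* lines of a /proc/<pid>/status file could be decoded in Go. The helper names parseCapSets and hasCap are hypothetical and not part of the PR; the capability number 21 for CAP_SYS_ADMIN comes from linux/capability.h.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCapSets extracts the Cap* bitmasks from the text of a
// /proc/<pid>/status file. The helper name is hypothetical.
func parseCapSets(status string) (map[string]uint64, error) {
	sets := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || !strings.HasPrefix(key, "Cap") {
			continue // not a capability line
		}
		// The kernel prints each set as a hexadecimal bitmask.
		mask, err := strconv.ParseUint(strings.TrimSpace(val), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", key, err)
		}
		sets[key] = mask
	}
	return sets, sc.Err()
}

// hasCap reports whether capability number n is set in mask.
func hasCap(mask uint64, n uint) bool {
	return mask&(uint64(1)<<n) != 0
}

func main() {
	// Values copied from the "not using CGO, no ambient capabilities" case.
	const sample = "CapInh:\t0000000000000000\n" +
		"CapPrm:\t0000000000000000\n" +
		"CapEff:\t0000000000000000\n" +
		"CapBnd:\t000001ffffffffff\n" +
		"CapAmb:\t0000000000000000\n"

	sets, err := parseCapSets(sample)
	if err != nil {
		panic(err)
	}
	const capSysAdmin = 21 // CAP_SYS_ADMIN
	fmt.Println("effective CAP_SYS_ADMIN:", hasCap(sets["CapEff"], capSysAdmin))
	fmt.Println("bounding CAP_SYS_ADMIN:", hasCap(sets["CapBnd"], capSysAdmin))
}
```

Reading the real file with os.ReadFile("/proc/self/status") and passing its contents to parseCapSets would give the values for the current process.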

This affects also processes executed by Podman, could you investigate why this is happening? An alternative would be to drop the ambient caps once we are in the namespace

lyp256 · 2026-01-16T02:29:56Z

Thanks!

I can’t see how this can possibly work. AFAICT the code with _Containers-pid-pipe and _Containers-continue-pipe exists in C because we need it to run (and give all the process environment changes in unshare.Cmd.Start an opportunity to happen) before the Go runtime starts creating any OS-level threads.

We can mimic the child process’ behavior in Go but I don’t see how we can reliably achieve the same outcome.

Cc: @giuseppe

how is this supposed to work?

Is this AI-generated?

I needed to use statically compiled Skopeo and Buildah for container image migration tasks in an offline environment. I compiled the necessary executables using commands similar to:

CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -tags exclude_graphdriver_btrfs,containers_image_openpgp skopeo/cmd/skopeo.

However, I encountered an issue where the unshare package did not function properly when CGO_ENABLED=0. While Skopeo worked correctly in most scenarios, operations involving containers-storage (which call unshare.MaybeReexecUsingUserNamespace, e.g. skopeo copy docker://docker.io/library/alpine:3 containers-storage:alpine:3) resulted in an infinite loop of child process creation.

To resolve this, I modified the code to directly pass syscall.CLONE_NEWUSER via exec.Cmd.SysProcAttr.Cloneflags. In the init function, I implemented logic to mimic the operations related to _Containers-pid-pipe and _Containers-continue-pipe as done in the C code. After recompiling both Buildah and Skopeo with these modifications, they now function as expected.

mtrmac · 2026-01-16T16:19:31Z

I implemented logic to mimic the operations related to _Containers-pid-pipe and _Containers-continue-pipe as done in the C code. After recompiling both Buildah and Skopeo with these modifications, they now function as expected.

None of this addresses the concern that doing any such changes after the Go runtime starts is too late. What am I missing? (Giuseppe?)

The conversation in #160 was AFAICT saying that to resolve this

MaybeReexecUsingUserNamespace can fail with an useful error message

I realize that does not help your use case but that doesn’t mean that we can ignore the constraints (if there indeed are any).

lyp256 · 2026-01-19T05:09:28Z

Under Giuseppe's guidance, I re-investigated the implementation of unshare. After reading the source code related to golang's exec.Cmd, I found that exec.Cmd.Start() handles uid/gid mapping by writing directly to the u(g)id_map files. Due to Linux kernel restrictions, ordinary users cannot write additional ID mappings to the uid_map and gid_map files directly; this requires using the newuidmap and newgidmap commands in conjunction with the /etc/subuid and /etc/subgid files.

The golang exec.Cmd.Start() method does not use the newu(g)idmap commands. Writing directly to the u(g)id_map files results in permission issues. If the child process does not have the uid/gid mapping set, the system will not grant capabilities to the child process.

The current approach uses the EXECVE system call to handle the issue of writing ID mappings too late (a step I had previously missed, which necessitated the Ambient set). The complete process is roughly as follows:

The main process uses clone() to create a child process -> The child process uses execve() to start the child program -> The parent and child processes interactively complete the ID mapping -> The child process uses execve() again to re-execute the child program.
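The step where the parent completes the ID mapping relies on newuidmap and newgidmap, which take the target PID followed by one (ID inside the namespace, ID outside, count) triple per range. A minimal sketch of building that argument list follows; the idMap type, the helper name idmapToolArgs, the PID, and the ranges are all hypothetical values for illustration, not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// idMap is a hypothetical stand-in for one line of uid_map or gid_map.
type idMap struct {
	ContainerID int // first ID inside the namespace
	HostID      int // first ID outside the namespace
	Size        int // number of consecutive IDs
}

// idmapToolArgs builds the arguments for newuidmap or newgidmap:
// the target PID, then one (inside, outside, count) triple per range.
func idmapToolArgs(pid int, maps []idMap) []string {
	args := []string{strconv.Itoa(pid)}
	for _, m := range maps {
		args = append(args,
			strconv.Itoa(m.ContainerID),
			strconv.Itoa(m.HostID),
			strconv.Itoa(m.Size))
	}
	return args
}

func main() {
	// Assumed example: the caller's UID 1000 mapped to root, plus a
	// subordinate range such as one that /etc/subuid might grant.
	maps := []idMap{
		{ContainerID: 0, HostID: 1000, Size: 1},
		{ContainerID: 1, HostID: 100000, Size: 65536},
	}
	fmt.Println("newuidmap", idmapToolArgs(4242, maps))
}
```

The parent would pass these arguments to exec.Command("newuidmap", args...) while the child waits, then signal the child to call execve() again.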

Please review @mtrmac @giuseppe

Signed-off-by: lyp256 <lyp256@qq.com>

giuseppe

tested locally.

LGTM

TomSweeneyRedHat · 2026-02-11T21:12:54Z

LGTM

TomSweeneyRedHat · 2026-02-11T21:13:12Z

re-ping @mtrmac

mtrmac

I really know ~nothing about the problem space; even if this is correct, reviewing it would require much more familiarity with the user namespace mechanics than I now have.

mtrmac · 2026-02-25T22:06:20Z

storage/pkg/unshare/unshare_nocgo.go

+	if c.Cmd.SysProcAttr == nil {
+		c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
+	}
+	attr := c.Cmd.SysProcAttr


All of this could happen only inside the if below.

mtrmac · 2026-02-25T22:17:40Z

storage/pkg/unshare/unshare_nocgo.go

+	}
+	attr := c.Cmd.SysProcAttr
+	if c.UnshareFlags&syscall.CLONE_NEWUSER != 0 {
+		attr.Cloneflags = uintptr(c.UnshareFlags)


Nothing documents why we are not using “unshare” flags for unshare.

AFAICS the two functions don’t accept exactly the same sets of flags, so even if this were correct for the callers we are currently anticipating, I’d expect the code here to report a failure if any unexpected flag value is set.

(And, well, if Cloneflags works here fine nowadays, wouldn’t we want to use it in the CGo code path as well, and save us the unshare trouble? Is there more subtlety to this? There probably is, given the two separate unshare calls.)

This sort of applies generally: if we can do something natively using SysProcAttr nowadays, I’d expect that to always be done, rather than having two different code paths.

mtrmac · 2026-02-25T22:23:53Z

storage/pkg/unshare/unshare_nocgo.go

+	attr := c.Cmd.SysProcAttr
+	if c.UnshareFlags&syscall.CLONE_NEWUSER != 0 {
+		attr.Cloneflags = uintptr(c.UnshareFlags)
+		attr.GidMappingsEnableSetgroups = c.GidMappingsEnableSetgroups


This does nothing at all unless …

mtrmac · 2026-02-25T22:25:08Z

storage/pkg/unshare/unshare_nocgo.go

+		if c.Ctty != nil {
+			index := len(c.Cmd.ExtraFiles)
+			c.Cmd.ExtraFiles = append(c.Cmd.ExtraFiles, c.Ctty)
+			attr.Ctty = index


This does nothing unless …

mtrmac · 2026-02-25T22:26:37Z

storage/pkg/unshare/unshare_nocgo.go

+	_ = os.Unsetenv(name)
+	v, err := strconv.Atoi(env)
+	if err != nil {
+		return -1


The C code differentiates between “missing” and “invalid format”

mtrmac · 2026-02-25T22:30:15Z

storage/pkg/unshare/unshare_nocgo.go

+			bailOnError(err, "Read containers continue pipe")
+		}
+		if n > 0 {
+			bailOnError(fmt.Errorf(string(buf)), "Unexpected containers continue pipe read")


Format string vulnerability.

Nothing is all that unexpected about this error reporting path, the data exists only to be reported this way.
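The format string concern arises because fmt.Errorf(string(buf)) interprets any % sequence in the pipe data as a formatting verb. A minimal sketch of a safer conversion follows; the helper name errorFromPipe and the sample message are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errorFromPipe turns bytes read from the continue pipe into an error
// without interpreting them as a format string. The helper name is
// hypothetical; fmt.Errorf("%s", buf) would be an equivalent fix.
func errorFromPipe(buf []byte) error {
	return errors.New(string(buf))
}

func main() {
	// A message containing '%' must be reported verbatim.
	msg := []byte("error writing uid_map: %s %d")
	fmt.Println(errorFromPipe(msg))
}
```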

mtrmac · 2026-02-25T22:31:41Z

storage/pkg/unshare/unshare_nocgo.go

+			bailOnError(err, "Error during setsid")
+		}
+	}
+


Setpgrp handling?

mtrmac · 2026-02-25T22:32:26Z

storage/pkg/unshare/unshare_nocgo.go

+			bailOnError(err, "Setresgid failed")
+		}
+	}
+


The C version does a two-stage unshare. Why is it necessary there and not necessary here?

mtrmac · 2026-02-25T22:34:02Z

storage/pkg/unshare/unshare_nocgo.go

+	}
+
+	// Re-invoke the execve system call to obtain capabilities.
+	err := syscall.Exec("/proc/self/exe", os.Args, os.Environ())


containers_reexec does a lot of things. I have no idea why it does all of them (and the code doesn’t document it :| ); either way, I must ask: why is all of that not necessary here?

mtrmac · 2026-02-25T22:36:25Z

storage/pkg/unshare/unshare_nocgo.go

+		if err := syscall.Setresuid(0, 0, 0); err != nil {
+			bailOnError(err, "Setresuid failed")
+		}
+		if err := syscall.Setresgid(0, 0, 0); err != nil {
+			bailOnError(err, "Setresgid failed")
+		}


Why does the UID/GID order differ from the C version?

giuseppe · 2026-03-13T14:42:18Z

@lyp256 are you still working on this PR?

github-actions bot added the storage Related to "storage" package label Jan 15, 2026

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 15, 2026

dnm: Vendor changes from containers/container-libs#590

82eec1c

podmanbot mentioned this pull request Jan 15, 2026

Sync: fix unshare infinite loop without CGO_ENABLED containers/buildah#6637

Draft

lyp256 force-pushed the unshre branch from 6c1311d to 3baf01c Compare January 15, 2026 02:13

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 15, 2026

dnm: Vendor changes from containers/container-libs#590

97255ea

lyp256 force-pushed the unshre branch from 3baf01c to 058d581 Compare January 15, 2026 02:17

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 15, 2026

dnm: Vendor changes from containers/container-libs#590

245cd61

mtrmac requested changes Jan 15, 2026

giuseppe reviewed Jan 15, 2026

lyp256 force-pushed the unshre branch from 058d581 to 11f917a Compare January 16, 2026 03:11

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 16, 2026

dnm: Vendor changes from containers/container-libs#590

424c162

lyp256 force-pushed the unshre branch from 11f917a to e14e0f2 Compare January 19, 2026 04:18

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 19, 2026

dnm: Vendor changes from containers/container-libs#590

e49fed0

fix unshare without CGO_ENABLED

f0c5225

Signed-off-by: lyp256 <lyp256@qq.com>

lyp256 force-pushed the unshre branch from e14e0f2 to f0c5225 Compare January 19, 2026 06:10

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Jan 19, 2026

dnm: Vendor changes from containers/container-libs#590

a37b96b

lyp256 requested a review from giuseppe January 20, 2026 01:27

giuseppe approved these changes Jan 20, 2026

lyp256 requested a review from mtrmac January 22, 2026 01:30

mtrmac requested changes Feb 25, 2026

5 participants
