Skip to content

CORS-4275: Add Windows support#10041

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
bear-redhat:issue/CORS-4275
Oct 31, 2025
Merged

CORS-4275: Add Windows support#10041
openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
bear-redhat:issue/CORS-4275

Conversation

@bear-redhat
Copy link
Contributor

@bear-redhat bear-redhat commented Oct 29, 2025

This PR improves cross-platform compatibility.
It solves two main issues:

  1. inconsistent line endings
  2. inconsistent path separators

Path separators, in installer, needs to target two different
environments:

  1. the OS where the installer runs
  2. the OS where the injected files been used

This PR unified path separators used in 2 to be UNIX path separators,
while in 1 to be platform-dependant.

Ref: https://forum.golangbridge.org/t/filepath-join-or-path-join/13479

Known issues:
The spawn processes, including etcd.exe, kube-apiserver.exe,
and openshift-installer.exe, will not exit once installation
aborted or completed. Users need to manually terminate those
processes in task manager.

Need kubernetes-sigs/controller-tools#1297

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 29, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 29, 2025

@bear-redhat: This pull request references CORS-4275 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Known issues:
The spawn processes, including etcd.exe, kube-apiserver.exe, and openshift-installer.exe, will not exit once installation aborted or completed. Users need to manually terminate those processes in task manager.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from bfournie and rna-afk October 29, 2025 14:54
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 29, 2025

@bear-redhat: This pull request references CORS-4275 which is a valid jira issue.

Details

In response to this:

Known issues:
The spawn processes, including etcd.exe, kube-apiserver.exe, and openshift-installer.exe, will not exit once installation aborted or completed. Users need to manually terminate those processes in task manager.

Pending for kubernetes-sigs/controller-tools#1297

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally fine with adding unofficial windows support in this manner. Adding official support would require a significant amount of effort.

You should divide this up into some commits. Particularly the controller runtime stuff needs its own commit, and based on your commit message, it seems like windows is still leaking the processes? Does that always happen or only during interrupts?

// add ZTP manifests to manifestPath
for _, file := range agentManifests.FileList {
manifestFile := ignition.FileFromBytes(filepath.Join(manifestPath, filepath.Base(file.Filename)),
manifestFile := ignition.FileFromBytes(path.Join(manifestPath, filepath.Base(file.Filename)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be helpful to add context, for instance in the commit message, about why this change is needed. it's not obvious to me how this relates to windows support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added in the commit messages and PR decription

@@ -0,0 +1,4 @@
data/data/bootstrap/systemd/**/*.service text eol=lf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to the other comment, it would be helpful to add context, in a commit message and/or commit on why we need this file and how it relates to windows support (I assume it's specifically to handle windows new lines?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments in .gitattribute. This is to ensure the file checked out always have Unix line ending, since these files are used in CoreOS.

@@ -0,0 +1,23 @@
//go:build !windows
// +build !windows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we handle this at runtime? It seems like we could.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but it is quite a common practice to have platform-dependent code in separate files and use conditional compiling.
Ref: https://go.dev/wiki/TargetSpecific

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I am familiar with build tags, we have them for SCOS and in previous of the installer have used them for ARO and altinfra (non-Terraform versions of the installer, which has now become the default).

In general, I think we should use them only when necessary, and it seems in this case it is not necessary. That echoes the first point in the link you sent: "Minimize code in tagged files. As much code as possible should build for every target."

But more of my concern is that it would not be obvious to users that they should include the windows build tag. It would be better if it just worked for users, and that seems possible.

Also, I'm curious, does the non-windows build still run in windows? That is, is it possible for windows users to be running the binary that is built without the windows tag? It seems like it would only affect killing the processes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Bear on slack, and determined that this cannot be handled at runtime. Windows builds fail with:

# github.com/openshift/installer/pkg/clusterapi/internal/process
pkg\clusterapi\internal\process\process.go:137:3: unknown field Setpgid in struct literal of type syscall.SysProcAttr

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 29, 2025

@bear-redhat: This pull request references CORS-4275 which is a valid jira issue.

Details

In response to this:

This PR improves cross-platform compatibility.
It solves two main issues:

  1. inconsistent line endings
  2. inconsistnet path separators

Path separators, in installer, needs to target two different
environments:

  1. the OS where the installer runs
  2. the OS where the injected files been used

This PR unified path separators used in 2 to be UNIX path separators,
while in 1 to be platform-dependant.

Ref: https://forum.golangbridge.org/t/filepath-join-or-path-join/13479

Known issues:
The spawn processes, including etcd.exe, kube-apiserver.exe,
and openshift-installer.exe, will not exit once installation
aborted or completed. Users need to manually terminate those
processes in task manager.

Need kubernetes-sigs/controller-tools#1297

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

This PR improves cross-platform compatibility.
It solves two main issues:
1. inconsistent line endings
2. inconsistent path separators

Path separators, in installer, needs to target two different
environments:
1. the OS where the installer runs
2. the OS where the injected files been used

This PR unified path separators used in 2 to be UNIX path separators,
while in 1 to be platform-dependant.

Ref: https://forum.golangbridge.org/t/filepath-join-or-path-join/13479

Known issues:
The spawn processes, including etcd.exe, kube-apiserver.exe,
and openshift-installer.exe, will not exit once installation
aborted or completed. Users need to manually terminate those
processes in task manager.
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 29, 2025

@bear-redhat: This pull request references CORS-4275 which is a valid jira issue.

Details

In response to this:

This PR improves cross-platform compatibility.
It solves two main issues:

  1. inconsistent line endings
  2. inconsistent path separators

Path separators, in installer, needs to target two different
environments:

  1. the OS where the installer runs
  2. the OS where the injected files been used

This PR unified path separators used in 2 to be UNIX path separators,
while in 1 to be platform-dependant.

Ref: https://forum.golangbridge.org/t/filepath-join-or-path-join/13479

Known issues:
The spawn processes, including etcd.exe, kube-apiserver.exe,
and openshift-installer.exe, will not exit once installation
aborted or completed. Users need to manually terminate those
processes in task manager.

Need kubernetes-sigs/controller-tools#1297

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bear-redhat
Copy link
Contributor Author

bear-redhat commented Oct 29, 2025

You should divide this up into some commits. Particularly the controller runtime stuff needs its own commit, and based on your commit message, it seems like windows is still leaking the processes? Does that always happen or only during interrupts?

It's an overall refinement on correcting the usage of path vs filepath, im not sure why/how to separate it into multiple PR, since its essentially addressing one single thing.

Does that always happen or only during interrupts?

Almost always. There are some rough edges on the localcontrolplane for terminating the sub-processes. I think they can be addressed in a separate PR.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 29, 2025

@bear-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 5166c3e link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-agent-two-node-fencing-ipv4 5166c3e link false /test e2e-agent-two-node-fencing-ipv4

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


const (
cvoOverridesFilename = "manifests/cvo-overrides.yaml"
cvoOverridesFilename = path.Join("manifests", "cvo-overrides.yaml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change is on purpose, changing const to var ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path.Join("manifests", "cvo-overrides.yaml") (value of type string) is not constantcompiler[InvalidConstInit](https://pkg.go.dev/golang.org/x/tools/internal/typesinternal#InvalidConstInit)

@patrickdillon
Copy link
Contributor

There are some rough edges on the localcontrolplane for terminating the sub-processes. I think they can be addressed in a separate PR.

on mac & linux I'm not aware of any issues with leaked processes. I implemented the signal handling if you want to discuss further or have any questions. Leaked processes on windows is not a blocking concern for me for this PR, although it is pretty pernicious.

@patrickdillon
Copy link
Contributor

patrickdillon commented Oct 30, 2025

im not sure why/how to separate it into multiple PR, since its essentially addressing one single thing.

It's fine as it is now, but what I was trying to suggest is that you divide this PR into multiple commits (not open multiple PRs). One commit would handle updating filePath -> path and have a commit message explaining why, Another commit would have all the code updating the subprocess management.

But it's NBD, this is manageable as it is now.

@patrickdillon
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2025
@patrickdillon
Copy link
Contributor

/lgtm
/verified by e2e (no regressions)

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 31, 2025
@openshift-ci-robot
Copy link
Contributor

@patrickdillon: This PR has been marked as verified by e2e (no regressions).

Details

In response to this:

/lgtm
/verified by e2e (no regressions)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2025
for _, f := range r.File {
name := f.Name
if !p.Sources.Has(name) {
nameWithoutExt := strings.TrimSuffix(name, ".exe")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nameWithoutExt := strings.TrimSuffix(name, ".exe")
nameWithoutExt := strings.TrimSuffix(name, filepath.Ext(name))

Comment on lines +12 to +19
func flockFile(f *os.File, lock bool) error {
if lock {
var overlapped windows.Overlapped
return windows.LockFileEx(windows.Handle(f.Fd()), windows.LOCKFILE_EXCLUSIVE_LOCK, 0, 1, 0, &overlapped)
}
var overlapped windows.Overlapped
return windows.UnlockFileEx(windows.Handle(f.Fd()), 0, 1, 0, &overlapped)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like locking the first byte in the file? I guess it works, but should we lock the entire file?

const (
      reserved = 0
      allBytes = ^uint32(0)
  )

  func flockFile(f *os.File, lock bool) error {
      if lock {
          var overlapped windows.Overlapped
          return windows.LockFileEx(windows.Handle(f.Fd()), windows.LOCKFILE_EXCLUSIVE_LOCK, reserved, allBytes, allBytes, &overlapped)
      }
      var overlapped windows.Overlapped
      return windows.UnlockFileEx(windows.Handle(f.Fd()), reserved, allBytes, allBytes, &overlapped)
  }

For example: https://tip.golang.org/src/cmd/go/internal/lockedfile/internal/filelock/filelock_windows.go

Comment on lines +10 to +11
func setSysProcAttr(cmd *exec.Cmd) {
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func setSysProcAttr(cmd *exec.Cmd) {
      cmd.SysProcAttr = &syscall.SysProcAttr{
          CreationFlags: syscall.CREATE_NEW_PROCESS_GROUP,
      }
  }

We should probably isolate the local controlplane into a separate process group?

This way, if the install process is interrupted/terminated, the envtest environment clean-up can be handled by the installer (instead of the child processes terminating on itself).

Comment on lines +13 to +15
func stopProcess(ps *State) error {
return ps.Cmd.Process.Kill()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for grateful shutdown on windows 👀?

Maybe, like:

func stopProcess(ps *State) error {
      err := windows.GenerateConsoleCtrlEvent(windows.CTRL_BREAK_EVENT, uint32(ps.Cmd.Process.Pid))
      if err != nil {
          return err
      }
      return nil
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reading https://pkg.go.dev/os/signal#hdr-Windows so that hack comes to mind, but I don't exactly know how windows work 😞

@openshift-merge-bot openshift-merge-bot bot merged commit 8fe3603 into openshift:main Oct 31, 2025
26 of 28 checks passed
@tthvo
Copy link
Member

tthvo commented Oct 31, 2025

Please feel free to open a new PR if the suggestions works 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants