From 6c087ef4077bd70ceb27134f69833919327bc062 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 17:07:46 +0000 Subject: [PATCH 01/22] Initial plan --- konnectivity-plan.md | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) create mode 100644 konnectivity-plan.md diff --git a/konnectivity-plan.md b/konnectivity-plan.md new file mode 100644 index 0000000000..7337adee57 --- /dev/null +++ b/konnectivity-plan.md @@ -0,0 +1,40 @@ +High level goal: +kube-apiserver (KAS) running in the bootstrap environment cannot connect to a +webhook whose endpoint is in the pod network. I would like to make an +architectural change which permits the bootstrap KAS to be able to access +webhooks hosted in the cluster. Non-bootstrap KAS instances are not affected. + + +Architecture change: +Provide connectivity to the bootstrap KAS using Konnectivity. +* Deploy a Konnectivity server in the bootstrap environment +* Deploy the bootstrap KAS with an EgresSelectorConfiguration to proxy cluster + traffic to the local Konnectivity server. +* Deploy a DaemonSet which deploys a Konnectivity agent on all Nodes which + connects back to the Konnectivity server in the bootstrap environment. +* During teardown of the bootstrap environment, also remove the Konnectivity + agent DaemonSet. + + +To validate: +I believe this will permit the bootstrap KAS to access a webhook hosted in the +running cluster. + +I believe this Konnectivity deployment will not impact the non-bootstrap KAS +instances, which will continue to route normally to cluster-hosted webhooks. + + +Investigation tasks: +Find the relevant code in installer which creates the bootstrap environment. + +Determine what is different about the bootstrap KAS's networking environment +which means it cannot route to the pod network. + +Validate my assumptions above (I believe...) + +Determine if Konnectivity is already available in the OpenShift payload. I +believe HyperShift uses Konnectivity already. The code is available at +/home/mbooth/src/openshift/hypershift. + +Determine how the Konnectivity agent would authenticate to the Konnectivity +server. From cb87adc604c96d8dff68881cbc7d9919116e41f6 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 17:21:49 +0000 Subject: [PATCH 02/22] Claude: plan update --- konnectivity-plan.md | 216 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 187 insertions(+), 29 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 7337adee57..21c26ceca8 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -1,40 +1,198 @@ -High level goal: -kube-apiserver (KAS) running in the bootstrap environment cannot connect to a -webhook whose endpoint is in the pod network. I would like to make an -architectural change which permits the bootstrap KAS to be able to access -webhooks hosted in the cluster. Non-bootstrap KAS instances are not affected. +# Konnectivity Bootstrap Integration Plan +## Goal -Architecture change: -Provide connectivity to the bootstrap KAS using Konnectivity. -* Deploy a Konnectivity server in the bootstrap environment -* Deploy the bootstrap KAS with an EgresSelectorConfiguration to proxy cluster - traffic to the local Konnectivity server. -* Deploy a DaemonSet which deploys a Konnectivity agent on all Nodes which - connects back to the Konnectivity server in the bootstrap environment. -* During teardown of the bootstrap environment, also remove the Konnectivity - agent DaemonSet. 
+Enable the bootstrap kube-apiserver (KAS) to access webhooks hosted in the pod network by deploying Konnectivity in the bootstrap environment. +**Scope**: All platforms, proof-of-concept quality. Prioritize a working demonstration over edge case handling. -To validate: -I believe this will permit the bootstrap KAS to access a webhook hosted in the -running cluster. +## Architecture Overview -I believe this Konnectivity deployment will not impact the non-bootstrap KAS -instances, which will continue to route normally to cluster-hosted webhooks. +1. Deploy a Konnectivity server in the bootstrap environment +2. Configure the bootstrap KAS with an EgressSelectorConfiguration to proxy cluster traffic through the local Konnectivity server +3. Deploy a DaemonSet that runs a Konnectivity agent on all nodes, connecting back to the bootstrap Konnectivity server +4. Remove the Konnectivity agent DaemonSet during bootstrap teardown +## Assumptions to Validate -Investigation tasks: -Find the relevant code in installer which creates the bootstrap environment. +- [ ] The bootstrap KAS will be able to access cluster-hosted webhooks via Konnectivity +- [ ] Non-bootstrap KAS instances will not be impacted and will continue routing normally -Determine what is different about the bootstrap KAS's networking environment -which means it cannot route to the pod network. +--- -Validate my assumptions above (I believe...) +## Phase 1: Investigation -Determine if Konnectivity is already available in the OpenShift payload. I -believe HyperShift uses Konnectivity already. The code is available at -/home/mbooth/src/openshift/hypershift. +All investigation tasks must be completed before proceeding to implementation. -Determine how the Konnectivity agent would authenticate to the Konnectivity -server. +### Task 1.1: Understand Bootstrap Environment Creation + +**Objective**: Identify where and how the bootstrap environment is created in the installer. + +**Deliverables**: +- Document which assets create the bootstrap machine configuration +- Identify where the bootstrap KAS is configured +- Locate the bootstrap Ignition config generation code +- List the relevant source files and their responsibilities + +**Search starting points**: +- `pkg/asset/` directory +- Bootstrap-related assets and ignition configs + +--- + +### Task 1.2: Understand Bootstrap KAS Network Limitations + +**Objective**: Determine why the bootstrap KAS cannot route to the pod network. + +**Deliverables**: +- Document the network topology of the bootstrap environment +- Explain why pod network connectivity is unavailable from the bootstrap node +- Identify any existing documentation on bootstrap networking constraints + +--- + +### Task 1.3: Investigate Bootstrap Teardown Mechanism + +**Objective**: Find the existing mechanism for cleaning up bootstrap resources and determine how to integrate Konnectivity agent removal. + +**Deliverables**: +- Document the bootstrap teardown/completion flow +- Identify where cluster resources are cleaned up when bootstrap completes +- Determine how to add the Konnectivity DaemonSet removal to this flow +- List relevant source files + +--- + +### Task 1.4: Investigate Konnectivity in OpenShift Payload + +**Objective**: Determine if Konnectivity components are already available in the OpenShift payload. 
+ +**Deliverables**: +- Confirm whether Konnectivity server and agent images are in the payload +- If present, document the image references and how to obtain them +- If not present, document alternative sources for Konnectivity binaries/images + +--- + +### Task 1.5: Investigate HyperShift Konnectivity Deployment + +**Objective**: Examine how HyperShift deploys Konnectivity to inform our implementation. + +**Deliverables**: +- Document where HyperShift obtains Konnectivity components +- Extract relevant configuration patterns (server config, agent config, certificates) +- Identify any reusable manifests or configuration templates +- Note any authentication/certificate setup between agent and server + +**Source**: `/home/mbooth/src/openshift/hypershift` + +--- + +### Task 1.6: Research Konnectivity Authentication + +**Objective**: Determine how the Konnectivity agent authenticates to the Konnectivity server. + +**Deliverables**: +- Document the authentication mechanism (mTLS, tokens, etc.) +- Identify what certificates or credentials need to be generated +- Determine how to provision these credentials in the bootstrap environment +- Reference upstream Konnectivity documentation as needed + +--- + +### Task 1.7: Validate Assumptions via Documentation + +**Objective**: Verify the architectural assumptions through documentation research. + +**Deliverables**: +- Confirm that EgressSelectorConfiguration can route webhook traffic through Konnectivity +- Confirm that non-bootstrap KAS instances (without EgressSelectorConfiguration) route normally +- Document any caveats or limitations found + +**Sources**: +- Kubernetes EgressSelectorConfiguration documentation +- Konnectivity project documentation +- Existing OpenShift/HyperShift design docs + +--- + +## Phase 2: Implementation + +### Task 2.1: Generate Konnectivity Certificates + +**Objective**: Create the necessary certificates for Konnectivity server and agent authentication. + +**Deliverables**: +- Add certificate generation to the installer's asset pipeline +- Generate server certificate/key +- Generate agent certificate/key (or CA for agent authentication) +- Ensure certificates are included in appropriate Ignition configs + +--- + +### Task 2.2: Deploy Konnectivity Server on Bootstrap + +**Objective**: Add a Konnectivity server to the bootstrap environment. + +**Deliverables**: +- Create Konnectivity server configuration +- Add server deployment to bootstrap Ignition config (likely as a static pod or systemd unit) +- Configure server to listen for agent connections +- Ensure server starts before or alongside the bootstrap KAS + +--- + +### Task 2.3: Configure Bootstrap KAS with EgressSelectorConfiguration + +**Objective**: Configure the bootstrap KAS to proxy cluster-bound traffic through Konnectivity. + +**Deliverables**: +- Create EgressSelectorConfiguration for the bootstrap KAS +- Configure the KAS to use Konnectivity for "Cluster" egress traffic +- Add configuration to bootstrap KAS static pod manifest + +--- + +### Task 2.4: Create Konnectivity Agent DaemonSet + +**Objective**: Deploy Konnectivity agents on cluster nodes to connect back to the bootstrap server. 
+ +**Deliverables**: +- Create DaemonSet manifest for Konnectivity agent +- Configure agent to connect to the bootstrap node's Konnectivity server +- Include necessary certificates/credentials for authentication +- Add manifest to bootstrap resources that get applied to the cluster + +--- + +### Task 2.5: Integrate Teardown with Bootstrap Completion + +**Objective**: Remove the Konnectivity agent DaemonSet when bootstrap completes. + +**Deliverables**: +- Add Konnectivity DaemonSet deletion to the bootstrap teardown flow +- Ensure clean removal without impacting cluster operation +- Test that non-bootstrap KAS continues operating normally after removal + +--- + +### Task 2.6: Manual Validation + +**Objective**: Verify the proof-of-concept works end-to-end. + +**Validation steps**: +1. Deploy a cluster with the modified installer +2. Verify Konnectivity server is running on bootstrap node +3. Verify Konnectivity agents are running on cluster nodes +4. Deploy a test webhook in the cluster +5. Confirm the bootstrap KAS can reach the webhook +6. Complete bootstrap teardown and verify agent DaemonSet is removed +7. Confirm non-bootstrap KAS can still reach webhooks normally + +--- + +## Open Questions + +- Which port should the Konnectivity server listen on? +- Should the Konnectivity server be a static pod or systemd service? +- What is the expected lifecycle overlap between bootstrap and cluster control plane? From 8df10ff14abe2f89502b658e41dbdc8d8eceabc3 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 17:39:55 +0000 Subject: [PATCH 03/22] Claude: tasks 1.1 and 1.1.1 --- konnectivity-plan.md | 188 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 179 insertions(+), 9 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 21c26ceca8..eca844b62a 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -28,15 +28,185 @@ All investigation tasks must be completed before proceeding to implementation. **Objective**: Identify where and how the bootstrap environment is created in the installer. -**Deliverables**: -- Document which assets create the bootstrap machine configuration -- Identify where the bootstrap KAS is configured -- Locate the bootstrap Ignition config generation code -- List the relevant source files and their responsibilities - -**Search starting points**: -- `pkg/asset/` directory -- Bootstrap-related assets and ignition configs +**Status**: ✅ Complete + +#### Findings + +##### Bootstrap Ignition Assets + +The bootstrap environment is created through a chain of assets in `pkg/asset/ignition/bootstrap/`: + +| File | Responsibility | +|------|----------------| +| [common.go](pkg/asset/ignition/bootstrap/common.go) | Core bootstrap config generation (774 lines). Defines `bootstrapTemplateData`, gathers dependencies, adds files/units from `data/data/bootstrap/` | +| [bootstrap.go](pkg/asset/ignition/bootstrap/bootstrap.go) | Main `Bootstrap` asset wrapper, produces `bootstrap.ign` | +| [bootstrap_in_place.go](pkg/asset/ignition/bootstrap/bootstrap_in_place.go) | Single-node variant, produces `bootstrap-in-place-for-live-iso.ign` | +| [cvoignore.go](pkg/asset/ignition/bootstrap/cvoignore.go) | CVO manifest overrides | + +Platform-specific modules exist in subdirectories (e.g., `bootstrap/baremetal/`, `bootstrap/aws/`). + +##### Bootstrap KAS Configuration + +The bootstrap KAS is **not pre-generated as a static pod manifest**. 
Instead, it is dynamically rendered during bootstrap execution by the `cluster-kube-apiserver-operator`: + +1. The bootstrap machine boots with `bootstrap.ign` +2. `bootkube.service` starts and runs `/usr/local/bin/bootkube.sh` +3. The **kube-apiserver-bootstrap** stage (lines 247-279 in [bootkube.sh.template](data/data/bootstrap/files/usr/local/bin/bootkube.sh.template)) renders the KAS: + ```bash + KUBE_APISERVER_OPERATOR_IMAGE=$(image_for cluster-kube-apiserver-operator) + bootkube_podman_run \ + --name kube-apiserver-render \ + "${KUBE_APISERVER_OPERATOR_IMAGE}" \ + /usr/bin/cluster-kube-apiserver-operator render \ + --manifest-etcd-serving-ca=etcd-ca-bundle.crt \ + --manifest-etcd-server-urls="${ETCD_ENDPOINTS}" \ + ... + ``` +4. The rendered config is copied to `/etc/kubernetes/bootstrap-configs/kube-apiserver-config.yaml` +5. Static pod manifests are placed in `/etc/kubernetes/manifests/` for kubelet to pick up + +##### Bootstrap Ignition Generation Flow + +``` +InstallConfig + ↓ +├─→ Manifests (cluster-config.yaml, etc.) +├─→ TLS Assets (177+ certificate/key pairs) +├─→ KubeConfigs (Admin, Kubelet, Loopback) +├─→ Machines (Master, Worker definitions) +├─→ ReleaseImage +└─→ RHCOS Image + ↓ +Common.generateConfig() → bootstrap.ign +``` + +Key operations in `generateConfig()`: +1. Initialize Ignition v3.2 config +2. Add files from `data/data/bootstrap/files/` (processed as Go templates) +3. Add systemd units from `data/data/bootstrap/systemd/` +4. Inject manifests, TLS assets, and kubeconfigs via `addParentFiles()` +5. Apply platform-specific customizations +6. Serialize to JSON + +##### Data Files and Templates + +Bootstrap templates live in `data/data/bootstrap/`: + +``` +bootstrap/ +├── files/ +│ └── usr/local/bin/ +│ ├── bootkube.sh.template # Main bootstrap orchestrator +│ ├── kubelet.sh.template +│ └── ... +├── systemd/common/units/ +│ ├── bootkube.service # Runs bootkube.sh +│ ├── kubelet.service.template +│ └── ... +└── / # Platform-specific overrides +``` + +##### Key Insight for Konnectivity Integration + +Since the bootstrap KAS is rendered at runtime by `cluster-kube-apiserver-operator`, adding EgressSelectorConfiguration requires: +1. **Option A**: Modify the `cluster-kube-apiserver-operator` to support EgressSelectorConfiguration during bootstrap render (requires changes outside installer) +2. **Option B**: Post-process the rendered KAS config in `bootkube.sh` to inject EgressSelectorConfiguration +3. **Option C**: Add EgressSelectorConfiguration support to the installer's bootstrap TLS/config assets that get passed to the operator + +The Konnectivity server itself can be added as a static pod or systemd unit in the bootstrap Ignition, similar to how other bootstrap components are configured. + +**DECISION**: Based on task 1.1.1, use the existing capabilities of cluster-kube-apiserver-operator to render the EgressSelectorConfiguration. + +--- + +### Task 1.1.1: Investigate KAS Operator Render Command Extensibility + +**Objective**: Determine if `cluster-kube-apiserver-operator render` can be extended to output additional KAS arguments (specifically EgressSelectorConfiguration), or if post-processing is required. + +**Source**: `/home/mbooth/src/openshift/cluster-kube-apiserver-operator` + +**Status**: ✅ Complete + +#### Findings + +##### Built-in Extensibility Mechanism + +**Good news**: The render command already supports config overrides via `--config-override-files` flag. + +The flag can be specified multiple times and each file is: +1. 
Rendered as a Go text/template +2. Merged into the final `KubeAPIServerConfig` using `resourcemerge.MergeProcessConfig()` + +Merge order: `defaultConfig` → `bootstrapOverrides` → `--config-override-files` (in order provided) + +##### Render Command Output + +| Output | Format | Location | +|--------|--------|----------| +| KubeAPIServerConfig | YAML | `--config-output-file` (e.g., `/assets/kube-apiserver-bootstrap/config`) | +| Bootstrap manifests | YAML | `--asset-output-dir/bootstrap-manifests/` | +| Runtime manifests | YAML | `--asset-output-dir/manifests/` | + +##### EgressSelectorConfiguration Injection + +EgressSelectorConfiguration is a top-level field in `KubeAPIServerConfig`: + +```yaml +apiVersion: kubecontrolplane.config.openshift.io/v1 +kind: KubeAPIServerConfig +egressSelectorConfiguration: + egressSelections: + - name: "cluster" + connection: + proxyProtocol: "CONNECT" + transport: + type: "TCP" + tcp: + url: "http://127.0.0.1:" +``` + +##### Recommended Approach: Config Override File ✅ + +**No operator modifications required.** Use the existing `--config-override-files` mechanism: + +1. Create `data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml.template`: + ```yaml + apiVersion: kubecontrolplane.config.openshift.io/v1 + kind: KubeAPIServerConfig + egressSelectorConfiguration: + egressSelections: + - name: "cluster" + connection: + proxyProtocol: "CONNECT" + transport: + type: "TCP" + tcp: + url: "http://127.0.0.1:{{.KonnectivityPort}}" + ``` + +2. Modify [bootkube.sh.template](data/data/bootstrap/files/usr/local/bin/bootkube.sh.template) to add the flag: + ```bash + /usr/bin/cluster-kube-apiserver-operator render \ + ... existing args ... + --config-override-files=/assets/egress-selector-config.yaml \ + ... + ``` + +##### Key Implementation Details + +- The bootstrap KAS reads config via `--openshift-config` flag from a YAML file +- EgressSelectorConfiguration in the merged YAML is picked up automatically +- Konnectivity server URL should use `127.0.0.1` since it runs on the same bootstrap node +- Transport can be plain TCP (simplest) or TLS (requires additional cert provisioning) + +##### Key Files + +| Component | Location | +|-----------|----------| +| Render command | `cluster-kube-apiserver-operator/pkg/cmd/render/render.go` | +| Config merge logic | `vendor/github.com/openshift/library-go/pkg/operator/render/options/generic.go:210-259` | +| Bootstrap KAS pod template | `cluster-kube-apiserver-operator/bindata/bootkube/bootstrap-manifests/kube-apiserver-pod.yaml` | +| Current render invocation | `installer/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template:247-269` --- From cc8f87527b4c9857e5d8ff2adbd7cd6e863d3d37 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 17:45:11 +0000 Subject: [PATCH 04/22] Claude: clarifications --- konnectivity-plan.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index eca844b62a..f1efcafa59 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -63,7 +63,9 @@ The bootstrap KAS is **not pre-generated as a static pod manifest**. Instead, it ... ``` 4. The rendered config is copied to `/etc/kubernetes/bootstrap-configs/kube-apiserver-config.yaml` -5. Static pod manifests are placed in `/etc/kubernetes/manifests/` for kubelet to pick up +5. 
Static pod manifests are placed in `/etc/kubernetes/manifests/` for the bootstrap kubelet + +**Note on bootstrap kubelet**: The bootstrap node runs a kubelet in "standalone" mode ([kubelet.sh.template](data/data/bootstrap/files/usr/local/bin/kubelet.sh.template)). It watches `--pod-manifest-path=/etc/kubernetes/manifests` for static pods but has no `--kubeconfig`, so it never connects to the API server and does not register as a Node object. ##### Bootstrap Ignition Generation Flow @@ -108,14 +110,14 @@ bootstrap/ ##### Key Insight for Konnectivity Integration -Since the bootstrap KAS is rendered at runtime by `cluster-kube-apiserver-operator`, adding EgressSelectorConfiguration requires: -1. **Option A**: Modify the `cluster-kube-apiserver-operator` to support EgressSelectorConfiguration during bootstrap render (requires changes outside installer) -2. **Option B**: Post-process the rendered KAS config in `bootkube.sh` to inject EgressSelectorConfiguration -3. **Option C**: Add EgressSelectorConfiguration support to the installer's bootstrap TLS/config assets that get passed to the operator +Since the bootstrap KAS is rendered at runtime by `cluster-kube-apiserver-operator`, adding EgressSelectorConfiguration requires passing a config override file to the render command. -The Konnectivity server itself can be added as a static pod or systemd unit in the bootstrap Ignition, similar to how other bootstrap components are configured. +**Approach** (validated in Task 1.1.1): +1. Create an egress selector config template in `data/data/bootstrap/files/` +2. Add `--config-override-files=` to the existing render invocation in `bootkube.sh.template` +3. The operator's built-in config merging will inject `egressSelectorConfiguration` into the final KubeAPIServerConfig -**DECISION**: Based on task 1.1.1, use the existing capabilities of cluster-kube-apiserver-operator to render the EgressSelectorConfiguration. +The Konnectivity server itself can be added as a static pod manifest in `/etc/kubernetes/manifests/`, similar to how the bootstrap KAS, etcd, and other control plane components run. --- From b997c3e7cf0f507d0be3dd6626e28c5596a2d567 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 17:55:38 +0000 Subject: [PATCH 05/22] Claude: bootstrap investigation --- konnectivity-plan.md | 88 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 84 insertions(+), 4 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index f1efcafa59..8d8b706ca0 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -216,10 +216,90 @@ egressSelectorConfiguration: **Objective**: Determine why the bootstrap KAS cannot route to the pod network. -**Deliverables**: -- Document the network topology of the bootstrap environment -- Explain why pod network connectivity is unavailable from the bootstrap node -- Identify any existing documentation on bootstrap networking constraints +**Status**: ✅ Complete + +#### Findings + +##### Bootstrap Network Topology + +The bootstrap node operates in **host network mode only**: + +1. **All containers use host networking** - From [bootkube.sh.template](data/data/bootstrap/files/usr/local/bin/bootkube.sh.template) line 24-27: + ```bash + bootkube_podman_run() { + # we run all commands in the host-network to prevent IP conflicts with + # end-user infrastructure. + podman run --quiet --net=host --rm --log-driver=k8s-file "${@}" + } + ``` + +2. 
**Bootstrap kubelet has no CNI** - The standalone kubelet ([kubelet.sh.template](data/data/bootstrap/files/usr/local/bin/kubelet.sh.template)) runs without: + - `--cni-bin-dir` + - `--cni-conf-dir` + - Any network plugin configuration + +3. **Static pods run in host network namespace** - They get host IPs, not pod network IPs + +##### Why Pod Network is Unreachable + +| Bootstrap Node | Cluster Nodes | +|----------------|---------------| +| Host network only (`--net=host`) | Pod network + host network | +| No CNI plugin | OVN-Kubernetes or OpenShift-SDN | +| Infrastructure network (e.g., 10.0.0.0/24) | Pod CIDR routable (e.g., 10.128.0.0/14) | +| No routes to pod/service CIDRs | Full CNI routing stack | + +**Root cause**: The cluster-network-operator is deployed **after** bootstrap completes. During bootstrap: +- No CNI plugin is installed +- No VXLAN/Geneve tunnels exist +- No routes to pod CIDR (10.128.0.0/14) or service CIDR (172.30.0.0/16) + +##### Bootstrap Lifecycle vs Network Operator + +1. **Bootstrap phase**: Bootstrap KAS runs on host network, applies manifests via cluster-bootstrap +2. **CVO deploys operators**: Including the cluster-network-operator +3. **Pod network becomes operational**: CNI is configured, pod CIDR is routable on cluster nodes +4. **Teardown conditions met**: Required control plane pods running, CEO complete, etc. +5. **Bootstrap node exits**: After all teardown conditions are satisfied + +The pod network is operational *before* bootstrap completes. However, the bootstrap node itself never gains access to the pod network - it remains on host networking throughout its lifecycle. This is intentional: the bootstrap node is temporary and isolated from the production network by design. + +##### Bootstrap Teardown Conditions + +The bootstrap node marks itself complete (`/opt/openshift/.bootkube.done`) after these conditions are met: + +1. **cluster-bootstrap succeeds** - Applies manifests and waits for required pods: + ``` + openshift-kube-apiserver/kube-apiserver + openshift-kube-scheduler/openshift-kube-scheduler + openshift-kube-controller-manager/kube-controller-manager + openshift-cluster-version/cluster-version-operator + ``` + +2. **CVO overrides restored** - Patches `clusterversion.config.openshift.io/version` + +3. **CEO (Cluster Etcd Operator) completes** - Runs `wait-for-ceo` command + +4. **API DNS checks pass** - Both `API_URL` and `API_INT_URL` are reachable + +For HA clusters, the installer's `wait-for-bootstrap-complete` stage additionally waits for: +- At least 2 nodes with successful KAS revision rollout (checked via `kubeapiservers.operator.openshift.io/cluster` status) + +**Key insight**: Bootstrap teardown happens when the *control plane pods* are running on cluster nodes, but the *network operator* may still be initializing. The pod network becomes fully operational after bootstrap exits. + +##### Webhook Reachability Problem + +When a webhook is deployed in the pod network: +- Bootstrap KAS tries to call webhook at pod IP (e.g., `10.128.2.5:8443`) +- Pod lives in cluster network CIDR +- Bootstrap node has no route to that CIDR +- Result: **connection refused / no route to host** + +##### Why CNI Cannot Run on Bootstrap + +1. **Dependency cycle**: CNI operator needs API server, but API server needs network for webhooks +2. **Bootstrap is temporary**: Not part of production cluster topology +3. 
**Design isolation**: Host network prevents conflicts with user infrastructure --- From 0f6b480066631ee261c794841eb8eb82dae0bed1 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 18:02:57 +0000 Subject: [PATCH 06/22] Claude: teardown --- konnectivity-plan.md | 84 +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 79 insertions(+), 5 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 8d8b706ca0..dd368313e7 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -307,11 +307,85 @@ When a webhook is deployed in the pod network: **Objective**: Find the existing mechanism for cleaning up bootstrap resources and determine how to integrate Konnectivity agent removal. -**Deliverables**: -- Document the bootstrap teardown/completion flow -- Identify where cluster resources are cleaned up when bootstrap completes -- Determine how to add the Konnectivity DaemonSet removal to this flow -- List relevant source files +**Status**: ✅ Complete + +#### Findings + +##### Bootstrap Teardown Flow + +``` +Bootstrap Node (bootkube.sh): +├─ cluster-bootstrap completes +├─ CVO overrides restored +├─ CEO wait completes +├─ touch /opt/openshift/.bootkube.done ← Bootstrap script complete +└─ bootkube.service exits + +Installer Process (cmd/openshift-install/create.go): +├─ waitForBootstrapComplete() ← Polls ConfigMap status +├─ gatherBootstrapLogs() (optional) +├─ destroybootstrap.Destroy() ← Deletes bootstrap infrastructure +│ └─ Platform-specific VM/network deletion +└─ WaitForInstallComplete() ← Waits for cluster ready +``` + +##### Cluster Resource Cleanup: Current State + +**Key finding**: The installer has **NO built-in mechanism** to delete cluster resources (DaemonSets, Deployments, etc.) after bootstrap completes. It only deletes infrastructure (VMs, networks, etc.). + +Evidence: +- No kubectl/oc calls in `DestroyBootstrap()` +- No manifest deletion in post-bootstrap flow +- The only cleanup example is removing the MCO bootstrap static pod from the bootstrap node's local filesystem (line 621 in bootkube.sh.template) + +##### Integration Point for Konnectivity Cleanup + +**Location**: [pkg/destroy/bootstrap/bootstrap.go](pkg/destroy/bootstrap/bootstrap.go) + +**Timing**: Delete the DaemonSet **after** infrastructure teardown completes. + +**Rationale**: +- Konnectivity tunnel remains available until the bootstrap node is destroyed +- Agents become orphaned when bootstrap disappears (expected) +- We then clean up the orphaned DaemonSet resources +- Cluster API remains accessible via production KAS (already running on cluster nodes) + +```go +// In pkg/destroy/bootstrap/bootstrap.go +func Destroy(ctx context.Context, dir string) error { + // ... existing platform setup ... 
+ + // EXISTING: Platform-specific infrastructure cleanup + if err := provider.DestroyBootstrap(ctx, dir); err != nil { + return fmt.Errorf("error destroying bootstrap resources %w", err) + } + + // NEW: Clean up bootstrap-only cluster resources after infrastructure is gone + if err := deleteBootstrapClusterResources(ctx, dir); err != nil { + logrus.Warningf("Failed to clean up bootstrap cluster resources: %v", err) + // Don't fail - infrastructure is already destroyed + } + + return nil +} +``` + +##### Key Files + +| Purpose | File | +|---------|------| +| Bootstrap completion detection | [cmd/openshift-install/create.go:164](cmd/openshift-install/create.go) | +| Bootstrap infrastructure destroy | [pkg/destroy/bootstrap/bootstrap.go](pkg/destroy/bootstrap/bootstrap.go) | +| CAPI machine deletion | [pkg/infrastructure/clusterapi/clusterapi.go](pkg/infrastructure/clusterapi/clusterapi.go) | +| Bootstrap script | [bootkube.sh.template:574-655](data/data/bootstrap/files/usr/local/bin/bootkube.sh.template) | + +##### Recommendation + +Add Konnectivity cleanup to `pkg/destroy/bootstrap/bootstrap.go`: +1. Create function to delete Konnectivity DaemonSet (and any related resources) +2. Call it **after** `provider.DestroyBootstrap()` returns successfully +3. Handle errors gracefully (warn but don't fail - infrastructure is already gone) +4. Delete order: DaemonSet → ConfigMaps → Namespace (if bootstrap-only) --- From eb160d3f4d00580c5edb6c8b495e17a1688f85bc Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 18:28:47 +0000 Subject: [PATCH 07/22] Claude: HyperShift investigation --- konnectivity-plan.md | 147 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 137 insertions(+), 10 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index dd368313e7..24bb8cd2f6 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -393,10 +393,22 @@ Add Konnectivity cleanup to `pkg/destroy/bootstrap/bootstrap.go`: **Objective**: Determine if Konnectivity components are already available in the OpenShift payload. -**Deliverables**: -- Confirm whether Konnectivity server and agent images are in the payload -- If present, document the image references and how to obtain them -- If not present, document alternative sources for Konnectivity binaries/images +**Status**: ✅ Complete (answered by Task 1.5) + +#### Findings + +**Image name**: `apiserver-network-proxy` + +**Availability**: Present in OpenShift release payload + +**Binaries included**: +- `/usr/bin/proxy-server` - Konnectivity server +- `/usr/bin/proxy-agent` - Konnectivity agent + +**How to obtain**: Use the existing `image_for` function in bootkube.sh: +```bash +KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy) +``` --- @@ -404,14 +416,129 @@ Add Konnectivity cleanup to `pkg/destroy/bootstrap/bootstrap.go`: **Objective**: Examine how HyperShift deploys Konnectivity to inform our implementation. 
-**Deliverables**: -- Document where HyperShift obtains Konnectivity components -- Extract relevant configuration patterns (server config, agent config, certificates) -- Identify any reusable manifests or configuration templates -- Note any authentication/certificate setup between agent and server - **Source**: `/home/mbooth/src/openshift/hypershift` +**Status**: ✅ Complete + +#### Findings + +##### Image Source + +**Image name**: `apiserver-network-proxy` +- Available in OpenShift release payload +- Contains both `/usr/bin/proxy-server` and `/usr/bin/proxy-agent` + +##### Konnectivity Server Configuration + +HyperShift runs the server as a **sidecar container** in the kube-apiserver pod: + +```bash +/usr/bin/proxy-server + --logtostderr=true + --cluster-cert /etc/konnectivity/cluster/tls.crt + --cluster-key /etc/konnectivity/cluster/tls.key + --server-cert /etc/konnectivity/server/tls.crt + --server-key /etc/konnectivity/server/tls.key + --server-ca-cert /etc/konnectivity/ca/ca.crt + --server-port 8090 # Client endpoint (KAS connects here) + --agent-port 8091 # Agent endpoint (agents connect here) + --health-port 2041 + --admin-port 8093 + --mode http-connect + --proxy-strategies destHost,defaultRoute + --keepalive-time 30s + --frontend-keepalive-time 30s +``` + +**Key ports**: +- **8090**: Server endpoint (for KAS/proxy clients via HTTPConnect) +- **8091**: Agent endpoint (where agents connect) +- **2041**: Health check + +##### Konnectivity Agent Configuration + +HyperShift runs agents as a **DaemonSet** on worker nodes: + +```bash +/usr/bin/proxy-agent + --logtostderr=true + --ca-cert /etc/konnectivity/ca/ca.crt + --agent-cert /etc/konnectivity/agent/tls.crt + --agent-key /etc/konnectivity/agent/tls.key + --proxy-server-host + --proxy-server-port 8091 + --health-server-port 2041 + --agent-identifiers default-route=true + --keepalive-time 30s + --probe-interval 5s + --sync-interval 5s + --sync-interval-cap 30s +``` + +**DaemonSet settings**: +- `hostNetwork: true` (except IBMCloud) +- `dnsPolicy: Default` (use host resolver) +- Tolerates all taints +- Rolling update with 10% maxUnavailable + +##### Authentication: mTLS + +All Konnectivity communication uses **mutual TLS**. HyperShift generates these certificates: + +| Secret | Purpose | +|--------|---------| +| `konnectivity-signer` | Self-signed CA (signs all other certs) | +| `konnectivity-server` | Server cert for client endpoint (port 8090) | +| `konnectivity-cluster` | Server cert for agent endpoint (port 8091) | +| `konnectivity-client` | Client cert for KAS/proxies → server | +| `konnectivity-agent` | Agent cert for agents → server | + +##### EgressSelectorConfiguration + +HyperShift configures KAS with this egress selector: + +```yaml +apiVersion: apiserver.k8s.io/v1beta1 +kind: EgressSelectorConfiguration +egressSelections: +- name: controlplane + connection: + proxyProtocol: Direct +- name: etcd + connection: + proxyProtocol: Direct +- name: cluster + connection: + proxyProtocol: HTTPConnect + transport: + tcp: + url: https://konnectivity-server-local:8090 + tlsConfig: + caBundle: /etc/kubernetes/certs/konnectivity-ca/ca.crt + clientCert: /etc/kubernetes/certs/konnectivity-client/tls.crt + clientKey: /etc/kubernetes/certs/konnectivity-client/tls.key +``` + +**Key insight**: The `cluster` egress type uses `HTTPConnect` protocol with TLS, not plain TCP. + +##### Simplifications for Bootstrap POC + +For our proof-of-concept, we can simplify: + +1. **Server location**: Static pod on bootstrap node (not sidecar) +2. 
**Agent connection**: Agents connect to bootstrap node IP directly (no Route/Service) +3. **Authentication**: Can start with plain TCP, add mTLS later +4. **Single server**: No need for HA server count + +##### Key HyperShift Files + +| Component | File | +|-----------|------| +| PKI setup | `control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go` | +| Server config | `control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go` | +| Agent DaemonSet | `control-plane-operator/hostedclusterconfigoperator/controllers/resources/konnectivity/reconcile.go` | +| Egress selector | `control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/egress-selector-config.yaml` | + --- ### Task 1.6: Research Konnectivity Authentication From 384ede1618a3f35cc47528a305a206b9dd364837 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 18:33:32 +0000 Subject: [PATCH 08/22] Decision to skip authentication --- konnectivity-plan.md | 62 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 52 insertions(+), 10 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 24bb8cd2f6..39d249a51a 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -545,11 +545,55 @@ For our proof-of-concept, we can simplify: **Objective**: Determine how the Konnectivity agent authenticates to the Konnectivity server. -**Deliverables**: -- Document the authentication mechanism (mTLS, tokens, etc.) -- Identify what certificates or credentials need to be generated -- Determine how to provision these credentials in the bootstrap environment -- Reference upstream Konnectivity documentation as needed +**Status**: ✅ Complete + +#### Decision: No Authentication for PoC + +For the proof-of-concept, **we will skip authentication entirely**: + +1. **KAS → Konnectivity Server**: The bootstrap KAS connects to the Konnectivity server over `localhost` (127.0.0.1). Since both run on the same bootstrap node, no authentication is required for this connection. + +2. **Agents → Konnectivity Server**: Agents connect to the Konnectivity server over **unauthenticated TCP**. The server will accept connections without mTLS client certificates. + +##### Security Considerations + +This approach is **insecure** but acceptable for a PoC because: +- The bootstrap environment is temporary and isolated +- The Konnectivity server only exists during cluster bootstrap +- Production-ready implementations should use mTLS (as HyperShift does) + +##### Server Configuration (Unauthenticated) + +```bash +/usr/bin/proxy-server + --logtostderr=true + --server-port 8090 # KAS connects here (localhost) + --agent-port 8091 # Agents connect here (no auth) + --health-port 2041 + --mode http-connect + --proxy-strategies destHost,defaultRoute + --keepalive-time 30s + --frontend-keepalive-time 30s + --uds-name "" # Disable UDS, use TCP only +``` + +##### Agent Configuration (Unauthenticated) + +```bash +/usr/bin/proxy-agent + --logtostderr=true + --proxy-server-host + --proxy-server-port 8091 + --health-server-port 2041 + --agent-identifiers default-route=true + --keepalive-time 30s + --probe-interval 5s + --sync-interval 5s +``` + +##### Future Work + +For production readiness, Task 2.1 (certificate generation) would need to be completed to add mTLS authentication. This is out of scope for the initial PoC. 
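As a sketch of what that future work would look like, the flags below are the mTLS additions HyperShift layers onto the same binaries (see Task 1.5). Since the KAS side stays on a local UDS, only the agent endpoint (TCP 8091) needs securing; the certificate paths are illustrative assumptions and would come from whatever secrets a future Task 2.1 generates:

```bash
# Server side: serve the agent endpoint with a cert and verify agent client certs
# against a shared CA (paths are assumptions, modelled on the HyperShift layout above)
/usr/bin/proxy-server \
  ... PoC flags above ... \
  --cluster-cert /etc/konnectivity/cluster/tls.crt \
  --cluster-key /etc/konnectivity/cluster/tls.key \
  --server-ca-cert /etc/konnectivity/ca/ca.crt

# Agent side: present a client cert signed by the same CA (paths are assumptions)
/usr/bin/proxy-agent \
  ... PoC flags above ... \
  --ca-cert /etc/konnectivity/ca/ca.crt \
  --agent-cert /etc/konnectivity/agent/tls.crt \
  --agent-key /etc/konnectivity/agent/tls.key
```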
--- @@ -573,13 +617,11 @@ For our proof-of-concept, we can simplify: ### Task 2.1: Generate Konnectivity Certificates +**Status**: ⏭️ Skipped (out of scope for PoC) + **Objective**: Create the necessary certificates for Konnectivity server and agent authentication. -**Deliverables**: -- Add certificate generation to the installer's asset pipeline -- Generate server certificate/key -- Generate agent certificate/key (or CA for agent authentication) -- Ensure certificates are included in appropriate Ignition configs +**Note**: Per Task 1.6 decision, the PoC uses unauthenticated TCP connections. Certificate generation is deferred to future production-ready work. --- From 1e9dded8eb848dec1b336349dbf2d5ced4f649cf Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 18:40:01 +0000 Subject: [PATCH 09/22] Validate assumptions and select UDS for bootstrap transport --- konnectivity-plan.md | 74 ++++++++++++++++++++++++++++++-------------- 1 file changed, 51 insertions(+), 23 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 39d249a51a..c27177c04b 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -15,8 +15,8 @@ Enable the bootstrap kube-apiserver (KAS) to access webhooks hosted in the pod n ## Assumptions to Validate -- [ ] The bootstrap KAS will be able to access cluster-hosted webhooks via Konnectivity -- [ ] Non-bootstrap KAS instances will not be impacted and will continue routing normally +- [x] The bootstrap KAS will be able to access cluster-hosted webhooks via Konnectivity (validated in Task 1.7) +- [x] Non-bootstrap KAS instances will not be impacted and will continue routing normally (validated in Task 1.7) --- @@ -160,11 +160,10 @@ egressSelectorConfiguration: egressSelections: - name: "cluster" connection: - proxyProtocol: "CONNECT" + proxyProtocol: "HTTPConnect" transport: - type: "TCP" - tcp: - url: "http://127.0.0.1:" + uds: + udsName: "/etc/kubernetes/konnectivity-server/konnectivity-server.socket" ``` ##### Recommended Approach: Config Override File ✅ @@ -179,11 +178,10 @@ egressSelectorConfiguration: egressSelections: - name: "cluster" connection: - proxyProtocol: "CONNECT" + proxyProtocol: "HTTPConnect" transport: - type: "TCP" - tcp: - url: "http://127.0.0.1:{{.KonnectivityPort}}" + uds: + udsName: "/etc/kubernetes/konnectivity-server/konnectivity-server.socket" ``` 2. Modify [bootkube.sh.template](data/data/bootstrap/files/usr/local/bin/bootkube.sh.template) to add the flag: @@ -551,7 +549,7 @@ For our proof-of-concept, we can simplify: For the proof-of-concept, **we will skip authentication entirely**: -1. **KAS → Konnectivity Server**: The bootstrap KAS connects to the Konnectivity server over `localhost` (127.0.0.1). Since both run on the same bootstrap node, no authentication is required for this connection. +1. **KAS → Konnectivity Server**: The bootstrap KAS connects to the Konnectivity server via **Unix Domain Socket (UDS)**. Since both run on the same bootstrap node, UDS is the recommended transport and no authentication is required. 2. **Agents → Konnectivity Server**: Agents connect to the Konnectivity server over **unauthenticated TCP**. The server will accept connections without mTLS client certificates. 
@@ -567,14 +565,13 @@ This approach is **insecure** but acceptable for a PoC because: ```bash /usr/bin/proxy-server --logtostderr=true - --server-port 8090 # KAS connects here (localhost) - --agent-port 8091 # Agents connect here (no auth) + --uds-name=/etc/kubernetes/konnectivity-server/konnectivity-server.socket # KAS connects via UDS + --agent-port 8091 # Agents connect here (TCP, no auth) --health-port 2041 --mode http-connect --proxy-strategies destHost,defaultRoute --keepalive-time 30s --frontend-keepalive-time 30s - --uds-name "" # Disable UDS, use TCP only ``` ##### Agent Configuration (Unauthenticated) @@ -601,15 +598,46 @@ For production readiness, Task 2.1 (certificate generation) would need to be com **Objective**: Verify the architectural assumptions through documentation research. -**Deliverables**: -- Confirm that EgressSelectorConfiguration can route webhook traffic through Konnectivity -- Confirm that non-bootstrap KAS instances (without EgressSelectorConfiguration) route normally -- Document any caveats or limitations found - -**Sources**: -- Kubernetes EgressSelectorConfiguration documentation -- Konnectivity project documentation -- Existing OpenShift/HyperShift design docs +**Status**: ✅ Complete + +#### Assumption 1: EgressSelectorConfiguration Routes Webhook Traffic ✅ Confirmed + +The `cluster` egress selector covers **all traffic destined for the cluster**, including: +- Webhooks (ValidatingWebhookConfiguration, MutatingWebhookConfiguration) +- Aggregated API servers (APIService resources) +- Pod operations (exec, attach, logs, port-forward) +- Node and service proxy requests + +From the [KEP-1281 Network Proxy](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/1281-network-proxy/README.md): +> "Pod requests (and pod sub-resource requests) are meant for the cluster and will be routed based on the 'cluster' NetworkContext." + +#### Assumption 2: Non-Bootstrap KAS Routes Normally ✅ Confirmed + +The Konnectivity feature is **opt-in** and disabled by default. From KEP-1281: +> "The feature is turned off in the KAS by default. Enabled by adding ConnectivityServiceConfiguration." + +When no `--egress-selector-config-file` flag is set, the KAS uses direct connections via standard network routing. This means: +- Production KAS instances on cluster nodes (without EgressSelectorConfiguration) will continue to route traffic directly +- Only the bootstrap KAS (with explicit configuration) will use the Konnectivity tunnel + +#### Caveats and Limitations + +1. **Flat network requirement**: The proxy requires non-overlapping IP ranges between control plane and cluster networks. This is satisfied in OpenShift since bootstrap and cluster networks are distinct. + +2. **DNS resolution**: DNS lookups are performed by the Konnectivity server/agent, not the KAS. This should be transparent but may affect debugging. + +3. **Protocol consistency**: The `proxyProtocol` (GRPC or HTTPConnect) must match between the KAS EgressSelectorConfiguration and Konnectivity server arguments. + +4. **Transport options**: + - **UDS (Unix Domain Socket)**: Recommended when KAS and Konnectivity server are co-located. **We will use UDS** for KAS → Konnectivity server communication since both run on the bootstrap node. 
+ - **TCP**: Used for agent → server communication (agents connect from cluster nodes to bootstrap node) + +#### Sources + +- [Set up Konnectivity service | Kubernetes](https://kubernetes.io/docs/tasks/extend-kubernetes/setup-konnectivity/) +- [KEP-1281: Network Proxy](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/1281-network-proxy/README.md) +- [apiserver-network-proxy (Konnectivity)](https://github.com/kubernetes-sigs/apiserver-network-proxy) +- [Konnectivity in HyperShift](https://hypershift.pages.dev/reference/konnectivity/) --- From b3a2c27d05c2a3f357748f3fda8d112f8bb99f23 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Thu, 29 Jan 2026 19:08:59 +0000 Subject: [PATCH 10/22] Implementation phase clarifications --- konnectivity-plan.md | 350 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 320 insertions(+), 30 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index c27177c04b..aff6b90f1e 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -163,7 +163,7 @@ egressSelectorConfiguration: proxyProtocol: "HTTPConnect" transport: uds: - udsName: "/etc/kubernetes/konnectivity-server/konnectivity-server.socket" + udsName: "/etc/kubernetes/bootstrap-configs/konnectivity-server.socket" ``` ##### Recommended Approach: Config Override File ✅ @@ -181,7 +181,7 @@ egressSelectorConfiguration: proxyProtocol: "HTTPConnect" transport: uds: - udsName: "/etc/kubernetes/konnectivity-server/konnectivity-server.socket" + udsName: "/etc/kubernetes/bootstrap-configs/konnectivity-server.socket" ``` 2. Modify [bootkube.sh.template](data/data/bootstrap/files/usr/local/bin/bootkube.sh.template) to add the flag: @@ -549,7 +549,7 @@ For our proof-of-concept, we can simplify: For the proof-of-concept, **we will skip authentication entirely**: -1. **KAS → Konnectivity Server**: The bootstrap KAS connects to the Konnectivity server via **Unix Domain Socket (UDS)**. Since both run on the same bootstrap node, UDS is the recommended transport and no authentication is required. +1. **KAS → Konnectivity Server**: The bootstrap KAS connects to the Konnectivity server via **Unix Domain Socket (UDS)** at `/etc/kubernetes/bootstrap-configs/konnectivity-server.socket`. This path is already mounted in the KAS pod (as the `config` volume), so no additional volume mounts are needed. Since both run on the same bootstrap node, UDS is the recommended transport and no authentication is required. 2. **Agents → Konnectivity Server**: Agents connect to the Konnectivity server over **unauthenticated TCP**. The server will accept connections without mTLS client certificates. @@ -565,7 +565,7 @@ This approach is **insecure** but acceptable for a PoC because: ```bash /usr/bin/proxy-server --logtostderr=true - --uds-name=/etc/kubernetes/konnectivity-server/konnectivity-server.socket # KAS connects via UDS + --uds-name=/etc/kubernetes/bootstrap-configs/konnectivity-server.socket # KAS connects via UDS (already mounted in KAS pod) --agent-port 8091 # Agents connect here (TCP, no auth) --health-port 2041 --mode http-connect @@ -657,11 +657,62 @@ When no `--egress-selector-config-file` flag is set, the KAS uses direct connect **Objective**: Add a Konnectivity server to the bootstrap environment. 
-**Deliverables**: -- Create Konnectivity server configuration -- Add server deployment to bootstrap Ignition config (likely as a static pod or systemd unit) -- Configure server to listen for agent connections -- Ensure server starts before or alongside the bootstrap KAS +**Approach**: Deploy as a **static pod** (consistent with other bootstrap components like KAS, etcd). + +**Files to modify**: + +| File | Purpose | +|------|---------| +| `data/data/bootstrap/files/usr/local/bin/bootkube.sh.template` | Add `KONNECTIVITY_IMAGE` variable and create static pod manifest inline | + +**Image injection approach**: + +The Konnectivity image is obtained using the `image_for` shell function (defined in [release-image.sh.template](data/data/bootstrap/files/usr/local/bin/release-image.sh.template)), which extracts images from the release payload. + +1. Add to bootkube.sh.template (around line 49-68 with other image definitions): + ```bash + KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy) + ``` + +2. Create the static pod manifest inline in bootkube.sh.template (before the KAS render stage), using the shell variable: + ```bash + # Create Konnectivity server static pod manifest + cat > /etc/kubernetes/manifests/konnectivity-server-pod.yaml < manifests/konnectivity-agent-daemonset.yaml < Date: Thu, 29 Jan 2026 22:42:22 +0000 Subject: [PATCH 11/22] Expand testing sections --- konnectivity-plan.md | 153 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 143 insertions(+), 10 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index aff6b90f1e..c3d7d411dd 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -910,18 +910,63 @@ func deleteKonnectivityResources(ctx context.Context, dir string) error { --- -### Task 2.6: Manual Validation +### Task 2.6: Ignition Config Validation -**Objective**: Verify the proof-of-concept works end-to-end. +**Objective**: Validate implementation without deploying a cluster. -**Validation steps**: -1. Deploy a cluster with the modified installer -2. Verify Konnectivity server static pod is running on bootstrap node -3. Verify Konnectivity agent pods are running on cluster nodes (check `kubectl get pods -n kube-system -l app=konnectivity-agent`) -4. Deploy a test webhook in the cluster (e.g., a simple validating webhook) -5. Confirm the bootstrap KAS can reach the webhook (create a resource that triggers the webhook) -6. Complete bootstrap teardown and verify agent DaemonSet is removed -7. Confirm production KAS on cluster nodes can still reach webhooks normally +Generate and inspect ignition configs to verify template processing: + +```bash +# Create manifests (intermediate step, allows inspection) +openshift-install --dir test-cluster create manifests + +# Create full ignition configs +openshift-install --dir test-cluster create ignition-configs +``` + +#### Validation Checklist + +After generating ignition configs, verify: + +1. **EgressSelectorConfiguration file exists**: + ```bash + jq -r '.storage.files[] | select(.path == "/opt/openshift/egress-selector-config.yaml") | .contents.source' \ + test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d + ``` + Expected: YAML with `egressSelectorConfiguration` containing UDS path `/etc/kubernetes/bootstrap-configs/konnectivity-server.socket` + +2. 
**bootkube.sh contains Konnectivity additions**: + ```bash + jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ + test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A5 "KONNECTIVITY_IMAGE" + ``` + Expected: `KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy)` + +3. **Konnectivity server static pod manifest creation in bootkube.sh**: + ```bash + jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ + test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A20 "konnectivity-server-pod.yaml" + ``` + Expected: Static pod manifest with correct image variable and UDS socket path + +4. **Konnectivity agent DaemonSet manifest creation in bootkube.sh**: + ```bash + jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ + test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A30 "konnectivity-agent-daemonset.yaml" + ``` + Expected: DaemonSet manifest with `hostNetwork: true`, bootstrap node IP, and correct image variable + +5. **KAS render command includes config override**: + ```bash + jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ + test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep "config-override-files" + ``` + Expected: `--config-override-files=/assets/egress-selector-config.yaml` + +**Limitations**: +- Runtime behavior (shell script execution) not tested +- Image extraction from release payload not tested +- Actual Konnectivity connectivity not validated --- @@ -959,6 +1004,94 @@ These should be investigated during implementation if they become blocking: --- +## Manual Testing + +This section describes manual testing procedures for validating the Konnectivity integration on a real cluster. + +### Obtaining the Bootstrap IP + +The bootstrap node's public IP can be extracted from the CAPI Machine manifest stored in `.clusterapi_output/`: + +```bash +# Find the bootstrap machine manifest and extract the external IP +yq '.status.addresses[] | select(.type == "ExternalIP") | .address' \ + .clusterapi_output/Machine-openshift-cluster-api-guests-*-bootstrap.yaml +``` + +Alternatively, view all addresses: +```bash +cat .clusterapi_output/Machine-openshift-cluster-api-guests-*-bootstrap.yaml | grep -A10 "addresses:" +``` + +Note: The `openshift-install gather bootstrap` command will automatically extract this IP if the `--bootstrap` flag is not provided. + +### SSH Access to Bootstrap Node + +The installer always generates a bootstrap SSH key pair (see [bootstrapsshkeypair.go](pkg/asset/tls/bootstrapsshkeypair.go)). The public key is embedded in the bootstrap ignition and added to the `core` user's authorized_keys alongside any user-provided SSH key. 
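If there is any doubt about which keys were embedded, they can be listed directly from `bootstrap.ign` before booting; this assumes the standard Ignition v3 `passwd.users` layout the installer emits:

```bash
# Print every authorized SSH key embedded for the "core" user
jq -r '.passwd.users[] | select(.name == "core") | .sshAuthorizedKeys[]' bootstrap.ign
```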
+ +**Option 1**: Use user-provided SSH key (if configured in install-config.yaml): +```bash +ssh -i ~/.ssh/ core@ +``` + +**Option 2**: Extract the generated key from the installer state file: +```bash +# Extract the private key from .openshift_install_state.json +jq -r '.["*tls.BootstrapSSHKeyPair"]["Priv"]' .openshift_install_state.json | base64 -d > bootstrap-ssh.key +chmod 600 bootstrap-ssh.key + +# SSH to bootstrap node +ssh -i bootstrap-ssh.key core@ +``` + +Note: The `openshift-install gather bootstrap` command automatically uses the generated key, so manual extraction is only needed for interactive debugging sessions. + +### Bootstrap Service Debugging + +If bootstrap changes cause issues, use the service recording mechanism documented in [bootstrap_services.md](docs/dev/bootstrap_services.md): + +- Progress tracked in `/var/log/openshift/.json` +- Use `record_service_stage_start` / `record_service_stage_success` functions in shell scripts +- Failed-units captured in log bundle via `openshift-install gather bootstrap` + +### Log Gathering on Failure + +```bash +openshift-install gather bootstrap --bootstrap --master --master --master +``` + +The log bundle includes: +- `bootstrap/journals/bootkube.log` - Main bootstrap orchestration +- `bootstrap/containers/` - Container logs and descriptions +- `bootstrap/pods/` - Render command outputs + +### End-to-End Validation Steps + +1. Deploy a cluster with the modified installer +2. SSH to the bootstrap node and verify: + - Konnectivity server static pod is running: `crictl ps | grep konnectivity-server` + - UDS socket exists: `ls -la /etc/kubernetes/bootstrap-configs/konnectivity-server.socket` + - Server is listening on agent port: `ss -tlnp | grep 8091` +3. Verify Konnectivity agent pods are running on cluster nodes: + ```bash + kubectl get pods -n kube-system -l app=konnectivity-agent + ``` +4. Check agent connectivity to server (from bootstrap node): + ```bash + # Check server logs for agent connections + crictl logs $(crictl ps -q --name konnectivity-server) + ``` +5. Deploy a test webhook in the cluster (e.g., a simple validating webhook) +6. Confirm the bootstrap KAS can reach the webhook (create a resource that triggers the webhook) +7. Complete bootstrap teardown and verify agent DaemonSet is removed: + ```bash + kubectl get daemonset -n kube-system konnectivity-agent + # Should return "not found" after teardown + ``` +8. 
Confirm production KAS on cluster nodes can still reach webhooks normally + +--- + ## Post-PoC Considerations The following items are out of scope for the initial proof-of-concept but should be addressed for a production-ready implementation: From 5482a3fb0a7145116aef2b1f371d691773a10d2a Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 09:31:52 +0000 Subject: [PATCH 12/22] Initial implementation --- .../opt/openshift/egress-selector-config.yaml | 10 ++ .../files/usr/local/bin/bootkube.sh.template | 119 +++++++++++++++++- pkg/destroy/bootstrap/bootstrap.go | 41 ++++++ 3 files changed, 169 insertions(+), 1 deletion(-) create mode 100644 data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml diff --git a/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml b/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml new file mode 100644 index 0000000000..544fc76bab --- /dev/null +++ b/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml @@ -0,0 +1,10 @@ +apiVersion: kubecontrolplane.config.openshift.io/v1 +kind: KubeAPIServerConfig +egressSelectorConfiguration: + egressSelections: + - name: "cluster" + connection: + proxyProtocol: "HTTPConnect" + transport: + uds: + udsName: "/etc/kubernetes/bootstrap-configs/konnectivity-server.socket" diff --git a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template index 764f5bf5f3..fd30525978 100755 --- a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template +++ b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template @@ -66,6 +66,7 @@ OPENSHIFT_HYPERKUBE_IMAGE=$(image_for hyperkube) OPENSHIFT_CLUSTER_POLICY_IMAGE=$(image_for cluster-policy-controller) CLUSTER_BOOTSTRAP_IMAGE=$(image_for cluster-bootstrap) +KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy) mkdir --parents ./{bootstrap-manifests,manifests} @@ -244,6 +245,59 @@ then record_service_stage_success fi +# Create Konnectivity server static pod manifest +# This runs on the bootstrap node and provides a proxy for the bootstrap KAS +# to reach webhooks in the cluster's pod network. +if [ ! -f konnectivity-server-bootstrap.done ] +then + record_service_stage_start "konnectivity-server-bootstrap" + echo "Creating Konnectivity server static pod manifest..." + + cat > /etc/kubernetes/manifests/konnectivity-server-pod.yaml < manifests/konnectivity-agent-daemonset.yaml < Date: Fri, 30 Jan 2026 09:39:39 +0000 Subject: [PATCH 13/22] Claude: update plan with test performed --- konnectivity-plan.md | 246 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 209 insertions(+), 37 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index c3d7d411dd..429ef66134 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -912,61 +912,233 @@ func deleteKonnectivityResources(ctx context.Context, dir string) error { ### Task 2.6: Ignition Config Validation +**Status**: ✅ Complete + **Objective**: Validate implementation without deploying a cluster. -Generate and inspect ignition configs to verify template processing: +#### Prerequisites + +Build the installer with the Konnectivity changes: + +```bash +SKIP_TERRAFORM=y hack/build.sh +``` + +#### Test Setup + +Create a test directory with a minimal install-config.yaml. 
The "none" platform is used because it doesn't require cloud provider credentials or infrastructure validation: + +```bash +# Create test directory +mkdir -p /tmp/konnectivity-test +cd /tmp/konnectivity-test + +# Generate an SSH key for the test +ssh-keygen -t ed25519 -f /tmp/konnectivity-test/test-key -N "" -q + +# Create install-config.yaml +cat > install-config.yaml << EOF +apiVersion: v1 +baseDomain: example.com +metadata: + name: test-cluster +networking: + networkType: OVNKubernetes + clusterNetwork: + - cidr: 10.128.0.0/14 + hostPrefix: 23 + serviceNetwork: + - 172.30.0.0/16 + machineNetwork: + - cidr: 192.168.1.0/24 +controlPlane: + name: master + replicas: 3 +compute: +- name: worker + replicas: 2 +platform: + none: {} +pullSecret: '{"auths":{"fake":{"auth":"aWQ6cGFzcwo="}}}' +sshKey: '$(cat /tmp/konnectivity-test/test-key.pub)' +EOF +``` + +#### Generate Ignition Configs ```bash -# Create manifests (intermediate step, allows inspection) -openshift-install --dir test-cluster create manifests +cd /tmp/konnectivity-test +/path/to/installer/bin/openshift-install create ignition-configs +``` -# Create full ignition configs -openshift-install --dir test-cluster create ignition-configs +Expected output: +``` +level=warning msg=Release Image Architecture not detected. Release Image Architecture is unknown +level=info msg=Consuming Install Config from target directory +level=info msg=Successfully populated MCS CA cert information: root-ca ... +level=info msg=Successfully populated MCS TLS cert information: root-ca ... +level=info msg=Ignition-Configs created in: . and auth ``` #### Validation Checklist -After generating ignition configs, verify: +After generating ignition configs, run these validation commands: -1. **EgressSelectorConfiguration file exists**: - ```bash - jq -r '.storage.files[] | select(.path == "/opt/openshift/egress-selector-config.yaml") | .contents.source' \ - test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d - ``` - Expected: YAML with `egressSelectorConfiguration` containing UDS path `/etc/kubernetes/bootstrap-configs/konnectivity-server.socket` +##### 1. EgressSelectorConfiguration file exists -2. **bootkube.sh contains Konnectivity additions**: - ```bash - jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ - test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A5 "KONNECTIVITY_IMAGE" - ``` - Expected: `KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy)` +```bash +jq -r '.storage.files[] | select(.path == "/opt/openshift/egress-selector-config.yaml") | .contents.source' \ + bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d +``` -3. **Konnectivity server static pod manifest creation in bootkube.sh**: - ```bash - jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ - test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A20 "konnectivity-server-pod.yaml" - ``` - Expected: Static pod manifest with correct image variable and UDS socket path +**Expected output**: +```yaml +apiVersion: kubecontrolplane.config.openshift.io/v1 +kind: KubeAPIServerConfig +egressSelectorConfiguration: + egressSelections: + - name: "cluster" + connection: + proxyProtocol: "HTTPConnect" + transport: + uds: + udsName: "/etc/kubernetes/bootstrap-configs/konnectivity-server.socket" +``` -4. 
**Konnectivity agent DaemonSet manifest creation in bootkube.sh**: - ```bash - jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ - test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A30 "konnectivity-agent-daemonset.yaml" - ``` - Expected: DaemonSet manifest with `hostNetwork: true`, bootstrap node IP, and correct image variable +##### 2. KONNECTIVITY_IMAGE variable defined -5. **KAS render command includes config override**: - ```bash - jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ - test-cluster/bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep "config-override-files" - ``` - Expected: `--config-override-files=/assets/egress-selector-config.yaml` +```bash +jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ + bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A2 "KONNECTIVITY_IMAGE" +``` + +**Expected output**: +``` +KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy) + +mkdir --parents ./{bootstrap-manifests,manifests} +-- + image: ${KONNECTIVITY_IMAGE} + command: + - /usr/bin/proxy-server +-- + image: ${KONNECTIVITY_IMAGE} + command: + - /usr/bin/proxy-agent +``` + +##### 3. Konnectivity server static pod manifest creation + +```bash +jq -r '.storage.files[] | select(.path == "/usr/local/bin/bootkube.sh") | .contents.source' \ + bootstrap.ign | sed 's/data:text\/plain;charset=utf-8;base64,//' | base64 -d | grep -A40 "konnectivity-server-pod.yaml" | head -45 +``` + +**Expected output** (partial): +```yaml + cat > /etc/kubernetes/manifests/konnectivity-server-pod.yaml < manifests/konnectivity-agent-daemonset.yaml < Date: Fri, 30 Jan 2026 11:03:24 +0000 Subject: [PATCH 14/22] Claude: Add agent authentication investigation plan --- konnectivity-plan.md | 57 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 429ef66134..b594570444 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -641,6 +641,63 @@ When no `--egress-selector-config-file` flag is set, the KAS uses direct connect --- +### Task 1.8: Investigate Konnectivity Agent Authentication Options + +**Status**: 🔄 Pending + +**Objective**: Determine how to authenticate Konnectivity agents to the server, leveraging existing bootstrap certificate infrastructure where possible. + +**Context**: The current PoC implementation fails because the Konnectivity agent requires a CA certificate to validate the server connection: + +``` +Error: failed to run proxy connection with failed to read CA cert : read .: is a directory +``` + +The agent expects `--ca-cert`, `--agent-cert`, and `--agent-key` flags, which were intentionally omitted in the unauthenticated PoC design (Task 1.6). However, the agent appears to require at least a CA cert to function. + +#### Questions to Answer + +1. **Bootstrap TLS Assets Inventory** + - What certificates are already generated during bootstrap? + - Is there a CA that could sign Konnectivity certificates? + - Can we reuse an existing certificate (e.g., kubelet client cert, aggregator CA)? + - Where is the TLS asset generation code in the installer? (likely `pkg/asset/tls/`) + +2. **HyperShift Agent Authentication Deep Dive** + - How does HyperShift generate agent certificates? + - Does each agent pod require a unique client certificate, or do all agents share one? 
+ - What CA signs the agent certificates? + - How are certificates distributed to the DaemonSet pods? + - Review: `control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go` + +3. **Konnectivity Server Certificate Requirements** + - What certificates does the server need for the agent-facing endpoint (port 8091)? + - Can the server use an existing bootstrap certificate for client validation? + - What is the minimum viable certificate configuration? + +4. **Certificate Distribution Mechanism** + - How would certificates be mounted into the agent DaemonSet pods? + - Can we use a Kubernetes Secret created during bootstrap? + - How does HyperShift distribute certificates to agents? + +#### Files to Investigate + +| Area | Files | +|------|-------| +| Installer TLS assets | `pkg/asset/tls/*.go` | +| HyperShift Konnectivity PKI | `hypershift/control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go` | +| HyperShift agent deployment | `hypershift/control-plane-operator/hostedclusterconfigoperator/controllers/resources/konnectivity/reconcile.go` | +| Bootstrap certificate usage | `data/data/bootstrap/files/usr/local/bin/bootkube.sh.template` | + +#### Expected Outcomes + +- Identify whether a shared agent certificate (single cert for all agents) or per-agent certificates are needed +- Determine if an existing bootstrap CA can sign Konnectivity certificates +- Define the minimum certificate configuration to get agents connecting +- Propose an implementation approach that integrates with the installer's existing TLS pipeline + +--- + ## Phase 2: Implementation ### Task 2.1: Generate Konnectivity Certificates From 3c2c5f0f238535fd3fa1dd0eb522c3f136e78424 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 12:20:02 +0000 Subject: [PATCH 15/22] Claude: Update plan to address agent failure to start --- konnectivity-plan.md | 599 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 530 insertions(+), 69 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index b594570444..44e5996056 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -643,7 +643,7 @@ When no `--egress-selector-config-file` flag is set, the KAS uses direct connect ### Task 1.8: Investigate Konnectivity Agent Authentication Options -**Status**: 🔄 Pending +**Status**: ✅ Complete **Objective**: Determine how to authenticate Konnectivity agents to the server, leveraging existing bootstrap certificate infrastructure where possible. @@ -653,60 +653,405 @@ When no `--egress-selector-config-file` flag is set, the KAS uses direct connect Error: failed to run proxy connection with failed to read CA cert : read .: is a directory ``` -The agent expects `--ca-cert`, `--agent-cert`, and `--agent-key` flags, which were intentionally omitted in the unauthenticated PoC design (Task 1.6). However, the agent appears to require at least a CA cert to function. +#### Findings + +##### 1. Konnectivity Binary Command-Line Flags (Validated via Execution) + +The actual command-line flags were validated by executing the binaries from the OpenShift release payload: + +**proxy-agent flags** (certificate-related): +| Flag | Default | Description | +|------|---------|-------------| +| `--ca-cert` | `""` (empty) | "If non-empty the CAs we use to validate clients." | +| `--agent-cert` | `""` (empty) | "If non-empty secure communication with this cert." | +| `--agent-key` | `""` (empty) | "If non-empty secure communication with this key." 
| + +**proxy-server flags** (certificate-related): +| Flag | Default | Description | +|------|---------|-------------| +| `--cluster-cert` | `""` (empty) | "If non-empty secure communication with this cert." (agent-facing) | +| `--cluster-key` | `""` (empty) | "If non-empty secure communication with this key." (agent-facing) | +| `--cluster-ca-cert` | `""` (empty) | "If non-empty the CA we use to validate Agent clients." | +| `--server-cert` | `""` (empty) | "If non-empty secure communication with this cert." (KAS-facing) | +| `--server-key` | `""` (empty) | "If non-empty secure communication with this key." (KAS-facing) | +| `--server-ca-cert` | `""` (empty) | "If non-empty the CA we use to validate KAS clients." | +| `--uds-name` | `""` (empty) | "uds-name should be empty for TCP traffic. For UDS set to its name." | + +**Key finding**: All certificate flags default to empty strings with "If non-empty" descriptions, suggesting unauthenticated mode should be supported. However, **validation testing revealed this is broken** (see below). + +##### 2. Root Cause of the Errors (Validated via Testing) + +Two separate issues were identified through container execution testing: + +**Issue A: Upstream Bug - CA Cert Path Handling** + +The error `failed to read CA cert : read .: is a directory` is caused by a bug in the upstream `apiserver-network-proxy` code: + +```go +// In pkg/util/certificates.go - getCACertPool() +os.ReadFile(filepath.Clean(caFile)) +``` + +When `caFile` is empty (`""`), `filepath.Clean("")` returns `"."` (current directory), and `os.ReadFile(".")` fails with "read .: is a directory". + +**Validation:** +```bash +# Empty --ca-cert triggers the bug immediately +$ podman run --rm /usr/bin/proxy-agent --proxy-server-host=127.0.0.1 --proxy-server-port=8091 +Error: failed to run proxy connection with failed to read CA cert : read .: is a directory + +# With valid CA cert, agent runs correctly (just needs server) +$ podman run --rm -v /tmp/ca.crt:/tmp/ca.crt:ro /usr/bin/proxy-agent \ + --ca-cert=/tmp/ca.crt --proxy-server-host=127.0.0.1 --proxy-server-port=8091 +E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection refused" +``` + +**Conclusion**: Unauthenticated mode is **broken upstream**. A valid CA certificate is required. + +**Issue B: Empty Bootstrap IP** + +The current PoC has `--proxy-server-host=` (empty) because `{{.BootstrapNodeIP}}` is not populated at ignition generation time: + +```bash +# From pod describe output: +Args: + --proxy-server-host= # Empty! + --proxy-server-port=8091 +``` + +**Root Cause Analysis:** + +The `BootstrapNodeIP` template variable is sourced from the `OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP` environment variable ([common.go:347-351](pkg/asset/ignition/bootstrap/common.go#L347)): + +```go +bootstrapNodeIP := os.Getenv("OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP") +``` + +This environment variable is **not automatically set** by the installer for IPI (Installer Provisioned Infrastructure) deployments because: +1. Ignition configs are generated **before** the bootstrap VM is created +2. The bootstrap VM's IP is assigned dynamically by the cloud provider or DHCP +3. The IP isn't known until after Cluster API provisions the infrastructure + +**Why `api-int` DNS Cannot Be Used:** + +The initial thought was to use `api-int.` as the server host since it's a well-known DNS name. 
However, this approach is **NOT viable**: + +- `api-int` resolves to a **VIP** (baremetal/vSphere) or **load balancer** (cloud), NOT specifically to the bootstrap node +- On baremetal, the VIP can move to control plane nodes via VRRP before bootstrap teardown +- On cloud platforms, load balancers distribute traffic to any healthy backend, including control plane nodes +- Konnectivity agents might connect to a node that doesn't have the Konnectivity server running + +**Two Templating Layers:** + +The DaemonSet manifest in `bootkube.sh.template` uses two templating layers: +1. **Go templates** (`{{.BootstrapNodeIP}}`) - Resolved at `openshift-install create ignition-configs` time +2. **Shell variables** (`${KONNECTIVITY_IMAGE}`) - Resolved at runtime when bootkube.sh runs + +The problem is that `{{.BootstrapNodeIP}}` is resolved in Layer 1, when the IP isn't known yet. + +**Validation:** +```bash +# Empty host dials to ":8091" (no host) +$ podman run --rm -v /tmp/ca.crt:/tmp/ca.crt:ro /usr/bin/proxy-agent \ + --ca-cert=/tmp/ca.crt --proxy-server-host= --proxy-server-port=8091 +E0130 "cannot connect once" err="...dial tcp :8091: connect: connection refused" +``` + +**Solution: Runtime IP Detection** -#### Questions to Answer +Detect the bootstrap node IP at **shell runtime** using `ip route get` instead of relying on the Go template variable. This command queries the kernel routing table and returns the source IP that would be used to reach an external destination: -1. **Bootstrap TLS Assets Inventory** - - What certificates are already generated during bootstrap? - - Is there a CA that could sign Konnectivity certificates? - - Can we reuse an existing certificate (e.g., kubelet client cert, aggregator CA)? - - Where is the TLS asset generation code in the installer? (likely `pkg/asset/tls/`) +```bash +# IPv4 +BOOTSTRAP_NODE_IP=$(ip route get 1.1.1.1 2>/dev/null | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1); exit}') + +# IPv6 +BOOTSTRAP_NODE_IP=$(ip -6 route get 2001:4860:4860::8888 2>/dev/null | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1); exit}') +``` + +This approach: +- Works on any platform - detects the actual IP at runtime +- No dependency on DNS resolution or VIPs +- Uses reliable Linux networking commands +- Handles IPv4/IPv6 correctly via Go template conditional for the `ip` command variant + +**See Task 2.7** for the implementation details. + +##### 3. Bootstrap TLS Assets Inventory + +The installer generates ~10 Certificate Authorities in `pkg/asset/tls/`. Key CAs that could potentially be reused: + +| CA | File | Validity | Purpose | Reuse Candidate? | +|----|------|----------|---------|------------------| +| **RootCA** | `tls/root-ca.crt` | 10 years | Signs MCS, journal-gatewayd, IRI certs | ✅ Yes - general bootstrap CA | +| AdminKubeConfigSignerCertKey | `tls/admin-kubeconfig-signer.crt` | 10 years | Signs admin kubeconfig | Maybe | +| KubeletBootstrapCertSigner | `tls/kubelet-bootstrap-kubeconfig-signer.crt` | 10 years | Signs kubelet bootstrap cert | Maybe | +| AggregatorSignerCertKey | `tls/aggregator-signer.crt` | 1 day | Signs aggregator client | No (short-lived) | +| KubeAPIServerToKubeletSignerCertKey | `tls/kube-apiserver-to-kubelet-signer.crt` | 1 year | KAS→kubelet auth | No (wrong purpose) | + +**Recommended CA**: Reuse the existing **RootCA** (`/opt/openshift/tls/root-ca.crt` and `.key`) which is already available on the bootstrap node. This avoids creating new installer TLS assets while providing full mTLS support. 
Certificates signed by RootCA will have 1-day validity, appropriate for the temporary bootstrap phase. + +##### 4. HyperShift Konnectivity PKI Implementation + +HyperShift creates a dedicated PKI for Konnectivity in [pki/konnectivity.go](hypershift/control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go): + +| Function | CN | Purpose | Usage | +|----------|----|---------| ------| +| `ReconcileKonnectivitySignerSecret` | `konnectivity-signer` | Self-signed CA | Signs all Konnectivity certs | +| `ReconcileKonnectivityServerSecret` | `konnectivity-server-local` | Server cert for KAS endpoint | `--server-cert/key` | +| `ReconcileKonnectivityClusterSecret` | `konnectivity-server` | Server cert for agent endpoint | `--cluster-cert/key` | +| `ReconcileKonnectivityClientSecret` | `konnectivity-client` | Client cert for KAS | Used in EgressSelectorConfiguration | +| `ReconcileKonnectivityAgentSecret` | `konnectivity-agent` | Client cert for agents | `--agent-cert/key` | + +**Key insight**: All agents share a **single client certificate** (`konnectivity-agent`). Per-agent certificates are not required. + +##### 5. Certificate Distribution Mechanism (HyperShift) + +HyperShift distributes certificates via Kubernetes resources: + +1. **Konnectivity CA** → `ConfigMap` named `konnectivity-ca-bundle` in `kube-system` +2. **Agent certificate** → `Secret` named `konnectivity-agent` in `kube-system` + +The agent DaemonSet mounts these as volumes: +```yaml +volumes: +- name: agent-certs + secret: + secretName: konnectivity-agent # Contains tls.crt, tls.key +- name: konnectivity-ca + configMap: + name: konnectivity-ca-bundle # Contains ca.crt +``` + +Container mounts: +```yaml +volumeMounts: +- name: agent-certs + mountPath: /etc/konnectivity/agent +- name: konnectivity-ca + mountPath: /etc/konnectivity/ca +``` + +Agent args reference these paths: +```bash +--ca-cert=/etc/konnectivity/ca/ca.crt +--agent-cert=/etc/konnectivity/agent/tls.crt +--agent-key=/etc/konnectivity/agent/tls.key +``` + +##### 6. Minimum Viable Certificate Configuration + +For the bootstrap scenario, the minimum configuration is: -2. **HyperShift Agent Authentication Deep Dive** - - How does HyperShift generate agent certificates? - - Does each agent pod require a unique client certificate, or do all agents share one? - - What CA signs the agent certificates? - - How are certificates distributed to the DaemonSet pods? - - Review: `control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go` +**Option A: Fully Unauthenticated (Current PoC)** ❌ NOT VIABLE +- Server: No `--cluster-*` flags, no `--server-*` flags (uses UDS for KAS) +- Agent: No `--ca-cert`, `--agent-cert`, `--agent-key` flags +- **Status**: ❌ **Broken due to upstream bug** - `filepath.Clean("")` returns `"."` causing immediate crash -3. **Konnectivity Server Certificate Requirements** - - What certificates does the server need for the agent-facing endpoint (port 8091)? - - Can the server use an existing bootstrap certificate for client validation? - - What is the minimum viable certificate configuration? +**Option B: Server-Only TLS (Agent validates server, no client auth)** ✅ MINIMUM VIABLE +- Server: `--cluster-cert`, `--cluster-key` (no `--cluster-ca-cert`) +- Agent: `--ca-cert` only (no `--agent-cert`, `--agent-key`) +- Agents verify the server but server accepts any agent +- **Status**: ✅ Works - validated by providing CA cert without agent certs -4. 
**Certificate Distribution Mechanism** - - How would certificates be mounted into the agent DaemonSet pods? - - Can we use a Kubernetes Secret created during bootstrap? - - How does HyperShift distribute certificates to agents? +**Option C: Full mTLS (Production-ready)** +- Server: `--cluster-cert`, `--cluster-key`, `--cluster-ca-cert` +- Agent: `--ca-cert`, `--agent-cert`, `--agent-key` +- Mutual authentication (recommended for production) -#### Files to Investigate +#### Recommended Implementation Approach -| Area | Files | -|------|-------| -| Installer TLS assets | `pkg/asset/tls/*.go` | -| HyperShift Konnectivity PKI | `hypershift/control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go` | -| HyperShift agent deployment | `hypershift/control-plane-operator/hostedclusterconfigoperator/controllers/resources/konnectivity/reconcile.go` | -| Bootstrap certificate usage | `data/data/bootstrap/files/usr/local/bin/bootkube.sh.template` | +Generate certificates at **runtime on the bootstrap node** using the existing `RootCA`. This approach is simpler than creating new installer TLS assets and solves the "bootstrap IP not known at ignition time" problem. + +1. **Certificate generation** (in `bootkube.sh` at runtime): + - Use `openssl` to generate Konnectivity server cert (1-day validity, signed by RootCA) + - Use `openssl` to generate shared agent client cert (1-day validity, signed by RootCA) + - RootCA is already available at `/opt/openshift/tls/root-ca.{crt,key}` + +2. **Certificate deployment**: + - Server certs: Written to bootstrap node filesystem, mounted into Konnectivity server static pod + - Agent certs: Packaged into a `Secret` manifest, applied by cluster-bootstrap, mounted by DaemonSet + +3. **Runtime generation flow**: + ``` + bootkube.sh (runtime on bootstrap node) + │ + ├─→ Generate konnectivity-server.{crt,key} (signed by RootCA) + │ └─→ Mount in Konnectivity server static pod + │ + ├─→ Generate konnectivity-agent.{crt,key} (signed by RootCA) + │ └─→ Package into Secret manifest + │ + └─→ Write manifests/konnectivity-agent-certs.yaml + └─→ Applied by cluster-bootstrap alongside DaemonSet + ``` -#### Expected Outcomes +4. **Files to modify**: + | File | Purpose | + |------|---------| + | `data/data/bootstrap/files/usr/local/bin/bootkube.sh.template` | Certificate generation, server pod, agent Secret | -- Identify whether a shared agent certificate (single cert for all agents) or per-agent certificates are needed -- Determine if an existing bootstrap CA can sign Konnectivity certificates -- Define the minimum certificate configuration to get agents connecting -- Propose an implementation approach that integrates with the installer's existing TLS pipeline +5. 
**Benefits**: + - No new Go TLS assets required + - Certificates generated when bootstrap IP is known + - Short 1-day validity appropriate for temporary bootstrap agents + - Full mTLS authentication between server and agents + - All agents share a single client certificate + +#### Unauthenticated Mode: Validated as Broken ❌ + +Testing confirmed that unauthenticated mode does **not work** due to an upstream bug: + +```bash +# Test image extracted from OCP 4.17 release payload +IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:01745f6ea0f7763b86ee62e324402aec620bb4cd5f096a10962264cb4d68cd2d + +# Without CA cert - crashes immediately +$ podman run --rm $IMAGE /usr/bin/proxy-agent \ + --proxy-server-host=127.0.0.1 --proxy-server-port=8091 +Error: failed to run proxy connection with failed to read CA cert : read .: is a directory + +# With CA cert - runs correctly (just needs server to connect to) +$ openssl req -x509 -newkey rsa:2048 -keyout /tmp/ca.key -out /tmp/ca.crt -days 365 -nodes -subj "/CN=test-ca" +$ podman run --rm -v /tmp/ca.crt:/tmp/ca.crt:ro $IMAGE /usr/bin/proxy-agent \ + --ca-cert=/tmp/ca.crt --proxy-server-host=127.0.0.1 --proxy-server-port=8091 +E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection refused" +``` + +**Conclusion**: TLS certificates are **required** for the Konnectivity agent to function. The PoC must be updated to generate and deploy certificates. --- ## Phase 2: Implementation -### Task 2.1: Generate Konnectivity Certificates +### Task 2.1: Generate Konnectivity Certificates at Runtime -**Status**: ⏭️ Skipped (out of scope for PoC) +**Status**: ⏳ Pending -**Objective**: Create the necessary certificates for Konnectivity server and agent authentication. +**Objective**: Generate Konnectivity server and agent certificates at runtime on the bootstrap node using the existing RootCA. -**Note**: Per Task 1.6 decision, the PoC uses unauthenticated TCP connections. Certificate generation is deferred to future production-ready work. +**Approach**: Use `openssl` commands in `bootkube.sh` to generate certificates signed by `/opt/openshift/tls/root-ca.{crt,key}`. 
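+
+As a quick sanity check after the generation step below, the resulting chain and extensions can be inspected with `openssl` on the bootstrap node. This is illustrative only (not part of the patches) and assumes the paths used elsewhere in this plan:
+
+```bash
+# Both leaf certificates should verify against the signing CA
+openssl verify -CAfile /opt/openshift/tls/root-ca.crt \
+  /opt/openshift/tls/konnectivity/server.crt \
+  /opt/openshift/tls/konnectivity/agent.crt
+
+# The server certificate should carry the bootstrap IP SAN and the serverAuth EKU
+openssl x509 -in /opt/openshift/tls/konnectivity/server.crt -noout -text | \
+  grep -A1 -E "Subject Alternative Name|Extended Key Usage"
+```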
+ +**Files to modify**: + +| File | Purpose | +|------|---------| +| `data/data/bootstrap/files/usr/local/bin/bootkube.sh.template` | Add certificate generation before static pod creation | + +**Implementation** (add to bootkube.sh.template before Konnectivity server static pod creation): + +```bash + # Generate Konnectivity certificates signed by RootCA (1-day validity) + KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity + mkdir -p "${KONNECTIVITY_CERT_DIR}" + + # Server certificate for agent endpoint + openssl req -new -newkey rsa:2048 -nodes \ + -keyout "${KONNECTIVITY_CERT_DIR}/server.key" \ + -out "${KONNECTIVITY_CERT_DIR}/server.csr" \ + -subj "/CN=konnectivity-server/O=openshift" + + openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/server.csr" \ + -CA /opt/openshift/tls/root-ca.crt \ + -CAkey /opt/openshift/tls/root-ca.key \ + -CAcreateserial \ + -out "${KONNECTIVITY_CERT_DIR}/server.crt" \ + -days 1 \ + -extfile <(printf "extendedKeyUsage=serverAuth\nsubjectAltName=IP:${BOOTSTRAP_NODE_IP}") + + # Agent client certificate (shared by all agents) + openssl req -new -newkey rsa:2048 -nodes \ + -keyout "${KONNECTIVITY_CERT_DIR}/agent.key" \ + -out "${KONNECTIVITY_CERT_DIR}/agent.csr" \ + -subj "/CN=konnectivity-agent/O=openshift" + + openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/agent.csr" \ + -CA /opt/openshift/tls/root-ca.crt \ + -CAkey /opt/openshift/tls/root-ca.key \ + -CAcreateserial \ + -out "${KONNECTIVITY_CERT_DIR}/agent.crt" \ + -days 1 \ + -extfile <(printf "extendedKeyUsage=clientAuth") + + # Copy CA cert to konnectivity directory for server to validate agents + cp /opt/openshift/tls/root-ca.crt "${KONNECTIVITY_CERT_DIR}/ca.crt" + + # Clean up CSR files + rm -f "${KONNECTIVITY_CERT_DIR}"/*.csr + + echo "Generated Konnectivity certificates in ${KONNECTIVITY_CERT_DIR}" +``` + +**Certificate details**: + +| File | CN | ExtKeyUsage | Validity | SAN | +|------|-------|-------------|----------|-----| +| `server.crt` | `konnectivity-server` | `serverAuth` | 1 day | `IP:${BOOTSTRAP_NODE_IP}` | +| `agent.crt` | `konnectivity-agent` | `clientAuth` | 1 day | None | +| `ca.crt` | `root-ca` | N/A | 10 years | N/A (copy of RootCA) | + +**Ordering dependency**: Certificate generation must occur **after** `BOOTSTRAP_NODE_IP` detection (Task 2.7) and **before** static pod manifest creation (Task 2.2). + +--- + +### Task 2.1.1: Create Agent Certificate Secret + +**Status**: ⏳ Pending + +**Objective**: Package the agent certificate into a Kubernetes Secret for deployment to cluster nodes. + +**Approach**: Generate the Secret manifest in `bootkube.sh` and write it to `manifests/` for cluster-bootstrap to apply. 
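+
+For reference, a sketch of the Secret this step would emit; the name, namespace, and data keys are taken from how the agent DaemonSet and teardown code reference it elsewhere in this plan, so treat it as illustrative rather than the final manifest:
+
+```bash
+# Written to the bootstrap asset directory so cluster-bootstrap applies it with the other manifests
+cat > manifests/konnectivity-agent-certs.yaml <<EOF
+apiVersion: v1
+kind: Secret
+metadata:
+  name: konnectivity-agent-certs
+  namespace: kube-system
+type: Opaque
+data:
+  tls.crt: $(base64 -w0 "${KONNECTIVITY_CERT_DIR}/agent.crt")
+  tls.key: $(base64 -w0 "${KONNECTIVITY_CERT_DIR}/agent.key")
+  ca.crt: $(base64 -w0 "${KONNECTIVITY_CERT_DIR}/ca.crt")
+EOF
+```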
+ +**Implementation** (add to bootkube.sh.template after certificate generation): + +```bash + # Create Secret manifest for Konnectivity agent certificates + cat > manifests/konnectivity-agent-certs.yaml < /etc/kubernetes/manifests/konnectivity-server-pod.yaml < manifests/konnectivity-agent-daemonset.yaml </dev/null | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1); exit}') +{{- else }} + BOOTSTRAP_NODE_IP=$(ip route get 1.1.1.1 2>/dev/null | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1); exit}') +{{- end }} + echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}" +``` + +Change line 665 from: +```yaml + - --proxy-server-host={{.BootstrapNodeIP}} +``` + +To: +```yaml + - --proxy-server-host=${BOOTSTRAP_NODE_IP} +``` + +**Why `api-int` DNS cannot be used**: +- `api-int.` resolves to a VIP (baremetal) or load balancer (cloud) +- The VIP can move to control plane nodes via VRRP before bootstrap teardown +- Load balancers distribute traffic to any healthy backend, including control plane nodes +- Konnectivity agents might connect to a node without the Konnectivity server + +**Verification**: + +1. Generate ignition configs with the modified template +2. Decode bootkube.sh from bootstrap.ign and verify: + - IP detection command is present with correct IPv4/IPv6 selection + - DaemonSet manifest uses `${BOOTSTRAP_NODE_IP}` shell variable +3. Deploy a test cluster and verify: + - Bootstrap node logs show detected IP + - Konnectivity agents on cluster nodes connect successfully + +--- + ## Open Questions ### Resolved @@ -1213,9 +1681,10 @@ rm -rf /tmp/konnectivity-test These should be investigated during implementation if they become blocking: -1. **Bootstrap node IP discovery** ✅ Resolved +1. **Bootstrap node IP discovery** ✅ Resolved (Updated) - **Issue**: The agent DaemonSet needs to know the bootstrap node's IP address to connect to the Konnectivity server. - - **Solution**: Use `{{.BootstrapNodeIP}}` Go template variable, which is already available in `bootstrapTemplateData` (see [common.go:94](pkg/asset/ignition/bootstrap/common.go)). This is the same variable used by `kubelet.sh.template` for the `--node-ip` flag. + - **Original Solution** ❌: Use `{{.BootstrapNodeIP}}` Go template variable. This doesn't work because the variable is only populated from the `OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP` environment variable, which isn't set for IPI deployments. + - **Updated Solution** ✅: Detect the bootstrap IP at **shell runtime** using `ip route get`. See Task 1.8 "Issue B" for root cause analysis and Task 2.7 for implementation details. 2. **KAS pod volume mount for UDS socket** ✅ Resolved - **Issue**: The bootstrap KAS pod needs access to the Konnectivity UDS socket. @@ -1312,10 +1781,11 @@ The log bundle includes: ``` 5. Deploy a test webhook in the cluster (e.g., a simple validating webhook) 6. Confirm the bootstrap KAS can reach the webhook (create a resource that triggers the webhook) -7. Complete bootstrap teardown and verify agent DaemonSet is removed: +7. Complete bootstrap teardown and verify Konnectivity resources are removed: ```bash + # Both should return "not found" after teardown kubectl get daemonset -n kube-system konnectivity-agent - # Should return "not found" after teardown + kubectl get secret -n kube-system konnectivity-agent-certs ``` 8. 
Confirm production KAS on cluster nodes can still reach webhooks normally @@ -1356,18 +1826,9 @@ The following items are out of scope for the initial proof-of-concept but should - Follows existing patterns (similar to other manifests in `pkg/asset/manifests/`) - Requires solving bootstrap IP availability at manifest generation time -### 3. Agent-to-Server Authentication (mTLS) +### 3. ~~Agent-to-Server Authentication (mTLS)~~ ✅ Now Implemented -**Current approach**: Agents connect to the Konnectivity server over unauthenticated TCP on port 8091. +**Status**: This is now implemented in the PoC via Task 2.1 and Task 2.1.1. -**Security risks**: -- Any process on the infrastructure network could connect to the Konnectivity server -- Potential for unauthorized tunnel access during the bootstrap window -- Does not meet production security requirements +The implementation uses the existing `RootCA` to sign both server and agent certificates at runtime on the bootstrap node. Certificates have 1-day validity, appropriate for the temporary bootstrap phase. -**Post-PoC implementation** (see Task 2.1 for deferred work): -- Generate a Konnectivity CA certificate as part of the installer's TLS asset pipeline -- Generate server certificate for the Konnectivity server (agent endpoint) -- Generate agent certificates or use a shared CA for agent authentication -- Configure mTLS on both server (`--cluster-cert`, `--cluster-key`, `--server-ca-cert`) and agents (`--ca-cert`, `--agent-cert`, `--agent-key`) -- Reference HyperShift's implementation in `control-plane-operator/controllers/hostedcontrolplane/pki/konnectivity.go` From 3674072008d4b6b584a1f8e6da2314a333599401 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 12:32:55 +0000 Subject: [PATCH 16/22] Implement potential fix for agent startup --- .../files/usr/local/bin/bootkube.sh.template | 109 +++++++++++++++++- konnectivity-plan.md | 35 ++++-- pkg/destroy/bootstrap/bootstrap.go | 8 ++ 3 files changed, 143 insertions(+), 9 deletions(-) diff --git a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template index fd30525978..77decc0348 100755 --- a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template +++ b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template @@ -245,6 +245,64 @@ then record_service_stage_success fi +# Detect bootstrap node IP at runtime using the default route source address +# This is needed for Konnectivity agents to connect back to the bootstrap server. +{{- if .UseIPv6ForNodeIP }} +BOOTSTRAP_NODE_IP=$(ip -6 route get 2001:4860:4860::8888 2>/dev/null | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1); exit}') +{{- else }} +BOOTSTRAP_NODE_IP=$(ip route get 1.1.1.1 2>/dev/null | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1); exit}') +{{- end }} +echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}" + +# Generate Konnectivity certificates signed by RootCA (1-day validity) +# These are needed for mTLS between Konnectivity server and agents. +if [ ! -f konnectivity-certs.done ] +then + record_service_stage_start "konnectivity-certs" + echo "Generating Konnectivity certificates..." 
+ + KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity + mkdir -p "${KONNECTIVITY_CERT_DIR}" + + # Server certificate for agent endpoint + openssl req -new -newkey rsa:2048 -nodes \ + -keyout "${KONNECTIVITY_CERT_DIR}/server.key" \ + -out "${KONNECTIVITY_CERT_DIR}/server.csr" \ + -subj "/CN=konnectivity-server/O=openshift" + + openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/server.csr" \ + -CA /opt/openshift/tls/root-ca.crt \ + -CAkey /opt/openshift/tls/root-ca.key \ + -CAcreateserial \ + -out "${KONNECTIVITY_CERT_DIR}/server.crt" \ + -days 1 \ + -extfile <(printf "extendedKeyUsage=serverAuth\nsubjectAltName=IP:${BOOTSTRAP_NODE_IP}") + + # Agent client certificate (shared by all agents) + openssl req -new -newkey rsa:2048 -nodes \ + -keyout "${KONNECTIVITY_CERT_DIR}/agent.key" \ + -out "${KONNECTIVITY_CERT_DIR}/agent.csr" \ + -subj "/CN=konnectivity-agent/O=openshift" + + openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/agent.csr" \ + -CA /opt/openshift/tls/root-ca.crt \ + -CAkey /opt/openshift/tls/root-ca.key \ + -CAcreateserial \ + -out "${KONNECTIVITY_CERT_DIR}/agent.crt" \ + -days 1 \ + -extfile <(printf "extendedKeyUsage=clientAuth") + + # Copy CA cert to konnectivity directory for server to validate agents + cp /opt/openshift/tls/root-ca.crt "${KONNECTIVITY_CERT_DIR}/ca.crt" + + # Clean up CSR files + rm -f "${KONNECTIVITY_CERT_DIR}"/*.csr + + echo "Generated Konnectivity certificates in ${KONNECTIVITY_CERT_DIR}" + touch konnectivity-certs.done + record_service_stage_success +fi + # Create Konnectivity server static pod manifest # This runs on the bootstrap node and provides a proxy for the bootstrap KAS # to reach webhooks in the cluster's pod network. @@ -272,6 +330,9 @@ spec: args: - --logtostderr=true - --uds-name=/etc/kubernetes/bootstrap-configs/konnectivity-server.socket + - --cluster-cert=/etc/konnectivity/server.crt + - --cluster-key=/etc/konnectivity/server.key + - --cluster-ca-cert=/etc/konnectivity/ca.crt - --agent-port=8091 - --health-port=2041 - --mode=http-connect @@ -287,11 +348,18 @@ spec: volumeMounts: - name: config-dir mountPath: /etc/kubernetes/bootstrap-configs + - name: konnectivity-certs + mountPath: /etc/konnectivity + readOnly: true volumes: - name: config-dir hostPath: path: /etc/kubernetes/bootstrap-configs type: DirectoryOrCreate + - name: konnectivity-certs + hostPath: + path: /opt/openshift/tls/konnectivity + type: Directory EOF touch konnectivity-server-bootstrap.done @@ -620,6 +688,34 @@ then record_service_stage_success fi +# Create Konnectivity agent certificate Secret manifest for cluster deployment +# This Secret contains the TLS certs agents use to authenticate to the server. +if [ ! -f konnectivity-agent-certs-manifest.done ] +then + record_service_stage_start "konnectivity-agent-certs-manifest" + echo "Creating Konnectivity agent certificate Secret manifest..." + + KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity + cat > manifests/konnectivity-agent-certs.yaml < Date: Fri, 30 Jan 2026 12:36:17 +0000 Subject: [PATCH 17/22] Claude: add note on OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP --- konnectivity-plan.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index a8e67a6424..848bdab3aa 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -1709,6 +1709,27 @@ These should be investigated during implementation if they become blocking: This section describes manual testing procedures for validating the Konnectivity integration on a real cluster. 
+### Preserving the Bootstrap Node for Debugging + +By default, the installer destroys the bootstrap node after cluster creation completes. To preserve it for debugging: + +```bash +export OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP=true +openshift-install create cluster --dir +``` + +When set, the installer will: +- Skip destroying bootstrap infrastructure +- Log a warning: "OPENSHIFT_INSTALL_PRESERVE_BOOTSTRAP is set, not destroying bootstrap resources." +- Still shut down the local CAPI control plane + +After debugging, manually destroy the bootstrap resources: +```bash +openshift-install destroy bootstrap --dir +``` + +**Warning**: Do not leave bootstrap resources running long-term. The bootstrap node has elevated privileges and should be removed once debugging is complete. + ### Obtaining the Bootstrap IP The bootstrap node's public IP can be extracted from the CAPI Machine manifest stored in `.clusterapi_output/`: From 65a327d71f337106980012d11ef7ee8892cca45a Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 14:36:37 +0000 Subject: [PATCH 18/22] Claude: Iterating on PKI --- konnectivity-plan.md | 61 +++++++++++++++++++++++++++----------------- 1 file changed, 38 insertions(+), 23 deletions(-) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 848bdab3aa..c16bd12344 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -590,7 +590,9 @@ This approach is **insecure** but acceptable for a PoC because: ##### Future Work -For production readiness, Task 2.1 (certificate generation) would need to be completed to add mTLS authentication. This is out of scope for the initial PoC. +~~For production readiness, Task 2.1 (certificate generation) would need to be completed to add mTLS authentication. This is out of scope for the initial PoC.~~ + +**Update**: mTLS authentication is now implemented via Task 2.1 using a self-signed Konnectivity CA generated at runtime. --- @@ -789,7 +791,9 @@ The installer generates ~10 Certificate Authorities in `pkg/asset/tls/`. Key CAs | AggregatorSignerCertKey | `tls/aggregator-signer.crt` | 1 day | Signs aggregator client | No (short-lived) | | KubeAPIServerToKubeletSignerCertKey | `tls/kube-apiserver-to-kubelet-signer.crt` | 1 year | KAS→kubelet auth | No (wrong purpose) | -**Recommended CA**: Reuse the existing **RootCA** (`/opt/openshift/tls/root-ca.crt` and `.key`) which is already available on the bootstrap node. This avoids creating new installer TLS assets while providing full mTLS support. Certificates signed by RootCA will have 1-day validity, appropriate for the temporary bootstrap phase. +~~**Recommended CA**: Reuse the existing **RootCA** (`/opt/openshift/tls/root-ca.crt` and `.key`) which is already available on the bootstrap node.~~ + +**Correction**: The RootCA private key (`root-ca.key`) is **NOT available on the bootstrap node** - only the certificate is copied (see [common.go:688-690](pkg/asset/ignition/bootstrap/common.go#L688-L690)). The recommended approach is to generate a **self-signed Konnectivity CA** at runtime on the bootstrap node using `openssl`. This is simpler (no Go code changes), more secure (isolated from RootCA), and allows the server certificate to include the correct bootstrap IP SAN. See Task 2.1 for implementation. ##### 4. HyperShift Konnectivity PKI Implementation @@ -861,12 +865,14 @@ For the bootstrap scenario, the minimum configuration is: #### Recommended Implementation Approach -Generate certificates at **runtime on the bootstrap node** using the existing `RootCA`. 
This approach is simpler than creating new installer TLS assets and solves the "bootstrap IP not known at ignition time" problem. +Generate certificates at **runtime on the bootstrap node** using a **self-signed Konnectivity CA**. This approach is simpler than creating new installer TLS assets and solves the "bootstrap IP not known at ignition time" problem. + +**Note**: The original plan recommended using RootCA, but `root-ca.key` is not available on the bootstrap node. A self-signed Konnectivity CA is more secure (isolated) and equally effective. 1. **Certificate generation** (in `bootkube.sh` at runtime): - - Use `openssl` to generate Konnectivity server cert (1-day validity, signed by RootCA) - - Use `openssl` to generate shared agent client cert (1-day validity, signed by RootCA) - - RootCA is already available at `/opt/openshift/tls/root-ca.{crt,key}` + - Use `openssl` to generate a self-signed Konnectivity CA (1-day validity) + - Use `openssl` to generate Konnectivity server cert (1-day validity, signed by Konnectivity CA) + - Use `openssl` to generate shared agent client cert (1-day validity, signed by Konnectivity CA) 2. **Certificate deployment**: - Server certs: Written to bootstrap node filesystem, mounted into Konnectivity server static pod @@ -876,10 +882,12 @@ Generate certificates at **runtime on the bootstrap node** using the existing `R ``` bootkube.sh (runtime on bootstrap node) │ - ├─→ Generate konnectivity-server.{crt,key} (signed by RootCA) + ├─→ Generate konnectivity-signer.{crt,key} (self-signed CA) + │ + ├─→ Generate konnectivity-server.{crt,key} (signed by Konnectivity CA) │ └─→ Mount in Konnectivity server static pod │ - ├─→ Generate konnectivity-agent.{crt,key} (signed by RootCA) + ├─→ Generate konnectivity-agent.{crt,key} (signed by Konnectivity CA) │ └─→ Package into Secret manifest │ └─→ Write manifests/konnectivity-agent-certs.yaml @@ -893,10 +901,11 @@ Generate certificates at **runtime on the bootstrap node** using the existing `R 5. **Benefits**: - No new Go TLS assets required - - Certificates generated when bootstrap IP is known - - Short 1-day validity appropriate for temporary bootstrap agents + - Certificates generated when bootstrap IP is known (allows correct IP SAN) + - Short 1-day validity appropriate for temporary bootstrap components - Full mTLS authentication between server and agents - All agents share a single client certificate + - Isolated from RootCA (more secure) #### Unauthenticated Mode: Validated as Broken ❌ @@ -926,11 +935,13 @@ E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection ### Task 2.1: Generate Konnectivity Certificates at Runtime -**Status**: ✅ Complete +**Status**: 🔄 In Progress -**Objective**: Generate Konnectivity server and agent certificates at runtime on the bootstrap node using the existing RootCA. +**Objective**: Generate Konnectivity server and agent certificates at runtime on the bootstrap node using a self-signed Konnectivity CA. -**Approach**: Use `openssl` commands in `bootkube.sh` to generate certificates signed by `/opt/openshift/tls/root-ca.{crt,key}`. +**Approach**: Use `openssl` commands in `bootkube.sh` to generate a self-signed Konnectivity CA, then sign the server and agent certificates with it. This approach is entirely self-contained on the bootstrap node and requires no Go code changes. + +**Why self-signed CA**: The original approach attempted to use RootCA, but `root-ca.key` is not available on the bootstrap node (only the certificate is copied). 
A self-signed Konnectivity CA is simpler, more secure (isolated from RootCA), and can be generated entirely at runtime. **Files to modify**: @@ -941,10 +952,17 @@ E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection **Implementation** (add to bootkube.sh.template before Konnectivity server static pod creation): ```bash - # Generate Konnectivity certificates signed by RootCA (1-day validity) + # Generate Konnectivity certificates with self-signed CA (1-day validity) KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity mkdir -p "${KONNECTIVITY_CERT_DIR}" + # Generate self-signed Konnectivity CA + openssl req -x509 -newkey rsa:2048 -nodes \ + -keyout "${KONNECTIVITY_CERT_DIR}/ca.key" \ + -out "${KONNECTIVITY_CERT_DIR}/ca.crt" \ + -days 1 \ + -subj "/CN=konnectivity-signer/O=openshift" + # Server certificate for agent endpoint openssl req -new -newkey rsa:2048 -nodes \ -keyout "${KONNECTIVITY_CERT_DIR}/server.key" \ @@ -952,8 +970,8 @@ E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection -subj "/CN=konnectivity-server/O=openshift" openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/server.csr" \ - -CA /opt/openshift/tls/root-ca.crt \ - -CAkey /opt/openshift/tls/root-ca.key \ + -CA "${KONNECTIVITY_CERT_DIR}/ca.crt" \ + -CAkey "${KONNECTIVITY_CERT_DIR}/ca.key" \ -CAcreateserial \ -out "${KONNECTIVITY_CERT_DIR}/server.crt" \ -days 1 \ @@ -966,16 +984,13 @@ E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection -subj "/CN=konnectivity-agent/O=openshift" openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/agent.csr" \ - -CA /opt/openshift/tls/root-ca.crt \ - -CAkey /opt/openshift/tls/root-ca.key \ + -CA "${KONNECTIVITY_CERT_DIR}/ca.crt" \ + -CAkey "${KONNECTIVITY_CERT_DIR}/ca.key" \ -CAcreateserial \ -out "${KONNECTIVITY_CERT_DIR}/agent.crt" \ -days 1 \ -extfile <(printf "extendedKeyUsage=clientAuth") - # Copy CA cert to konnectivity directory for server to validate agents - cp /opt/openshift/tls/root-ca.crt "${KONNECTIVITY_CERT_DIR}/ca.crt" - # Clean up CSR files rm -f "${KONNECTIVITY_CERT_DIR}"/*.csr @@ -986,9 +1001,9 @@ E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection | File | CN | ExtKeyUsage | Validity | SAN | |------|-------|-------------|----------|-----| +| `ca.crt` | `konnectivity-signer` | CA | 1 day | None | | `server.crt` | `konnectivity-server` | `serverAuth` | 1 day | `IP:${BOOTSTRAP_NODE_IP}` | | `agent.crt` | `konnectivity-agent` | `clientAuth` | 1 day | None | -| `ca.crt` | `root-ca` | N/A | 10 years | N/A (copy of RootCA) | **Ordering dependency**: Certificate generation must occur **after** `BOOTSTRAP_NODE_IP` detection (Task 2.7) and **before** static pod manifest creation (Task 2.2). @@ -1854,7 +1869,7 @@ The following items are out of scope for the initial proof-of-concept but should **Status**: This is now implemented in the PoC via Task 2.1 and Task 2.1.1. -The implementation uses the existing `RootCA` to sign both server and agent certificates at runtime on the bootstrap node. Certificates have 1-day validity, appropriate for the temporary bootstrap phase. +The implementation generates a self-signed Konnectivity CA at runtime on the bootstrap node, then uses it to sign both server and agent certificates. This approach is more secure than using RootCA (isolation) and simpler than creating Go assets (no code changes required). Certificates have 1-day validity, appropriate for the temporary bootstrap phase. ### 4. 
Bootstrap Node IP Detection Refinement From 4d487f0817de915d0ae5c4325e0af0e8ac34197c Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 14:43:34 +0000 Subject: [PATCH 19/22] Update Konnectivity PKI --- .../files/usr/local/bin/bootkube.sh.template | 23 +++++++++++-------- konnectivity-plan.md | 8 ++++++- 2 files changed, 21 insertions(+), 10 deletions(-) diff --git a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template index 77decc0348..e4c838adf5 100755 --- a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template +++ b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template @@ -254,8 +254,9 @@ BOOTSTRAP_NODE_IP=$(ip route get 1.1.1.1 2>/dev/null | awk '{for(i=1;i<=NF;i++) {{- end }} echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}" -# Generate Konnectivity certificates signed by RootCA (1-day validity) +# Generate Konnectivity certificates with self-signed CA (1-day validity) # These are needed for mTLS between Konnectivity server and agents. +# We use a self-signed CA because root-ca.key is not available on the bootstrap node. if [ ! -f konnectivity-certs.done ] then record_service_stage_start "konnectivity-certs" @@ -264,6 +265,13 @@ then KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity mkdir -p "${KONNECTIVITY_CERT_DIR}" + # Generate self-signed Konnectivity CA + openssl req -x509 -newkey rsa:2048 -nodes \ + -keyout "${KONNECTIVITY_CERT_DIR}/ca.key" \ + -out "${KONNECTIVITY_CERT_DIR}/ca.crt" \ + -days 1 \ + -subj "/CN=konnectivity-signer/O=openshift" + # Server certificate for agent endpoint openssl req -new -newkey rsa:2048 -nodes \ -keyout "${KONNECTIVITY_CERT_DIR}/server.key" \ @@ -271,8 +279,8 @@ then -subj "/CN=konnectivity-server/O=openshift" openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/server.csr" \ - -CA /opt/openshift/tls/root-ca.crt \ - -CAkey /opt/openshift/tls/root-ca.key \ + -CA "${KONNECTIVITY_CERT_DIR}/ca.crt" \ + -CAkey "${KONNECTIVITY_CERT_DIR}/ca.key" \ -CAcreateserial \ -out "${KONNECTIVITY_CERT_DIR}/server.crt" \ -days 1 \ @@ -285,16 +293,13 @@ then -subj "/CN=konnectivity-agent/O=openshift" openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/agent.csr" \ - -CA /opt/openshift/tls/root-ca.crt \ - -CAkey /opt/openshift/tls/root-ca.key \ + -CA "${KONNECTIVITY_CERT_DIR}/ca.crt" \ + -CAkey "${KONNECTIVITY_CERT_DIR}/ca.key" \ -CAcreateserial \ -out "${KONNECTIVITY_CERT_DIR}/agent.crt" \ -days 1 \ -extfile <(printf "extendedKeyUsage=clientAuth") - # Copy CA cert to konnectivity directory for server to validate agents - cp /opt/openshift/tls/root-ca.crt "${KONNECTIVITY_CERT_DIR}/ca.crt" - # Clean up CSR files rm -f "${KONNECTIVITY_CERT_DIR}"/*.csr @@ -709,7 +714,7 @@ type: Opaque data: tls.crt: $(base64 -w0 "${KONNECTIVITY_CERT_DIR}/agent.crt") tls.key: $(base64 -w0 "${KONNECTIVITY_CERT_DIR}/agent.key") - ca.crt: $(base64 -w0 /opt/openshift/tls/root-ca.crt) + ca.crt: $(base64 -w0 "${KONNECTIVITY_CERT_DIR}/ca.crt") EOF touch konnectivity-agent-certs-manifest.done diff --git a/konnectivity-plan.md b/konnectivity-plan.md index c16bd12344..1a09851a2c 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -935,7 +935,7 @@ E0130 "cannot connect once" err="...dial tcp 127.0.0.1:8091: connect: connection ### Task 2.1: Generate Konnectivity Certificates at Runtime -**Status**: 🔄 In Progress +**Status**: ✅ Complete **Objective**: Generate Konnectivity server and agent certificates at runtime on the bootstrap node using a self-signed Konnectivity CA. 
@@ -1072,6 +1072,8 @@ volumeMounts: ### Task 2.2: Deploy Konnectivity Server on Bootstrap +**Status**: ✅ Complete + **Objective**: Add a Konnectivity server to the bootstrap environment. **Approach**: Deploy as a **static pod** (consistent with other bootstrap components like KAS, etcd). @@ -1150,6 +1152,8 @@ This follows the same pattern used for other bootstrap components where images a ### Task 2.3: Configure Bootstrap KAS with EgressSelectorConfiguration +**Status**: ✅ Complete + **Objective**: Configure the bootstrap KAS to proxy cluster-bound traffic through Konnectivity. **Approach**: Use `--config-override-files` flag (validated in Task 1.1.1). @@ -1193,6 +1197,8 @@ This avoids any need for pod manifest post-processing. ### Task 2.4: Create Konnectivity Agent DaemonSet +**Status**: ✅ Complete + **Objective**: Deploy Konnectivity agents on cluster nodes to connect back to the bootstrap server. **Important**: The DaemonSet runs on **cluster nodes** (masters/workers), not the bootstrap node. The bootstrap node's kubelet runs in standalone mode and doesn't connect to the API server. The DaemonSet must be deployed into the cluster via the bootstrap KAS. From e740f6df13b26ac001936a570930a6759cd17d44 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 15:31:46 +0000 Subject: [PATCH 20/22] Fix server and agent startup --- .../bootstrap/files/usr/local/bin/bootkube.sh.template | 3 ++- pkg/asset/manifests/aws/cluster.go | 7 +++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template index e4c838adf5..bea0e68fd0 100755 --- a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template +++ b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template @@ -334,10 +334,11 @@ spec: - /usr/bin/proxy-server args: - --logtostderr=true - - --uds-name=/etc/kubernetes/bootstrap-configs/konnectivity-server.socket - --cluster-cert=/etc/konnectivity/server.crt - --cluster-key=/etc/konnectivity/server.key - --cluster-ca-cert=/etc/konnectivity/ca.crt + - --uds-name=/etc/kubernetes/bootstrap-configs/konnectivity-server.socket + - --server-port=0 - --agent-port=8091 - --health-port=2041 - --mode=http-connect diff --git a/pkg/asset/manifests/aws/cluster.go b/pkg/asset/manifests/aws/cluster.go index 6be52b1674..c7b611631b 100644 --- a/pkg/asset/manifests/aws/cluster.go +++ b/pkg/asset/manifests/aws/cluster.go @@ -141,6 +141,13 @@ func GenerateClusterAssets(ic *installconfig.InstallConfig, clusterID *installco ToPort: 10259, SourceSecurityGroupRoles: []capa.SecurityGroupRole{"controlplane", "node"}, }, + { + Description: "Konnectivity agent traffic from cluster nodes", + Protocol: capa.SecurityGroupProtocolTCP, + FromPort: 8091, + ToPort: 8091, + SourceSecurityGroupRoles: []capa.SecurityGroupRole{"controlplane", "node"}, + }, { Description: BootstrapSSHDescription, Protocol: capa.SecurityGroupProtocolTCP, From 22c145d0e2b693708268bf24123de678937579bb Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 15:32:22 +0000 Subject: [PATCH 21/22] Claude: Add task to fix security group on other platforms --- konnectivity-plan.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/konnectivity-plan.md b/konnectivity-plan.md index 1a09851a2c..ccb3401366 100644 --- a/konnectivity-plan.md +++ b/konnectivity-plan.md @@ -1893,3 +1893,25 @@ The implementation generates a self-signed Konnectivity CA at runtime on the boo - Test behavior in 
air-gapped environments where external IPs are unreachable - Investigate platform-specific solutions (e.g., metadata service for cloud platforms) +### 5. Firewall Rules for Konnectivity Port on All Platforms + +**Current approach**: AWS security group rules for port 8091 were added to [pkg/asset/manifests/aws/cluster.go](pkg/asset/manifests/aws/cluster.go) in `AdditionalControlPlaneIngressRules`. This allows Konnectivity agents on cluster nodes to connect to the Konnectivity server on the bootstrap node. + +**Platforms requiring similar changes**: + +| Platform | File/Location | Mechanism | +|----------|---------------|-----------| +| AWS | `pkg/asset/manifests/aws/cluster.go` | ✅ Implemented - `AdditionalControlPlaneIngressRules` | +| Azure | `pkg/asset/manifests/azure/` | Network Security Group rules | +| GCP | `pkg/asset/manifests/gcp/` | Firewall rules | +| vSphere | N/A (typically no installer-managed firewall) | Documentation only | +| OpenStack | `pkg/asset/manifests/openstack/` | Security group rules | +| Nutanix | `pkg/asset/manifests/nutanix/` | Security group rules | +| IBM Cloud | `pkg/asset/manifests/ibmcloud/` | Security group rules | +| Bare Metal | N/A (no installer-managed firewall) | Documentation only | + +**Post-PoC implementation**: +- Add port 8091 ingress rules to each platform's infrastructure manifests +- For platforms without installer-managed firewalls (vSphere, bare metal), document the requirement in UPI documentation +- Test Konnectivity connectivity on each platform during cluster bootstrap + From 9917b66bbfd0f5c659c7e11d5213d4c689ff2d00 Mon Sep 17 00:00:00 2001 From: Matthew Booth Date: Fri, 30 Jan 2026 17:22:20 +0000 Subject: [PATCH 22/22] Fix bootstrap KAS arguments --- .../opt/openshift/egress-selector-config.yaml | 25 +++++++++++-------- .../konnectivity-config-override.yaml | 5 ++++ .../files/usr/local/bin/bootkube.sh.template | 4 ++- 3 files changed, 23 insertions(+), 11 deletions(-) create mode 100644 data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml diff --git a/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml b/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml index 544fc76bab..49332e9b47 100644 --- a/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml +++ b/data/data/bootstrap/files/opt/openshift/egress-selector-config.yaml @@ -1,10 +1,15 @@ -apiVersion: kubecontrolplane.config.openshift.io/v1 -kind: KubeAPIServerConfig -egressSelectorConfiguration: - egressSelections: - - name: "cluster" - connection: - proxyProtocol: "HTTPConnect" - transport: - uds: - udsName: "/etc/kubernetes/bootstrap-configs/konnectivity-server.socket" +apiVersion: apiserver.k8s.io/v1beta1 +kind: EgressSelectorConfiguration +egressSelections: +- name: "cluster" + connection: + proxyProtocol: "HTTPConnect" + transport: + uds: + udsName: "/etc/kubernetes/config/konnectivity-server.socket" +- name: "controlplane" + connection: + proxyProtocol: "Direct" +- name: "etcd" + connection: + proxyProtocol: "Direct" diff --git a/data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml b/data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml new file mode 100644 index 0000000000..034779e03f --- /dev/null +++ b/data/data/bootstrap/files/opt/openshift/konnectivity-config-override.yaml @@ -0,0 +1,5 @@ +apiVersion: kubecontrolplane.config.openshift.io/v1 +kind: KubeAPIServerConfig +apiServerArguments: + egress-selector-config-file: + - 
"/etc/kubernetes/config/egress-selector-config.yaml" diff --git a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template index bea0e68fd0..3578770661 100755 --- a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template +++ b/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template @@ -397,9 +397,11 @@ then --rendered-manifest-files=/assets/manifests \ --payload-version=$VERSION \ --operand-kubernetes-version="${KUBERNETES_VERSION}" \ - --config-override-files=/assets/egress-selector-config.yaml + --config-override-files=/assets/konnectivity-config-override.yaml cp kube-apiserver-bootstrap/config /etc/kubernetes/bootstrap-configs/kube-apiserver-config.yaml + # Copy egress selector config to bootstrap-configs where KAS can read it + cp /opt/openshift/egress-selector-config.yaml /etc/kubernetes/bootstrap-configs/egress-selector-config.yaml cp kube-apiserver-bootstrap/bootstrap-manifests/* bootstrap-manifests/ cp kube-apiserver-bootstrap/manifests/* manifests/