CNTRLPLANE-2640: Add HyperShift private CAPI types enhancement #1927
csrwng wants to merge 1 commit into openshift:master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
We should call out all the CAPI platforms we support: CAPA, CAPZ, CAPV, CAP-Agent, CAPG, etc. |
@csrwng: This pull request references OCPSTRAT-2789 which is a valid Jira issue.

@csrwng: This pull request references CNTRLPLANE-2640 which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.
This enhancement proposes isolating HyperShift's Cluster API (CAPI) CRDs from those installed by the OpenShift platform on management clusters. As OpenShift evolves toward using CAPI for standalone cluster machine management, a conflict emerges: both the platform and HyperShift need to install CAPI CRDs on the same management cluster, potentially with incompatible versions. The proposal introduces two major components: 1. Private CAPI Types and API Proxy: HyperShift-specific CAPI CRDs using the cluster.hypershift.openshift.io group, with an API proxy sidecar to transparently translate between standard and private CAPI types. 2. Automatic Migration: A migration controller that automatically converts existing hosted clusters from standard to private CAPI types without disrupting operations. This enables independent version management for both HyperShift and platform CAPI dependencies while maintaining transparent operation for hosted cluster administrators and workloads.
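To make the first component concrete, a resource under the private group would mirror its standard CAPI counterpart, differing only in the API group. A hypothetical manifest fragment (names illustrative, not taken from the enhancement text):

```yaml
# Hypothetical Machine mirrored under the HyperShift-private API group.
# Everything except apiVersion matches the standard cluster.x-k8s.io shape.
apiVersion: cluster.hypershift.openshift.io/v1beta1
kind: Machine
metadata:
  name: example-machine
  namespace: clusters-example
spec:
  clusterName: example
  bootstrap:
    dataSecretName: example-bootstrap-data
```

The API proxy sidecar would let controllers keep reading and writing the standard group while a resource of this shape is what is actually stored.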
@csrwng: all tests passed! Full PR test history. Your PR dashboard.
| d. **Workload Update and Resume**: | ||
| - Update CAPI-dependent workload deployments to include the API proxy sidecar | ||
| - For deployments managed by the Control Plane Operator, update the CPO deployment to signal it should add proxy sidecars |
Does this imply a backport to the minimum supported HC version, so autoscaler/machine-approver get their spec updated with the proxy?
| Historically, HyperShift management clusters did not use Cluster API (CAPI) for their own machine management, relying instead on the OpenShift Machine API. This allowed HyperShift to install and manage its own version of CAPI CRDs, effectively owning the CAPI types on the management cluster. | ||
| With OpenShift's evolution toward using CAPI for standalone cluster machine management, a critical conflict emerges: both the platform and HyperShift will need to install CAPI CRDs on the same management cluster. If these CRD versions are incompatible, neither the platform nor HyperShift can function correctly. | ||
Might be worth mentioning that MCE, which is the delivery mechanism for self-hosted HCP, also has a desire to handle its own cluster.x-k8s.io CRDs.
Also, standalone has a toggle to not clobber these CRDs.
| This enhancement introduces new CRDs that mirror the standard CAPI CRDs but use the `cluster.hypershift.openshift.io` API group: | ||
| - `Cluster.cluster.hypershift.openshift.io` |
Also MHC and the other CRDs that the controllers require to work.
| **dual operator architecture** consists of two HyperShift operator instances running simultaneously: one supporting private CAPI types (new) and one using standard CAPI types (legacy). | ||
| 1. The platform administrator upgrades their HyperShift operator to a version that supports the `--private-capi-types` flag and runs `hypershift install --private-capi-types`. |
Having this as a flag would require updating the delivery mechanisms for managed and self-hosted. Does it really need to be a flag? When would you opt out?
| | `hypershift.openshift.io/private-capi-types: "true"` | New Operator | Successfully migrated clusters | | ||
| | `hypershift.openshift.io/scope: "legacy"` | Legacy Operator | Existing clusters awaiting migration | | ||
| | `hypershift.openshift.io/migration-in-progress: "true"` | Migration Controller | Clusters actively being migrated (neither operator reconciles) | | ||
| | `hypershift.openshift.io/migration-failed` | Legacy Operator | Previous migration failed; requires SRE remediation before retry | |
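Read together, the table describes an annotation-driven handoff between the two operators; for example, a hosted cluster mid-migration might carry (hypothetical manifest fragment, annotation name as in the table):

```yaml
# HostedCluster during migration: neither operator reconciles it
# while this annotation is present.
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
  annotations:
    hypershift.openshift.io/migration-in-progress: "true"
```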
I think this whole process is more sensitive for self-hosted, in which case all of this would need to be documented and would impact users directly with additional burden.
Also, even though it is not recommended, there are users who might be consuming Machine CRs directly, especially in bare-metal scenarios, to e.g. annotate the next one for deletion on scale-down. It would be good to collect some feedback.
| * Isolate HyperShift's CAPI CRD dependencies from the platform's CAPI CRDs by using a distinct API group (`cluster.hypershift.openshift.io`). | ||
| * Enable HyperShift components to continue using standard CAPI client libraries without modification through a transparent API proxy. | ||
| * Automatically migrate existing HyperShift installations to use the private CAPI types without user intervention or hosted cluster downtime. | ||
| * Ensure zero user-facing impact - hosted cluster administrators and workloads should experience no behavioral changes. |
Should probably call out that there's a period where the ability to operate would be degraded, e.g. the ability to scale the data plane while the controllers are scaled down.
We don't expect anyone ever runs `oc get machines.cluster.x-k8s.io`?
| ### Alternative 1: Coordinate CAPI Versions Between Platform and HyperShift | ||
| Instead of isolating CAPI types, ensure that the platform and HyperShift always use compatible CAPI versions through tight coordination. |
Coupling from standalone management clusters would be solved by the flag they provide to not clobber the CRDs, leaving the only possible conflict with MCE. Each MCE version bundles a pinned version of HyperShift. What would prevent MCE and the HO from running with the same latest CAPI APIs release for each downstream cycle?
The mechanism we've designed was also intended to extend to MCE/others in the future so that different components could be configured to ignore CRD management while using our CompatibilityRequirement system to ensure the CRD manager doesn't install something that breaks them
The requirement from a hypershift side once standalone implements this is:
- Configure our config object to tell us which CRDs you are installing
- Don't try and remove an API group we still rely on
The latter creates a coupling between HyperShift and OpenShift, but our support contracts mean we have to support different API versions for some period already, and I fear those periods will extend over time to the point where this doesn't actually impact this conversation at all.
Adding on to what Alberto mentioned, MCE has already aligned its CAPI version with downstream CAPI starting in OCP v4.19. It's unclear why HyperShift cannot follow the same approach.
Regarding platform support, my understanding is that HyperShift, similar to MCE, targets an N-2 alignment with OCP. If that's the case, the API support lifetime should already be covered within the existing support matrix.
Overall, the proposal provides a solution that addresses the CRD conflict issue, but the operational cost of an apiGroup proxy conversion (e.g., webhook handling, RBAC extensions, etc.) seems significant compared to simply aligning HyperShift with the downstream CAPI version; the trade-off is questionable.
One additional point related to the Integration Test section (and potentially a risk):
The test scenarios should include running standard CAPI and CAPI providers (e.g., CAPA) side by side with HyperShift in the same management cluster. This is already an MCE use case, and we do not have a solution for it in MCE releases v2.10/v2.11.
MCE version bundles a pinned version of hypershift. What would prevent MCE and HO from running with the same latest capi APIs release for each downstream cycle?
This seems like the easy-fix (but maybe hard-to-maintain) approach. My understanding is that MCE's CAPI is aligned with OCP's integration in the latest release, but the HyperShift that is bundled in MCE is a version behind (for its CAPI), which is causing issues.
| Historically, HyperShift management clusters did not use Cluster API (CAPI) for their own machine management, relying instead on the OpenShift Machine API. This allowed HyperShift to install and manage its own version of CAPI CRDs, effectively owning the CAPI types on the management cluster. | ||
| With OpenShift's evolution toward using CAPI for standalone cluster machine management, a critical conflict emerges: both the platform and HyperShift will need to install CAPI CRDs on the same management cluster. If these CRD versions are incompatible, neither the platform nor HyperShift can function correctly. |
How often do we expect this to really happen? What does compatible really mean?
What is the version skew that HyperShift operator supports between itself and its management cluster?
My read of this situation is that, from a compatibility perspective, if the HyperShift operator is managing the CRD lifecycle, then the cluster CAPI system is compatible as long as HyperShift is still installing the same API version and no fields that we care about have been removed from the spec. Additional fields won't matter; validation that is tightened will be considered upstream and ratcheted. And we can run in-cluster validation that checks that resources in the openshift-cluster-api namespace validate against both the original schema for that cluster and the HyperShift schema.
The only major problem I see would be if the HyperShift operator wanted to install CRDs that did not have an API version that the cluster CAPI was relying on. I'm expecting at least some amount of carrying downstream as CAPI APIs evolve to support skip-level upgrades in the future; HyperShift may end up in that same predicament independent of this requirement.
| * As a HyperShift platform engineer, I want HyperShift to use isolated CAPI CRDs, so that I can ensure compatibility between platform and HyperShift CAPI versions without coordination overhead. | ||
| * As a management cluster administrator, I want to upgrade my standalone OpenShift cluster's CAPI implementation independently from HyperShift, so that I can adopt new platform features without risking HyperShift stability. |
The mechanism we are building presently (which I think we have to build regardless of HyperShift adopting it or not) would be usable bi-directionally for HyperShift and OpenShift to protect their own concerns.
Realistically, we expect the HyperShift operator is likely ahead of the management cluster at all times, no?
| ### Goals | ||
| * Isolate HyperShift's CAPI CRD dependencies from the platform's CAPI CRDs by using a distinct API group (`cluster.hypershift.openshift.io`). |
What are the implications of this from an end user perspective?
For SRE, does this mean they have to rework their tooling to look at the different API groups? Do console/UIs now need to support multiple different types of machine for the long term?
| * Supporting both standard and private CAPI types simultaneously in production deployments long-term. This is a one-way migration for the entire deployment. | ||
| * Backporting this functionality to HyperShift Operator releases prior to 4.22. | ||
| * Making the API proxy sidecar a general-purpose, reusable component for other use cases beyond CAPI type translation. |
How much knowledge about CAPI is baked into this proxy? Surely all it needs to do is translate the group of the request? It knows nothing about the actual data structures?
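If the answer is yes — the proxy only rewrites the group segment of request URLs and passes bodies through — its core could be a path rewrite along these lines (a hypothetical sketch, not code from the proposal; `rewriteAPIPath` is an illustrative helper):

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteAPIPath translates a request path addressed to the standard
// CAPI group into the equivalent path under the private HyperShift
// group. It touches only the group segment of the URL and knows
// nothing about the CAPI data structures themselves.
func rewriteAPIPath(path string) string {
	const (
		standardGroup = "cluster.x-k8s.io"
		privateGroup  = "cluster.hypershift.openshift.io"
	)
	if strings.HasPrefix(path, "/apis/"+standardGroup+"/") {
		return "/apis/" + privateGroup + strings.TrimPrefix(path, "/apis/"+standardGroup)
	}
	return path // non-CAPI requests pass through untouched
}

func main() {
	fmt.Println(rewriteAPIPath("/apis/cluster.x-k8s.io/v1beta1/namespaces/clusters-foo/machines"))
	// → /apis/cluster.hypershift.openshift.io/v1beta1/namespaces/clusters-foo/machines
}
```

A real sidecar would presumably also rewrite the `apiVersion` field in request/response bodies and watch events, but no knowledge of the individual CAPI schemas would be required.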
| - Validation and defaulting webhooks follow the same pattern as standard CAPI CRDs | ||
| - Conversion webhooks require a shim layer to translate between private and standard CAPI groups during version conversion (see Conversion Webhook Shim section) | ||
| - The CRDs are owned and lifecycled by the HyperShift operator | ||
| - Deleting a HostedCluster will clean up all associated private CAPI resources through standard owner reference garbage collection |
The CRDs are owned by the HostedCluster? Does this imply that each cluster has its own copy of the CRD?
| - The cluster continues operating normally with standard CAPI types | ||
| - No manual intervention is required for failed migrations | ||
| - Alerts notify operators of migration failures for awareness and potential retry | ||
| - The migration controller will retry failed migrations on subsequent reconciliation loops |
What happens if it hits the same issue repeatedly?
| **Why not chosen**: | ||
| - CAPI is designed around cluster-scoped resources, and changing this would be a fundamental architecture change requiring upstream buy-in. | ||
| - Kubernetes does not support multiple versions of the same CRD with different scopes. |
I'm told that folks are keen to investigate this for the future (NamespacedCRD), but that's of no help to us right now.
| - Proxy failures result in CAPI operation failures but do not expose new attack vectors | ||
| - The proxy does not process or store sensitive data beyond what is necessary for API group translation | ||
| ## Alternatives (Not Implemented) |
You don't call out leveraging the existing mechanisms that the cluster infra team is building (designed explicitly to allow HyperShift to not worry about these issues) as an alternative, and therefore don't document why that plan is insufficient.