Skip to content

Comments

NE-2489: Gateway API without OLM followup #1938

Open
rikatz wants to merge 5 commits intoopenshift:masterfrom
rikatz:gwapi-without-olm-followup
Open

NE-2489: Gateway API without OLM followup #1938
rikatz wants to merge 5 commits intoopenshift:masterfrom
rikatz:gwapi-without-olm-followup

Conversation

@rikatz
Copy link
Member

@rikatz rikatz commented Feb 9, 2026

This PR is a followup on the EP for Gateway API without OLM.

It updates how Istio CRDs should be installed by CIO, how to deal
with the existence of OSSM installed on the cluster, what conditions
should be considered and also provides some other followup fixes on the EP

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 9, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 9, 2026

@rikatz: This pull request references ne-2470 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

* If all OLM subscriptions are removed: The cluster-ingress-operator owns them. More below.
Open Question: How do you handle fields that are extra but withouit a default.
*/
/* Todo (by Candace)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commenting here:

We are not just installing Gateway API. We are installing Istio, and Istio may need its own APIs. The way I see, it is comparable to something like "we have an operator XYZ that we rely just on part of the solution. Installing a partial manifest is worst then installing it fully". I understand the concern here, but as pointed during the meeting:

  • we are NOT providing Istio for our customers, but Gateway API
  • but we are providing Gateway API with Istio for layered products that are authorized to rely on Istio CRs to modify the Gateway API behavior on a supported way
  • we need a compatibility matrix, that means "works with RHOAI, RHCL, etc" which we can simply state "this is the minimum version that we support" (based on the Istioversion we are installing) and then OSSM can take over it. What we need is a signal of who installed that CRD

Also, OSSM team stated that Istio CRDs are retrocompatible and as opposite to Gateway API CRDs, they are well controller

This PR is a followup on the EP for Gateway API without OLM.

It updates how Istio CRDs should be installed by CIO, how to deal
with the existence of OSSM installed on the cluster, what conditions
should be considered and also provides some other followup fixes on the EP
@rikatz rikatz force-pushed the gwapi-without-olm-followup branch from 74a559e to 243c586 Compare February 10, 2026 22:41
@rikatz rikatz changed the title WIP: ne-2470: add followup comments NE-2489: Gateway API without OLM followup Feb 10, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 10, 2026

@rikatz: This pull request references NE-2489 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 10, 2026

@rikatz: This pull request references NE-2489 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR is a followup on the EP for Gateway API without OLM.

It updates how Istio CRDs should be installed by CIO, how to deal
with the existence of OSSM installed on the cluster, what conditions
should be considered and also provides some other followup fixes on the EP

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rikatz rikatz marked this pull request as ready for review February 10, 2026 22:42
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 10, 2026
@openshift-ci openshift-ci bot requested review from candita and frobware February 10, 2026 22:42
set from the beginning prevents this version fragmentation.

Additionally, since Istio will support resource filtering in a future release, this
approach allows layered products to simply request new resources to be added to the filter configuration
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this invalidate the point above about layered products not requiring changes in CIO to use new Istio CRDs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, the point is: we do not want to have to bump the CRDs (this may mean even bumping the library, etc) but instead "we can simply change the PILOT_RESOURCE_FILTER_INCLUDE" to add this resource that we were not reconciling on pilot before.

Copy link

@aslakknutsen aslakknutsen Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rikatz In this scenario, isn't it better to have the FILTER match the available CRDs if we can? It's a lot easier to detect a missing CRD for a layered product than some possible obscure bug due to a config change that was never picked up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I wouldn't like to have pilot using resources that it shouldn't for N/S traffic. Eg.: VirtualServices.

Otherwise a user that installs OSSM will end up having the N/S controller having the same capabilities, and be able to do unsupported actions

For example, if `CIO` installed only a subset of CRDs, and a user subsequently
installs OSSM and later removes the OSSM subscription, the cluster could end up
with a mix of CRD versions (some from OSSM, some from CIO). Installing the complete
set from the beginning prevents this version fragmentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes a stable set. OSSM might install a subset of what CIO could install(a later version could introduce a new type for instance). If CIO only install a complete set or nothing, we have to accept that CIO might not provide 100% of the features given the CRDs available from another source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed! but is any other situation better, from a perspective of consistency? I am ok if we say "sail operator will install whatever sail operator thinks it is best for the profile", but we need to be sure to provide a status for users on how are Istio CRDs from a Gateway API perspective, and checking the consistency of istio.io group is easier than a per-CRD state.

As an example:

  • CIO installs 3 Istio CRDs - status is ManagedByCIO
  • OLM installs 4 Istio CRDs - status is ManagedByOLM
  • OLM is uninstalled - sail-operator library marks the 3 provided CRDs back with CIO label - we have an orphan CRD - I can use this information to provide it on the status saying "one CRD is unknown by CIO"

The point of the status is letting layered products and users know the state of Istio CRD and take the decision like "should RHCL be installed on this environment?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of providing status of Istio CRDs is beyond the scope. Managing Istio CRDs shouldn't become a majority of the work required for a non-OLM installation, and we should be doing the minimum required for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your cases, and yes, we only need to worry about the CRDs we can install (so if OLM installs more, or a 3rd party does, we don't know or care).

But we need to handle the mixed case... what if our label is on one CRD, but is missing on others? I assume we have to take the "worst" management for the overall status? But what do we do with the one labelled CRD? Do we update it since we own it? Even if we can't update the full set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will actually defer this question to @aslakknutsen as we are considering this to be part of the sail-operator library management.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say;

  • Try to own the complete set we know about/want
  • If we can't because someone else does; attempt to detect who is/what is happening and report that to "some" status/user. If that's OSSM/OLM, or unknown or some mixture is some what irrelevant, we can't do anything more or less in either case then report.
  • Install "istio" regardless and hope it works (the crds are pretty stable, and not needed at all by the core functionality of gateway api)

Copy link
Member Author

@rikatz rikatz Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

echoing my proposal on Slack:

  • CIO passes to sail-operator the label we expect to exist on CRD
  • sail-operator is the one that takes the decision about installing/updating the CRDs. This process must always use REPLACE/UPDATE (never patch) so the label map is not merged
  • If there is no CRD, or the CRDs has the label provided by CIO, sail-operator installs the CRDs or updates based on its version
  • If there are CRDs and they have the OLM label, sail-operator does nothing with the CRDs
  • If there are CRDs and they do not have OLM label or CIO label, sail-operator does nothing with the CRDs
  • If there are CRDs and they are on partial mismatch state (eg.: some has OLM labels, some has CIO labels, some has no labels) sail-operator does nothing with CRDs
  • CIO watches a list of CRDs that sail-operator provides (this is a static list based on what sail-operator will always be installing) to know what it cares and add the conditions to gateway class

Given CIO watches the CRDs, we can call a new reconciliation and call sail-operator library in case we detect some change on the CRDs

This way we have some sort of "idempotent" reconciliation here: sail-operator cares about the whole installation but CRD installation is not added to its informer. Instead it expects to be called specifically to reconcile the CRDs


An Istio CRD can exist in one of three management states:

1. **Managed by CIO**: Contains the label `ingress.operator.openshift.io/owned`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - is the "owned" label protected by RBAC? Can someone introduce a security issue by adding this label, accidentally or maliciously?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no such RBAC protection for fields, it would be the whole resource. But no, we are intentionally not protecting these CRDs.

  • In case we have CIO and users install a 3rd party Istio - CIO will add a condition to GatewayClass status letting users and layered product know that the installed Istio CRDs are not the original
  • In case we have OSSM and users install 3rd party Istio - same situation as above, saying it is not managed by a known service (CIO or OSSM)
    A user can rollback this by simply adding the label mentioned here, and the sail-operator library will reconcile the CRDs back

contain the annotation `"helm.sh/resource-policy": keep` to prevent deletion
during Helm operations.

When `CIO` installs Istio CRDs, it installs the complete set of CRDs provided by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many CRDs are there? What are the drawbacks to maintaining so many new but unused CRDs and how do we handle them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure tbh how this impacts the current proposal. We don't know the CRDs, they will be installed by sail-operator library. We know we will reconcile them and see if all of them are on the "managed by Openshift" state.

The drawback is similar to any other CRD on the cluster (like ClusterAPI CRDs, Gateway API CRDs that are there but never used by users), they are just installed CRDs.


### API Extensions

This enhancement does not introduce new CRDs or API extensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add an API knob to allow the cluster admin to configure whether their cluster ingress solution was "RHCL-enabled" or "RHOAI-enabled", and not be forced to install any unneeded Istio CRDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have discussed this with @Miciah yesterday. The proposed approach here is the simpler as possible, adding a knob adds unnecessary complexity that we want to avoid.

which includes a set of Custom Resource Definitions required for Istio and its
integrations to work properly.

An Istio CRD can exist in one of three management states:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"An Istio CRD" -- This is a good way to phrase it... I assume we have to do this on a CRD by CRD basis for all Istio CRDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for the validation. But I am still unsure if I want a single condition for every Istio CRD (not possible with the current GatewayClass API) or if I want one condition on the class that represents the CRD condition with Reason like "ManagedByCIO", "ManagedByOLM", etc and add a 4th condition as "VersionMismatch" and on the message explaining that some CRDs are managed by one component, and others by another component

For example, if `CIO` installed only a subset of CRDs, and a user subsequently
installs OSSM and later removes the OSSM subscription, the cluster could end up
with a mix of CRD versions (some from OSSM, some from CIO). Installing the complete
set from the beginning prevents this version fragmentation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your cases, and yes, we only need to worry about the CRDs we can install (so if OLM installs more, or a 3rd party does, we don't know or care).

But we need to handle the mixed case... what if our label is on one CRD, but is missing on others? I assume we have to take the "worst" management for the overall status? But what do we do with the one labelled CRD? Do we update it since we own it? Even if we can't update the full set?

When CRDs exist but do not contain CIO or OSSM management labels:
1. `CIO` sets the `CompatibleIstioCRDs` condition on the `GatewayClass` status to `False` with reason `UnknownManagement` to indicate the CRD management state
2. If the user removes the third-party CRDs, `CIO` will install its own CRDs and update the condition to reflect CIO management
3. Alternatively, if the user adds the `ingress.operator.openshift.io/owned` label to the existing CRDs, `CIO` will take ownership and manage them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern about incompatible CRDs? Are field removals acceptable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have answered it and Github didn't persisted, so let me try again:

On Gateway API side, I don't have many concerns on field removals. We are kind of stating "Our Gateway API ships with this version" and usually field removals will happen on downgrade.

A downgrade can happen on 2 moments:

  • OLM installs a lower version - I expect this never happens (@dgn can confirm)
  • User installs a lower version - Condition for CRDs gets into unmanaged

Based on that, layered products take the decision if they keep working or not. From our instance of istiod, if an upgrade happens with the addition of a new field nothing should happen because our version does not know that field yet (and our own products are the ones creating Istio CRs, so they should never be using fields that are not supported on that version).

If user downgrades,user is intentionally removing a field and this should be represented on layered product status.


#### GatewayClass Condition

**Note**: Adding conditions is a soft requirement for this Enhancement Proposal. While
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little nervous about this statement. It will make support and debugging harder if it is not implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it, during my discussion with @Miciah we felt this would be scope creep, but actually it does make sense for us (and it is not a huge effort) to add to GatewayClass status controller

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the revisions. Seems clear and complete. Nice work.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: knobunc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 12, 2026
- The CRDs were installed by a third party
- Message: Indicates the reason for the status and that the user can either remove the CRDs to allow CIO to install its own version, or add the `ingress.operator.openshift.io/owned` label to allow CIO to take ownership and manage them

* **Status: `False`, Reason: `ManagementMismatch`**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to clarify what we do when we are in a mismatch state. Do we upgrade the CRDs we own, even if there are ones that we know we do not own?

Or do we say "it is dangerous to touch anything" and not update the CRDs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, IMO we only touch the CRDs if they are on some known state:

  • Managed by CIO
  • No CRD present
  • Managed by OLM, but no subscription (and even this one I think it is arguable cc @aslakknutsen )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, for the later one I was willing to have a seamless migration (eg.: user upgraded to OCP 4.22, they were on OLM, they are not anymore)

The fact is: we do not remove the existing subscription. So would it make sense to tell them "hey, if you were on 4.21 and now are on 4.22, you can remove the subscription but must run this oc label crd ..." or do we expect the platform to do everything?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2026

@rikatz: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants