OPRUN-4099: OLMv1 Deployment Configuration API#1915
oceanc80 wants to merge 5 commits into openshift:master
Conversation
@oceanc80: This pull request references OPRUN-4099, which is a valid jira issue. Warning: the referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.
Skipping CI for Draft Pull Request.
- As a cluster extension admin, I want to attach custom storage volumes to operator pods, so that I can provide persistent storage or configuration files to operators.
- As a cluster extension admin, I want to configure pod affinity rules for operator deployments, so that I can control how operator pods are distributed across cluster nodes.
- As a cluster extension admin, I want to add custom annotations to operator deployments, so that I can integrate with monitoring and observability tools.
I wonder what the story for the Selector is. Perhaps it's there to handle changes in the pod label selector in the operator's controller deployment between versions (the label selector in the deployment spec is immutable). This configuration could provide upgrade continuity across that type of breaking change.
I could also see it being used for blue/green deployments or other similar deployment strategies.
The selector of a deployment is immutable, iirc. I dove into the history of this field, and it looks like it was basically there from the beginning with no real explanation that I could find, and it has never been honored as far as I can tell.
Chalk it up to how fast and loose the early days of OLM were.
@oceanc80 I know the PR is WIP, and I am also not sure whether the following comments are in the scope of this EP. If they are out of scope, or this is not the right time to raise them, feel free to ignore them.

The deploymentConfig API design looks well thought out. I noticed that the proposal currently focuses on initial installation scenarios, and I was wondering if you could clarify the behavior for runtime configuration updates, which I expect will be a common operational workflow.

**Question 1: Modifying existing deploymentConfig values.** Scenario: after creating a ClusterExtension with deploymentConfig, a user wants to update some values (e.g. changing memory limits from 256Mi to 512Mi, or adding a new nodeSelector). Could you clarify:
Example:

```yaml
# Initial configuration
deploymentConfig:
  resources:
    limits:
      memory: "256Mi"

# User updates to:
deploymentConfig:
  resources:
    limits:
      memory: "512Mi"       # ← modified
  nodeSelector:             # ← added
    infrastructure: "dedicated"
```

**Question 2: Adding deploymentConfig after creation.** Scenario: a user creates a ClusterExtension without defining deploymentConfig initially, then later wants to add deployment configuration. Could you clarify:
Example:

```yaml
# Initial creation (no deploymentConfig)
apiVersion: olm.operatorframework.io/v1
kind: ClusterExtension
metadata:
  name: my-operator
spec:
  source:
    sourceType: Catalog
    catalog:
      packageName: my-operator
  # Note: no config.inline.deploymentConfig
---
# Later, the user adds deploymentConfig
spec:
  config:
    inline:
      deploymentConfig:     # ← newly added
        nodeSelector:
          infrastructure: "dedicated"
```
static at runtime, dynamic at build time
[APPROVALNOTIFIER] This PR is NOT APPROVED.
@kuiwang02 let me try to reply to your questions
From the perspective of OLMv1, bundle configuration is opaque. It will take user input, validate it against the configuration schema provided by the bundle, and apply it to generate the final manifests. So, any configuration can be updated.
Yes. The Deployment will be regenerated with the new values and applied to the cluster.
I'd say so, yes. Any changes to the pod template should trigger a new replicaset and the deployment will transition towards that.
This is a good question. I know there are fields in the Deployment spec that are immutable (e.g. the label selector). That's the only one I can think of. I think the configuration options under the deployment config are mutable.
Yes. For the same reasons in Q1.1
Yes.
Yes.
I don't think so.
Then we are back to the Deployment spec defined in the bundle by the author. The mental model here is really no different than:
AFAIK only the pod label selector is immutable once set.

@perdasilva Thanks for your great reply. I got it.
@JoelSpeed PTAL, thanks! |
@oceanc80: The following test failed.
## Proposal
This proposal extends the existing ClusterExtension inline configuration structure to support deployment customization by using [OLMv0's](https://github.com/operator-framework/api/blob/master/pkg/operators/v1alpha1/subscription_types.go#L42-L100) `v1alpha1.SubscriptionConfig`. For clarity in the OLMv1 codebase, a type alias, `DeploymentConfig`, will be introduced, since Subscription is a v0-only concept. OLMv1 will directly use the `SubscriptionConfig` type from operator-framework/api to define the `DeploymentConfig` type. This ensures feature parity with OLMv0 by using the same type definition and reduces our maintenance overhead while we navigate a period of simultaneous maintenance of both OLMv0 and OLMv1.
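To make the carried-over surface concrete, here is a hedged sketch of what a `deploymentConfig` could look like on a ClusterExtension. The field names (`nodeSelector`, `tolerations`, `resources`, `env`) come from `v1alpha1.SubscriptionConfig`; the package name and values are illustrative only:

```yaml
# Illustrative sketch, not the authoritative schema: the deploymentConfig
# knobs mirror OLMv0's v1alpha1.SubscriptionConfig fields.
apiVersion: olm.operatorframework.io/v1
kind: ClusterExtension
metadata:
  name: my-operator
spec:
  config:
    inline:
      deploymentConfig:
        nodeSelector:
          node-role.kubernetes.io/infra: ""
        tolerations:
          - key: node-role.kubernetes.io/infra
            operator: Exists
            effect: NoSchedule
        resources:
          limits:
            memory: "512Mi"
        env:
          - name: LOG_LEVEL
            value: debug
```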
Sharing types between APIs like this is generally discouraged. Someone may make a change to the SubscriptionConfig that you didn't realise and then you end up either supporting (or breaking) functionality inadvertently in the DeploymentConfig usage.
Is the code shared between v0 and v1 for this part of the deployment strategy, or are you just trying to make sure you support the same set of customizations? Is there anything in the alpha API that you might want to make better from lessons learned as you promote this to v1?
@JoelSpeed we had this discussion upstream where I'd proposed a new, completely separate structure for v1, but it was vetoed in favor of keeping v0 and v1 in sync.
The purpose of this enhancement is to provide a feature that is needed for the migration pathway from OLMv0-installed operators to OLMv1 ClusterExtensions. We explicitly need all subscription.spec.config fields and behaviors to be supported in OLMv1 to ensure migration from OLMv0 to OLMv1 is not blocked on a diff between the configuration APIs.
The idea here is that a migration tool would be able to take the subscription.spec.config and copy it into the clusterextension.spec.config.inline.deploymentConfig.
So one way or another the OLMv1 bundle-schema for deploymentConfig needs to include everything that OLMv0 includes for equivalent OCP versions.
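That migration copy could look like the following hedged sketch. The Subscription fields (`name`, `source`, `sourceNamespace`, `config`) are real OLMv0 fields; the catalog and operator names are illustrative:

```yaml
# OLMv0 Subscription (migration source)
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
spec:
  name: my-operator
  source: my-catalog
  sourceNamespace: openshift-marketplace
  config:
    nodeSelector:
      infrastructure: "dedicated"
    resources:
      limits:
        memory: "512Mi"
---
# OLMv1 ClusterExtension (migration target): a migration tool copies
# subscription.spec.config verbatim into config.inline.deploymentConfig.
apiVersion: olm.operatorframework.io/v1
kind: ClusterExtension
metadata:
  name: my-operator
spec:
  config:
    inline:
      deploymentConfig:
        nodeSelector:
          infrastructure: "dedicated"
        resources:
          limits:
            memory: "512Mi"
```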
#### Validation Failure

If the deployment configuration fails JSON schema validation:
What do you mean by json schema validation here?
1. The ClusterExtension controller rejects the configuration during admission or runtime validation
2. The ClusterExtension status is updated with a detailed error message indicating which fields failed validation
3. The cluster extension admin corrects the configuration based on the error message
4. The controller retries the installation with the corrected configuration
Why can't you reject invalid configuration at admission time instead of leaving a controller to check this asynchronously?
Hey Joel, this enhancement is related to this previous one, where all of this was discussed. The TL;DR is: OLMv1 aims to get out of the bundle configuration game, therefore there won't be a one-size-fits-all configuration schema to validate against at ingress time. We want to shift the configuration surface definition to the authors and allow them to provide content-specific knobs. OLMv1 will only validate configuration against a bundle-provided schema.
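A hedged sketch of what such a bundle-provided schema fragment might look like (a JSON-schema document rendered here as YAML; the shape and property names are illustrative, not the actual registry+v1 schema):

```yaml
# Illustrative only: a bundle exposes a schema like this, and OLMv1
# validates the user's inline config against it rather than against a
# fixed CRD-level schema.
type: object
properties:
  deploymentConfig:
    type: object
    properties:
      nodeSelector:
        type: object
        additionalProperties:
          type: string
      annotations:
        type: object
        additionalProperties:
          type: string
additionalProperties: false
```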
### API Extensions

The enhancement does not introduce new APIs, CRDs, webhooks, or aggregated API servers. As the inline configuration structure in the ClusterExtension API accepts any valid JSON object, the API will not be changed. This enhancement modifies the existing configuration schema for registry+v1 bundles to accept a `deploymentConfig` field.
Yes it does? You're adding a new field with significant structure to the ClusterExtension API as far as I can tell from reading this?
Even if this is a new structure within something that is effectively unstructured data, you are still introducing a new API contract, this should be explained here and reviewed
Can you link to the existing schema and show where it is updated to allow this deploymentConfig field?
The CRD doesn't change. `inline` remains as is.
There are two lenses to this (and I feel like a lot of this was already discussed in the original EP):
- OLMv1 conceptually claims no control over schemas provided by bundles. Bundles define the schemas of their APIs, OLM validates the provided configurations using the bundle-provided schema.
- Technically, the registry+v1 schema definition is driven by two things:
- The install modes of the operator (this dictates the schema of the watch namespace configuration)
- OLMv0's subscription.spec.config
So this EP is not technically a new contract. It is establishing the OLMv1 implementation of an existing contract (OLMv0's subscription.spec.config)
Follow-up question: Is OCP okay with every single layered product being able to define its own configuration schema in the future? If so, will every layered product be expected to bring its configuration schema to OpenShift's API review? This API change is essentially the equivalent of that.
If that seems untenable, perhaps we should drop the configuration API from OLMv1 entirely and require that layered products have a one-size-fits-all deployment (similar to core payload operators). But that would have major implications on migration from OLMv0 to OLMv1 that PM would have to weigh in on.
The configuration is validated using JSON schema generated from Kubernetes core v1 and apps v1 OpenAPI specifications.
These generated schemas are not particularly reliable sources of information, why can you not validate using the internal validation packages within these APIs?
They are reliable sources of information in the sense that they're generated from the source itself. ref: operator-framework/operator-controller#2454
I'm not sure what you mean by internal validation packages. I'm guessing the PR makes things more clear?
The `Selector` field in `SubscriptionConfig` is present but is never extracted or used by OLMv0. OLMv1 will maintain this behavior: the field will be accepted but ignored.
Accepting but ignoring a field is bad practice. Why not create a new type for the deployment config? It doesn't look like it'll be particularly complex to implement
Same answer as above:
we had this discussion upstream where I'd proposed a new, completely separate structure for v1, but it was vetoed in favor of keeping v0 and v1 in sync.
It was discussed that reusing the v0 structure would mean carrying over debts, but the cost of it was assessed to be acceptable for long term maintainability
I agree with @JoelSpeed 's point here. Even if we re-use the type, we are also in control of the schema generation for that type, right? So at a minimum we could specifically exclude that field from the generated schema.
The inline configuration will be validated using JSON schema-based validation. The JSON schema for `DeploymentConfig` will be generated by introspecting the `v1alpha1.SubscriptionConfig` struct. This approach:

- **Ensures parity with OLMv0**: the schema is derived from the exact same type definition used in OLMv0
- **Automatic updates**: when the github.com/operator-framework/api dependency is updated, the schema generation can be re-run to incorporate any new fields
There was a problem hiding this comment.
How do you know that the schema generated by your v1 tooling is the same as the v0 tooling? What if there's drift between the generators?
The nature of the generators in v0 and v1 is completely different. In v0 they were part of the API surface; in v1 they are not. That is, the validation structure differs between v1 and v0, but both are tied together by the underlying `SubscriptionConfig` structure, and the v1 generator uses the `SubscriptionConfig` struct to generate the schema.

In other words, this is why it was decided to use the v0 `SubscriptionConfig` directly: any drift in v0 automatically gets tracked in v1.
Note that the schema is not expected to change in v1 (which would require tracking the change back to v0), as this effort is solely to support old registry+v1 bundles in v1 so that they can be migrated over.
```shell
curl -sSL https://raw.githubusercontent.com/kubernetes/kubernetes/refs/tags/$(OPENAPI_VERSION)/api/openapi-spec/v3/apis__apps__v1_openapi.json > ./pkg/schema/apis__apps__v1_openapi.json
```

This ensures that while the first iteration uses a static snapshot, there is an established, low-effort path to update the schema manifests whenever the project's Kubernetes dependencies are bumped.
Will you have checks in place so that people keep this up to date?
Yes, upstream runs `make verify` on every change, and this check has been included in that.
## Open Questions / Considerations

### Track changes to underlying kubernetes corev1 structures?

SubscriptionConfig uses many kubernetes corev1 structures from the standard kube library. This means that the OLMv0 Subscription API would track changes to those structures (e.g. if a new Volume type is added to the API). We need to decide whether we want the same behavior here, and if so how to implement it. For example, we could have a process that downloads and mines the OpenAPI specs for the kube library version pinned in go.mod, and have `make verify` fail when that changes. We would also want to think about how to handle any CEL expressions in those corev1 structures when doing the validation (and whether we want to handle them at all).
Are you doing any processing of these fields, or just setting them directly on the deployment that you're rendering and applying? If you aren't processing them and are just passing them through, then this is probably fine
Just passing them through without any processing. See https://docs.google.com/document/d/18O4qBvu5I4WIJgo5KU1opyUKcrfgk64xsI3tyXxmVEU/edit?tab=t.0#bookmark=kix.1y89z5sf1akf for more details.
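A hedged before/after sketch of that pass-through (resource names and field placement are illustrative; the point is only that values are copied without transformation):

```yaml
# User-supplied configuration
deploymentConfig:
  nodeSelector:
    infrastructure: "dedicated"
  annotations:
    example.com/scrape: "true"
---
# Rendered operator Deployment (fragment): the values above land in the
# pod template verbatim, with no processing by OLMv1.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-operator-controller    # illustrative name
spec:
  template:
    metadata:
      annotations:
        example.com/scrape: "true"
    spec:
      nodeSelector:
        infrastructure: "dedicated"
```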
Enhancement extending OLMv1's ClusterExtension API to support deployment configuration in order to provide feature parity with OLMv0's SubscriptionConfig.