-
Notifications
You must be signed in to change notification settings - Fork 52
OCPCLOUD-3327: Rewrite manifests-gen to support upgrade safety #434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
📝 WalkthroughWalkthroughOperator startup now reads provider image manifests from a provider image directory (PROVIDER_IMAGE_DIR), fetches and caches provider profiles from OCI images, and passes those profiles into reconciler wiring; a new manifests generator compiles kustomize assets into typed client.Object resources and manifests/metadata files. Changes
Sequence Diagram(s)sequenceDiagram
participant Operator as Cluster-CAPI Operator
participant ProviderMgr as Provider Image Manager
participant PullSecret as PullSecret (k8s)
participant Registry as Container Registry
participant FileCache as Local provider-images dir
participant CAPI as CAPI Installer Controller
Operator->>ProviderMgr: ReadProviderImages(ctx, containerImages, providerImageDir)
ProviderMgr->>PullSecret: Read pull-secret
ProviderMgr->>ProviderMgr: Build auth keychain
par Concurrent fetch
ProviderMgr->>Registry: Fetch image (using keychain)
Registry-->>ProviderMgr: Return image layers
ProviderMgr->>FileCache: Extract /capi-operator-manifests (layers top->bottom)
ProviderMgr->>ProviderMgr: Discover profiles (metadata.yaml + manifests.yaml)
ProviderMgr->>FileCache: Write manifests with SHA256 ContentID
end
ProviderMgr-->>Operator: []ProviderImageManifests
Operator->>CAPI: Wire ProviderImages & InstallerSuccessful channel
CAPI->>CAPI: reconcileProviderImages() -> sort by InstallOrder
CAPI->>FileCache: Read manifests for provider
CAPI->>CAPI: Apply manifests (envsubst + kubernetes apply)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws-ovn-techpreview |
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 |
1ab7f33 to
5087e3b
Compare
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 |
|
/test e2e-aws-ovn-techpreview |
dab2c52 to
7c0b761
Compare
f3ba06c to
a8a93e3
Compare
|
/test e2e-aws-ovn-techpreview |
|
/test e2e-aws-ovn-techpreview |
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
2 similar comments
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
|
The e2e-aws-ovn-techpreview pass shows the modifications to the capiinstaller controller are backwards compatible with current transport configmaps. openshift/cluster-api-provider-aws#584 and openshift/cluster-api#259 update CAPA and core CAPI to publish manifests only in the provider images. They do not publish transport configmaps. The multi-pr test pass with these 2 PRs shows that pulling manifests from images is working. |
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-capi-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
|
/testwith openshift/cluster-capi-operator/main/regression-clusterinfra-aws-ipi-techpreview-capi openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-ovn-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
|
/testwith openshift/cluster-capi-operator/main/e2e-aws-capi-techpreview openshift/cluster-api-provider-aws#584 openshift/cluster-api#259 |
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
7c31a62 to
bdff802
Compare
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
bdff802 to
dfdbf0d
Compare
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The bump to x/tools v0.39.0 changes the behaviour of the gosec linter, which no longer complains about these valid conversions.
Adds a flag, profile-name allowing specification of a profile (and therefore subdirectory) to output manifests and metadata to.
TODO: merge into rewrite commit
When parsing images, the logic now checks for multiple 'profiles' (sub directories) under /capi-operator-manifests. If multiple are found, we return them all.
We've been referring to a slice of []providerImagesManifests as providerImages, which is misleading. With us now having profiles, this makes more sense being named 'providerProfiles', each slice is a representation of metadata and manifests for a named profile.
| Old Field | New Location | |-----------|--------------| | `ProviderName` | `Name` | | `ProviderImageRef` | `SelfImageRef` | | `ProviderType` | `Attributes["type"]` | | `ProviderVersion` | `Attributes["version"]` | | `OCPPlatform` | `OCPPlatform` (unchanged) | | (new) | `InstallOrder` | Now installs based on `InstallOrder` rather than sorting on type/version.
cea5afd to
b1ba596
Compare
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
manifests-gen/customizations.go (1)
34-170: Guard against empty webhook lists / nil Service refs to avoid panics.
mwc.Webhooks[0]andvwc.Webhooks[0]assume at least one webhook with a Service target. If a provider ships a webhook config with zero webhooks or a URL-based client config, manifest generation will panic. Consider returning a clear error instead.🐛 Suggested fix (propagate an explicit error)
- serviceSecretNames := findWebhookServiceSecretName(objs) + serviceSecretNames, err := findWebhookServiceSecretName(objs) + if err != nil { + return nil, err + } -func findWebhookServiceSecretName(objs []client.Object) map[string]string { +func findWebhookServiceSecretName(objs []client.Object) (map[string]string, error) { serviceSecretNames := map[string]string{} certSecretNames := map[string]string{} @@ - serviceSecretNames[mwc.Webhooks[0].ClientConfig.Service.Name] = secretName + if len(mwc.Webhooks) == 0 || mwc.Webhooks[0].ClientConfig.Service == nil { + return nil, fmt.Errorf("mutating webhook %q has no service reference", mwc.Name) + } + serviceSecretNames[mwc.Webhooks[0].ClientConfig.Service.Name] = secretName @@ - serviceSecretNames[vwc.Webhooks[0].ClientConfig.Service.Name] = secretName + if len(vwc.Webhooks) == 0 || vwc.Webhooks[0].ClientConfig.Service == nil { + return nil, fmt.Errorf("validating webhook %q has no service reference", vwc.Name) + } + serviceSecretNames[vwc.Webhooks[0].ClientConfig.Service.Name] = secretName @@ - return serviceSecretNames + return serviceSecretNames, nil }
🤖 Fix all issues with AI agents
In `@pkg/controllers/capiinstaller/capi_installer_controller.go`:
- Around line 228-249: The deferred errors.Join call in applyProviderImage
currently never affects the returned error because the function uses an unnamed
error return; change applyProviderImage to use a named error return (e.g., err
error) and update the defer to join the reader.Close() error into that named err
(using errors.Join(err, reader.Close())) so any close failure is preserved and
returned; reference the applyProviderImage function, the deferred reader.Close()
call, and errors.Join when making the change.
🧹 Nitpick comments (3)
manifests/0000_30_cluster-api_11_deployment.yaml (1)
95-96: Consider bounding the provider-images cache size.
An unboundedemptyDircan consume node ephemeral storage if the cache grows.♻️ Suggested size limit (tune as appropriate)
- - name: provider-images - emptyDir: {} + - name: provider-images + emptyDir: + sizeLimit: 1Gipkg/providerimages/pullsecret.go (1)
29-33: Confirm behavior when the pull-secret is empty.
authn.DefaultKeychaincan pick up ambient Docker creds; if missing pull-secret should be treated as “anonymous only” or as an error, consider narrowing the fallback.💡 Optional stricter fallback
func parseDockerConfig(pullSecret []byte) (authn.Keychain, error) { if len(pullSecret) == 0 { - return authn.DefaultKeychain, nil + return anonymousKeychain{}, nil } @@ return &configFileKeychain{cf: cf}, nil } +// anonymousKeychain always resolves to anonymous auth. +type anonymousKeychain struct{} + +func (anonymousKeychain) Resolve(_ authn.Resource) (authn.Authenticator, error) { + return authn.Anonymous, nil +}manifests-gen/customizations.go (1)
30-32: Consider making the registry check configurable.
The hard-codedexpectedRegistrywill fail manifest generation for any provider image that isn’t already inregistry.ci.openshift.org. If this tool is used outside that pipeline, a flag/option or allowlist would be safer.Also applies to: 194-201
| // applyProviderImage extracts provider manifests for a single provider image and applies them to the cluster. | ||
| func (r *CapiInstallerController) applyProviderImage(ctx context.Context, log logr.Logger, providerImage providerimages.ProviderImageManifests) error { | ||
| log.Info("reconciling CAPI provider", "name", providerImage.Name) | ||
|
|
||
| reader, err := providerManifestReader(providerImage) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create provider manifest reader: %w", err) | ||
| } | ||
|
|
||
| defer func() { | ||
| err = errors.Join(err, reader.Close()) | ||
| }() | ||
|
|
||
| yamlManifests, err := extractManifests(reader) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to extract manifests from provider manifest: %w", err) | ||
| } | ||
|
|
||
| if err := r.applyProviderComponents(ctx, yamlManifests); err != nil { | ||
| return fmt.Errorf("failed to apply provider components: %w", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close errors are currently dropped in applyProviderImage.
Because the function has an unnamed return, the deferred errors.Join doesn’t affect the returned error.
🐛 Suggested fix (use named return)
-func (r *CapiInstallerController) applyProviderImage(ctx context.Context, log logr.Logger, providerImage providerimages.ProviderImageManifests) error {
+func (r *CapiInstallerController) applyProviderImage(ctx context.Context, log logr.Logger, providerImage providerimages.ProviderImageManifests) (err error) {
log.Info("reconciling CAPI provider", "name", providerImage.Name)
reader, err := providerManifestReader(providerImage)
if err != nil {
return fmt.Errorf("failed to create provider manifest reader: %w", err)
}
defer func() {
err = errors.Join(err, reader.Close())
}()
@@
log.Info("finished reconciling CAPI provider", "name", providerImage.Name)
return nil
}🤖 Prompt for AI Agents
In `@pkg/controllers/capiinstaller/capi_installer_controller.go` around lines 228
- 249, The deferred errors.Join call in applyProviderImage currently never
affects the returned error because the function uses an unnamed error return;
change applyProviderImage to use a named error return (e.g., err error) and
update the defer to join the reader.Close() error into that named err (using
errors.Join(err, reader.Close())) so any close failure is preserved and
returned; reference the applyProviderImage function, the deferred reader.Close()
call, and errors.Join when making the change.
b1ba596 to
1b4e131
Compare
|
@mdbooth: This pull request references OCPCLOUD-3327 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-azure-ovn-techpreview e2e-azure-capi-techpreview |
|
@mdbooth: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Needs rebase/renaming potentially. |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Rewrite manifests-gen to support embedding CAPI installer manifests in the provider image instead of in a transport configmap.
Update the CAPI installer controller to support the new embedded manifests in addition to the existing transport configmaps. This allows us to have a smooth transition period while we update all providers.
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.