-
Notifications
You must be signed in to change notification settings - Fork 268
CORENET-6100: [WIP] Add support for no-overlay mode #2844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Important Review skippedIgnore keyword(s) in the title. ⛔ Ignored keywords (3)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds templated no‑overlay and managed BGP options to OVN config, FRR‑k8s OperatorPKI and readiness/bootstrap handling, Network CRD updates, dependency replace directives, MTU validation and gating logic, render signature changes, and accompanying unit tests and manifest tweaks. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Comment |
|
@ricky-rav: This pull request references CORENET-6100 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ricky-rav The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes.go (1)
99-99: Consider separating error message capitalization changes.The error message capitalization changes (converting initial capital letters to lowercase) are stylistic improvements unrelated to the NoOverlay feature. While consistency is valuable, mixing these changes with feature work:
- Makes the diff harder to review
- Increases the risk of merge conflicts
- Complicates git blame and bisecting if issues arise
Consider moving these changes to a separate commit or PR focused solely on error message style consistency.
Also applies to: 998-998, 1038-1038, 1042-1042, 1047-1047, 1051-1051, 1056-1056, 1060-1060, 1066-1066, 1071-1071, 1306-1306, 1311-1311, 1317-1317, 1352-1352, 1381-1381, 1410-1410, 2099-2099
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (31)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/config/v1/types_infrastructure.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_node.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/types_scheduling.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/console/v1/types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/console/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features.mdis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/features/features.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/types_machineconfignode.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1alpha1/register.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1alpha1/types_pinnedimageset.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1alpha1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1alpha1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/machineconfiguration/v1alpha1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/types_ingress.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/types_network.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_70_network_01_networks.crd.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (4)
bindata/network/ovn-kubernetes/managed/004-config.yaml(4 hunks)bindata/network/ovn-kubernetes/self-hosted/004-config.yaml(2 hunks)go.mod(1 hunks)pkg/network/ovn_kubernetes.go(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
bindata/network/ovn-kubernetes/managed/004-config.yamlgo.modbindata/network/ovn-kubernetes/self-hosted/004-config.yamlpkg/network/ovn_kubernetes.go
🔇 Additional comments (2)
bindata/network/ovn-kubernetes/self-hosted/004-config.yaml (1)
16-18: LGTM!The template changes for NoOverlay mode configuration are well-structured. The conditional blocks properly guard the new configuration sections, and the template variables follow existing naming conventions.
Also applies to: 68-87
bindata/network/ovn-kubernetes/managed/004-config.yaml (1)
16-18: LGTM!The managed configuration changes mirror the self-hosted configuration and maintain consistency across both ConfigMap instances (for node and control plane). The template structure is correct and follows existing patterns.
Also applies to: 58-77, 121-123, 169-188
go.mod
Outdated
| sigs.k8s.io/structured-merge-diff/v6 v6.3.0 // indirect | ||
| ) | ||
|
|
||
| replace github.com/openshift/api => github.com/ricky-rav/api v0.0.0-20251126121923-1bc31a20c5b1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove fork dependency before merging.
This replace directive points to a personal fork (github.com/ricky-rav/api). Fork dependencies create supply chain risks, break reproducible builds, and should never be merged to production code. Before removing the [WIP] tag, ensure the necessary API changes are merged upstream to github.com/openshift/api and update this dependency to point to the official repository.
🤖 Prompt for AI Agents
In go.mod around line 178, the replace directive points to a personal fork
(github.com/ricky-rav/api) which must be removed before merging; replace that
directive with the official module (github.com/openshift/api) after the upstream
changes are merged, remove the replace line, update the module version to the
appropriate tag/commit from the official repo, then run go mod tidy and re-run
CI/tests to ensure all imports resolve and builds remain reproducible.
2ace590 to
6a82514
Compare
7859f8c to
78628d6
Compare
df96340 to
c5b171b
Compare
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pkg/network/ovn_kubernetes.go (1)
369-386: Add validation: BGPManagedConfig requires NoOverlayRouting to be "managed".When
BGPManagedConfigis specified (line 369), there's no validation thatNoOverlayRoutingis set to "managed". BGP managed configuration is cluster-wide and applies to networks using no-overlay mode with managed routing, so these settings should be consistent.Consider adding validation to ensure
NoOverlayRoutingis "managed" whenBGPManagedConfigis present:// BGP managed configuration is cluster-wide and applies to any network (default or CUDN) // using no-overlay mode with managed routing. // BGPTopology is required when BGPManagedConfig is specified. if noOverlayFeatureEnabled && c.BGPManagedConfig.BGPTopology != "" { + // Validate that routing is set to managed when BGPManagedConfig is present + if data.Data["NoOverlayRouting"] != "managed" { + return nil, progressing, fmt.Errorf("BGPManagedConfig requires NoOverlayRouting to be set to 'managed'") + } + data.Data["NoOverlayManagedEnabled"] = true klog.V(2).Infof("BGP managed configuration enabled for no-overlay mode")
🧹 Nitpick comments (1)
pkg/network/render_test.go (1)
641-649: FRR capability test adjustment looks consistent with new resourcesUpdating the FRR-capability expectation to 20 objects and wiring the new
renderAdditionalRoutingCapabilitiesparameters (passingnilclient andfakeBootstrapResult()) is coherent with the added OperatorPKI resources and conditional VWC rendering. Just ensure the implementation gracefully handles a nil client on FRR paths (the test will expose any panic here).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (18)
bindata/network/frr-k8s/003-pki.yaml(1 hunks)bindata/network/frr-k8s/frr-k8s.yaml(4 hunks)bindata/network/frr-k8s/monitor.yaml(1 hunks)bindata/network/frr-k8s/node-status-cleaner.yaml(2 hunks)bindata/network/frr-k8s/webhook.yaml(2 hunks)bindata/network/ovn-kubernetes/managed/004-config.yaml(4 hunks)bindata/network/ovn-kubernetes/self-hosted/004-config.yaml(2 hunks)manifests/0000_70_network_01_networks-CustomNoUpgrade.crd.yaml(1 hunks)manifests/0000_70_network_01_networks-Default.crd.yaml(1 hunks)manifests/0000_70_network_01_networks-DevPreviewNoUpgrade.crd.yaml(1 hunks)manifests/0000_70_network_01_networks-OKD.crd.yaml(1 hunks)manifests/0000_70_network_01_networks-TechPreviewNoUpgrade.crd.yaml(1 hunks)pkg/controller/operconfig/operconfig_controller.go(1 hunks)pkg/network/frr_readiness_test.go(1 hunks)pkg/network/ovn_kubernetes.go(14 hunks)pkg/network/ovn_kubernetes_test.go(11 hunks)pkg/network/render.go(9 hunks)pkg/network/render_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- bindata/network/ovn-kubernetes/self-hosted/004-config.yaml
- pkg/network/frr_readiness_test.go
- bindata/network/frr-k8s/frr-k8s.yaml
- pkg/controller/operconfig/operconfig_controller.go
- bindata/network/frr-k8s/node-status-cleaner.yaml
- bindata/network/frr-k8s/monitor.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
bindata/network/ovn-kubernetes/managed/004-config.yamlbindata/network/frr-k8s/webhook.yamlmanifests/0000_70_network_01_networks-DevPreviewNoUpgrade.crd.yamlpkg/network/ovn_kubernetes.gomanifests/0000_70_network_01_networks-CustomNoUpgrade.crd.yamlmanifests/0000_70_network_01_networks-OKD.crd.yamlpkg/network/render.gopkg/network/ovn_kubernetes_test.gomanifests/0000_70_network_01_networks-Default.crd.yamlbindata/network/frr-k8s/003-pki.yamlpkg/network/render_test.gomanifests/0000_70_network_01_networks-TechPreviewNoUpgrade.crd.yaml
🧬 Code graph analysis (3)
pkg/network/ovn_kubernetes.go (2)
pkg/hypershift/hypershift.go (1)
NewHyperShiftConfig(106-125)pkg/util/util.go (1)
OVN_CONTROL_PLANE(19-19)
pkg/network/render.go (2)
pkg/client/types.go (1)
Client(19-31)pkg/bootstrap/types.go (2)
BootstrapResult(97-102)APIServerDefault(157-157)
pkg/network/ovn_kubernetes_test.go (1)
pkg/network/ovn_kubernetes.go (4)
OVN_NODE_SELECTOR_DEFAULT_DPU_HOST(63-63)OVN_NODE_SELECTOR_DEFAULT_DPU(64-64)OVN_NODE_SELECTOR_DEFAULT_SMART_NIC(65-65)ValidateMTUForNoOverlay(1149-1177)
🪛 YAMLlint (1.37.1)
bindata/network/frr-k8s/webhook.yaml
[error] 13-13: syntax error: could not find expected ':'
(syntax)
🔇 Additional comments (16)
manifests/0000_70_network_01_networks-OKD.crd.yaml (1)
5-11: OKD feature‑set annotation is consistent and low‑riskAdding
release.openshift.io/feature-set: OKDto the CRD metadata cleanly gates this variant without impacting the CRD’s functional schema.manifests/0000_70_network_01_networks-CustomNoUpgrade.crd.yaml (1)
240-746: No‑overlay / BGP schema and validations look coherentThe new
defaultNetworkTransport,defaultNetworkNoOverlayOptions, andbgpManagedConfigfields, together with theirx-kubernetes-validations(immutability, required‑when‑Managed, and NoOverlay gating), form a consistent contract and should prevent misconfiguration of no‑overlay/BGP mode without adding runtime risk.manifests/0000_70_network_01_networks-DevPreviewNoUpgrade.crd.yaml (1)
1-1046: DevPreview Network CRD definition matches Default semanticsThis DevPreviewNoUpgrade Network CRD mirrors the Default variant’s schema and validations, differing only in the feature‑set annotation, which is appropriate for feature‑gated exposure.
manifests/0000_70_network_01_networks-Default.crd.yaml (1)
1-1046: Default Network CRD schema and validations look soundThe new Network CRD for the Default feature set provides a comprehensive, consistent OpenAPI schema and CEL validations (FRR/routeAdvertisements, ipForwarding, migration MTU, etc.) without introducing obvious correctness or security issues.
bindata/network/frr-k8s/webhook.yaml (1)
12-40: Conditional webhook rendering on CA bundle is appropriateGuarding the ValidatingWebhookConfiguration on
.FRRK8sWebhookCABundleand wiring that intocaBundlematches the new OperatorPKI-based flow and avoids deploying a broken webhook before CA material is ready.bindata/network/ovn-kubernetes/managed/004-config.yaml (1)
16-18: No‑overlay and BGP managed blocks are wired correctly into ovnkube.confThe new
transport,[no-overlay], and[bgp.managed]sections are conditionally rendered and symmetric between node/master configs, which matches the no-overlay design and is exercised by the new tests inovn_kubernetes_test.go. No structural or templating issues stand out.Also applies to: 58-77, 121-123, 169-188
bindata/network/frr-k8s/003-pki.yaml (1)
1-26: OperatorPKI resources correctly model webhook and metrics TLSUsing OperatorPKI for
frr-k8s-webhookandfrr-k8s-metricswith service FQDN common names is consistent with avoiding service-ca during bootstrap and with the new webhook CA-bundle handling.pkg/network/ovn_kubernetes_test.go (1)
76-84: No‑overlay feature‑gate wiring and tests are comprehensive and consistentIncluding
FeatureGateNoOverlayModein the default/known feature‑gate sets and adding the new no‑overlay tests (TestRenderOVNKubernetesNoOverlay, defaults/MTU, andTestValidateMTUForNoOverlay) gives good coverage of:
- transport selection and feature‑gate off behavior,
- rendered
[no-overlay]and[bgp.managed]sections and value casing,- MTU defaulting and validation vs host MTU.
The expectations line up with the CRD schema defaults and the
ValidateMTUForNoOverlayimplementation, and existing tests keep NoOverlay disabled where needed via custom feature‑gate sets.Also applies to: 929-936, 4483-4922
pkg/network/ovn_kubernetes.go (3)
1145-1177: LGTM: ValidateMTUForNoOverlay is well-structured.The function correctly validates MTU constraints for no-overlay mode:
- Handles nil config and unset MTU gracefully
- Skips validation when not applicable (non-no-overlay mode)
- Logs a warning instead of failing when hostMTU is unknown
- Validates that MTU doesn't exceed host MTU (correct for no-overlay mode with no encapsulation overhead)
1264-1270: LGTM: MTU calculation correctly handles no-overlay mode.The logic appropriately differentiates MTU calculation:
- No-overlay mode: Uses host MTU directly (no encapsulation overhead)
- Overlay mode: Subtracts encapsulation overhead from host MTU
This aligns with the no-overlay design where packets are not encapsulated.
2235-2244: LGTM: Safe feature gate check pattern.The helper function prevents panics by checking if a feature gate is known before calling
Enabled(). This is a good defensive programming pattern that ensures stability when new feature gates are introduced or when feature gate configurations are incomplete.pkg/network/render.go (5)
52-84: LGTM: FRR webhook readiness check is robust.The function correctly uses EndpointSlices to check webhook readiness:
- Handles errors gracefully with logging
- Checks for nil conditions before accessing Ready field
- Counts ready endpoints across all EndpointSlices
- Provides diagnostic logging for troubleshooting
This prevents race conditions where OVNK attempts to use the webhook before it's ready.
86-132: LGTM: Gating logic correctly prevents race condition.The function appropriately gates OVNK rendering to prevent race conditions:
- Only applies to fresh installs (OVNK not yet running)
- Requires all conditions to be met (FRR enabled, RouteAdvertisements enabled, webhook not ready)
- Does not interfere with upgrades or existing deployments
- Provides clear logging for operational visibility
This ensures OVNK waits for FRR webhook readiness only when necessary.
723-729: LGTM: Integration point correctly handles rendering skip.The pre-check is properly integrated:
- Placed before rendering logic to prevent unnecessary work
- Returns
progressing=trueto signal CNO to continue reconciling- Only applies to OVNKubernetes network type
- Clear comment explains the race condition being prevented
This ensures CNO retries until FRR webhook is ready before rendering OVNK.
1077-1102: LGTM: FRR rendering data correctly populated.The function signature update and data population are appropriate:
- Added
clientandbootstrapResultparameters enable necessary data fetching- API server host/port injection solves bootstrap connectivity issue (service IP not routable before CNI is running)
- Webhook CA bundle fetched via dedicated function
- Clear comment explains the bootstrap connectivity challenge
This ensures FRR pods can connect to the API server during initial deployment.
1115-1142: LGTM: CA bundle retrieval correctly handles unavailability.The function safely handles the CA bundle lifecycle:
- Returns empty string when ConfigMap or CA bundle is not yet available
- Base64 encodes the CA bundle for ValidatingWebhookConfiguration
- Provides diagnostic logging
Based on past review discussion, the webhook.yaml template conditionally renders the ValidatingWebhookConfiguration only when
FRRK8sWebhookCABundleis non-empty. This design ensures:
- Service is created immediately (outside conditional)
- ValidatingWebhookConfiguration is deferred until CA bundle is available
- No invalid webhook configuration is created
dae393f to
0171d58
Compare
|
/payload-job-with-prs periodic-ci-openshift-cluster-network-operator-master-e2e-metal-ipi-ovn-dualstack-bgp-lgw-no-overlay-test openshift/release#73335 openshift/ovn-kubernetes#2878 openshift/client-go#349 |
|
@zhaozhanqi: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
@zhaozhanqi: This PR was included in a payload test run from openshift/ovn-kubernetes#2920 |
fe7907a to
71cae67
Compare
openshift/api#2537 openshift/client-go#349 Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
When no-overlay mode is enabled for the default network: - If MTU is not specified, set it to the host MTU - If MTU is specified, validate that it does not exceed the host MTU This ensures proper MTU configuration for no-overlay mode where packets are not encapsulated and can use the full host network MTU. Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
When installing a cluster with no-overlay mode and routeAdvertisements enabled, there was a race condition where OVN-Kubernetes would start before the FRR-k8s webhook was ready. This caused RouteAdvertisements validation to fail, and OVNK would eventually give up retrying before FRR became ready. This fix adds a check in renderDefaultNetwork() that skips OVNK rendering until the FRR-k8s webhook has ready endpoints. The check only applies when: - OVNK is not yet running (fresh install) - FRR provider is enabled in additionalRoutingCapabilities - RouteAdvertisements is set to Enabled When these conditions are met and FRR webhook isn't ready, CNO returns progressing=true to continue reconciling until FRR is ready. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
Add toleration for node.kubernetes.io/not-ready:NoSchedule to the frr-k8s-statuscleaner deployment. This allows the FRR webhook pod to be scheduled on nodes during cluster bootstrap when nodes have the not-ready taint because the CNI is not yet configured. Without this toleration, the pod cannot be scheduled, blocking the FRR webhook from becoming ready. Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
During fresh cluster installation with no-overlay mode and RouteAdvertisements enabled, there was a circular dependency: - FRR-k8s webhook needs TLS certs from service-ca - service-ca needs a working CNI to start - CNI (OVN-K) needs FRR-k8s webhook to validate RouteAdvertisements This commit breaks the cycle by using CNO's built-in OperatorPKI for the FRR-k8s webhook certificate. OperatorPKI creates certs directly without needing service-ca or network connectivity. Changes: - Add 003-pki.yaml with OperatorPKI for frr-k8s-webhook - Update node-status-cleaner.yaml to use frr-k8s-webhook-cert secret - Update webhook.yaml to inject CA bundle from OperatorPKI ConfigMap - Update render.go to fetch and base64-encode the CA bundle Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
The FRR DaemonSet requires a TLS secret for kube-rbac-proxy to start. Previously this used service-ca, but service-ca is not available during bootstrap (it depends on CNI being ready first), causing a deadlock. This commit adds a second OperatorPKI for metrics certificates, so both webhook and metrics use OperatorPKI with no service-ca dependency during bootstrap. Changes: - Add frr-k8s-metrics OperatorPKI to 003-pki.yaml - Update frr-k8s.yaml to use frr-k8s-metrics-cert secret - Remove service-ca annotation from monitor.yaml Service - Use insecureSkipVerify in ServiceMonitor (Prometheus doesn't have OperatorPKI CA in its trust bundle, but TLS encryption is still active) Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
FRR pods use hostNetwork: true but were trying to reach the Kubernetes API at the service IP (172.30.0.1), which kubelet auto-injects as KUBERNETES_SERVICE_HOST. During bootstrap, this service IP is not routable because the CNI (OVN-K) is not running yet, creating a deadlock: CNO waits for FRR webhook -> FRR pods can't reach API at 172.30.0.1 -> Service IP needs CNI routing -> CNI waits for FRR -> DEADLOCK Add KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables to FRR pods, overriding the kubelet-injected values with the actual API server address (the API VIP, e.g., 192.168.111.5). Since FRR pods use hostNetwork, they can reach the API VIP directly via L2 without needing CNI routing, breaking the deadlock. This follows the pattern used by other CNO hostNetwork components (ovnkube-node, multus, sdn). Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
The ValidatingWebhookConfiguration requires a valid caBundle to verify the webhook's TLS certificate. The caBundle comes from a ConfigMap created by the OperatorPKI controller. On the first reconcile, the OperatorPKI CR is created but the CA ConfigMap doesn't exist yet. If we render the VWC with an empty caBundle, the API server will reject all webhook calls with: x509: certificate signed by unknown authority Fix this by only rendering the VWC when the CA bundle is available. CNO will keep reconciling, and once the OperatorPKI generates the CA ConfigMap, the next reconcile will render the VWC with the correct caBundle. Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
|
@ricky-rav: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
No description provided.