Add enhancement spec for centralized TLS configuration and enforcement #1910
richardsonnick wants to merge 23 commits into openshift:master
Conversation
Skipping CI for Draft Pull Request.
| **Rejected because**: This adds significant complexity and goes against the goal of centralized configuration. The three existing configuration points (API Server, Kubelet, Ingress) already provide sufficient granularity.
|
| ## Open Questions
There were some questions raised in our sync that I think we will need to write down and think about:
- Should we focus ingress-related improvements in the Gateway API/implementation?
- What does the ingress configuration apply to? Just the reverse proxies in the ingress controllers/routers/gateway implementations? Or are exposed backend services expected to honor the ingress config over the "in-cluster" config?
- What if the backend service is exposed outside the cluster, but is also expected to be used within the cluster?
- Should there be any consideration for the possibility of TLS passthrough (or not) being used for the ingress/route/gateway?
| 1. **Fetch TLS configuration** from the appropriate central source:
|    - **API Server configuration** (`apiserver.config.openshift.io/cluster`) - For most components that should match the API server TLS profile
|    - **Kubelet configuration** - For components running on nodes that need to match node-level TLS settings
Can we get more details on how one should pick this flavor + some concrete examples?
Case in point: monitoring deploys the node-exporter to all (Linux) nodes via a DaemonSet to collect node metrics. Right now we rely on the API server configuration settings to configure TLS, should we rather switch to Kubelet settings?
Corollary question: what's the use case for different API and kubelet TLS settings?
@simonpasquier did you ever manage to figure this out? In your case, I think it makes sense to follow apiservers since monitoring isn't directly aligned with the kubelet component itself.
| **Rejected because**: This adds significant complexity and goes against the goal of centralized configuration. The three existing configuration points (API Server, Kubelet, Ingress) already provide sufficient granularity.
|
| ## Open Questions
From what I can tell, the proposal focuses on TLS server configuration. I assume that TLS client configuration is out of scope. Asking because Prometheus scrapes metrics from operators/operands via HTTPS. Is it OK to consider that the scraped endpoints apply the centralized TLS configuration and that we don't need to enforce anything at the client (Prometheus) level?
We also have a few components which users can configure to talk to external systems:
- Prometheus can send metrics via remote-write to HTTPS servers.
- Alertmanager can send notifications to HTTPS-based API services (as well as emails via STARTTLS).
Again, is it safe to assume that users are responsible for properly configuring these interfaces?
Right now, the focus is on servers, yes. If we find clients that we ship are using old or constrained TLS implementations that don't offer to negotiate using TLS 1.3 + ML-KEM, then we will likely ask for those clients to be updated because the end result of that client/server combo will not be TLS 1.3 + ML-KEM.
e639b43 to 826d177 (force-pushed)
Updated this with a revised draft with up-to-date information from internal discussions.
| - type: Applied
|   status: "True"
|   reason: AllComponentsUpdated
|   message: "All components have applied the TLS configuration"
who's going to update this condition?
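One possible answer, assuming a central controller rolls per-component reports up into the condition quoted above. All names here are illustrative, not from the EP:

```go
package main

import "fmt"

// componentStatus is a hypothetical per-component report.
type componentStatus struct {
	name    string
	applied bool
}

// aggregate rolls per-component reports up into the Applied condition:
// status is "True" only when every component has applied the config.
func aggregate(statuses []componentStatus) (status, reason, message string) {
	for _, s := range statuses {
		if !s.applied {
			return "False", "ComponentsPending",
				fmt.Sprintf("component %q has not applied the TLS configuration", s.name)
		}
	}
	return "True", "AllComponentsUpdated", "All components have applied the TLS configuration"
}

func main() {
	status, reason, _ := aggregate([]componentStatus{
		{name: "ingress", applied: true},
		{name: "kubelet", applied: true},
	})
	fmt.Println(status, reason) // True AllComponentsUpdated
}
```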
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment.
Reworked this EP. Removed the proposal of a new CRD for centralizing TLS configuration. The new plan is to use the existing apiserver config for all components, which avoids the complexity of adding a new CRD.

Open question for discussion: what should components do if they are unable to comply with the TLS settings? Is there a pattern for bubbling up configuration errors? Should they use the Degraded condition?
| The new `tlsAdherence` field controls how strictly the TLS configuration is enforced by components:
| **`legacy` (default):** Provides backward-compatible behavior. Components will attempt to honor the configured TLS profile but may fall back to their individual defaults if conflicts arise. This mode is intended for clusters that need to maintain compatibility with existing configurations during migration.
From the user's perspective, if I set this, where do I go next to find out which components are falling back to legacy behavior?
To make sure I'm recalling correctly, this mode being configured means "do what you've always done" and is meant to mitigate breaking changes from automatically migrating clusters from "some components respect this API for TLS configuration" to "all components respect this API for TLS configuration".
Should the guidance here instead be something along the lines of:
- IF `tlsAdherence: Legacy`
  - IF `${component}` already respects TLS profile configuration, continue to do so.
  - ELSE do what `${component}` has always done, which is presumably that they have not respected the TLS profile and therefore should not.
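The guidance discussed in this thread could be written as a small decision function. This is a sketch: the function and return strings are hypothetical, and the mode names follow the EP's later revisions:

```go
package main

import "fmt"

// effectiveBehavior sketches the Legacy-mode guidance from the thread:
// under Legacy, a component that already respects the TLS profile keeps
// respecting it; one that never did keeps its historical defaults. Under
// StrictAllComponents, every component applies the cluster profile.
func effectiveBehavior(adherence string, alreadyRespectsProfile bool) string {
	if adherence == "StrictAllComponents" {
		return "apply cluster TLS profile"
	}
	// Legacy: preserve whatever the component did before.
	if alreadyRespectsProfile {
		return "apply cluster TLS profile"
	}
	return "keep component defaults"
}

func main() {
	fmt.Println(effectiveBehavior("Legacy", false)) // keep component defaults
	fmt.Println(effectiveBehavior("Legacy", true))  // apply cluster TLS profile
}
```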
| **Components With Override Capability:**
| - **Ingress Controller:** Retains its existing capability to define a specific `tlsSecurityProfile`. This allows Ingress to support legacy external clients (e.g., using Intermediate or Old profiles) even if the cluster internal communication is set to Modern.
| - **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements.
| - **Layered Products:** Layered products are expected to inherit the cluster default. Products unable to support the default (e.g., due to version incompatibility) must document their deviation and provide a justification.
How should a component let the user know it's not going to use the cluster default set in the API server configuration?
| tlsAdherence: strict
| ```
|
| #### TLS 1.3 with Custom Ciphers (Not Recommended)
Just to make sure my understanding is correct - the best a cluster administrator could do if we land this enhancement without #1894 (assuming we implement consistent enforcement) would be to configure the cluster to use the Modern profile and hope clients prefer the hybrid curves (assuming the cluster admin is prioritizing defense against harvest now, decrypt later attacks).
If we land this enhancement along with #1894, then they at least have the option to force clients to use PQC-safe curves with:

```yaml
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      # ciphers:
      # - TLS_AES_128_GCM_SHA256 # This will be IGNORED - TLS 1.3 ciphers are hardcoded
      minTLSVersion: VersionTLS13
      curves:
      - X25519MLKEM512
  tlsAdherence: strict
```

But that requires 1894 to turn off the classical curves.
Reposting my comment from #1910 (comment): I'm assuming that the proposal applies to the server part of managed components and we don't need to enforce any client TLS settings.
|
99926c2 to 8ff168c (force-pushed)
@richardsonnick can this be written down in the proposal? Maybe in the non-goals section?
…e `tlsAdherence` field. Clarify default behavior for empty values and provide implementation notes for component developers. Update example configuration and ensure proper handling of unknown enum values.
…StrictAllComponents` modes in the centralized TLS configuration documentation. Add a summary table for better understanding of how different modes affect API servers and other components. Ensure consistency in the description of component-specific TLS configuration overrides.
| | Mode | API Servers (kube, openshift, oauth) | Other Components |
| |------|--------------------------------------|------------------|
| | `LegacyExternalAPIServerComponentsOnly` | Honor cluster-wide TLS profile | Use their individual TLS configurations |
what about components which already honor the cluster-wide TLS profile? I assume that they should behave the same. E.g. the Cluster monitoring operator already does for a couple of releases now.
My understanding is that, yes they should retain the same behavior.
Looping in @joelanford to confirm
The CLO also honors the cluster-wide TLS profile, which means on upgrade we could potentially revert its behavior since it's not one of these Legacy components. How should layered products treat this going forward? What is their expected behavior?
Legacy means components should retain their existing behavior. The intent is to avoid changes to the server configuration on cluster upgrade. So if components already honor the profile, they should continue honoring the profile.
@richardsonnick we should probably call that out in the EP.
But that nuance means the name of this enum is not quite true.
@everettraven thoughts on naming for the Legacy* enum?
| - **Kubelet:** Can be overridden via `KubeletConfig` CR with its own `tlsSecurityProfile` field. If not explicitly set, inherits the APIServer TLS config.
| - **Ingress Controller:** Can be overridden via `IngressController` CR with its own `tlsSecurityProfile` field. If not explicitly set, inherits the APIServer TLS config. This allows Ingress to support legacy external clients (e.g., using Intermediate or Old profiles) even when the cluster uses Modern internally.
| - **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements.
| - **Gateway Controller:** Will initially honor the APIServer TLS profile. Overrides may be added later if needed.
I also saw that changes to the TLSSecurityProfile resulted in changes to the Machine Config Operator (I think that was the one). We should call out all of the ones we know about.
And we also may want to audit the rest of the openshift github org (via github search) to see if we can find APIs outside of openshift/api that embed this type. Out-of-tree API definitions may pick this validation change up in the future (when they bump their openshift/api dep)
Created an "APIs Referencing TLSSecurityProfile" section that contains a (hopefully exhaustive) list of references and embeddings of the TLSSecurityProfile type.
Co-authored-by: Joe Lanford <joe.lanford@gmail.com>
…e` helper function.
| approvers:
| - joelanford
| api-approvers:
| - joelanford
| - [`openshift/oc`](https://github.com/openshift/oc/blob/8b0a043216f7ae608606afb5bdb0ce451561021e/pkg/cli/admin/rebootmachineconfigpool/types.go#L396)
| - [`openshift/cluster-logging-operator`](https://github.com/openshift/cluster-logging-operator/blob/754ef3e6d0c48470afa470092c709dc5ad094702/api/observability/v1/output_types.go#L206) - `OutputSpec.TLS.TLSSecurityProfile`
| - [`openshift/lightspeed-operator`](https://github.com/openshift/lightspeed-operator/blob/122c7a163aa662c04e1cbc8272cc063ec8a4006e/api/v1alpha1/olsconfig_types.go#L205) - `OLSConfig.spec.ols.deployment.api.tlsSecurityProfile`
Seems like:
- `oc` is related to kubelet, which is go-based and would need the ratcheted validation. I'm guessing this plumbs through to the kubelet config that is in-tree?
- `cluster-logging-operator` appears to be configuration for a client TLS config, not a server, but we should get confirmation (ping @jcantrill)
- `lightspeed-operator` is unrelated to anything else, so we definitely need to make them aware and get their feedback.
cluster-logging-operator appears to be configuration for a client TLS config, not a server, but we should get confirmation (ping @jcantrill)
The 'server' config of ClusterLogForwarder only relies upon the global cluster setting and does not expose the capability for an admin to overwrite the global settings via CLF. Conversely, the output side of CLF does allow admins to vary from the cluster setting as necessary to forward logs to receivers which are under their config control.
| - **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements.
| - **Gateway Controller:** Will initially honor the APIServer TLS profile. Overrides may be added later if needed.
|
| **APIs Referencing TLSSecurityProfile:**
Interestingly, I found this code, which already validates cipher suites are not configurable with TLS 1.3: https://github.com/openshift/kubernetes/blob/9d521311f5fb67dc43f49eeb728ee2c80976835a/openshift-kube-apiserver/admission/customresourcevalidation/apiserver/validate_apiserver.go#L214-L219
When does that code run?
Looks like it should run at admission time of an APIServer config resource :) - that might mean you don't need to make any validation updates to the APIServer type, but it might make sense to move towards using CEL validations for this.
I suspect this validation logic was added as part of our patched-in admission plugin because there was no way to do this evaluation on the CRD at the time.
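For reference, the admission check linked above amounts to logic like this. A standalone sketch, not the actual openshift/kubernetes code:

```go
package main

import (
	"errors"
	"fmt"
)

// validateCustomProfile mirrors the linked admission logic: with a
// minimum version of TLS 1.3, explicitly configured cipher suites are
// rejected, because Go's crypto/tls hardcodes the TLS 1.3 suites and
// ignores any custom list.
func validateCustomProfile(minTLSVersion string, ciphers []string) error {
	if minTLSVersion == "VersionTLS13" && len(ciphers) > 0 {
		return errors.New("custom ciphers cannot be configured when minTLSVersion is VersionTLS13")
	}
	return nil
}

func main() {
	fmt.Println(validateCustomProfile("VersionTLS13", []string{"TLS_AES_128_GCM_SHA256"}) != nil) // true
	fmt.Println(validateCustomProfile("VersionTLS12", []string{"ECDHE-RSA-AES128-GCM-SHA256"}) == nil) // true
}
```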
| **APIs Referencing TLSSecurityProfile:**
|
| Changes to `configv1.TLSSecurityProfile` validation (e.g., TLS 1.3 cipher restrictions) will affect APIs that reference this type. Known usages include:
I also see:
- Plumbing for microshift ingress that likely needs to change to the new ingress type: https://github.com/openshift/microshift/blob/bda126f67a2cdf076f2b608ee9c9c9dec9b2dda2/pkg/config/ingress.go#L140-L151
- Plumbing for hypershift's hosted apiserver, where we do want to propagate our ratcheting validation
| - Examples of common override scenarios (e.g., using a less restrictive profile for Ingress to support legacy clients)
| - Any limitations or caveats specific to each component's override capability
|
| **Layered Products:** Layered products are expected to inherit the cluster default. Products unable to support the default (e.g., due to version incompatibility) must document their deviation and provide a justification. Any override configurations offered by layered products must be clearly documented, including the rationale for why overrides are necessary. For non-metrics or non-webhook product servers, the expectation is to fall back to the APIServer's TLS configuration; offering specific override configuration is a product team decision.
Layered products should only inherit the cluster default if tlsAdherence=StrictAllComponents, correct? If tlsAdherence=LegacyExternalAPIServerComponentsOnly then layered products should ignore tlsSecurityProfile?
- If `tlsAdherence == StrictAllComponents`, then layered products should inherit the cluster default.
- If `tlsAdherence == LegacyExternalAPIServerComponentsOnly`, then layered products should continue doing what they were doing before. If that means ignoring the TLS security profile, then they should continue this behavior.
| The APIServer validation will reject configurations that attempt to set cipher suites with TLS 1.3.
|
| **Curve Preferences Ordering:** Starting in Go 1.24, `CurvePreferences` semantics are changing ([golang/go#69393](https://github.com/golang/go/issues/69393)). Components should account for these changes.
I don't understand what's the expectation for component owners here. From what I understand, the ordering doesn't matter anymore.
Updated the spec to be more specific here. Component owners should just apply the set of curves in the order given by the CRD regardless of go crypto/tls behavior. Cluster admins will need to be aware of this behavior when we update the documentation to add the new curves field.
thanks, it's clear for me now!
@richardsonnick: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Centralized and Enforced TLS Configuration Enhancement
This enhancement proposes a mechanism to ensure all OpenShift components honor the centralized TLS security profile configuration, addressing a gap in TLS policy enforcement across the platform.
Summary
Currently, many OpenShift components hardcode TLS settings or rely on library defaults rather than respecting cluster-wide TLS configuration. This creates security vulnerabilities and blocks Post-Quantum Cryptography (PQC) readiness.
This proposal introduces a `tlsAdherence` field that allows gradual adoption of strict TLS enforcement, providing a safe upgrade path for customers.

Key Features

- `tlsAdherence` field with `Legacy` (default) and `Strict` modes
- `TLSAdherenceStrict` feature gate during transition

Why This Matters

Related Work