Skip to content

Comments

Add enhancement spec for centralized TLS configuration and enforcement#1910

Open
richardsonnick wants to merge 23 commits intoopenshift:masterfrom
richardsonnick:centralized-tls-config
Open

Add enhancement spec for centralized TLS configuration and enforcement#1910
richardsonnick wants to merge 23 commits intoopenshift:masterfrom
richardsonnick:centralized-tls-config

Conversation

@richardsonnick
Copy link

@richardsonnick richardsonnick commented Dec 16, 2025

Centralized and Enforced TLS Configuration Enhancement

This enhancement proposes a mechanism to ensure all OpenShift components honor the centralized TLS security profile configuration, addressing a gap in TLS policy enforcement across the platform.

Summary

Currently, many OpenShift components hardcode TLS settings or rely on library defaults rather than respecting cluster-wide TLS configuration. This creates security vulnerabilities and blocks Post-Quantum Cryptography (PQC) readiness.

This proposal introduces a tlsAdherence field that allows gradual adoption of strict TLS enforcement, providing a safe upgrade path for customers.

Key Features

  • Safe upgrade path: New tlsAdherence field with Legacy (default) and Strict modes
  • Backwards compatible: Defaults to existing behavior, requires explicit opt-in
  • Comprehensive: Applies to core platform components and layered products
  • Feature gated: Controlled by TLSAdherenceStrict feature gate during transition

Why This Matters

  • Enables consistent cryptographic policy enforcement across all components
  • Unblocks PQC readiness initiatives
  • Allows customers to enforce custom TLS profiles for compliance requirements
  • Prevents components from ignoring configured cipher/TLS version restrictions

Related Work

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 16, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@richardsonnick richardsonnick changed the title Initialize proposal Add enhancement spec for centralized TLS configuration and enforcement Dec 16, 2025
@richardsonnick richardsonnick marked this pull request as ready for review December 16, 2025 16:34
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2025
@openshift-ci openshift-ci bot requested review from jotak and spadgett December 16, 2025 16:35
**Rejected because**: This adds significant complexity and goes against the goal of centralized configuration. The three existing configuration points (API Server, Kubelet, Ingress) already provide sufficient granularity.

## Open Questions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some questions raised in our sync that I think we will need to write down and think about:

  1. Should we focus ingress-related improvements in the Gateway API/implementation?
  2. What does the ingress configuration apply to? Just the reverse proxies that are in the ingress controllers/routers/gateway implementations? Or are exposed backend services expected to honor the ingress config over the "in-cluster" config? What if the backend service is exposed outside the cluster, but is also expected to be used within the cluster? And lastly, should there be any consideration for the possibility of TLS passthrough (or not) being used for the ingress/route/gateway?


1. **Fetch TLS configuration** from the appropriate central source:
- **API Server configuration** (`apiserver.config.openshift.io/cluster`) - For most components that should match the API server TLS profile
- **Kubelet configuration** - For components running on nodes that need to match node-level TLS settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get more details on how one should pick this flavor + some concrete examples?
Case in point: monitoring deploys the node-exporter to all (Linux) nodes via a DaemonSet to collect node metrics. Right now we rely on the API server configuration settings to configure TLS, should we rather switch to Kubelet settings?

Corollary question: what's the use case for different API and kubelet TLS settings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonpasquier did you ever manage to figure this out? In your case, I think it makes sense to follow apiservers since monitoring isn't directly aligned with the kubelet component itself.


**Rejected because**: This adds significant complexity and goes against the goal of centralized configuration. The three existing configuration points (API Server, Kubelet, Ingress) already provide sufficient granularity.

## Open Questions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, the proposal focuses on TLS server configuration. I assume that TLS client configuration is out of scope. Asking the question because Prometheus scrapes metrics from operators/operands via HTTPS. Is it ok to consider that the scraped endpoints apply the centralized TLS configuration and we don't need to enforce anything at the client (Prometheus) level?

We also a few components which users can configure to talk to external systems:

  • Prometheus can send metrics via remote-write to HTTPS servers.
  • Alertmanager can send notifications to HTTPS based API services (as well as emails via STARTTLS).
    Again is it safe to assume that the users are responsible to properly configure these interfaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, the focus is on servers, yes. If we find clients that we ship are using old or constrained TLS implementations that don't offer to negotiate using TLS 1.3 + ML-KEM, then we will likely ask for those clients to be updated because the end result of that client/server combo will not be TLS 1.3 + ML-KEM.

@richardsonnick
Copy link
Author

Updated this with a revised draft with up to date information from internal discussions.

- type: Applied
status: "True"
reason: AllComponentsUpdated
message: "All components have applied the TLS configuration"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who's going to update this condition?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tkashem for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@richardsonnick
Copy link
Author

Reworked this EP. Removed the proposal of a new CRD for centralizing TLS configuration. The new plan is to use the existing apiserver config for all components. This reduces the complexity of adding a new CRD.

@richardsonnick
Copy link
Author

Open question for discussion: What should components do if they are unable to comply with the tls settings? Is there a pattern for bubbling up configuration errors? Should the use the degraded flag?


The new `tlsAdherence` field controls how strictly the TLS configuration is enforced by components:

**`legacy` (default):** Provides backward-compatible behavior. Components will attempt to honor the configured TLS profile but may fall back to their individual defaults if conflicts arise. This mode is intended for clusters that need to maintain compatibility with existing configurations during migration.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the user's perspective, if I set this, where do I go next to find out which components are falling back to legacy behavior?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I'm recalling correctly, this mode being configured means "do what you've always done" and is meant to mitigate breaking changes from automatically migrating clusters from "some components respect this API for TLS configuration" to "all components respect this API for TLS configuration".

Should the guidance here instead be something along the lines of:

  • IF tlsAdherence: Legacy
    • IF ${component} already respects TLS profile configuration, continue to do so.
    • ELSE do what ${component} has always done, which is presumably that they have not respected the TLS profile and therefore should not

**Components With Override Capability:**
- **Ingress Controller:** Retains its existing capability to define a specific `tlsSecurityProfile`. This allows Ingress to support legacy external clients (e.g., using Intermediate or Old profiles) even if the cluster internal communication is set to Modern.
- **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements.
- **Layered Products:** Layered products are expected to inherit the cluster default. Products unable to support the default (e.g., due to version incompatibility) must document their deviation and provide a justification.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should a component let the user know it's not going to use the cluster default set in the API server configuration?

tlsAdherence: strict
```

#### TLS 1.3 with Custom Ciphers (Not Recommended)
Copy link

@rhmdnd rhmdnd Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure my understanding is correct - the best a cluster administrator could do if we land this enhancement without #1894 (assuming we implement consistent enforcement) would be to configure the cluster to use the Modern profile and hope clients prefer the hybrid curves (assuming the cluster admin is prioritizing defense against harvest now, decrypt later attacks).

If we land this enhancement along with #1894 - then they at least have the option force clients to use PQC-safe curves with:

apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  tlsSecurityProfile:
    type: Custom
    custom:
      # ciphers:
      #   - TLS_AES_128_GCM_SHA256  # This will be IGNORED - TLS 1.3 ciphers are hardcoded
      minTLSVersion: VersionTLS13
      curves:
      - X25519MLKEM512
  tlsAdherence: strict

But that requires 1894 to turn off the classical curves.

@simonpasquier
Copy link
Contributor

reposting my comment from #1910 (comment)

I'm assuming that the proposal applies to the server part of managed components and we don't need to enforce any client TLS settings.

@richardsonnick
Copy link
Author

reposting my comment from #1910 (comment)
I'm assuming that the proposal applies to the server part of managed components and we don't need to enforce any client TLS settings.
@simonpasquier Yes that is correct

@simonpasquier
Copy link
Contributor

simonpasquier commented Jan 27, 2026

@richardsonnick can this be written down in the proposal? Maybe in the non-goals section?

…e `tlsAdherence` field. Clarify default behavior for empty values and provide implementation notes for component developers. Update example configuration and ensure proper handling of unknown enum values.
…StrictAllComponents` modes in the centralized TLS configuration documentation. Add a summary table for better understanding of how different modes affect API servers and other components. Ensure consistency in the description of component-specific TLS configuration overrides.

| Mode | API Servers (kube, openshift, oauth) | Other Components |
|------|--------------------------------------|------------------|
| `LegacyExternalAPIServerComponentsOnly` | Honor cluster-wide TLS profile | Use their individual TLS configurations |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about components which already honor the cluster-wide TLS profile? I assume that they should behave the same. E.g. the Cluster monitoring operator already does for a couple of releases now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that, yes they should retain the same behavior.
Looping in @joelanford to confirm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLO also honor's cluster-wide TLS profile which means on upgade we could potentially revert its behavior since its not one of these Legacy components. How should layered products treat this going forward? What is their expected behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy means components should retain their existing behavior. The intent is to avoid changes to the server configuration on cluster upgrade. So if components already honor the profile, they should continue honoring the profile.

@richardsonnick we should probably call that out in the EP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that nuance means the name of this enum is not quite true.

@everettraven thoughts on naming for the Legacy* enum?

- **Kubelet:** Can be overridden via `KubeletConfig` CR with its own `tlsSecurityProfile` field. If not explicitly set, inherits the APIServer TLS config.
- **Ingress Controller:** Can be overridden via `IngressController` CR with its own `tlsSecurityProfile` field. If not explicitly set, inherits the APIServer TLS config. This allows Ingress to support legacy external clients (e.g., using Intermediate or Old profiles) even when the cluster uses Modern internally.
- **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements.
- **Gateway Controller:** Will initially honor the APIServer TLS profile. Overrides may be added later if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also saw that changes to the TLSSecurityProfile resulted in changes to the Machine Config Operator (I think that was the one). We should call out all of the ones we know about.

And we also may want to audit the rest of the openshift github org (via github search) to see if we can find APIs outside of openshift/api that embed this type. Out-of-tree API definitions may pick this validation change up in the future (when they bump their openshift/api dep)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created APIs Referencing TLSSecurityProfile section that contains (a hopefully exhaustive) list of references and embeddings of the TLSSecurityProfile type

richardsonnick and others added 2 commits February 6, 2026 15:20
Co-authored-by: Joe Lanford <joe.lanford@gmail.com>
Co-authored-by: Joe Lanford <joe.lanford@gmail.com>
approvers:
- joelanford
api-approvers:
- joelanford
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api-approvers should be @everettraven

Comment on lines +131 to +133
- [`openshift/oc`](https://github.com/openshift/oc/blob/8b0a043216f7ae608606afb5bdb0ce451561021e/pkg/cli/admin/rebootmachineconfigpool/types.go#L396)
- [`openshift/cluster-logging-operator`](https://github.com/openshift/cluster-logging-operator/blob/754ef3e6d0c48470afa470092c709dc5ad094702/api/observability/v1/output_types.go#L206) - `OutputSpec.TLS.TLSSecurityProfile`
- [`openshift/lightspeed-operator`](https://github.com/openshift/lightspeed-operator/blob/122c7a163aa662c04e1cbc8272cc063ec8a4006e/api/v1alpha1/olsconfig_types.go#L205) - `OLSConfig.spec.ols.deployment.api.tlsSecurityProfile`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like:

  • oc is related to kubelet, which is go-based and would need the ratcheted validation. I'm guessing this plumbs through to the kubelet config that is in-tree?
  • cluster-logging-operator appears to be configuration for a client TLS config, not a server, but we should get confirmation (ping @jcantrill)
  • lightspeed-operator is unrelated to anything else, so we definitely need to make them aware and get their feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster-logging-operator appears to be configuration for a client TLS config, not a server, but we should get confirmation (ping @jcantrill)

The 'server' config of ClusterLogForwarder only relies upon the global cluster setting and does not expose the capability for an admin to overwrite the global settings via CLF. Correctly, the output side of CLF does allow admin to vary from the cluster setting as is necessary to forward logs to the receivers which are under their config control

- **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements.
- **Gateway Controller:** Will initially honor the APIServer TLS profile. Overrides may be added later if needed.

**APIs Referencing TLSSecurityProfile:**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, I found this code, which already validates cipher suites are not configurable with TLS 1.3: https://github.com/openshift/kubernetes/blob/9d521311f5fb67dc43f49eeb728ee2c80976835a/openshift-kube-apiserver/admission/customresourcevalidation/apiserver/validate_apiserver.go#L214-L219

When does that code run?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it should run at admission time of an APIServer config resource :) - might mean you don't need to make any validation updates to the APIServer type, but it might makes sense to move towards using CEL validations for this.

I suspect this validation logic was added as part of our patched-in admission plugin because there was no way to do this evaluation on the CRD at the time.


**APIs Referencing TLSSecurityProfile:**

Changes to `configv1.TLSSecurityProfile` validation (e.g., TLS 1.3 cipher restrictions) will affect APIs that reference this type. Known usages include:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also see:

- Examples of common override scenarios (e.g., using a less restrictive profile for Ingress to support legacy clients)
- Any limitations or caveats specific to each component's override capability

**Layered Products:** Layered products are expected to inherit the cluster default. Products unable to support the default (e.g., due to version incompatibility) must document their deviation and provide a justification. Any override configurations offered by layered products must be clearly documented, including the rationale for why overrides are necessary. For non-metrics or non-webhook product servers, the expectation is to fall back to the APIServer's TLS configuration; offering specific override configuration is a product team decision.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Layered products should only inherit the cluster default if tlsAdherence=StrictAllComponents, correct? If tlsAdherence=LegacyExternalAPIServerComponentsOnly then layered products should ignore tlsSecurityProfile?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If tlsAdherence == StrictAllComponents, then layered products should inherit the cluster default.
  • If tlsAdherence == LegacyExternalAPIServerComponentsOnlythen layered products should continue doing what they were before. If that means ignoring the TLS security profile then they should continue this behavior.


The APIServer validation will reject configurations that attempt to set cipher suites with TLS 1.3.

**Curve Preferences Ordering:** Starting in Go 1.24, `CurvePreferences` semantics are changing ([golang/go#69393](https://github.com/golang/go/issues/69393)). Components should account for these changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what's the expectation for component owners here. From what I understand, the ordering doesn't matter anymore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the spec to be more specific here. Component owners should just apply the set of curves in the order given by the CRD regardless of go crypto/tls behavior. Cluster admins will need to be aware of this behavior when we update the documentation to add the new curves field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it's clear for me now!

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 18, 2026

@richardsonnick: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants