Follow up to enhancement for exposing hosted control plane metrics to data plane #1944

Open

csrwng wants to merge 2 commits into openshift:master from csrwng:hosted_cluster_metrics_followup

Conversation

@csrwng csrwng commented Feb 18, 2026

Summary

Follow-up changes to the hosted control plane metrics exposure enhancement,
addressing PR feedback.

Commit 1: Per-component label mappings and PR feedback

  • Replace blanket mTLS assumption with per-component upstream
    authentication derived from existing ServiceMonitor/PodMonitor specs
    (mTLS, server-side TLS only, or plain HTTP)
  • Add annotation-based component discovery using
    hypershift.openshift.io/metrics-* annotations on ServiceMonitors and
    PodMonitors as the opt-in mechanism and source of injected label values
  • Detail how the control-plane-operator builds the metrics proxy
    Deployment: collecting and deduplicating Secret/ConfigMap volume mounts
    from tlsConfig, and generating the proxy ConfigMap with file-path
    references
  • Update metrics forwarder to TCP proxy mode (no TLS termination),
    with end-to-end TLS between Prometheus and the metrics proxy
  • Add metrics-proxy-serving-ca ConfigMap synced to openshift-monitoring
    for Prometheus to verify the metrics proxy cert through the TCP proxy
  • Use bearerTokenFile for Prometheus authentication via the existing
    prometheus-k8s ServiceAccount token
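
The per-component authentication derivation described in the first bullet could be classified roughly as follows. This is a minimal Go sketch: the `TLSConfig` struct, its field names, and the `classify` helper are illustrative stand-ins, not the prometheus-operator API types or the actual HyperShift implementation.

```go
package main

import "fmt"

// Illustrative stand-in for the relevant ServiceMonitor/PodMonitor
// endpoint tlsConfig fields (not the real prometheus-operator types).
type TLSConfig struct {
	CAFile   string // CA used to verify the component's serving certificate
	CertFile string // client certificate presented to the component
	KeyFile  string // key for the client certificate
}

// AuthMode is the upstream authentication the metrics proxy would use.
type AuthMode int

const (
	PlainHTTP AuthMode = iota // no tlsConfig declared at all
	ServerTLS                 // TLS, but no client certificate
	MutualTLS                 // client certificate and key present
)

// classify derives the per-component auth mode from what the monitor
// already declares, instead of assuming mTLS for every component.
func classify(tls *TLSConfig) AuthMode {
	switch {
	case tls == nil:
		return PlainHTTP
	case tls.CertFile != "" && tls.KeyFile != "":
		return MutualTLS
	default:
		return ServerTLS
	}
}

func main() {
	fmt.Println(classify(nil) == PlainHTTP)
	fmt.Println(classify(&TLSConfig{CAFile: "ca.crt"}) == ServerTLS)
	fmt.Println(classify(&TLSConfig{CertFile: "tls.crt", KeyFile: "tls.key"}) == MutualTLS)
}
```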

Commit 2: Add endpoint resolver to decouple metrics proxy from management cluster RBAC

Addresses #1944 (comment)

The metrics proxy is a request-serving component exposed to the data plane
via a route. Giving it management cluster RBAC (to read Endpoints for pod
discovery) violates a security boundary — the same issue that required
putting an ignition proxy in front of the ignition server.

This commit introduces a separate endpoint resolver deployment:

  • A lightweight deployment in the HCP namespace that watches
    Endpoints/EndpointSlices and serves pod IP resolution over HTTPS
  • The metrics proxy queries the endpoint resolver instead of the kube API
    directly, so it has no management cluster RBAC
  • The endpoint resolver is internal to the HCP namespace (no route) — it
    is not reachable from the data plane
  • The CPO creates the endpoint resolver's Deployment, Service, serving
    certificate, and RBAC as part of reconciliation
  • A separate deployment (rather than an endpoint on the CPO) avoids a
    chicken-and-egg problem: the CPO creates serving certs, so it cannot
    create one for itself

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 18, 2026

openshift-ci bot commented Feb 18, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Feb 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


csrwng commented Feb 18, 2026

/test all

@csrwng csrwng force-pushed the hosted_cluster_metrics_followup branch from 4d3917d to 04a2f48 on February 19, 2026 14:28
@csrwng csrwng marked this pull request as ready for review February 19, 2026 14:29
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2026
@openshift-ci openshift-ci bot requested review from enxebre and sjenning February 19, 2026 14:30

enxebre commented Feb 19, 2026

I think this violates a security boundary, since it's a request-serving component and has RBAC for the management cluster. In the past we put the additional ignition proxy in front of the ignition server because of this. To avoid that, we would need to decouple the server from the kube logic: the controller could push the data over the network to the server, which then serves the pod monitor forwarder.

Add endpoint resolver to decouple metrics proxy from management cluster RBAC

The metrics proxy is a request-serving component exposed to the data plane
via a route. Giving it management cluster RBAC (to read Endpoints for pod
discovery) violates a security boundary. Introduce a separate endpoint
resolver deployment that holds the RBAC and serves pod IP resolution over
HTTPS, keeping the externally-exposed metrics proxy free of management
cluster credentials.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

csrwng commented Feb 19, 2026

Addressed in ad7e150. The metrics proxy no longer has any management cluster RBAC.

Pod endpoint discovery is moved to a separate endpoint resolver deployment in the HCP namespace. It watches Endpoints/EndpointSlices and serves pod IP resolution over HTTPS. The metrics proxy queries the endpoint resolver instead of the kube API directly.

Key properties:

  • The metrics proxy (externally exposed via route) has no management cluster RBAC
  • The endpoint resolver (has management cluster RBAC) is not exposed via a route — only reachable within the HCP namespace
  • The endpoint resolver is a separate deployment (not an endpoint on the CPO) to avoid a chicken-and-egg problem with serving certificates — the CPO creates serving certs, so it can't create one for itself

This follows the same pattern as the ignition server / ignition proxy split.


openshift-ci bot commented Feb 19, 2026

@csrwng: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
