feat: add initial weka cluster addmision for shared drives setup#2090
feat: add initial weka cluster addmision for shared drives setup#2090
Conversation
How to use the Graphite Merge QueueAdd the label main-merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has required the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
Graphite Automations"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (01/19/26)2 reviewers were added to this PR based on Anton Bykov's automation. |
6077d6c to
fd05c68
Compare
charts/weka-operator/values.yaml
Outdated
| # Certificate configuration - certificates are generated via Helm | ||
| certDir: "/tmp/k8s-webhook-server/serving-certs" |
There was a problem hiding this comment.
how did you generate this certificate? should it be part of operator as it says here?
There was a problem hiding this comment.
we generate it via helm install:
{{- $ca := genCA "weka-operator-webhook-ca" 365 }}
{{- $cert := genSignedCert $cn nil $altNames 365 $ca }}
| // Calculate capacity split based on ratio | ||
| tlcCapacity := (cfg.ContainerCapacity * tlcRatio) / totalRatio | ||
| qlcCapacity := (cfg.ContainerCapacity * qlcRatio) / totalRatio |
There was a problem hiding this comment.
let's always use this one weka.GetTlcQlcCapacity (from api repo) to avoid remainder issue and have same calculations everywhere
| // calculateMinContainerCapacity calculates the minimum containerCapacity needed | ||
| // to achieve minCapacityForType GiB for a drive type with the given ratio. | ||
| // Formula: minContainerCapacity = minCapacityForType * totalRatio / typeRatio | ||
| func calculateMinContainerCapacity(minCapacityForType, tlcRatio, qlcRatio, typeRatio int) int { |
There was a problem hiding this comment.
these are confusing names, there are no separate qlc and tlc ratios - we have single ratio and we have the separate struct for it in api
also, minCapacityForType depends on enforceMinDrivesPerTypePerCore helm value
And in case when enforceMinDrivesPerTypePerCore is false, we don't know in advance what the "minCapacityForType" is, because minimum number of drives per type will be calculated dinamically
| tlcRatio := cfg.DriveTypesRatio.Tlc | ||
| qlcRatio := cfg.DriveTypesRatio.Qlc | ||
| totalRatio := tlcRatio + qlcRatio | ||
|
|
||
| if totalRatio <= 0 { | ||
| // Already validated by basic rules | ||
| return allErrs | ||
| } |
There was a problem hiding this comment.
here maybe let's also fix naming
e.g. tlcParts/qlcParts/totalParts
| {{- $serviceName := printf "%s-webhook-service" .Values.prefix }} | ||
| {{- $secretName := printf "%s-webhook-server-cert" .Values.prefix }} | ||
| {{- $cn := printf "%s.%s.svc" $serviceName .Release.Namespace }} | ||
| {{- $altNames := list (printf "%s.%s.svc" $serviceName .Release.Namespace) (printf "%s.%s.svc.cluster.local" $serviceName .Release.Namespace) }} | ||
| {{- $ca := genCA "weka-operator-webhook-ca" 365 }} | ||
| {{- $cert := genSignedCert $cn nil $altNames 365 $ca }} |
There was a problem hiding this comment.
Not sure about the certificate generation approach
this is what I got when chatting with Graphite a bit:
Why it's acceptable:
- Internal-only communication (API server → webhook service within cluster)
- Self-signed certificates are sufficient since the caBundle is explicitly provided to the API server
- Common pattern used by many operators (e.g., early versions of cert-manager, various operators)
Important limitations:
- Helm upgrade issue: Certificates regenerate on every helm upgrade, which can cause temporary webhook failures until pods restart with new certs
- No automatic rotation: Certs expire in 365 days with no auto-renewal mechanism
- StatefulSet/upgrade challenges: If pods don't restart after upgrade, they'll have stale certificates
Better production alternatives:
- cert-manager: Industry standard for Kubernetes certificate management with automatic rotation
- External CA: Use your organization's PKI infrastructure
- Helm lookup function: Preserve existing certificates across upgrades (prevents regeneration issue)
For a production-ready operator, consider migrating to cert-manager or implementing the lookup pattern to avoid certificate churn during upgrades.
The current approach works but may cause operational friction.
So I guess it's fine to keep it like this for now and maybe replace later with cert-manager
fd05c68 to
7fe1ec1
Compare
7fe1ec1 to
4c6d4d9
Compare
4c6d4d9 to
8f29c06
Compare

No description provided.