Skip to content

feat: add initial weka cluster addmision for shared drives setup#2090

Open
assafgi wants to merge 1 commit intomainfrom
01-19-feat_add_initial_weka_cluster_addmision_for_shared_drives_setup
Open

feat: add initial weka cluster addmision for shared drives setup#2090
assafgi wants to merge 1 commit intomainfrom
01-19-feat_add_initial_weka_cluster_addmision_for_shared_drives_setup

Conversation

@assafgi
Copy link
Contributor

@assafgi assafgi commented Jan 19, 2026

No description provided.

@assafgi assafgi marked this pull request as ready for review January 19, 2026 18:12
Copy link
Contributor Author

assafgi commented Jan 19, 2026


How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@graphite-app
Copy link

graphite-app bot commented Jan 19, 2026

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (01/19/26)

2 reviewers were added to this PR based on Anton Bykov's automation.

@assafgi assafgi force-pushed the 01-19-feat_add_initial_weka_cluster_addmision_for_shared_drives_setup branch from 6077d6c to fd05c68 Compare January 19, 2026 18:15
Comment on lines +314 to +315
# Certificate configuration - certificates are generated via Helm
certDir: "/tmp/k8s-webhook-server/serving-certs"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you generate this certificate? should it be part of operator as it says here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we generate it via helm install:

  {{- $ca := genCA "weka-operator-webhook-ca" 365 }}
  {{- $cert := genSignedCert $cn nil $altNames 365 $ca }}

Comment on lines +197 to +199
// Calculate capacity split based on ratio
tlcCapacity := (cfg.ContainerCapacity * tlcRatio) / totalRatio
qlcCapacity := (cfg.ContainerCapacity * qlcRatio) / totalRatio
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's always use this one weka.GetTlcQlcCapacity (from api repo) to avoid remainder issue and have same calculations everywhere

// calculateMinContainerCapacity calculates the minimum containerCapacity needed
// to achieve minCapacityForType GiB for a drive type with the given ratio.
// Formula: minContainerCapacity = minCapacityForType * totalRatio / typeRatio
func calculateMinContainerCapacity(minCapacityForType, tlcRatio, qlcRatio, typeRatio int) int {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are confusing names, there are no separate qlc and tlc ratios - we have single ratio and we have the separate struct for it in api

also, minCapacityForType depends on enforceMinDrivesPerTypePerCore helm value

And in case when enforceMinDrivesPerTypePerCore is false, we don't know in advance what the "minCapacityForType" is, because minimum number of drives per type will be calculated dinamically

Comment on lines +188 to +195
tlcRatio := cfg.DriveTypesRatio.Tlc
qlcRatio := cfg.DriveTypesRatio.Qlc
totalRatio := tlcRatio + qlcRatio

if totalRatio <= 0 {
// Already validated by basic rules
return allErrs
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here maybe let's also fix naming

e.g. tlcParts/qlcParts/totalParts

Comment on lines +2 to +7
{{- $serviceName := printf "%s-webhook-service" .Values.prefix }}
{{- $secretName := printf "%s-webhook-server-cert" .Values.prefix }}
{{- $cn := printf "%s.%s.svc" $serviceName .Release.Namespace }}
{{- $altNames := list (printf "%s.%s.svc" $serviceName .Release.Namespace) (printf "%s.%s.svc.cluster.local" $serviceName .Release.Namespace) }}
{{- $ca := genCA "weka-operator-webhook-ca" 365 }}
{{- $cert := genSignedCert $cn nil $altNames 365 $ca }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the certificate generation approach

this is what I got when chatting with Graphite a bit:

Why it's acceptable:

- Internal-only communication (API server → webhook service within cluster)
- Self-signed certificates are sufficient since the caBundle is explicitly provided to the API server
- Common pattern used by many operators (e.g., early versions of cert-manager, various operators)

Important limitations:

- Helm upgrade issue: Certificates regenerate on every helm upgrade, which can cause temporary webhook failures until pods restart with new certs
- No automatic rotation: Certs expire in 365 days with no auto-renewal mechanism
- StatefulSet/upgrade challenges: If pods don't restart after upgrade, they'll have stale certificates

Better production alternatives:

- cert-manager: Industry standard for Kubernetes certificate management with automatic rotation
- External CA: Use your organization's PKI infrastructure
- Helm lookup function: Preserve existing certificates across upgrades (prevents regeneration issue)

For a production-ready operator, consider migrating to cert-manager or implementing the lookup pattern to avoid certificate churn during upgrades. 
The current approach works but may cause operational friction.

So I guess it's fine to keep it like this for now and maybe replace later with cert-manager

@assafgi assafgi force-pushed the 01-19-feat_add_initial_weka_cluster_addmision_for_shared_drives_setup branch from fd05c68 to 7fe1ec1 Compare January 20, 2026 10:54
@assafgi assafgi force-pushed the 01-19-feat_add_initial_weka_cluster_addmision_for_shared_drives_setup branch from 7fe1ec1 to 4c6d4d9 Compare January 20, 2026 13:30
@assafgi assafgi force-pushed the 01-19-feat_add_initial_weka_cluster_addmision_for_shared_drives_setup branch from 4c6d4d9 to 8f29c06 Compare January 21, 2026 08:06
@assafgi assafgi requested a review from a team as a code owner January 28, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants