Skip to content

feat: add smbw support#2104

Open
assafgi wants to merge 1 commit intomainfrom
01-21-feat_add_smbw_support
Open

feat: add smbw support#2104
assafgi wants to merge 1 commit intomainfrom
01-21-feat_add_smbw_support

Conversation

@assafgi
Copy link
Contributor

@assafgi assafgi commented Jan 22, 2026

No description provided.

@assafgi assafgi marked this pull request as ready for review January 22, 2026 07:00
Copy link
Contributor Author

assafgi commented Jan 22, 2026


How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@graphite-app
Copy link

graphite-app bot commented Jan 22, 2026

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (01/22/26)

2 reviewers were added to this PR based on Anton Bykov's automation.

@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch from 6660290 to 295e9ec Compare January 22, 2026 08:07
Comment on lines +40 to +49
smbwCluster, err := wekaService.GetSmbwCluster(ctx)
if err != nil {
err = errors.Wrap(err, "Failed to get SMB-W cluster")
return err
}

if smbwCluster.Active {
logger.Info("SMB-W cluster already exists")
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like EnsureSmbwCluster expects smbw cluster to be already created and will otherwise

Comment on lines +263 to +265
func contains(s, substr string) bool {
return len(s) >= len(substr) && (s == substr || len(s) > 0 && containsIgnoreCase(s, substr))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it's for containsIgnoreCase
then maybe would be good to remove this "contains" func and only keep "containsIgnoreCase"

@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch from 295e9ec to 5516b89 Compare January 22, 2026 10:54
@assafgi assafgi requested a review from a team as a code owner January 28, 2026 18:09
@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch 13 times, most recently from dd13517 to 607f4de Compare February 11, 2026 11:32
@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch 5 times, most recently from 3ecc1ad to ff27dc2 Compare February 16, 2026 13:59
@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch from ff27dc2 to 4b43eff Compare February 24, 2026 10:41
Copilot AI review requested due to automatic review settings March 5, 2026 09:15
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class SMB-W (“smbw”) container/cluster support across the operator, including lifecycle reconciliation steps, Weka CLI integrations, and runtime/pod configuration needed for SMB-W.

Changes:

  • Extend WekaService + CLI implementation with SMB-W cluster operations (create/destroy/join/remove/list + floating IP ranges).
  • Add SMB-W reconciliation steps in both WekaCluster (post-create, deletion) and WekaContainer (active join, deletion remove) flows.
  • Update runtime/pod/helm configuration for SMB-W specifics (mode support in runtime script, /dev/shm tmpfs sizing, templates/resources).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/services/weka.go Adds SMB-W service types and CLI methods for cluster lifecycle + IP ranges.
internal/controllers/wekacontainer/metrics_steps.go Includes SMB-W (and data-services modes) in status metrics polling predicates.
internal/controllers/wekacontainer/funcs_active_state_smbw.go New container-loop helpers for joining/removing SMB-W cluster membership.
internal/controllers/wekacontainer/flow_deleting_state.go Adds deletion step to remove SMB-W containers from the SMB-W cluster; adjusts S3 removal exec selection.
internal/controllers/wekacontainer/flow_active_state.go Adds active-state step to join SMB-W cluster when ready.
internal/controllers/wekacluster/steps_post_cluster.go Adds post-cluster SMB-W steps: destroy/ensure cluster, domain join, configure IP ranges.
internal/controllers/wekacluster/steps_deletion.go Ensures SMB-W containers are paused and absent during cluster finalization.
internal/controllers/wekacluster/steps_cluster_creation.go Includes “smbw” in missing-container build logic (template-driven).
internal/controllers/wekacluster/funcs_smbw.go Implements SMB-W cluster create/destroy, domain join, and floating IP range reconciliation.
internal/controllers/wekacluster/funcs_helpers.go Adds SMB-W container selectors/helpers for cluster loop.
internal/controllers/resources/pod.go Adds /dev/shm tmpfs volume/mount for SMB-W pods; adjusts CPU/memory sizing to account for SMB-W mode.
internal/controllers/factory/container_factory.go Plumbs template knobs into SMB-W container construction (cores/hugepages/extra cores/memory).
internal/controllers/allocator/templates.go Extends templates + dynamic defaults for SMB-W cores/extra cores/hugepages/containers.
internal/config/env.go Adds operator env-config for SMB-W shm sizing + cluster formation constants.
charts/weka-operator/values.yaml Adds helm values for SMB-W shm sizing.
charts/weka-operator/templates/manager.yaml Wires SMBW_SHM_SIZE env var into the manager deployment.
charts/weka-operator/resources/weka_runtime.py Enables “smbw” mode in runtime script paths (creation, resources, shutdown, etc.).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +236 to +237
cmd := fmt.Sprintf("weka smb domain join %s %s", username, password)
_, stderr, err := executor.ExecSensitive(ctx, "JoinSmbwDomain", []string{"bash", "-ce", cmd})
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinSmbwDomain builds a shell command string containing the domain-join password and executes it via bash -ce. This is vulnerable to shell injection/breakage if the username/password contain spaces or shell metacharacters, and it also increases the risk of accidental secret exposure through process args/telemetry. Prefer executing the CLI with an argv slice (no shell) or otherwise properly escaping/quoting the credentials so they are not interpreted by a shell.

Suggested change
cmd := fmt.Sprintf("weka smb domain join %s %s", username, password)
_, stderr, err := executor.ExecSensitive(ctx, "JoinSmbwDomain", []string{"bash", "-ce", cmd})
_, stderr, err := executor.ExecSensitive(ctx, "JoinSmbwDomain", []string{"weka", "smb", "domain", "join", username, password})

Copilot uses AI. Check for mistakes.

containerId := r.container.Status.ClusterContainerID
if containerId == nil {
return errors.New("Container ID is not set")
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveFromSmbwCluster returns an error when ClusterContainerID is nil. During deletion this can happen for SMB-W containers that never joined the cluster, and this step will block finalization unnecessarily. Align behavior with RemoveFromS3Cluster by treating a nil container ID as a no-op (return nil).

Suggested change
return errors.New("Container ID is not set")
logger.Info("SMB-W container has no ClusterContainerID, skipping removal")
return nil

Copilot uses AI. Check for mistakes.
@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch from 474d01f to df2b9a0 Compare March 5, 2026 11:20
Copilot AI review requested due to automatic review settings March 5, 2026 14:09
@assafgi assafgi force-pushed the 01-21-feat_add_smbw_support branch from df2b9a0 to 981e169 Compare March 5, 2026 14:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

cmd := []string{
"weka", "smb", "cluster", "update",
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateSmbwCluster uses "weka" while the other SMB-W cluster operations here use "wekaauthcli". This inconsistency is likely to break updates in environments where only the auth-wrapped CLI is available/required. Use the same CLI binary consistently for SMB-W cluster operations (or document/justify why update must use a different binary).

Suggested change
"weka", "smb", "cluster", "update",
"wekaauthcli", "smb", "cluster", "update",

Copilot uses AI. Check for mistakes.
Comment on lines +510 to +519
shmSizeLimit := resource.MustParse(config.Config.Smbw.ShmSize)
vols = append(vols, corev1.Volume{
Name: "smbw-shm",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{
Medium: corev1.StorageMediumMemory,
SizeLimit: &shmSizeLimit,
},
},
})
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resource.MustParse will panic the controller if SMBW_SHM_SIZE (Helm/environment) is misconfigured. Since this value is user-provided, parse it with error handling and return a reconcile error (and/or surface a condition/event) instead of crashing the manager process.

Suggested change
shmSizeLimit := resource.MustParse(config.Config.Smbw.ShmSize)
vols = append(vols, corev1.Volume{
Name: "smbw-shm",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{
Medium: corev1.StorageMediumMemory,
SizeLimit: &shmSizeLimit,
},
},
})
shmSizeLimit, err := resource.ParseQuantity(config.Config.Smbw.ShmSize)
if err == nil {
vols = append(vols, corev1.Volume{
Name: "smbw-shm",
VolumeSource: corev1.VolumeSource{
EmptyDir: &corev1.EmptyDirVolumeSource{
Medium: corev1.StorageMediumMemory,
SizeLimit: &shmSizeLimit,
},
},
})
}

Copilot uses AI. Check for mistakes.
}

// EnsureSmbwClusterUpdated updates the SMB-W cluster configuration for fields that support updates.
// This includes profile, encryption, and IP pools/ranges.
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states updates include profile, but SmbwUpdateParams and the populated updateParams do not include it. Either implement profile updates end-to-end (spec → params → CLI flag) or update the comment/docs to match actual behavior to avoid misleading operators.

Suggested change
// This includes profile, encryption, and IP pools/ranges.
// This includes encryption and IP pools/ranges.

Copilot uses AI. Check for mistakes.
ClusterName: clusterName,
Encryption: r.cluster.Spec.SmbwConfig.Encryption,
IpPools: r.cluster.Spec.SmbwConfig.IpPools,
IpRanges: r.cluster.Spec.SmbwConfig.IpRanges,
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states updates include profile, but SmbwUpdateParams and the populated updateParams do not include it. Either implement profile updates end-to-end (spec → params → CLI flag) or update the comment/docs to match actual behavior to avoid misleading operators.

Suggested change
IpRanges: r.cluster.Spec.SmbwConfig.IpRanges,
IpRanges: r.cluster.Spec.SmbwConfig.IpRanges,
Profile: r.cluster.Spec.SmbwConfig.Profile,

Copilot uses AI. Check for mistakes.
}

func (r *containerReconcilerLoop) RemoveFromSmbwCluster(ctx context.Context) error {
ctx, logger, end := instrumentation.GetLogSpan(ctx, "")
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing an empty span/logger name makes tracing and log correlation significantly harder and typically results in unnamed spans. Use a stable operation name (e.g., the function name) so SMB-W removal operations are observable in traces/logs.

Suggested change
ctx, logger, end := instrumentation.GetLogSpan(ctx, "")
ctx, logger, end := instrumentation.GetLogSpan(ctx, "RemoveFromSmbwCluster")

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +46
if containerId == nil {
return errors.New("Container ID is not set")
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go error strings should generally be lower-case and not start with a capital letter (so they read naturally when wrapped). Consider changing these to container ID is not set / no active container found (and similarly in other new code paths added in this PR).

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +60
if executeInContainer == nil {
return errors.New("No active container found")
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go error strings should generally be lower-case and not start with a capital letter (so they read naturally when wrapped). Consider changing these to container ID is not set / no active container found (and similarly in other new code paths added in this PR).

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +71
// Don't fail if SMB-W cluster doesn't exist
if err.Error() == "SMB-W cluster is not configured" {
logger.Info("SMB-W cluster is not configured, skipping removal", "container_id", *containerId)
return nil
}
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing err.Error() to a specific string is brittle and can easily break if the upstream error message changes or if wrapping is introduced. Prefer returning/propagating a typed/sentinel error from RemoveFromSmbwCluster (e.g., var ErrSmbwNotConfigured = ... or a custom error type) and matching it with errors.Is/errors.As.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants