YNU-805: Run multiple stress test pods & other improvements #584

Merged
dimast-x merged 3 commits into main from feat/update-stress-tool
Feb 25, 2026
Conversation

Contributor

@dimast-x dimast-x commented Feb 24, 2026

Summary by CodeRabbit

  • New Features

    • Multiple stress-test pods with per-pod configuration.
    • Stress-test data provided via mounted config and executed by a dedicated Job.
    • Conditional secret support for per-pod private keys.
  • Refactor

    • Stress-test execution moved to a config-driven Job model for flexibility.
  • Improvements

    • Per-connection retries with exponential backoff and retry reporting.
    • Service and deployment labels aligned for consistent selection.

@dimast-x dimast-x requested a review from a team as a code owner February 24, 2026 15:39
@gemini-code-assist
Contributor

Summary of Changes

Hello @dimast-x, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the stress testing framework for the clearnode application. It transitions from a monolithic single-pod test to a scalable, multi-pod Kubernetes Job architecture, allowing for parallel and independently configured stress test executions. This change enhances the flexibility and reliability of performance testing. Furthermore, the underlying client connection logic has been made more resilient through the addition of retry mechanisms, ensuring more stable test setups.

Highlights

  • Multi-Pod Stress Testing: Implemented support for running multiple stress test pods concurrently using Kubernetes Jobs, replacing the previous single-pod approach.
  • Configurable Stress Test Pods: Refactored Helm chart values to allow granular configuration of individual stress test pods, including per-pod overrides for WebSocket URL, private key, connections, timeout, and error rate.
  • Robust Client Connections: Introduced a retry mechanism with exponential backoff for SDK client connections in stress tests, significantly improving connection establishment robustness.
  • Kubernetes Resource Refactoring: Converted the stress test deployment from a single Kubernetes Pod to a combination of a ConfigMap (for dynamic environment variables) and an Indexed Job (for managing multiple parallel executions).
Changelog
  • clearnode/chart/config/v1-rc/clearnode.yaml
    • Added example configuration for multiple stress test pods under the stressTest.pods section.
  • clearnode/chart/templates/tests/stress-test.yaml
    • Reworked the stress test deployment from a single Kubernetes Pod to a ConfigMap and an Indexed Job.
    • Configured the ConfigMap to dynamically generate environment variables for each stress test pod.
    • Updated the Job to run multiple parallel pods, each loading its specific configuration from the ConfigMap.
    • Adjusted Helm hook weights for the new ConfigMap and Job resources.
  • clearnode/chart/values.yaml
    • Updated stressTest values to support a list of configurable pods, allowing individual settings for each.
    • Renamed existing stressTest parameters to be 'Default' values, which can be overridden per pod.
    • Removed the top-level stressTest.specs field in favor of per-pod specs.
  • clearnode/stress/pool.go
    • Implemented connection retry logic with exponential backoff for SDK client pool creation, improving connection stability.
  • clearnode/stress/transfer.go
    • Added connection retry logic with exponential backoff for individual SDK client creation in transfer stress tests.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec01e25 and 69b35a8.

📒 Files selected for processing (1)
  • clearnode/chart/templates/tests/stress-test.yaml

📝 Walkthrough

Walkthrough

Adds per-pod stressTest configuration and five pod entries; replaces a Pod-based Helm stress-test with a ConfigMap-driven config + Job runner; and adds exponential-backoff retry logic to client creation and pool connection setup in the stress package.

Changes

Cohort / File(s) | Summary

  • Helm config values & presets (clearnode/chart/config/v1-rc/clearnode.yaml, clearnode/chart/values.yaml):
    Introduced a stressTest.pods list and added five pod entries (get-config-1..get-config-5) with per-pod specs; updated comments/descriptions for pluralized per-pod overrides.
  • Helm templates, manifest shape (clearnode/chart/templates/tests/stress-test.yaml):
    Replaced the Pod-based manifest with a ConfigMap (*-stress-config) containing per-pod env files and a batch/v1 Job (*-stress-test) that mounts the ConfigMap and runs per-pod containers. Added a conditional Secret for private keys, hook weight changes, volumes, imagePullSecrets, nodeSelector/tolerations, completions/parallelism, and an updated command that sources the per-pod env.
  • Kubernetes selector labeling (clearnode/chart/templates/deployment.yaml, clearnode/chart/templates/service.yaml):
    Added app.kubernetes.io/component: server to the Deployment selector/template and the Service selector to align labels and selectors.
  • Go: stress client & pool logic (clearnode/stress/transfer.go, clearnode/stress/pool.go):
    Added retry loops with exponential backoff in client creation and per-connection pool setup (capped delays, progress logging, total retry accounting). The pool continues after individual connection failures and only fails if zero connections are established; no public APIs changed.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Helm as Helm/chart renderer
    participant KubeAPI as Kubernetes API
    participant ConfigMap as ConfigMap (stress-config)
    participant Job as Job controller
    participant Pod as Job Pod (container)
    Helm->>KubeAPI: Apply templates -> create ConfigMap + Job
    KubeAPI->>ConfigMap: Store per-pod env files
    KubeAPI->>Job: Create Job resource
    Job->>KubeAPI: Schedule Pods
    KubeAPI->>Pod: Start Pod with mounted ConfigMap
    Pod->>ConfigMap: Read /stress-config/pod-${JOB_COMPLETION_INDEX}.env
    Pod->>Pod: Execute clearnode stress-test using env values
    Pod->>KubeAPI: Report completion/status
```
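The env-sourcing step in the diagram can be sketched as a small shell snippet. The path, variable name, and value below are illustrative rather than the chart's actual contents; JOB_COMPLETION_INDEX is the index Kubernetes injects into pods of Indexed Jobs.

```shell
# Minimal sketch of a Job pod picking up its per-index env file.
# Paths and values are illustrative, not the chart's actual contents.
mkdir -p /tmp/stress-config
printf 'STRESS_CONNECTIONS=10\n' > /tmp/stress-config/pod-0.env

JOB_COMPLETION_INDEX=0   # injected automatically for Indexed Jobs
set -a                   # export every variable assigned while sourcing
. /tmp/stress-config/pod-${JOB_COMPLETION_INDEX}.env
set +a

echo "$STRESS_CONNECTIONS"   # prints: 10
```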

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • alessio

Poem

🐰 I hopped through configs, five in a row,
Jobs and ConfigMaps set the stage to go,
Backoffs that double, retries that hum,
Connections that try until success will come,
Tiny rabbit cheers — the tests now run!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the primary change (running multiple stress test pods) and includes a secondary mention of additional improvements, reflecting the actual changeset across configuration, templates, and connection retry logic.



@codecov

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.

Files with missing lines     | Patch % | Lines
clearnode/stress/pool.go     | 0.00%   | 21 missing ⚠️
clearnode/stress/transfer.go | 0.00%   | 11 missing ⚠️


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant improvements to the stress testing capabilities, allowing multiple test pods to run in parallel via a Kubernetes Job. The configuration in values.yaml has been refactored to support this, which is a great enhancement for flexibility. Additionally, the Go stress test clients now include a resilient connection mechanism with retries and exponential backoff. While these are excellent changes, I've identified a critical security vulnerability and a medium-severity robustness issue in the Helm template that should be addressed.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clearnode/chart/templates/tests/stress-test.yaml`:
- Around line 29-32: The template currently writes STRESS_PRIVATE_KEY into a
ConfigMap using the $privateKey variable; move this into a Kubernetes Secret and
inject it via valueFrom: secretKeyRef instead of rendering the raw value. Change
the logic that reads $pod.privateKey | default $defaults.privateKey so you
render only a Secret reference (or accept secret name/key parameters) and update
the environment entry STRESS_PRIVATE_KEY to use env.valueFrom.secretKeyRef with
the configured secretName and key; ensure no raw privateKey value is rendered
into the ConfigMap template or chart outputs.
- Around line 23-28: The STRESS_SPECS env entry is generated with join " " which
creates raw spaces that get split when the env file is later exported; update
the pod-{{ $i }}.env template to emit a safely quoted value (e.g.
STRESS_SPECS='{{ join " " $pod.specs }}') or use a non-space delimiter (e.g.
join "," or "|") and ensure the consuming code parses that delimiter; apply the
same quoting/delimiter change to the other similar env lines (the second block
at lines referenced in the comment) so multi-entry specs are preserved during
export.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b5c378 and cedcf22.

📒 Files selected for processing (5)
  • clearnode/chart/config/v1-rc/clearnode.yaml
  • clearnode/chart/templates/tests/stress-test.yaml
  • clearnode/chart/values.yaml
  • clearnode/stress/pool.go
  • clearnode/stress/transfer.go

Comment on lines +23 to +28

```yaml
pod-{{ $i }}.env: |
  STRESS_SPECS={{ join " " $pod.specs }}
  STRESS_WS_URL={{ $wsURL }}
  STRESS_CONNECTIONS={{ $connections }}
  STRESS_TIMEOUT={{ $timeout }}
  STRESS_MAX_ERROR_RATE={{ $maxErrorRate }}
```
Contributor


⚠️ Potential issue | 🟠 Major

STRESS_SPECS with multiple entries will break env export.

join " " produces spaces, and export $(... | xargs) splits them, truncating STRESS_SPECS and potentially causing invalid exports. This breaks multi-spec pods.

🔧 Safer env loading + quoting
```diff
-    STRESS_SPECS={{ join " " $pod.specs }}
+    STRESS_SPECS={{ join " " $pod.specs | quote }}
@@
-              export $(cat /stress-config/pod-${JOB_COMPLETION_INDEX}.env | xargs)
+              set -a
+              . /stress-config/pod-${JOB_COMPLETION_INDEX}.env
+              set +a
```

Also applies to: 66-69
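The failure mode can be reproduced with a small snippet. The spec names below (transfer:10, deposit:5) are made up for illustration: `export $(cat … | xargs)` word-splits an unquoted multi-entry value, while quoting the value in the file and sourcing it with allexport preserves it.

```shell
# Unquoted multi-entry value: export $(... | xargs) word-splits it, so
# STRESS_SPECS keeps only the first token and "deposit:5" becomes a bogus
# variable name (error suppressed here).
printf 'STRESS_SPECS=transfer:10 deposit:5\n' > /tmp/unquoted.env
( export $(cat /tmp/unquoted.env | xargs) 2>/dev/null
  echo "via xargs: [$STRESS_SPECS]" )       # via xargs: [transfer:10]

# Quoted value plus allexport sourcing keeps the entries together.
printf "STRESS_SPECS='transfer:10 deposit:5'\n" > /tmp/quoted.env
( set -a; . /tmp/quoted.env; set +a
  echo "via source: [$STRESS_SPECS]" )      # via source: [transfer:10 deposit:5]
```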

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/chart/templates/tests/stress-test.yaml` around lines 23 - 28, The
STRESS_SPECS env entry is generated with join " " which creates raw spaces that
get split when the env file is later exported; update the pod-{{ $i }}.env
template to emit a safely quoted value (e.g. STRESS_SPECS='{{ join " "
$pod.specs }}') or use a non-space delimiter (e.g. join "," or "|") and ensure
the consuming code parses that delimiter; apply the same quoting/delimiter
change to the other similar env lines (the second block at lines referenced in
the comment) so multi-entry specs are preserved during export.

@dimast-x dimast-x force-pushed the feat/update-stress-tool branch from 6712ee9 to 83611d8 on February 24, 2026 at 16:16
@dimast-x dimast-x force-pushed the feat/update-stress-tool branch from 83611d8 to 8e7ff68 on February 24, 2026 at 16:36
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
clearnode/chart/templates/tests/stress-test.yaml (2)

23-28: STRESS_SPECS value is unquoted — export $(... | xargs) will break with multi-entry specs.

As previously noted, join " " produces bare spaces in the env file, and export $(cat ... | xargs) will word-split them. The current config happens to use single-entry specs, but this is a latent bug.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/chart/templates/tests/stress-test.yaml` around lines 23 - 28, The
env file writes STRESS_SPECS unquoted which will be word-split by consumers like
export $(cat ... | xargs); update the pod-{{ $i }}.env template in
stress-test.yaml so STRESS_SPECS is produced as a single quoted/escaped value
(e.g., wrap the result of join " " $pod.specs in quotes or use a join with a
safe delimiter and escape) so that the join " " $pod.specs output cannot be
split by shell word-splitting; ensure downstream consumers expect a quoted value
and adjust parsing accordingly.

29-32: STRESS_PRIVATE_KEY rendered into a ConfigMap is a security concern.

As previously noted, ConfigMaps are not secret-safe. This should be injected via a Kubernetes Secret reference.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/chart/templates/tests/stress-test.yaml` around lines 29 - 32, The
template currently renders STRESS_PRIVATE_KEY into a ConfigMap using
{{$pod.privateKey | default $defaults.privateKey}}, which exposes sensitive
material; stop writing the private key into the ConfigMap and instead read it
from a Kubernetes Secret. Modify the template to remove STRESS_PRIVATE_KEY from
the ConfigMap and change the pod/container env entry to use
valueFrom.secretKeyRef pointing to a Secret name and key (e.g., reference a
secret that contains the private key under a known key); ensure the secret is
created or documented as required and update any references to
$pod.privateKey/$defaults.privateKey so they map to the Secret rather than being
rendered directly.
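A minimal sketch of the suggested shape, with hypothetical names (clearnode-stress-keys and pod-0-private-key are illustrative, not actual chart values):

```yaml
# Hypothetical Secret holding the per-pod key material.
apiVersion: v1
kind: Secret
metadata:
  name: clearnode-stress-keys
type: Opaque
stringData:
  pod-0-private-key: "<key material, supplied at deploy time>"
---
# In the Job's container spec, reference the Secret instead of rendering
# the key into the ConfigMap:
env:
  - name: STRESS_PRIVATE_KEY
    valueFrom:
      secretKeyRef:
        name: clearnode-stress-keys
        key: pod-0-private-key
```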
🧹 Nitpick comments (2)
clearnode/stress/transfer.go (1)

54-68: Retry parameters and signer creation are duplicated across pool.go — consider extracting a shared helper.

Both createClient (transfer.go) and CreateClientPool (pool.go) share identical signer creation code (ethMsgSigner, stateSigner, txSigner from privateKey) yet maintain separate retry loops with conflicting backoff parameters:

  • transfer.go uses 100ms base with no cap
  • pool.go uses 500ms base with 10s cap

This duplication risks divergence during maintenance. Consider extracting a shared helper like createSigners(privateKey string) for signer creation, and optionally connectWithRetry(wsURL string, stateSigner, txSigner interface{}, baseBackoff time.Duration, maxBackoff time.Duration) to parameterize backoff behavior if the difference is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/stress/transfer.go` around lines 54 - 68, Both createClient
(transfer.go) and CreateClientPool (pool.go) duplicate signer creation and
retry/backoff logic; extract a shared helper createSigners(privateKey string)
that returns ethMsgSigner, stateSigner, txSigner and replace both sites to call
it, and replace the retry loop with a shared connectWithRetry(wsURL string,
stateSigner, txSigner interface{}, baseBackoff time.Duration, maxBackoff
time.Duration) (or a simpler connectWithRetry(wsURL, stateSigner, txSigner,
baseBackoff) if no cap needed) so you can centralize the retry policy and
parameterize baseBackoff and maxBackoff to reconcile the 100ms vs 500ms/10s
behaviors used by createClient and CreateClientPool.
clearnode/chart/templates/tests/stress-test.yaml (1)

72-76: Remove redundant JOB_COMPLETION_INDEX environment variable definition.

Kubernetes automatically injects the JOB_COMPLETION_INDEX environment variable into containers of Indexed-mode Jobs (GA since K8s 1.24). The manual fieldRef extraction from the pod annotation is redundant and can be removed.

♻️ Suggested simplification

```diff
-          env:
-            - name: JOB_COMPLETION_INDEX
-              valueFrom:
-                fieldRef:
-                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/chart/templates/tests/stress-test.yaml` around lines 72 - 76,
Remove the manual environment variable entry that sets JOB_COMPLETION_INDEX via
a fieldRef to metadata.annotations['batch.kubernetes.io/job-completion-index']
because Kubernetes injects JOB_COMPLETION_INDEX for indexed Jobs automatically;
locate the env block containing "name: JOB_COMPLETION_INDEX" and the
valueFrom/fieldRef stanza and delete that entire env item so the container
relies on the built-in injected variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clearnode/chart/templates/deployment.yaml`:
- Around line 14-17: You're adding a new label key app.kubernetes.io/component:
server to spec.selector.matchLabels in the Deployment template
(spec.selector.matchLabels), which is an immutable field and will break
upgrades; fix by either (a) revert this selector label change and keep selectors
identical to existing releases, (b) if the label is required, document a
migration step and implement a safe upgrade path such as adding a Helm
pre-upgrade hook to delete the old Deployment before creating the new one, or
(c) gate the change behind a new chart release name / new Deployment
metadata.name so Kubernetes treats it as a new object; apply the chosen approach
and update Chart NOTES.md to describe the migration if you choose
delete/migration or change the release name.

---

Duplicate comments:
In `@clearnode/chart/templates/tests/stress-test.yaml`:
- Around line 23-28: The env file writes STRESS_SPECS unquoted which will be
word-split by consumers like export $(cat ... | xargs); update the pod-{{ $i
}}.env template in stress-test.yaml so STRESS_SPECS is produced as a single
quoted/escaped value (e.g., wrap the result of join " " $pod.specs in quotes or
use a join with a safe delimiter and escape) so that the join " " $pod.specs
output cannot be split by shell word-splitting; ensure downstream consumers
expect a quoted value and adjust parsing accordingly.
- Around line 29-32: The template currently renders STRESS_PRIVATE_KEY into a
ConfigMap using {{$pod.privateKey | default $defaults.privateKey}}, which
exposes sensitive material; stop writing the private key into the ConfigMap and
instead read it from a Kubernetes Secret. Modify the template to remove
STRESS_PRIVATE_KEY from the ConfigMap and change the pod/container env entry to
use valueFrom.secretKeyRef pointing to a Secret name and key (e.g., reference a
secret that contains the private key under a known key); ensure the secret is
created or documented as required and update any references to
$pod.privateKey/$defaults.privateKey so they map to the Secret rather than being
rendered directly.

---

Nitpick comments:
In `@clearnode/chart/templates/tests/stress-test.yaml`:
- Around line 72-76: Remove the manual environment variable entry that sets
JOB_COMPLETION_INDEX via a fieldRef to
metadata.annotations['batch.kubernetes.io/job-completion-index'] because
Kubernetes injects JOB_COMPLETION_INDEX for indexed Jobs automatically; locate
the env block containing "name: JOB_COMPLETION_INDEX" and the valueFrom/fieldRef
stanza and delete that entire env item so the container relies on the built-in
injected variable.

In `@clearnode/stress/transfer.go`:
- Around line 54-68: Both createClient (transfer.go) and CreateClientPool
(pool.go) duplicate signer creation and retry/backoff logic; extract a shared
helper createSigners(privateKey string) that returns ethMsgSigner, stateSigner,
txSigner and replace both sites to call it, and replace the retry loop with a
shared connectWithRetry(wsURL string, stateSigner, txSigner interface{},
baseBackoff time.Duration, maxBackoff time.Duration) (or a simpler
connectWithRetry(wsURL, stateSigner, txSigner, baseBackoff) if no cap needed) so
you can centralize the retry policy and parameterize baseBackoff and maxBackoff
to reconcile the 100ms vs 500ms/10s behaviors used by createClient and
CreateClientPool.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83611d8 and ec01e25.

📒 Files selected for processing (7)
  • clearnode/chart/config/v1-rc/clearnode.yaml
  • clearnode/chart/templates/deployment.yaml
  • clearnode/chart/templates/service.yaml
  • clearnode/chart/templates/tests/stress-test.yaml
  • clearnode/chart/values.yaml
  • clearnode/stress/pool.go
  • clearnode/stress/transfer.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • clearnode/chart/values.yaml

Comment on lines 14 to +17

```yaml
  selector:
    matchLabels:
      {{- include "clearnode.common.selectorLabels" . | nindent 6 }}
      app.kubernetes.io/component: server
```
Contributor


⚠️ Potential issue | 🟠 Major

Adding a label to spec.selector.matchLabels is a breaking change for existing Deployments.

Kubernetes does not allow updating the selector of an existing Deployment. If this chart is applied to a cluster where the Deployment already exists, the upgrade will fail with an error like "field is immutable". The Deployment must be deleted and recreated, which causes downtime.

Consider whether a migration note or a helm.sh/hook pre-upgrade delete is needed, or whether this change should be gated behind a new Deployment name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clearnode/chart/templates/deployment.yaml` around lines 14 - 17, You're
adding a new label key app.kubernetes.io/component: server to
spec.selector.matchLabels in the Deployment template
(spec.selector.matchLabels), which is an immutable field and will break
upgrades; fix by either (a) revert this selector label change and keep selectors
identical to existing releases, (b) if the label is required, document a
migration step and implement a safe upgrade path such as adding a Helm
pre-upgrade hook to delete the old Deployment before creating the new one, or
(c) gate the change behind a new chart release name / new Deployment
metadata.name so Kubernetes treats it as a new object; apply the chosen approach
and update Chart NOTES.md to describe the migration if you choose
delete/migration or change the release name.

@dimast-x dimast-x merged commit 603e786 into main Feb 25, 2026
11 checks passed
@dimast-x dimast-x deleted the feat/update-stress-tool branch February 25, 2026 11:53