Conversation

@SargunNarula (Contributor) commented Jan 15, 2026

Added e2e tests for IRQ load balancing with housekeeping pods:

  • [86346] Verify housekeeping works correctly with single hyperthread allocation
  • [86348] Verify irqbalance does not overwrite on TuneD restart
    (housekeeping annotation)
  • [86347] Verify housekeeping selects single CPU when SMT is disabled
  • Added baseload calculation functions for determining available pod capacity on nodes (a rough sketch of the idea follows this list)
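For context, a minimal sketch of what such a baseload helper can look like. The package layout and names here are illustrative assumptions, not necessarily the helpers actually added by this PR:

// Illustrative baseload helper: sum the CPU and memory requests of the pods
// already running on a node so a test can tell how much capacity is left for
// its own pod. Names here are hypothetical.
package baseload

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Load holds the aggregate resource requests of the pods on a node.
type Load struct {
	Resources corev1.ResourceList
}

// ForPods sums the container CPU and memory requests of the given pods.
func ForPods(pods []corev1.Pod) Load {
	cpu := resource.NewMilliQuantity(0, resource.DecimalSI)
	mem := resource.NewQuantity(0, resource.BinarySI)
	for i := range pods {
		for _, ctr := range pods[i].Spec.Containers {
			if c, ok := ctr.Resources.Requests[corev1.ResourceCPU]; ok {
				cpu.Add(c)
			}
			if m, ok := ctr.Resources.Requests[corev1.ResourceMemory]; ok {
				mem.Add(m)
			}
		}
	}
	return Load{Resources: corev1.ResourceList{
		corev1.ResourceCPU:    *cpu,
		corev1.ResourceMemory: *mem,
	}}
}

// CPURequestedCores returns the total requested CPU rounded up to whole cores.
func (l Load) CPURequestedCores() int {
	millis := l.Resources.Cpu().MilliValue()
	return int((millis + 999) / 1000)
}

A test can then compare its own CPU request against the node's allocatable CPUs minus this baseload and skip itself if not enough capacity remains.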

openshift-ci bot requested review from jmencak and yanirq on January 15, 2026 13:25
@SargunNarula (Contributor Author)

/retest

@yanirq (Contributor) commented Jan 25, 2026

/retest

Expect(smtActive).To(Equal("0"), "SMT should be disabled (smt/active should be 0)")

cpuRequest := 2
if cpuRequest > newIsolatedCPUs.Size() {
Contributor

This will most likely fail to schedule if you have exactly 2 isolated CPUs, since there will be some burstable pods as well. You need to check the currently available resources, or "assume" some burstable load and compare against newIsolatedCPUs - 1.

Contributor

+1


Contributor Author

@shajmakh @MarSik Thanks for pointing this out; please have a look at the baseload helper implementation introduced with the latest commit.

Comment on lines 611 to 614
By("Restoring original CPU configuration")
currentProfile, err = profiles.GetByNodeLabels(testutils.NodeSelectorLabels)
Expect(err).ToNot(HaveOccurred())
currentReserved := string(*currentProfile.Spec.CPU.Reserved)
Contributor

Why is the PP updated twice?

Contributor Author

While restoring, we update the profile twice to handle the CPU topology transition safely.

We cannot update the PP in a single phase because removing nosmt brings the SMT siblings back online only after a reboot, while kubelet immediately validates reservedSystemCPUs/isolatedCPUs against the currently online CPU set during CPU Manager initialization. A single update would apply cpusets that reference still-offline SMT siblings, causing kubelet config validation to fail.

See kubelet CPU manager validation:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L251-L270
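To make the ordering concrete, here is a rough sketch of the two-phase restore. This is not the actual test code: applyAndWaitForMCPUpdate is a placeholder for the real profile-update/MCP-wait utilities, and the import path is an assumption.

package restore

import (
	performancev2 "github.com/openshift/cluster-node-tuning-operator/pkg/apis/performanceprofile/v2"
)

// applyAndWaitForMCPUpdate stands in for the test utilities that push the
// updated PerformanceProfile and wait for the MachineConfigPool rollout
// (including the node reboot) to finish. Body omitted.
func applyAndWaitForMCPUpdate(profile *performancev2.PerformanceProfile) {}

// restoreProfile restores the original cpusets in two phases so that kubelet
// never sees a cpuset referencing offline SMT siblings.
func restoreProfile(profile *performancev2.PerformanceProfile, originalReserved, originalIsolated performancev2.CPUSet) {
	// Phase 1: drop "nosmt" but keep the SMT-off cpusets, which only
	// reference CPUs that are currently online. The node reboots and the
	// SMT siblings come back online.
	kargs := make([]string, 0, len(profile.Spec.AdditionalKernelArgs))
	for _, arg := range profile.Spec.AdditionalKernelArgs {
		if arg != "nosmt" {
			kargs = append(kargs, arg)
		}
	}
	profile.Spec.AdditionalKernelArgs = kargs
	applyAndWaitForMCPUpdate(profile)

	// Phase 2: with the full topology online again, the original cpusets
	// pass kubelet's CPU manager validation, so restore them now.
	profile.Spec.CPU.Reserved = &originalReserved
	profile.Spec.CPU.Isolated = &originalIsolated
	applyAndWaitForMCPUpdate(profile)
}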

Contributor

Interesting, I'd expect kubelet to reconcile gracefully

// CPURequestedCores returns the total CPU requested in whole cores (rounded up)
func (l Load) CPURequestedCores() int {
	millis := l.Resources.Cpu().MilliValue()
	return int((millis + 999) / 1000)
}
Contributor

Perhaps use a roundUp:
retCpu := *resource.NewQuantity(roundUp(cpu.Value(), 2), resource.DecimalSI)

@SargunNarula (Contributor Author) commented Jan 30, 2026

Thanks for the suggestion! I considered the roundUp approach used in NROP, but after our offline discussion I believe the ceiling-division approach is more appropriate here.

The two have different use cases:

  • In NROP, rounding up to even numbers is used to calculate the remaining resources for node padding, which needs to be SMT-aligned.

  • Here, the baseload calculation is only used to verify that the node has sufficient capacity for the test pod, so we need an accurate representation of the actual load, not an SMT-aligned value (see the sketch below).
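A small illustration of the difference, with made-up numbers (roundUpToEven stands in for NROP's roundUp; both helper names here are illustrative):

package main

import "fmt"

// ceilCores rounds a millicore value up to whole cores (the baseload behaviour above).
func ceilCores(millis int64) int64 {
	return (millis + 999) / 1000
}

// roundUpToEven rounds a core count up to the next multiple of 2, i.e. the
// SMT-aligned rounding used for node padding.
func roundUpToEven(cores int64) int64 {
	return (cores + 1) / 2 * 2
}

func main() {
	millis := int64(2300) // a measured baseload of 2.3 CPUs
	fmt.Println(ceilCores(millis))                // 3: the actual load, rounded up
	fmt.Println(roundUpToEven(ceilCores(millis))) // 4: SMT-aligned, over-reports by a core
}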

@SargunNarula force-pushed the irq_housekeeping_2 branch 3 times, most recently from f036f2c to 88dc045, on January 30, 2026 08:59
@SargunNarula (Contributor Author)

/retest

Added e2e tests for IRQ load balancing with housekeeping pods:
- [86346] Verify housekeeping works correctly with single hyperthread allocation
- [86348] Verify irqbalance does not overwrite on TuneD restart
  (housekeeping annotation)
- [86347] Verify housekeeping selects single CPU when SMT is disabled

Added baseload calculation functions for determining available pod capacity on nodes

Signed-off-by: Sargun Narula <snarula@redhat.com>
@shajmakh (Contributor)

/approve
@SargunNarula can you please confirm that these tests have been run enough times and are stable?

openshift-ci bot commented Jan 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SargunNarula, shajmakh
Once this PR has been reviewed and has the lgtm label, please assign jmencak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Jan 30, 2026

@SargunNarula: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                   Commit   Details  Required  Rerun command
ci/prow/e2e-gcp-pao         fb10b58  link     true      /test e2e-gcp-pao
ci/prow/e2e-hypershift-pao  fb10b58  link     true      /test e2e-hypershift-pao

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
