e2e: PP: cover ExecCPUAffinity support in tests #1432
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: shajmakh. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Depends on #1426.
Force-pushed from 6ef4f1a to beeea3d.
Force-pushed from 565820a to 0af067c.
Regarding `/test e2e-gcp-pao-workloadhints`:
The GCP cluster profile uses the ipi-gcp flow, which by default uses 6 vCPUs for compute machines (see `step-registry/ipi/conf/gcp/ipi-conf-ref.yaml`). The performance profile suite configures a profile with `reserved: "0"` and `isolated: "1-3"` (see openshift/cluster-node-tuning-operator#909), unless environment variables are specified. In general it is good practice to include all of a node's CPUs in the PP cpu section, but the reason we need this now is that some new tests require most of the CPUs to be distributed via the PP (see openshift/cluster-node-tuning-operator#1432 (comment)). In this commit we update only the affected job on which the test runs; later we will need to add this setting to all other jobs that consume the ipi-gcp cluster configuration. Note: this is subject to change should the CPU specifications on GCP get modified. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
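To make the "cover all of the node's CPUs" reasoning concrete, here is a toy Go sketch (not code from this repo; the helper name is made up) of deriving a PP cpu section that reserves CPU 0 and isolates the rest, which on a 6-vCPU ipi-gcp worker yields `reserved: "0"` and `isolated: "1-5"`:

```go
// Toy helper (not part of this repo): derive a PP cpu section that covers all
// of a node's CPUs by reserving CPU 0 and isolating the rest.
package main

import "fmt"

func ppCPUSections(nodeCPUs int) (reserved, isolated string) {
	reserved = "0"
	isolated = fmt.Sprintf("1-%d", nodeCPUs-1)
	return reserved, isolated
}

func main() {
	// 6 vCPUs is the ipi-gcp default compute machine size mentioned above.
	r, i := ppCPUSections(6)
	fmt.Printf("reserved: %q\nisolated: %q\n", r, i) // reserved: "0", isolated: "1-5"
}
```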
/retest
When the failing test was temporarily removed (it fails due to the node topology misaligning with the PP cpu section), …
/hold
Force-pushed from 0af067c to 41afeca.
/test e2e-aws-ovn
Force-pushed from acdb51a to a8b7158.
Add main e2e tests that check the behavior of the performance profile with `ExecCPUAffinity: first` and without it (legacy). Signed-off-by: Shereen Haj <shajmakh@redhat.com>
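For readers unfamiliar with the feature, here is a minimal, self-contained Go sketch of the kind of check these e2e tests perform. It is not the PR's actual test code and the helper names are illustrative; the assumption is that with `ExecCPUAffinity: first` an exec'd process is pinned to the first shared CPU, so its `Cpus_allowed_list` starts with that CPU.

```go
// Self-contained sketch (not the PR's test code) of the core check: with
// ExecCPUAffinity "first", a process started via exec is expected to be pinned
// to the first shared CPU, so its Cpus_allowed_list should start with that CPU.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firstAllowedCPU parses a line like "Cpus_allowed_list:\t2-3,5" (as read from
// /proc/self/status inside the exec session) and returns the first allowed CPU.
func firstAllowedCPU(statusLine string) (int, error) {
	fields := strings.Fields(statusLine)
	if len(fields) < 2 {
		return -1, fmt.Errorf("unexpected format: %q", statusLine)
	}
	parts := strings.FieldsFunc(fields[1], func(r rune) bool { return r == ',' || r == '-' })
	if len(parts) == 0 {
		return -1, fmt.Errorf("no CPUs in %q", statusLine)
	}
	return strconv.Atoi(parts[0])
}

func main() {
	// In the real test this line would come from exec'ing into the pod.
	got, err := firstAllowedCPU("Cpus_allowed_list:\t2")
	if err != nil {
		panic(err)
	}
	firstSharedCPU := 2 // would be derived from the profile's shared CPUs
	fmt.Println("exec process pinned to first shared CPU:", got == firstSharedCPU)
}
```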
Add unit tests for functions in the resources helper package used by the tests. Assisted-by: Cursor v1.2.2 AI-Attribution: AIA Entirely AI, Human-initiated, Reviewed, Cursor v1.2.2 v1.0 Signed-off-by: Shereen Haj <shajmakh@redhat.com>
/retest
@shajmakh: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/unhold
SargunNarula left a comment:
Thanks for the tests. IMO some tests are redundant and can be removed.
var err error
testPod := pods.MakePodWithResources(ctx, workerRTNode, qos, containersResources)
Expect(err).ToNot(HaveOccurred())
if isExclusiveCPURequest {
    testlog.Infof("exec process CPU: %d, first shared CPU: %d", execProcessCPUInt, firstCPU)
    Expect(execProcessCPUs).To(Equal(firstCPU), "Exec process CPU is not the first shared CPU; retry %d", i)
Mismatched comparison: `execProcessCPUInt` should be used instead of `execProcessCPUs`.
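A standalone Gomega sketch of what the corrected assertion could look like; the variable values are hardcoded placeholders, not the PR's code, and only the comparison itself mirrors the hunk above.

```go
package execaffinity_test

import (
	"testing"

	. "github.com/onsi/gomega"
)

// Illustrates the corrected assertion: compare the parsed integer CPU id
// (execProcessCPUInt) against firstCPU, instead of the unparsed execProcessCPUs.
func TestExecProcessPinnedToFirstSharedCPU(t *testing.T) {
	g := NewWithT(t)

	execProcessCPUInt := 2 // placeholder: parsed from the exec'd process's affinity
	firstCPU := 2          // placeholder: first shared CPU from the profile
	i := 0                 // retry counter, mirroring the hunk above

	g.Expect(execProcessCPUInt).To(Equal(firstCPU),
		"Exec process CPU is not the first shared CPU; retry %d", i)
}
```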
    sharedCpusResource: resource.MustParse("1"),
  },
}),
Entry("guaranteed pod with multiple containers with shared CPU request",
It tests the same condition as a guaranteed pod with a single container requesting shared CPUs.
Although the CPU manager takes a different allocation path by splitting CPUs across containers, the exec-cpu-affinity feature behaves identically on a per-container basis, which is what we are validating here.
    corev1.ResourceMemory: resource.MustParse("100Mi"),
  },
}),
Entry("guaranteed pod with fractional CPU requests",
Same check as in "guaranteed pod with single container with shared CPU request" and the fractional CPU request entries.
    sharedCpusResource: resource.MustParse("1"),
  },
}),
Entry("best-effort pod with shared CPU request",
I think it would be better to keep the best-effort scenario under cpu_management only, since it does not depend on shared CPUs.
  //cnt1 resources
  {},
}),
Entry("burstable pod with shared CPU request",
Same case with burstable
// should we expect cpu affinity to the first CPU
isExclusiveCPURequest := false
if qos == corev1.PodQOSGuaranteed {
    cpuRequestFloat := container.Resources.Requests.Name(corev1.ResourceCPU, resource.DecimalSI).AsFloat64Slow()
Perhaps we can use MilliValue instead:
milliCPU := container.Resources.Requests.Cpu().MilliValue()
isExclusiveCPURequest = (milliCPU % 1000) == 0
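A small standalone program (assuming only `k8s.io/apimachinery`) that sanity-checks the suggestion: whole-CPU requests have a milli value divisible by 1000, fractional ones do not.

```go
// Standalone check of the MilliValue suggestion above: whole-CPU (exclusive)
// requests are exactly divisible by 1000 millicores, fractional ones are not.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	for _, req := range []string{"2", "1500m", "500m"} {
		q := resource.MustParse(req)
		milliCPU := q.MilliValue()
		isExclusiveCPURequest := milliCPU%1000 == 0
		fmt.Printf("request %-6s -> milli=%-5d exclusive=%v\n", req, milliCPU, isExclusiveCPURequest)
	}
}
```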
    corev1.ResourceMemory: resource.MustParse("200Mi"),
  },
}),
Entry("guaranteed pod with two containers with exclusive CPUs, exec process should be binned to first CPU",
Same check as in "guaranteed pod single container with exclusive CPUs, exec process should be binned to first CPU".
    corev1.ResourceMemory: resource.MustParse("200Mi"),
  },
}),
Entry("guaranteed pod with two containers with fractional CPU request, exec process can be binned to any CPU for the container requesting fractional CPUs, and to the first CPU for the container requesting exclusive CPUs",
Duplicate of this test - guaranteed pod multiple containers with fraction CPU request, exec process can be binned to any CPU for container requesting fractional CPUs, and to the first CPU for container requesting exclusive CPUs
  }
}

func MakePodWithResources(ctx context.Context, workerRTNode *corev1.Node, qos corev1.PodQOSClass, containersResources []corev1.ResourceList) *corev1.Pod {
Unused argument: `ctx context.Context`.
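A rough sketch (not the PR's actual helper) of what `MakePodWithResources` could look like with the unused `ctx` dropped; the image, names, and body are placeholders, and the real helper presumably does more (node affinity, runtime class, etc.).

```go
package pods

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MakePodWithResources without the unused ctx argument (sketch only).
func MakePodWithResources(workerRTNode *corev1.Node, qos corev1.PodQOSClass, containersResources []corev1.ResourceList) *corev1.Pod {
	containers := make([]corev1.Container, 0, len(containersResources))
	for i, res := range containersResources {
		rr := corev1.ResourceRequirements{Requests: res}
		if qos == corev1.PodQOSGuaranteed {
			rr.Limits = res // guaranteed QoS requires limits == requests
		}
		containers = append(containers, corev1.Container{
			Name:      fmt.Sprintf("cnt-%d", i),
			Image:     "registry.k8s.io/pause:3.9", // placeholder image
			Resources: rr,
		})
	}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "exec-affinity-"},
		Spec: corev1.PodSpec{
			NodeName:   workerRTNode.Name,
			Containers: containers,
		},
	}
}
```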
Add basic e2e tests that check the default behavior of the performance profile with the default-enabled `ExecCPUAffinity: first`.