Skip to content

Comments

Fix flakiness of SR-IOV reboot test#3928

Open
yossisegev wants to merge 1 commit intoRedHatQE:mainfrom
yossisegev:sriov-interface-report
Open

Fix flakiness of SR-IOV reboot test#3928
yossisegev wants to merge 1 commit intoRedHatQE:mainfrom
yossisegev:sriov-interface-report

Conversation

@yossisegev
Copy link
Contributor

@yossisegev yossisegev commented Feb 19, 2026

We observed flakiness of test_sriov_interfaces_post_reboot, where sometimes the interface status report (post reboot) is not yet fully updated, which is accepted and shouldn't fail the test immediately. Therefore the check is now given a grace period to allow completion of the interface status, before checking it.

Summary by CodeRabbit

  • Tests
    • Improved SR‑IOV post‑reboot validation to use polling with a timeout and enhanced logging instead of an immediate equality check, making detection of interface restoration more reliable and providing clearer error reporting when restoration fails.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ac0864 and 42f5206.

📒 Files selected for processing (1)
  • tests/network/sriov/test_sriov.py

📝 Walkthrough

Walkthrough

Replaces an immediate equality assertion with a TimeoutSampler polling loop (120s timeout, 5s interval) that waits for restarted_sriov_vm4.vmi.interfaces[1] to match vm4_interfaces[1], adds TimeoutSampler/TimeoutExpiredError imports and a module-level LOGGER for logging mismatches and timeout errors.

Changes

Cohort / File(s) Summary
SR-IOV test update
tests/network/sriov/test_sriov.py
Added imports TimeoutSampler, TimeoutExpiredError and a module-level LOGGER. Replaced immediate equality assertion for the second SR-IOV interface with a TimeoutSampler polling loop (120s timeout, 5s interval) that compares restarted_sriov_vm4.vmi.interfaces[1] to vm4_interfaces[1], logs mismatches, and raises on timeout.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly summarizes the main change: adding polling to reduce test flakiness in the SR-IOV reboot test.
Description check ✅ Passed The PR description explains the problem and solution, but omits several required template sections including issue reference and special notes for reviewer.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-virtualization-qe-bot-4

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest verify-bugs-are-open - verify-bugs-are-open
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No WIP, hold, conflict labels
  5. Verified: PR must be marked as verified (if verification is enabled)

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev

Reviewers:

  • Anatw
  • EdDev
  • azhivovk
  • servolkov
  • yossisegev
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@yossisegev
Copy link
Contributor Author

yossisegev commented Feb 19, 2026

/verified

Verified by:

  1. Running the test and making sure it passes (all 5 times).
  2. Negative verification: Hard-coding an incomplete interface status report to compare with the interface post-reboot status report, and making sure a failure occurs.

@openshift-virtualization-qe-bot-3
Copy link
Contributor

@coderabbitai

Test execution plan request details

CRITICAL: You MUST post an inline review comment on the first changed line of the first file.
The inline comment should contain the Test Execution Plan with the smoke tests recommendation.
Do NOT submit a formal review - just post the inline comment directly.

As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan.
You will post an inline review comment with the test execution plan on the first changed file.
If you fail to run or post a comment, retry.

Analysis Requirements:

  1. Examine code changes in each modified file
  2. Identify affected code paths, functions, and classes
  3. Analyze pytest-specific elements: fixtures (scope, dependencies), parametrization, markers, conftest changes
  4. Trace test dependencies through imports, shared utilities, and fixture inheritance
  5. Detect new tests introduced in the PR
  6. Smoke Test Impact Analysis: Determine if any changes could affect smoke tests by checking:
    • Changes to files/functions used by tests marked with @pytest.mark.smoke
    • Changes to fixtures or utilities imported by smoke tests
    • Changes to conftest.py files that may affect smoke test execution
    • Changes to core infrastructure code (utilities/, libs/) that smoke tests depend on

Your deliverable:
Your change request comment will be based on the following requirements:

Test Execution Plan

  • Run smoke tests: True / False (Based on your smoke test impact analysis - True if any changes may affect smoke tests)
  • path/to/test_file.py - When the entire test file needs verification
  • path/to/test_file.py::TestClass::test_method - When specific test(s) needed
  • path/to/test_file.py::test_function - When specific test(s) needed
  • -m marker - When specific marker(s) can be used to cover multiple cases.

Guidelines:

  • Include only tests directly affected by the changes
  • Use a full file path only if ALL tests in that file require verification
  • Use file path + test name if only specific tests are needed
  • If a test marker can cover multiple files/tests, provide the marker
  • Balance coverage vs over-testing - Keep descriptions minimal

Hardware-Related Checks (SR-IOV, GPU, DPDK):

When PR modifies fixtures for hardware-specific resources:

  • Collection Safety: Fixtures MUST have existence checks (return None when hardware unavailable)
  • Test Plan: MUST verify both WITH and WITHOUT hardware:
    • Run affected tests on cluster WITH hardware
    • Verify collection succeeds on cluster WITHOUT hardware

CRITICAL WORKFLOW COMPLETION RULES:

When responding to this test execution plan request, you MUST follow these rules EXACTLY:

  1. YOUR ONLY DELIVERABLE: Post an inline review comment containing the test execution plan on the first changed line
  2. THEN STOP IMMEDIATELY - Do NOT generate any additional response
  3. FALLBACK ONLY: If submitting the review fails after retrying, post as a regular PR comment
  4. SILENCE = SUCCESS: After successfully submitting the review, your task is complete. No confirmation needed.

ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):

  • ❌ Do NOT post acknowledgment messages like "Test execution plan posted", "Review posted successfully", "I've successfully posted"
  • ❌ Do NOT mention review IDs, URLs, or confirmation of posting in the PR thread
  • ❌ Do NOT add any follow-up comments after submitting the review
  • ❌ Do NOT reply to confirm task completion
  • ❌ Do NOT explain what you did - just do it and stop

Remember: The pull request review is visible to users. Additional comments are redundant noise.

CRITICAL: Before setting "Run smoke tests: True", you MUST use shell scripts to:
- Trace the actual fixture dependency chain from smoke tests to changed fixtures
- Verify that smoke tests actually import/use changed utilities or functions
- Confirm the dependency path exists; do NOT assume based on scope or semantics
- Be conservative: Session-scoped fixtures or infrastructure-sounding names do NOT
automatically mean smoke test impact. Only flag smoke test impact when you can
demonstrate a concrete dependency path.
WRONG: "This session-scoped storage fixture might affect smoke tests"
RIGHT: "Smoke test X uses fixture Y, which depends on the changed fixture Z"

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Get the PR diff to see what files were modified
gh pr diff 3928 --repo RedHatQE/openshift-virtualization-tests

echo "=== Checking for smoke tests that might be affected ==="
# Find all smoke tests
rg "pytest.mark.smoke" --type=py -B2 | grep "def test_" || echo "No smoke tests found with pattern"

echo "=== Checking if smoke tests use SR-IOV fixtures or utilities ==="
# Check if any smoke tests import or use SR-IOV related code
rg "pytest.mark.smoke" --type=py -l | xargs -I {} sh -c 'echo "=== File: {} ===" && cat {}'

Length of output: 178864


🏁 Script executed:

#!/bin/bash

# Check if any smoke tests import or use SR-IOV test utilities
echo "=== Checking for smoke test dependencies on SR-IOV code ==="
rg "pytest.mark.smoke" --type=py -l | while read -r smoke_test_file; do
    echo "Checking smoke test file: $smoke_test_file"
    # Check if smoke test imports anything from tests/network/sriov
    if grep -q "from tests.network.sriov" "$smoke_test_file" 2>/dev/null; then
        echo "FOUND: $smoke_test_file imports from tests/network/sriov"
    fi
    if grep -q "import tests.network.sriov" "$smoke_test_file" 2>/dev/null; then
        echo "FOUND: $smoke_test_file imports tests.network.sriov"
    fi
done

echo ""
echo "=== Checking for shared utilities used by SR-IOV tests ==="
# Check what the SR-IOV test file imports
echo "SR-IOV test imports:"
head -20 tests/network/sriov/test_sriov.py | grep -E "^(import|from)"

echo ""
echo "=== Getting PR file information ==="
gh pr view 3928 --repo RedHatQE/openshift-virtualization-tests --json files --jq '.files[] | "\(.path) - first line: \(.additions)"'

Length of output: 1245


🏁 Script executed:

#!/bin/bash

# Post inline review comment on first changed line with test execution plan
gh api \
  --method POST \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/RedHatQE/openshift-virtualization-tests/pulls/3928/reviews \
  -f event='COMMENT' \
  -f body='' \
  -f comments[][path]='tests/network/sriov/test_sriov.py' \
  -f comments[][line]=8 \
  -f comments[][side]='RIGHT' \
  -f comments[][body]='## Test Execution Plan

**Run smoke tests: False**

**Reason:** The change is isolated to a single SR-IOV test implementation. No smoke tests depend on SR-IOV test code or the modified functions. This is a test reliability fix that adds polling for interface status updates.

**Tests to Execute:**

- `tests/network/sriov/test_sriov.py::test_sriov_interfaces_post_reboot` - Direct test modification with polling mechanism

**Verification Strategy:**

Since this PR fixes test flakiness by adding a grace period for interface status updates:

1. **Run the modified test multiple times** (5-10 iterations) on a cluster WITH SR-IOV hardware to verify:
   - Test passes consistently with the new polling mechanism
   - Timeout handling works correctly when interface status is delayed
   - Error logging provides useful information on mismatch

2. **Hardware-Related Check:** Verify test collection succeeds on clusters WITHOUT SR-IOV hardware (test should be skipped gracefully via fixtures)

**Changed Files:**
- `tests/network/sriov/test_sriov.py` - Added TimeoutSampler polling for post-reboot interface validation'

Length of output: 270


🏁 Script executed:

#!/bin/bash

# Post inline review comment on first changed line with test execution plan
# Use -F for numeric fields (line) and -f for string fields
gh api \
  --method POST \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/RedHatQE/openshift-virtualization-tests/pulls/3928/reviews \
  -f event='COMMENT' \
  -f body='' \
  -f comments[][path]='tests/network/sriov/test_sriov.py' \
  -F comments[][line]=8 \
  -f comments[][side]='RIGHT' \
  -f comments[][body]='## Test Execution Plan

**Run smoke tests: False**

**Reason:** The change is isolated to a single SR-IOV test implementation. No smoke tests depend on SR-IOV test code or the modified functions. This is a test reliability fix that adds polling for interface status updates.

**Tests to Execute:**

- `tests/network/sriov/test_sriov.py::test_sriov_interfaces_post_reboot` - Direct test modification with polling mechanism

**Verification Strategy:**

Since this PR fixes test flakiness by adding a grace period for interface status updates:

1. **Run the modified test multiple times** (5-10 iterations) on a cluster WITH SR-IOV hardware to verify:
   - Test passes consistently with the new polling mechanism
   - Timeout handling works correctly when interface status is delayed
   - Error logging provides useful information on mismatch

2. **Hardware-Related Check:** Verify test collection succeeds on clusters WITHOUT SR-IOV hardware (test should be skipped gracefully via fixtures)

**Changed Files:**
- `tests/network/sriov/test_sriov.py` - Added TimeoutSampler polling for post-reboot interface validation'

Length of output: 1735



🧠 Learnings used
Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-22T15:56:00.157Z
Learning: In the openshift-virtualization-tests repository, when responding to test execution plan requests from openshift-virtualization-qe-bot-3, do NOT use "REQUEST_CHANGES" review type if the PR author has already marked the PR as verified (e.g., with `/verified` command). Test execution plans are informational guides, not blocking requirements. Use COMMENT event for informational test plans, or only REQUEST_CHANGES if there are actual code issues that need to be addressed before merging.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T10:28:54.212Z
Learning: In the openshift-virtualization-tests repository, when determining smoke test impact for test execution plans, only set "Run smoke tests: True" if there is a verified, traceable dependency path from smoke tests to the changed code. Session-scoped fixtures or infrastructure-sounding changes do NOT automatically imply smoke test impact - the dependency chain must be explicitly verified using shell scripts before recommending smoke test execution.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T14:00:59.076Z
Learning: In the openshift-virtualization-tests repository, when responding to test execution plan requests from openshift-virtualization-qe-bot-3, CodeRabbit must post ONLY an inline review comment on the Files Changed tab and then stop immediately without generating any follow-up comments in the PR discussion thread. No acknowledgment messages, no confirmation of posting, no explanation - silence after posting the inline review equals success. Additional comments create empty/meaningless reviews that clutter the PR.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3062
File: conftest.py:333-333
Timestamp: 2025-12-16T20:11:03.645Z
Learning: In the openshift-virtualization-tests repository, when determining smoke test impact for changes affecting py_config["os_login_param"], follow this verification methodology: (1) Find all smoke tests: `rg "pytest.mark.smoke" --type=py -B2 | grep "def test_"`, (2) For each smoke test file, search for VM creation patterns: `rg "VirtualMachineForTests|running_vm|VirtualMachineForTestsFromTemplate|wait_for_ssh|check_ssh_connectivity"`, (3) Trace the dependency chain: smoke test → VirtualMachineForTests/running_vm() → wait_for_ssh_connectivity() (default enabled) → vm.login_params property → py_config["os_login_param"][vm.os_flavor], (4) Check utilities/virt.py for login_params usage: `rg "os_login_param|login_params" utilities/virt.py -C3`. Any smoke test creating VMs with default SSH connectivity checks (running_vm with check_ssh_connectivity=True) depends on os_login_param, even if the test doesn't directly reference it.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: For PR `#1904` test execution, the critical validation point is test_connectivity_over_migration_between_localnet_vms which should fail gracefully on cloud clusters but pass on bare-metal/PSI clusters, representing the core nmstate conditional logic functionality.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: The test execution plan for PR `#1904` focuses on cluster-type conditional logic where nmstate functionality is bypassed on cloud clusters (Azure/AWS) but fully functional on bare-metal/PSI clusters, requiring different test strategies for each environment type.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T14:06:22.391Z
Learning: In the openshift-virtualization-tests repository, when posting test execution plan inline review comments using GitHub API, the full test execution plan content must go in the `comments[].body` field (which appears on Files Changed tab), NOT in the top-level `body` field (which appears in PR discussion thread). The top-level `body` field should be omitted or left empty to avoid posting redundant comments in the PR discussion thread.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3062
File: conftest.py:333-333
Timestamp: 2025-12-16T15:09:49.597Z
Learning: In the openshift-virtualization-tests repository, when conftest.py or utilities/bitwarden.py changes affect py_config["os_login_param"], smoke test impact must be determined by: (1) finding all smoke tests using `rg "pytest.mark.smoke"`, (2) checking each for VM creation patterns (VirtualMachineForTests, running_vm, VirtualMachineForTestsFromTemplate), (3) tracing whether running_vm is called with default check_ssh_connectivity=True, which accesses vm.login_params property that reads py_config["os_login_param"][vm.os_flavor]. The dependency chain is: smoke test → VM creation → running_vm → wait_for_ssh_connectivity → vm.login_params → os_login_param. Any smoke test creating VMs with SSH connectivity (the default) depends on os_login_param.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 3358
File: tests/network/sriov/test_sriov.py:21-21
Timestamp: 2026-01-07T09:52:22.771Z
Learning: For PRs that only remove or modify pytest markers (like removing pytest.mark.post_upgrade) without changing test logic in openshift-virtualization-tests, collection verification using `pytest --collect-only` is sufficient. Full test execution is not required to verify marker-only changes.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 2469
File: utilities/sanity.py:139-142
Timestamp: 2025-11-08T07:36:57.616Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep refactoring PRs (like PR `#2469`) strictly focused on moving/organizing code into more granular modules without adding new functionality, error handling, or behavioral changes. Such improvements should be handled in separate PRs.

Learnt from: EdDev
Repo: RedHatQE/openshift-virtualization-tests PR: 3612
File: tests/network/user_defined_network/ip_specification/test_ip_specification.py:10-10
Timestamp: 2026-01-26T20:29:54.623Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, when introducing STD (Standard Test Definition) PRs that define test case structure without implementation, it's acceptable and intentional to use `__test__ = False` to block pytest collection until the tests are fully implemented in a follow-up PR. The STD workflow involves: (1) defining test structure, markers, and documentation first, (2) implementing the actual test logic and removing the collection blocker in a subsequent PR.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 3196
File: tests/network/upgrade/test_upgrade_network.py:4-4
Timestamp: 2025-12-22T16:27:44.327Z
Learning: For PRs that remove test cases (especially redundant test cleanup PRs in openshift-virtualization-tests), test collection verification (pytest --collect-only showing selected/deselected counts) is sufficient to confirm the removal was clean and the test module remains functional. Full test execution is not required for test deletion PRs.

Learnt from: servolkov
Repo: RedHatQE/openshift-virtualization-tests PR: 1776
File: libs/net/node_network.py:25-31
Timestamp: 2025-08-20T23:43:28.117Z
Learning: In the RedHatQE/openshift-virtualization-tests project, servolkov's team always uses bare metal (BM) clusters with IPv4 setup in their testing environment, making defensive checks for IPv4 data presence potentially redundant in their networking code.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3371
File: scripts/tests_analyzer/compare_coderabbit_decisions.py:199-289
Timestamp: 2026-01-13T10:06:14.822Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep pagination loops inline rather than extracting them into generic helper functions when the loops have different URL patterns and unique post-processing logic, as the inline approach improves readability and makes each endpoint's behavior more explicit.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3228
File: .coderabbit.yaml:30-41
Timestamp: 2026-01-05T10:33:55.037Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers minimal pre-merge checks in CodeRabbit configuration: only docstrings enforcement (80% threshold) is needed, not title or description checks.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 1160
File: tests/storage/storage_migration/test_mtc_storage_class_migration.py:165-176
Timestamp: 2025-06-17T07:45:37.776Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers explicit fixture parameters over composite fixtures in test methods, even when there are many parameters, as they find this approach more readable and maintainable for understanding test dependencies.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 3571
File: tests/storage/storage_migration/utils.py:158-167
Timestamp: 2026-01-25T13:18:26.819Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers to avoid nitpicky style changes (e.g., removing `.keys()` from dict membership checks) because verifying every change is expensive. Be cautious about suggesting low-impact stylistic improvements that require verification overhead.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:24-64
Timestamp: 2025-06-23T19:28:20.281Z
Learning: In OpenShift Virtualization mass machine type transition tests, the machine type glob pattern "pc-q35-rhel8.*.*" is intentionally hard-coded in the kubevirt_api_lifecycle_automation_job as it's used only once for this specific test case, with plans to update it in the future if the job needs to support other machine types.

Learnt from: RoniKishner
Repo: RedHatQE/openshift-virtualization-tests PR: 1411
File: utilities/os_utils.py:246-279
Timestamp: 2025-07-22T17:13:59.166Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, CentOS preferences follow the format "centos-stream<version>" (e.g., "centos-stream9", "centos-stream10"). The generate_instance_type_centos_os_matrix function correctly uses regex to extract numeric versions and constructs the latest version string in the same format as the input preferences for proper comparison.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:83-97
Timestamp: 2025-06-23T19:19:31.961Z
Learning: In OpenShift Virtualization mass machine type transition tests, the kubevirt_api_lifecycle_automation_job requires cluster-admin privileges to function properly, as confirmed by the test maintainer akri3i.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-07T14:51:53.484Z
Learning: In the openshift-virtualization-tests repository, the team has decided to avoid using predefined time constants (like TIMEOUT_2MIN, TIMEOUT_5SEC) and prefers using explicit numeric values for timeout parameters.

Learnt from: chandramerla
Repo: RedHatQE/openshift-virtualization-tests PR: 2577
File: tests/virt/node/workload_density/test_free_page_reporting.py:90-90
Timestamp: 2025-11-20T16:27:01.693Z
Learning: The virtio_balloon free page reporting issue on s390x architecture (tracked in OCPBUGS-51113) has been fixed. Free page reporting tests in tests/virt/node/workload_density/test_free_page_reporting.py are now safe to run on s390x clusters.

Learnt from: azhivovk
Repo: RedHatQE/openshift-virtualization-tests PR: 3598
File: tests/network/sriov/conftest.py:62-71
Timestamp: 2026-01-28T17:36:14.188Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, sanity checks validate that the cluster supports at least one IP family (IPv4 or IPv6) before SR-IOV tests run. Therefore, defensive guards checking for empty IP addresses in SR-IOV VM fixtures (like `sriov_vm` in tests/network/sriov/conftest.py) are not necessary—if neither `ipv4_supported_cluster` nor `ipv6_supported_cluster` is True, the cluster is considered broken and will fail sanity validation.

Learnt from: azhivovk
Repo: RedHatQE/openshift-virtualization-tests PR: 3072
File: tests/conftest.py:50-55
Timestamp: 2025-12-17T12:33:03.550Z
Learning: In the RedHatQE/openshift-virtualization-tests repository test environment, there is only one SR-IOV Node Network Policy (SNNP) deployed per cluster after deployment. The sriov_node_policy fixture implementation that retrieves the first policy from the namespace is intentional and correct for this single-policy environment. The team will adapt the test setup if the deployment strategy changes in the future.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/network/sriov/test_sriov.py`:
- Around line 97-110: The except clause is catching the wrong exception so the
diagnostic log never runs; change the handler to catch TimeoutExpiredError from
timeout_sampler (add the import) instead of ValueError, so when TimeoutSampler
in the try block (used with TimeoutSampler and the lambda comparing
restarted_sriov_vm4.vmi.interfaces[1] to vm4_interfaces[1]) times out the except
TimeoutExpiredError block executes and LOGGER.error prints the before/after
interface states before re-raising.

):
if sample:
return
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be TimeoutExpiredError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.
Thank you.

@openshift-virtualization-qe-bot-3
Copy link
Contributor

/retest all

Auto-triggered: Files in this PR were modified by merged PR #3598.

Overlapping files

tests/network/sriov/test_sriov.py

@openshift-virtualization-qe-bot-3
Copy link
Contributor

@coderabbitai

Test execution plan request details

CRITICAL: You MUST post an inline review comment on the first changed line of the first file.
The inline comment should contain the Test Execution Plan with the smoke tests recommendation.
Do NOT submit a formal review - just post the inline comment directly.

As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan.
You will post an inline review comment with the test execution plan on the first changed file.
If you fail to run or post a comment, retry.

Analysis Requirements:

  1. Examine code changes in each modified file
  2. Identify affected code paths, functions, and classes
  3. Analyze pytest-specific elements: fixtures (scope, dependencies), parametrization, markers, conftest changes
  4. Trace test dependencies through imports, shared utilities, and fixture inheritance
  5. Detect new tests introduced in the PR
  6. Smoke Test Impact Analysis: Determine if any changes could affect smoke tests by checking:
    • Changes to files/functions used by tests marked with @pytest.mark.smoke
    • Changes to fixtures or utilities imported by smoke tests
    • Changes to conftest.py files that may affect smoke test execution
    • Changes to core infrastructure code (utilities/, libs/) that smoke tests depend on

Your deliverable:
Your change request comment will be based on the following requirements:

Test Execution Plan

  • Run smoke tests: True / False (Based on your smoke test impact analysis - True if any changes may affect smoke tests)
  • path/to/test_file.py - When the entire test file needs verification
  • path/to/test_file.py::TestClass::test_method - When specific test(s) needed
  • path/to/test_file.py::test_function - When specific test(s) needed
  • -m marker - When specific marker(s) can be used to cover multiple cases.

Guidelines:

  • Include only tests directly affected by the changes
  • Use a full file path only if ALL tests in that file require verification
  • Use file path + test name if only specific tests are needed
  • If a test marker can cover multiple files/tests, provide the marker
  • Balance coverage vs over-testing - Keep descriptions minimal

Hardware-Related Checks (SR-IOV, GPU, DPDK):

When PR modifies fixtures for hardware-specific resources:

  • Collection Safety: Fixtures MUST have existence checks (return None when hardware unavailable)
  • Test Plan: MUST verify both WITH and WITHOUT hardware:
    • Run affected tests on cluster WITH hardware
    • Verify collection succeeds on cluster WITHOUT hardware

CRITICAL WORKFLOW COMPLETION RULES:

When responding to this test execution plan request, you MUST follow these rules EXACTLY:

  1. YOUR ONLY DELIVERABLE: Post an inline review comment containing the test execution plan on the first changed line
  2. THEN STOP IMMEDIATELY - Do NOT generate any additional response
  3. FALLBACK ONLY: If submitting the review fails after retrying, post as a regular PR comment
  4. SILENCE = SUCCESS: After successfully submitting the review, your task is complete. No confirmation needed.

ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):

  • ❌ Do NOT post acknowledgment messages like "Test execution plan posted", "Review posted successfully", "I've successfully posted"
  • ❌ Do NOT mention review IDs, URLs, or confirmation of posting in the PR thread
  • ❌ Do NOT add any follow-up comments after submitting the review
  • ❌ Do NOT reply to confirm task completion
  • ❌ Do NOT explain what you did - just do it and stop

Remember: The pull request review is visible to users. Additional comments are redundant noise.

CRITICAL: Before setting "Run smoke tests: True", you MUST use shell scripts to:
- Trace the actual fixture dependency chain from smoke tests to changed fixtures
- Verify that smoke tests actually import/use changed utilities or functions
- Confirm the dependency path exists; do NOT assume based on scope or semantics
- Be conservative: Session-scoped fixtures or infrastructure-sounding names do NOT
automatically mean smoke test impact. Only flag smoke test impact when you can
demonstrate a concrete dependency path.
WRONG: "This session-scoped storage fixture might affect smoke tests"
RIGHT: "Smoke test X uses fixture Y, which depends on the changed fixture Z"

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 23, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== Getting PR diff to identify first changed line ==="
gh pr diff 3928 --repo RedHatQE/openshift-virtualization-tests

echo ""
echo "=== Checking for smoke tests that might depend on SR-IOV code ==="
rg "pytest.mark.smoke" --type=py -l | head -10

echo ""
echo "=== Verify no smoke tests import from tests/network/sriov ==="
rg "pytest.mark.smoke" --type=py -l | xargs grep -l "from tests.network.sriov\|import tests.network.sriov" 2>/dev/null || echo "No smoke tests import SR-IOV test code"

echo ""
echo "=== Get exact line numbers from GitHub API ==="
gh api /repos/RedHatQE/openshift-virtualization-tests/pulls/3928/files --jq '.[] | select(.filename == "tests/network/sriov/test_sriov.py") | {filename: .filename, patch: .patch}' | head -30

Length of output: 4076


🏁 Script executed:

#!/bin/bash

# Post inline review comment on first changed line (line 1 - import logging)
gh api \
  --method POST \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/RedHatQE/openshift-virtualization-tests/pulls/3928/reviews \
  -f event='COMMENT' \
  -f body='' \
  -f comments[][path]='tests/network/sriov/test_sriov.py' \
  -F comments[][line]=1 \
  -f comments[][side]='RIGHT' \
  -f comments[][body]='## Test Execution Plan

**Run smoke tests: False**

**Reason:** Change is isolated to a single SR-IOV test implementation. No smoke tests depend on SR-IOV test code or the modified functions. This is a test reliability fix adding polling for interface status updates.

**Tests to Execute:**

- `tests/network/sriov/test_sriov.py::test_sriov_interfaces_post_reboot` - Direct test modification with TimeoutSampler polling mechanism

**Verification Strategy:**

Since this PR fixes test flakiness by adding a grace period for interface status updates:

1. **Run the modified test multiple times** (5-10 iterations) on a cluster WITH SR-IOV hardware to verify:
   - Test passes consistently with the new polling mechanism
   - Timeout handling works correctly when interface status is delayed
   - Error logging provides useful information on mismatch

2. **Hardware-Related Check:** Verify test collection succeeds on clusters WITHOUT SR-IOV hardware (test should be skipped gracefully via fixtures)

**Changed Files:**
- `tests/network/sriov/test_sriov.py` - Added TimeoutSampler polling (120s timeout, 5s sleep) with detailed error logging for post-reboot interface validation'

Length of output: 1735



🧠 Learnings used
Learnt from: RoniKishner
Repo: RedHatQE/openshift-virtualization-tests PR: 1411
File: utilities/os_utils.py:246-279
Timestamp: 2025-07-22T17:13:59.166Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, CentOS preferences follow the format "centos-stream<version>" (e.g., "centos-stream9", "centos-stream10"). The generate_instance_type_centos_os_matrix function correctly uses regex to extract numeric versions and constructs the latest version string in the same format as the input preferences for proper comparison.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3062
File: conftest.py:333-333
Timestamp: 2025-12-16T20:11:03.645Z
Learning: In the openshift-virtualization-tests repository, when determining smoke test impact for changes affecting py_config["os_login_param"], follow this verification methodology: (1) Find all smoke tests: `rg "pytest.mark.smoke" --type=py -B2 | grep "def test_"`, (2) For each smoke test file, search for VM creation patterns: `rg "VirtualMachineForTests|running_vm|VirtualMachineForTestsFromTemplate|wait_for_ssh|check_ssh_connectivity"`, (3) Trace the dependency chain: smoke test → VirtualMachineForTests/running_vm() → wait_for_ssh_connectivity() (default enabled) → vm.login_params property → py_config["os_login_param"][vm.os_flavor], (4) Check utilities/virt.py for login_params usage: `rg "os_login_param|login_params" utilities/virt.py -C3`. Any smoke test creating VMs with default SSH connectivity checks (running_vm with check_ssh_connectivity=True) depends on os_login_param, even if the test doesn't directly reference it.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 3571
File: tests/storage/storage_migration/utils.py:158-167
Timestamp: 2026-01-25T13:18:26.819Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers to avoid nitpicky style changes (e.g., removing `.keys()` from dict membership checks) because verifying every change is expensive. Be cautious about suggesting low-impact stylistic improvements that require verification overhead.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:24-64
Timestamp: 2025-06-23T19:28:20.281Z
Learning: In OpenShift Virtualization mass machine type transition tests, the machine type glob pattern "pc-q35-rhel8.*.*" is intentionally hard-coded in the kubevirt_api_lifecycle_automation_job as it's used only once for this specific test case, with plans to update it in the future if the job needs to support other machine types.

Learnt from: azhivovk
Repo: RedHatQE/openshift-virtualization-tests PR: 3598
File: tests/network/sriov/conftest.py:62-71
Timestamp: 2026-01-28T17:36:14.188Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, sanity checks validate that the cluster supports at least one IP family (IPv4 or IPv6) before SR-IOV tests run. Therefore, defensive guards checking for empty IP addresses in SR-IOV VM fixtures (like `sriov_vm` in tests/network/sriov/conftest.py) are not necessary—if neither `ipv4_supported_cluster` nor `ipv6_supported_cluster` is True, the cluster is considered broken and will fail sanity validation.

Learnt from: servolkov
Repo: RedHatQE/openshift-virtualization-tests PR: 1776
File: libs/net/node_network.py:25-31
Timestamp: 2025-08-20T23:43:28.117Z
Learning: In the RedHatQE/openshift-virtualization-tests project, servolkov's team always uses bare metal (BM) clusters with IPv4 setup in their testing environment, making defensive checks for IPv4 data presence potentially redundant in their networking code.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 2469
File: utilities/sanity.py:139-142
Timestamp: 2025-11-08T07:36:57.616Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep refactoring PRs (like PR `#2469`) strictly focused on moving/organizing code into more granular modules without adding new functionality, error handling, or behavioral changes. Such improvements should be handled in separate PRs.

Learnt from: rlobillo
Repo: RedHatQE/openshift-virtualization-tests PR: 1984
File: tests/install_upgrade_operators/network_policy/test_network_policy_components.py:182-187
Timestamp: 2025-09-12T14:14:28.329Z
Learning: Network policy tests in the RedHatQE/openshift-virtualization-tests repository don't require unique resource names as there are no parallel runs on the same cluster in their test environment.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3371
File: scripts/tests_analyzer/compare_coderabbit_decisions.py:199-289
Timestamp: 2026-01-13T10:06:14.822Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep pagination loops inline rather than extracting them into generic helper functions when the loops have different URL patterns and unique post-processing logic, as the inline approach improves readability and makes each endpoint's behavior more explicit.

Learnt from: aditi-sharma-1
Repo: RedHatQE/openshift-virtualization-tests PR: 3762
File: tests/global_config_s390x.py:32-42
Timestamp: 2026-02-09T09:29:22.246Z
Learning: On s390x clusters, VirtualMachineClusterPreference availability varies by RHEL version: RHEL 8 uses the generic "rhel.8" preference (no rhel.8.s390x preference exists), while RHEL 9 and RHEL 10 have architecture-specific preferences "rhel.9.s390x" and "rhel.10.s390x". In tests/global_config_s390x.py, generate_linux_instance_type_os_matrix calls should omit arch_suffix for RHEL 8 but include arch_suffix=S390X for RHEL 9 and RHEL 10.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3228
File: .coderabbit.yaml:30-41
Timestamp: 2026-01-05T10:33:55.037Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers minimal pre-merge checks in CodeRabbit configuration: only docstrings enforcement (80% threshold) is needed, not title or description checks.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T14:00:59.076Z
Learning: In the openshift-virtualization-tests repository, when responding to test execution plan requests from openshift-virtualization-qe-bot-3, CodeRabbit must post ONLY an inline review comment on the Files Changed tab and then stop immediately without generating any follow-up comments in the PR discussion thread. No acknowledgment messages, no confirmation of posting, no explanation - silence after posting the inline review equals success. Additional comments create empty/meaningless reviews that clutter the PR.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 3358
File: tests/network/sriov/test_sriov.py:21-21
Timestamp: 2026-01-07T09:52:22.771Z
Learning: For PRs that only remove or modify pytest markers (like removing pytest.mark.post_upgrade) without changing test logic in openshift-virtualization-tests, collection verification using `pytest --collect-only` is sufficient. Full test execution is not required to verify marker-only changes.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-22T15:56:00.157Z
Learning: In the openshift-virtualization-tests repository, when responding to test execution plan requests from openshift-virtualization-qe-bot-3, do NOT use "REQUEST_CHANGES" review type if the PR author has already marked the PR as verified (e.g., with `/verified` command). Test execution plans are informational guides, not blocking requirements. Use COMMENT event for informational test plans, or only REQUEST_CHANGES if there are actual code issues that need to be addressed before merging.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T10:28:54.212Z
Learning: In the openshift-virtualization-tests repository, when determining smoke test impact for test execution plans, only set "Run smoke tests: True" if there is a verified, traceable dependency path from smoke tests to the changed code. Session-scoped fixtures or infrastructure-sounding changes do NOT automatically imply smoke test impact - the dependency chain must be explicitly verified using shell scripts before recommending smoke test execution.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 3196
File: tests/network/upgrade/test_upgrade_network.py:4-4
Timestamp: 2025-12-22T16:27:44.327Z
Learning: For PRs that remove test cases (especially redundant test cleanup PRs in openshift-virtualization-tests), test collection verification (pytest --collect-only showing selected/deselected counts) is sufficient to confirm the removal was clean and the test module remains functional. Full test execution is not required for test deletion PRs.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: For PR `#1904` test execution, the critical validation point is test_connectivity_over_migration_between_localnet_vms which should fail gracefully on cloud clusters but pass on bare-metal/PSI clusters, representing the core nmstate conditional logic functionality.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3062
File: conftest.py:333-333
Timestamp: 2025-12-16T15:09:49.597Z
Learning: In the openshift-virtualization-tests repository, when conftest.py or utilities/bitwarden.py changes affect py_config["os_login_param"], smoke test impact must be determined by: (1) finding all smoke tests using `rg "pytest.mark.smoke"`, (2) checking each for VM creation patterns (VirtualMachineForTests, running_vm, VirtualMachineForTestsFromTemplate), (3) tracing whether running_vm is called with default check_ssh_connectivity=True, which accesses vm.login_params property that reads py_config["os_login_param"][vm.os_flavor]. The dependency chain is: smoke test → VM creation → running_vm → wait_for_ssh_connectivity → vm.login_params → os_login_param. Any smoke test creating VMs with SSH connectivity (the default) depends on os_login_param.

Learnt from: azhivovk
Repo: RedHatQE/openshift-virtualization-tests PR: 2119
File: tests/network/localnet/conftest.py:372-387
Timestamp: 2025-11-26T16:03:07.813Z
Learning: In the openshift-virtualization-tests repository, pytest fixtures that declare other fixtures as parameters purely for dependency ordering (without referencing them in the function body) should not be modified to silence Ruff ARG001 warnings. This is an idiomatic pytest pattern for ensuring setup order, and the team prefers to leave such fixtures unchanged rather than adding defensive comments or code to suppress linter warnings.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T14:06:22.391Z
Learning: In the openshift-virtualization-tests repository, when posting test execution plan inline review comments using GitHub API, the full test execution plan content must go in the `comments[].body` field (which appears on Files Changed tab), NOT in the top-level `body` field (which appears in PR discussion thread). The top-level `body` field should be omitted or left empty to avoid posting redundant comments in the PR discussion thread.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: The test execution plan for PR `#1904` focuses on cluster-type conditional logic where nmstate functionality is bypassed on cloud clusters (Azure/AWS) but fully functional on bare-metal/PSI clusters, requiring different test strategies for each environment type.

Learnt from: EdDev
Repo: RedHatQE/openshift-virtualization-tests PR: 3612
File: tests/network/user_defined_network/ip_specification/test_ip_specification.py:10-10
Timestamp: 2026-01-26T20:29:54.623Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, when introducing STD (Standard Test Definition) PRs that define test case structure without implementation, it's acceptable and intentional to use `__test__ = False` to block pytest collection until the tests are fully implemented in a follow-up PR. The STD workflow involves: (1) defining test structure, markers, and documentation first, (2) implementing the actual test logic and removing the collection blocker in a subsequent PR.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 1160
File: tests/storage/storage_migration/test_mtc_storage_class_migration.py:165-176
Timestamp: 2025-06-17T07:45:37.776Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers explicit fixture parameters over composite fixtures in test methods, even when there are many parameters, as they find this approach more readable and maintainable for understanding test dependencies.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:83-97
Timestamp: 2025-06-23T19:19:31.961Z
Learning: In OpenShift Virtualization mass machine type transition tests, the kubevirt_api_lifecycle_automation_job requires cluster-admin privileges to function properly, as confirmed by the test maintainer akri3i.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-07T14:51:53.484Z
Learning: In the openshift-virtualization-tests repository, the team has decided to avoid using predefined time constants (like TIMEOUT_2MIN, TIMEOUT_5SEC) and prefers using explicit numeric values for timeout parameters.

Learnt from: azhivovk
Repo: RedHatQE/openshift-virtualization-tests PR: 3072
File: tests/conftest.py:50-55
Timestamp: 2025-12-17T12:33:03.550Z
Learning: In the RedHatQE/openshift-virtualization-tests repository test environment, there is only one SR-IOV Node Network Policy (SNNP) deployed per cluster after deployment. The sriov_node_policy fixture implementation that retrieves the first policy from the namespace is intentional and correct for this single-policy environment. The team will adapt the test setup if the deployment strategy changes in the future.

Learnt from: vamsikrishna-siddu
Repo: RedHatQE/openshift-virtualization-tests PR: 2199
File: tests/storage/test_online_resize.py:108-113
Timestamp: 2025-09-28T14:43:07.181Z
Learning: In the openshift-virtualization-tests repo, PR `#2199` depends on PR `#2139` which adds architecture-specific OS_FLAVOR attributes to the Images.Cirros class (OS_FLAVOR_CIRROS for x86_64/ARM64, OS_FLAVOR_FEDORA for s390x), enabling conditional logic based on the underlying OS flavor in tests.

Learnt from: EdDev
Repo: RedHatQE/openshift-virtualization-tests PR: 3930
File: tests/network/vmi/libippersistence.py:79-88
Timestamp: 2026-02-23T11:38:56.512Z
Learning: In tests/network/vmi/libippersistence.py, the monitor_vmi_interfaces helper function should accept an optional expected_interface_count parameter (default=2) to allow tests with different NIC configurations to reuse the monitoring logic without hardcoding the interface count.

Learnt from: EdDev
Repo: RedHatQE/openshift-virtualization-tests PR: 3930
File: tests/network/vmi/libippersistence.py:79-88
Timestamp: 2026-02-23T11:38:56.512Z
Learning: In tests/network/vmi/libippersistence.py, the monitor_vmi_interfaces helper function should accept an optional expected_interface_count parameter (default=2) to allow tests with different NIC configurations to reuse the monitoring logic without hardcoding the interface count.

Learnt from: vamsikrishna-siddu
Repo: RedHatQE/openshift-virtualization-tests PR: 2199
File: tests/storage/test_online_resize.py:108-113
Timestamp: 2025-09-28T14:43:07.181Z
Learning: In the openshift-virtualization-tests repo, PR `#2199` depends on PR `#2139` which adds the OS_FLAVOR attribute to the Images.Cirros class, making Images.Cirros.OS_FLAVOR available for conditional logic in tests.

Copy link
Contributor

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The 2nd commit is fixing the 1st, please squash.
  • @azhivovk , please re-review this again.

We observed flakiness of test_sriov_interfaces_post_reboot, where sometimes the interface status report
(post reboot) is not yet fully updated, which is accepted and shouldn't fail the test immediately.
Therefore the check is now given a grace period to allow completion of the interface status, before
checking it.
# Grace the check with a timeout rather than immediately declaring failure. The reason is that although
# the interface already appears in the VMI status, it's not immediately fully updated with all the
# required fields.
try:
Copy link
Contributor

@servolkov servolkov Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, consider using a helper with retry decorator and then more clear test structure:

The helper (precisely and roughly, use better naming if you need):

@retry(wait_timeout=120, sleep=5)
def _check_interface_match(vm, expected_iface):
    current_iface = vm.vmi.interfaces[1]
    return current_iface if current_iface == expected_iface else None

def check_interface_match(vm, expected_iface):
    try:
        return _check_interface_match(vm, expected)
    except TimeoutExpiredError:
        "add any logs if needed"
        return None

and the test will be smth like:

def test_sriov_interfaces_post_reboot(
    self,
    vm4_interfaces,
    restarted_sriov_vm4,
):
    actual_iface = check_interface_match(restarted_sriov_vm4, vm4_interfaces[1])
    assert actual_iface == vm4_interfaces[1]

It brings readability and avoids complicated code inside the test.

BTW, we already have lookup_iface_status, probably, it is even better to use it, but the logic is same.

for sample in TimeoutSampler(
wait_timeout=120,
sleep=5,
func=lambda: restarted_sriov_vm4.vmi.interfaces[1] == vm4_interfaces[1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edi suggested to use lookup_iface_status to wait for the interfaces and then we can drop timeout sampler

# required fields.
try:
for sample in TimeoutSampler(
wait_timeout=120,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 120s timeout seems high for this case. It was flaky before this addition and going from 0 to 120 seems like a lot of additional wait in case this will fail.
By the time the test runs, restarted_sriov_vm4 already waited for guest agent + all interfaces via wait_for_vm_interfaces. We're only waiting for remaining fields (IP, MAC) to settle, and for that, the codebase typically uses 30-60s (e.g. IP change after migration: 60s, interface status lookup: 30s). I'd suggest we reduce this to 60s.

Copy link
Contributor

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 3 points to feedback on this:

  1. The flakiness of the test is not limited to the check post restart. The original interface information is by definition also flake in nature, as it does not wait for specifics. E.g. if an IP is expected, then it should explicitly wait for it and not assume it is there, otherwise in some cases you will see it in the initial collection and sometime not.
  2. We already have good alternatives to wait_for_vm_interfaces and custom made retries. Inventing new ones is not recommended.
  3. Waiting in several locations (fixture and in the test body) for different reasons needs a good reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants