[VIRT] Collect VNC screenshot on post-migration verification timeout #3953

Open
dshchedr wants to merge 2 commits into RedHatQE:main from dshchedr:add-vnc-screenshot-migration-verify

Conversation

@dshchedr
Collaborator

@dshchedr dshchedr commented Feb 24, 2026

Add VNC screenshot collection when a VM is unresponsive after migration.

Signed-off-by: dshchedr@redhat.com
Assisted-by: Claude

Short description:
More details:
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket:

Summary by CodeRabbit

  • Bug Fixes
    • Improved VM migration timeout handling: related waits are now handled together so that if a timeout occurs an error is logged and a VNC snapshot is captured to aid troubleshooting; the error is then propagated.

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

Walkthrough

The change wraps the two VM-post-migration waits inside a single try block in verify_vm_migrated. On TimeoutExpiredError it logs an error, captures a VNC screenshot for the VM, and re-raises the exception for upstream handling.
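The control flow described in the walkthrough can be sketched as follows. This is a minimal, self-contained illustration, not the code in `utilities/virt.py`: `TimeoutExpiredError` here is a local stand-in class, `verify_post_migration` is a hypothetical name, and the two wait helpers and the screenshot collector are passed in as callables so the snippet runs without a cluster.

```python
import logging

LOGGER = logging.getLogger(__name__)


class TimeoutExpiredError(Exception):
    """Local stand-in for the timeout exception raised by the real wait helpers."""


def verify_post_migration(vm_name, wait_for_interfaces, wait_for_ssh, collect_screenshot):
    """Run both post-migration waits; on timeout, log, capture a screenshot, re-raise."""
    try:
        # Both waits share one try block, so a timeout in either one
        # triggers the same diagnostic path.
        wait_for_interfaces(vm_name)
        wait_for_ssh(vm_name)
    except TimeoutExpiredError:
        LOGGER.error(f"VM {vm_name} unresponsive after migration; getting VNC screenshot")
        collect_screenshot(vm_name)
        # Bare raise propagates the original timeout to the caller.
        raise
```

Passing the helpers as arguments is only a convenience for the sketch; the PR calls the repository's own helpers directly.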

Changes

| Cohort / File(s) | Summary |
|---|---|
| Error handling & diagnostics: utilities/virt.py | Combined wait_for_vm_interfaces and wait_for_ssh_connectivity into one try block; on TimeoutExpiredError log an error, capture a VNC screenshot for the VM, then re-raise the exception. No public API changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description is largely incomplete. While it has a brief opening line and sign-off, all required template sections (Short description, More details, What this PR does, Issue fixes, Special notes, Jira ticket) are empty or only contain template comments. | Fill in all required template sections: expand short description, add technical details, explain the bug/feature being addressed, reference any related issues, note reviewer considerations, and provide Jira ticket URL or write NONE. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly describes the main change: collecting VNC screenshots when VM verification times out after migration. |


@openshift-virtualization-qe-bot

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest verify-bugs-are-open - verify-bugs-are-open
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No WIP, hold, conflict labels
  5. Verified: PR must be marked as verified (if verification is enabled)
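The five conditions above amount to a single boolean check. The sketch below is an illustration only, not the bot's implementation: the function name, its parameters, and the exact blocking label names (taken from the enabled label categories listed earlier) are assumptions.

```python
# Assumed label names that block merging (WIP, hold, conflicts).
BLOCKING_LABELS = {"wip", "hold", "has-conflicts"}


def can_be_merged(approved, lgtm_count, checks_passed, labels,
                  verified, verification_enabled=True, min_lgtm=2):
    """Return True only when every merge requirement listed above holds."""
    if not approved:                      # 1. /approve from an approver
        return False
    if lgtm_count < min_lgtm:             # 2. minimum number of /lgtm
        return False
    if not checks_passed:                 # 3. required status checks
        return False
    if BLOCKING_LABELS & set(labels):     # 4. no blocking labels
        return False
    if verification_enabled and not verified:  # 5. verified, if enabled
        return False
    return True
```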

📊 Review Process

Approvers and Reviewers

Approvers:

  • dshchedr
  • myakove
  • rnetser
  • vsibirsk

Reviewers:

  • RoniKishner
  • dshchedr
  • geetikakay
  • rnetser
  • vsibirsk
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is automatically removed on each new commit
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utilities/virt.py (1)

1886-1907: 🛠️ Refactor suggestion | 🟠 Major

MEDIUM: verify_vm_migrated lacks a Google-format docstring despite new observable side effects

Per coding guidelines: "Google-format docstrings REQUIRED for all public functions with non-obvious return values OR side effects." This PR adds a new side effect (VNC screenshot collection on timeout) to a public utility function that previously had none. The function still has no docstring at all.

📝 Proposed docstring
 def verify_vm_migrated(
     vm,
     node_before,
     wait_for_interfaces=True,
     check_ssh_connectivity=False,
 ):
+    """
+    Verify that a VM has successfully migrated to a new node.
+
+    Asserts the VMI is running on a different node and that its migration state
+    is marked completed. Optionally waits for guest-agent network interfaces and
+    SSH connectivity. On timeout, collects a VNC screenshot for debugging before
+    re-raising the exception.
+
+    Args:
+        vm: VM object whose VMI was migrated.
+        node_before: Node object the VMI was running on before migration.
+        wait_for_interfaces (bool): Wait for guest-agent to report network interfaces.
+        check_ssh_connectivity (bool): Verify SSH is reachable after migration.
+
+    Raises:
+        AssertionError: If the VMI is still on the original node or migration is incomplete.
+        TimeoutExpiredError: If interfaces or SSH connectivity wait times out.
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utilities/virt.py` around lines 1886 - 1907, Add a Google-style docstring to
the public function verify_vm_migrated explaining its parameters (vm,
node_before, wait_for_interfaces, check_ssh_connectivity), behavior (asserts
migration completed and that VMI moved off node_before, optionally waits for
interfaces via wait_for_vm_interfaces and SSH via wait_for_ssh_connectivity),
return value (None), raised exceptions (AssertionError, TimeoutExpiredError),
and observable side effects including that on TimeoutExpiredError it invokes
collect_vnc_screenshot_for_vms to capture VNC screenshots for debugging; mention
any preconditions (vm.vmi and vm.vmi.instance expected) and that the function
relies on migrationState.completed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@utilities/virt.py`:
- Around line 1905-1907: Add a LOGGER.error call that logs the
TimeoutExpiredError with context (vm.name, vm.namespace) before re-raising, and
protect the diagnostic call so it cannot mask the original exception: in the
except TimeoutExpiredError block, first call LOGGER.error(...) about the
timeout, then call collect_vnc_screenshot_for_vms(...) inside its own try/except
Exception that logs any failure (also with vm context) but does not raise, and
finally re-raise the original TimeoutExpiredError; reference
TimeoutExpiredError, collect_vnc_screenshot_for_vms, and LOGGER.error to locate
the change.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a83fef2 and 9b66ca1.

📒 Files selected for processing (1)
  • utilities/virt.py

@codecov

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.56%. Comparing base (84584c3) to head (7ed9447).
⚠️ Report is 32 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3953   +/-   ##
=======================================
  Coverage   98.56%   98.56%           
=======================================
  Files          25       25           
  Lines        2297     2307   +10     
=======================================
+ Hits         2264     2274   +10     
  Misses         33       33           
Flag Coverage Δ
utilities 98.56% <ø> (+<0.01%) ⬆️
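The figures in the coverage diff can be checked with a few lines of arithmetic. The snippet uses only the hits and line counts shown in the table; how Codecov rounds the displayed percentage is not stated in the report, so no rounding rule is assumed here.

```python
# Hits and total lines copied from the coverage diff above.
base_hits, base_lines = 2264, 2297
head_hits, head_lines = 2274, 2307

base_pct = 100 * base_hits / base_lines
head_pct = 100 * head_hits / head_lines

# Misses are the lines not hit; both sides should match the table.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

# Change in coverage, in percentage points.
delta = head_pct - base_pct

print(f"base={base_pct:.4f}% head={head_pct:.4f}% delta={delta:+.4f} points")
print(f"misses: base={base_misses} head={head_misses}")
```

The ten added lines are all hit, so the miss count stays the same and coverage moves by less than a hundredth of a percentage point, consistent with the `+<0.01%` shown for the `utilities` flag.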


Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
utilities/virt.py (1)

1905-1908: ⚠️ Potential issue | 🟠 Major

HIGH: collect_vnc_screenshot_for_vms can mask the root-cause TimeoutExpiredError.

collect_vnc_screenshot_for_vms calls run_virtctl_command unconditionally (see utilities/data_collector.py lines 113–118). If the virtctl call fails — VNC server unavailable, base directory unwritable, pod unreachable — Python replaces the in-flight TimeoutExpiredError with the new exception before the raise on line 1908 executes. The CI failure message then points to the screenshot error, not the migration-verification timeout, making diagnosis significantly harder.

The LOGGER.error (line 1906) was a good fix from the previous review round. The remaining unresolved part is protecting the diagnostic side-effect so it can never suppress the original exception.

🛡️ Proposed fix — guard the screenshot call
     except TimeoutExpiredError:
         LOGGER.error(f"VM {vm.name} unresponsive after migration; getting VNC screenshot")
-        collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace)
+        try:
+            collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace)
+        except Exception as screenshot_error:
+            LOGGER.error(f"Failed to collect VNC screenshot for VM {vm.name}: {screenshot_error}")
         raise

The inner except Exception is intentional here: collect_vnc_screenshot_for_vms is a diagnostic side-effect whose failure must never replace the root-cause exception. Based on learnings, the established repo pattern is LOGGER.error before re-raising (servolkov/PR3387 learning).
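The masking behaviour is standard Python and can be reproduced without a cluster. In the sketch below every name is a stand-in: `TimeoutExpiredError` and `ScreenshotError` are local classes, and `failing_screenshot` simulates a diagnostic call that fails.

```python
class TimeoutExpiredError(Exception):
    """Stand-in for the timeout raised by the post-migration waits."""


class ScreenshotError(Exception):
    """Hypothetical failure raised by the diagnostic step."""


def failing_screenshot():
    raise ScreenshotError("screenshot collection failed")


def unguarded():
    try:
        raise TimeoutExpiredError("SSH wait timed out")
    except TimeoutExpiredError:
        failing_screenshot()  # raises, so the bare raise below is never reached
        raise


def guarded():
    try:
        raise TimeoutExpiredError("SSH wait timed out")
    except TimeoutExpiredError:
        try:
            failing_screenshot()
        except Exception as screenshot_error:
            print(f"Failed to collect VNC screenshot: {screenshot_error}")
        # The inner handler has finished, so bare raise re-raises the timeout.
        raise


def surfaced_exception(func):
    """Return the exception that escapes func, or None if nothing escapes."""
    try:
        func()
    except Exception as exc:
        return exc
    return None
```

In the unguarded variant the caller sees `ScreenshotError`; the timeout is still reachable through the exception's `__context__` attribute, but it is no longer the reported failure. In the guarded variant the caller sees `TimeoutExpiredError`.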

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utilities/virt.py` around lines 1905 - 1908, The TimeoutExpiredError handler
currently calls collect_vnc_screenshot_for_vms directly which can raise its own
exceptions (from run_virtctl_command) and mask the original TimeoutExpiredError;
wrap the call to collect_vnc_screenshot_for_vms(vm_name=vm.name,
vm_namespace=vm.namespace) in its own try/except Exception that logs the
diagnostic failure via LOGGER.error (including the exception) but does not
suppress or replace the original TimeoutExpiredError, then re-raise the original
TimeoutExpiredError as before (i.e., keep the existing LOGGER.error("VM ...
unresponsive...") and ensure only diagnostic errors are caught and logged
locally).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b66ca1 and fa72b48.

📒 Files selected for processing (1)
  • utilities/virt.py

@rnetser
Collaborator

rnetser commented Feb 24, 2026

/approve
/lgtm

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
utilities/virt.py (1)

1892-1901: ⚠️ Potential issue | 🟠 Major

HIGH: Screenshot collection can hide the real timeout

If collect_vnc_screenshot_for_vms() throws, the original TimeoutExpiredError never re-raises, so CI reports the screenshot failure instead of the real post‑migration timeout. Guard the diagnostic call with a narrow exception list, log the failure, and then re‑raise the original timeout so root cause stays visible.

✅ Proposed fix (guard diagnostic side‑effect)
     except TimeoutExpiredError:
         LOGGER.error(f"VM {vm.name} unresponsive after migration; getting VNC screenshot")
-        collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace)
+        try:
+            collect_vnc_screenshot_for_vms(vm_name=vm.name, vm_namespace=vm.namespace)
+        except (CommandExecFailed, OSError) as screenshot_error:
+            LOGGER.error(
+                f"Failed to collect VNC screenshot for VM {vm.name} in {vm.namespace}: {screenshot_error}"
+            )
         raise

As per coding guidelines: Exception handling: Do not catch bare Exception - catch specific exception types only; Exception handling: NEVER silently swallow exceptions - at minimum, log the error before continuing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@utilities/virt.py` around lines 1892 - 1901, The TimeoutExpiredError handler
currently calls collect_vnc_screenshot_for_vms() directly which can raise and
obscure the original TimeoutExpiredError; change the except TimeoutExpiredError
block to capture the original exception (e.g., orig_exc = sys.exc_info()[1]),
then call collect_vnc_screenshot_for_vms() inside its own narrow try/except
(catch diagnostic-specific exceptions or at minimum Exception as diag_err), log
the diagnostic failure via LOGGER.exception or LOGGER.error including diag_err,
and finally re-raise the original TimeoutExpiredError (raise orig_exc) so the
root timeout remains visible; references: wait_for_vm_interfaces,
wait_for_ssh_connectivity, collect_vnc_screenshot_for_vms, TimeoutExpiredError,
LOGGER.
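A narrow exception list has a trade-off worth seeing concretely: anything outside the list still escapes the handler and replaces the timeout. The sketch below is an assumption-laden illustration, not repository code. `TimeoutExpiredError` and `CommandExecFailed` are local stand-in classes, `verify_with_narrow_guard` is a hypothetical name, and the original exception is bound with `as` rather than `sys.exc_info()`.

```python
import logging

LOGGER = logging.getLogger(__name__)


class TimeoutExpiredError(Exception):
    """Stand-in for the timeout raised by the post-migration waits."""


class CommandExecFailed(Exception):
    """Stand-in for a command-execution failure in the screenshot helper."""


def verify_with_narrow_guard(wait, collect_screenshot):
    """Run wait(); on timeout, try to collect a screenshot, then re-raise."""
    try:
        wait()
    except TimeoutExpiredError as orig_exc:
        try:
            collect_screenshot()
        except (CommandExecFailed, OSError) as diag_err:
            # Only the listed diagnostic failures are logged and dropped.
            LOGGER.error(f"Failed to collect VNC screenshot: {diag_err}")
        # Explicit re-raise of the bound exception; a bare raise
        # behaves the same way at this point in Python 3.
        raise orig_exc
```

With this guard an `OSError` from the screenshot step is logged and the timeout surfaces, while an unlisted error such as `ValueError` would still propagate in place of the timeout.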

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa72b48 and 7ed9447.

📒 Files selected for processing (1)
  • utilities/virt.py

@dshchedr
Collaborator Author

/retest build-container

@dshchedr
Collaborator Author

/build-and-push-container

@openshift-virtualization-qe-bot-5

New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-3953 published

@dshchedr
Collaborator Author

/verified

Smoke tests are green.
Also checked locally that the screenshot is collected successfully when the test fails.

@openshift-virtualization-qe-bot-3
Contributor

@coderabbitai

Test execution plan request details

CRITICAL: You MUST post an inline review comment on the first changed line of the first file.
The inline comment should contain the Test Execution Plan with the smoke tests recommendation.
Do NOT submit a formal review - just post the inline comment directly.

As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan.
You will post an inline review comment with the test execution plan on the first changed file.
If you fail to run or post a comment, retry.

Analysis Requirements:

  1. Examine code changes in each modified file
  2. Identify affected code paths, functions, and classes
  3. Analyze pytest-specific elements: fixtures (scope, dependencies), parametrization, markers, conftest changes
  4. Trace test dependencies through imports, shared utilities, and fixture inheritance
  5. Detect new tests introduced in the PR
  6. Smoke Test Impact Analysis: Determine if any changes could affect smoke tests by checking:
    • Changes to files/functions used by tests marked with @pytest.mark.smoke
    • Changes to fixtures or utilities imported by smoke tests
    • Changes to conftest.py files that may affect smoke test execution
    • Changes to core infrastructure code (utilities/, libs/) that smoke tests depend on

Your deliverable:
Your change request comment will be based on the following requirements:

Test Execution Plan

  • Run smoke tests: True / False (Based on your smoke test impact analysis - True if any changes may affect smoke tests)
  • path/to/test_file.py - When the entire test file needs verification
  • path/to/test_file.py::TestClass::test_method - When specific test(s) needed
  • path/to/test_file.py::test_function - When specific test(s) needed
  • -m marker - When specific marker(s) can be used to cover multiple cases.

Guidelines:

  • Include only tests directly affected by the changes
  • Use a full file path only if ALL tests in that file require verification
  • Use file path + test name if only specific tests are needed
  • If a test marker can cover multiple files/tests, provide the marker
  • Balance coverage vs over-testing - Keep descriptions minimal

Hardware-Related Checks (SR-IOV, GPU, DPDK):

When PR modifies fixtures for hardware-specific resources:

  • Collection Safety: Fixtures MUST have existence checks (return None when hardware unavailable)
  • Test Plan: MUST verify both WITH and WITHOUT hardware:
    • Run affected tests on cluster WITH hardware
    • Verify collection succeeds on cluster WITHOUT hardware

CRITICAL WORKFLOW COMPLETION RULES:

When responding to this test execution plan request, you MUST follow these rules EXACTLY:

  1. YOUR ONLY DELIVERABLE: Post an inline review comment containing the test execution plan on the first changed line
  2. THEN STOP IMMEDIATELY - Do NOT generate any additional response
  3. FALLBACK ONLY: If submitting the review fails after retrying, post as a regular PR comment
  4. SILENCE = SUCCESS: After successfully submitting the review, your task is complete. No confirmation needed.

ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):

  • ❌ Do NOT post acknowledgment messages like "Test execution plan posted", "Review posted successfully", "I've successfully posted"
  • ❌ Do NOT mention review IDs, URLs, or confirmation of posting in the PR thread
  • ❌ Do NOT add any follow-up comments after submitting the review
  • ❌ Do NOT reply to confirm task completion
  • ❌ Do NOT explain what you did - just do it and stop

Remember: The pull request review is visible to users. Additional comments are redundant noise.

CRITICAL: Before setting "Run smoke tests: True", you MUST use shell scripts to:
- Trace the actual fixture dependency chain from smoke tests to changed fixtures
- Verify that smoke tests actually import/use changed utilities or functions
- Confirm the dependency path exists; do NOT assume based on scope or semantics
- Be conservative: Session-scoped fixtures or infrastructure-sounding names do NOT
automatically mean smoke test impact. Only flag smoke test impact when you can
demonstrate a concrete dependency path.
WRONG: "This session-scoped storage fixture might affect smoke tests"
RIGHT: "Smoke test X uses fixture Y, which depends on the changed fixture Z"
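The "concrete dependency path" rule can be pictured as a reachability search over a usage graph. The sketch below is illustrative only: `find_dependency_path` is a hypothetical helper, and the graph is a hand-written example mirroring the X/Y/Z wording above rather than data extracted from the repository.

```python
from collections import deque


def find_dependency_path(call_graph, start, target):
    """Breadth-first search; return the shortest chain from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dependency in call_graph.get(path[-1], ()):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(path + [dependency])
    return None


# Hypothetical usage graph: each key uses the names in its list.
call_graph = {
    "test_smoke_x": ["fixture_y"],
    "fixture_y": ["fixture_z"],
    "test_other": ["helper"],
}

path = find_dependency_path(call_graph, "test_smoke_x", "fixture_z")
print(" -> ".join(path) if path else "no dependency path")
```

A smoke-test impact claim would be justified only when such a path exists; for `test_other` in this example the search returns `None`.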

@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== Get current PR file changes ==="
gh pr view 3953 --json files,headRefOid --jq '{commit: .headRefOid, files: .files[] | {path: .path, additions: .additions, deletions: .deletions}}'

echo -e "\n=== Get current diff for utilities/virt.py ==="
gh pr diff 3953 -- utilities/virt.py | head -50

echo -e "\n=== Find smoke tests that use migration ==="
rg "@pytest\.mark\.smoke" --type=py -B2 -A5 | grep -E "(def test_.*migrat|@pytest\.mark\.smoke)" | head -30

echo -e "\n=== Verify migrate_vm_and_verify dependency ==="
rg "def migrate_vm_and_verify" utilities/virt.py -A20 | grep "verify_vm_migrated"

echo -e "\n=== Check RHEL smoke test for migration usage ==="
rg "migrate_vm_and_verify|verify_vm_migrated" tests/virt/cluster/common_templates/rhel/test_rhel_os_support.py -C2

Length of output: 2803


🏁 Script executed:

#!/bin/bash

echo "=== Get raw file content to see changes ==="
gh api /repos/RedHatQE/openshift-virtualization-tests/pulls/3953/files | jq -r '.[] | select(.filename == "utilities/virt.py") | .patch' | head -40

echo -e "\n=== Check if migrate_vm_and_verify calls verify_vm_migrated ==="
rg "def migrate_vm_and_verify" utilities/virt.py -A25

echo -e "\n=== Find all migration test files ==="
fd "migration" tests/ --type=f -e py | grep test_ | head -15

echo -e "\n=== Check if scale tests use verify_vm_migrated ==="
rg "verify_vm_migrated" tests/scale/ -C2

Length of output: 3829


🏁 Script executed:

#!/bin/bash

echo "=== Post inline review comment with test execution plan ==="
gh api \
  --method POST \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/RedHatQE/openshift-virtualization-tests/pulls/3953/reviews \
  -f event='COMMENT' \
  -f commit_id='7ed94479ceb2caa32cac0b7fe7a36515b44e158f' \
  -f body='' \
  -f 'comments[][path]=utilities/virt.py' \
  -F 'comments[][position]=4' \
  -f 'comments[][body]=**Test Execution Plan**

**Run smoke tests: True**

Smoke test impact confirmed through dependency chain:
- `tests/virt/cluster/common_templates/rhel/test_rhel_os_support.py::TestRhelOsSupport::test_migrate_vm` (smoke test) → `migrate_vm_and_verify()` → `verify_vm_migrated()` (modified function)

**Migration-related tests:**
- `tests/virt/node/migration_and_maintenance/test_vm_disk_load_with_migration.py`
- `tests/virt/node/migration_and_maintenance/test_vm_memory_load_with_migration.py`
- `tests/virt/node/migration_and_maintenance/test_post_copy_migration.py`
- `tests/network/migration/test_migration.py`
- `tests/network/migration/test_masquerade_connectivity_after_migration.py`
- `tests/virt/cluster/migration_and_maintenance/test_migration_policy.py`

**Scale tests:**
- `tests/scale/test_scale_benchmark.py` - Uses `verify_vm_migrated` directly

**Rationale:**
The change adds VNC screenshot collection when VM verification times out after migration (wraps `wait_for_vm_interfaces` and `wait_for_ssh_connectivity` in try-except for `TimeoutExpiredError`). This affects all migration workflows that call `verify_vm_migrated()` or `migrate_vm_and_verify()`, including smoke tests that verify SSH connectivity after migration.'

Length of output: 1795



🧠 Learnings used
Learnt from: EdDev
Repo: RedHatQE/openshift-virtualization-tests PR: 3612
File: tests/network/user_defined_network/ip_specification/test_ip_specification.py:10-10
Timestamp: 2026-01-26T20:29:54.623Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, when introducing STD (Standard Test Definition) PRs that define test case structure without implementation, it's acceptable and intentional to use `__test__ = False` to block pytest collection until the tests are fully implemented in a follow-up PR. The STD workflow involves: (1) defining test structure, markers, and documentation first, (2) implementing the actual test logic and removing the collection blocker in a subsequent PR.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 3358
File: tests/network/sriov/test_sriov.py:21-21
Timestamp: 2026-01-07T09:52:22.771Z
Learning: For PRs that only remove or modify pytest markers (like removing pytest.mark.post_upgrade) without changing test logic in openshift-virtualization-tests, collection verification using `pytest --collect-only` is sufficient. Full test execution is not required to verify marker-only changes.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T10:28:54.212Z
Learning: In the openshift-virtualization-tests repository, when determining smoke test impact for test execution plans, only set "Run smoke tests: True" if there is a verified, traceable dependency path from smoke tests to the changed code. Session-scoped fixtures or infrastructure-sounding changes do NOT automatically imply smoke test impact - the dependency chain must be explicitly verified using shell scripts before recommending smoke test execution.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-22T15:56:00.157Z
Learning: In the openshift-virtualization-tests repository, when responding to test execution plan requests from openshift-virtualization-qe-bot-3, do NOT use "REQUEST_CHANGES" review type if the PR author has already marked the PR as verified (e.g., with `/verified` command). Test execution plans are informational guides, not blocking requirements. Use COMMENT event for informational test plans, or only REQUEST_CHANGES if there are actual code issues that need to be addressed before merging.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 3196
File: tests/network/upgrade/test_upgrade_network.py:4-4
Timestamp: 2025-12-22T16:27:44.327Z
Learning: For PRs that remove test cases (especially redundant test cleanup PRs in openshift-virtualization-tests), test collection verification (pytest --collect-only showing selected/deselected counts) is sufficient to confirm the removal was clean and the test module remains functional. Full test execution is not required for test deletion PRs.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T14:00:59.076Z
Learning: In the openshift-virtualization-tests repository, when responding to test execution plan requests from openshift-virtualization-qe-bot-3, CodeRabbit must post ONLY an inline review comment on the Files Changed tab and then stop immediately without generating any follow-up comments in the PR discussion thread. No acknowledgment messages, no confirmation of posting, no explanation - silence after posting the inline review equals success. Additional comments create empty/meaningless reviews that clutter the PR.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3062
File: conftest.py:333-333
Timestamp: 2025-12-16T20:11:03.645Z
Learning: In the openshift-virtualization-tests repository, when determining smoke test impact for changes affecting py_config["os_login_param"], follow this verification methodology: (1) Find all smoke tests: `rg "pytest.mark.smoke" --type=py -B2 | grep "def test_"`, (2) For each smoke test file, search for VM creation patterns: `rg "VirtualMachineForTests|running_vm|VirtualMachineForTestsFromTemplate|wait_for_ssh|check_ssh_connectivity"`, (3) Trace the dependency chain: smoke test → VirtualMachineForTests/running_vm() → wait_for_ssh_connectivity() (default enabled) → vm.login_params property → py_config["os_login_param"][vm.os_flavor], (4) Check utilities/virt.py for login_params usage: `rg "os_login_param|login_params" utilities/virt.py -C3`. Any smoke test creating VMs with default SSH connectivity checks (running_vm with check_ssh_connectivity=True) depends on os_login_param, even if the test doesn't directly reference it.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 2469
File: utilities/sanity.py:139-142
Timestamp: 2025-11-08T07:36:57.616Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep refactoring PRs (like PR `#2469`) strictly focused on moving/organizing code into more granular modules without adding new functionality, error handling, or behavioral changes. Such improvements should be handled in separate PRs.

Learnt from: qwang1
Repo: RedHatQE/openshift-virtualization-tests PR: 3301
File: utilities/unittests/test_oadp.py:48-57
Timestamp: 2026-01-19T07:54:03.630Z
Learning: The pytest marker requirement ("all new tests MUST have markers") applies only to integration/functional tests in the `tests/` directory (e.g., `tests/virt/`, `tests/storage/`, `tests/network/`), not to unit tests in `utilities/unittests/`. Unit test files in `utilities/unittests/` (such as test_oadp.py, test_sanity.py, test_operator.py, etc.) do not use pytest markers and this is the established repository convention.
</learning>

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: The test execution plan for PR `#1904` focuses on cluster-type conditional logic where nmstate functionality is bypassed on cloud clusters (Azure/AWS) but fully functional on bare-metal/PSI clusters, requiring different test strategies for each environment type.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3062
File: conftest.py:333-333
Timestamp: 2025-12-16T15:09:49.597Z
Learning: In the openshift-virtualization-tests repository, when conftest.py or utilities/bitwarden.py changes affect py_config["os_login_param"], smoke test impact must be determined by: (1) finding all smoke tests using `rg "pytest.mark.smoke"`, (2) checking each for VM creation patterns (VirtualMachineForTests, running_vm, VirtualMachineForTestsFromTemplate), (3) tracing whether running_vm is called with default check_ssh_connectivity=True, which accesses vm.login_params property that reads py_config["os_login_param"][vm.os_flavor]. The dependency chain is: smoke test → VM creation → running_vm → wait_for_ssh_connectivity → vm.login_params → os_login_param. Any smoke test creating VMs with SSH connectivity (the default) depends on os_login_param.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-09-29T19:05:24.987Z
Learning: For PR `#1904` test execution, the critical validation point is test_connectivity_over_migration_between_localnet_vms which should fail gracefully on cloud clusters but pass on bare-metal/PSI clusters, representing the core nmstate conditional logic functionality.

Learnt from: dshchedr
Repo: RedHatQE/openshift-virtualization-tests PR: 1716
File: tests/virt/conftest.py:289-297
Timestamp: 2025-08-09T01:52:26.683Z
Learning: When user dshchedr moves working code from one location to another in the openshift-virtualization-tests repository, they prefer not to modify it unless there's a real issue, maintaining the original implementation to avoid introducing unnecessary changes.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-16T14:06:22.391Z
Learning: In the openshift-virtualization-tests repository, when posting test execution plan inline review comments using GitHub API, the full test execution plan content must go in the `comments[].body` field (which appears on Files Changed tab), NOT in the top-level `body` field (which appears in PR discussion thread). The top-level `body` field should be omitted or left empty to avoid posting redundant comments in the PR discussion thread.
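
The field placement described in this learning can be sketched as a request body. This is a minimal illustration, not the bot's actual code: `build_inline_review_payload` is a hypothetical helper, and the `event`, `path`, `line`, and `side` field names are assumed from GitHub's REST endpoint for creating a pull request review, so check them against the API reference before relying on them.

```python
import json


def build_inline_review_payload(path, line, plan_text):
    """Build a review request body that carries the plan only as an inline comment.

    Hypothetical helper: field names other than `body` and `comments` are
    assumptions about the GitHub review API, not taken from this thread.
    """
    return {
        "event": "COMMENT",
        # The top-level "body" key is deliberately omitted so nothing is
        # posted to the PR discussion thread.
        "comments": [
            {
                "path": path,
                "line": line,
                "side": "RIGHT",
                # The full test execution plan goes here (Files Changed tab).
                "body": plan_text,
            }
        ],
    }


payload = build_inline_review_payload(
    path="utilities/virt.py",
    line=1,
    plan_text="Test execution plan: ...",
)
print(json.dumps(payload, indent=2))
```

Keeping the payload construction in one small function makes it easy to assert that no top-level `body` is ever sent.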

Learnt from: servolkov
Repo: RedHatQE/openshift-virtualization-tests PR: 1776
File: libs/net/node_network.py:25-31
Timestamp: 2025-08-20T23:43:28.117Z
Learning: In the RedHatQE/openshift-virtualization-tests project, servolkov's team always uses bare metal (BM) clusters with IPv4 setup in their testing environment, making defensive checks for IPv4 data presence potentially redundant in their networking code.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3371
File: scripts/tests_analyzer/compare_coderabbit_decisions.py:199-289
Timestamp: 2026-01-13T10:06:14.822Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep pagination loops inline rather than extracting them into generic helper functions when the loops have different URL patterns and unique post-processing logic, as the inline approach improves readability and makes each endpoint's behavior more explicit.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3228
File: .coderabbit.yaml:30-41
Timestamp: 2026-01-05T10:33:55.037Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers minimal pre-merge checks in CodeRabbit configuration: only docstrings enforcement (80% threshold) is needed, not title or description checks.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 1160
File: tests/storage/storage_migration/test_mtc_storage_class_migration.py:165-176
Timestamp: 2025-06-17T07:45:37.776Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers explicit fixture parameters over composite fixtures in test methods, even when there are many parameters, as they find this approach more readable and maintainable for understanding test dependencies.

Learnt from: jpeimer
Repo: RedHatQE/openshift-virtualization-tests PR: 3571
File: tests/storage/storage_migration/utils.py:158-167
Timestamp: 2026-01-25T13:18:26.819Z
Learning: In the openshift-virtualization-tests repository, user jpeimer prefers to avoid nitpicky style changes (e.g., removing `.keys()` from dict membership checks) because verifying every change is expensive. Be cautious about suggesting low-impact stylistic improvements that require verification overhead.

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:24-64
Timestamp: 2025-06-23T19:28:20.281Z
Learning: In OpenShift Virtualization mass machine type transition tests, the machine type glob pattern "pc-q35-rhel8.*.*" is intentionally hard-coded in the kubevirt_api_lifecycle_automation_job as it's used only once for this specific test case, with plans to update it in the future if the job needs to support other machine types.

Learnt from: RoniKishner
Repo: RedHatQE/openshift-virtualization-tests PR: 1411
File: utilities/os_utils.py:246-279
Timestamp: 2025-07-22T17:13:59.166Z
Learning: In the RedHatQE/openshift-virtualization-tests repository, CentOS preferences follow the format "centos-stream<version>" (e.g., "centos-stream9", "centos-stream10"). The generate_instance_type_centos_os_matrix function correctly uses regex to extract numeric versions and constructs the latest version string in the same format as the input preferences for proper comparison.
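
The version handling described in this learning can be sketched as follows. This is an illustration under assumptions, not the repository's `generate_instance_type_centos_os_matrix`: `latest_centos_preference` is a hypothetical name, and the only behavior taken from the learning is the "centos-stream<version>" format and the use of a regex to extract the numeric version.

```python
import re

# Matches preferences such as "centos-stream9" and captures the numeric version.
_CENTOS_PREFERENCE = re.compile(r"^centos-stream(\d+)$")


def latest_centos_preference(preferences):
    """Return the preference string with the highest numeric version, or None.

    Hypothetical helper written for illustration only.
    """
    versions = []
    for preference in preferences:
        match = _CENTOS_PREFERENCE.match(preference)
        if match:
            versions.append(int(match.group(1)))
    if not versions:
        return None
    # Rebuild the string in the input format so it compares equal to a preference.
    return f"centos-stream{max(versions)}"


print(latest_centos_preference(["centos-stream9", "centos-stream10"]))
```

Comparing the extracted versions as integers matters here: a plain string comparison would rank "centos-stream9" above "centos-stream10".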

Learnt from: akri3i
Repo: RedHatQE/openshift-virtualization-tests PR: 1210
File: tests/virt/cluster/general/mass_machine_type_transition_tests/conftest.py:83-97
Timestamp: 2025-06-23T19:19:31.961Z
Learning: In OpenShift Virtualization mass machine type transition tests, the kubevirt_api_lifecycle_automation_job requires cluster-admin privileges to function properly, as confirmed by the test maintainer akri3i.

Learnt from: yossisegev
Repo: RedHatQE/openshift-virtualization-tests PR: 0
File: :0-0
Timestamp: 2025-12-07T14:51:53.484Z
Learning: In the openshift-virtualization-tests repository, the team has decided to avoid using predefined time constants (like TIMEOUT_2MIN, TIMEOUT_5SEC) and prefers using explicit numeric values for timeout parameters.

Learnt from: vsibirsk
Repo: RedHatQE/openshift-virtualization-tests PR: 2045
File: tests/virt/cluster/vm_lifecycle/conftest.py:46-47
Timestamp: 2025-09-15T06:49:53.478Z
Learning: In the openshift-virtualization-tests repo, large fixture refactoring efforts like the golden image data source migration are handled incrementally by directory/team ownership. The virt/cluster directory is handled separately from virt/node, tests/infra, tests/storage, etc., with each area managed by relevant teams in follow-up PRs.

Learnt from: vamsikrishna-siddu
Repo: RedHatQE/openshift-virtualization-tests PR: 2199
File: tests/storage/test_online_resize.py:108-113
Timestamp: 2025-09-28T14:43:07.181Z
Learning: In the openshift-virtualization-tests repo, PR `#2199` depends on PR `#2139` which adds architecture-specific OS_FLAVOR attributes to the Images.Cirros class (OS_FLAVOR_CIRROS for x86_64/ARM64, OS_FLAVOR_FEDORA for s390x), enabling conditional logic based on the underlying OS flavor in tests.

Learnt from: vamsikrishna-siddu
Repo: RedHatQE/openshift-virtualization-tests PR: 2199
File: tests/storage/test_online_resize.py:108-113
Timestamp: 2025-09-28T14:43:07.181Z
Learning: In the openshift-virtualization-tests repo, PR `#2199` depends on PR `#2139` which adds the OS_FLAVOR attribute to the Images.Cirros class, making Images.Cirros.OS_FLAVOR available for conditional logic in tests.

Learnt from: rnetser
Repo: RedHatQE/openshift-virtualization-tests PR: 3371
File: scripts/tests_analyzer/compare_coderabbit_decisions.py:1-3
Timestamp: 2026-01-12T18:39:22.849Z
Learning: In the openshift-virtualization-tests repository, user rnetser prefers to keep tool attribution comments (like "Generated using Claude cli") in production code files. These references should not be flagged for removal.

@SiboWang1997
Contributor

/lgtm

@vsibirsk
Collaborator

/approve
