From 6f1d8dcebf5192576bb7490f9a2034b30c40d5ff Mon Sep 17 00:00:00 2001 From: r2dedios Date: Thu, 19 Feb 2026 10:01:49 +0100 Subject: [PATCH 1/2] chore(rh-virt): Added Snapshot Skills based on common CRUD toolset; Refactored rh-virt skills based on repo's design principles, and Claude's skill guidelines Signed-off-by: r2dedios --- CLAUDE.md | 17 + rh-virt/README.md | 14 +- rh-virt/SKILL_TEMPLATE.md | 887 ++++++++++++++++++ rh-virt/docs/troubleshooting/INDEX.md | 12 +- .../docs/troubleshooting/lifecycle-errors.md | 2 +- .../docs/troubleshooting/network-errors.md | 4 +- .../docs/troubleshooting/runtime-errors.md | 8 +- .../docs/troubleshooting/scheduling-errors.md | 4 +- .../docs/troubleshooting/storage-errors.md | 4 +- rh-virt/skills/vm-clone/SKILL.md | 64 +- .../skills/{vm-creator => vm-create}/SKILL.md | 107 +-- rh-virt/skills/vm-delete/SKILL.md | 94 +- rh-virt/skills/vm-inventory/SKILL.md | 52 +- rh-virt/skills/vm-lifecycle-manager/SKILL.md | 48 +- rh-virt/skills/vm-snapshot-create/SKILL.md | 423 +++++++++ rh-virt/skills/vm-snapshot-delete/SKILL.md | 447 +++++++++ rh-virt/skills/vm-snapshot-list/SKILL.md | 402 ++++++++ rh-virt/skills/vm-snapshot-restore/SKILL.md | 580 ++++++++++++ 18 files changed, 2945 insertions(+), 224 deletions(-) create mode 100644 rh-virt/SKILL_TEMPLATE.md rename rh-virt/skills/{vm-creator => vm-create}/SKILL.md (94%) create mode 100644 rh-virt/skills/vm-snapshot-create/SKILL.md create mode 100644 rh-virt/skills/vm-snapshot-delete/SKILL.md create mode 100644 rh-virt/skills/vm-snapshot-list/SKILL.md create mode 100644 rh-virt/skills/vm-snapshot-restore/SKILL.md diff --git a/CLAUDE.md b/CLAUDE.md index 518b3b4..05d93b2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -644,6 +644,9 @@ last_updated: YYYY-MM-DD 4. Include concrete examples 5. 
Test with `Skill` tool invocation +**Collection-Specific Standards:** +- **rh-virt**: Follow `rh-virt/SKILL_TEMPLATE.md` and validate with `rh-virt/SKILLS_CHECKLIST.md` for enhanced quality standards including mandatory Common Issues and Example Usage sections + ### Adding an Agent 1. Create `agents/.md` @@ -686,6 +689,20 @@ The `rh-sre` pack is the most complete implementation, demonstrating: When creating new collection, use `rh-sre` as the architectural reference. +### rh-virt (Quality-Controlled Pattern) + +The `rh-virt` pack demonstrates skill quality standardization: +- Comprehensive skill templates (`SKILL_TEMPLATE.md`) +- Validation checklist (`SKILLS_CHECKLIST.md`) +- Risk-based color coding (cyan/green/blue/yellow/red) +- Mandatory Common Issues and Example Usage sections +- Consistent section ordering and formatting + +Use `rh-virt` as reference for collections requiring high consistency and maintainability. + +When creating new collections, follow the pattern that best matches your needs: +- **Tool-focused domains** (infrastructure, operations) → Follow rh-virt pattern + ## Key Principles ### Core Architecture diff --git a/rh-virt/README.md b/rh-virt/README.md index 5b37776..c886969 100644 --- a/rh-virt/README.md +++ b/rh-virt/README.md @@ -98,7 +98,7 @@ claude plugin install openshift-virtualization The pack provides 5 specialized skills for complete VM lifecycle management: -### 1. **vm-creator** - Virtual Machine Provisioning +### 1. **vm-create** - Virtual Machine Provisioning Create new virtual machines in OpenShift Virtualization with automatic error diagnosis and workarounds. @@ -291,7 +291,7 @@ The server provides two toolsets enabled via `--toolsets core,kubevirt`: ``` User: "Create a VM called web-server in namespace production" -→ vm-creator skill creates the VM +→ vm-create skill creates the VM User: "Start the web-server VM" → vm-lifecycle-manager skill starts the VM @@ -383,7 +383,7 @@ Agent: "⏳ Storage cloning in progress... 
(45%) ``` User: "Create a Fedora VM called test-vm in namespace demo" -→ vm-creator skill creates the VM +→ vm-create skill creates the VM → Detects ErrorUnschedulable status → Consults docs/troubleshooting/scheduling-errors.md for domain knowledge → Diagnoses: Node taints prevent scheduling @@ -396,7 +396,7 @@ Agent: "⚠️ VM Scheduling Issue Detected How would you like to proceed?" User: "apply workaround" -→ vm-creator patches VM with tolerations +→ vm-create patches VM with tolerations → Verifies VM can now be scheduled → Reports success @@ -460,7 +460,7 @@ MCP server is configured in `.mcp.json`: ### Automatic Diagnosis (Recommended) -The **vm-creator** skill includes automatic error diagnosis and workaround proposals. When VMs encounter scheduling issues: +The **vm-create** skill includes automatic error diagnosis and workaround proposals. When VMs encounter scheduling issues: 1. **Detection**: Skill automatically detects ErrorUnschedulable and other error states 2. **Diagnosis**: Consults `docs/troubleshooting/INDEX.md` and category files to understand root cause @@ -492,7 +492,7 @@ The **vm-creator** skill includes automatic error diagnosis and workaround propo 2. Check namespace exists and ServiceAccount has access 3. Verify RBAC permissions for VirtualMachine resources 4. Check cluster resource availability (CPU, memory, storage) -5. Let vm-creator skill run automatic diagnosis (see Workflow 4 above) +5. 
Let vm-create skill run automatic diagnosis (see Workflow 4 above) ### Skills Not Triggering @@ -525,7 +525,7 @@ rh-virt/ │ └── .ai-index/ # Semantic indexing for AI discovery │ └── semantic-index.json └── skills/ - ├── vm-creator/SKILL.md # VM provisioning with auto-diagnosis + ├── vm-create/SKILL.md # VM provisioning with auto-diagnosis ├── vm-lifecycle-manager/SKILL.md # VM power management ├── vm-inventory/SKILL.md # VM discovery and status ├── vm-delete/SKILL.md # VM destruction and cleanup diff --git a/rh-virt/SKILL_TEMPLATE.md b/rh-virt/SKILL_TEMPLATE.md new file mode 100644 index 0000000..c8ecee5 --- /dev/null +++ b/rh-virt/SKILL_TEMPLATE.md @@ -0,0 +1,887 @@ +# Skill Template for rh-virt Collection + +This template provides the standardized structure for all skills in the `rh-virt` agentic collection. Use this when creating new skills to ensure consistency, maintainability, and compliance with Claude Code requirements. + +## Overview + +This template implements: +- **Repository Standards**: rh-virt collection-specific patterns and conventions +- **Claude Guidelines**: Official skill structure from `/CLAUDE.md` (Design Principles #1-7) +- **MCP Integration**: OpenShift Virtualization MCP server tool usage patterns +- **Human-in-the-Loop**: Safety confirmations for critical operations + +**Reference**: See `/CLAUDE.md` "Design Principles for Skills and Agents" for complete rationale. + +## Quick Start + +1. Copy this template to `skills//SKILL.md` +2. Replace all `` with actual content +3. Follow the validation checklist at the end of this template +4. Verify compliance with `SKILLS_CHECKLIST.md` +5. Test the skill before committing + +## Claude Code Requirements Summary + +Before using this template, understand these mandatory requirements: + +### 1. 
Document Consultation Transparency (Design Principle #1) +- Skills MUST actually read documentation using Read tool before invoking MCP tools +- Skills MUST declare consultation to user: "I consulted [file] to understand [topic]" +- **REQUIRED** for rh-virt when relevant troubleshooting docs exist (docs/troubleshooting/) + +### 2. Precise Parameter Specification (Design Principle #2) +- Provide exact parameter names and formats with examples +- Ensures first-attempt success when invoking MCP tools + +### 3. Skill Precedence and Conciseness (Design Principle #3) +- Description field in YAML frontmatter MUST be under 500 tokens +- Focus on "when to use" with 3-5 concrete examples + +### 4. Dependencies Declaration (Design Principle #4) +- List all MCP servers, tools, related skills, and documentation +- Follows specific format (see Dependencies section below) + +### 5. Human-in-the-Loop Requirements (Design Principle #5) +- Skills performing critical operations MUST require explicit user confirmation +- Positioned AFTER Dependencies section (not before) +- Read-only skills use "Not Applicable" pattern + +### 6. Mandatory Skill Sections (Design Principle #6) +- All sections must appear in correct order (see Template Structure below) + +### 7. MCP Server Availability Verification (Design Principle #7) +- Verify MCP server configuration before executing +- NEVER expose credential values in output + +--- + +## Template Structure + +### YAML Frontmatter + +**Purpose**: Loaded at agent initialization to help Claude decide which skill to invoke. 
+ +**Requirements** (from CLAUDE.md Design Principle #3): +- Description field MUST be under 500 tokens total +- Focus on "when to use" with 3-5 concrete user phrases +- Include clear anti-patterns with alternatives +- Keep summary line under 100 characters for readability + +```yaml +--- +name: # REQUIRED: Lowercase with dashes, matches directory name +description: | # REQUIRED: Must be under 500 tokens total + # IMPORTANT: Keep entire description field under 500 tokens + # This includes all lines: summary, use cases, and anti-patterns + # Claude loads this at agent initialization - conciseness matters! + + (max 100 characters) + + Use when: + - "" # Actual phrases users would say + - "" # Not generic descriptions + - "" # At least 3 examples required + + (1 sentence) + + NOT for (use instead). + +model: inherit # REQUIRED: Always "inherit" unless special case + # Only use "sonnet" or "haiku" if skill needs specific model +color: # REQUIRED: red|yellow|blue|green|cyan - see Color Guide below + # Indicates operation risk level for user safety +--- + +# / Skill + + + +**Implementation Note** (OPTIONAL): + +## Prerequisites + +**Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) + +**Required MCP Tools**: +- `tool_name` (from openshift-virtualization) - Brief description of what it does +- `another_tool` (from openshift-virtualization) - Brief description + +**Required Environment Variables**: +- `KUBECONFIG` - Path to Kubernetes configuration file with cluster access +- `OTHER_VAR` (if applicable) - Description + +**Required Cluster Setup**: +- OpenShift cluster (>= 4.19) +- OpenShift Virtualization operator installed +- ServiceAccount with RBAC permissions to +- + +### Prerequisite Verification + +**Before executing, verify MCP server availability:** + +1. 
**Check MCP Server Configuration** + - Verify `openshift-virtualization` exists in `.mcp.json` + - If missing → Report to user with setup instructions + +2. **Check Environment Variables** + - Verify `KUBECONFIG` is set (check presence only, never expose value) + - If missing → Report to user + +3. **Check ** (optional verification) + - Verify + - If missing → Report to user + +**Human Notification Protocol:** + +When prerequisites fail: + +``` +❌ Cannot execute : MCP server 'openshift-virtualization' is not available + +📋 Setup Instructions: +1. Add openshift-virtualization to .mcp.json +2. Set KUBECONFIG environment variable: export KUBECONFIG="/path/to/kubeconfig" +3. Restart Claude Code to reload MCP servers + +🔗 Documentation: https://github.com/openshift/openshift-mcp-server + +❓ How would you like to proceed? +Options: +- "setup" - Help configure the MCP server now +- "skip" - Skip this skill +- "abort" - Stop workflow + +Please respond with your choice. +``` + +⚠️ **SECURITY**: Never display actual KUBECONFIG path or credential values in output. + +## When to Use This Skill + +**Trigger this skill when:** +- User +- User +- User +- User explicitly invokes `/` command + +**User phrases that trigger this skill:** +- "" +- "" +- "" +- `/` (explicit command) + +**Do NOT use this skill when:** +- User wants to → Use `` skill instead +- User wants to → Use `` skill instead +- User wants to + +## Workflow + +**Purpose**: Step-by-step instructions for executing this skill. + +**Requirements** (from CLAUDE.md Design Principles #1-2): +- Document Consultation BEFORE tool invocation (REQUIRED when relevant troubleshooting docs exist) +- Precise parameter specifications with examples +- Clear error handling for each step + +### Step 1: + + + +**CRITICAL (Include if relevant troubleshooting documentation exists)**: Document consultation MUST happen BEFORE tool invocation. + +**Document Consultation** (REQUIRED when relevant troubleshooting docs exist): +1. 
**Action**: Read [doc.md](../../docs/troubleshooting/doc.md) using the Read tool to understand [specific topic] +2. **Output to user**: "I consulted [doc.md](../../docs/troubleshooting/doc.md) to understand [specific topic]." + +**When to consult troubleshooting docs**: +- VM creation/lifecycle skills → Read scheduling-errors.md, storage-errors.md, network-errors.md +- VM snapshot skills → Read storage-errors.md +- Any skill encountering errors → Read relevant troubleshooting doc before reporting to user + +**Available troubleshooting documentation**: +- docs/troubleshooting/INDEX.md - Master index of all troubleshooting docs +- docs/troubleshooting/scheduling-errors.md - VM scheduling failures (ErrorUnschedulable, taints, resources) +- docs/troubleshooting/storage-errors.md - Storage and PVC issues +- docs/troubleshooting/network-errors.md - Network attachment failures (Multus, NAD) +- docs/troubleshooting/lifecycle-errors.md - VM lifecycle state errors +- docs/troubleshooting/runtime-errors.md - Runtime and crash issues + +See CLAUDE.md Design Principle #1 for complete rationale. + +**MCP Tool**: `tool_name` or `category__tool_name` (from openshift-virtualization) + +**Parameters** (Design Principle #2 - Precise Parameter Specification): + +Provide exact parameter names, types, formats, and examples to ensure first-attempt success. 
+ +```json +{ + "param1": "", // REQUIRED: Description of param1 + // Example: "database-01" (exact format shown) + "param2": "", // OPTIONAL: Description of param2 + // Example: "production" (namespace name) + "param3": true // REQUIRED: Description of what this boolean controls + // Example: true (enables feature X) +} +``` + +**Alternative Format** (bullet list - recommended by CLAUDE.md): +- `param1`: [exact specification] - REQUIRED + - Example: `"database-01"` (VM name to operate on) +- `param2`: [exact specification] - OPTIONAL + - Example: `"production"` (namespace where VM exists) +- `param3`: [exact specification] - REQUIRED + - Example: `true` (whether to enable feature X) + +**Example tool invocation:** +```json +tool_name({ + "param1": "database-01", + "param2": "production", + "param3": true +}) +``` + +**Expected Output**: + +**Error Handling**: +- If +- If +- If + +**Extract/Store Information:** +- `field1` - +- `field2` - + +### Step 2: + + + +### Step N: Report Results + +**On success:** + +```markdown +## ✓ Successful + +****: `` (namespace: ``) + +###
+- **Detail 1**: +- **Detail 2**: +- **Detail 3**: + +### +- ✓ +- ✓ + +### Next Steps + +**To :** +"" + +**To :** +"" +``` + +**On failure:** + +```markdown +## ❌ Failed + +**Error**: + +****: `` (namespace: ``) + +**Common Causes:** +- **** - +- **** - +- **** - + +**Troubleshooting Steps:** + +1. **:** + + +2. **:** + + +3. **:** + + +Would you like help troubleshooting this error? +``` + +## Common Issues + +### Issue 1: + +**Error**: "" + +**Cause**: + +**Solution:** +1. +2. +3. + +**Related**: See [.md](../../docs/troubleshooting/.md) for more details + +### Issue 2: + +**Error**: "" + +**Cause**: + +**Solution:** +1. +2. + +### Issue 3: + +**Error**: "" + +**Cause**: + +**Solution:** +1. +2. + +**Minimum**: Include at least 3 common issues. Add more based on actual user pain points. + +## Dependencies + +**Purpose**: Declare all external dependencies for debugging and prerequisite verification. + +**Requirements** (from CLAUDE.md Design Principle #4): +- List MCP servers, tools, related skills, and documentation +- Makes dependencies explicit for troubleshooting +- Enables proper error handling when dependencies missing + +### Required MCP Servers +- `openshift-virtualization` - OpenShift MCP server with kubevirt toolset + - Source: https://github.com/openshift/openshift-mcp-server + +### Required MCP Tools + +List each tool with its purpose, parameters, and source. + +- `tool_name` (from openshift-virtualization) - Brief description of what it does + - **Used for**: + - **Parameters**: + - **Source**: https://github.com/openshift/openshift-mcp-server + +- `another_tool` (from openshift-virtualization) - Brief description + - **Used for**: + - **Parameters**: + - **Source**: https://github.com/openshift/openshift-mcp-server + +### Related Skills + +List skills that complement or replace this skill. 
+ +- `skill-name-1` - When to use it instead of this skill (alternative) +- `skill-name-2` - Complementary skill (use together) +- `skill-name-3` - Follow-up skill (use after this one completes) + +### Reference Documentation + +**Internal Troubleshooting Documentation**: +- [INDEX.md](../../docs/troubleshooting/INDEX.md) - Master troubleshooting index +- [scheduling-errors.md](../../docs/troubleshooting/scheduling-errors.md) - VM scheduling failures +- [storage-errors.md](../../docs/troubleshooting/storage-errors.md) - Storage and PVC issues +- [network-errors.md](../../docs/troubleshooting/network-errors.md) - Network attachment failures +- [lifecycle-errors.md](../../docs/troubleshooting/lifecycle-errors.md) - VM lifecycle errors +- [runtime-errors.md](../../docs/troubleshooting/runtime-errors.md) - Runtime and crash issues + +**Official Red Hat Documentation**: +- [Topic - OpenShift ](https://docs.redhat.com/en/documentation/openshift_container_platform//html-single/virtualization/index#section) - Main documentation +- [Blog Post Title](https://www.redhat.com/en/blog/post-slug) - Additional context + +**Upstream Documentation**: +- [KubeVirt Topic](https://kubevirt.io/user-guide/...) - Upstream project docs +- [Kubernetes Topic](https://kubernetes.io/docs/...) - K8s specification + +**IMPORTANT**: Always use the latest stable OpenShift version available for documentation links (check https://docs.redhat.com/ for current version). + +## Critical: Human-in-the-Loop Requirements + +**Purpose**: Define when and how to request user confirmation for critical operations. 
+ +**Requirements** (from CLAUDE.md Design Principle #5): +- Skills performing critical operations MUST require explicit user confirmation +- This section appears AFTER Dependencies section (Design Principle #6) +- Read-only skills use "Not Applicable" pattern (see vm-inventory for example) + +**When to Include**: +- Playbook execution (ansible-mcp-server) +- System modifications (package updates, config changes) +- Multi-system operations (batch remediation) +- Data deletion or irreversible actions +- Resource creation that consumes cluster capacity + +**When to Omit**: +- Read-only operations (list/view operations) - Use "Not Applicable" pattern instead + +--- + +**IMPORTANT:** This skill . You MUST: + +1. **** (e.g., "Before Creating Resources") + - + - + - + - Ask: "Should I proceed with [specific action]?" + - Wait for explicit user confirmation + +2. **** (e.g., "Before Destructive Actions") + - Display preview of changes + - + - Ask: "Review the changes above. Should I execute this?" + - Wait for explicit "yes" or "proceed" + +3. **Never Auto-Execute** + - **NEVER without ** + - **NEVER when ** + - **NEVER skip ** + - **NEVER assume approval** - always wait for explicit user confirmation + +**Why This Matters:** +- ****: +- ****: +- ****: + +**Rationale**: Prevents unintended automation; maintains user control over critical operations. + +--- + +**For Read-Only Skills** (use this pattern instead): + +**Not applicable** - This skill performs read-only operations and does not modify any cluster resources. No user confirmation is required. 
+ +**Read-only operations:** +- +- +- + +**No modifications performed:** +- ✓ Does not change cluster state +- ✓ Does not modify resources +- ✓ Does not consume cluster resources + +## Security Considerations + +- **RBAC Enforcement**: Requires for +- **Data Protection**: +- **Namespace Isolation**: +- **Storage Quotas**: +- **Audit Trail**: +- **KUBECONFIG Security**: Credentials never exposed in output +- ****: + +## Example Usage + +### Example 1: + +``` +User: "" + +Agent: [Invokes skill] + [] + + + +## + +**Field**: `value` + +| Column 1 | Column 2 | Column 3 | +|----------|----------|----------| +| data1 | data2 | data3 | + +User: "" + +Agent: [] + + + +## ✓ + +Next steps: "" +``` + +### Example 2: (OPTIONAL) + + + +**Minimum**: Include at least 1 complete example showing the full workflow. + +## Advanced Features (OPTIONAL) + +Include this section only if there are advanced use cases. + +### + + + +### + + + +**Examples:** +- Batch operations +- Special configurations +- Integration with other tools +- Performance optimizations +``` + +--- + +## Color Guide + +Use the following color codes based on operation characteristics: + +| Color | When to Use | Examples | +|-------|-------------|----------| +| **cyan** | Read-only operations (list/view) | vm-inventory, vm-snapshot-list | +| **green** | Additive operations (create new resources) | vm-create, vm-snapshot-create | +| **blue** | Reversible state changes | vm-lifecycle-manager, vm-clone | +| **yellow** | Destructive but recoverable operations | vm-snapshot-delete | +| **red** | Irreversible/critical operations (data loss risk) | vm-delete, vm-snapshot-restore | + +--- + +## Comprehensive Validation Checklist + +Before committing your skill, verify compliance with both **repository standards** and **Claude Code requirements**. + +### 1. 
YAML Frontmatter (Design Principle #3) + +- [ ] **name**: Lowercase with dashes, matches directory name +- [ ] **description**: Under 500 tokens total (CRITICAL) +- [ ] **description**: Includes 3-5 concrete "Use when" examples +- [ ] **description**: Includes anti-patterns with alternatives ("NOT for X, use Y instead") +- [ ] **description**: Summary line under 100 characters +- [ ] **model**: Set to "inherit" (unless special case requires specific model) +- [ ] **color**: Matches operation type (see Color Guide) + +### 2. Section Presence and Order (Design Principle #6) + +Sections MUST appear in this exact order: + +- [ ] 1. Skill title (`# /skill-name Skill`) +- [ ] 2. **Prerequisites** (with MCP server verification) +- [ ] 3. **When to Use This Skill** (with anti-patterns) +- [ ] 4. **Workflow** (with step-by-step instructions) +- [ ] 5. **Common Issues** (at least 3 issues documented) +- [ ] 6. **Dependencies** (MCP servers, tools, skills, docs) +- [ ] 7. **Critical: Human-in-the-Loop Requirements** (if applicable) +- [ ] 8. **Security Considerations** +- [ ] 9. **Example Usage** (at least 1 complete example) +- [ ] 10. **Advanced Features** (optional) + +### 3. Prerequisites Section (Design Principle #7) + +- [ ] Lists required MCP servers with setup guide links +- [ ] Lists required MCP tools with descriptions +- [ ] Lists required environment variables (if applicable) +- [ ] Includes prerequisite verification steps +- [ ] Includes Human Notification Protocol for failures +- [ ] **SECURITY**: Never exposes credential values in output + +### 4. Workflow Section (Design Principles #1-2) + +- [ ] **Document Consultation** pattern included when relevant troubleshooting docs exist + - REQUIRED when skill relates to documented error scenarios + - Consultation happens BEFORE tool invocation + - Includes "Output to user" declaration + - References specific troubleshooting docs (scheduling-errors.md, storage-errors.md, etc.) 
+- [ ] **Parameters**: Precise specifications with exact formats +- [ ] **Parameters**: Includes examples for each parameter +- [ ] **Expected Output**: Describes what tools return +- [ ] **Error Handling**: Documented for each step + +### 5. Common Issues Section + +- [ ] At least 3 common issues documented +- [ ] Each issue has: Error, Cause, Solution +- [ ] Solutions include 2-4 actionable steps +- [ ] Concise format (8-12 lines per issue) + +### 6. Dependencies Section (Design Principle #4) + +- [ ] **Required MCP Servers**: Listed with source links +- [ ] **Required MCP Tools**: Listed with parameters and use cases +- [ ] **Related Skills**: Listed with relationship explanation +- [ ] **Reference Documentation**: Latest stable OpenShift version +- [ ] Documentation links are valid and accessible + +### 7. Human-in-the-Loop Section (Design Principle #5) + +- [ ] Positioned AFTER Dependencies section (not before) +- [ ] Required for: resource creation, state changes, destructive operations +- [ ] Uses "Not Applicable" pattern for read-only skills +- [ ] Specifies exact confirmation points +- [ ] Includes "Why This Matters" rationale +- [ ] Uses **NEVER** statements to prevent auto-execution + +### 8. Security Considerations + +- [ ] RBAC enforcement documented +- [ ] Data protection mechanisms explained +- [ ] Namespace isolation described +- [ ] Audit trail mentioned +- [ ] KUBECONFIG security confirmed (credentials never exposed) + +### 9. Example Usage + +- [ ] At least 1 complete example included +- [ ] Shows realistic user-agent interaction +- [ ] Demonstrates full workflow from start to finish +- [ ] Uses actual markdown formatting in output + +### 10. Quality and Style + +- [ ] No emojis (unless explicitly requested by user) +- [ ] Markdown formatting correct (tables, code blocks, lists) +- [ ] No credential exposure in examples or text +- [ ] Cross-references use relative paths +- [ ] Skill name used consistently throughout + +### 11. 
Testing + +- [ ] Skill file loads without YAML parse errors +- [ ] All internal links are valid (skills, docs) +- [ ] All external links are accessible (Red Hat docs, GitHub) +- [ ] Skill has been tested with actual MCP server +- [ ] All workflow steps produce expected results + +### 12. Repository Standards + +- [ ] File located at `skills//SKILL.md` +- [ ] Directory name matches frontmatter `name` field +- [ ] Color code appropriate for operation risk +- [ ] Follows rh-virt collection conventions +- [ ] No conflicts with existing skill names + +--- + +**Quick Verification Commands**: + +```bash +# Check frontmatter token count (rough estimate) +head -n 20 skills//SKILL.md | wc -w + +# Verify section order +grep "^## " skills//SKILL.md + +# Check for credential exposure (should return nothing) +grep -i "password\|secret\|token.*=" skills//SKILL.md +``` + +--- + +See `SKILLS_CHECKLIST.md` for the complete validation guide with scoring criteria. + +--- + +## Tips for Writing Great Skills + +### Content Quality + +1. **Be Specific**: Use concrete examples, not generic placeholders + - ❌ "Create a VM with specified parameters" + - ✅ "Create VM database-01 with 4GB RAM in production namespace" + +2. **Show Real Output**: Include actual markdown formatting in examples + - Use real table data, actual status messages, realistic error text + - Demonstrate what users will actually see + +3. **Test First**: Run the skill before documenting to capture real behavior + - Document actual tool outputs, not assumptions + - Include real error messages you encountered + +4. **Think Like Users**: Use actual phrases users would say + - "List all VMs" not "Enumerate virtual machine resources" + - "Start the database VM" not "Initiate VM power-on sequence" + +### Claude Code Compliance + +5. 
**Follow Design Principles**: Reference `/CLAUDE.md` for rationale + - **Principle #1**: Document Consultation (read docs before tools) + - **Principle #2**: Precise Parameters (exact formats with examples) + - **Principle #3**: Concise Descriptions (under 500 tokens) + - **Principle #4**: Dependencies Declaration (explicit listing) + - **Principle #5**: Human-in-the-Loop (user confirmation for critical ops) + - **Principle #6**: Mandatory Sections (correct order) + - **Principle #7**: MCP Verification (check availability) + +6. **Link Everything**: Cross-reference related skills and documentation + - Use relative paths: `../../docs/file.md` or `../other-skill/SKILL.md` + - Verify links are valid before committing + +7. **Keep Current**: Use latest stable OpenShift version for all docs + - Check https://docs.redhat.com/ for current version + - Update documentation links when new versions release + +### Style and Safety + +8. **No Jargon**: Explain technical terms when first introduced + - First use: "VirtualMachineInstance (VMI) - the running pod for a VM" + - Subsequent: "VMI" is fine + +9. **Error First**: Document common failures before edge cases + - Users hit common issues 90% of the time + - Rare edge cases can go in Advanced Features + +10. **Security Always**: Never expose credentials, always check RBAC + - ✓ "KUBECONFIG is set" + - ❌ "KUBECONFIG=/path/to/kubeconfig" + - ✓ "Requires update permissions for VirtualMachines" + - ❌ Never include actual secrets, tokens, passwords + +### Maintainability + +11. **Consistency Matters**: Follow this template exactly + - Same section order across all skills + - Same formatting for MCP tools + - Same pattern for error handling + +12. **Version Control**: Document when you reference external content + - "As of OpenShift 4.21..." for version-specific behavior + - Link to specific doc versions when behavior may change + +13. 
**Self-Documenting**: Skill should be understandable without external context + - Don't assume user has read other skills + - Repeat critical information (RBAC requirements, prerequisites) + - Link to related skills for more detail + +### Common Pitfalls to Avoid + +- ❌ Hardcoding version numbers (use "latest stable" unless version-specific) +- ❌ Assuming prerequisites are met (always verify MCP server availability) +- ❌ Verbose Common Issues sections (keep 8-12 lines per issue) +- ❌ Missing anti-patterns in frontmatter (always include "NOT for X") +- ❌ Parameters without examples (always show exact format) +- ❌ Human-in-the-Loop before Dependencies (wrong section order) +- ❌ Description over 500 tokens (violates Claude requirement) + +### Quality Checklist Before Committing + +- [ ] Ran the skill with actual MCP server +- [ ] Captured real tool outputs in examples +- [ ] Verified all links are valid +- [ ] Checked frontmatter under 500 tokens +- [ ] Included 3-5 "Use when" examples +- [ ] Documented 3+ common issues +- [ ] Positioned Human-in-the-Loop correctly (after Dependencies) +- [ ] No credentials exposed anywhere +- [ ] Followed color guide for operation type +- [ ] Used latest stable OpenShift version + +**Remember**: Skills are loaded by Claude Code's agent system. Clear, precise, concise documentation helps Claude make correct decisions about when to invoke your skill. + +--- + +## Claude Code Design Principles Reference + +For complete details, see `/CLAUDE.md` "Design Principles for Skills and Agents" section. + +### Principle #1: Document Consultation Transparency + +**What**: Skills MUST read documentation before invoking tools, then declare consultation to user. + +**Why**: Ensures AI enriches context with domain knowledge; provides transparency to users. + +**How**: +```markdown +**Document Consultation** (REQUIRED - Execute FIRST): +1. **Action**: Read [doc.md](path) using Read tool to understand [topic] +2. 
**Output to user**: "I consulted [doc.md](path) to understand [topic]." +``` + +**Status for rh-virt**: REQUIRED when relevant troubleshooting documentation exists. Skills should consult docs/troubleshooting/ files before handling errors or complex operations. + +### Principle #2: Precise Parameter Specification + +**What**: Provide exact parameter names, types, and formats with examples. + +**Why**: Ensures first-attempt success when invoking MCP tools; reduces wasted cycles. + +**How**: +```markdown +**Parameters**: +- `param1`: "value" (exact format) + - Example: `"database-01"` (VM name to operate on) +``` + +### Principle #3: Skill Precedence and Conciseness + +**What**: Description field MUST be under 500 tokens; focus on "when to use". + +**Why**: Minimizes token usage at agent initialization while maintaining clarity. + +**How**: Keep frontmatter concise with 3-5 concrete examples, defer details to skill body. + +### Principle #4: Dependencies Declaration + +**What**: List all MCP servers, tools, related skills, and documentation. + +**Why**: Makes dependencies explicit for debugging and error handling. + +**How**: Follow required format in Dependencies section of template. + +### Principle #5: Human-in-the-Loop Requirements + +**What**: Skills performing critical operations MUST require explicit user confirmation. + +**Why**: Prevents unintended automation; maintains user control. + +**How**: Position AFTER Dependencies section; specify exact confirmation points. + +### Principle #6: Mandatory Skill Sections + +**What**: All sections must appear in correct order. + +**Why**: Standardizes skill structure for consistency and completeness. + +**How**: Follow exact order: Prerequisites → When to Use → Workflow → Common Issues → Dependencies → Human-in-the-Loop → Security → Examples. + +### Principle #7: MCP Server Availability Verification + +**What**: Verify MCP server configuration before executing. 
+ +**Why**: Provides graceful degradation and clear user guidance when dependencies are missing. + +**How**: Include verification steps in Prerequisites section with Human Notification Protocol. + +**CRITICAL SECURITY**: Never expose credential values in output (only report presence/absence). + +--- + +## Repository Context + +**Collection**: rh-virt (OpenShift Virtualization management) +**MCP Server**: openshift-virtualization (https://github.com/openshift/openshift-mcp-server) +**Pattern**: MCP-first approach (always use MCP tools, not CLI fallbacks) +**Documentation**: docs/troubleshooting/ contains error resolution guides (6 documents) + +**Related Collections**: +- rh-sre: Reference implementation with full docs/ directory and semantic indexing +- Use rh-sre as architectural reference for advanced patterns + +--- + +## Getting Help + +- **Template Issues**: See `SKILLS_CHECKLIST.md` for detailed validation guide +- **Claude Requirements**: Read `/CLAUDE.md` for complete design principles +- **MCP Server**: Check https://github.com/openshift/openshift-mcp-server for tool documentation +- **OpenShift Virt**: Reference https://docs.redhat.com/ for latest virtualization docs + +--- + +**Last Updated**: 2026-02-18 +**Template Version**: 2.0 (Claude Code compliant) diff --git a/rh-virt/docs/troubleshooting/INDEX.md b/rh-virt/docs/troubleshooting/INDEX.md index 0031af3..a2e7696 100644 --- a/rh-virt/docs/troubleshooting/INDEX.md +++ b/rh-virt/docs/troubleshooting/INDEX.md @@ -39,7 +39,7 @@ This guide provides comprehensive diagnostic procedures and workarounds for Virt - **Networking**: Secondary network attachment failures - **Runtime**: CrashLoopBackOff, guest OS boot failures -This guide is consulted by all rh-virt skills (vm-creator, vm-inventory, vm-lifecycle-manager, vm-delete, vm-clone) when diagnosing and remediating VM issues.
+This guide is consulted by all rh-virt skills (vm-create, vm-inventory, vm-lifecycle-manager, vm-delete, vm-clone) when diagnosing and remediating VM issues. --- @@ -55,7 +55,7 @@ The troubleshooting documentation is organized by error category for easier navi - ErrorUnschedulable - Insufficient Resources - ErrorUnschedulable - Node Selector Mismatch -**Skills that use this**: vm-creator, vm-lifecycle-manager +**Skills that use this**: vm-create, vm-lifecycle-manager --- @@ -68,7 +68,7 @@ The troubleshooting documentation is organized by error category for easier navi - Storage Deletion Failures - DataVolume Cloning Failures -**Skills that use this**: vm-creator, vm-delete, vm-clone +**Skills that use this**: vm-create, vm-delete, vm-clone --- @@ -90,7 +90,7 @@ The troubleshooting documentation is organized by error category for easier navi **Errors covered**: - CrashLoopBackOff -**Skills that use this**: vm-creator, vm-lifecycle-manager +**Skills that use this**: vm-create, vm-lifecycle-manager --- @@ -100,7 +100,7 @@ The troubleshooting documentation is organized by error category for easier navi **Errors covered**: - Network Attachment Failures (Multus, SR-IOV) -**Skills that use this**: vm-creator +**Skills that use this**: vm-create --- @@ -225,7 +225,7 @@ Each error section includes: ## 🔍 Quick Navigation by Skill -**vm-creator**: +**vm-create**: - [Scheduling Errors](scheduling-errors.md) - ErrorUnschedulable diagnostics - [Storage Errors](storage-errors.md) - ErrorDataVolumeNotReady, storage provisioning - [Runtime Errors](runtime-errors.md) - CrashLoopBackOff diff --git a/rh-virt/docs/troubleshooting/lifecycle-errors.md b/rh-virt/docs/troubleshooting/lifecycle-errors.md index 084f2ce..6167fcc 100644 --- a/rh-virt/docs/troubleshooting/lifecycle-errors.md +++ b/rh-virt/docs/troubleshooting/lifecycle-errors.md @@ -439,7 +439,7 @@ oc describe vm -n | grep -A 20 "Events:" Extract `.spec.domain.resources.requests.memory`. 
- If too low, delete VM and recreate with larger instance type (e.g., change from "small" to "medium" in vm-creator). + If too low, delete VM and recreate with larger instance type (e.g., change from "small" to "medium" in vm-create). **CLI Fallback** (if MCP unavailable): ```bash diff --git a/rh-virt/docs/troubleshooting/network-errors.md b/rh-virt/docs/troubleshooting/network-errors.md index a3e920e..82fa313 100644 --- a/rh-virt/docs/troubleshooting/network-errors.md +++ b/rh-virt/docs/troubleshooting/network-errors.md @@ -26,7 +26,7 @@ This document covers VM secondary network attachment failures using Multus CNI a - Multus CNI failures - SR-IOV device attachment issues -**Skills that use this**: vm-creator +**Skills that use this**: vm-create --- @@ -364,7 +364,7 @@ oc describe pod virt-launcher--xxx -n 7. **Recreate VM with corrected network configuration** (if needed): - If network attachment is fundamentally broken, delete and recreate VM with correct NAD references using vm-creator skill. + If network attachment is fundamentally broken, delete and recreate VM with correct NAD references using vm-create skill. **Verification** (Use MCP Tools First): diff --git a/rh-virt/docs/troubleshooting/runtime-errors.md b/rh-virt/docs/troubleshooting/runtime-errors.md index e6f3099..804d3cb 100644 --- a/rh-virt/docs/troubleshooting/runtime-errors.md +++ b/rh-virt/docs/troubleshooting/runtime-errors.md @@ -25,7 +25,7 @@ This document covers VM runtime failures where the virt-launcher pod or guest OS - virt-launcher pod repeatedly restarting - Guest OS kernel panics on boot -**Skills that use this**: vm-creator, vm-lifecycle-manager +**Skills that use this**: vm-create, vm-lifecycle-manager --- @@ -278,7 +278,7 @@ virtctl console -n Extract `.spec.domain.resources.requests.memory`. - If too high for node, delete and recreate with smaller instance type using vm-creator skill (change from "large" to "medium" or "small"). 
+ If too high for node, delete and recreate with smaller instance type using vm-create skill (change from "large" to "medium" or "small"). **CLI Fallback** (if MCP unavailable): ```bash @@ -334,7 +334,7 @@ virtctl console -n 7. **Simplify VM configuration** (eliminate variables): - Try creating minimal VM using vm-creator skill with: + Try creating minimal VM using vm-create skill with: - Small instance type - No secondary networks - Simple container disk (e.g., Fedora) @@ -344,7 +344,7 @@ virtctl console -n 8. **Recreate VM with different workload** (test disk image): - If guest OS consistently crashes, use vm-creator skill to try different OS image (e.g., switch from Ubuntu to Fedora). This tests if issue is workload-specific. + If guest OS consistently crashes, use vm-create skill to try different OS image (e.g., switch from Ubuntu to Fedora). This tests if issue is workload-specific. **Verification** (Use MCP Tools First): diff --git a/rh-virt/docs/troubleshooting/scheduling-errors.md b/rh-virt/docs/troubleshooting/scheduling-errors.md index 0c216f5..e47c8c4 100644 --- a/rh-virt/docs/troubleshooting/scheduling-errors.md +++ b/rh-virt/docs/troubleshooting/scheduling-errors.md @@ -27,7 +27,7 @@ This document covers VM scheduling failures where the Kubernetes scheduler canno - VM shows status `ErrorUnschedulable` after creation or start attempt - VM events mention scheduling failures, taints, resources, or node selectors -**Skills that use this**: vm-creator, vm-lifecycle-manager +**Skills that use this**: vm-create, vm-lifecycle-manager --- @@ -280,7 +280,7 @@ oc describe vm -n | grep "Insufficient" **Solutions** (Use MCP Tools First): 1. **Scale cluster** - Add more worker nodes (cluster admin task, no MCP tool) -2. **Reduce VM resources** - Delete and recreate with smaller instance type using vm-creator skill +2. **Reduce VM resources** - Delete and recreate with smaller instance type using vm-create skill 3. 
**Delete unused VMs** - Use vm-delete skill to free up resources 4. **Check resource quotas**: diff --git a/rh-virt/docs/troubleshooting/storage-errors.md b/rh-virt/docs/troubleshooting/storage-errors.md index 0207804..16ba584 100644 --- a/rh-virt/docs/troubleshooting/storage-errors.md +++ b/rh-virt/docs/troubleshooting/storage-errors.md @@ -29,7 +29,7 @@ This document covers VM storage-related failures including storage provisioning, - DataVolume cloning operations fail - PVC provisioning issues -**Skills that use this**: vm-creator, vm-delete, vm-clone +**Skills that use this**: vm-create, vm-delete, vm-clone --- @@ -1001,7 +1001,7 @@ oc get pvc -rootdisk -n -o jsonpath='{.status.phas # Should return: Bound ``` -**Alternative**: If cloning continuously fails, use vm-creator skill to create new VM with container disk or DataSource instead. +**Alternative**: If cloning continuously fails, use vm-create skill to create new VM with container disk or DataSource instead. --- diff --git a/rh-virt/skills/vm-clone/SKILL.md b/rh-virt/skills/vm-clone/SKILL.md index e832613..42f2c75 100644 --- a/rh-virt/skills/vm-clone/SKILL.md +++ b/rh-virt/skills/vm-clone/SKILL.md @@ -19,35 +19,6 @@ color: blue Clone existing virtual machines in OpenShift Virtualization, creating new VMs with copied configuration and optional storage cloning. This skill is ideal for creating test environments, scaling workloads, or duplicating VM templates. -## Critical: Human-in-the-Loop Requirements - -**IMPORTANT:** This skill creates new resources that consume cluster capacity. You MUST: - -1. **Before Cloning** - - Verify source VM exists and get full configuration - - Ask user for clone configuration (name, namespace, storage strategy) - - Present clone preview with resource impact - - Wait for explicit user confirmation - -2. 
**Configuration Confirmation** - - Display source VM details - - Show target VM configuration - - Indicate storage cloning strategy - - Estimate resource consumption (CPU, memory, storage) - - Ask: "Proceed with VM cloning? (yes/no)" - - Wait for explicit "yes" - -3. **Never Auto-Execute** - - **NEVER clone without user confirmation** - - **NEVER assume storage strategy** - always ask user - - **NEVER proceed if user says "no", "wait", "cancel"** - -**Why This Matters:** -- **Resource Consumption**: Clones consume cluster resources (CPU, memory, storage) -- **Storage Costs**: Storage cloning can consume significant disk space -- **Naming Conflicts**: Duplicate names cause errors -- **Network Configuration**: May need adjustment for clones - ## Prerequisites **Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) @@ -149,7 +120,7 @@ Please respond with your choice. - "/vm-clone" (explicit command) **Do NOT use this skill when:** -- User wants to create a new VM from scratch → Use `/vm-creator` skill instead +- User wants to create a new VM from scratch → Use `/vm-create` skill instead - User wants a point-in-time backup → Use snapshots instead - User wants to move/migrate a VM → Use migration tools instead - User wants to resize a VM → Modify existing VM instead @@ -867,7 +838,7 @@ Modifications could include: - Source: https://github.com/openshift/openshift-mcp-server/blob/main/pkg/toolsets/core/resources.go ### Related Skills -- `vm-creator` - Create new VMs from scratch (alternative to cloning) +- `vm-create` - Create new VMs from scratch (alternative to cloning) - `vm-inventory` - List and verify source/target VMs - `vm-lifecycle-manager` - Start cloned VMs after creation - `vm-delete` - Clean up failed clones or unwanted copies @@ -875,11 +846,40 @@ Modifications could include: ### Reference Documentation - [storage-errors.md](../../docs/troubleshooting/storage-errors.md) - VM cloning 
failure scenarios, storage provisioning issues, and DataVolume cloning errors (optionally consulted when cloning operations fail) - [Troubleshooting INDEX](../../docs/troubleshooting/INDEX.md) - Navigation hub for discovering additional error categories when encountering unexpected issues outside the categories above -- [OpenShift Virtualization Cloning](https://docs.openshift.com/container-platform/latest/virt/virtual_machines/cloning_vms/virt-cloning-vm.html) +- [OpenShift Virtualization Cloning](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/virtual_machines/cloning_vms/virt-cloning-vm.html) - [DataVolume Cloning](https://github.com/kubevirt/containerized-data-importer/blob/main/doc/datavolumes.md#cloning) - [KubeVirt VirtualMachine API](https://kubevirt.io/api-reference/) - [CSI Volume Cloning](https://kubernetes.io/docs/concepts/storage/volume-pvc-datasource/) +## Critical: Human-in-the-Loop Requirements + +**IMPORTANT:** This skill creates new resources that consume cluster capacity. You MUST: + +1. **Before Cloning** + - Verify source VM exists and get full configuration + - Ask user for clone configuration (name, namespace, storage strategy) + - Present clone preview with resource impact + - Wait for explicit user confirmation + +2. **Configuration Confirmation** + - Display source VM details + - Show target VM configuration + - Indicate storage cloning strategy + - Estimate resource consumption (CPU, memory, storage) + - Ask: "Proceed with VM cloning? (yes/no)" + - Wait for explicit "yes" + +3. 
**Never Auto-Execute** + - **NEVER clone without user confirmation** + - **NEVER assume storage strategy** - always ask user + - **NEVER proceed if user says "no", "wait", "cancel"** + +**Why This Matters:** +- **Resource Consumption**: Clones consume cluster resources (CPU, memory, storage) +- **Storage Costs**: Storage cloning can consume significant disk space +- **Naming Conflicts**: Duplicate names cause errors +- **Network Configuration**: May need adjustment for clones + ## Security Considerations - **RBAC Enforcement**: Requires create permissions in target namespace diff --git a/rh-virt/skills/vm-creator/SKILL.md b/rh-virt/skills/vm-create/SKILL.md similarity index 94% rename from rh-virt/skills/vm-creator/SKILL.md rename to rh-virt/skills/vm-create/SKILL.md index a710af9..153b6fe 100644 --- a/rh-virt/skills/vm-creator/SKILL.md +++ b/rh-virt/skills/vm-create/SKILL.md @@ -1,5 +1,5 @@ --- -name: vm-creator +name: vm-create description: | Create new virtual machines in OpenShift Virtualization with automatic instance type resolution and OS selection. Use this skill when users request: - "Create a new VM" @@ -9,32 +9,16 @@ description: | This skill handles VM creation with intelligent defaults for OpenShift Virtualization. + NOT for managing existing VMs (use vm-lifecycle-manager or vm-delete instead). + model: inherit color: green --- -# /vm-creator Skill +# /vm-create Skill Create virtual machines in OpenShift Virtualization using the `vm_create` tool from the openshift-virtualization MCP server. -## Critical: Human-in-the-Loop Requirements - -**IMPORTANT:** This skill requires explicit user confirmation before creating VMs. You MUST: - -1. **Wait for user confirmation** on all VM configuration parameters before executing `vm_create`. -2. **Do NOT proceed** with VM creation until the user explicitly approves the configuration. -3. **Present configuration clearly** in a table format and wait for user response. -4. 
**Never auto-execute** VM creation without user approval - creating VMs is a additive operation that consumes cluster resources, affects quotas and may incur cost. -5. **Allow modifications** - If user wants to change parameters, update and re-confirm before proceeding. - -If the user says "no" or wants modifications, address their concerns before proceeding. - -**Why this matters:** -- VM creation consumes cluster resources (CPU, memory, storage) -- VMs persist until explicitly deleted -- Incorrect configuration can impact cluster performance -- User should verify namespace, sizing, and other parameters - ## Prerequisites **Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) @@ -72,7 +56,7 @@ If the user says "no" or wants modifications, address their concerns before proc When prerequisites fail: ``` -❌ Cannot execute vm-creator: MCP server 'openshift-virtualization' is not available +❌ Cannot execute vm-create: MCP server 'openshift-virtualization' is not available 📋 Setup Instructions: 1. Add openshift-virtualization to .mcp.json: @@ -120,7 +104,7 @@ Please respond with your choice. ## When to Use This Skill **Trigger this skill when:** -- User explicitly invokes `/vm-creator` command +- User explicitly invokes `/vm-create` command - User requests creating a new virtual machine - Deploying VMs with specific OS (Fedora, Ubuntu, RHEL, CentOS, Debian) - Setting up VMs with custom sizing (small, medium, large) @@ -130,54 +114,13 @@ Please respond with your choice. 
- "Create a Fedora VM in namespace vms" - "Deploy a medium Ubuntu VM with 100Gi disk" - "Set up a RHEL VM called database-01" -- "/vm-creator" (explicit command) +- "/vm-create" (explicit command) **Do NOT use this skill when:** - User wants to start/stop existing VMs → Use `/vm-lifecycle-manager` skill instead - User wants to list VMs → Use `/vm-inventory` skill instead - User only wants information about VMs (not creation) → Use `/vm-inventory` skill instead -## CRITICAL: MCP Tools First Policy - -**MANDATORY REQUIREMENT**: You MUST ALWAYS use MCP tools from the openshift-virtualization server for ALL cluster operations. - -**MCP Tools Available:** -- `namespaces_list` - List all namespaces -- `resources_list` - List resources (StorageClass, VirtualMachine, etc.) -- `resources_get` - Get specific resource details -- `resources_create_or_update` - Create or update resources -- `resources_delete` - Delete resources -- `pods_list` - List pods -- `pods_exec` - Execute commands in pods -- `events_list` - List cluster events -- And many more... - -**Policy:** -1. **ALWAYS check if an MCP tool exists** for the operation you need to perform -2. 
**ONLY use kubectl/oc CLI commands** when: - - No equivalent MCP tool exists for that specific operation - - The MCP tool has been tried and failed - - You have explicit confirmation that the MCP approach is not possible - -**Examples:** -- ❌ WRONG: `kubectl get namespaces` → ✅ CORRECT: Use `namespaces_list` MCP tool -- ❌ WRONG: `kubectl get storageclass -o json` → ✅ CORRECT: Use `resources_list` MCP tool with apiVersion="storage.k8s.io/v1", kind="StorageClass" -- ❌ WRONG: `kubectl get vm -n ` → ✅ CORRECT: Use `resources_get` MCP tool with apiVersion="kubevirt.io/v1", kind="VirtualMachine" -- ❌ WRONG: `kubectl config view --minify` → ⚠️ ACCEPTABLE: No MCP equivalent exists for kubeconfig context detection - -**Why this matters:** -- MCP tools provide structured, validated outputs -- Better error handling and user experience -- Consistent interface across all operations -- Reduced dependency on CLI tools -- Better integration with Claude Code environment - -**If you catch yourself about to use kubectl/oc:** -1. STOP -2. Check the available MCP tools list above -3. Use the MCP tool instead -4. Only proceed with kubectl/oc if absolutely no alternative exists - ## Workflow ### Step 1: Gather VM Requirements and Confirm Configuration @@ -997,7 +940,7 @@ Would you like me to display the default credentials for the VM? (yes/no) Would you like help troubleshooting this error? 
``` -## Advanced Usage +## Advanced Features ### Custom Container Disk Images @@ -1130,11 +1073,29 @@ vm_create({ - [runtime-errors.md](../../docs/troubleshooting/runtime-errors.md) - CrashLoopBackOff and guest OS failures - [network-errors.md](../../docs/troubleshooting/network-errors.md) - Network attachment failures - [Troubleshooting INDEX](../../docs/troubleshooting/INDEX.md) - Navigation hub for discovering additional error categories when encountering unexpected issues outside the categories above -- [OpenShift Virtualization Documentation](https://docs.openshift.com/container-platform/latest/virt/about_virt/about-virt.html) -- [Troubleshooting VMs](https://docs.openshift.com/container-platform/latest/virt/virtual_machines/troubleshooting_vms.html) +- [OpenShift Virtualization Documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/about_virt/about-virt.html) +- [Troubleshooting VMs](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/virtual_machines/troubleshooting_vms.html) - [KubeVirt VirtualMachine API](https://kubevirt.io/api-reference/) - [OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server) -- [StorageClass Documentation](https://docs.openshift.com/container-platform/latest/storage/understanding-persistent-storage.html) +- [StorageClass Documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#storage/understanding-persistent-storage.html) + +## Critical: Human-in-the-Loop Requirements + +**IMPORTANT:** This skill requires explicit user confirmation before creating VMs. You MUST: + +1. **Wait for user confirmation** on all VM configuration parameters before executing `vm_create`. +2. **Do NOT proceed** with VM creation until the user explicitly approves the configuration. +3. 
**Present configuration clearly** in a table format and wait for user response. +4. **Never auto-execute** VM creation without user approval - creating VMs is an additive operation that consumes cluster resources, affects quotas and may incur cost. +5. **Allow modifications** - If user wants to change parameters, update and re-confirm before proceeding. + +If the user says "no" or wants modifications, address their concerns before proceeding. + +**Why this matters:** +- VM creation consumes cluster resources (CPU, memory, storage) +- VMs persist until explicitly deleted +- Incorrect configuration can impact cluster performance +- User should verify namespace, sizing, and other parameters ## Security Considerations @@ -1152,7 +1113,7 @@ vm_create({ ``` User: "Create a VM" -Agent: [Invokes /vm-creator skill] +Agent: [Invokes /vm-create skill] [Detects current namespace from kubeconfig: "production"] [Queries available StorageClasses, finds default: "ocs-storagecluster-ceph-rbd"] [No parameters provided, uses AskUserQuestion for full interactive menu] @@ -1215,7 +1176,7 @@ Next Steps: "Start VM my-app-server in namespace production" ``` User: "Create an Ubuntu VM called web-01 in namespace apps" -Agent: [Invokes /vm-creator skill] +Agent: [Invokes /vm-create skill] [Detects VM name="web-01", namespace="apps", OS="ubuntu"] [Queries StorageClasses for menu] [Uses AskUserQuestion only for: Performance, Size, Storage, Storage Class, Autostart] @@ -1273,7 +1234,7 @@ VM Details: ``` User: "Deploy a memory-optimized RHEL VM called db-server in production namespace" -Agent: [Invokes /vm-creator skill] +Agent: [Invokes /vm-create skill] [Detects: name="db-server", namespace="production", OS="rhel", performance="m1" (memory-optimized)] [Uses AskUserQuestion for: Size, Storage, Storage Class, Autostart] @@ -1336,7 +1297,7 @@ VM Details: ``` User: "Create a VM named apache-server in namespace vms" -Agent: [Invokes /vm-creator skill] +Agent: [Invokes /vm-create skill] [Validates
name: "apache-server" ✓] [Checks if VM exists...] @@ -1375,7 +1336,7 @@ Agent: [Validates new name: "apache-server-02" ✓] ``` User: "Create a VM named test in namespace dev" -Agent: [Invokes /vm-creator skill] +Agent: [Invokes /vm-create skill] [Interactive menu presented...] [Processes responses] diff --git a/rh-virt/skills/vm-delete/SKILL.md b/rh-virt/skills/vm-delete/SKILL.md index f1d7068..a2639d8 100644 --- a/rh-virt/skills/vm-delete/SKILL.md +++ b/rh-virt/skills/vm-delete/SKILL.md @@ -19,50 +19,6 @@ color: red Permanently delete virtual machines and their associated resources (storage, DataVolumes) from OpenShift Virtualization clusters. This skill enforces strict safety protocols including typed confirmation and pre-deletion validation. -## Critical: Human-in-the-Loop Requirements - -**CRITICAL: This skill performs IRREVERSIBLE operations.** You MUST: - -1. **Pre-Deletion Validation** (Execute FIRST, before asking for confirmation) - - Verify VM exists using `resources_get` - - Check VM running state - - Discover dependent resources (DataVolumes, PVCs) - - Check for protection labels (`protected: "true"`) - -2. **Safety Checks** - - **REFUSE deletion** if VM has label `protected: "true"` - - **REQUIRE VM to be stopped first** if currently running - - **List all resources** that will be deleted (VM, storage, DataVolumes) - -3. **Typed Confirmation Protocol** (MANDATORY) - - Display complete deletion scope - - **Require user to type the exact VM name** to confirm - - Accept only exact match (case-sensitive) - - If name doesn't match → Cancel operation, do not proceed - -4. **Deletion Options** - - Ask user what to delete: - - Option 1: VM only (preserve storage) - - Option 2: VM + storage (complete cleanup) - - Option 3: Cancel - - **NEVER assume** which option user wants - -5. **Final Confirmation Before Each Deletion** - - After typed confirmation, show exactly what will be deleted - - Ask: "Proceed with permanent deletion? 
(yes/cancel)" - - Wait for explicit "yes" - -6. **Never Auto-Execute** - - **NEVER delete without explicit typed confirmation** - - **NEVER proceed if user says "no", "wait", "cancel"** - - **NEVER skip the typed verification step** - -**Why This Matters:** -- **Permanent**: Deleted VMs cannot be recovered -- **Data Loss**: Storage deletion destroys all VM data -- **Service Impact**: Deleting running VMs causes immediate service outage -- **Accidental Deletion**: Typed verification prevents mistakes (typos, wrong VM name) - ## Prerequisites **Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) @@ -166,7 +122,7 @@ Please respond with your choice. **Do NOT use this skill when:** - User wants to stop a VM temporarily → Use `/vm-lifecycle-manager` skill instead -- User wants to create a VM → Use `/vm-creator` skill instead +- User wants to create a VM → Use `/vm-create` skill instead - User wants to view VMs → Use `/vm-inventory` skill instead - User wants to pause or suspend VM → Use lifecycle management (not deletion) @@ -910,18 +866,62 @@ Names do not match. Deletion cancelled for safety. 
### Related Skills - `vm-inventory` - Verify VM exists and get details before deletion - `vm-lifecycle-manager` - Stop running VMs before deletion (used internally by vm-delete) -- `vm-creator` - Create VMs after cleanup operations +- `vm-create` - Create VMs after cleanup operations ### Reference Documentation - [lifecycle-errors.md](../../docs/troubleshooting/lifecycle-errors.md) - VM deletion failure scenarios, finalizer issues, and stuck Terminating states (optionally consulted when deletion operations fail) - [storage-errors.md](../../docs/troubleshooting/storage-errors.md) - Storage deletion strategies and PVC cleanup procedures (optionally consulted when storage deletion fails) - [Troubleshooting INDEX](../../docs/troubleshooting/INDEX.md) - Navigation hub for discovering additional error categories when encountering unexpected issues outside the categories above -- [OpenShift Virtualization Documentation](https://docs.openshift.com/container-platform/latest/virt/about_virt/about-virt.html) +- [OpenShift Virtualization Documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/about_virt/about-virt.html) - [KubeVirt VirtualMachine API](https://kubevirt.io/api-reference/) - [Kubernetes Finalizers](https://kubernetes.io/docs/concepts/overview/working-with-objects/finalizers/) - [PVC Deletion](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#delete) - [OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server) +## Critical: Human-in-the-Loop Requirements + +**CRITICAL: This skill performs IRREVERSIBLE operations.** You MUST: + +1. **Pre-Deletion Validation** (Execute FIRST, before asking for confirmation) + - Verify VM exists using `resources_get` + - Check VM running state + - Discover dependent resources (DataVolumes, PVCs) + - Check for protection labels (`protected: "true"`) + +2. 
**Safety Checks** + - **REFUSE deletion** if VM has label `protected: "true"` + - **REQUIRE VM to be stopped first** if currently running + - **List all resources** that will be deleted (VM, storage, DataVolumes) + +3. **Typed Confirmation Protocol** (MANDATORY) + - Display complete deletion scope + - **Require user to type the exact VM name** to confirm + - Accept only exact match (case-sensitive) + - If name doesn't match → Cancel operation, do not proceed + +4. **Deletion Options** + - Ask user what to delete: + - Option 1: VM only (preserve storage) + - Option 2: VM + storage (complete cleanup) + - Option 3: Cancel + - **NEVER assume** which option user wants + +5. **Final Confirmation Before Each Deletion** + - After typed confirmation, show exactly what will be deleted + - Ask: "Proceed with permanent deletion? (yes/cancel)" + - Wait for explicit "yes" + +6. **Never Auto-Execute** + - **NEVER delete without explicit typed confirmation** + - **NEVER proceed if user says "no", "wait", "cancel"** + - **NEVER skip the typed verification step** + +**Why This Matters:** +- **Permanent**: Deleted VMs cannot be recovered +- **Data Loss**: Storage deletion destroys all VM data +- **Service Impact**: Deleting running VMs causes immediate service outage +- **Accidental Deletion**: Typed verification prevents mistakes (typos, wrong VM name) + ## Security Considerations - **RBAC Enforcement**: Deletion requires explicit RBAC permissions for VirtualMachine, PVC, and DataVolume resources diff --git a/rh-virt/skills/vm-inventory/SKILL.md b/rh-virt/skills/vm-inventory/SKILL.md index 195bc3a..53dc1e4 100644 --- a/rh-virt/skills/vm-inventory/SKILL.md +++ b/rh-virt/skills/vm-inventory/SKILL.md @@ -9,6 +9,8 @@ description: | This skill provides comprehensive VM inventory and status reporting. + NOT for creating or modifying VMs (use vm-create or vm-lifecycle-manager instead). 
+ model: inherit color: cyan --- @@ -17,23 +19,6 @@ color: cyan List and inspect virtual machines in OpenShift Virtualization clusters. This skill provides read-only access to VM information without making any modifications. -## Critical: Human-in-the-Loop Requirements - -**Not applicable** - This skill performs read-only operations and does not modify any cluster resources. No user confirmation is required. - -**Read-only operations:** -- Listing VirtualMachines across namespaces or in specific namespaces -- Retrieving VM details, status, and resource configurations -- Displaying VM health conditions and resource usage -- Filtering VMs by labels or field selectors -- Viewing VM network, storage, and node placement information - -**No modifications performed:** -- ✓ Does not change VM state (start/stop/restart) -- ✓ Does not modify VM configuration -- ✓ Does not delete VMs or resources -- ✓ Does not consume cluster resources - ## Prerequisites **Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) @@ -145,7 +130,7 @@ Please respond with your choice. - "/vm-inventory" (explicit command) **Do NOT use this skill when:** -- User wants to create a VM → Use `/vm-creator` skill instead +- User wants to create a VM → Use `/vm-create` skill instead - User wants to start/stop VMs → Use `/vm-lifecycle-manager` skill instead - User wants to modify VM configuration → Different operation (not inventory) @@ -523,7 +508,7 @@ Found 3 VMs: - web-dev-01 (development) - Running ``` -## Common Queries and Responses +## Common Issues ### "Show me all running VMs" @@ -680,7 +665,7 @@ No VMs were found in this namespace. 
- Insufficient permissions to view VMs **Next steps:** -- Create a VM: Use /vm-creator skill +- Create a VM: Use /vm-create skill - List all namespaces: "Show me all namespaces" - Check permissions: `oc auth can-i list virtualmachines -n production` ``` @@ -707,7 +692,7 @@ No VMs were found in this namespace. ## Integration with Other Skills -**Before creating a VM** (vm-creator): +**Before creating a VM** (vm-create): - Use vm-inventory to check if VM name already exists - Verify namespace exists and has capacity @@ -742,17 +727,34 @@ No VMs were found in this namespace. **Important**: Always attempt MCP tools first. Only use CLI commands after MCP tool failure and with user confirmation. ### Related Skills -- `vm-creator` - Create VMs after checking inventory +- `vm-create` - Create VMs after checking inventory - `vm-lifecycle-manager` - Manage VMs discovered in inventory - `vm-troubleshooter` (planned) - Diagnose problematic VMs from inventory ### Reference Documentation - [Troubleshooting INDEX](../../docs/troubleshooting/INDEX.md) - VM status interpretation and navigation hub for discovering error-specific troubleshooting guides (optionally consulted when displaying VM details with error states) -- [OpenShift Virtualization Documentation](https://docs.openshift.com/container-platform/latest/virt/about_virt/about-virt.html) +- [OpenShift Virtualization Documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/about_virt/about-virt.html) - [KubeVirt VirtualMachine API](https://kubevirt.io/api-reference/) -- [Accessing VMs](https://docs.openshift.com/container-platform/latest/virt/virtual_machines/virt-accessing-vm-consoles.html) +- [Accessing VMs](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/virtual_machines/virt-accessing-vm-consoles.html) - [VM Status 
Conditions](https://kubevirt.io/user-guide/virtual_machines/vm_status_conditions/) +## Critical: Human-in-the-Loop Requirements + +**Not applicable** - This skill performs read-only operations and does not modify any cluster resources. No user confirmation is required. + +**Read-only operations:** +- Listing VirtualMachines across namespaces or in specific namespaces +- Retrieving VM details, status, and resource configurations +- Displaying VM health conditions and resource usage +- Filtering VMs by labels or field selectors +- Viewing VM network, storage, and node placement information + +**No modifications performed:** +- ✓ Does not change VM state (start/stop/restart) +- ✓ Does not modify VM configuration +- ✓ Does not delete VMs or resources +- ✓ Does not consume cluster resources + ## Security Considerations - Read-only operations - no modifications to VMs @@ -906,5 +908,5 @@ To create a VM: "Create a VM in namespace test" ``` -The /vm-creator skill will help you set up a new virtual machine. +The /vm-create skill will help you set up a new virtual machine. ``` diff --git a/rh-virt/skills/vm-lifecycle-manager/SKILL.md b/rh-virt/skills/vm-lifecycle-manager/SKILL.md index daa60e0..6051b3d 100644 --- a/rh-virt/skills/vm-lifecycle-manager/SKILL.md +++ b/rh-virt/skills/vm-lifecycle-manager/SKILL.md @@ -9,6 +9,8 @@ description: | This skill handles VM state transitions safely with user confirmation for each action. + NOT for creating VMs (use vm-create) or deleting VMs (use vm-delete). + model: inherit color: blue --- @@ -17,24 +19,6 @@ color: blue Control virtual machine power state and lifecycle operations in OpenShift Virtualization using the `vm_lifecycle` tool from the openshift-virtualization MCP server. -## Critical: Human-in-the-Loop Requirements - -**IMPORTANT:** This skill requires explicit user confirmation before executing lifecycle actions. You MUST: - -1. 
**Wait for user confirmation** on the specific action (start/stop/restart) before executing `vm_lifecycle` -2. **Do NOT proceed** with the lifecycle operation until the user explicitly approves -3. **Present the action clearly** with VM name, namespace, and intended state change -4. **Never auto-execute** VM lifecycle changes without user approval - these operations impact running services -5. **Warn about consequences** - stopping VMs interrupts services, starting consumes resources, restarting causes brief downtime - -If the user says "no" or wants to reconsider, do not proceed with the operation. - -**Why this matters:** -- **Start**: Consumes cluster resources (CPU, memory), affects resource availability for other VMs -- **Stop**: Interrupts running services and applications, may cause service downtime -- **Restart**: Causes brief service interruption, running processes will be terminated -- User should verify they're targeting the correct VM and understand the impact - ## Prerequisites **Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) @@ -134,7 +118,7 @@ Please respond with your choice. - "/vm-lifecycle-manager" (explicit command) **Do NOT use this skill when:** -- User wants to create a new VM → Use `/vm-creator` skill instead +- User wants to create a new VM → Use `/vm-create` skill instead - User wants to list VMs → Use `/vm-inventory` skill instead - User wants to delete a VM → Different operation (not lifecycle management) @@ -405,7 +389,7 @@ Would you like help troubleshooting this error? 1. Check cluster resource availability 2. Stop other VMs first to free resources 3. Consider scaling cluster nodes -4. Resize VM to smaller instance type (using `/vm-creator` to recreate) +4. 
Resize VM to smaller instance type (using `/vm-create` to recreate) ## Understanding RunStrategy @@ -424,7 +408,7 @@ When you execute lifecycle actions, the VM's `runStrategy` changes: **Note**: This skill sets `Always` for start/restart and `Halted` for stop. -## Advanced Usage +## Advanced Features ### Batch Operations @@ -476,7 +460,7 @@ User: "Start web-server if it's not running" - `vm_lifecycle` (from openshift-virtualization) - Manage VM power state (start/stop/restart) ### Related Skills -- `vm-creator` - Create VMs before managing their lifecycle +- `vm-create` - Create VMs before managing their lifecycle - `vm-inventory` - Check current VM status before lifecycle operations - `vm-troubleshooter` (planned) - Diagnose VM startup/shutdown issues @@ -484,11 +468,29 @@ User: "Start web-server if it's not running" - [lifecycle-errors.md](../../docs/troubleshooting/lifecycle-errors.md) - VM start/stop failures and stuck transitions (optionally consulted when lifecycle operations fail) - [scheduling-errors.md](../../docs/troubleshooting/scheduling-errors.md) - ErrorUnschedulable and resource constraint errors (optionally consulted when VM won't start) - [Troubleshooting INDEX](../../docs/troubleshooting/INDEX.md) - Navigation hub for discovering additional error categories when encountering unexpected issues outside the categories above -- [OpenShift Virtualization Documentation](https://docs.openshift.com/container-platform/latest/virt/about_virt/about-virt.html) +- [OpenShift Virtualization Documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt/about_virt/about-virt.html) - [KubeVirt VirtualMachine Lifecycle](https://kubevirt.io/user-guide/virtual_machines/lifecycle/) - [RunStrategy Documentation](https://kubevirt.io/user-guide/virtual_machines/run_strategies/) - [OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server) +## Critical: Human-in-the-Loop Requirements + 
+**IMPORTANT:** This skill requires explicit user confirmation before executing lifecycle actions. You MUST: + +1. **Wait for user confirmation** on the specific action (start/stop/restart) before executing `vm_lifecycle` +2. **Do NOT proceed** with the lifecycle operation until the user explicitly approves +3. **Present the action clearly** with VM name, namespace, and intended state change +4. **Never auto-execute** VM lifecycle changes without user approval - these operations impact running services +5. **Warn about consequences** - stopping VMs interrupts services, starting consumes resources, restarting causes brief downtime + +If the user says "no" or wants to reconsider, do not proceed with the operation. + +**Why this matters:** +- **Start**: Consumes cluster resources (CPU, memory), affects resource availability for other VMs +- **Stop**: Interrupts running services and applications, may cause service downtime +- **Restart**: Causes brief service interruption, running processes will be terminated +- User should verify they're targeting the correct VM and understand the impact + ## Security Considerations - VM lifecycle changes require RBAC permissions (update VirtualMachine resources) diff --git a/rh-virt/skills/vm-snapshot-create/SKILL.md b/rh-virt/skills/vm-snapshot-create/SKILL.md new file mode 100644 index 0000000..3c927bd --- /dev/null +++ b/rh-virt/skills/vm-snapshot-create/SKILL.md @@ -0,0 +1,423 @@ +--- +name: vm-snapshot-create +description: | + Create virtual machine snapshots for backup and recovery with comprehensive storage backend analysis. + + Use when: + - "Create a snapshot of VM [name]" + - "Backup VM [name] before upgrade" + - "Take a snapshot of [vm]" + + Analyzes storage capabilities, guest agent status, and consistency guarantees before creating snapshots. + + NOT for VM cloning (use vm-clone to create independent copies). 
+ +model: inherit +color: green +--- + +# /vm-snapshot-create Skill + +Create virtual machine snapshots in OpenShift Virtualization. Snapshots capture the state and data of a VM at a specific point in time, enabling backup, recovery, and testing workflows. + +**Implementation Note**: This skill uses generic Kubernetes resource tools (`resources_create_or_update`) to manage VirtualMachineSnapshot resources. Dedicated snapshot tools do not currently exist in the openshift-virtualization MCP server. + +## Prerequisites + +**Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) + +**Required MCP Tools**: +- `resources_create_or_update` (from openshift-virtualization) - Create VirtualMachineSnapshot +- `resources_get` (from openshift-virtualization) - Verify VM exists and get status +- `resources_list` (from openshift-virtualization) - List StorageClass, VolumeSnapshotClass + +**Required Environment Variables**: +- `KUBECONFIG` - Path to Kubernetes configuration file with cluster access + +**Required Cluster Setup**: +- OpenShift cluster (>= 4.19) +- OpenShift Virtualization operator installed +- ServiceAccount with RBAC permissions to create VirtualMachineSnapshot resources +- Storage backend with snapshot support (CSI driver with snapshot capabilities) + +## When to Use This Skill + +**Trigger this skill when:** +- User wants to create a backup of a VM before changes +- User wants to create a recovery point +- User explicitly requests snapshot creation + +**User phrases that trigger this skill:** +- "Create a snapshot of VM database-01" +- "Backup VM web-server before upgrade" +- "Take a snapshot of production-app" + +**Do NOT use this skill when:** +- User wants to clone a VM → Use `vm-clone` skill (creates independent copy) +- User wants to list snapshots → Use `vm-snapshot-list` skill +- User wants to restore from snapshot → Use `vm-snapshot-restore` skill + +## Workflow + +### Step 1: Gather 
Snapshot Information + +**Required Information from User:** +1. **VM Name** - Name of the VM to snapshot +2. **Namespace** - Namespace where VM exists +3. **Snapshot Name** (Optional) - Name for the snapshot (auto-generated if not provided) + +If namespace not provided, ask for it explicitly. + +### Step 2: Verify VM Exists and Get Status + +**MCP Tool**: `resources_get` (from openshift-virtualization) + +**Parameters**: +```json +{ + "apiVersion": "kubevirt.io/v1", + "kind": "VirtualMachine", + "namespace": "", + "name": "" +} +``` + +**Expected Output**: VirtualMachine resource with status + +**Error Handling**: +- If VM not found → Report error, suggest using vm-inventory skill +- If permission denied → Report RBAC error + +**Extract VM Details:** +- Current status (Running, Stopped) +- Storage configuration (DataVolumes, PVCs) +- **IMPORTANT**: Save `status.volumeSnapshotStatuses` for storage analysis + +### Step 3: Verify Storage Snapshot Capabilities + +**CRITICAL: This comprehensive storage analysis MUST execute BEFORE asking user about VM running state.** + +This step analyzes storage backend capabilities to determine snapshot behavior and requirements. The analysis includes 9 substeps. + +[Continue with all 9 substeps from the original file: 1c.1 through 1c.9, checking volume snapshot status, hot-plugged volumes, storage class, VolumeSnapshotClass, CSI driver capabilities, guest agent status, Windows VSS, and storing analysis results] + +### Step 4: Check VM Running State (Enhanced with Storage Analysis) + +**From the VM resource in Step 2**, check `status.printableStatus`. + +**Use storage analysis results from Step 3** to provide accurate guidance. 
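The decision in this step can be sketched as a small classifier. This is a minimal illustration, not part of the skill or the MCP server: `classify_snapshot_scenario`, `SnapshotScenario`, and the `online_supported` flag (standing in for the Step 3 storage analysis result) are hypothetical names, and the sketch assumes `status.printableStatus` reports `Stopped` for a halted VM, as listed in Step 2.

```python
from enum import Enum


class SnapshotScenario(Enum):
    """The three Step 4 outcomes (names are illustrative)."""
    MUST_STOP = "vm-must-be-stopped"
    ONLINE_OK = "vm-can-run"
    ALREADY_STOPPED = "vm-is-stopped"


def classify_snapshot_scenario(vm: dict, online_supported: bool) -> SnapshotScenario:
    """Map the VM resource from Step 2 and the Step 3 result to a scenario.

    `vm` is the VirtualMachine resource as a plain dict.
    `online_supported` is an assumed boolean summarising the Step 3 analysis.
    """
    status = vm.get("status", {}).get("printableStatus", "")
    if status == "Stopped":
        return SnapshotScenario.ALREADY_STOPPED
    # Any status other than "Stopped" is treated as running here,
    # so a missing or transitional status never skips the storage check.
    if online_supported:
        return SnapshotScenario.ONLINE_OK
    return SnapshotScenario.MUST_STOP
```

Treating every non-`Stopped` status as running is a deliberately cautious choice: a VM in a transitional state is handled like a running one, so the user is still asked before anything is stopped.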
+ +[Include the three scenarios: VM must be stopped, VM can run (online supported), VM is stopped - with all the guest agent and Windows VSS warnings] + +### Step 5: Stop Running VM (if user chose "stop-and-snapshot") + +**ONLY execute if user chose "stop-and-snapshot" in Step 4.** + +Use `vm_lifecycle` MCP tool or vm-lifecycle-manager skill to stop the VM. + +### Step 6: Estimate Storage Consumption + +**From the VM resource**, estimate snapshot storage: +- Initial snapshot may be same size as VM disk +- Subsequent snapshots smaller (only deltas) + +### Step 7: Present Snapshot Configuration for Confirmation + +**Include storage analysis results from Step 3 in the configuration presentation.** + +[Include the full confirmation template with storage backend analysis, guest agent status, volumes to snapshot, etc.] + +**Wait for user confirmation.** + +**Handle response:** +- If "yes" → Proceed to Step 8 (execute snapshot) +- If "no", "cancel", or anything else → Cancel operation + +### Step 8: Create the Snapshot + +**ONLY PROCEED AFTER user confirmation in Step 7.** + +**MCP Tool**: `resources_create_or_update` (from openshift-virtualization) + +**Construct VirtualMachineSnapshot YAML:** + +```yaml +apiVersion: snapshot.kubevirt.io/v1beta1 +kind: VirtualMachineSnapshot +metadata: + name: + namespace: +spec: + source: + apiGroup: kubevirt.io + kind: VirtualMachine + name: +``` + +**If snapshot name not provided by user**, generate one: +- Format: `-snapshot-` +- Example: `database-01-snapshot-20260218-143022` + +**Parameters**: +```json +{ + "resource": "apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineSnapshot\nmetadata:\n name: \n namespace: \nspec:\n source:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: " +} +``` + +**Report progress:** +```markdown +📸 Creating VM snapshot... 
+✓ Snapshot `` created for VM `` +``` + +### Step 9: Monitor Snapshot Status + +**After creation, monitor snapshot readiness using `resources_get`.** + +Check `status.phase`: +- `InProgress` → Still creating +- `Succeeded` → Snapshot ready +- `Failed` → Snapshot failed + +**Wait up to 5 minutes for snapshot to complete.** + +### Step 10: Report Snapshot Creation Results + +**Extract snapshot indications** from `status.indications`: +- `GuestAgent` - Guest agent coordinated the snapshot +- `Online` - Snapshot taken while VM was running + +**On success:** + +```markdown +## ✓ VM Snapshot Created Successfully + +**VM**: `` (namespace: ``) +**Snapshot**: `` + +### Snapshot Details +- **Name**: `` +- **Status**: Ready +- **Created**: +- **VM Status at Snapshot**: + +### Snapshot Coordination (from status.indications) + +- ✅ **Guest Agent Coordination**: Active +- ✅ **Filesystem Freeze/Thaw**: Performed during snapshot +- ✅ **Actual Consistency**: Application-consistent + + + +- ⚠️ **Guest Agent Coordination**: Not active +- ⚠️ **Actual Consistency**: Crash-consistent (best-effort) + + +### Next Steps + +**To list all snapshots:** +"List snapshots for VM " + +**To restore from this snapshot:** +"Restore VM from snapshot " + +**To delete this snapshot:** +"Delete snapshot " +``` + +## Dependencies + +### Required MCP Servers +- `openshift-virtualization` - OpenShift MCP server with kubevirt toolset + +### Required MCP Tools +- `resources_create_or_update` (from openshift-virtualization) - Create VirtualMachineSnapshot +- `resources_get` (from openshift-virtualization) - Verify VM and snapshot status +- `resources_list` (from openshift-virtualization) - List StorageClass, VolumeSnapshotClass + +### Related Skills +- `vm-snapshot-list` - List snapshots after creation +- `vm-snapshot-restore` - Restore VMs from snapshots +- `vm-snapshot-delete` - Delete old snapshots +- `vm-lifecycle-manager` - Stop VMs before snapshot +- `vm-inventory` - List VMs before creating snapshots + 
+### Reference Documentation + +**Official Red Hat Documentation:** +- [OpenShift Virtualization Snapshots - OpenShift 4.21](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt-managing-vm-snapshots) +- [Live Snapshots in OpenShift Virtualization](https://www.redhat.com/en/blog/live-snapshots-in-openshift-virtualization) + +**Upstream Documentation:** +- [KubeVirt VM Snapshots](https://kubevirt.io/user-guide/operations/snapshot_restore_api/) +- [CSI Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/) + +## Critical: Human-in-the-Loop Requirements + +**IMPORTANT:** This skill performs operations that affect VM data and storage. You MUST: + +1. **Before Creating Snapshots** + - Analyze storage backend capabilities + - Verify VM exists and get current state + - Check for hot-plugged volumes (blocks snapshots) + - Show storage consumption estimate + - Present snapshot configuration + - Ask: "Proceed with snapshot creation? (yes/no)" + - Wait for explicit "yes" + +2. **Never Auto-Execute** + - **NEVER create without user confirmation** + - **NEVER skip storage analysis** + - **NEVER skip hot-plugged volume check** + +**Why This Matters:** +- **Storage Consumption**: Snapshots consume storage proportional to VM disk size +- **Hot-Plugged Volumes**: Cannot snapshot VMs with hot-plugged volumes +- **Consistency**: Online vs offline snapshots have different consistency guarantees +- **Guest Agent**: Required for application-consistent snapshots + +## Common Issues + +### Issue 1: Snapshot Creation Fails - VolumeSnapshotClass Not Found + +**Error**: "VolumeSnapshotClass not found" or "CSI driver doesn't support snapshots" + +**Cause**: The storage backend doesn't have a VolumeSnapshotClass configured for the CSI driver, or the CSI driver doesn't support snapshots at all. + +**Solution:** +1. 
**Check if VolumeSnapshotClass exists**: Use `resources_list` with apiVersion="snapshot.storage.k8s.io/v1", kind="VolumeSnapshotClass" +2. **Verify CSI driver supports snapshots**: Check StorageClass provisioner field +3. **Contact cluster admin**: Request VolumeSnapshotClass configuration for your storage backend +4. **Alternative**: Use `vm-clone` skill for VM backup instead of snapshots + +### Issue 2: Snapshot Creation Blocked - Hot-Plugged Volumes Detected + +**Error**: "Cannot create snapshot - VM has hot-plugged volumes" + +**Cause**: The VM has volumes that were attached after VM creation without restarting the VM. Hot-plugged volumes block snapshot creation in OpenShift Virtualization. + +**Solution:** +1. **Stop the VM**: Use vm-lifecycle-manager skill to stop the VM +2. **Remove hot-plugged volumes**: Detach volumes that aren't needed +3. **Persist volumes to VM spec**: Add hot-plugged volumes to `spec.template.spec.volumes` to make them permanent +4. **Restart the VM**: Start the VM to apply the changes +5. **Retry snapshot**: Once hot-plugged volumes are resolved, create the snapshot + +**Related**: See [OpenShift Virtualization documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt-hot-plugging-virtual-disks) for hot-plugging details + +### Issue 3: Snapshot Created but Consistency Warning + +**Error**: Snapshot created successfully but shows "crash-consistent" without guest agent + +**Cause**: The VM doesn't have QEMU guest agent installed or running, so the snapshot couldn't coordinate filesystem freeze/thaw during creation. + +**Solution:** +1. **For Linux VMs**: Install qemu-guest-agent package + ```bash + # RHEL/CentOS/Fedora + sudo dnf install qemu-guest-agent + sudo systemctl enable --now qemu-guest-agent + ``` +2. **For Windows VMs**: Install VirtIO drivers which include the guest agent +3. **Verify agent status**: Check VM status for `AgentConnected: True` condition +4. 
**Future snapshots**: Once guest agent is running, subsequent snapshots will be application-consistent +5. **Current snapshot**: The crash-consistent snapshot is still usable, but may have minor inconsistencies + +## Security Considerations + +- **RBAC Enforcement**: Requires permissions for VirtualMachineSnapshot resources +- **Storage Quotas**: Respects namespace storage quotas +- **Hot-Plugged Volume Detection**: Prevents snapshots when hot-plugged volumes present +- **KUBECONFIG Security**: Credentials never exposed in output +- **Namespace Isolation**: Snapshots scoped to namespace boundaries +- **Audit Trail**: All snapshot operations logged in Kubernetes API audit logs + +## Example Usage + +### Example 1: Create Snapshot Before Upgrade + +``` +User: "Create a snapshot of database-01 before I upgrade it" + +Agent: "Which namespace is database-01 in?" + +User: "production" + +Agent: [Invokes vm-snapshot-create skill] + [Step 2: Verifies VM exists] + [Step 3: Analyzes storage backend - AWS EBS, online snapshots supported] + [Step 4: Checks VM status - Running, guest agent active] + [Step 6: Estimates storage - 50Gi] + [Step 7: Presents configuration] + +## VM Snapshot - Review Configuration + +**VM to Snapshot:** +- **Name**: `database-01` +- **Namespace**: `production` +- **Status**: Running +- **Storage**: 50Gi + +**Storage Backend Analysis:** +- **Storage Class**: `gp3-csi` +- **CSI Driver**: `ebs.csi.aws.com` +- **Online Snapshots**: ✅ Supported +- **Consistency Level**: `crash-consistent` +- **VolumeSnapshotClass**: `ebs-snapclass` ✓ + +**Guest Agent Status:** +- ✅ **QEMU Guest Agent**: Connected and running +- ✅ **Filesystem Freeze/Thaw**: Will be performed during snapshot +- ✅ **Expected Consistency**: Application-consistent (with freeze) + +**Snapshot Details:** +- **Snapshot Name**: `database-01-snapshot-20260218-143022` (auto-generated) +- **Type**: Online snapshot +- **Estimated Storage**: ~50Gi (initial snapshot) + +**Proceed with snapshot 
creation? (yes/no)** + +User: "yes" + +Agent: [Step 8: Creates snapshot] + +📸 Creating VM snapshot... +✓ Snapshot `database-01-snapshot-20260218-143022` created for VM `database-01` + + [Step 9: Monitors status] + +⏳ Snapshot creation in progress... + Status: InProgress + +⏳ Snapshot creation in progress... + Status: Succeeded + + [Step 10: Reports results] + +## ✓ VM Snapshot Created Successfully + +**VM**: `database-01` (namespace: `production`) +**Snapshot**: `database-01-snapshot-20260218-143022` + +### Snapshot Details +- **Name**: `database-01-snapshot-20260218-143022` +- **Status**: Ready +- **Created**: 2026-02-18 14:30:22 UTC +- **VM Status at Snapshot**: Running + +### Snapshot Coordination (from status.indications) +- ✅ **Guest Agent Coordination**: Active +- ✅ **Filesystem Freeze/Thaw**: Performed during snapshot +- ✅ **Actual Consistency**: Application-consistent + +### Next Steps + +**To list all snapshots:** +"List snapshots for VM database-01" + +**To restore from this snapshot:** +"Restore VM database-01 from snapshot database-01-snapshot-20260218-143022" + +You can now safely upgrade the database. If the upgrade fails, restore using the command above. +``` diff --git a/rh-virt/skills/vm-snapshot-delete/SKILL.md b/rh-virt/skills/vm-snapshot-delete/SKILL.md new file mode 100644 index 0000000..14f2fb4 --- /dev/null +++ b/rh-virt/skills/vm-snapshot-delete/SKILL.md @@ -0,0 +1,447 @@ +--- +name: vm-snapshot-delete +description: | + Permanently delete virtual machine snapshots to free storage space. + + Use when: + - "Delete snapshot [snapshot-name]" + - "Remove old snapshots for VM [name]" + - "Free up snapshot storage" + + Requires user confirmation before deletion. + + NOT for restoring VMs (use vm-snapshot-restore instead). + +model: inherit +color: yellow +--- + +# /vm-snapshot-delete Skill + +Permanently delete virtual machine snapshots in OpenShift Virtualization. Deleting snapshots frees storage but removes recovery points. 
+ +## Prerequisites + +**Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) + +**Required MCP Tools**: +- `resources_get` (from openshift-virtualization) - Verify snapshot exists +- `resources_list` (from openshift-virtualization) - List related snapshots +- `resources_delete` (from openshift-virtualization) - Delete snapshot + +**Required Environment Variables**: +- `KUBECONFIG` - Path to Kubernetes configuration file with cluster access + +**Required Cluster Setup**: +- OpenShift cluster (>= 4.19) +- OpenShift Virtualization operator installed +- ServiceAccount with RBAC permissions to delete VirtualMachineSnapshot resources + +### Prerequisite Verification + +**Before executing, verify MCP server availability:** + +1. **Check MCP Server Configuration** + - Verify `openshift-virtualization` exists in `.mcp.json` + - If missing → Report to user with setup instructions + +2. **Check Environment Variables** + - Verify `KUBECONFIG` is set (check presence only, never expose value) + - If missing → Report to user + +## When to Use This Skill + +**Trigger this skill when:** +- User wants to free storage by removing old snapshots +- User wants to delete a specific snapshot +- User wants to implement snapshot retention policies + +**User phrases that trigger this skill:** +- "Delete snapshot pre-upgrade-backup" +- "Remove old snapshots for VM database-01" +- "Delete all snapshots older than 7 days" +- "Free up snapshot storage" + +**Do NOT use this skill when:** +- User wants to create snapshots → Use `vm-snapshot-create` skill +- User wants to restore from snapshot → Use `vm-snapshot-restore` skill +- User wants to list snapshots → Use `vm-snapshot-list` skill + +## Workflow + +### Step 1: Gather Delete Information + +**Required Information from User:** +1. **Snapshot Name** - Name of snapshot to delete +2. **Namespace** - Namespace where snapshot exists + +If namespace not provided, ask for it. 
+ +### Step 2: Verify Snapshot Exists + +**MCP Tool**: `resources_get` (from openshift-virtualization) + +**Parameters**: +```json +{ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "", + "name": "" +} +``` + +**Expected Output**: VirtualMachineSnapshot resource + +**Error Handling**: +- If snapshot not found → Report error + +**If snapshot not found:** +```markdown +❌ Snapshot Not Found + +**Snapshot**: `` not found in namespace ``. + +**To list available snapshots:** +"List snapshots in namespace " + +Delete operation cancelled. +``` + +**STOP workflow**. + +**If snapshot found**, extract snapshot details: +- `spec.source.name` - VM name +- `metadata.creationTimestamp` - Creation timestamp +- `status.phase` - Snapshot status +- Calculate age from creationTimestamp + +### Step 3: List Other Snapshots for Same VM + +**MCP Tool**: `resources_list` (from openshift-virtualization) + +**Parameters**: +```json +{ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "", + "labelSelector": "vm.kubevirt.io/name=" +} +``` + +**Fallback**: If label selector doesn't work, list all snapshots and filter by `spec.source.name`. + +**Count snapshots** for the VM to determine if this is the last snapshot. + +### Step 4: Present Snapshot Details and Confirm Deletion + +```markdown +## ⚠️ Snapshot Deletion - Review + +**Snapshot to Delete**: `` + +### Snapshot Details +- **Snapshot Name**: `` +- **VM**: `` +- **Namespace**: `` +- **Created**: +- **Age**: +- **Status**: + +### Impact of Deletion +- ✗ Snapshot will be permanently deleted +- ✗ This recovery point will be lost +- ✗ Cannot restore VM to this snapshot state after deletion +- ✓ Storage will be freed + +### Recovery Impact +**Before deletion, consider:** +- Is this snapshot still needed for recovery? +- Are there other recovery points available? +- Could you need to restore to this state in the future? 
+ +**Available snapshots for VM ``:** + + + +⚠️ **WARNING**: This is the ONLY snapshot for VM ``. After deletion, no snapshot recovery points will exist. + + +--- + +**Proceed with snapshot deletion? This action cannot be undone. (yes/no)** +``` + +**Wait for user confirmation.** + +**Handle response:** +- If "yes" → Proceed to Step 5 (execute deletion) +- If "no", "cancel", or anything else → Cancel operation + +**On cancellation:** +```markdown +Snapshot deletion cancelled by user. Snapshot `` preserved. +``` + +**STOP workflow**. + +### Step 5: Delete the Snapshot + +**ONLY PROCEED AFTER user confirmation in Step 4.** + +**MCP Tool**: `resources_delete` (from openshift-virtualization) + +**Parameters**: +```json +{ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "", + "name": "" +} +``` + +**Example tool invocation:** +```json +resources_delete({ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "production", + "name": "old-snapshot" +}) +``` + +**Expected Output**: VirtualMachineSnapshot deleted successfully + +**Error Handling**: +- If snapshot not found → Report error (may have been deleted externally) +- If permission denied → Report RBAC error +- If snapshot in use → Report error (snapshot may be in restore process) + +**Report progress:** +```markdown +🗑️ Deleting snapshot... +✓ Snapshot `` deleted +``` + +### Step 6: Report Deletion Results + +**On success:** + +```markdown +## ✓ Snapshot Deleted Successfully + +**Snapshot**: `` (VM: ``, namespace: ``) + +### Deletion Summary +- ✓ Snapshot permanently deleted +- ✓ Storage freed +- ✓ Recovery point removed + +### Impact +- ✗ Cannot restore VM to state +- ✗ Snapshot `` no longer available + + +### Remaining Snapshots for VM `` + + + +These snapshots are still available for recovery. + + + +⚠️ **No snapshots remain** for VM ``. Consider creating new snapshots for future recovery points. 
+ + +--- + +### Next Steps + +**To create a new snapshot:** +"Create snapshot of VM " + +**To list remaining snapshots:** +"List snapshots for VM " +``` + +**On failure:** + +```markdown +## ❌ Snapshot Deletion Failed + +**Error**: + +**Snapshot**: `` (VM: ``, namespace: ``) + +**Common Causes:** +- **Snapshot not found** - May have been deleted externally +- **Insufficient RBAC permissions** - ServiceAccount lacks delete permissions +- **Snapshot in use** - Snapshot may be in active restore process +- **Storage backend error** - CSI driver or storage backend issue + +**Troubleshooting Steps:** + +1. **Verify snapshot still exists:** + "List snapshots for VM " + +2. **Check if snapshot is being used for restore:** + Use `resources_list` to check for active VirtualMachineRestore resources + +3. **Check permissions:** + Use CLI: `oc auth can-i delete virtualmachinesnapshots -n ` + +4. **Wait and retry** if snapshot is in use by restore operation + +Would you like help troubleshooting this error? +``` + +## Common Issues + +### Issue 1: Snapshot Not Found + +**Error**: "Snapshot `` not found in namespace ``" + +**Cause**: Snapshot doesn't exist, was deleted, or wrong namespace/name. + +**Solution:** +1. List snapshots to verify name: "List snapshots in namespace " +2. Check spelling (names are case-sensitive) +3. Try listing in other namespaces if unsure + +### Issue 2: Snapshot In Use During Restore + +**Error**: "Snapshot is in use by restore operation" + +**Cause**: An active VirtualMachineRestore is using this snapshot. + +**Solution:** +1. Check for active restores: Use `resources_list` with apiVersion="snapshot.kubevirt.io/v1beta1", kind="VirtualMachineRestore" +2. Wait for restore to complete, or delete the VirtualMachineRestore resource +3. Retry snapshot deletion + +### Issue 3: Permission Denied + +**Error**: "Forbidden: User lacks permissions to delete virtualmachinesnapshots" + +**Cause**: Missing RBAC permissions for snapshot deletion. 
+ +**Solution:** +1. Check permissions: `oc auth can-i delete virtualmachinesnapshots -n ` +2. Contact cluster admin to grant delete permissions for virtualmachinesnapshots +3. Required permissions: delete verb on snapshot.kubevirt.io/virtualmachinesnapshots + +## Dependencies + +### Required MCP Servers +- `openshift-virtualization` - OpenShift MCP server with kubevirt toolset + +### Required MCP Tools +- `resources_get` (from openshift-virtualization) - Get snapshot details + - Parameters: apiVersion, kind, namespace, name + - Source: https://github.com/openshift/openshift-mcp-server + +- `resources_list` (from openshift-virtualization) - List related snapshots + - Parameters: apiVersion, kind, namespace, labelSelector + - Source: https://github.com/openshift/openshift-mcp-server + +- `resources_delete` (from openshift-virtualization) - Delete Kubernetes resources + - Parameters: apiVersion, kind, namespace, name + - Source: https://github.com/openshift/openshift-mcp-server + +### Related Skills +- `vm-snapshot-list` - List snapshots before deletion +- `vm-snapshot-create` - Create new snapshots +- `vm-snapshot-restore` - Restore VMs from snapshots + +### Reference Documentation +- [OpenShift Virtualization Snapshots](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt-managing-vm-snapshots) +- [KubeVirt VM Snapshots](https://kubevirt.io/user-guide/operations/snapshot_restore_api/) + +## Critical: Human-in-the-Loop Requirements + +**IMPORTANT:** This skill performs destructive operations. You MUST: + +1. **Before Deleting Snapshots** + - Show snapshot details (VM, age, size) + - Confirm snapshot won't be needed for recovery + - List other available snapshots for the VM + - Ask: "Proceed with snapshot deletion? (yes/no)" + - Wait for explicit "yes" + +2. 
**Never Auto-Execute** + - **NEVER delete without user confirmation** + - **ALWAYS show what will be lost before deletion** + +**Why This Matters:** +- **Recovery Point Loss**: Deleting snapshots removes recovery options permanently +- **No Undo**: Snapshot deletion cannot be reversed +- **Last Snapshot Warning**: Users must know if deleting the only snapshot for a VM + +## Security Considerations + +- **RBAC Enforcement**: Requires delete permissions for VirtualMachineSnapshot resources +- **User Confirmation**: Always requires explicit "yes" before deletion +- **Last Snapshot Warning**: Warns users when deleting the only snapshot for a VM +- **Namespace Isolation**: Snapshots scoped to namespace boundaries +- **Audit Trail**: Deletions logged in Kubernetes API audit logs + +## Example Usage + +``` +User: "Delete snapshot database-01-daily-backup" + +Agent: "Which namespace is the snapshot in?" + +User: "production" + +Agent: [Invokes vm-snapshot-delete skill] + [Step 2: Verifies snapshot exists] + [Step 3: Lists other snapshots] + [Step 4: Presents deletion confirmation] + +## ⚠️ Snapshot Deletion - Review + +Snapshot to Delete: `database-01-daily-backup` + +Snapshot Details: +- Name: `database-01-daily-backup` +- VM: `database-01` +- Namespace: `production` +- Created: 2024-01-14 02:00:00 UTC +- Age: 1 day + +Impact of Deletion: +- ✗ Snapshot will be permanently deleted +- ✗ This recovery point will be lost +- ✓ Storage will be freed + +Available snapshots for VM `database-01`: +- `database-01-pre-upgrade` (created 2024-01-15 10:30) + +Proceed with snapshot deletion? This action cannot be undone. (yes/no) + +User: "yes" + +Agent: [Step 5: Deletes snapshot] + +🗑️ Deleting snapshot... 
+✓ Snapshot `database-01-daily-backup` deleted + + [Step 6: Reports results] + +## ✓ Snapshot Deleted Successfully + +Snapshot: `database-01-daily-backup` + +Deletion Summary: +- ✓ Snapshot permanently deleted +- ✓ Storage freed +- ✓ Recovery point removed + +Remaining Snapshots for VM `database-01`: +- `database-01-pre-upgrade` (created 2024-01-15 10:30) + +This snapshot is still available for recovery. +``` diff --git a/rh-virt/skills/vm-snapshot-list/SKILL.md b/rh-virt/skills/vm-snapshot-list/SKILL.md new file mode 100644 index 0000000..3f199f8 --- /dev/null +++ b/rh-virt/skills/vm-snapshot-list/SKILL.md @@ -0,0 +1,402 @@ +--- +name: vm-snapshot-list +description: | + List virtual machine snapshots across namespaces with status, age, and recovery information. + + Use when: + - "List snapshots for VM [name]" + - "Show snapshots in namespace [name]" + - "What snapshots exist for [vm]?" + + Read-only operation - no user confirmation required. + + NOT for creating/deleting snapshots (use vm-snapshot-create/delete instead). + +model: inherit +color: cyan +--- + +# /vm-snapshot-list Skill + +List virtual machine snapshots in OpenShift Virtualization. This read-only skill displays snapshot information including status, age, size, and recovery options. 
+ +## Prerequisites + +**Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) + +**Required MCP Tools**: +- `resources_list` (from openshift-virtualization) - List VirtualMachineSnapshot resources +- `resources_get` (from openshift-virtualization) - Get snapshot details + +**Required Environment Variables**: +- `KUBECONFIG` - Path to Kubernetes configuration file with cluster access + +**Required Cluster Setup**: +- OpenShift cluster (>= 4.19) +- OpenShift Virtualization operator installed +- ServiceAccount with RBAC permissions to list VirtualMachineSnapshot resources + +### Prerequisite Verification + +**Before executing, verify MCP server availability:** + +1. **Check MCP Server Configuration** + - Verify `openshift-virtualization` exists in `.mcp.json` + - If missing → Report to user with setup instructions + +2. **Check Environment Variables** + - Verify `KUBECONFIG` is set (check presence only, never expose value) + - If missing → Report to user + +**Human Notification Protocol:** + +When prerequisites fail: + +``` +❌ Cannot execute vm-snapshot-list: MCP server 'openshift-virtualization' is not available + +📋 Setup Instructions: +1. Add openshift-virtualization to .mcp.json +2. Set KUBECONFIG environment variable +3. Restart Claude Code to reload MCP servers + +🔗 Documentation: https://github.com/openshift/openshift-mcp-server +``` + +## When to Use This Skill + +**Trigger this skill when:** +- User wants to list available snapshots for recovery +- User wants to see snapshot status and age +- User wants to verify snapshot existence before restore +- User wants to identify old snapshots for deletion + +**User phrases that trigger this skill:** +- "List all snapshots for web-server VM" +- "Show snapshots in namespace production" +- "What snapshots exist?" 
+- "Display VM snapshots" + +**Do NOT use this skill when:** +- User wants to create a snapshot → Use `vm-snapshot-create` skill +- User wants to restore from snapshot → Use `vm-snapshot-restore` skill +- User wants to delete snapshots → Use `vm-snapshot-delete` skill + +## Workflow + +### Step 1: Gather Information + +**Required Information from User:** +1. **Namespace** - Namespace to list snapshots from +2. **VM Name** (Optional) - Filter snapshots by specific VM + +If the user doesn't provide a namespace, ask for it. + +### Step 2: List Snapshots + +**MCP Tool**: `resources_list` (from openshift-virtualization) + +**Parameters** (with VM filter using label selector): +```json +{ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "", + "labelSelector": "vm.kubevirt.io/name=" +} +``` + +**Parameters** (all snapshots in namespace): +```json +{ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "" +} +``` + +**Note**: The `vm.kubevirt.io/name` label is not guaranteed to be set on every snapshot. If the label-filtered query returns no results, fall back to listing all snapshots in the namespace and filter them on the `spec.source.name` field of each result.
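The fallback described in the note can be sketched as follows. This is a minimal illustration, not part of the skill itself: it assumes the two `resources_list` responses have already been parsed into lists of dictionaries, and the helper names (`filter_by_source_vm`, `select_vm_snapshots`) and the sample items are hypothetical.

```python
from typing import Any

Snapshot = dict[str, Any]


def filter_by_source_vm(snapshots: list[Snapshot], vm_name: str) -> list[Snapshot]:
    """Keep only snapshots whose spec.source.name equals vm_name."""
    matches = []
    for snap in snapshots:
        # Tolerate missing or null spec/source sections.
        source = (snap.get("spec") or {}).get("source") or {}
        if source.get("name") == vm_name:
            matches.append(snap)
    return matches


def select_vm_snapshots(
    label_results: list[Snapshot],
    namespace_results: list[Snapshot],
    vm_name: str,
) -> list[Snapshot]:
    """Prefer label-selector results; fall back to filtering the full list."""
    if label_results:
        return label_results
    return filter_by_source_vm(namespace_results, vm_name)


# Hypothetical, trimmed-down items as a namespace-wide listing might return them.
SAMPLE_ITEMS: list[Snapshot] = [
    {"metadata": {"name": "database-01-pre-upgrade"}, "spec": {"source": {"name": "database-01"}}},
    {"metadata": {"name": "web-server-backup"}, "spec": {"source": {"name": "web-server"}}},
    {"metadata": {"name": "database-01-daily-backup"}, "spec": {"source": {"name": "database-01"}}},
]

if __name__ == "__main__":
    # The label query returned nothing, so the namespace listing is filtered instead.
    for snap in select_vm_snapshots([], SAMPLE_ITEMS, "database-01"):
        print(snap["metadata"]["name"])
```

Preferring the label-filtered result keeps the common case to a single query; the client-side filter only runs when that query comes back empty.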
+ +**Expected Output**: List of VirtualMachineSnapshot resources + +**Parse each snapshot to extract**: +- `metadata.name` - Snapshot name +- `metadata.namespace` - Namespace +- `metadata.creationTimestamp` - Creation time +- `spec.source.name` - VM name +- `status.phase` - Status (InProgress, Succeeded, Failed) +- `status.readyToUse` - Ready for restore (true/false) + +**Error Handling**: +- If namespace not found → Report error +- If permission denied → Report RBAC error +- If no snapshots found → Report "No snapshots found" + +### Step 3: Report Snapshot List + +**If snapshots found:** + +```markdown +## VM Snapshots + +**Namespace**: `` + +**VM**: `` + + +### Available Snapshots + +| Snapshot Name | VM Name | Status | Created | Age | ReadyToUse | +|---------------|---------|--------|---------|-----|------------| +| `pre-upgrade-snapshot` | `database-01` | Succeeded ✓ | 2024-01-15 10:30 | 2 days | true | +| `backup-snapshot` | `database-01` | Succeeded ✓ | 2024-01-10 08:00 | 7 days | true | +| `test-snapshot` | `web-server` | Succeeded ✓ | 2024-01-14 14:20 | 3 days | true | + +**Total Snapshots**: 3 + +--- + +### Snapshot Details + +**Snapshot: `pre-upgrade-snapshot`** +- **VM**: `database-01` +- **Status**: Succeeded ✓ +- **Created**: 2024-01-15 10:30:00 UTC +- **Age**: 2 days +- **Ready to Use**: true + +**Snapshot: `backup-snapshot`** +- **VM**: `database-01` +- **Status**: Succeeded ✓ +- **Created**: 2024-01-10 08:00:00 UTC +- **Age**: 7 days +- **Ready to Use**: true + +**Snapshot: `test-snapshot`** +- **VM**: `web-server` +- **Status**: Succeeded ✓ +- **Created**: 2024-01-14 14:20:00 UTC +- **Age**: 3 days +- **Ready to Use**: true + +--- + +### Actions + +**To restore from a snapshot:** +``` +"Restore VM from snapshot " +``` + +**To delete a snapshot:** +``` +"Delete snapshot " +``` + +**To create a new snapshot:** +``` +"Create snapshot of VM " +``` +``` + +**If no snapshots found:** + +```markdown +## VM Snapshots + +**Namespace**: `` + +**VM**: `` + + 
+**No snapshots found.** + + +No snapshots exist for VM `` in namespace ``. + + +No snapshots exist in namespace ``. + + +**To create a snapshot:** +``` +"Create snapshot of VM " +``` +``` + +## Common Issues + +### Issue 1: Permission Denied + +**Error**: "Forbidden: User lacks permissions to list virtualmachinesnapshots" + +**Cause**: Missing RBAC permissions for listing snapshots. + +**Solution:** +1. Check permissions: `oc auth can-i list virtualmachinesnapshots -n ` +2. Contact cluster admin to grant list/get permissions for virtualmachinesnapshots +3. Try listing in a different namespace where you have permissions + +### Issue 2: No Snapshots Found + +**Error**: "No snapshots exist in namespace ``" + +**Cause**: Namespace has no snapshots, or wrong namespace. + +**Solution:** +1. Verify correct namespace name +2. List snapshots without VM filter to see all snapshots +3. Check other namespaces: Use `namespaces_list` to see available namespaces +4. Check if snapshots were recently deleted: Use `events_list` in namespace + +### Issue 3: Snapshot Shows Failed Status + +**Error**: Snapshot listed but `status.phase: Failed` or `readyToUse: false` + +**Cause**: Snapshot creation failed due to storage issues, hot-plugged volumes, or missing VolumeSnapshotClass. + +**Solution:** +1. Get snapshot details: Use `resources_get` to check `status.conditions` for error messages +2. Check cluster events: Use `events_list` for snapshot-related errors +3. 
Common fixes: + - VolumeSnapshotClass missing: Contact cluster admin + - Hot-plugged volumes: Stop VM, persist volumes, recreate snapshot + - Failed permanently: Delete and recreate snapshot + +## Dependencies + +### Required MCP Servers +- `openshift-virtualization` - OpenShift MCP server with kubevirt toolset + +### Required MCP Tools +- `resources_list` (from openshift-virtualization) - List Kubernetes resources by type + - Used for: Listing VirtualMachineSnapshot resources + - Parameters: apiVersion, kind, namespace, labelSelector (optional) + - Source: https://github.com/openshift/openshift-mcp-server + +### Related Skills +- `vm-snapshot-create` - Create VM snapshots +- `vm-snapshot-restore` - Restore VMs from snapshots +- `vm-snapshot-delete` - Delete VM snapshots +- `vm-inventory` - List VMs before creating snapshots + +### Reference Documentation +- [OpenShift Virtualization Snapshots](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt-managing-vm-snapshots) +- [KubeVirt VM Snapshots](https://kubevirt.io/user-guide/operations/snapshot_restore_api/) + +## Security Considerations + +- **RBAC Enforcement**: Requires list permissions for VirtualMachineSnapshot resources +- **Namespace Isolation**: Only lists snapshots within specified namespace +- **Read-Only Operation**: No modifications or destructive actions +- **KUBECONFIG Security**: Credentials never exposed in output + +## Example Usage + +### Example 1: List Snapshots for Specific VM + +``` +User: "List all snapshots for database-01" + +Agent: "Which namespace is database-01 in?" 
+ +User: "production" + +Agent: [Invokes vm-snapshot-list skill] + [Lists snapshots with VM filter] + +## VM Snapshots + +**Namespace**: `production` +**VM**: `database-01` + +### Available Snapshots + +| Snapshot Name | VM Name | Status | Created | Age | ReadyToUse | +|---------------|---------|--------|---------|-----|------------| +| `database-01-pre-upgrade` | `database-01` | Succeeded ✓ | 2024-01-15 10:30 | 2 hours | true | +| `database-01-daily-backup` | `database-01` | Succeeded ✓ | 2024-01-14 02:00 | 1 day | true | + +**Total Snapshots**: 2 + +--- + +### Snapshot Details + +**Snapshot: `database-01-pre-upgrade`** +- **VM**: `database-01` +- **Status**: Succeeded ✓ +- **Created**: 2024-01-15 10:30:00 UTC +- **Age**: 2 hours +- **Ready to Use**: true + +**Snapshot: `database-01-daily-backup`** +- **VM**: `database-01` +- **Status**: Succeeded ✓ +- **Created**: 2024-01-14 02:00:00 UTC +- **Age**: 1 day +- **Ready to Use**: true + +--- + +### Actions + +**To restore from a snapshot:** +"Restore VM database-01 from snapshot " + +**To delete a snapshot:** +"Delete snapshot " + +**To create a new snapshot:** +"Create snapshot of VM database-01" +``` + +### Example 2: List All Snapshots in Namespace + +``` +User: "Show me all snapshots in namespace production" + +Agent: [Invokes vm-snapshot-list skill] + [Lists all snapshots without VM filter] + +## VM Snapshots + +**Namespace**: `production` + +### Available Snapshots + +| Snapshot Name | VM Name | Status | Created | Age | ReadyToUse | +|---------------|---------|--------|---------|-----|------------| +| `database-01-pre-upgrade` | `database-01` | Succeeded ✓ | 2024-01-15 10:30 | 2 hours | true | +| `database-01-daily-backup` | `database-01` | Succeeded ✓ | 2024-01-14 02:00 | 1 day | true | +| `web-server-backup` | `web-server` | Succeeded ✓ | 2024-01-13 03:00 | 2 days | true | + +**Total Snapshots**: 3 + +VMs with snapshots: `database-01` (2), `web-server` (1) +``` + +### Example 3: No Snapshots Found + +``` 
+User: "List snapshots for test-vm" + +Agent: "Which namespace is test-vm in?" + +User: "dev" + +Agent: [Invokes vm-snapshot-list skill] + [Lists snapshots - none found] + +## VM Snapshots + +**Namespace**: `dev` +**VM**: `test-vm` + +**No snapshots found.** + +No snapshots exist for VM `test-vm` in namespace `dev`. + +**To create a snapshot:** +"Create snapshot of VM test-vm" +``` diff --git a/rh-virt/skills/vm-snapshot-restore/SKILL.md b/rh-virt/skills/vm-snapshot-restore/SKILL.md new file mode 100644 index 0000000..4fbd8e6 --- /dev/null +++ b/rh-virt/skills/vm-snapshot-restore/SKILL.md @@ -0,0 +1,580 @@ +--- +name: vm-snapshot-restore +description: | + Restore virtual machines from snapshots with strict safety confirmations to prevent data loss. + + Use when: + - "Restore VM [name] from snapshot [snapshot-name]" + - "Roll back VM [name] to snapshot" + - "Recover VM [name] from backup" + + CRITICAL: Requires VM to be stopped and typed snapshot name confirmation before restore. + + NOT for creating snapshots (use vm-snapshot-create instead). + +model: inherit +color: red +--- + +# /vm-snapshot-restore Skill + +Restore virtual machines from snapshots in OpenShift Virtualization. **CRITICAL**: This operation replaces current VM state with snapshot data. ALL changes since the snapshot will be LOST. + +**Implementation Note**: This skill uses generic Kubernetes resource tools (`resources_create_or_update`) to create VirtualMachineRestore resources. Dedicated restore tools do not currently exist in the openshift-virtualization MCP server. 
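As a rough sketch of what "using generic resource tools" means in practice, the snippet below assembles the VirtualMachineRestore manifest text that Step 7 passes to `resources_create_or_update`. The helper names are hypothetical, and the timestamp layout is an assumption inferred from the `restore-database-01-20260218-143500` example in Step 7.

```python
from datetime import datetime, timezone


def restore_name(vm_name: str, now: datetime) -> str:
    """Build a restore name such as restore-database-01-20260218-143500."""
    # Assumed timestamp layout: YYYYMMDD-HHMMSS.
    return f"restore-{vm_name}-{now.strftime('%Y%m%d-%H%M%S')}"


def build_restore_manifest(
    vm_name: str, namespace: str, snapshot_name: str, now: datetime
) -> str:
    """Return the VirtualMachineRestore manifest as YAML text."""
    lines = [
        "apiVersion: snapshot.kubevirt.io/v1beta1",
        "kind: VirtualMachineRestore",
        "metadata:",
        f"  name: {restore_name(vm_name, now)}",
        f"  namespace: {namespace}",
        "spec:",
        "  target:",
        "    apiGroup: kubevirt.io",
        "    kind: VirtualMachine",
        f"    name: {vm_name}",
        f"  virtualMachineSnapshotName: {snapshot_name}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        build_restore_manifest(
            "database-01",
            "production",
            "pre-upgrade-snapshot",
            datetime.now(timezone.utc),
        )
    )
```

The returned string is what would go into the tool's `resource` parameter; building it from fixed lines avoids a YAML library dependency for a manifest this small.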
+ +## Prerequisites + +**Required MCP Server**: `openshift-virtualization` ([OpenShift MCP Server](https://github.com/openshift/openshift-mcp-server)) + +**Required MCP Tools**: +- `resources_create_or_update` (from openshift-virtualization) - Create VirtualMachineRestore +- `resources_get` (from openshift-virtualization) - Verify VM/snapshot exists, monitor restore +- `vm_lifecycle` (from openshift-virtualization) - Stop VM if running + +**Required Environment Variables**: +- `KUBECONFIG` - Path to Kubernetes configuration file with cluster access + +**Required Cluster Setup**: +- OpenShift cluster (>= 4.19) +- OpenShift Virtualization operator installed +- ServiceAccount with RBAC permissions to create VirtualMachineRestore resources + +## When to Use This Skill + +**Trigger this skill when:** +- User wants to restore a VM to a previous state +- User wants to recover from failed changes/upgrades +- User explicitly requests snapshot restore + +**User phrases that trigger this skill:** +- "Restore VM api-server from snapshot snapshot-20240115" +- "Roll back database-01 to pre-upgrade snapshot" +- "Recover VM web-server from backup" + +**Do NOT use this skill when:** +- User wants to create snapshots → Use `vm-snapshot-create` skill +- User wants to list snapshots → Use `vm-snapshot-list` skill +- User wants to clone a VM → Use `vm-clone` skill + +## Workflow + +### Step 1: Gather Restore Information + +**Required Information from User:** +1. **VM Name** - VM to restore +2. **Namespace** - Namespace where VM exists +3. **Snapshot Name** - Snapshot to restore from + +If any information missing, ask for it. 
+ +### Step 2: Verify VM Exists + +**MCP Tool**: `resources_get` (from openshift-virtualization) + +**Parameters**: +```json +{ + "apiVersion": "kubevirt.io/v1", + "kind": "VirtualMachine", + "namespace": "", + "name": "" +} +``` + +**Error Handling**: +- If VM not found → Report error +- If permission denied → Report RBAC error + +### Step 3: Check VM Running State + +**From the VM resource in Step 2**, check `status.printableStatus`. + +**If VM is Running:** +```markdown +⚠️ VM Must Be Stopped Before Restore + +**VM**: `` (namespace: ``) +**Status**: Running + +**Safety Requirement**: VMs must be stopped before restore to prevent data corruption. + +**Options:** +1. "stop-and-restore" - Stop the VM first, then restore from snapshot +2. "cancel" - Cancel restore operation + +How would you like to proceed? +``` + +**Wait for user response.** + +- If "stop-and-restore" → Stop VM using vm_lifecycle, then continue +- If "cancel" → Stop workflow + +### Step 4: Verify Snapshot Exists + +**MCP Tool**: `resources_get` (from openshift-virtualization) + +**Parameters**: +```json +{ + "apiVersion": "snapshot.kubevirt.io/v1beta1", + "kind": "VirtualMachineSnapshot", + "namespace": "", + "name": "" +} +``` + +**If snapshot not found:** +```markdown +❌ Snapshot Not Found + +**Snapshot**: `` does not exist in namespace ``. + +**To list available snapshots:** +"List snapshots for VM " + +Restore operation cancelled. +``` + +**STOP workflow**. + +**Extract snapshot details:** +- `metadata.creationTimestamp` - Creation time +- `status.phase` - Must be "Succeeded" +- `status.readyToUse` - Must be `true` +- `spec.source.name` - Verify it matches the VM name + +**If snapshot status is not Ready:** +```markdown +❌ Snapshot Not Ready + +**Snapshot**: `` +**Status**: +**Ready to Use**: + +Snapshot is not ready for restore. Only snapshots with "Succeeded" phase and readyToUse=true can be used. + +Restore operation cancelled. +``` + +**STOP workflow**. 
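The readiness checks in Step 4 can be expressed as a single validation pass. This is a minimal sketch, assuming the `resources_get` response has been parsed into a dictionary; the function name `snapshot_restore_blockers` and the sample objects are hypothetical.

```python
from typing import Any


def snapshot_restore_blockers(snapshot: dict[str, Any], vm_name: str) -> list[str]:
    """Return the reasons a snapshot cannot be restored; empty means ready."""
    status = snapshot.get("status") or {}
    source = (snapshot.get("spec") or {}).get("source") or {}
    problems = []
    if status.get("phase") != "Succeeded":
        problems.append(f"phase is {status.get('phase')!r}, expected 'Succeeded'")
    if status.get("readyToUse") is not True:
        problems.append("readyToUse is not true")
    if source.get("name") != vm_name:
        problems.append(
            f"snapshot belongs to VM {source.get('name')!r}, not {vm_name!r}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical snapshot that is still being taken.
    in_progress = {
        "spec": {"source": {"name": "database-01"}},
        "status": {"phase": "InProgress", "readyToUse": False},
    }
    for reason in snapshot_restore_blockers(in_progress, "database-01"):
        print(f"blocked: {reason}")
```

Collecting every failed check, rather than stopping at the first, lets the "Snapshot Not Ready" message report the phase and `readyToUse` values together.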
+ +### Step 5: Present Restore Preview and Get Typed Confirmation + +**CRITICAL: User must type the snapshot name to confirm.** + +```markdown +## 🔴 VM RESTORE - Data Loss Warning + +**⚠️ THIS WILL REPLACE CURRENT VM STATE WITH SNAPSHOT DATA ⚠️** + +### What Will Happen + +**VM to Restore**: `` (namespace: ``) +**Snapshot to Restore From**: `` + +**Current VM State** (WILL BE LOST): +- **Last Modified**: +- **Changes Since Snapshot**: ALL changes made after WILL BE PERMANENTLY LOST + +**Snapshot State** (WILL BE RESTORED): +- **Created**: +- **Age**: + +**Time Range of Data Loss**: +- **⚠️ ALL CHANGES in the last WILL BE LOST ⚠️** + +### What Will Be Restored +- ✓ VM configuration (from snapshot time) +- ✓ Disk data (from snapshot time) + +### What Will Be Lost +- ✗ **ALL disk changes** made after +- ✗ **ALL configuration changes** made after + +--- + +**⚠️ CRITICAL: This restore is permanent. Current VM state cannot be recovered unless you create a snapshot now.** + +**To proceed with restore, type the snapshot name exactly as shown:** + +Type `` to confirm: _____ +``` + +**Wait for user to type the snapshot name.** + +**Validation:** +- Compare user input with snapshot name (case-sensitive, exact match) +- **If match**: Proceed to Step 6 +- **If mismatch**: Cancel operation + +**On mismatch:** +```markdown +❌ Confirmation Failed + +**You typed**: `` +**Expected**: `` + +Names do not match. Restore cancelled for safety. + +Operation cancelled. Current VM state preserved. +``` + +**STOP workflow**. + +### Step 6: Final Confirmation Before Restore + +**After typed verification succeeds**, ask for final explicit confirmation. + +```markdown +## ✓ Typed Verification Passed + +**Confirmation received for snapshot**: `` + +### Ready to Restore + +**VM**: `` (namespace: ``) +**From Snapshot**: `` + +**Impact**: +- Current VM state will be replaced with snapshot state +- All changes in the last will be permanently lost + +--- + +**Proceed with VM restore? 
This action cannot be undone.** +- Type "yes" to execute restore +- Type "cancel" to abort + +Your choice: _____ +``` + +**Wait for user response.** + +**Handle response:** +- If "yes" → Proceed to Step 7 (execute restore) +- If "cancel", "no", "wait", or anything else → Cancel operation + +**On cancellation:** +```markdown +Restore operation cancelled by user. Current VM state preserved. +``` + +**STOP workflow**. + +### Step 7: Execute Restore + +**ONLY PROCEED AFTER**: +- ✓ VM verified (exists, stopped) +- ✓ Snapshot verified (exists, ready) +- ✓ User typed snapshot name correctly +- ✓ User confirmed "yes" + +**MCP Tool**: `resources_create_or_update` (from openshift-virtualization) + +**Construct VirtualMachineRestore YAML:** + +```yaml +apiVersion: snapshot.kubevirt.io/v1beta1 +kind: VirtualMachineRestore +metadata: + name: + namespace: +spec: + target: + apiGroup: kubevirt.io + kind: VirtualMachine + name: + virtualMachineSnapshotName: +``` + +**Generate restore name**: +- Format: `restore--` +- Example: `restore-database-01-20260218-143500` + +**Parameters**: +```json +{ + "resource": "apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineRestore\nmetadata:\n name: \n namespace: \nspec:\n target:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: \n virtualMachineSnapshotName: " +} +``` + +**Report progress:** +```markdown +🔄 Restoring VM from snapshot... +⏳ This may take several minutes... 
+``` + +### Step 8: Monitor Restore Progress + +**Use `resources_get` to monitor VirtualMachineRestore status.** + +Check `status.complete`: +- `true` → Restore completed +- `false` → Restore in progress + +**Wait up to 10 minutes for restore to complete.** + +### Step 9: Report Restore Results + +**On success:** + +```markdown +## ✓ VM Restored Successfully + +**VM**: `` (namespace: ``) +**Restored From**: Snapshot `` + +### Restore Details +- **Snapshot Created**: +- **Restore Completed**: +- **VM Status**: Stopped (ready to start) + +### Data Loss Confirmation +- ⚠️ All changes made after have been lost + +### Next Steps + +**To start the restored VM:** +"Start VM in namespace " +``` + +**On failure:** + +```markdown +## ❌ VM Restore Failed + +**Error**: + +**VM**: `` +**Snapshot**: `` + +**Current VM State**: UNKNOWN - may be partially restored or unchanged + +**CRITICAL**: Do not start VM until restore issue is resolved + +**Recovery Options:** +1. Try restore again after resolving the error +2. Restore from a different snapshot +3. 
Contact cluster admin for investigation +``` + +## Dependencies + +### Required MCP Servers +- `openshift-virtualization` - OpenShift MCP server with kubevirt toolset + +### Required MCP Tools +- `resources_create_or_update` (from openshift-virtualization) - Create VirtualMachineRestore +- `resources_get` (from openshift-virtualization) - Verify and monitor +- `vm_lifecycle` (from openshift-virtualization) - Stop VM if running + +### Related Skills +- `vm-snapshot-list` - List snapshots before restore +- `vm-snapshot-create` - Create snapshots before risky operations +- `vm-snapshot-delete` - Delete old snapshots +- `vm-lifecycle-manager` - Start VM after restore + +### Reference Documentation + +**Official Red Hat Documentation:** +- [OpenShift Virtualization Snapshots - OpenShift 4.21](https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html-single/virtualization/index#virt-managing-vm-snapshots) + +**Upstream Documentation:** +- [KubeVirt VM Snapshots](https://kubevirt.io/user-guide/operations/snapshot_restore_api/) + +## Critical: Human-in-the-Loop Requirements + +**IMPORTANT:** This skill performs DESTRUCTIVE operations. You MUST: + +1. **Before Restoring Snapshots** (CRITICAL - Data Loss Risk) + - **REQUIRE VM to be stopped first** if currently running + - Display what will be lost (current VM state since snapshot) + - Show snapshot details (creation time, age) + - **Require typed confirmation** - user must type snapshot name exactly + - Ask: "Proceed with restore? This will replace current VM state. (yes/cancel)" + - Wait for explicit "yes" + +2.
**Never Auto-Execute** + - **NEVER restore without user confirmation** + - **NEVER restore to running VMs** without stopping first + - **NEVER skip typed verification for restore operations** + +**Why This Matters:** +- **Data Loss on Restore**: Restoring replaces current VM state - all changes since snapshot are PERMANENTLY LOST +- **No Undo**: Restore cannot be reversed - current data cannot be recovered +- **Typed Confirmation**: Prevents accidental restores to wrong snapshots + +## Common Issues + +### Issue 1: Restore Fails - Insufficient Storage Capacity + +**Error**: "Failed to restore: insufficient storage capacity" or "PVC provisioning failed" + +**Cause**: The namespace doesn't have enough storage quota or the storage backend is full. + +**Solution:** +1. **Check namespace storage quota**: Use `resources_list` with apiVersion="v1", kind="ResourceQuota" to check available storage +2. **Check PVC status**: Use `resources_list` to check PersistentVolumeClaims in the namespace +3. **Delete unnecessary snapshots**: Use vm-snapshot-delete skill to free up storage +4. **Request quota increase**: Contact cluster admin to increase namespace storage quota +5. **Try restore again**: Once storage is available, retry the restore operation + +### Issue 2: Restore Stuck in Progress + +**Error**: VirtualMachineRestore status shows `complete: false` for extended period + +**Cause**: The storage backend is slow, the snapshot is corrupted, or there's a CSI driver issue. + +**Solution:** +1. **Check VirtualMachineRestore status**: Use `resources_get` to check `status.conditions` for detailed error messages +2. **Check snapshot health**: Use vm-snapshot-list skill to verify snapshot is in "Succeeded" state +3. **Check storage backend logs**: Review CSI driver logs for errors (requires cluster admin access) +4. **Wait longer**: Large VMs may take 10+ minutes to restore +5. **Cancel and retry**: Delete the VirtualMachineRestore resource and try again +6. 
**Alternative snapshot**: If restore continues to fail, try restoring from a different snapshot + +**Related**: If the issue persists, there may be a storage backend problem requiring cluster administrator investigation. + +### Issue 3: User Typed Wrong Snapshot Name + +**Error**: "Confirmation Failed - Names do not match" + +**Cause**: User made a typo when typing the snapshot name for confirmation. + +**Solution:** +1. **Retry with exact name**: Copy-paste the snapshot name from the previous message +2. **Check capitalization**: Snapshot names are case-sensitive +3. **Check for spaces**: Ensure no leading/trailing spaces in the typed name +4. **Use autocomplete**: If available in your terminal, use tab completion +5. **List snapshots**: Use vm-snapshot-list skill to see all available snapshots and verify the correct name + +**This safety mechanism is intentional** - it prevents accidental data loss from restoring to the wrong snapshot. + +## Security Considerations + +- **Data Protection**: Restore requires TWO confirmations (typed + yes) to prevent accidental data loss +- **VM Stop Requirement**: VMs must be stopped before restore +- **Typed Verification**: User must type exact snapshot name +- **RBAC Enforcement**: Requires permissions for VirtualMachineRestore resources +- **Namespace Isolation**: Restore scoped to namespace boundaries +- **Audit Trail**: All restore operations logged in Kubernetes API audit logs +- **No Rollback**: Restore is permanent - current state cannot be recovered after restore + +## Example Usage + +### Example 1: Restore Failed Upgrade + +``` +User: "The database upgrade failed. Restore database-01 from snapshot pre-upgrade-snapshot" + +Agent: "Which namespace is database-01 in?" 
+ +User: "production" + +Agent: [Invokes vm-snapshot-restore skill] + [Step 2: Verifies VM exists] + [Step 3: Checks VM status - Running] + +⚠️ VM Must Be Stopped Before Restore + +**VM**: `database-01` (namespace: `production`) +**Status**: Running + +**Safety Requirement**: VMs must be stopped before restore to prevent data corruption. + +**Options:** +1. "stop-and-restore" - Stop the VM first, then restore from snapshot +2. "cancel" - Cancel restore operation + +How would you like to proceed? + +User: "stop-and-restore" + +Agent: [Stops VM using vm_lifecycle] + +⏸️ Stopping VM before restore... + +VM `database-01` is being stopped gracefully. + + [Waits 10 seconds, verifies stopped] + [Step 4: Verifies snapshot exists and is ready] + [Step 5: Presents restore preview] + +## 🔴 VM RESTORE - Data Loss Warning + +**⚠️ THIS WILL REPLACE CURRENT VM STATE WITH SNAPSHOT DATA ⚠️** + +**VM to Restore**: `database-01` (namespace: `production`) +**Snapshot to Restore From**: `pre-upgrade-snapshot` + +**Current VM State** (WILL BE LOST): +- **Last Modified**: 2026-02-18 16:30:00 UTC +- **Changes Since Snapshot**: ALL changes made after 2026-02-18 10:00:00 UTC WILL BE PERMANENTLY LOST + +**Snapshot State** (WILL BE RESTORED): +- **Created**: 2026-02-18 10:00:00 UTC +- **Age**: 6 hours 30 minutes + +**Time Range of Data Loss**: +- **⚠️ ALL CHANGES in the last 6 hours 30 minutes WILL BE LOST ⚠️** + +**To proceed with restore, type the snapshot name exactly as shown:** + +Type `pre-upgrade-snapshot` to confirm: _____ + +User: "pre-upgrade-snapshot" + +Agent: [Step 5: Typed verification passed] + [Step 6: Final confirmation] + +## ✓ Typed Verification Passed + +**Confirmation received for snapshot**: `pre-upgrade-snapshot` + +**VM**: `database-01` +**From Snapshot**: `pre-upgrade-snapshot` + +**Impact**: +- Current VM state will be replaced with snapshot state +- All changes in the last 6 hours 30 minutes will be permanently lost + +**Proceed with VM restore?
This action cannot be undone.** +- Type "yes" to execute restore +- Type "cancel" to abort + +Your choice: _____ + +User: "yes" + +Agent: [Step 7: Executes restore] + +🔄 Restoring VM from snapshot... +⏳ This may take several minutes... + + [Step 8: Monitors progress] + +⏳ Restore in progress... + + [Step 9: Reports results] + +## ✓ VM Restored Successfully + +**VM**: `database-01` (namespace: `production`) +**Restored From**: Snapshot `pre-upgrade-snapshot` + +### Restore Details +- **Snapshot Created**: 2026-02-18 10:00:00 UTC +- **Restore Completed**: 2026-02-18 16:35:22 UTC +- **VM Status**: Stopped (ready to start) + +### Data Loss Confirmation +- ⚠️ All changes made after 2026-02-18 10:00:00 UTC have been lost + +### Next Steps + +**To start the restored VM:** +"Start VM database-01 in namespace production" + +The database has been restored to its pre-upgrade state. The failed upgrade has been rolled back. +``` From 9c42de9be683c5c0b6fdb4923017467a00805239 Mon Sep 17 00:00:00 2001 From: r2dedios Date: Thu, 19 Feb 2026 10:05:47 +0100 Subject: [PATCH 2/2] refact(rh-virt): Fixed minor issue in CLAUDE.md file Signed-off-by: r2dedios --- CLAUDE.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 05d93b2..60f654d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -645,7 +645,7 @@ last_updated: YYYY-MM-DD 5. Test with `Skill` tool invocation **Collection-Specific Standards:** -- **rh-virt**: Follow `rh-virt/SKILL_TEMPLATE.md` and validate with `rh-virt/SKILLS_CHECKLIST.md` for enhanced quality standards including mandatory Common Issues and Example Usage sections +- **rh-virt**: Follow `rh-virt/SKILL_TEMPLATE.md` for enhanced quality standards including mandatory Common Issues and Example Usage sections ### Adding an Agent @@ -693,7 +693,6 @@ When creating new collection, use `rh-sre` as the architectural reference.
The `rh-virt` pack demonstrates skill quality standardization: - Comprehensive skill templates (`SKILL_TEMPLATE.md`) -- Validation checklist (`SKILLS_CHECKLIST.md`) - Risk-based color coding (cyan/green/blue/yellow/red) - Mandatory Common Issues and Example Usage sections - Consistent section ordering and formatting