Description
Background:
PVM is a pagetable-based virtualization system in which kernel and user space are separated by using separate page tables. Currently, a process's address space is defined by the CR3 register together with MSR_PVM_SWITCH_CR3 (a PVM virtual MSR). The address space is switched by the hypercall PVM_HC_LOAD_PGTBL, which loads both CR3 and MSR_PVM_SWITCH_CR3. When the guest switches between kernel and user mode, the values of CR3 and MSR_PVM_SWITCH_CR3 are swapped.
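The current behavior can be pictured with a small toy model. This is only an illustrative sketch, not PVM code: `VCpu`, `hc_load_pgtbl`, and `switch_mode` are hypothetical names, page table roots are plain integers, and the `switch_cr3` field stands in for the switch-CR3 virtual MSR.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    """Toy model of the paging state under the dual-CR3 ABI (hypothetical)."""
    cr3: int = 0           # root of the page table currently in use
    switch_cr3: int = 0    # stand-in for the switch-CR3 virtual MSR (the other root)
    user_mode: bool = False


def hc_load_pgtbl(vcpu: VCpu, pgd: int, user_pgd: int) -> None:
    """Stand-in for PVM_HC_LOAD_PGTBL: install both roots in one call."""
    vcpu.cr3 = pgd
    vcpu.switch_cr3 = user_pgd


def switch_mode(vcpu: VCpu) -> None:
    """Kernel <-> user transition: swap the two roots and flip the mode."""
    vcpu.cr3, vcpu.switch_cr3 = vcpu.switch_cr3, vcpu.cr3
    vcpu.user_mode = not vcpu.user_mode


vcpu = VCpu()
hc_load_pgtbl(vcpu, pgd=0x1000, user_pgd=0x2000)  # loaded while in kernel mode
switch_mode(vcpu)                                  # kernel -> user
print(hex(vcpu.cr3))                               # prints 0x2000
```

The point of the sketch is that the guest hands two roots to the hypervisor, so the address space is identified by a pair of values rather than by CR3 alone.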
Goal:
This ABI deviates from the native x86 architecture and should be converted to use a single CR3, as on native hardware. Using a single CR3 does not eliminate the separation: the hypervisor will manage two underlying shadow page tables to maintain proper kernel/user isolation. This change would also remove MSR_PVM_SWITCH_CR3 and the user_pgd argument from PVM_HC_LOAD_PGTBL.
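One way to picture the single-CR3 model is a hypervisor that derives two shadow tables from each guest page table, keyed by the guest's mode. The sketch below is a toy under stated assumptions, not the proposed implementation: `Hypervisor`, `shadow_root`, and `translate` are hypothetical names, a page table is flattened to a dictionary, and the user-accessible flag stands in for the x86 U/S bit.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    """Toy model: one guest CR3, two shadow tables per CR3 (hypothetical)."""
    # guest CR3 value -> {vaddr: (paddr, user_accessible)}
    guest_tables: dict
    # (guest CR3, user_mode) -> {vaddr: paddr}, built lazily
    shadows: dict = field(default_factory=dict)

    def shadow_root(self, guest_cr3: int, user_mode: bool) -> dict:
        """Return the shadow table for this CR3 and mode, building it on first use."""
        key = (guest_cr3, user_mode)
        if key not in self.shadows:
            table = self.guest_tables[guest_cr3]
            # The user-mode shadow keeps only user-accessible mappings;
            # the kernel-mode shadow keeps every mapping.
            self.shadows[key] = {
                va: pa
                for va, (pa, user_ok) in table.items()
                if user_ok or not user_mode
            }
        return self.shadows[key]

    def translate(self, guest_cr3: int, user_mode: bool, vaddr: int):
        """Look up vaddr in the shadow selected by the current mode."""
        return self.shadow_root(guest_cr3, user_mode).get(vaddr)


USER_VA, KERNEL_VA = 0x400000, 0xFFFF800000000000
hv = Hypervisor({0x1000: {USER_VA: (0xA000, True), KERNEL_VA: (0xB000, False)}})
print(hv.translate(0x1000, user_mode=True, vaddr=KERNEL_VA))    # prints None
print(hv.translate(0x1000, user_mode=False, vaddr=KERNEL_VA))   # prints 45056
```

In this model the guest-visible CR3 stays at the same value across a mode switch. Only the shadow table chosen by the hypervisor changes, which is why the isolation survives the move to a single CR3.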
Benefits of the current dual-CR3 design:
• The guest explicitly manages which pages belong to kernel CR3 and which to user CR3, taking responsibility for proper separation.
• It allows the PVM guest to reuse the existing Linux kernel KPTI (Kernel Page Table Isolation) logic, which is the main reason the current dual-CR3 implementation is relatively simple.
Drawbacks of dual-CR3:
• It deviates from the native x86 architecture, making the ABI less clear.
• Future kernels may remove KPTI once CPUs affected by the Meltdown bug are obsolete (possibly in 10–20 years), making this approach unsustainable long-term.
• It wastes an extra 4 KB root page table per process in the guest.
After adopting a single-CR3 model:
Pros:
• Clear, native x86-compliant ABI.
Cons:
• More complex logic required in the hypervisor to carefully manage shadow page tables that distinguish between kernel and user mappings.
• The new implementation must go beyond simple KPTI and fully emulate native x86 behavior.