Enable PMP for memory isolation #32
Conversation
jserv left a comment:
Use unified "flexpage" notation.
Got it! Thanks for the correction and the L4 X.2 reference.
Finished. And I removed the M-mode fault-handling commits, as they are not aligned with the upcoming work.
6 issues found across 25 files
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="arch/riscv/pmp.c">
<violation number="1" location="arch/riscv/pmp.c:427">
P1: Integer overflow in access check: `addr + size` can wrap around, potentially bypassing PMP security checks. Validate that `addr + size` does not overflow before computing the end address.</violation>
<violation number="2" location="arch/riscv/pmp.c:636">
P1: `next_region_idx` is never incremented after use, causing subsequent fault handlers to overwrite the same PMP region repeatedly instead of using new slots.</violation>
<violation number="3" location="arch/riscv/pmp.c:644">
P0: Use-after-invalidation bug: `victim->pmp_id` is used after `pmp_evict_fpage()` sets it to `PMP_INVALID_REGION`. Save the region index before eviction.</violation>
</file>
<file name="kernel/task.c">
<violation number="1" location="kernel/task.c:385">
P2: Memory leak: `tcb->mspace` is not destroyed in the kernel_stack allocation error path. Since this is inside a `user_mode` block, `mspace` was already created earlier and should be freed.</violation>
</file>
<file name="kernel/syscall.c">
<violation number="1" location="kernel/syscall.c:400">
P1: Holding NOSCHED_ENTER for the entire user-controlled string makes sys_tputs vulnerable to denial of service; a malicious U-mode task can pass an unterminated or extremely long string and block the scheduler indefinitely.</violation>
</file>
<file name="Documentation/hal-riscv-context-switch.md">
<violation number="1" location="Documentation/hal-riscv-context-switch.md:168">
P3: Documentation claims the U-mode SP is set to the user stack top, but the code still subtracts the 256-byte guard zone; update the comment to describe the actual value so stack sizing guidance is correct.</violation>
</file>
jserv left a comment:
Use 'git rebase -i' to rework commits, resolving known issues.
User mode tasks require kernel stack isolation to prevent malicious or corrupted user stack pointers from compromising kernel memory during interrupt handling. Without this protection, a user task could set its stack pointer to an invalid or attacker-controlled address, causing the ISR to write trap frames to arbitrary memory locations.

This commit implements stack isolation using the mscratch register as a discriminator between machine mode and user mode execution contexts. The ISR entry performs a blind swap with mscratch: for machine mode tasks (mscratch=0), the swap is immediately undone to restore the kernel stack pointer. For user mode tasks (mscratch=kernel_stack), the swap provides the kernel stack while preserving the user stack pointer in mscratch.

Each user mode task is allocated a dedicated 512-byte kernel stack to ensure complete isolation between tasks and prevent stack overflow attacks. The task control block is extended to track per-task kernel stack allocations. A global pointer references the current task's kernel stack and is updated during each context switch. The ISR loads this pointer to access the appropriate per-task kernel stack through mscratch, replacing the previous approach of a single global kernel stack shared by all user mode tasks.

The interrupt frame structure is extended to include dedicated storage for the stack pointer. Task initialization zeroes the entire frame and correctly sets the initial stack pointer to support the new restoration path. For user mode tasks, the initial ISR frame is constructed on the kernel stack rather than the user stack, ensuring the frame is protected from user manipulation. Enumeration constants replace magic numbers for improved code clarity and consistency.

The ISR implementation now includes separate entry and restoration paths for each privilege mode. The M-mode path maintains mscratch=0 throughout execution. The U-mode path saves the user stack pointer from mscratch immediately after frame allocation and restores mscratch to the current task's kernel stack address before returning to user mode, enabling the next trap to use the correct per-task kernel stack.

Task initialization was updated to configure mscratch appropriately during the first dispatch. The dispatcher checks the current privilege level and sets mscratch to zero for machine mode tasks. For user mode tasks, it loads the current task's kernel stack pointer if available, with a fallback to the global kernel stack for the initial dispatch before the first task switch. The main scheduler initialization ensures the first task's kernel stack pointer is set before entering the scheduling loop.

The user mode output system call was modified to bypass the asynchronous logger queue and implement task-level synchronization. Direct output ensures strict FIFO ordering for test output clarity, while preventing task preemption during character transmission avoids interleaving when multiple user tasks print concurrently. This ensures each string is output atomically with respect to other tasks.

A test helper function was added to support stack pointer manipulation during validation. Following the Linux kernel's context switching pattern, this provides precise control over stack operations without compiler interference. The validation harness uses this to verify syscall stability under corrupted stack pointer conditions.

Documentation updates include the calling convention guide's stack layout section, which now distinguishes between machine mode and user mode task stack organization, with detailed diagrams of the dual-stack design. The context switching guide's task initialization section reflects the updated function signature for building initial interrupt frames with per-task kernel stack parameters.

Testing validates that system calls succeed even when invoked with a malicious stack pointer (0xDEADBEEF), confirming the ISR correctly uses the per-task kernel stack from mscratch rather than the user-controlled stack pointer.
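The dispatcher's mscratch policy described above can be sketched as follows. This is a minimal host-runnable simulation, not the actual kernel code: a plain variable stands in for the mscratch CSR, and all names (`sim_tcb`, `kstack`, `dispatch_set_mscratch`, `global_kstack`) are illustrative assumptions rather than identifiers from the sources.

```c
#include <assert.h>
#include <stdint.h>

static uintptr_t sim_mscratch;                /* stand-in for the mscratch CSR */
static uintptr_t global_kstack = 0x80040000u; /* fallback for initial dispatch */

struct sim_tcb {
    int user_mode;    /* nonzero for a U-mode task */
    uintptr_t kstack; /* top of this task's 512-byte kernel stack, or 0 */
};

static void dispatch_set_mscratch(const struct sim_tcb *t)
{
    if (!t->user_mode)
        sim_mscratch = 0;             /* M-mode: the ISR's blind swap is undone */
    else if (t->kstack)
        sim_mscratch = t->kstack;     /* U-mode: next trap swaps in this stack */
    else
        sim_mscratch = global_kstack; /* first dispatch, before any task switch */
}
```

The key invariant is that mscratch doubles as a privilege discriminator: zero means "already on the kernel stack", nonzero means "swap me in".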
All known issues have been resolved, and the rebase is complete.
Too many commits. Lower the count while maintaining atomic changes in each commit.
Introduces the foundational structures for hardware-enforced memory protection using RISC-V PMP, establishing the complete data model before any operational logic is implemented.

PMP infrastructure includes CSR definitions for registers, permission encodings, and hardware constants. TOR mode is adopted for its flexibility in supporting arbitrary address ranges without alignment requirements. Region configuration structures enable priority-based management within the 16-slot hardware limit.

Memory abstraction provides three layers: flexpages represent contiguous physical regions with protection attributes, memory spaces group flexpages into per-task protection domains, and memory pools define static regions for boot-time kernel protection. Field naming retains address space prefixes while documentation uses memory space terminology to avoid virtual memory confusion.

Kernel memory pools are declared from linker symbols, protecting text as read-execute and data regions as read-write. Each task control block receives a memory space pointer to reference its protection domain through the flexpage mechanism.
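The three-layer data model described above might look roughly like the following. This is a hedged sketch: every field and type name here (`fpage_t`, `mspace_t`, `mempool_t`, `pmp_id`, etc.) is hypothetical and chosen for illustration, not copied from the actual headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layer 1: a flexpage - a contiguous physical region with attributes. */
typedef struct fpage {
    uintptr_t base;     /* physical base address */
    size_t size;        /* region length in bytes */
    uint8_t perms;      /* R/W/X permission bits */
    uint8_t priority;   /* eviction priority within the 16-slot limit */
    int8_t pmp_id;      /* hardware region index, or -1 when not loaded */
    struct fpage *next; /* sibling in the owning memory space */
} fpage_t;

/* Layer 2: a memory space - a per-task protection domain. */
typedef struct mspace {
    fpage_t *first;     /* list of flexpages in this domain */
} mspace_t;

/* Layer 3: a memory pool - a static region from linker symbols,
 * e.g. kernel text as R+X and data/BSS as R+W. */
typedef struct mempool {
    uintptr_t start, end;
    uint8_t perms;
} mempool_t;
```

The `pmp_id` field is what lets a flexpage exist without occupying one of the 16 hardware slots, which is the hook the later demand-loading commits rely on.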
Adds creation and destruction functions for flexpages and memory spaces, the two core abstractions for per-task memory isolation. Flexpages represent contiguous physical memory regions with hardware-enforced protection attributes. Memory spaces serve as containers grouping flexpages into per-task protection domains, supporting both isolated and shared memory models.
Provide helper functions for runtime-indexed access to PMP control and status registers alongside existing compile-time CSR macros. RISC-V CSR instructions encode register addresses as immediate values in the instruction itself, making dynamic selection impossible through simple arithmetic. These helpers use switch-case dispatch to map runtime indices to specific CSR instructions while preserving type safety. This enables PMP register management code to iterate over regions without knowing exact register numbers at compile-time. These helpers are designed for use by subsequent region management operations and are marked unused to allow incremental development without compiler warnings. PMP implementation is now included in the build system to make these helpers and future PMP functionality available at link time.
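The switch-case dispatch technique this commit describes can be sketched as below. Since CSR instructions encode the register address as an immediate, each case on real hardware would contain a distinct `csrw` instruction (e.g. `asm volatile("csrw pmpcfg0, %0" :: "r"(val))`). Here a simulated register array stands in so the dispatch logic runs on any host; `pmpcfg_write` and `sim_pmpcfg` are illustrative names, not the project's actual API.

```c
#include <assert.h>
#include <stdint.h>

#define PMP_CFG_COUNT 4
static uintptr_t sim_pmpcfg[PMP_CFG_COUNT]; /* stands in for pmpcfg0..pmpcfg3 */

/* Map a runtime index to a specific CSR "instruction". Returns 0 on
 * success, -1 for an out-of-range index, preserving type safety. */
static int pmpcfg_write(unsigned idx, uintptr_t val)
{
    switch (idx) {
    case 0: sim_pmpcfg[0] = val; return 0; /* would be: csrw pmpcfg0 */
    case 1: sim_pmpcfg[1] = val; return 0; /* would be: csrw pmpcfg1 */
    case 2: sim_pmpcfg[2] = val; return 0; /* would be: csrw pmpcfg2 */
    case 3: sim_pmpcfg[3] = val; return 0; /* would be: csrw pmpcfg3 */
    default: return -1;                    /* unknown index: reject */
    }
}
```

This is why simple pointer arithmetic over CSRs is impossible: there is no addressable CSR file, only per-register opcodes, so the switch is the idiomatic escape hatch.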
Provides the PMP driver stack for hardware-enforced memory protection, consisting of three layers: global configuration state, region management API, and flexpage driver. A centralized shadow configuration maintains a memory copy of hardware register state, serving as the single source of truth for all PMP operations. A public accessor provides controlled access to this shared state across the kernel.

The region management API provides functions for configuring, disabling, and locking PMP regions in TOR mode. Hardware initialization clears all regions and establishes a clean state. Region configuration validates address ranges and atomically updates both hardware CSRs and shadow state. Access verification checks memory operations against configured region boundaries and permissions.

The flexpage driver bridges software memory abstractions with hardware regions, enabling dynamic loading and eviction. Loading translates flexpage attributes to PMP configuration, while eviction disables regions and clears mappings. Victim selection identifies the highest-priority flexpage for eviction, with kernel regions protected from selection to maintain system stability.

To maintain architectural independence, the architecture layer implements hardware-specific operations while the kernel layer provides portable wrappers. This enables future support for other memory protection units without modifying higher-level kernel logic.
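An access check against a TOR-mode shadow configuration can be sketched as follows. In TOR mode, region i covers [pmpaddr[i-1] << 2, pmpaddr[i] << 2), with the lower bound of region 0 fixed at zero. The end-address computation below also guards against the `addr + size` wraparound flagged in the review. All names (`tor_shadow`, `tor_access_ok`) are illustrative, and this is a simulation over the shadow state, not the project's actual verifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NREGION 16
struct tor_shadow {
    uintptr_t top[NREGION]; /* pmpaddrN values: physical address >> 2 */
    uint8_t perms[NREGION]; /* R=1, W=2, X=4; 0 = region disabled */
};

static bool tor_access_ok(const struct tor_shadow *s, uintptr_t addr,
                          size_t size, uint8_t need)
{
    /* Reject empty accesses and addr+size wraparound up front. */
    if (size == 0 || addr + size < addr)
        return false;
    uintptr_t end = addr + size; /* exclusive end address */
    uintptr_t lo = 0;            /* TOR lower bound of region 0 is zero */
    for (int i = 0; i < NREGION; i++) {
        uintptr_t hi = s->top[i] << 2;
        if (s->perms[i] && addr >= lo && end <= hi)
            return (s->perms[i] & need) == need;
        lo = hi; /* a disabled region still supplies the next lower bound */
    }
    return false; /* no enabled region fully covers the access */
}
```

Note that a disabled region's pmpaddr still acts as the lower bound of the next region, which is what makes TOR pairs ("bound-only" entry followed by an enabled entry) work.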
When a task accesses memory not currently loaded in a hardware region, the system raises an access fault. Rather than panicking, the fault handler attempts recovery by dynamically loading the required region, enabling tasks to access more memory than can fit simultaneously in the available hardware regions. The fault handler examines the faulting address from mtval CSR to locate the corresponding flexpage in the task's memory space. If all hardware regions are occupied, a victim selection algorithm identifies the flexpage with highest priority value for eviction, then reuses its hardware slot for the newly required flexpage. This establishes demand-paging semantics for memory protection where region mappings are loaded on first access. The fault recovery mechanism ensures tasks can utilize their full memory space regardless of hardware region constraints, with kernel regions protected from eviction to maintain system stability.
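The victim-selection walk described above can be sketched as a scan over the task's flexpage list: among flexpages currently occupying a hardware slot, pick the one with the highest priority value, never a kernel region. Names here (`vfpage`, `select_victim`) are illustrative, not the project's actual identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vfpage {
    int pmp_id;        /* hardware slot index, or -1 when not loaded */
    unsigned priority; /* larger value = evicted first */
    bool kernel;       /* kernel regions are never selected as victims */
    struct vfpage *next;
};

static struct vfpage *select_victim(struct vfpage *list)
{
    struct vfpage *victim = NULL;
    for (struct vfpage *f = list; f; f = f->next) {
        if (f->pmp_id < 0 || f->kernel)
            continue; /* not occupying a slot, or protected from eviction */
        if (!victim || f->priority > victim->priority)
            victim = f;
    }
    return victim; /* NULL if every loaded flexpage is a kernel region */
}
```

Per the review's P0 finding, a caller must save the victim's `pmp_id` before eviction invalidates it, since the freed slot index is exactly what gets reused for the newly required flexpage.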
Memory protection requires dynamic reconfiguration when switching between tasks. Each task receives a dedicated memory space with its stack registered as a protected flexpage. During context switches, the scheduler evicts the outgoing task's regions from hardware slots and loads the incoming task's regions, while kernel regions remain locked across all transitions. Kernel text, data, and BSS regions are configured at boot and protected from eviction. User mode tasks operate in isolated memory domains where they cannot access kernel memory or other tasks' stacks.

Nested trap handling is required for correct U-mode operation. When a user mode syscall triggers a yield, the resulting nested trap must not corrupt the outer trap's context. Trap nesting depth tracking ensures only the outermost trap performs context switch restoration, and a yield from trap context invokes the scheduler directly without additional trap nesting.

A test validates mixed-privilege context switching by spawning M-mode and U-mode tasks that continuously yield, verifying correct operation across privilege boundaries.
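The trap-nesting rule above can be reduced to a depth counter: only the trap that brings the depth back to zero performs context-switch restoration. The sketch below simulates this on a host; `trap_handler` and its instrumentation counter are hypothetical stand-ins for the real handler.

```c
#include <assert.h>
#include <stdbool.h>

static int trap_depth;    /* current nesting level */
static int restores_done; /* instrumentation for this sketch only */

static void trap_handler(bool nested_yield)
{
    trap_depth++;
    if (nested_yield)
        trap_handler(false); /* e.g. a U-mode syscall whose yield re-traps */
    if (--trap_depth == 0)
        restores_done++;     /* restoration runs once, at the outermost level */
}
```

An inner trap leaves the outer trap's saved context untouched: it decrements the depth to a nonzero value and simply returns, so only the outermost exit restores state.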
The previous panic behavior for unrecoverable faults was intentional during early development to surface bugs immediately. With the core functionality now stable, proper task termination is implemented to uphold the isolation principle. This change introduces a zombie state for deferred task cleanup. The fault handler marks the faulting task as terminated and signals the trap handler to initiate a context switch. The scheduler cleans up terminated task resources before selecting the next runnable task. This design addresses the limitation where running tasks cannot be directly cancelled from interrupt context. By deferring cleanup to the scheduler, the system ensures proper resource reclamation without modifying task state during fault handling. Memory regions are also evicted from hardware protection before being freed, preventing stale references after task termination.
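The deferred-cleanup flow described above splits responsibility between two contexts: the fault handler only marks the task, and the scheduler reclaims resources before picking the next runnable task. The sketch below models this split with illustrative names (`ztask`, `fault_terminate`, `schedule`) that are not taken from the sources.

```c
#include <assert.h>
#include <stddef.h>

enum tstate { T_READY, T_RUNNING, T_ZOMBIE, T_FREE };

struct ztask { enum tstate state; };

/* Interrupt/fault context: may not free resources here, only mark the
 * task; the trap handler then requests a context switch. */
static void fault_terminate(struct ztask *t)
{
    t->state = T_ZOMBIE;
}

/* Scheduler context: the safe place to evict the task's PMP regions
 * and free its memory, modeled here by the T_FREE transition. */
static struct ztask *schedule(struct ztask *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].state == T_ZOMBIE)
            tasks[i].state = T_FREE; /* evict regions, then reclaim */
    for (size_t i = 0; i < n; i++)
        if (tasks[i].state == T_READY)
            return &tasks[i];
    return NULL;
}
```

Ordering matters: regions are evicted from hardware before the backing memory is freed, so no PMP slot ever points at reclaimed memory.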
Provides comprehensive documentation covering memory abstraction, context switching, and fault handling for the PMP implementation. The test application validates memory isolation through three tests. Test 1 spawns three U-mode tasks that verify stack integrity using magic values across context switches. Test 2a attempts to write to kernel .text from U-mode, triggering task termination. Test 2b attempts to read another task's exported stack address, validating inter-task isolation. CI scripts are adjusted to recognize expected task termination output as successful rather than a crash.
The branch is now reduced to 8 commits (without PR #62) while preserving atomic changes.
This PR implements PMP (Physical Memory Protection) support for RISC-V to enable hardware-enforced memory isolation in Linmo, addressing #30.
Currently Phase 1 (infrastructure) is complete. This branch will continue development through the remaining phases. Phase 1 adds the foundational structures and declarations: the PMP hardware layer in arch/riscv with CSR definitions and region management structures, architecture-independent memory abstractions (flexpages, address spaces, memory pools), kernel memory pool declarations from linker symbols, and the TCB extension for address space linkage.
The actual PMP operations including region configuration, CSR manipulation, and context switching integration are not yet implemented.
TOR mode is used for its flexibility with arbitrary address ranges without alignment constraints, simplifying region management for task stacks of varying sizes. Priority-based eviction allows the system to manage competing demands when the 16 hardware regions are exhausted, ensuring critical kernel and stack regions remain protected while allowing temporary mappings to be reclaimed as needed.
Summary by cubic
Enables RISC-V PMP for hardware memory isolation (#30). Uses TOR mode with boot-time kernel protection, trap-time flexpage loading, per-task context switching, and U-mode kernel stack isolation via mscratch; unrecoverable access faults terminate the task instead of panicking.
Written for commit c78a3f3. Summary will update on new commits.