@HeatCrab
Collaborator

@HeatCrab HeatCrab commented Oct 29, 2025

This PR implements PMP (Physical Memory Protection) support for RISC-V to enable hardware-enforced memory isolation in Linmo, addressing #30.

Currently Phase 1 (infrastructure) is complete. This branch will continue development through the remaining phases. Phase 1 adds the foundational structures and declarations: PMP hardware layer in arch/riscv with CSR definitions and region management structures, architecture-independent memory abstractions (flex pages, address spaces, memory pools), kernel memory pool declarations from linker symbols, and TCB extension for address space linkage.

The actual PMP operations including region configuration, CSR manipulation, and context switching integration are not yet implemented.

TOR mode is used for its flexibility: it supports arbitrary address ranges without NAPOT's power-of-two size and alignment constraints (bounds need only respect the PMP granularity, which is at least 4 bytes), simplifying region management for task stacks of varying sizes. Priority-based eviction lets the system manage competing demands when the 16 hardware regions are exhausted, keeping critical kernel and stack regions protected while temporary mappings are reclaimed as needed.
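As a rough illustration of why TOR suits arbitrary ranges, the sketch below models how one TOR entry is encoded and matched on RV32. The bit positions follow the RISC-V privileged specification; the names `tor_entry_t`, `tor_encode`, and `tor_matches` are hypothetical and are not taken from this PR.

```c
#include <stdbool.h>
#include <stdint.h>

/* pmpcfg bit layout from the RISC-V privileged specification. */
#define PMP_R     0x01u /* read permission */
#define PMP_W     0x02u /* write permission */
#define PMP_X     0x04u /* execute permission */
#define PMP_A_TOR 0x08u /* A field = 1: top-of-range matching */

/* Hypothetical software view of one TOR entry (not the PR's struct). */
typedef struct {
    uint32_t addr_prev; /* value of pmpaddr[i-1], the lower bound */
    uint32_t addr_top;  /* value of pmpaddr[i], the exclusive upper bound */
    uint8_t cfg;        /* pmpcfg byte for entry i */
} tor_entry_t;

/* Encode [start, end). pmpaddr registers hold address bits [33:2],
 * so both bounds are shifted right by 2 (4-byte granularity).
 */
static tor_entry_t tor_encode(uint32_t start, uint32_t end, uint8_t perms)
{
    tor_entry_t e = {
        .addr_prev = start >> 2,
        .addr_top = end >> 2,
        .cfg = (uint8_t) (PMP_A_TOR | (perms & (PMP_R | PMP_W | PMP_X))),
    };
    return e;
}

/* An address matches when pmpaddr[i-1] <= (addr >> 2) < pmpaddr[i]. */
static bool tor_matches(const tor_entry_t *e, uint32_t addr)
{
    uint32_t a = addr >> 2;
    return a >= e->addr_prev && a < e->addr_top;
}
```

With this model, a 768-byte stack at `0x80001000` is described exactly by `tor_encode(0x80001000u, 0x80001300u, PMP_R | PMP_W)`, whereas NAPOT would force rounding to a power-of-two size. Note that on real hardware the lower bound lives in the previous entry's address register, so non-adjacent TOR regions may cost an extra slot.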


Summary by cubic

Enables RISC-V PMP for hardware memory isolation (#30). Uses TOR mode with boot-time kernel protection, trap-time flexpage loading, per-task context switching, and U-mode kernel stack isolation via mscratch; unrecoverable access faults terminate the task instead of panicking.

  • New Features
    • PMP CSR and numeric accessors; TOR-mode region set/disable/lock/read and access checks with shadow state.
    • Kernel memory pools from linker symbols: text RX; data/bss RW (no execute); heap/stack RW (no execute).
    • Flexpages and memory spaces with on-demand load/evict and victim selection; TCB linked to a memory space.
    • U-mode kernel stack isolation via mscratch (ISR frame includes SP); trap handler is nested-trap safe and performs load/evict on access faults with task termination on unrecoverable faults; context switch swaps task regions.

Written for commit c78a3f3. Summary will update on new commits.

Contributor

@jserv jserv left a comment

Use unified "flexpage" notation.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from e264a35 to 4a62d5b Compare October 31, 2025 13:25
@HeatCrab
Collaborator Author

Use unified "flexpage" notation.

Got it! Thanks for the correction and the L4 X.2 reference.
I've fixed all occurrences to use "flexpage" notation.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch 5 times, most recently from 109259d to f6c3912 Compare November 6, 2025 09:16
@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch 2 times, most recently from 2644558 to 1bb5fcf Compare November 16, 2025 13:18
jserv

This comment was marked as outdated.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch 6 times, most recently from 904e972 to ed800fc Compare November 21, 2025 12:38
@HeatCrab

This comment was marked as outdated.

jserv

This comment was marked as resolved.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch 6 times, most recently from 0d55f21 to 865a5d6 Compare November 22, 2025 08:36
@HeatCrab
Collaborator Author

Rebase the latest 'main' branch to resolve rtsched issues.

Finished. I also removed the M-mode fault-handling commits, since they do not align with the upcoming work.
Next, I plan to start U-mode support (#19) on a new branch, and then circle back to complete the PMP development and apply any adjustments that may be needed after the U-mode integration.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from 865a5d6 to 7e3992e Compare December 11, 2025 08:51
@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch 4 times, most recently from c190707 to 093ff06 Compare January 12, 2026 11:38
@HeatCrab HeatCrab marked this pull request as ready for review January 12, 2026 16:59
@cubic-dev-ai cubic-dev-ai bot left a comment

6 issues found across 25 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="arch/riscv/pmp.c">

<violation number="1" location="arch/riscv/pmp.c:427">
P1: Integer overflow in access check: `addr + size` can wrap around, potentially bypassing PMP security checks. Validate that `addr + size` does not overflow before computing the end address.</violation>

<violation number="2" location="arch/riscv/pmp.c:636">
P1: `next_region_idx` is never incremented after use, causing subsequent fault handlers to overwrite the same PMP region repeatedly instead of using new slots.</violation>

<violation number="3" location="arch/riscv/pmp.c:644">
P0: Use-after-invalidation bug: `victim->pmp_id` is used after `pmp_evict_fpage()` sets it to `PMP_INVALID_REGION`. Save the region index before eviction.</violation>
</file>

<file name="kernel/task.c">

<violation number="1" location="kernel/task.c:385">
P2: Memory leak: `tcb->mspace` is not destroyed in the kernel_stack allocation error path. Since this is inside a `user_mode` block, `mspace` was already created earlier and should be freed.</violation>
</file>

<file name="kernel/syscall.c">

<violation number="1" location="kernel/syscall.c:400">
P1: Holding NOSCHED_ENTER for the entire user-controlled string makes sys_tputs vulnerable to denial of service; a malicious U-mode task can pass an unterminated or extremely long string and block the scheduler indefinitely.</violation>
</file>

<file name="Documentation/hal-riscv-context-switch.md">

<violation number="1" location="Documentation/hal-riscv-context-switch.md:168">
P3: Documentation claims the U-mode SP is set to the user stack top, but the code still subtracts the 256-byte guard zone; update the comment to describe the actual value so stack sizing guidance is correct.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
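The first `arch/riscv/pmp.c` finding above (a wrapping `addr + size`) can be avoided by never forming the sum. The following is a minimal sketch, assuming 32-bit addresses and half-open regions; `range_within` is a hypothetical helper, not the function in the PR, and rejecting zero-length accesses is an assumption made here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when [addr, addr + size) lies inside [start, end).
 * The end of the access is never computed as addr + size, so a wrapping
 * sum cannot slip past the comparison.
 */
static bool range_within(uint32_t addr, uint32_t size,
                         uint32_t start, uint32_t end)
{
    /* Assumption: a zero-length access is treated as invalid. */
    if (size == 0 || addr < start || addr >= end)
        return false;
    /* end - addr cannot underflow because addr < end was checked above. */
    return size <= end - addr;
}
```

For example, `addr = 0x1800` with `size = 0xFFFFF900` wraps to `0x1100` when added naively and would appear to sit inside `[0x1000, 0x2000)`; the subtraction form rejects it.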

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from ef9e1f3 to 6815c83 Compare January 14, 2026 06:28
Contributor

@jserv jserv left a comment

Use 'git rebase -i' to rework commits, resolving known issues.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from 6815c83 to 8f07f77 Compare January 14, 2026 06:59

User mode tasks require kernel stack isolation to prevent malicious or
corrupted user stack pointers from compromising kernel memory during
interrupt handling. Without this protection, a user task could set its
stack pointer to an invalid or controlled address, causing the ISR to
write trap frames to arbitrary memory locations.

This commit implements stack isolation using the mscratch register as a
discriminator between machine mode and user mode execution contexts. The
ISR entry performs a blind swap with mscratch: for machine mode tasks
(mscratch=0), the swap is immediately undone to restore the kernel stack
pointer. For user mode tasks (mscratch=kernel_stack), the swap provides
the kernel stack while preserving the user stack pointer in mscratch.
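The blind swap described above can be modelled in plain C to make the two outcomes explicit. This is only a host-side sketch of the register shuffle, not the ISR assembly from the commit; `hart_regs_t` and `isr_entry_select_stack` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical model of the two registers involved at trap entry. */
typedef struct {
    uint32_t sp;
    uint32_t mscratch;
} hart_regs_t;

/* Models a swap of sp and mscratch followed by the zero test.
 * Returns 1 when the trap came from a U-mode task, 0 for an M-mode task.
 */
static int isr_entry_select_stack(hart_regs_t *r)
{
    /* Blind swap: sp <-> mscratch. */
    uint32_t tmp = r->sp;
    r->sp = r->mscratch;
    r->mscratch = tmp;

    if (r->sp == 0) {
        /* M-mode task: mscratch held 0, so undo the swap. */
        r->sp = r->mscratch;
        r->mscratch = 0;
        return 0;
    }
    /* U-mode task: sp is now the kernel stack; mscratch keeps the user sp. */
    return 1;
}
```

In this model a user task that traps with `sp = 0xDEADBEEF` still ends up on the kernel stack taken from `mscratch`, and the bogus value is merely parked in `mscratch` for later restoration.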

Each user mode task is allocated a dedicated 512-byte kernel stack to
ensure complete isolation between tasks and prevent stack overflow
attacks. The task control block is extended to track per-task kernel
stack allocations. A global pointer references the current task's kernel
stack and is updated during each context switch. The ISR loads this
pointer to access the appropriate per-task kernel stack through
mscratch, replacing the previous approach of using a single global
kernel stack shared by all user mode tasks.

The interrupt frame structure is extended to include dedicated storage
for the stack pointer. Task initialization zeroes the entire frame and
correctly sets the initial stack pointer to support the new restoration
path. For user mode tasks, the initial ISR frame is constructed on the
kernel stack rather than the user stack, ensuring the frame is protected
from user manipulation. Enumeration constants replace magic number usage
for improved code clarity and consistency.

The ISR implementation now includes separate entry and restoration paths
for each privilege mode. The M-mode path maintains mscratch=0 throughout
execution. The U-mode path saves the user stack pointer from mscratch
immediately after frame allocation and restores mscratch to the current
task's kernel stack address before returning to user mode, enabling the
next trap to use the correct per-task kernel stack.

Task initialization was updated to configure mscratch appropriately
during the first dispatch. The dispatcher checks the current privilege
level and sets mscratch to zero for machine mode tasks. For user mode
tasks, it loads the current task's kernel stack pointer if available,
with a fallback to the global kernel stack for initial dispatch before
the first task switch. The main scheduler initialization ensures the
first task's kernel stack pointer is set before entering the scheduling
loop.

The user mode output system call was modified to bypass the asynchronous
logger queue and implement task-level synchronization. Direct output
ensures strict FIFO ordering for test output clarity, while preventing
task preemption during character transmission avoids interleaving when
multiple user tasks print concurrently. This ensures each string is
output atomically with respect to other tasks.

A test helper function was added to support stack pointer manipulation
during validation. Following the Linux kernel's context switching
pattern, this provides precise control over stack operations without
compiler interference. The validation harness uses this to verify
syscall stability under corrupted stack pointer conditions.

Documentation updates include the calling convention guide's stack layout
section, which now distinguishes between machine mode and user mode task
stack organization with detailed diagrams of the dual-stack design. The
context switching guide's task initialization section reflects the
updated function signature for building initial interrupt frames with
per-task kernel stack parameters.

Testing validates that system calls succeed even when invoked with a
malicious stack pointer (0xDEADBEEF), confirming the ISR correctly uses
the per-task kernel stack from mscratch rather than the user-controlled
stack pointer.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from 8f07f77 to 8d60dbd Compare January 14, 2026 08:42
@HeatCrab
Collaborator Author

Use 'git rebase -i' to rework commits, resolving known issues.

All known issues have been resolved, and the rebase is complete.

@jserv
Contributor

jserv commented Jan 14, 2026

All known issues have been resolved, and the rebase is complete.

Too many commits. Lower the counts while maintaining atomic changes in each commit.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch 2 times, most recently from e1f8aea to 1ee2970 Compare January 14, 2026 13:04

Introduces the foundational structures for hardware-enforced memory
protection using RISC-V PMP. This establishes the complete data model
before any operational logic is implemented.

PMP infrastructure includes CSR definitions for registers, permission
encodings, and hardware constants. TOR mode is adopted for its
flexibility in supporting arbitrary address ranges without alignment
requirements. Region configuration structures enable priority-based
management within the 16-slot hardware limit.

Memory abstraction provides three layers: flexpages represent
contiguous physical regions with protection attributes, memory spaces
group flexpages into per-task protection domains, and memory pools
define static regions for boot-time kernel protection. Field naming
retains address space prefixes while documentation uses memory space
terminology to avoid virtual memory confusion.

Kernel memory pools are declared from linker symbols, protecting text
as read-execute and data regions as read-write. Each task control
block receives a memory space pointer to reference its protection
domain through the flexpage mechanism.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from 1ee2970 to 1bf297a Compare January 14, 2026 13:17

Adds creation and destruction functions for flexpages and memory
spaces, the two core abstractions for per-task memory isolation.

Flexpages represent contiguous physical memory regions with
hardware-enforced protection attributes. Memory spaces serve as
containers grouping flexpages into per-task protection domains,
supporting both isolated and shared memory models.

Provide helper functions for runtime-indexed access to PMP control
and status registers alongside existing compile-time CSR macros.
RISC-V CSR instructions encode register addresses as immediate
values in the instruction itself, making dynamic selection
impossible through simple arithmetic. These helpers use
switch-case dispatch to map runtime indices to specific CSR
instructions while preserving type safety.
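A minimal sketch of the switch-case dispatch idea follows. Only four `pmpaddr` registers are shown (CSR addresses `0x3B0` to `0x3B3`, per the privileged specification); `read_pmpaddr`, `CSR_READ`, and the `fake_csr` host fallback are hypothetical and exist only so the dispatch can be exercised off-target.

```c
#include <stdint.h>

#if defined(__riscv)
/* The CSR address must be a literal so it can be encoded in the instruction. */
#define CSR_READ(num, out) __asm__ volatile("csrr %0, " #num : "=r"(out))
#else
/* Host fallback (an assumption for illustration): a plain array stands in
 * for the CSR file.
 */
static unsigned long fake_csr[0x1000];
#define CSR_READ(num, out) ((out) = fake_csr[(num)])
#endif

/* Map a runtime index to a CSR read with a compile-time constant address.
 * Unknown indices return 0 in this sketch.
 */
static unsigned long read_pmpaddr(unsigned idx)
{
    unsigned long val = 0;
    switch (idx) {
    case 0:
        CSR_READ(0x3B0, val);
        break;
    case 1:
        CSR_READ(0x3B1, val);
        break;
    case 2:
        CSR_READ(0x3B2, val);
        break;
    case 3:
        CSR_READ(0x3B3, val);
        break;
    default:
        break;
    }
    return val;
}
```

Each `case` expands to a distinct instruction, which is why a loop over `idx` can call `read_pmpaddr(idx)` even though the hardware offers no indexed CSR access.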

This enables PMP register management code to iterate over regions
without knowing exact register numbers at compile-time. These
helpers are designed for use by subsequent region management
operations and are marked unused to allow incremental development
without compiler warnings.

PMP implementation is now included in the build system to make
these helpers and future PMP functionality available at link time.

Provides the PMP driver stack for hardware-enforced memory protection,
consisting of three layers: global configuration state, region
management API, and flexpage driver.

A centralized shadow configuration maintains a memory copy of hardware
register state, serving as the single source of truth for all PMP
operations. A public accessor provides controlled access to this shared
state across the kernel.

The region management API provides functions for configuring,
disabling, and locking PMP regions in TOR mode. Hardware initialization
clears all regions and establishes clean state. Region configuration
validates address ranges and atomically updates both hardware CSRs and
shadow state. Access verification checks memory operations against
configured region boundaries and permissions.

The flexpage driver bridges software memory abstractions with hardware
regions, enabling dynamic loading and eviction. Loading translates
flexpage attributes to PMP configuration, while eviction disables
regions and clears mappings. Victim selection picks the flexpage with the
highest priority value for eviction, with kernel regions excluded from
selection to maintain system stability.
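Victim selection as described here might look roughly like the sketch below. The `fpage_t` layout, the `PRIO_KERNEL` constant, and `select_victim` are assumptions made for illustration; the PR's actual structures and priority values may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIO_KERNEL 0u /* hypothetical: kernel regions, never evicted */

/* Hypothetical flexpage with only the fields needed for selection. */
typedef struct fpage {
    struct fpage *next;
    uint32_t priority; /* larger value = evicted sooner, per the text */
    int loaded;        /* non-zero when mapped in a hardware region */
} fpage_t;

/* Return the loaded, non-kernel flexpage with the highest priority value,
 * or NULL when nothing may be evicted.
 */
static fpage_t *select_victim(fpage_t *head)
{
    fpage_t *victim = NULL;
    for (fpage_t *fp = head; fp; fp = fp->next) {
        if (!fp->loaded || fp->priority == PRIO_KERNEL)
            continue;
        if (!victim || fp->priority > victim->priority)
            victim = fp;
    }
    return victim;
}
```

Returning `NULL` when only kernel regions are loaded gives the caller a clear signal that the fault cannot be resolved by eviction.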

To maintain architectural independence, the architecture layer
implements hardware-specific operations while the kernel layer provides
portable wrappers. This enables future support for other memory
protection units without modifying higher-level kernel logic.

When a task accesses memory not currently loaded in a hardware region,
the system raises an access fault. Rather than panicking, the fault
handler attempts recovery by dynamically loading the required region,
enabling tasks to access more memory than can fit simultaneously in the
available hardware regions.

The fault handler examines the faulting address from mtval CSR to locate
the corresponding flexpage in the task's memory space. If all hardware
regions are occupied, a victim selection algorithm identifies the
flexpage with highest priority value for eviction, then reuses its
hardware slot for the newly required flexpage.
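The recovery path can be sketched as a small host-side model. Everything here is hypothetical (`fpage_t`, `slot_owner`, `handle_access_fault`, and a two-slot table instead of sixteen), kernel-region protection is omitted for brevity, and treating a fault on an already loaded flexpage as unrecoverable is an assumption. The victim's slot index is saved before eviction, since eviction invalidates it.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_NONE (-1)
#define NUM_SLOTS 2 /* tiny for illustration; the text mentions 16 */

typedef struct fpage {
    struct fpage *next;
    uint32_t base, size;
    uint32_t priority;
    int slot; /* hardware slot index, or SLOT_NONE when not loaded */
} fpage_t;

static fpage_t *slot_owner[NUM_SLOTS];

static fpage_t *find_fpage(fpage_t *head, uint32_t addr)
{
    for (fpage_t *fp = head; fp; fp = fp->next)
        if (addr >= fp->base && addr - fp->base < fp->size)
            return fp;
    return NULL;
}

static void evict(fpage_t *fp)
{
    slot_owner[fp->slot] = NULL;
    fp->slot = SLOT_NONE;
}

/* Returns 0 when the fault was recovered, -1 when it is unrecoverable. */
static int handle_access_fault(fpage_t *head, uint32_t fault_addr)
{
    fpage_t *target = find_fpage(head, fault_addr);
    if (!target || target->slot != SLOT_NONE)
        return -1; /* unmapped address, or loaded yet still faulting */

    int slot = SLOT_NONE;
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_owner[i]) {
            slot = i;
            break;
        }
    }

    if (slot == SLOT_NONE) {
        fpage_t *victim = NULL;
        for (fpage_t *fp = head; fp; fp = fp->next) {
            if (fp->slot == SLOT_NONE)
                continue;
            if (!victim || fp->priority > victim->priority)
                victim = fp;
        }
        if (!victim)
            return -1;
        slot = victim->slot; /* save before evict() invalidates it */
        evict(victim);
    }

    slot_owner[slot] = target;
    target->slot = slot;
    return 0;
}
```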

This establishes demand-paging semantics for memory protection where
region mappings are loaded on first access. The fault recovery mechanism
ensures tasks can utilize their full memory space regardless of hardware
region constraints, with kernel regions protected from eviction to
maintain system stability.

Memory protection requires dynamic reconfiguration when switching
between tasks. Each task receives a dedicated memory space with its
stack registered as a protected flexpage. During context switches, the
scheduler evicts the outgoing task's regions from hardware slots and
loads the incoming task's regions, while kernel regions remain locked
across all transitions.

Kernel text, data, and BSS regions are configured at boot and protected
from eviction. User mode tasks operate in isolated memory domains where
they cannot access kernel memory or other tasks' stacks.

Nested trap handling is required for correct U-mode operation. When a
user mode syscall triggers a yield, the resulting nested trap must not
corrupt the outer trap's context. Trap nesting depth tracking ensures
only the outermost trap performs context switch restoration, and yield
from trap context invokes the scheduler directly without additional
trap nesting.
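The depth-tracking rule can be illustrated with a small model in which a trap body may itself re-enter the handler. The names `trap_depth`, `restore_count`, `trap_handler`, and `nested_body` are hypothetical; the real handler is entered by hardware, not by a function call.

```c
#include <stddef.h>

static unsigned trap_depth;    /* hypothetical nesting counter */
static unsigned restore_count; /* counts restorations, for illustration */

/* Stands in for restoring the selected task's context. */
static void context_restore(void)
{
    restore_count++;
}

/* body models the work done inside the trap; it may trap again. */
static void trap_handler(void (*body)(void))
{
    trap_depth++;
    if (body)
        body();
    trap_depth--;

    /* Only the outermost trap performs context switch restoration. */
    if (trap_depth == 0)
        context_restore();
}

/* A trap body that triggers one nested trap. */
static void nested_body(void)
{
    trap_handler(NULL);
}
```

Calling `trap_handler(nested_body)` in this model restores context once, not twice, because the inner invocation sees a non-zero depth on exit.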

A test validates mixed-privilege context switching by spawning M-mode
and U-mode tasks that continuously yield, verifying correct operation
across privilege boundaries.

The previous panic behavior for unrecoverable faults was intentional
during early development to surface bugs immediately. With the core
functionality now stable, proper task termination is implemented to
uphold the isolation principle.

This change introduces a zombie state for deferred task cleanup. The
fault handler marks the faulting task as terminated and signals the trap
handler to initiate a context switch. The scheduler cleans up terminated
task resources before selecting the next runnable task.
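A minimal sketch of the deferred-cleanup pattern, assuming a singly linked task list: the fault path only flips the state, and the scheduler reaps zombies before picking the next task. `task_t`, `fault_terminate`, and `schedule` are hypothetical names, and `resources_freed` merely stands in for releasing the stack, memory space, and TCB.

```c
#include <stddef.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_ZOMBIE } task_state_t;

typedef struct task {
    struct task *next;
    task_state_t state;
    int resources_freed; /* stands in for the actual resource release */
} task_t;

/* Fault path: mark only; no resources are released in trap context. */
static void fault_terminate(task_t *t)
{
    t->state = TASK_ZOMBIE;
}

/* Scheduler: unlink and clean up zombies, then pick the first READY task. */
static task_t *schedule(task_t **head)
{
    task_t **pp = head;
    while (*pp) {
        task_t *t = *pp;
        if (t->state == TASK_ZOMBIE) {
            *pp = t->next;
            t->next = NULL;
            t->resources_freed = 1;
        } else {
            pp = &t->next;
        }
    }

    for (task_t *t = *head; t; t = t->next)
        if (t->state == TASK_READY)
            return t;
    return NULL;
}
```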

This design addresses the limitation where running tasks cannot be
directly cancelled from interrupt context. By deferring cleanup to the
scheduler, the system ensures proper resource reclamation without
modifying task state during fault handling.

Memory regions are also evicted from hardware protection before being
freed, preventing stale references after task termination.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from 1bf297a to aeda1b2 Compare January 15, 2026 06:48

Provides comprehensive documentation covering memory abstraction,
context switching, and fault handling for the PMP implementation.

The test application validates memory isolation through three tests.
Test 1 spawns three U-mode tasks that verify stack integrity using
magic values across context switches. Test 2a attempts to write to
kernel .text from U-mode, triggering task termination. Test 2b
attempts to read another task's exported stack address, validating
inter-task isolation.

CI scripts are adjusted to recognize expected task termination output
as successful rather than a crash.

@HeatCrab HeatCrab force-pushed the pmp/memory-isolation branch from aeda1b2 to c78a3f3 Compare January 15, 2026 06:52
@HeatCrab
Collaborator Author

HeatCrab commented Jan 15, 2026

Too many commits. Lower the counts while maintaining atomic changes in each commit.

The branch is now reduced to 8 commits (without PR #62) while preserving atomic changes.
