Subhankar-hub/one-stage-processor

Basic 1-Stage Processor

Status: Verified and packaged
HDL: SystemVerilog
Verification: Python (cocotb) with Icarus Verilog

Overview

This project implements a single-cycle, 1-stage processor. The entire instruction lifecycle — Fetch, Decode, Execute, and Writeback — completes within a single clock cycle.

The design features a custom minimal ISA, word-addressed memory, and a simplified control path with no pipelining, interrupts, or hazard forwarding logic. It is fully synthesizable and verified using a Python-based scoreboard reference model.


Architecture

 Data Width : 32-bit
 Registers  : 16 GPRs (x0 hardwired to 0)
 Pipeline   : Single-stage (non-pipelined)
 PC Step    : +1 per cycle (word-addressed)
 Memories   : 256-word IMEM (ROM), 256-word DMEM (RAM)
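
The "x0 hardwired to 0" rule can be illustrated with a small behavioural model. This is a minimal Python sketch, not the RTL in regfile.sv: the class name `RegFile` and its methods are hypothetical, and whether the hardware ignores writes to x0 or forces reads of x0 to zero is an assumption (this model does both).

```python
class RegFile:
    """Behavioural model of a 16 x 32-bit register file with x0 tied to zero."""

    NUM_REGS = 16
    MASK32 = 0xFFFFFFFF

    def __init__(self):
        self._regs = [0] * self.NUM_REGS

    def read(self, idx: int) -> int:
        # x0 always reads as zero, regardless of what was written.
        return 0 if idx == 0 else self._regs[idx]

    def write(self, idx: int, value: int) -> None:
        # Writes to x0 are dropped; other values are truncated to 32 bits.
        if idx != 0:
            self._regs[idx] = value & self.MASK32


rf = RegFile()
rf.write(0, 123)   # ignored
rf.write(1, -1)    # stored as 0xFFFFFFFF
print(rf.read(0), hex(rf.read(1)))
```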

Datapath Block Diagram

graph LR
    subgraph FETCH
        PC["PC<br/>(register)"]
        ADDER["+1"]
        IMEM["Instruction<br/>Memory<br/><i>u_imem</i>"]
    end

    subgraph DECODE
        DEC["Instruction<br/>Decoder"]
        CTRL["Control<br/>Unit<br/><i>u_ctrl</i>"]
        SEXT["Sign-Extend<br/>imm14 → 32"]
        RF["Register File<br/>16 × 32<br/><i>u_regfile</i>"]
    end

    subgraph EXECUTE
        MUX_B{"MUX<br/>alu_src_imm"}
        ALU["ALU<br/><i>u_alu</i>"]
    end

    subgraph MEMORY
        DMEM["Data<br/>Memory<br/><i>u_dmem</i>"]
    end

    subgraph WRITEBACK
        MUX_WB{"MUX<br/>mem_to_reg"}
    end

    PC -->|addr| IMEM
    PC --> ADDER -->|pc_next| PC
    IMEM -->|instr| DEC
    DEC -->|opcode| CTRL
    DEC -->|rs1, rs2| RF
    DEC -->|imm14| SEXT
    CTRL -->|alu_op| ALU
    CTRL -->|alu_src_imm| MUX_B
    CTRL -->|mem_write| DMEM
    CTRL -->|reg_write| RF
    CTRL -->|mem_to_reg| MUX_WB
    RF -->|rs1_data| ALU
    RF -->|rs2_data| MUX_B
    SEXT -->|imm32| MUX_B
    MUX_B -->|alu_b| ALU
    ALU -->|alu_result| DMEM
    ALU -->|alu_result| MUX_WB
    RF -->|rs2_data| DMEM
    DMEM -->|dmem_rdata| MUX_WB
    MUX_WB -->|write_back_data| RF
    DEC -->|rd| RF

    style PC fill:#4a90d9,color:#fff,stroke:#2c5f8a
    style IMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
    style RF fill:#f0932b,color:#fff,stroke:#b5700e
    style ALU fill:#eb4d4b,color:#fff,stroke:#b33230
    style DMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
    style CTRL fill:#be2edd,color:#fff,stroke:#8c1aab
    style MUX_B fill:#f9ca24,color:#333,stroke:#c9a00c
    style MUX_WB fill:#f9ca24,color:#333,stroke:#c9a00c

Execution Flow (Single Clock Cycle)

Every instruction passes through all four stages within one clock cycle; the PC, register file, and data memory are updated on the rising clock edge:

flowchart LR
    F["1. FETCH<br/>────────<br/>Read instr<br/>from IMEM[PC]"]
    D["2. DECODE<br/>────────<br/>Extract fields<br/>Read registers<br/>Generate control"]
    E["3. EXECUTE<br/>────────<br/>ALU computes<br/>result or address"]
    W["4. WRITEBACK<br/>────────<br/>Write result<br/>to register or<br/>data memory"]
    N["PC ← PC + 1"]

    F --> D --> E --> W --> N
    N -.->|next cycle| F

    style F fill:#4a90d9,color:#fff,stroke:#2c5f8a
    style D fill:#be2edd,color:#fff,stroke:#8c1aab
    style E fill:#eb4d4b,color:#fff,stroke:#b33230
    style W fill:#f0932b,color:#fff,stroke:#b5700e
    style N fill:#4a90d9,color:#fff,stroke:#2c5f8a
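
The four stages above can be sketched as one Python function call per clock cycle. This is an illustrative model, not the project's scoreboard.py: `CpuState` and `step` are hypothetical names, and three behaviours are assumptions rather than documented facts (fetching past the end of the program yields a NOP, data addresses are truncated to 8 bits for the 256-word DMEM, and unknown opcodes act as NOP).

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF
NOP, ADD, SUB, ADDI, LOAD, STORE = range(6)


@dataclass
class CpuState:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 16)
    dmem: list = field(default_factory=lambda: [0] * 256)


def step(state: CpuState, imem: list) -> None:
    """Advance the model by one clock cycle."""
    # 1. FETCH (assumption: reads past the program return a NOP)
    instr = imem[state.pc] if state.pc < len(imem) else 0

    # 2. DECODE: extract fields, sign-extend imm14, read registers
    opcode = (instr >> 26) & 0x3F
    rd = (instr >> 22) & 0xF
    rs1 = (instr >> 18) & 0xF
    rs2 = (instr >> 14) & 0xF
    imm = instr & 0x3FFF
    if imm & 0x2000:
        imm -= 0x4000
    a, b = state.regs[rs1], state.regs[rs2]

    # 3. EXECUTE: ALU result or effective address
    operand_b = b if opcode in (ADD, SUB) else imm
    result = (a - operand_b if opcode == SUB else a + operand_b) & MASK32

    # 4. WRITEBACK: data memory for STORE, register file otherwise
    if opcode == STORE:
        state.dmem[result & 0xFF] = b
    elif opcode in (ADD, SUB, ADDI, LOAD):
        value = state.dmem[result & 0xFF] if opcode == LOAD else result
        if rd != 0:  # x0 stays zero
            state.regs[rd] = value

    state.pc += 1  # word-addressed PC


# Two ADDI instructions followed by an ADD: x3 = 5 + 3
program = [0x0C400005, 0x0C800003, 0x04C48000]
cpu = CpuState()
for _ in program:
    step(cpu, program)
print(cpu.regs[1:4], cpu.pc)
```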

Module Hierarchy

graph TD
    TOP["cpu_top"]
    TOP --> IMEM["instruction_memory<br/><i>u_imem</i><br/>256 × 32 ROM"]
    TOP --> CTRL["control_unit<br/><i>u_ctrl</i><br/>Combinational"]
    TOP --> REGF["regfile<br/><i>u_regfile</i><br/>16 × 32, 2R/1W"]
    TOP --> ALUM["alu<br/><i>u_alu</i><br/>ADD / SUB / PASS"]
    TOP --> DMEM["data_memory<br/><i>u_dmem</i><br/>256 × 32 RAM"]
    ISA["isa_defs<br/><i>(package)</i>"]
    ISA -.->|imported by| TOP
    ISA -.->|imported by| CTRL
    ISA -.->|imported by| ALUM
    ISA -.->|imported by| REGF

    style TOP fill:#2c3e50,color:#fff,stroke:#1a252f
    style IMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
    style CTRL fill:#be2edd,color:#fff,stroke:#8c1aab
    style REGF fill:#f0932b,color:#fff,stroke:#b5700e
    style ALUM fill:#eb4d4b,color:#fff,stroke:#b33230
    style DMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
    style ISA fill:#535c68,color:#fff,stroke:#2d3436

Instruction Set Architecture (ISA)

The processor uses a fixed 32-bit instruction width with a custom encoding.

Instruction Encoding

  31      26 25    22 21    18 17    14 13                 0
 ┌──────────┬────────┬────────┬────────┬────────────────────┐
 │  opcode  │   rd   │  rs1   │  rs2   │      imm14         │
 │  (6 bit) │ (4 bit)│ (4 bit)│ (4 bit)│     (14 bit)       │
 └──────────┴────────┴────────┴────────┴────────────────────┘
       │         │        │        │            │
       │         │        │        │            └─ Signed immediate (two's complement)
       │         │        │        └────────────── Source register 2 / store data
       │         │        └─────────────────────── Source register 1 / base address
       │         └──────────────────────────────── Destination register
       └────────────────────────────────────────── Operation selector
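
The field layout above maps directly onto shifts and masks. The following Python sketch packs and unpacks an instruction word; the names `Fields`, `encode`, and `decode` are hypothetical helpers for illustration and are not taken from the repository.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    opcode: int  # bits 31:26
    rd: int      # bits 25:22
    rs1: int     # bits 21:18
    rs2: int     # bits 17:14
    imm14: int   # bits 13:0, raw (not yet sign-extended)


def encode(opcode: int, rd: int = 0, rs1: int = 0, rs2: int = 0, imm: int = 0) -> int:
    """Pack fields into a 32-bit word; a negative imm is stored in two's complement."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rd & 0xF) << 22)
        | ((rs1 & 0xF) << 18)
        | ((rs2 & 0xF) << 14)
        | (imm & 0x3FFF)
    )


def decode(word: int) -> Fields:
    """Split a 32-bit instruction word into its five fields."""
    return Fields(
        opcode=(word >> 26) & 0x3F,
        rd=(word >> 22) & 0xF,
        rs1=(word >> 18) & 0xF,
        rs2=(word >> 14) & 0xF,
        imm14=word & 0x3FFF,
    )


# ADDI x1, x0, 5 uses opcode 3
word = encode(3, rd=1, rs1=0, imm=5)
print(f"{word:08x}")
print(decode(word))
```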

Instruction List

All immediates (imm14) are sign-extended to 32 bits before use.

Mnemonic  Opcode  Type    Assembly               Semantics
────────  ──────  ──────  ─────────────────────  ───────────────────────────
NOP       0       -       NOP                    No operation
ADD       1       R-type  ADD rd, rs1, rs2       rd ← rs1 + rs2
SUB       2       R-type  SUB rd, rs1, rs2       rd ← rs1 - rs2
ADDI      3       I-type  ADDI rd, rs1, imm      rd ← rs1 + sext(imm14)
LOAD      4       I-type  LOAD rd, [rs1+imm]     rd ← MEM[rs1 + sext(imm14)]
STORE     5       S-type  STORE [rs1+imm], rs2   MEM[rs1 + sext(imm14)] ← rs2
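
The sext(imm14) operation used by ADDI, LOAD, and STORE can be written in a few lines. This is a Python sketch of the arithmetic only; `sext14` and `to_u32` are hypothetical helper names, and the XOR-and-subtract form is one common way to sign-extend, not necessarily how the RTL expresses it.

```python
def sext14(imm14: int) -> int:
    """Interpret a raw 14-bit field as a signed two's-complement integer."""
    imm14 &= 0x3FFF
    # Flip the sign bit, then subtract its weight: maps 0x2000..0x3FFF to -8192..-1.
    return (imm14 ^ 0x2000) - 0x2000


def to_u32(value: int) -> int:
    """View a Python integer as the 32-bit pattern the datapath would carry."""
    return value & 0xFFFFFFFF


print(sext14(0x0005))           # small positive immediate
print(sext14(0x3FFF))           # all ones is -1
print(hex(to_u32(sext14(0x3FFF))))
```

The signed range of the immediate is therefore -8192 to 8191.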

Control Signal Matrix

Opcode  reg_write  mem_write  alu_src_imm  alu_op  mem_to_reg
──────  ─────────  ─────────  ───────────  ──────  ──────────
NOP     0          0          0            ADD     0
ADD     1          0          0            ADD     0
SUB     1          0          0            SUB     0
ADDI    1          0          1            ADD     0
LOAD    1          0          1            ADD     1
STORE   0          1          1            ADD     0
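
The matrix above amounts to a lookup table keyed by opcode. A minimal Python rendering is shown here for illustration; `Control`, `CONTROL`, and `control_for` are hypothetical names, and falling back to the NOP row for undefined opcodes is an assumption, since the source does not say how control_unit.sv treats them.

```python
from typing import NamedTuple


class Control(NamedTuple):
    reg_write: bool
    mem_write: bool
    alu_src_imm: bool
    alu_op: str
    mem_to_reg: bool


# One row per opcode, copied from the control signal matrix.
CONTROL = {
    0: Control(False, False, False, "ADD", False),  # NOP
    1: Control(True,  False, False, "ADD", False),  # ADD
    2: Control(True,  False, False, "SUB", False),  # SUB
    3: Control(True,  False, True,  "ADD", False),  # ADDI
    4: Control(True,  False, True,  "ADD", True),   # LOAD
    5: Control(False, True,  True,  "ADD", False),  # STORE
}


def control_for(opcode: int) -> Control:
    """Return the control signals for an opcode (assumption: unknown opcodes act as NOP)."""
    return CONTROL.get(opcode, CONTROL[0])


print(control_for(4))
```

Note that no row asserts both reg_write and mem_write, so an instruction changes either a register or a memory word, never both.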

Data Flow Per Instruction Type

flowchart TB
    subgraph R["R-type (ADD / SUB)"]
        direction LR
        R1["rs1_data"] --> RA["ALU"]
        R2["rs2_data"] --> RA
        RA -->|result| RW["rd"]
    end

    subgraph I["I-type (ADDI)"]
        direction LR
        I1["rs1_data"] --> IA["ALU"]
        I2["sext(imm14)"] --> IA
        IA -->|result| IW["rd"]
    end

    subgraph L["LOAD"]
        direction LR
        L1["rs1_data"] --> LA["ALU<br/>addr calc"]
        L2["sext(imm14)"] --> LA
        LA -->|addr| LM["DMEM"]
        LM -->|rdata| LW["rd"]
    end

    subgraph S["STORE"]
        direction LR
        S1["rs1_data"] --> SA["ALU<br/>addr calc"]
        S2["sext(imm14)"] --> SA
        SA -->|addr| SM["DMEM"]
        S3["rs2_data"] -->|wdata| SM
    end

    style R fill:#e8f5e9,stroke:#2e7d32
    style I fill:#e3f2fd,stroke:#1565c0
    style L fill:#fff3e0,stroke:#e65100
    style S fill:#fce4ec,stroke:#c62828

Verification

The testbench uses cocotb (coroutine-based co-simulation) with a Python reference model (scoreboard) that mirrors the RTL behavior cycle-by-cycle.

Verification Architecture

flowchart LR
    subgraph SIM["Icarus Verilog Simulation"]
        DUT["cpu_top<br/>(DUT)"]
    end

    subgraph COCOTB["cocotb (Python)"]
        DRV["Test Driver<br/>clock, reset,<br/>program inject"]
        SB["Scoreboard<br/>Reference Model"]
        CHK["Checker<br/>reg & mem compare"]
    end

    DRV -->|"drive clk/reset<br/>write IMEM"| DUT
    DUT -->|"read regs[0:15]<br/>read mem[0:31]"| CHK
    DRV -->|"exec_instr()"| SB
    SB -->|"expected state"| CHK
    CHK -->|"PASS / FAIL"| RESULT["results.xml"]

    style DUT fill:#2c3e50,color:#fff,stroke:#1a252f
    style SB fill:#be2edd,color:#fff,stroke:#8c1aab
    style DRV fill:#4a90d9,color:#fff,stroke:#2c5f8a
    style CHK fill:#27ae60,color:#fff,stroke:#1e8449
    style RESULT fill:#f39c12,color:#fff,stroke:#d68910
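
The checker stage in the diagram compares the reference model's expected state against values read back from the DUT. The sketch below shows only that comparison in plain Python; `compare_state` is a hypothetical function, and how the register and memory values are read from the simulator (signal paths, cocotb handles) is deliberately left out because the source does not show it.

```python
def compare_state(expected_regs, expected_mem, dut_regs, dut_mem):
    """Return a list of human-readable mismatches (empty list means PASS)."""
    mismatches = []
    for i, (exp, got) in enumerate(zip(expected_regs, dut_regs)):
        if exp != got:
            mismatches.append(f"x{i}: expected {exp:#010x}, got {got:#010x}")
    for addr, (exp, got) in enumerate(zip(expected_mem, dut_mem)):
        if exp != got:
            mismatches.append(f"mem[{addr}]: expected {exp:#010x}, got {got:#010x}")
    return mismatches


# 16 registers and the first 32 memory words, as in the diagram.
exp_regs = [0, 5, 3, 8] + [0] * 12
exp_mem = [8] + [0] * 31
bad_regs = list(exp_regs)
bad_regs[3] = 7  # inject a fault in x3

print(compare_state(exp_regs, exp_mem, exp_regs, exp_mem))
print(compare_state(exp_regs, exp_mem, bad_regs, exp_mem))
```

In a cocotb test the returned list would typically feed an assertion, so that a non-empty list fails the test and its contents appear in the log.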

Test Suites

Test Module      Type                Description
───────────────  ──────────────────  ──────────────────────────────────────────
test_basic       Directed            Reset behavior, x0 immutability, ADD/SUB/ADDI, LOAD/STORE correctness against program.hex
test_randomized  Constrained random  64 random instructions checked cycle-by-cycle against scoreboard model

Sample Program (sim/program.hex)

The directed test validates this 8-instruction program:

Addr  Hex         Assembly              Effect
───── ────────── ────────────────────── ──────────────────────────
0     00000000    NOP                   (no-op)
1     0c400005    ADDI x1, x0, 5       x1 = 5
2     0c800003    ADDI x2, x0, 3       x2 = 3
3     04c48000    ADD  x3, x1, x2      x3 = 8
4     1400c000    STORE [x0+0], x3      MEM[0] = 8
5     11000000    LOAD x4, [x0+0]       x4 = MEM[0] = 8
6     09508000    SUB  x5, x4, x2      x5 = 5
7     00000000    NOP                   (no-op)

Expected final state: x1=5, x2=3, x3=8, x4=8, x5=5, MEM[0]=8
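
The hex words in the listing can be cross-checked against the assembly column with a small disassembler. This Python sketch is an illustration built from the encoding and instruction list in this README; `disassemble` is a hypothetical helper, and rendering unknown opcodes as `.word` is an assumption.

```python
MNEMONICS = {0: "NOP", 1: "ADD", 2: "SUB", 3: "ADDI", 4: "LOAD", 5: "STORE"}


def disassemble(word: int) -> str:
    """Turn one 32-bit instruction word into assembly text."""
    opcode = (word >> 26) & 0x3F
    rd = (word >> 22) & 0xF
    rs1 = (word >> 18) & 0xF
    rs2 = (word >> 14) & 0xF
    imm = ((word & 0x3FFF) ^ 0x2000) - 0x2000  # sign-extend imm14
    name = MNEMONICS.get(opcode)
    if name == "NOP":
        return "NOP"
    if name in ("ADD", "SUB"):
        return f"{name} x{rd}, x{rs1}, x{rs2}"
    if name == "ADDI":
        return f"ADDI x{rd}, x{rs1}, {imm}"
    if name == "LOAD":
        return f"LOAD x{rd}, [x{rs1}{imm:+d}]"
    if name == "STORE":
        return f"STORE [x{rs1}{imm:+d}], x{rs2}"
    return f".word 0x{word:08x}"  # assumption: undefined opcode


# The eight words of the sample program, one hex word per line.
HEX_IMAGE = """00000000
0c400005
0c800003
04c48000
1400c000
11000000
09508000
00000000"""

for addr, line in enumerate(HEX_IMAGE.splitlines()):
    print(addr, line, disassemble(int(line, 16)))
```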


Running Tests

Prerequisites

Icarus Verilog, Python 3 with cocotb, and GNU Make.

Commands

# Run directed tests
make -C tb SIM=icarus MODULE=test_basic sim

# Run randomized tests
make -C tb SIM=icarus MODULE=test_randomized sim

# Run both test suites
make test

# Run randomized tests with a custom seed
TEST_SEED=0xDEAD make -C tb SIM=icarus MODULE=test_randomized sim
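
A seeded run is reproducible because the same seed always yields the same instruction stream. The Python sketch below shows one way a test could consume `TEST_SEED`; it is not the code in test_randomized.py. The function names are hypothetical, parsing the value with `int(raw, 0)` (so that both `0xDEAD` and `57005` work) is an assumption, and so are the constraints chosen here (opcodes 0 to 5, small non-negative immediates, and LOAD/STORE based on x0 so addresses stay within the first 32 memory words).

```python
import os
import random

NUM_OPCODES = 6  # NOP, ADD, SUB, ADDI, LOAD, STORE


def seed_from_env(environ=None) -> int:
    """Read TEST_SEED (decimal or 0x-prefixed hex); pick a fresh seed if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get("TEST_SEED")
    if raw:
        return int(raw, 0)
    return random.SystemRandom().randrange(1 << 32)


def random_program(seed: int, length: int = 64) -> list:
    """Generate `length` constrained-random instruction words from a seed."""
    rng = random.Random(seed)
    program = []
    for _ in range(length):
        opcode = rng.randrange(NUM_OPCODES)
        rd, rs1, rs2 = (rng.randrange(16) for _ in range(3))
        imm = rng.randrange(32)
        if opcode in (4, 5):
            rs1 = 0  # assumption: keep LOAD/STORE addresses in mem[0:31]
        program.append(opcode << 26 | rd << 22 | rs1 << 18 | rs2 << 14 | imm)
    return program


seed = seed_from_env({"TEST_SEED": "0xDEAD"})
print(f"seed = {seed:#x}")
print(random_program(seed) == random_program(seed))  # same seed, same program
```

Logging the chosen seed at the start of a run makes a failing random test repeatable by passing the same value back through `TEST_SEED`.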

Convenience Targets (from tb/)

cd tb
make basic        # directed tests only
make randomized   # randomized tests only
make all          # both suites

Continuous Integration

GitHub Actions runs both test suites automatically on every push and pull request to main.

flowchart LR
    PUSH["Push / PR<br/>to main"] --> J1["test_basic<br/>(icarus)"]
    PUSH --> J2["test_randomized<br/>(icarus)"]
    J1 --> R1{{"PASS /<br/>FAIL"}}
    J2 --> R2{{"PASS /<br/>FAIL"}}
    R1 -->|fail| A1["Upload VCD<br/>waveform"]
    R2 -->|fail| A2["Upload VCD<br/>waveform"]

    style PUSH fill:#4a90d9,color:#fff,stroke:#2c5f8a
    style J1 fill:#27ae60,color:#fff,stroke:#1e8449
    style J2 fill:#27ae60,color:#fff,stroke:#1e8449
    style A1 fill:#e74c3c,color:#fff,stroke:#c0392b
    style A2 fill:#e74c3c,color:#fff,stroke:#c0392b

The workflow installs Icarus Verilog and cocotb on ubuntu-latest with Python 3.11, runs each test module as a separate matrix job, and uploads VCD waveform artifacts on failure.


Project Structure

one-stage-processor/
├── rtl/                        # Synthesizable SystemVerilog
│   ├── isa_defs.sv             # Opcodes, types, ALU op enum
│   ├── cpu_top.sv              # Top-level: PC, decode, datapath wiring
│   ├── alu.sv                  # ADD / SUB / pass-through
│   ├── regfile.sv              # 16×32 register file (x0 = 0)
│   ├── control_unit.sv         # Combinational opcode → control signals
│   ├── instruction_memory.sv   # 256-word ROM (loaded from hex)
│   └── data_memory.sv          # 256-word RAM (sync write, async read)
├── tb/
│   ├── cocotb/
│   │   ├── test_basic.py       # Directed tests
│   │   ├── test_randomized.py  # Constrained random tests
│   │   └── scoreboard.py       # Python ISA reference model
│   └── Makefile                # cocotb simulation runner
├── sim/
│   └── program.hex             # Pre-loaded instruction memory image
├── .github/
│   └── workflows/
│       └── ci.yml              # GitHub Actions CI pipeline
├── Makefile                    # Top-level build entry point
└── README.md

License

This project is provided for educational and reference purposes.
