Status: Verified and packaged | HDL: SystemVerilog | Verification: Python (cocotb) with Icarus Verilog
This project implements a single-cycle, 1-stage processor. The entire instruction lifecycle — Fetch, Decode, Execute, and Writeback — completes within a single clock cycle.
The design features a custom minimal ISA, word-addressed memory, and a simplified control path with no pipelining, interrupts, or hazard-detection and forwarding logic. It is fully synthesizable and verified against a Python-based scoreboard reference model.
Data Width : 32-bit
Registers : 16 GPRs (x0 hardwired to 0)
Pipeline : Single-stage (non-pipelined)
PC Step : +1 per cycle (word-addressed)
Memories : 256-word IMEM (ROM), 256-word DMEM (RAM)
graph LR
subgraph FETCH
PC["PC<br/>(register)"]
ADDER["+1"]
IMEM["Instruction<br/>Memory<br/><i>u_imem</i>"]
end
subgraph DECODE
DEC["Instruction<br/>Decoder"]
CTRL["Control<br/>Unit<br/><i>u_ctrl</i>"]
SEXT["Sign-Extend<br/>imm14 → 32"]
RF["Register File<br/>16 × 32<br/><i>u_regfile</i>"]
end
subgraph EXECUTE
MUX_B{"MUX<br/>alu_src_imm"}
ALU["ALU<br/><i>u_alu</i>"]
end
subgraph MEMORY
DMEM["Data<br/>Memory<br/><i>u_dmem</i>"]
end
subgraph WRITEBACK
MUX_WB{"MUX<br/>mem_to_reg"}
end
PC -->|addr| IMEM
PC --> ADDER -->|pc_next| PC
IMEM -->|instr| DEC
DEC -->|opcode| CTRL
DEC -->|rs1, rs2| RF
DEC -->|imm14| SEXT
CTRL -->|alu_op| ALU
CTRL -->|alu_src_imm| MUX_B
CTRL -->|mem_write| DMEM
CTRL -->|reg_write| RF
CTRL -->|mem_to_reg| MUX_WB
RF -->|rs1_data| ALU
RF -->|rs2_data| MUX_B
SEXT -->|imm32| MUX_B
MUX_B -->|alu_b| ALU
ALU -->|alu_result| DMEM
ALU -->|alu_result| MUX_WB
RF -->|rs2_data| DMEM
DMEM -->|dmem_rdata| MUX_WB
MUX_WB -->|write_back_data| RF
DEC -->|rd| RF
style PC fill:#4a90d9,color:#fff,stroke:#2c5f8a
style IMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
style RF fill:#f0932b,color:#fff,stroke:#b5700e
style ALU fill:#eb4d4b,color:#fff,stroke:#b33230
style DMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
style CTRL fill:#be2edd,color:#fff,stroke:#8c1aab
style MUX_B fill:#f9ca24,color:#333,stroke:#c9a00c
style MUX_WB fill:#f9ca24,color:#333,stroke:#c9a00c
Every instruction completes all four stages within a single clock cycle, and the resulting state updates are committed on the rising clock edge:
flowchart LR
F["1. FETCH<br/>────────<br/>Read instr<br/>from IMEM[PC]"]
D["2. DECODE<br/>────────<br/>Extract fields<br/>Read registers<br/>Generate control"]
E["3. EXECUTE<br/>────────<br/>ALU computes<br/>result or address"]
W["4. WRITEBACK<br/>────────<br/>Write result<br/>to register or<br/>data memory"]
N["PC ← PC + 1"]
F --> D --> E --> W --> N
N -.->|next cycle| F
style F fill:#4a90d9,color:#fff,stroke:#2c5f8a
style D fill:#be2edd,color:#fff,stroke:#8c1aab
style E fill:#eb4d4b,color:#fff,stroke:#b33230
style W fill:#f0932b,color:#fff,stroke:#b5700e
style N fill:#4a90d9,color:#fff,stroke:#2c5f8a
graph TD
TOP["cpu_top"]
TOP --> IMEM["instruction_memory<br/><i>u_imem</i><br/>256 × 32 ROM"]
TOP --> CTRL["control_unit<br/><i>u_ctrl</i><br/>Combinational"]
TOP --> REGF["regfile<br/><i>u_regfile</i><br/>16 × 32, 2R/1W"]
TOP --> ALUM["alu<br/><i>u_alu</i><br/>ADD / SUB / PASS"]
TOP --> DMEM["data_memory<br/><i>u_dmem</i><br/>256 × 32 RAM"]
ISA["isa_defs<br/><i>(package)</i>"]
ISA -.->|imported by| TOP
ISA -.->|imported by| CTRL
ISA -.->|imported by| ALUM
ISA -.->|imported by| REGF
style TOP fill:#2c3e50,color:#fff,stroke:#1a252f
style IMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
style CTRL fill:#be2edd,color:#fff,stroke:#8c1aab
style REGF fill:#f0932b,color:#fff,stroke:#b5700e
style ALUM fill:#eb4d4b,color:#fff,stroke:#b33230
style DMEM fill:#6ab04c,color:#fff,stroke:#3e7a28
style ISA fill:#535c68,color:#fff,stroke:#2d3436
The processor uses a fixed 32-bit instruction width with a custom encoding.
 31      26 25    22 21    18 17    14 13                 0
┌──────────┬────────┬────────┬────────┬────────────────────┐
│  opcode  │   rd   │  rs1   │  rs2   │       imm14        │
│ (6 bit)  │ (4 bit)│ (4 bit)│ (4 bit)│      (14 bit)      │
└──────────┴────────┴────────┴────────┴────────────────────┘
     │         │        │        │            │
     │         │        │        │            └─ Signed immediate (two's complement)
     │         │        │        └────────────── Source register 2 / store data
     │         │        └─────────────────────── Source register 1 / base address
     │         └──────────────────────────────── Destination register
     └────────────────────────────────────────── Operation selector
All immediates (imm14) are sign-extended to 32 bits before use.
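The field layout and sign-extension rule can be sketched in Python as follows. This is a minimal illustration, not code from the repository: the helper names `encode` and `sext14` are hypothetical, and the opcode numbers are taken from the instruction table in this README.

```python
# Opcode numbers as listed in the instruction table of this README.
OPCODES = {"NOP": 0, "ADD": 1, "SUB": 2, "ADDI": 3, "LOAD": 4, "STORE": 5}


def encode(op, rd=0, rs1=0, rs2=0, imm=0):
    """Pack fields into one 32-bit instruction word (hypothetical helper)."""
    if not -(1 << 13) <= imm < (1 << 13):
        raise ValueError("imm does not fit in 14 signed bits")
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 16:
            raise ValueError("register index out of range")
    return (
        (OPCODES[op] << 26)   # bits 31..26
        | (rd << 22)          # bits 25..22
        | (rs1 << 18)         # bits 21..18
        | (rs2 << 14)         # bits 17..14
        | (imm & 0x3FFF)      # bits 13..0, two's complement
    )


def sext14(field):
    """Sign-extend a 14-bit field to 32 bits (result is an unsigned int)."""
    field &= 0x3FFF
    if field & 0x2000:        # bit 13 is the sign bit
        field -= 0x4000
    return field & 0xFFFFFFFF


print(f"{encode('ADDI', rd=1, rs1=0, imm=5):08x}")
print(f"{sext14(0x3FFF):08x}")
```

Encoding `ADDI x1, x0, 5` this way gives `0c400005`, and a field of all ones extends to `ffffffff`, i.e. -1.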
| Mnemonic | Opcode | Type | Assembly | Semantics |
|---|---|---|---|---|
| NOP | 0 | - | NOP | No operation |
| ADD | 1 | R-type | ADD rd, rs1, rs2 | rd ← rs1 + rs2 |
| SUB | 2 | R-type | SUB rd, rs1, rs2 | rd ← rs1 - rs2 |
| ADDI | 3 | I-type | ADDI rd, rs1, imm | rd ← rs1 + sext(imm14) |
| LOAD | 4 | I-type | LOAD rd, [rs1+imm] | rd ← MEM[rs1 + sext(imm14)] |
| STORE | 5 | S-type | STORE [rs1+imm], rs2 | MEM[rs1 + sext(imm14)] ← rs2 |
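The Semantics column above maps directly onto a small software model. The sketch below is illustrative only and is not the project's `scoreboard.py`. It makes several assumptions: the function name `execute` is hypothetical, the immediate is passed as an already sign-extended Python integer, register results wrap to 32 bits, and data addresses wrap modulo the 256-word memory size.

```python
MASK32 = 0xFFFFFFFF
DMEM_WORDS = 256  # memory size from the spec; wrap-around is an assumption


def execute(regs, mem, op, rd=0, rs1=0, rs2=0, imm=0):
    """Apply one instruction to (regs, mem) in place."""
    a, b = regs[rs1], regs[rs2]
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "ADDI":
        result = a + imm
    elif op == "LOAD":
        result = mem[(a + imm) % DMEM_WORDS]
    elif op == "STORE":
        mem[(a + imm) % DMEM_WORDS] = b
        return
    elif op == "NOP":
        return
    else:
        raise ValueError(f"unknown mnemonic: {op}")
    if rd != 0:  # x0 is hardwired to 0, so writes to it are dropped
        regs[rd] = result & MASK32


regs = [0] * 16
mem = [0] * DMEM_WORDS
execute(regs, mem, "ADDI", rd=1, rs1=0, imm=7)
execute(regs, mem, "ADDI", rd=2, rs1=1, imm=-10)   # 7 - 10 wraps to 32 bits
execute(regs, mem, "STORE", rs1=1, rs2=2, imm=-3)  # MEM[7 - 3] = x2
execute(regs, mem, "LOAD", rd=3, rs1=0, imm=4)     # x3 = MEM[4]
execute(regs, mem, "ADDI", rd=0, rs1=1, imm=1)     # dropped: x0 stays 0
print([hex(r) for r in regs[:4]])
```

Keeping the state in plain lists makes it easy to compare against values read back from a simulation.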
| Opcode | reg_write | mem_write | alu_src_imm | alu_op | mem_to_reg |
|---|---|---|---|---|---|
| NOP | 0 | 0 | 0 | ADD | 0 |
| ADD | 1 | 0 | 0 | ADD | 0 |
| SUB | 1 | 0 | 0 | SUB | 0 |
| ADDI | 1 | 0 | 1 | ADD | 0 |
| LOAD | 1 | 0 | 1 | ADD | 1 |
| STORE | 0 | 1 | 1 | ADD | 0 |
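To show how these signals steer the datapath, the table can be written as a lookup and applied to the two multiplexers and the ALU. This is a hedged Python sketch rather than the RTL: the names `Ctrl`, `CONTROL`, and `datapath` are hypothetical, and `dmem_read` is an assumed callable that stands in for the data memory read port. Opcodes outside the table raise `KeyError`, since their behavior is not specified here.

```python
from typing import NamedTuple


class Ctrl(NamedTuple):
    reg_write: int
    mem_write: int
    alu_src_imm: int
    alu_op: str
    mem_to_reg: int


# One entry per opcode, copied from the control-signal table.
CONTROL = {
    0: Ctrl(0, 0, 0, "ADD", 0),  # NOP
    1: Ctrl(1, 0, 0, "ADD", 0),  # ADD
    2: Ctrl(1, 0, 0, "SUB", 0),  # SUB
    3: Ctrl(1, 0, 1, "ADD", 0),  # ADDI
    4: Ctrl(1, 0, 1, "ADD", 1),  # LOAD
    5: Ctrl(0, 1, 1, "ADD", 0),  # STORE
}


def datapath(opcode, rs1_data, rs2_data, imm32, dmem_read):
    """Evaluate MUX_B, the ALU, and MUX_WB for one instruction.

    Returns (alu_result, write_back_data, reg_write, mem_write).
    """
    c = CONTROL[opcode]
    alu_b = imm32 if c.alu_src_imm else rs2_data           # MUX_B
    if c.alu_op == "SUB":
        alu_result = (rs1_data - alu_b) & 0xFFFFFFFF
    else:
        alu_result = (rs1_data + alu_b) & 0xFFFFFFFF
    if c.mem_to_reg:                                        # MUX_WB
        write_back = dmem_read(alu_result)
    else:
        write_back = alu_result
    return alu_result, write_back, c.reg_write, c.mem_write


dmem = {8: 0x1234}
# LOAD with rs1_data = 5 and imm32 = 3 reads address 8.
print(datapath(4, 5, 99, 3, lambda addr: dmem.get(addr, 0)))
```

Note that STORE still produces an ALU result (the address), but `reg_write = 0` keeps it out of the register file while `mem_write = 1` enables the memory write.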
flowchart TB
subgraph R["R-type (ADD / SUB)"]
direction LR
R1["rs1_data"] --> RA["ALU"]
R2["rs2_data"] --> RA
RA -->|result| RW["rd"]
end
subgraph I["I-type (ADDI)"]
direction LR
I1["rs1_data"] --> IA["ALU"]
I2["sext(imm14)"] --> IA
IA -->|result| IW["rd"]
end
subgraph L["LOAD"]
direction LR
L1["rs1_data"] --> LA["ALU<br/>addr calc"]
L2["sext(imm14)"] --> LA
LA -->|addr| LM["DMEM"]
LM -->|rdata| LW["rd"]
end
subgraph S["STORE"]
direction LR
S1["rs1_data"] --> SA["ALU<br/>addr calc"]
S2["sext(imm14)"] --> SA
SA -->|addr| SM["DMEM"]
S3["rs2_data"] -->|wdata| SM
end
style R fill:#e8f5e9,stroke:#2e7d32
style I fill:#e3f2fd,stroke:#1565c0
style L fill:#fff3e0,stroke:#e65100
style S fill:#fce4ec,stroke:#c62828
The testbench uses cocotb (coroutine-based co-simulation) with a Python reference model (scoreboard) that mirrors the RTL behavior cycle-by-cycle.
flowchart LR
subgraph SIM["Icarus Verilog Simulation"]
DUT["cpu_top<br/>(DUT)"]
end
subgraph COCOTB["cocotb (Python)"]
DRV["Test Driver<br/>clock, reset,<br/>program inject"]
SB["Scoreboard<br/>Reference Model"]
CHK["Checker<br/>reg & mem compare"]
end
DRV -->|"drive clk/reset<br/>write IMEM"| DUT
DUT -->|"read regs[0:15]<br/>read mem[0:31]"| CHK
DRV -->|"exec_instr()"| SB
SB -->|"expected state"| CHK
CHK -->|"PASS / FAIL"| RESULT["results.xml"]
style DUT fill:#2c3e50,color:#fff,stroke:#1a252f
style SB fill:#be2edd,color:#fff,stroke:#8c1aab
style DRV fill:#4a90d9,color:#fff,stroke:#2c5f8a
style CHK fill:#27ae60,color:#fff,stroke:#1e8449
style RESULT fill:#f39c12,color:#fff,stroke:#d68910
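The checker step in the diagram above amounts to an element-wise comparison of two snapshots. The following Python sketch shows one way to do that. It is an assumption-laden illustration, not the repository's checker: `compare_state` is a hypothetical name, and the snapshots are plain lists of integers (in a cocotb test they would first be read out of the simulated design).

```python
def compare_state(dut_regs, dut_mem, exp_regs, exp_mem):
    """Return a list of readable mismatches; an empty list means PASS."""
    if len(dut_regs) != len(exp_regs) or len(dut_mem) != len(exp_mem):
        raise ValueError("snapshots must cover the same registers and words")
    mismatches = []
    for i, (got, exp) in enumerate(zip(dut_regs, exp_regs)):
        if got != exp:
            mismatches.append(f"x{i}: got {got:#010x}, expected {exp:#010x}")
    for addr, (got, exp) in enumerate(zip(dut_mem, exp_mem)):
        if got != exp:
            mismatches.append(
                f"MEM[{addr}]: got {got:#010x}, expected {exp:#010x}"
            )
    return mismatches


# Example: 16 registers and the first 32 memory words, one injected error.
expected_regs = [0] * 16
expected_mem = [0] * 32
observed_regs = list(expected_regs)
observed_mem = list(expected_mem)
observed_regs[3] = 8  # pretend the design wrote x3 unexpectedly

for line in compare_state(observed_regs, observed_mem,
                          expected_regs, expected_mem):
    print(line)
```

Collecting every mismatch before failing, instead of stopping at the first one, tends to make a failing cycle easier to diagnose.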
| Test Module | Type | Description |
|---|---|---|
| test_basic | Directed | Reset behavior, x0 immutability, ADD/SUB/ADDI, LOAD/STORE correctness against program.hex |
| test_randomized | Constrained random | 64 random instructions checked cycle-by-cycle against scoreboard model |
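As an illustration of what a constrained random stimulus generator can look like, here is a short Python sketch. It is not the code in `test_randomized.py`. The function name `random_program`, the particular constraint (LOAD and STORE use `x0` as base with an offset below 32, so accesses stay inside the memory window the checker reads), and the way `TEST_SEED` is parsed are all assumptions made for this example.

```python
import os
import random

OPCODES = (0, 1, 2, 3, 4, 5)  # NOP, ADD, SUB, ADDI, LOAD, STORE
LOAD, STORE = 4, 5


def random_program(seed, length=64, mem_window=32):
    """Generate `length` instruction words, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator: same seed, same program
    words = []
    for _ in range(length):
        op = rng.choice(OPCODES)
        rd, rs1, rs2 = (rng.randrange(16) for _ in range(3))
        if op in (LOAD, STORE):
            # Assumed constraint: keep addresses inside the compared window.
            rs1 = 0
            imm = rng.randrange(mem_window)
        else:
            imm = rng.randrange(-(1 << 13), 1 << 13)  # full signed 14-bit range
        words.append(
            (op << 26) | (rd << 22) | (rs1 << 18) | (rs2 << 14) | (imm & 0x3FFF)
        )
    return words


# Base 0 lets int() accept both decimal and 0x-prefixed seeds.
seed = int(os.environ.get("TEST_SEED", "1"), 0)
program = random_program(seed)
print(f"seed={seed:#x}, first word={program[0]:08x}")
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence independent of any other code that draws random numbers, which helps when a failing seed has to be replayed.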
The directed test validates this 8-instruction program:
Addr   Hex         Assembly                Effect
─────  ──────────  ──────────────────────  ──────────────────────────
0      00000000    NOP                     (no-op)
1      0c400005    ADDI x1, x0, 5          x1 = 5
2      0c800003    ADDI x2, x0, 3          x2 = 3
3      04c48000    ADD x3, x1, x2          x3 = 8
4      1400c000    STORE [x0+0], x3        MEM[0] = 8
5      11000000    LOAD x4, [x0+0]         x4 = MEM[0] = 8
6      09508000    SUB x5, x4, x2          x5 = 5
7      00000000    NOP                     (no-op)
Expected final state: x1=5, x2=3, x3=8, x4=8, x5=5, MEM[0]=8
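The Hex column of the listing can be cross-checked against the Assembly column with a small disassembler. The Python sketch below is a hypothetical helper written for this README, not part of the test suite. It relies only on the instruction format and opcode numbers documented above.

```python
PROGRAM = [
    0x00000000, 0x0C400005, 0x0C800003, 0x04C48000,
    0x1400C000, 0x11000000, 0x09508000, 0x00000000,
]

MNEMONICS = {0: "NOP", 1: "ADD", 2: "SUB", 3: "ADDI", 4: "LOAD", 5: "STORE"}


def disassemble(word):
    """Turn one 32-bit instruction word into its assembly text."""
    op = MNEMONICS.get(word >> 26)
    rd = (word >> 22) & 0xF
    rs1 = (word >> 18) & 0xF
    rs2 = (word >> 14) & 0xF
    imm = word & 0x3FFF
    if imm & 0x2000:          # sign-extend the 14-bit immediate
        imm -= 0x4000
    if op in ("ADD", "SUB"):
        return f"{op} x{rd}, x{rs1}, x{rs2}"
    if op == "ADDI":
        return f"ADDI x{rd}, x{rs1}, {imm}"
    if op == "LOAD":
        return f"LOAD x{rd}, [x{rs1}{imm:+d}]"
    if op == "STORE":
        return f"STORE [x{rs1}{imm:+d}], x{rs2}"
    if op == "NOP":
        return "NOP"
    return f".word 0x{word:08x}"  # opcode not in the documented ISA


for addr, word in enumerate(PROGRAM):
    print(f"{addr}  {word:08x}  {disassemble(word)}")
```

Each printed line should reproduce the corresponding Addr, Hex, and Assembly entries of the listing.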
- Icarus Verilog (12.0+)
- Python 3.9+
- cocotb (`pip install cocotb`)
# Run directed tests
make -C tb SIM=icarus MODULE=test_basic sim
# Run randomized tests
make -C tb SIM=icarus MODULE=test_randomized sim
# Run both test suites
make test
# Run randomized tests with a custom seed
TEST_SEED=0xDEAD make -C tb SIM=icarus MODULE=test_randomized sim

cd tb
make basic # directed tests only
make randomized # randomized tests only
make all # both suites

GitHub Actions runs both test suites automatically on every push and pull request to main.
flowchart LR
PUSH["Push / PR<br/>to main"] --> J1["test_basic<br/>(icarus)"]
PUSH --> J2["test_randomized<br/>(icarus)"]
J1 --> R1{{"PASS /<br/>FAIL"}}
J2 --> R2{{"PASS /<br/>FAIL"}}
R1 -->|fail| A1["Upload VCD<br/>waveform"]
R2 -->|fail| A2["Upload VCD<br/>waveform"]
style PUSH fill:#4a90d9,color:#fff,stroke:#2c5f8a
style J1 fill:#27ae60,color:#fff,stroke:#1e8449
style J2 fill:#27ae60,color:#fff,stroke:#1e8449
style A1 fill:#e74c3c,color:#fff,stroke:#c0392b
style A2 fill:#e74c3c,color:#fff,stroke:#c0392b
The workflow installs Icarus Verilog and cocotb on ubuntu-latest with Python 3.11, runs each test module as a separate matrix job, and uploads VCD waveform artifacts on failure.
one-stage-processor/
├── rtl/                          # Synthesizable SystemVerilog
│   ├── isa_defs.sv               # Opcodes, types, ALU op enum
│   ├── cpu_top.sv                # Top-level: PC, decode, datapath wiring
│   ├── alu.sv                    # ADD / SUB / pass-through
│   ├── regfile.sv                # 16×32 register file (x0 = 0)
│   ├── control_unit.sv           # Combinational opcode → control signals
│   ├── instruction_memory.sv     # 256-word ROM (loaded from hex)
│   └── data_memory.sv            # 256-word RAM (sync write, async read)
├── tb/
│   ├── cocotb/
│   │   ├── test_basic.py         # Directed tests
│   │   ├── test_randomized.py    # Constrained random tests
│   │   └── scoreboard.py         # Python ISA reference model
│   └── Makefile                  # cocotb simulation runner
├── sim/
│   └── program.hex               # Pre-loaded instruction memory image
├── .github/
│   └── workflows/
│       └── ci.yml                # GitHub Actions CI pipeline
├── Makefile                      # Top-level build entry point
└── README.md
This project is provided for educational and reference purposes.