Skip to content

Fixes for Synthesis, Logic, and Verification#52

Open
ridash2005 wants to merge 9 commits intoadam-maj:masterfrom
ridash2005:master
Open

Fixes for Synthesis, Logic, and Verification#52
ridash2005 wants to merge 9 commits intoadam-maj:masterfrom
ridash2005:master

Conversation

@ridash2005
Copy link

@ridash2005 ridash2005 commented Feb 17, 2026

Summary of Changes

This PR addresses critical logic flaws in the instruction dispatcher, fixes structural SystemVerilog syntax errors for synthesis compatibility, and updates the cocotb verification environment for modern API support.

1. Critical Logic & Race Condition Fixes

  • Dispatcher (src/dispatch.sv): Fixed a bug where blocks_dispatched was not updating correctly due to non-blocking assignments inside a loop. This prevented proper block scheduling. Implemented a temporary variable to correctly accumulate dispatched blocks in a single cycle.
  • Controller (src/controller.sv): Resolved multiple driver issues and potential race conditions in the main controller logic.
  • Scheduler (src/scheduler.sv): Modified program counter logic to update based on Thread 0 instead of the last thread in the block, fixing execution errors when kernels use fewer threads than the maximum block size.
  • Device Control Register (src/dcr.sv): Fixed a typo (device_conrol_registerdevice_control_register) and corrected port declarations to ensure proper synthesis.

2. Structural & Syntax Improvements

  • GPU Top-Level (src/gpu.sv): Converted core_lsu_* signals from reg to wire. These signals are driven by submodule instances, and declaring them as reg was structurally incorrect.
  • Fetcher (src/fetcher.sv): Removed trailing commas in port lists for stricter SystemVerilog compliance.

3. Verification Environment Updates

  • Cocotb 2.0+ Compatibility:
    • test/helpers/setup.py: Updated Clock instantiation to use the correct unit parameter (replacing units).
    • test/helpers/format.py: Added explicit int() casts for signal handles to prevent TypeError exceptions during arithmetic operations in simulations.

Fixed multiple driver issues in controller.sv, typo in dcr.sv, and syntax error in gpu.sv. Updated dispatch.sv to use non-blocking assignments.
Converted core.sv internal reg signals to wire. Fixed scheduler.sv to use Thread 0 PC to avoid inactive thread issues.
Fixed dcr.sv port comma, scheduler logic variable scope, and parameter syntax.
Removed docs/CHANGELOG.md and docs/DESIGN_FIXES.md from git tracking but kept locally. Added to .gitignore.
@ridash2005
Copy link
Author

ridash2005 commented Feb 17, 2026

1. Logic & Race Condition Fixes

Dispatcher (src/dispatch.sv)

Problem:
The original implementation utilized non-blocking assignments (<=) for the blocks_dispatched counter inside the core assignment loop. Since non-blocking assignments only update at the end of the time step, the counter value remained constant for all iterations of the loop within a single clock cycle.

  • Consequence: If multiple cores were ready simultaneously, they would all read the same blocks_dispatched value (e.g., Block 0). This resulted in duplicate block processings and subsequent blocks (e.g., Block 1) being skipped entirely.

Solution:
Implemented a temporary blocking variable next_blocks_dispatched to accumulate the counter immediately within the loop.

// AFTER
reg [7:0] next_blocks_dispatched;
next_blocks_dispatched = blocks_dispatched;

for (int i = 0; i < NUM_CORES; i++) begin
    if (core_ready) begin
        core_block_id[i] <= next_blocks_dispatched;
        next_blocks_dispatched = next_blocks_dispatched + 1; // Updates immediately
    end
end
blocks_dispatched <= next_blocks_dispatched;

Impact: Ensures correct, sequential block ID assignment across all available cores in a single cycle, guaranteeing that every block is executed exactly once.


Memory Controller (src/controller.sv)

Problem:
The memory controller arbitrates requests from multiple consumers (cores) to limited memory channels. Using non-blocking assignments for the channel_serving_consumer state bitmask inside the arbitration loop created a race condition.

  • Consequence: If Channel 0 claimed a consumer request, the bitmask wouldn't update until the next clock edge. When the loop proceeded to arbitrate for Channel 1 in the same cycle, it would still see the consumer as "free," leading to multiple channels attempting to service the same request or state corruption.

Solution:
Introduced a local blocking variable next_channel_serving_consumer to track arbitration state instantaneously within the cycle.

// AFTER
reg [NUM_CONSUMERS-1:0] next_channel_serving_consumer;
next_channel_serving_consumer = channel_serving_consumer;

for (int i = 0; i < NUM_CHANNELS; i = i + 1) begin 
    // ... inside IDLE state ...
    for (int j = 0; j < NUM_CONSUMERS; j = j + 1) begin 
        if (consumer_is_valid && !next_channel_serving_consumer[j]) begin 
             next_channel_serving_consumer[j] = 1; // Marks as busy IMMEDIATELY
             // Assign to channel...
             break;
        end
    end
end
channel_serving_consumer <= next_channel_serving_consumer;

Impact: Enforces strict one-to-one mapping between channels and active requests within a single clock cycle, preventing memory access collisions and deadlocks.

Scheduler (src/scheduler.sv)

Problem:
The scheduler was updating the core's current_pc based on next_pc[THREADS_PER_BLOCK-1] (the last thread in the block).

  • Consequence: If thread_count < THREADS_PER_BLOCK (e.g., 5 threads in an 8-thread block), the last thread is inactive and produces an undefined or zero next_pc. This caused the Program Counter to jump to garbage locations or effectively reset when running kernels with fewer threads than the maximum block size.

Solution:
Changed logic to use next_pc[0].

// BEFORE
current_pc <= next_pc[THREADS_PER_BLOCK-1];

// AFTER
current_pc <= next_pc[0];
  • Why: Thread 0 is always guaranteed to be active if the block itself is dispatched.
    Impact: Enables correct execution of kernels with arbitrary thread counts (e.g., matrix addition with 8 threads on a hardware configuration supporting larger blocks), preventing crashes or infinite loops.

2. Structural & Syntax Improvements

GPU Top-Level (src/gpu.sv)

Problem:
The core_lsu_* signals (read/write valid, address, data) were declared as reg types, yet they are driven continuously by submodule instances (lsu and memory_controller).

  • Consequence: In Verilog, reg is intended for procedural assignments (inside always blocks). Using reg for signals driven by module outputs or assign statements is semantically incorrect and can confusing synthesis tools, leading to "multiple driver" warnings or simulation mismatches where the signal does not update as expected.

Solution:
Changed declarations from reg to wire.

// BEFORE
reg [THREADS_PER_BLOCK-1:0] core_lsu_read_valid; // Incorrect

// AFTER
wire [THREADS_PER_BLOCK-1:0] core_lsu_read_valid; // Correct

Impact: Ensures the design models physical connections correctly, eliminating synthesis warnings and ensuring reliable signal propagation between submodules.


Instruction Fetcher (src/fetcher.sv)

Problem:
The module port list contained a trailing comma after the last port declaration.

output reg [PROGRAM_MEM_DATA_BITS-1:0] instruction, // <-- Trailing comma
);
  • Consequence: While some lenient simulators (like Icarus Verilog) might accept this, strictly compliant SystemVerilog parsers and tools like sv2v (SystemVerilog to Verilog converter) will fail to parse the file, causing build errors.

Solution:
Removed the trailing comma.
Impact: Restores standard SystemVerilog compliance, allowing the project to be built with a wider range of tools.


Device Control Register (src/dcr.sv)

Problem:
The internal register variable had a typo: device_conrol_register (missing 't').

  • Consequence: This typo would cause "undeclared identifier" errors during synthesis or simulation if the tool strictly checks variable usage against declarations, or potential inference of implicit wires in loose configurations, which is dangerous.

Solution:
Renamed to device_control_register.
Impact: Improves code readability and ensures correct variable usage.
[diff_block_end]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant