Replace FMA's LZC with CVW's LZA #149
The following script can be used to verify that the proposed changes are sequentially equivalent to the current implementation, using the sequential equivalence check of Synopsys VC Formal (make sure to have the correct paths to
Hi @emustafa96. I tested the PR using the UVM testbench https://github.com/openhwgroup/cvfpu-uvm.git. In my test I configured the FPU instance with a merged slice for the FMA unit, so that ADD and MUL operations stress your modifications. As a regression test I ran 10000 random transactions with random operations, operands, FP formats and rounding modes, repeated for 10 different seeds, and compared the results with those of the MPFR golden model. I see no issues, so if you agree with my test and results, I think the PR can be merged.
Hi @rgiunti, thank you for the effort! Based on the formal equivalence check and your testing, I also think we can merge.
Replace leading zero counter with leading zero anticipator in FMA sum path
Summary
This PR optimizes the floating-point multiply-add (FMA) unit by replacing the leading zero counter (LZC) in the sum path, which could only start once the sum was complete, with a leading zero anticipator (LZA) that runs in parallel with the addition. This takes the leading zero count off the critical path in front of normalization and improves FMA timing.
Problem
The previous implementation computed the sum first and only then counted its leading zeros to determine the normalization shift.
This sequential dependency added unnecessary latency to the FMA operation, because the leading zero count, and therefore the normalization, had to wait for the complete sum.
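A minimal Python model of that dependency chain, assuming a non-negative sum and ignoring exponents, rounding and sticky bits; `normalize_after_sum` is a hypothetical name, not a function from the cvfpu sources:

```python
from typing import Tuple


def normalize_after_sum(a: int, b: int, width: int) -> Tuple[int, int]:
    """Baseline (LZC after the adder) path for a non-negative width-bit sum.

    Each step consumes the previous step's result, so in hardware the
    leading zero count cannot begin until the carry-propagate adder is done.
    """
    mask = (1 << width) - 1
    s = (a + b) & mask              # 1) full-width addition
    lzc = width - s.bit_length()    # 2) leading zero count of the finished sum
    normalized = (s << lzc) & mask  # 3) normalization shift by that count
    return normalized, lzc
```

For example, `normalize_after_sum(0b0011, 0b0001, 8)` returns `(128, 5)`: the sum `0b100` has five leading zeros in an 8-bit field, but that count is only known after the addition has finished.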
Solution
Added a leading zero anticipator based on Schmookler's algorithm (IEEE), as implemented in the CORE-V Wally (CVW) core, which predicts the normalization shift count (to within one bit position) in parallel with the sum computation.
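As a hedged illustration of the idea (a sketch, not a transcription of the CVW RTL or of this PR), the Python model below classifies each bit pair of a two's-complement addition A + B as T (propagate), G (generate) or Z (zero) and builds Schmookler's leading-digit indicator from neighbouring digits only, so no carry chain is involved. The function names, the bit ordering (index 0 = MSB) and the boundary handling are assumptions of this sketch. The first 1 in the indicator is either the exact position of the first result bit that differs from the sign, or one position too far toward the MSB.

```python
from typing import List, Optional


def lza_indicator(a: int, b: int, width: int) -> List[int]:
    """Leading-digit indicator f for the width-bit sum a + b (index 0 = MSB).

    Per-bit digits: T = a XOR b, G = a AND b, Z = NOT a AND NOT b.
    Assumed boundaries: T above the MSB is 0, and the digit below the LSB
    is treated as Z (as if both operands had one extra 0 bit).
    """
    ab = [(a >> (width - 1 - i)) & 1 for i in range(width)]
    bb = [(b >> (width - 1 - i)) & 1 for i in range(width)]
    t = [x ^ y for x, y in zip(ab, bb)]
    g = [x & y for x, y in zip(ab, bb)]
    z = [1 - (x | y) for x, y in zip(ab, bb)]
    f = []
    for i in range(width):
        t_prev = t[i - 1] if i > 0 else 0
        g_next = g[i + 1] if i + 1 < width else 0
        z_next = z[i + 1] if i + 1 < width else 1
        if t_prev:
            hit = (g[i] and not z_next) or (z[i] and not g_next)
        else:
            hit = (z[i] and not z_next) or (g[i] and not g_next)
        f.append(1 if hit else 0)
    return f


def lza_predict(a: int, b: int, width: int) -> Optional[int]:
    """Anticipated leading-digit position, or None if the indicator is all 0."""
    f = lza_indicator(a, b, width)
    return f.index(1) if 1 in f else None


def exact_position(a: int, b: int, width: int) -> Optional[int]:
    """Reference: first bit of the finished sum that differs from the bit above it."""
    s = (a + b) & ((1 << width) - 1)
    bits = [(s >> (width - 1 - i)) & 1 for i in range(width)]
    for p in range(1, width):
        if bits[p] != bits[p - 1]:
            return p
    return None
```

For example, `lza_predict(0b00000010, 0b00000001, 8)` returns 5 while `exact_position(0b00000010, 0b00000001, 8)` returns 6: the sum `00000011` has its leading one one place below the anticipated position, which is the one-bit error a design has to correct after the shift. The model assumes operands are sign-extended by one bit so the sum cannot overflow.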
Technical Details
The LZA implementation:
sub control signal
Testing
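Because the anticipated count can be one bit short, any check of this path has to cover the fix-up step. A small randomized self-check in the spirit of the regression described above (but unrelated to the cvfpu-uvm testbench or MPFR) could compare "shift by the anticipated count, then correct by one bit if the MSB is still 0" against the LZC reference. It restates a Schmookler-style indicator so it runs on its own; `WIDTH`, the helper names and the restriction to positive sums are assumptions of this sketch.

```python
import random
from typing import Optional, Tuple

WIDTH = 16
MASK = (1 << WIDTH) - 1


def lza_count(a: int, b: int) -> Optional[int]:
    """Schmookler-style anticipated shift (index 0 = MSB), or None if f is all 0."""
    ab = [(a >> (WIDTH - 1 - i)) & 1 for i in range(WIDTH)]
    bb = [(b >> (WIDTH - 1 - i)) & 1 for i in range(WIDTH)]
    t = [x ^ y for x, y in zip(ab, bb)]
    g = [x & y for x, y in zip(ab, bb)]
    z = [1 - (x | y) for x, y in zip(ab, bb)]
    for i in range(WIDTH):
        t_prev = t[i - 1] if i > 0 else 0
        g_next = g[i + 1] if i + 1 < WIDTH else 0
        z_next = z[i + 1] if i + 1 < WIDTH else 1
        if t_prev:
            hit = (g[i] and not z_next) or (z[i] and not g_next)
        else:
            hit = (z[i] and not z_next) or (g[i] and not g_next)
        if hit:
            return i
    return None


def normalize_lzc(a: int, b: int) -> Tuple[int, int]:
    """Reference: count leading zeros of the finished (positive) sum."""
    s = (a + b) & MASK
    lzc = WIDTH - s.bit_length()
    return (s << lzc) & MASK, lzc


def normalize_lza(a: int, b: int) -> Tuple[int, int]:
    """Shift by the anticipated count, then fix up a possible one-bit shortfall."""
    s = (a + b) & MASK
    shift = lza_count(a, b)  # not None for any positive sum
    shifted = (s << shift) & MASK
    if not (shifted >> (WIDTH - 1)) & 1:  # MSB still 0: anticipation was one short
        shifted = (shifted << 1) & MASK
        shift += 1
    return shifted, shift


def run_check(trials: int, seed: int) -> int:
    """Compare both paths on random positive sums; returns the mismatch count."""
    rng = random.Random(seed)
    half = 1 << (WIDTH - 2)  # operands sign-extended by one bit: no overflow
    mismatches = done = 0
    while done < trials:
        a, b = rng.randrange(-half, half), rng.randrange(-half, half)
        if a + b <= 0:
            continue  # zero and negative sums are outside this sketch
        if normalize_lza(a & MASK, b & MASK) != normalize_lzc(a & MASK, b & MASK):
            mismatches += 1
        done += 1
    return mismatches
```

A run such as `run_check(3000, seed=1)` is expected to report no mismatches; the single conditional one-bit shift is what lets the anticipated count be used even when it lands one position early.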