Open
Description
In some cases, a hardware barrier (snrt_cluster_hw_barrier()) must be inserted after vle32.v loads to get correct results. The failure does not occur consistently, which makes the issue intermittent.
I was able to reproduce the problem by implementing a vectorized backward_solve function for upper-triangular linear systems. For this reason, I suspect the issue may be related to the vector length (vl) not being a multiple of 2, although this still needs verification.
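For comparison, a minimal scalar reference of a backward solve might look like the sketch below. It is not the kernel from the attached archive: the name backward_solve_ref, the row-major layout, and the float element type (chosen to match the 32-bit vle32.v loads) are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical scalar reference: solve U * x = b for an upper-triangular
 * n x n matrix U stored row-major. Assumes all diagonal entries are non-zero. */
void backward_solve_ref(const float *U, const float *b, float *x, size_t n) {
    /* Walk the rows from the last one up to the first. */
    for (size_t i = n; i-- > 0;) {
        float acc = b[i];
        /* Row i has n - 1 - i off-diagonal terms. If a vectorized kernel
         * processes this inner loop in a single strip (an assumption about
         * the attached kernel), vl shrinks by one per row and therefore
         * takes odd values as well as even ones. */
        for (size_t j = i + 1; j < n; ++j) {
            acc -= U[i * n + j] * x[j];
        }
        x[i] = acc / U[i * n + i];
    }
}
```

Running this on the same inputs as the vectorized kernel gives a baseline to compare against, with and without the barrier.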
How to reproduce
- Extract the attached zip archive into sw/spatzBenchmarks
- Add the following lines to sw/spatzBenchmarks/CMakeLists.txt
add_library(backward-solve backward-solve/kernel/backward-solve.c)
add_spatz_test_oneParam(backward-solve backward-solve/main.c 64)
- Build and run the test
Expected Result
The test should fail to compute the correct solution of the linear system.
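One way to confirm a wrong result without relying on a second solver is to compute the residual of the returned solution. The helper below is a hedged sketch, not part of the attached test: max_residual is a hypothetical name, and it assumes a row-major upper-triangular float matrix.

```c
#include <stddef.h>

/* Hypothetical check: return the largest absolute entry of U * x - b for an
 * upper-triangular n x n matrix U stored row-major. A value well above the
 * expected rounding error indicates that x is not a valid solution. */
float max_residual(const float *U, const float *x, const float *b, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        /* Only entries on or above the diagonal contribute. */
        for (size_t j = i; j < n; ++j) {
            acc += U[i * n + j] * x[j];
        }
        float r = acc - b[i];
        if (r < 0.0f) {
            r = -r; /* absolute value without pulling in math.h */
        }
        if (r > worst) {
            worst = r;
        }
    }
    return worst;
}
```

Comparing the returned value against a small tolerance chosen for the problem size would separate a failing run from a passing one.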
Workaround
- Uncomment the hardware barrier after the two loads in kernel/backward-solve.c
asm volatile ("vle32.v v8, (%0)" :: "r"(p_dst));
asm volatile ("vle32.v v0, (%0)" :: "r"(p_mat));
snrt_cluster_hw_barrier();
- Rebuild and run the test
- The test now computes the correct solution of the linear system