
SpMM with Block Aggregation using Intel Tensor Block Chain

This repo contains the hardware design and a simple testbench for SpMM with Block Aggregation on the Intel/Altera Stratix 10 NX platform.

If you would like to cite this work, use

@inproceedings{enabling-efficient-spmm-for-sparse-attention-on-gemm-optimized-hardware-with-block-aggregation,
  author    = {Ji, Tianchu and Balasubramanian, Niranjan and Ferdman, Michael and Milder, Peter},
  title     = {Enabling Efficient SpMM for Sparse Attention on GEMM-Optimized Hardware with Block Aggregation},
  booktitle = {Proceedings of the 2026 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '26)},
  year      = {2026},
  isbn      = {9798400720796},
  address   = {Monterey, CA, USA},
  publisher = {Association for Computing Machinery},
  numpages  = {12},
  keywords  = {sparse-dense matrix multiplication, self-attention, sparse attention, Tensor Block},
  doi       = {10.1145/3748173.3779187},
  url       = {https://doi.org/10.1145/3748173.3779187},
}

Build

Prerequisites:

  • sdkman
  • java: 11.0.23-amzn
    sdk install java 11.0.23-amzn
    
  • sbt: 1.11.7 (for building SpinalHDL project)
    sdk install sbt 1.11.7
    
  • python 3.13
  • Quartus 21.4
  • QuestaSim 2024.3

Clone this repo

git clone --recurse-submodules git@github.com:COMPAS-Lab/sparsity-intel-tensor-core-transformers-accel.git spmm_core
cd spmm_core
git checkout block_sparse_core

Install Python requirements

Make sure to create and activate a virtual environment before installing the Python requirements, then run

pip install -r requirements.txt

Generate the Verilog for this project:

sbt "runMain mvm.tensor_core_array_wrapper_gen"

This will generate the ./src/generated_spmm_core_6x12x8 directory with the Verilog files for the design, as well as a source file list ./src/generated_spmm_core_6x12x8/tensor_core_array_wrapper.lst. All files in the .lst should be added to the Quartus project when generating a .sof file for the FPGA.
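As a quick sanity check before setting up the Quartus project, the hypothetical Python helper below (not part of the repo) reads the generated .lst and reports any listed source files that are missing; it assumes the .lst contains one file path per line.

import os

# Hypothetical check: confirm every source file named in the generated .lst
# exists before adding the list to the Quartus project.
# Assumes one path per line; adjust if the paths in the .lst are relative
# to a different working directory.
lst_path = "./src/generated_spmm_core_6x12x8/tensor_core_array_wrapper.lst"

with open(lst_path) as f:
    sources = [line.strip() for line in f if line.strip()]

missing = [s for s in sources if not os.path.exists(s)]
print(f"{len(sources)} sources listed, {len(missing)} missing")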

Simulation

Simulating the SpMM core requires QuestaSim and cocotb. A COMPAS-forked cocotb 1.9.2 with a modified makefile template is provided:

git clone git@github.com:COMPAS-Lab/sparsity-compas-forked-cocotb.git compas-cocotb
cd compas-cocotb 
pip install -e .
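To confirm that the forked cocotb is the one picked up by your virtual environment, a quick check (the fork is based on cocotb 1.9.2):

import cocotb

# Should report 1.9.2 if the COMPAS fork was installed into the active venv.
print(cocotb.__version__)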

1. Preparing BFP-format attention values

A set of extracted sparse attention values in BFP format is provided to simplify the simulation process. It is extracted from chatglm2-6b-32k on LongBench's vcsum test. Download the partial BFP-format attention values from COMPAS NFS and extract them. The extracted data should contain sparse attention values for 10 instances:

iiSeqInst0105_cidx.npy  iiSeqInst0105_val.npy   iiSeqInst0109_ridx.npy  iiSeqInst0112.json      
iiSeqInst0117_cidx.npy  iiSeqInst0117_val.npy   iiSeqInst0121_ridx.npy  iiSeqInst0131.json      
iiSeqInst0135_cidx.npy  iiSeqInst0135_val.npy   iiSeqInst0139_ridx.npy  iiSeqInst0141.json      
iiSeqInst0147_cidx.npy  iiSeqInst0147_val.npy   iiSeqInst0105.json      iiSeqInst0109_cidx.npy  
iiSeqInst0109_val.npy   iiSeqInst0112_ridx.npy  iiSeqInst0117.json      iiSeqInst0121_cidx.npy  
iiSeqInst0121_val.npy   iiSeqInst0131_ridx.npy  iiSeqInst0135.json      iiSeqInst0139_cidx.npy  
iiSeqInst0139_val.npy   iiSeqInst0141_ridx.npy  iiSeqInst0147.json      iiSeqInst0105_ridx.npy  
iiSeqInst0109.json      iiSeqInst0112_cidx.npy  iiSeqInst0112_val.npy   iiSeqInst0117_ridx.npy  
iiSeqInst0121.json      iiSeqInst0131_cidx.npy  iiSeqInst0131_val.npy   iiSeqInst0135_ridx.npy  
iiSeqInst0139.json      iiSeqInst0141_cidx.npy  iiSeqInst0141_val.npy   iiSeqInst0147_ridx.npy

The data is generated by the sparse attention analyzer. Each iiSeqInst<inst_id>_val.npy contains the sparsified heads from one instance, where inst_id is the sequence id in the LongBench test set. Likewise, iiSeqInst<inst_id>_cidx.npy and iiSeqInst<inst_id>_ridx.npy contain the column and row block indices of the retained dense values. This data is provided to the testbench to generate the SpMM test cases.

A copy of the extracted data is located at /compas-old/projects/sparse-attention on COMPAS NFS.
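For reference, a minimal sketch of how one instance can be inspected with NumPy; the exact array shapes and json contents depend on the analyzer output, and data_dir is a placeholder for wherever you extracted the archive:

import json
import numpy as np

inst_id = "iiSeqInst0105"              # one of the 10 provided instances
data_dir = "/path/to/extracted/data"   # placeholder: your extraction directory

vals = np.load(f"{data_dir}/{inst_id}_val.npy")   # sparsified attention values
cidx = np.load(f"{data_dir}/{inst_id}_cidx.npy")  # column block indices of retained blocks
ridx = np.load(f"{data_dir}/{inst_id}_ridx.npy")  # row block indices of retained blocks
with open(f"{data_dir}/{inst_id}.json") as f:
    meta = json.load(f)                           # per-instance metadata

print("values:", vals.shape, vals.dtype)
print("column block indices:", cidx.shape)
print("row block indices:", ridx.shape)
print("metadata keys:", list(meta.keys()))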

2. Setting up the cocotb testbench

Specify the path to the SpMM core design generated by SpinalHDL in cocotb's makefile:

# specify spmm core src path here
VERILOG_SOURCES += $(PWD)/../src/generated_spmm_core_6x12x8/*.v

Specify the inst_id and head index (hidx) for simulation in the cocotb testbench:

inst_id = "iiSeqInst0105"
hidx = 15

Specify the test data path in the cocotb testbench, as well as an empty directory path for the json files generated by the attention value parser. Make sure the data path points to the data you extracted in step 1 (Preparing BFP-format attention values).

attn_data_path = "/compas-old/projects/sparse-attention/chatglm2-6b-32k-attn-bfp20-vcsum/"
inter_config_path = f"/compas-old/projects/sparse-attention/sim_test/chatglm2-6b-32k-attn-bfp20-vcsum/{inst_id}"

3. Starting the simulation

cd sim
make -f makefile.cocotb clean
make -f makefile.cocotb GUI=0

This will start QuestaSim and run the simulation. If you want to view the waveform, specify GUI=1.

Generate bitstream

Refer to nx10-matmul-project to build the hardware.
