Fastest local LLM inference via activation sparsity
demo.mp4
SparkInfer vs. PowerInfer on a single RTX 4090 (24GB) running ProSparse-Llama-2-13B FP16 (26GB), with a 2× speedup!
We present SparkInfer, an adaptive GPU–CPU hybrid inference system built around online neuron balancing, a mechanism that dynamically redistributes neurons between the GPU and CPU based on their activation behavior. Extensive evaluations on consumer-grade PCs demonstrate that SparkInfer improves end-to-end throughput by up to 5.05×, 2.48×, and 3.71× over llama.cpp, PowerInfer, and Neuralink, respectively.
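To make the idea concrete, below is a minimal, hypothetical C++ sketch of how online neuron balancing could work; it is not SparkInfer's actual implementation, and all names (`NeuronBalancer`, `gpu_budget`, `decay`, `rebalance`) are made up for illustration. The sketch keeps an exponentially decayed activation counter per FFN neuron and periodically promotes the hottest neurons to the GPU under a fixed memory budget, serving the rest from the CPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative sketch of online neuron balancing (not SparkInfer's actual code).
// Each neuron carries a decayed activation counter; calling rebalance() assigns
// the highest-scoring neurons to the GPU up to a fixed budget.
class NeuronBalancer {
public:
    NeuronBalancer(size_t num_neurons, size_t gpu_budget, double decay = 0.99)
        : score_(num_neurons, 0.0), on_gpu_(num_neurons, false),
          gpu_budget_(gpu_budget), decay_(decay) {}

    // Record which neurons fired for the current token.
    void observe(const std::vector<bool>& activated) {
        for (size_t i = 0; i < score_.size(); ++i) {
            score_[i] = decay_ * score_[i] + (activated[i] ? 1.0 : 0.0);
        }
    }

    // Recompute the GPU-resident set: the gpu_budget_ neurons with the
    // highest activation scores stay on (or migrate to) the GPU.
    void rebalance() {
        const size_t k = std::min(gpu_budget_, score_.size());
        std::vector<size_t> order(score_.size());
        std::iota(order.begin(), order.end(), 0);
        std::partial_sort(order.begin(), order.begin() + k, order.end(),
                          [&](size_t a, size_t b) { return score_[a] > score_[b]; });
        std::fill(on_gpu_.begin(), on_gpu_.end(), false);
        for (size_t i = 0; i < k; ++i) {
            on_gpu_[order[i]] = true;  // in a real system this triggers a weight copy
        }
    }

    bool is_on_gpu(size_t neuron) const { return on_gpu_[neuron]; }

private:
    std::vector<double> score_;   // decayed activation frequency per neuron
    std::vector<bool>   on_gpu_;  // current placement of each neuron's weights
    size_t gpu_budget_;           // how many neurons fit in GPU memory
    double decay_;                // exponential decay factor for the counters
};
```

A real system additionally has to amortize the cost of moving weights and overlap migration with computation; the sketch only captures the scoring and selection logic.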
// TODO
// TODO
To compile SparkInfer, follow these steps:
cmake -B build -DCMAKE_BUILD_TYPE=Release \
-DGGML_CUDA=ON \
-DBUILD_SHARED_LIBS=OFF \
-DGGML_CUDA_GRAPHS=OFF
cmake --build build --config Release -j"$(nproc)" --target llama-cli

Optional Compilation Speed-Up: You can specify -DCMAKE_CUDA_ARCHITECTURES="" to reduce compilation time by targeting only your NVIDIA GPU architecture. Replace the value with the appropriate architecture for your GPU. For example:
- For RTX 4090:
-DCMAKE_CUDA_ARCHITECTURES="89-real"
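- For RTX 3080 Ti (compute capability 8.6, the GPU in the PC-Low setup):
-DCMAKE_CUDA_ARCHITECTURES="86-real"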
- Update the model path (and model-split path) and configure the desired hardware settings in run_demo.sh.
- Execute the demo script:
bash run_demo.sh

We evaluated SparkInfer against llama.cpp, PowerInfer, and Neuralink on two PC configurations: PC-Low (NVIDIA RTX 3080 Ti, 12GB) and PC-High (NVIDIA RTX 4090, 24GB). The results below demonstrate significant performance improvements, with up to 5.05× speedup over llama.cpp, 3.71× over Neuralink, and 2.48× over PowerInfer.
More details can be found in our paper.
// TODO
