A simple C++ API for llama.cpp, with far fewer breaking changes than the upstream API.
- Plain C/C++ implementation with minimal dependencies (only llama.cpp, which is bundled)
- Runs on ARM as well (tested on Raspberry Pi 3, 4, and 5)
- Written with performance in mind, while still supporting many backends
This library offers support for:
- Running GGUF-formatted large language models such as Phi-2 and LLaMA
This is some example code for running basic inference:

```cpp
#include <simplellama.hpp>

...

/* Create a new model params struct for the model settings */
simplellama_model_params model_params;
model_params.model_llama = "phi-2.Q5_0.gguf"; /* Model name; downloaded automatically by CMake */

/* Make a new instance of SimpleLLama */
SimpleLLama sl(model_params);

/* Initialize the model runtime */
sl.init();

/* An example question that we want to get answered :) */
std::string question = "What came first, the egg or the chicken?";

/* Let the LLM answer */
std::cout << question << '\n';
std::string response = "response: " + sl.do_inference(question);
std::cout << response << '\n';
```

Output:

```
What came first, the egg or the chicken?
response: The egg
```
For a full example, see `example/simple_text_demo/simple_text_demo.cpp`.
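As a slightly larger sketch, the same calls can drive a small interactive loop. Everything here besides the loop itself (`simplellama_model_params`, `SimpleLLama`, `init()`, and `do_inference()`) comes straight from the example above; the REPL structure is just one way to use them:

```cpp
#include <simplellama.hpp>
#include <iostream>
#include <string>

int main() {
    /* Same setup as the basic example above */
    simplellama_model_params model_params;
    model_params.model_llama = "phi-2.Q5_0.gguf";

    SimpleLLama sl(model_params);
    sl.init();

    /* Hypothetical REPL: keep answering questions until an empty line is entered */
    std::string question;
    std::cout << "question: ";
    while (std::getline(std::cin, question) && !question.empty()) {
        std::cout << "response: " << sl.do_inference(question) << '\n';
        std::cout << "question: ";
    }
    return 0;
}
```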
Before using this library you will need the following installed:
- A working C++ compiler (GCC, Clang, or MSVC 2017 or higher)
- CMake
- Ninja (optional, but preferred)
- Clone this repo
- Run:

  ```
  cmake . -B build -G Ninja
  ```

- Let CMake generate, then run:

  ```
  cd build && ninja
  ```

- After building you can run (Linux & macOS):

  ```
  ./simple_text_demo
  ```

  or, if using Windows:

  ```
  simple_text_demo.exe
  ```

Depending on your hardware configuration, different software backends might leverage your hardware's compute power better. So check this out! It can make the difference between 2 tok/s and 40 tok/s.
These backends are available:
| Backend | Best suited for | CMake option |
|---|---|---|
| Vulkan | Generic Intel, AMD & NVIDIA GPUs | Add `-DGGML_VULKAN=ON` to the configure command, or `set(GGML_VULKAN ON)` in your CMakeLists before importing the project |
| CUDA | NVIDIA GPUs | Add `-DGGML_CUDA=ON` to the configure command, or `set(GGML_CUDA ON)` in your CMakeLists before importing the project |
| BLIS | All | Add `-DGGML_BLIS=ON` to the configure command, or `set(GGML_BLIS ON)` in your CMakeLists before importing the project |
| BLAS | All | Add `-DGGML_BLAS=ON` to the configure command, or `set(GGML_BLAS ON)` in your CMakeLists before importing the project |
| SYCL | Intel (>12th gen Core) & NVIDIA GPUs | Add `-DGGML_SYCL=ON` to the configure command, or `set(GGML_SYCL ON)` in your CMakeLists before importing the project |
| HIP | AMD GPUs | Add `-DGGML_HIP=ON` to the configure command, or `set(GGML_HIP ON)` in your CMakeLists before importing the project |
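For example, to build the demo with the Vulkan backend you could append the option from the table to the configure command (`cmake . -B build -G Ninja -DGGML_VULKAN=ON`), or, when importing the library into your own build, set it before the import. A minimal sketch, using the option names from the table above:

```cmake
# Pick the backend option from the table that matches your hardware,
# and set it BEFORE SimpleLLama is added to the build:
set(GGML_VULKAN ON)

add_subdirectory(simplellama)  # or the FetchContent snippet shown below
```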
Add this to your top-level CMakeLists.txt:

```cmake
include(FetchContent)

FetchContent_Declare(
  SimpleLLama
  GIT_REPOSITORY https://github.com/HCL_Hbot/SimpleLLama
  GIT_TAG main
  SOURCE_DIR ${CMAKE_CURRENT_LIST_DIR}/lib/SimpleLLama
)

FetchContent_MakeAvailable(SimpleLLama)

...

target_link_libraries(YOUR_EXECUTABLE simplellama)
```

Or manually clone this repo and add the library to your project using:
```cmake
add_subdirectory(simplellama)

...

target_link_libraries(YOUR_EXECUTABLE simplellama)
```

See our wiki...
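Putting the pieces together, a minimal consumer CMakeLists.txt could look like this sketch; `my_app` and `main.cpp` are placeholder names, and CMake 3.14+ is assumed for `FetchContent_MakeAvailable`:

```cmake
cmake_minimum_required(VERSION 3.14)  # FetchContent_MakeAvailable needs 3.14+
project(my_app CXX)                   # placeholder project name

# Optionally select a backend before importing (see the table above)
# set(GGML_VULKAN ON)

include(FetchContent)
FetchContent_Declare(
  SimpleLLama
  GIT_REPOSITORY https://github.com/HCL_Hbot/SimpleLLama
  GIT_TAG main
  SOURCE_DIR ${CMAKE_CURRENT_LIST_DIR}/lib/SimpleLLama
)
FetchContent_MakeAvailable(SimpleLLama)

add_executable(my_app main.cpp)       # placeholder executable and source
target_link_libraries(my_app simplellama)
```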
- Make it OVOS-compatible
- Add compatibility for ARM SoCs
- Add wiki
- Windows compatibility, sigh
- Add tests
- Add CI workflows
This work is licensed under the MIT License.