YuiAI is a fully local chat + TTS system focused on low-latency, offline usage. It is designed for small, fast language models and a simple, transparent memory system.
No cloud APIs.
No tracking.
No accounts.
YuiAI consists of three independent parts:
- FastAPI server: orchestrates chat, memory and TTS
- LLM backend: llama.cpp running as a local HTTP server
- TTS backend: FishSpeech running as a separate local service
All components communicate via HTTP on localhost.
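To illustrate the flow, here is a hypothetical sketch of one chat turn (the real routing lives in server.py; the ports match the defaults used later in this README, and the request/response fields are simplified):

```python
# Hypothetical sketch of one chat turn: user text -> llama.cpp -> FishSpeech.
import requests

LLAMA_URL = "http://127.0.0.1:8081/completion"  # llama.cpp HTTP server
TTS_URL = "http://127.0.0.1:8080/v1/tts"        # FishSpeech server

def chat_turn(user_text: str) -> bytes:
    # 1. Generate a reply with the local LLM.
    llm = requests.post(LLAMA_URL, json={"prompt": user_text, "n_predict": 128})
    reply = llm.json()["content"]

    # 2. Synthesize the reply as WAV audio (blocking request).
    #    The TTS payload fields are an assumption; see the TTS section below.
    tts = requests.post(TTS_URL, json={"text": reply})
    return tts.content
```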
Requirements:
- Linux (x86_64)
- Python 3.11
- CMake + GCC/Clang (for llama.cpp)
- 8–16 GB RAM recommended
- RTX 3060 (12 GB) or a comparable NVIDIA GPU with at least 12 GB VRAM for FishSpeech TTS
Set up the Python environment:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Then clone and build llama.cpp:

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_BLAS=ON \
  -DLLAMA_BLAS_VENDOR=OpenBLAS
cmake --build . -j$(nproc)
```

The binaries will be located in:

```
llama.cpp/build/bin/
```
Example:

```bash
./llama-server \
  -m /path/to/model.gguf \
  -t 16 \
  -c 4096 \
  --host 127.0.0.1 \
  --port 8081
```

YuiAI expects the endpoint:

```
http://127.0.0.1:8081/completion
```
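To verify the server is reachable, you can hit /completion directly; llama.cpp accepts a JSON prompt and returns the generated text in the `content` field:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8081/completion",
    json={
        "prompt": "User: Hello!\nAssistant:",
        "n_predict": 64,    # cap the number of generated tokens
        "temperature": 0.7,
        "stop": ["User:"],  # stop before the model writes the next turn
    },
)
print(resp.json()["content"])
```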
Recommended models:
- Qwen2.5-1.5B-Instruct (Q4_K_M)
- Qwen2.5-3B-Instruct (Q4_K_M)
- LLaMA 3.2 3B Instruct (Q4)
Small models are intentional to keep latency low and responses short.
YuiAI does not run TTS internally. It connects to an external FishSpeech server via HTTP.
FishSpeech is developed here:
- https://github.com/fishaudio/fish-speech
- https://speech.fish.audio/
Please follow the setup instructions in the FishSpeech repository.
YuiAI expects a TTS endpoint at:
```
http://127.0.0.1:8080/v1/tts
```
- Input: text + reference voice
- Output: WAV audio
- Blocking request (non-streaming)
The integration is implemented in modules/fishspeech_client.py.
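For reference, a minimal client sketch. The request schema differs between FishSpeech versions, so treat the field names (`references`, `audio`, `text`) as assumptions and check your server's API docs:

```python
# Hypothetical FishSpeech client sketch; the real one is modules/fishspeech_client.py.
import base64
from pathlib import Path

import requests

TTS_URL = "http://127.0.0.1:8080/v1/tts"
REF_WAV = Path("voicefiles/Ref_Nao.wav")
REF_TEXT = "<your reference text>"

def synthesize(text: str) -> bytes:
    payload = {
        "text": text,
        # Reference audio + transcript for voice cloning; these field names
        # are an assumption, adjust them to your FishSpeech version.
        "references": [
            {
                "audio": base64.b64encode(REF_WAV.read_bytes()).decode(),
                "text": REF_TEXT,
            }
        ],
    }
    resp = requests.post(TTS_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.content  # WAV bytes

# Usage:
# Path("out.wav").write_bytes(synthesize("Hello from YuiAI."))
```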
Project structure:

```
.
├─ server.py
├─ modules/
│  ├─ llama_client.py        # llama.cpp HTTP client
│  ├─ fishspeech_client.py   # FishSpeech HTTP client
│  └─ memory.py              # local JSON-based memory
├─ webui/
│  └─ index.html
├─ data/
│  └─ memory.json
└─ voicefiles/
   └─ Ref_Nao.wav            # reference voice sample (not included by default)
```
YuiAI uses a fixed reference voice sample for FishSpeech cloning. The server expects the reference WAV at:

```
voicefiles/Ref_Nao.wav
```

Create the directory:

```bash
mkdir voicefiles
```

And set a matching reference transcript inside modules/fishspeech_client.py:

```python
REF_WAV = Path("voicefiles/Ref_Nao.wav")
REF_TEXT = "<your reference text>"
```
- Format: WAV (PCM)
- Sample rate: 22.05 kHz or 24 kHz (whatever your FishSpeech setup expects)
- Mono recommended
- Length: ~3–10 seconds clean voice, minimal noise
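If your recording does not already match these constraints, a standard ffmpeg invocation can convert it (pick the sample rate your FishSpeech setup expects):

```bash
ffmpeg -i my_recording.wav -ar 24000 -ac 1 -c:a pcm_s16le voicefiles/Ref_Nao.wav
```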
Do not commit personal voice samples.
Add this to .gitignore:
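```
voicefiles/
```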
Make sure both services are running:
- llama.cpp server
- FishSpeech server
Then start YuiAI:

```bash
python server.py
```

The web UI is available at:

```
http://localhost:8000
```
API endpoints:

- POST /chat
  Body: { "text": "your message" }
- POST /tts
  Body: { "text": "text to speak" }
- GET /memory/list
- GET /memory/reset
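For example, from Python (the response shapes shown here are assumptions; check server.py for the exact payloads):

```python
import requests

BASE = "http://localhost:8000"

# Chat: send a message, get the assistant's reply.
reply = requests.post(f"{BASE}/chat", json={"text": "Hello, who are you?"})
print(reply.json())

# TTS: returns WAV audio for the given text.
audio = requests.post(f"{BASE}/tts", json={"text": "Hello from YuiAI."})
with open("reply.wav", "wb") as f:
    f.write(audio.content)

# Inspect or reset the memory store.
print(requests.get(f"{BASE}/memory/list").json())
requests.get(f"{BASE}/memory/reset")
```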
Notes:
- All inference is local
- llama.cpp handles LLM execution
- YuiAI only manages prompt structure, memory and routing
- Memory is intentionally simple and transparent
- Optimized for short, natural, conversational replies
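As an illustration of that simplicity, memory.json might hold nothing more than a short list of remembered turns (hypothetical contents; see modules/memory.py for the actual schema):

```json
[
  { "role": "user", "text": "My name is Alex." },
  { "role": "assistant", "text": "Nice to meet you, Alex!" }
]
```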