An offline‑first e‑commerce assistant that pairs local vector search over JSON data with an LLM‑based router and response generator. It is designed for fast, deterministic retrieval while still producing natural responses, with optional speech input and speech output. For deeper architecture details, see ARCHITECTURE.md.
```mermaid
flowchart TD
    User[User] --> IO["main.py I/O"]
    User --> VoiceIn["Voice input (STT, utils/stt)"]
    VoiceIn --> IO
    IO --> VoiceOut["Voice output (TTS, utils/tts)"]
    IO --> Agent["Agent: ChatGlem (core/chat_engine)"]
    Agent --> Router["IntentClassifier (core/intent)"]
    Router --> LLM["GlemEngine (core/glem)"]
    LLM --> Agent
    Agent --> Tools["KnowledgeBaseTools (core/tools)"]
    Tools --> Retrieval["FAISS vector search"]
    Tools --> Actions["Cancel / return actions"]
    Retrieval --> Data["JSON data and indexes"]
    Tools --> Agent
```
- Python 3: Primary runtime for the assistant and tools.
- Groq LLM API: Routing and response generation via `GlemEngine`.
- FAISS: Vector similarity search over local indexes.
- Sentence Transformers: Embeddings for catalog, FAQ, policy, and orders.
- JSON data stores: Product catalog, FAQs, policies, and orders.
- ElevenLabs (optional): Text-to-speech via `utils/tts.py`.
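
To show how these pieces fit together, here is a minimal sketch of the retrieval pattern (Sentence Transformers embeddings queried against a FAISS index built from JSON records). It is not the project's actual `utils/search_utils.py` API; the model name, index filename, and catalog JSON layout are assumptions.

```python
# Minimal sketch: embed a query and search a FAISS index built from JSON records.
# The file names and the embedding model are illustrative, not the project's API.
import json

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")       # assumed embedding model
index = faiss.read_index("data/indexes/products.faiss")
with open("data/products.json", encoding="utf-8") as f:
    products = json.load(f)                           # assumed: same order as index rows

def search(query: str, k: int = 3):
    # Encode the query, then take the k nearest catalog entries by distance.
    vec = model.encode([query], convert_to_numpy=True).astype("float32")
    distances, ids = index.search(vec, k)
    return [(float(d), products[i]) for d, i in zip(distances[0], ids[0]) if i != -1]

for score, item in search("wireless headphones under $100"):
    print(score, item.get("name"))
```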
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file in the project root or export variables in your shell.
Required:

- `API_KEYS`: Comma-separated Groq API keys used by `GlemEngine`.

Optional:

- `USE_STT`: `1`, `true`, or `yes` to enable speech-to-text.
- `USE_TTS`: `1`, `true`, or `yes` to enable text-to-speech.
- `TTS_VOICE_ID`: ElevenLabs voice id.
- `TTS_MODEL_ID`: ElevenLabs model id. Default `eleven_multilingual_v2` in `main.py`.
- `TTS_OUTPUT_FORMAT`: e.g. `mp3_44100_128`.
- `TTS_RATE`: Optional int for speech rate if your TTS backend supports it.
- `ELEVENLABS_API_KEY` or `ELEVENLABS_API_KEYS`: Used by `utils/tts.py`.
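
For reference, a `.env` along these lines covers the variables above; all key values are placeholders, only the variable names come from the list.

```
# .env (placeholder values)
API_KEYS=gsk_key_one,gsk_key_two
USE_STT=0
USE_TTS=1
TTS_VOICE_ID=your_voice_id
TTS_MODEL_ID=eleven_multilingual_v2
TTS_OUTPUT_FORMAT=mp3_44100_128
ELEVENLABS_API_KEY=your_elevenlabs_key
```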
The assistant expects FAISS indexes under `data/indexes/`.

```bash
python scripts/build_faiss_indexes.py --data-dir data --out-dir data/indexes
```

Run the assistant:

```bash
python main.py
```

- `main.py` wires the system prompt, tools, intent classifier, and agent.
- `core/chat_engine.py` runs the loop, calls tools, and crafts the final response.
- `core/intent.py` uses `GlemEngine` with a JSON schema to decide tool vs chat routing, as sketched below.
- `core/tools.py` handles retrieval, order actions, and tool execution.
- `utils/search_utils.py` provides embeddings, FAISS index access, and query parsing.
- `utils/stt.py` and `utils/tts.py` provide optional audio input/output.
- `scripts/build_faiss_indexes.py` builds the indexes from JSON data.

See ARCHITECTURE.md for a full system diagram and component breakdown.
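
The routing step constrains the LLM's reply to a JSON shape before the agent acts on it. The snippet below is a sketch of that schema-checked pattern only; the field names, enum values, and the `jsonschema` validation step are illustrative assumptions, not the actual schema in `core/intent.py`.

```python
# Sketch of schema-constrained routing: ask the LLM for JSON, validate it,
# and fall back to plain chat if the reply cannot be trusted.
import json

from jsonschema import ValidationError, validate

ROUTING_SCHEMA = {
    "type": "object",
    "properties": {
        "route": {"type": "string", "enum": ["tool", "chat"]},
        "tool_name": {"type": "string"},
        "reason": {"type": "string"},
    },
    "required": ["route"],
}

def parse_routing_decision(raw_llm_output: str) -> dict:
    """Parse and validate the router's JSON reply; fall back to chat on errors."""
    try:
        decision = json.loads(raw_llm_output)
        validate(instance=decision, schema=ROUTING_SCHEMA)
        return decision
    except (json.JSONDecodeError, ValidationError):
        return {"route": "chat", "reason": "unparseable routing output"}

print(parse_routing_decision('{"route": "tool", "tool_name": "search_products"}'))
```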
- Data lives in `data/` and is treated as the source of truth.
- Indexes in `data/indexes/` are built before runtime.
- A single customer context is active per run via `CUSTOMER_ID` in `main.py`.
- The configured model string `openai/gpt-oss-20b` is available in Groq.
- The assistant only acts on orders belonging to the active customer id.
- No live data updates. You must rebuild indexes if JSON data changes.
- Order actions do not mutate the order database; they only write to `data/action_log.jsonl` (see the sketch after this list).
- Tool calls and routing depend on the LLM; incorrect routing is possible.
- Token budgeting uses a rough heuristic in `build_sliding_window`.
- No streaming responses or concurrency controls.
- Audio features require a working local audio device and the related packages.
- Errors are surfaced as plain text; there is no structured error handling layer.
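
The action log is a JSON Lines file: one record appended per order action instead of a database update. A minimal sketch of that append-only pattern follows; the record fields are hypothetical, only the `data/action_log.jsonl` path comes from the project description.

```python
# Sketch: append an order action to a JSON Lines log instead of mutating orders.
import json
from datetime import datetime, timezone

def log_action(action: str, order_id: str, customer_id: str,
               path: str = "data/action_log.jsonl") -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,          # e.g. "cancel" or "return"
        "order_id": order_id,
        "customer_id": customer_id,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")   # one JSON object per line

log_action("cancel", order_id="ORD-1042", customer_id="CUST-001")
```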
- Missing indexes: run `scripts/build_faiss_indexes.py`.
- API errors: confirm `API_KEYS` is set and valid.
- STT/TTS issues: disable with `USE_STT=0` or `USE_TTS=0` and verify dependencies.
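
For example, to rule out audio problems you can force a text-only session for a single run (assuming the flags are read from the environment at startup):

```bash
USE_STT=0 USE_TTS=0 python main.py
```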
