A minimal demo app that combines a chat UI with structured memory extraction: an LLM parses user messages for persistent facts and stores them so the support agent can use them in later replies.
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set your Groq API key (required for chat and memory extraction).

   On Windows:

   ```bash
   set GROQ_API_KEY=your_key_here
   ```

   On macOS/Linux:

   ```bash
   export GROQ_API_KEY=your_key_here
   ```

   Get a key at console.groq.com.

3. Run the app:

   ```bash
   uvicorn app:app --reload
   ```

4. Open in browser: http://127.0.0.1:8000/
Try it
- Send messages (e.g. “My order #123 is delayed” or “I prefer refunds over replacements”).
- Watch the Agent Memory panel fill with extracted facts.
- Keep chatting; the agent uses those facts in its replies.
- Use Reset to clear the current session’s chat and memory and start over.
- Backend: FastAPI serves the UI, static assets, and JSON APIs. SQLite (via SQLAlchemy) stores messages and memory per session.
- Chat flow: Each user message is saved, then an LLM extracts fact strings (a JSON list), which are stored in the `memory` table. A second LLM call generates the reply using the last 5 messages plus all memory facts; the assistant message is saved and returned along with the updated memory list.
- Frontend: Single-page chat plus an “Agent Memory” sidebar. The session ID is stored in `localStorage`, so memory persists across browser restarts. Reset calls `/reset` and clears the in-page chat and memory display.
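The reply step described above (last 5 messages plus all memory facts) can be sketched as a small prompt-assembly helper. This is illustrative only; `build_reply_prompt` and the message-dict shape are assumptions, not the repo's actual API:

```python
def build_reply_prompt(messages: list[dict], facts: list[str]) -> str:
    """Assemble the context the reply LLM sees: a Memory section listing
    every extracted fact, followed by the last 5 chat messages."""
    if facts:
        memory_block = "Memory:\n" + "\n".join(f"- {f}" for f in facts)
    else:
        memory_block = "Memory: (none)"
    # Only the most recent 5 messages are included, per the chat flow above.
    recent = messages[-5:]
    convo = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    return f"{memory_block}\n\nConversation:\n{convo}\n\nassistant:"
```

The memory section is rebuilt on every turn, so facts extracted earlier in the session stay visible even after their source messages fall out of the 5-message window.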
| Layer | Role |
|---|---|
| `app.py` | Routes, DB init, orchestration |
| `models.py` | SQLAlchemy models and DB engine |
| `utils.py` | Message/memory persistence, LLM calls (Groq) |
| `templates/` + `static/` | Chat UI and behavior |
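As a rough sketch of the storage layer, the schema idea is two per-session tables, one for messages and one for extracted facts. The app itself defines these as SQLAlchemy models in `models.py`; the stdlib `sqlite3` version below, including the column names, is an assumption for illustration:

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Create the two per-session tables: chat messages and memory facts."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS message (
            id         INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            role       TEXT NOT NULL,   -- 'user' or 'assistant'
            content    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS memory (
            id         INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            fact       TEXT NOT NULL
        );
        """
    )

def facts_for(conn: sqlite3.Connection, session_id: str) -> list[str]:
    """All memory facts for one session, oldest first."""
    rows = conn.execute(
        "SELECT fact FROM memory WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [r[0] for r in rows]
```

Keying both tables by `session_id` is what lets Reset clear one browser session's chat and memory without touching others.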
After every user message, the app calls the LLM with a dedicated prompt:
- Prompt: “Extract any persistent user facts from this message. Return only a JSON list of fact strings.”
- Output: e.g. `["User has a delayed order", "User prefers refunds"]`
- Storage: Each string is stored as a row in the `memory` table for that session.
- Usage: When generating the next reply, the agent receives a “Memory” section listing these facts and is instructed to use them when relevant.
So the agent gains a simple, explicit memory layer instead of relying only on the last few messages.
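Because the extraction model is only instructed to return a JSON list, it helps to validate the output before writing rows to the `memory` table. The helper below is a sketch of that defensive parsing step; `parse_fact_list` is not a name from the repo:

```python
import json

# Prompt quoted from the extraction step above.
EXTRACTION_PROMPT = (
    "Extract any persistent user facts from this message. "
    "Return only a JSON list of fact strings."
)

def parse_fact_list(raw: str) -> list[str]:
    """Keep only a flat list of non-empty strings; return [] if the
    model produced malformed JSON or the wrong shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [f.strip() for f in data if isinstance(f, str) and f.strip()]
```

On a bad parse the app can simply store no facts for that turn; the chat reply still works, since memory is additive context rather than a hard dependency.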