AStarBot is a backend-only Retrieval-Augmented Generation (RAG) system designed to power a personal AI assistant for Avrodeep Pal. The system answers questions about education, projects, skills, and interests using a curated knowledge base and modern large language models, while maintaining strict control over accuracy, tone, and scope.
This project prioritizes engineering clarity, determinism, and maintainability over autonomous or agent-based behavior.
Personal portfolio chatbots often suffer from one or more of the following issues:
- hallucinated or exaggerated information
- poor grounding in verified data
- excessive framework abstraction (agents, tools, orchestration layers)
- fragile session memory implementations
- unclear separation between data, prompts, and model logic
AStarBot was built to address these issues by following a few core principles:
- All factual answers must come from retrieved data
- Memory should be explicit, bounded, and inspectable
- The backend should remain stateless
- Complexity should be earned, not assumed
AStarBot is a classic RAG pipeline with a deliberately minimal surface area.
At a high level, the system performs the following steps for every query (sketched in code below):

- Embed the user query
- Retrieve the top-k most relevant knowledge entries
- Assemble a constrained prompt using:
  - system identity
  - hard safety rules
  - retrieved context
  - summarized conversation memory
- Invoke an LLM to generate a response
- Optionally update the conversation summary
- Return the answer and updated summary to the client
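The steps above map onto a handful of functions. A minimal sketch, assuming the helpers developed in the sections that follow; `retrieve`, `build_prompt`, `generate`, and `maybe_summarize` are illustrative names, not the actual module layout:

```python
def answer(question: str, recent_messages: list[dict], summary: str | None) -> dict:
    """One stateless request: everything the pipeline needs arrives as arguments."""
    context = retrieve(question, k=3)                  # embed the query, fetch top-k entries
    prompt = build_prompt(context, summary, question)  # assemble the layered prompt
    reply = generate(prompt)                           # invoke the LLM (with fallback)
    updated = maybe_summarize(
        recent_messages
        + [{"role": "user", "content": question},
           {"role": "assistant", "content": reply}],
        summary,
    )                                                  # optionally refresh the summary
    return {"answer": reply, "updated_summary": updated}  # hand both back to the client
```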
The backend itself does not store sessions or user state.
```
Client (CLI / Frontend)
        |
        |  question, recent_messages, summary
        v
FastAPI (/chat)
        |
        v
RAG Engine
  ├── Retriever (Pinecone, k=3)
  ├── Prompt Builder
  ├── Primary LLM
  ├── Fallback LLM
  └── Summarization LLM (optional)
        |
        v
Response + Updated Summary
```
Rather than using Redis or server-side session storage, AStarBot uses stateless summarized memory (sketched in code below):

- The client sends:
  - recent conversation turns (bounded window)
  - a short summary of earlier conversation
- When the window grows beyond a threshold, the backend:
  - generates a new summary using a lightweight LLM
  - returns the updated summary to the client
- The client stores and sends this summary on the next request
This approach has several advantages:
- no external memory infrastructure
- no session coupling to a specific server instance
- easy horizontal scaling
- explicit, inspectable memory state
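In code, the threshold check stays small. A minimal sketch, assuming a hypothetical `WINDOW_LIMIT` constant and a LangChain-style `summarizer_llm` (both illustrative, not the actual implementation):

```python
WINDOW_LIMIT = 8  # hypothetical threshold; the real value is a tuning choice

def maybe_summarize(recent_messages: list[dict], summary: str | None) -> str | None:
    """Fold older turns into a short summary once the window exceeds the limit."""
    if len(recent_messages) <= WINDOW_LIMIT:
        return summary  # window still small: client keeps whatever summary it has
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in recent_messages)
    prompt = (
        "Update this running summary of a conversation.\n"
        f"Existing summary: {summary or '(none)'}\n"
        f"New turns:\n{transcript}\n"
        "Return one concise paragraph."
    )
    # Low-temperature summarization model; result is returned to the client,
    # never stored on the server.
    return summarizer_llm.invoke(prompt).content
```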
All knowledge is stored in JSON files under /data.
Each entry contains:
- a semantic ID (stable, human-readable)
- text content
- descriptive tags
Example:
```json
{
  "id": "project-002",
  "text": "Built a stock price prediction model using LSTM networks...",
  "tags": ["project", "machine-learning", "lstm"]
}
```

Design choices:
- Semantic IDs are preferred over UUIDs for debuggability
- No chunking is used; each JSON entry maps to one vector
- The dataset is assumed to be curated and relatively small
- Embeddings are generated using `sentence-transformers/all-MiniLM-L6-v2`
- Vector dimensionality: 384
- Similarity metric: cosine
- Storage: Pinecone
The ingestion script (`scripts/embed.py`) performs a full rebuild:
- deletes all existing vectors in the namespace
- re-embeds all JSON entries
- upserts them with metadata
This makes the process idempotent and avoids stale or duplicated data.
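A condensed sketch of such a rebuild, assuming the Pinecone Python client and sentence-transformers; the index name, namespace, and file layout are illustrative, and each JSON file is assumed to hold a list of entries:

```python
import json
from pathlib import Path

from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="...")  # credentials come from the environment in practice
index = pc.Index("astarbot")  # illustrative index name
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Full rebuild: wipe the namespace first so stale vectors never survive a re-run.
index.delete(delete_all=True, namespace="knowledge")

entries = []
for path in Path("data").glob("*.json"):
    entries.extend(json.loads(path.read_text()))  # assumes a list of entries per file

# One vector per entry (no chunking), keyed by the semantic ID.
index.upsert(
    vectors=[
        (e["id"], model.encode(e["text"]).tolist(),
         {"text": e["text"], "tags": e["tags"]})
        for e in entries
    ],
    namespace="knowledge",
)
```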
- Top-k retrieval (`k = 3`)
- No metadata filtering at query time
- Metadata is preserved for future extensions (e.g., scoring, tone control)
The system intentionally avoids:
- hybrid search
- reranking models
- agent-driven retrieval
The goal is predictable, explainable retrieval, not maximum recall at all costs.
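Retrieval then reduces to a single vector query. A sketch reusing the `model` and `index` objects from the ingestion example above:

```python
def retrieve(question: str, k: int = 3) -> list[str]:
    """Embed the query and return the text of the k nearest knowledge entries."""
    query_vector = model.encode(question).tolist()
    result = index.query(
        vector=query_vector,
        top_k=k,
        include_metadata=True,
        namespace="knowledge",
    )
    return [match.metadata["text"] for match in result.matches]
```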
Prompting is handled manually (string-based), not via high-level abstractions.
The prompt is composed of several conceptual layers:
- System identity
  - defines who AStarBot is
  - constrains scope to portfolio-related topics
- Hard RAG rules
  - answer only from provided context
  - no guessing or external knowledge
- Safety and refusal rules
  - polite refusal for unsupported or personal questions
  - redirection to allowed topics
- Style and tone guidelines
  - professional but approachable
  - concise, non-robotic responses
  - length constraints
- Conversation summary (if present)
- Retrieved context
- User question
This structure makes it easy to reason about why the model answered in a certain way.
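Since prompting is plain string composition, the builder stays transparent. A sketch with abbreviated section text; the constants are illustrative stand-ins for the real layers:

```python
SYSTEM_IDENTITY = "You are AStarBot, the portfolio assistant for Avrodeep Pal. ..."
HARD_RULES = "Answer ONLY from the provided context. Never guess or use outside knowledge. ..."
STYLE = "Professional but approachable. Concise, non-robotic, length-limited. ..."

def build_prompt(context: list[str], summary: str | None, question: str) -> str:
    """Assemble the layered prompt as one explicit, inspectable string."""
    sections = [SYSTEM_IDENTITY, HARD_RULES, STYLE]
    if summary:
        sections.append(f"Conversation so far (summary):\n{summary}")
    sections.append("Context:\n" + "\n---\n".join(context))
    sections.append(f"Question: {question}")
    return "\n\n".join(sections)
```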
The system deliberately uses different models for different tasks:
- Primary LLM
  - used for answering user questions
  - chosen for reasoning quality and availability
- Fallback LLM
  - used when the primary model hits rate limits or errors
  - prioritizes speed and reliability
- Summarization LLM
  - used only for memory summarization
  - deterministic, low temperature, low cost
This separation reduces coupling and improves robustness.
- All model calls use `.invoke()`
- No streaming
- No tool calls
- No agent loops
This keeps execution linear and debuggable.
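Assuming LangChain-style chat models (which the `.invoke()` convention suggests), the fallback strategy is a plain try/except rather than any orchestration layer; `primary_llm` and `fallback_llm` are placeholders:

```python
def generate(prompt: str) -> str:
    """Call the primary model; degrade to the fallback model on failure."""
    try:
        return primary_llm.invoke(prompt).content
    except Exception:
        # Rate limits, timeouts, provider outages: serve a response from the
        # faster fallback model instead of failing the request.
        return fallback_llm.invoke(prompt).content
```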
- `POST /chat` – main interaction endpoint
- `GET /health` – health check
The API is intentionally minimal.
Request body:

```json
{
  "question": "string",
  "recent_messages": [
    { "role": "user", "content": "..." },
    { "role": "assistant", "content": "..." }
  ],
  "summary": "string | null"
}
```

Response body:

```json
{
  "answer": "string",
  "updated_summary": "string | null"
}
```

This contract makes frontend integration straightforward and explicit.
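A minimal FastAPI sketch of this contract, with Pydantic models mirroring the schemas above and the handler delegating to the hypothetical `answer` pipeline function sketched earlier:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    question: str
    recent_messages: list[Message] = []
    summary: str | None = None

class ChatResponse(BaseModel):
    answer: str
    updated_summary: str | None = None

@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    # Stateless: everything the pipeline needs is in the request body.
    result = answer(req.question,
                    [m.model_dump() for m in req.recent_messages],
                    req.summary)
    return ChatResponse(**result)

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}
```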
AStarBot includes a CLI (`cli.py`) for local testing.
The CLI:
- simulates a client-managed memory window
- updates and stores summaries
- prints answers directly to stdout
It is useful for:
- prompt tuning
- retrieval debugging
- sanity-checking summaries
- testing fallback behavior
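The client-managed memory window described above amounts to a small loop. A sketch, assuming the request shape from the API section and a locally running backend; the URL and window size are illustrative:

```python
import requests

API_URL = "http://localhost:8000/chat"  # hypothetical local address
WINDOW = 8                              # bounded window of recent turns

messages: list[dict] = []
summary: str | None = None

while True:
    question = input("you> ")
    payload = {
        "question": question,
        "recent_messages": messages[-WINDOW:],
        "summary": summary,
    }
    body = requests.post(API_URL, json=payload).json()
    print(body["answer"])
    messages += [{"role": "user", "content": question},
                 {"role": "assistant", "content": body["answer"]}]
    summary = body.get("updated_summary") or summary  # client persists the summary
```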
- Python 3.12
- Slim base image
- Single container
- No sidecar services
- Stateless
- Single worker by default
- Horizontal scaling supported via platform (e.g. Koyeb)
AStarBot intentionally does not include:
- autonomous agents
- tool calling
- web browsing
- long-term personalization
- user authentication
- fine-tuned models
These were excluded to maintain clarity, correctness, and maintainability.
From an engineering perspective, AStarBot demonstrates:
- practical RAG system design
- disciplined prompt engineering
- stateless memory via summarization
- controlled fallback strategies
- production-ready API design
- clear separation of concerns
It is designed to be understandable, extensible, and defensible.
MIT License.