This application is a Retrieval-Augmented Generation (RAG) system that lets users ask questions about a diary. For demonstration, it uses a made-up diary whose AI-generated entries are stored in a .csv file. A script is provided that loads this data into a PostgreSQL database together with vector embeddings, which enables context-aware responses to user queries.
- Python 3.8+
- PostgreSQL 12+ with pgvector extension
- OpenAI API key
- Clone this repository and change directory to its root folder:
git clone https://github.com/JoeCardoso13/diary_bot.git && cd diary_bot
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Set up PostgreSQL:
  - Install PostgreSQL if you haven't already
  - Create a database named diary_bot
  - Install the pgvector extension, then enable it inside the diary_bot database:
    CREATE EXTENSION vector;
- Add your OpenAI API key and database settings to your environment, e.g. in a .env file with the following:
OPENAI_API_KEY=your_api_key_here
DB_NAME=diary_bot
DB_USER=your_psql_user_name
DB_HOST=/var/run/postgresql_or_localhost
DB_PORT=5432
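The settings above could be read in Python roughly as follows. This is a minimal sketch using only the standard library; the helper names `load_db_settings` and `require_api_key` and the fallback defaults are assumptions for illustration, not the repository's actual code, which may load the .env file differently.

```python
import os


def load_db_settings(env=None):
    """Collect the database settings named in the .env example above."""
    env = os.environ if env is None else env
    return {
        "dbname": env.get("DB_NAME", "diary_bot"),  # default is an assumption
        "user": env.get("DB_USER"),
        "host": env.get("DB_HOST", "localhost"),    # default is an assumption
        "port": int(env.get("DB_PORT", "5432")),
    }


def require_api_key(env=None):
    """Fail early with a clear message if the OpenAI key is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

The dictionary keys are chosen to match the keyword arguments accepted by `psycopg2.connect`, so the result could be passed as `psycopg2.connect(**load_db_settings())` if the project uses that driver.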
Before using the system, you need to process the diary entries by running the pre_processing.py script:
python src/pre_processing.py
This will:
- Read the CSV file
- Generate embeddings for each entry
- Store the data in the PostgreSQL database
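The three steps above can be sketched as follows. This is an illustration, not the contents of pre_processing.py: the CSV column names `date` and `entry`, the helper names, and the psycopg-style `%s` placeholders are assumptions, and `embed` stands in for whatever call produces the embedding (for example a request to the OpenAI embeddings API).

```python
import csv


def to_pgvector_literal(vector):
    """Format a list of floats in pgvector's text form, e.g. [0.1,0.2]."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def prepare_rows(csv_file, embed):
    """Read diary rows and pair each entry with its embedding.

    `csv_file` is any iterable of CSV lines (an open file works);
    `embed` is any callable mapping text to a list of floats.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        vector = embed(record["entry"])
        rows.append((record["date"], record["entry"], to_pgvector_literal(vector)))
    return rows


# Parameterised insert; each tuple from prepare_rows supplies the three values.
INSERT_SQL = "INSERT INTO diary (date, entry, embedding) VALUES (%s, %s, %s::vector)"
```

Passing the embedding function in as an argument keeps the CSV handling testable without network access or an API key.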
Run the Gradio interface for a user-friendly web experience:
python src/main_gradio_interface.py
This will:
- Start a local web server
- Open the interface in your default browser
- Allow you to:
  - Ask questions about the diary
  - Get random question suggestions
  - See responses with source attribution
Run the CLI version for a simpler interface:
python src/main_command_line.py
This provides:
- A text-based interface
- Same functionality as the web version
- Responses in the terminal
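Both interfaces rely on the same retrieve-then-answer flow. How rag_tools.py implements it is not shown here, so the following is only a sketch under assumptions: the query text, the `build_prompt` helper, and the prompt wording are hypothetical. The `<=>` operator is pgvector's cosine-distance operator, so ordering by it returns the entries closest to the question's embedding.

```python
# Nearest-neighbour lookup: parameters are the question embedding
# (in pgvector text form) and the number of entries to return.
TOP_K_SQL = (
    "SELECT date, entry FROM diary "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def build_prompt(question, retrieved):
    """Assemble a prompt from (date, entry) pairs retrieved for the question."""
    context = "\n".join(f"[{date}] {entry}" for date, entry in retrieved)
    return (
        "Answer the question using only the diary entries below, "
        "and cite the dates you relied on.\n\n"
        f"Diary entries:\n{context}\n\n"
        f"Question: {question}"
    )
```

Tagging each retrieved entry with its date is one simple way to support the source attribution mentioned above, since the model can refer back to those dates in its answer.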
diary_bot/
├── data/
│   └── diary.csv                  # Diary entries data
├── src/
│   ├── pre_processing.py          # Data processing and embedding generation
│   ├── rag_tools.py               # Shared RAG functionality
│   ├── main_command_line.py       # CLI interface
│   └── main_gradio_interface.py   # Web interface
├── requirements.txt               # Python dependencies
└── README.md
The diary table contains:
- id: Unique identifier
- date: Date of diary entry
- entry: Content of diary entry
- embedding: Vector embedding of the entry text
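A table with these four columns could be created with a statement like the one built below. The column types and the helper name `create_table_sql` are assumptions, since the source lists only the column names. The default of 1536 dimensions matches the output size of OpenAI's text-embedding-3-small and text-embedding-ada-002 models; which model the project uses is also an assumption, so adjust the value to the model actually in use.

```python
def create_table_sql(dimensions=1536):
    """Return DDL for a diary table with a pgvector embedding column.

    Column types are illustrative guesses; only the column names
    come from the README.
    """
    return (
        "CREATE TABLE IF NOT EXISTS diary (\n"
        "    id SERIAL PRIMARY KEY,\n"
        "    date DATE,\n"
        "    entry TEXT,\n"
        f"    embedding vector({int(dimensions)})\n"
        ")"
    )
```

The `vector` type is only available after `CREATE EXTENSION vector;` has been run in the same database.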


