RagFeed is a feed reader powered by Retrieval-Augmented Generation (RAG) and LLMs.
- Trending Topics: Provides an overview of the most relevant topics based on the articles from the RSS sources added by the user.
- Ask Feed: Allows the user to ask about specific topics and get an overview of them along with the most relevant related articles.
- Improve the RSS parsing algorithm.
- Source management flow.
- Allow marking articles as Read/Unread.
- Allow rating articles' interest.
- Internal reader.
- Article recommendation algorithm.
- Personalized feed.
- Improve app look and feel.
- ...
The project expects:

- An instance of Ollama running at `http://localhost:11434/`
- Model `llama3.1:8b` available in Ollama
- Embedder `snowflake-arctic-embed2` available in Ollama
- Python dependencies in `requirements.txt` installed (it is recommended to use a virtual environment)

Note: The default Ollama URL, model and embedder can be edited in `settings.py`.
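As a quick preflight check, a short script can verify that the required models have been pulled into the local Ollama instance. This is a minimal sketch, not part of the repository; it assumes Ollama's standard `GET /api/tags` endpoint, which lists locally available models.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default; mirrors the settings.py value
REQUIRED = {"llama3.1:8b", "snowflake-arctic-embed2"}

def missing_models(installed_names, required=REQUIRED):
    """Return the required models that are absent from the installed list."""
    # Ollama reports names like "llama3.1:8b" or "snowflake-arctic-embed2:latest";
    # treat "name" and "name:latest" as equivalent.
    normalized = set()
    for name in installed_names:
        normalized.add(name)
        if name.endswith(":latest"):
            normalized.add(name.rsplit(":", 1)[0])
    return sorted(m for m in required if m not in normalized)

def check_ollama(url=OLLAMA_URL):
    """Query the running Ollama instance and report missing required models."""
    with urllib.request.urlopen(f"{url}/api/tags") as resp:
        installed = [m["name"] for m in json.load(resp).get("models", [])]
    return missing_models(installed)
```

With Ollama running, `check_ollama()` should return an empty list once both models have been pulled (e.g. `ollama pull llama3.1:8b`).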
- Sources must be added directly in the database.
- The app must be used in Dark Mode.
- The Settings page is a placeholder.
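Since sources can only be added directly in the database, a small helper like the one below can do the insert. This is a sketch, not part of the repository: the real table and column names are defined in `db/ragfeed_schema.sql`, and `sources(name, url)` here is an assumed shape for illustration only.

```python
import sqlite3

DB_PATH = "./data/sqlite/ragfeed.db"  # default sqlite_path from settings.py

def add_source(db_path, name, url):
    """Insert an RSS source. ASSUMPTION: a table sources(name, url) exists;
    check db/ragfeed_schema.sql for the actual schema before using this."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sources (name, url) VALUES (?, ?)", (name, url))
    conn.close()
```

After adapting the column names to the real schema, a call such as `add_source(DB_PATH, "Example Feed", "https://example.com/rss")` adds one source.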
- Clone the repository
- Enter the directory: `cd RagFeed`
- Install dependencies: `pip install -r requirements.txt`
- Create the SQLite database: `sqlite3 ./data/sqlite/ragfeed.db < ./db/ragfeed_schema.sql`
- Launch the cron: `python cronapp.py` (this will run until stopped)
- Run the client application: `python -m streamlit run slapp.py`
- Access the app through the provided URL
- `data`: Default folder where the app stores its data.
  - `chroma`: Directory used by ChromaDB to save vectors (empty in the repository).
  - `sqlite`: Directory where the SQLite database will be stored (empty in the repository).
- `db`: Contains the SQL script with the database schema.
- `docs`: Contains the project documentation, the Exploratory Data Analysis and snapshots of the app.
- `log`: Default directory used by the app to store the log file (empty in the repository).
- `slapp`: Contains the code for the different pages of the Streamlit application.
- `src`: Contains the classes for the RagFeed backend, including the logic and the different controllers.

Note: The data and log directories can be modified in `settings.py`.

- `cronapp.py`: The cron application in charge of updating Sources, Trending Topics and stored RAG Searches.
- `RagFeed.py`: The backend controller; this class is instantiated by clients to access the app's functions.
- `settings.py`: Contains the settings that configure the project execution.
- `slapp.py`: Main file of the Streamlit application.
The file `settings.py` contains settings to modify RagFeed's behaviour:

- `logger_path`: Directory where the logger will store the application log files.
- `logger_level`: Log level to store in the log file; accepted values are the numeric levels of the `logging` library:
  - `10`: DEBUG, logs inputs and outputs for the majority of methods.
  - `20`: INFO, logs the names of the methods being called.
  - `30`: WARNING, logs minor errors.
  - `40`: ERROR, logs major errors.
- `feeds_update_freq`: Minimum number of hours between updates. The app will not request updates from a Source if the time since the last update is below this threshold. The only exception is `cronapp.py`, which forces an update on its first iteration.
- `vector_store_engine`: Provider for the vector store. Currently only `chroma` is valid.
- `chromadb_collection`: Name of the ChromaDB collection.
- `chromadb_path`: Relative path for the ChromaDB persistent storage.
- `database_engine`: Database provider. Currently only `sqlite` is valid.
- `sqlite_path`: Relative file path for the SQLite database (including the file name).
- `model_source`: Provider for the GenAI capabilities. Currently only `ollama` is valid.
- `ollama_url`: URL of the Ollama instance.
- `ollama_llm`: LLM model to be used by Ollama.
- `ollama_embeddings`: Embedder model to be used by Ollama.
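Putting the options together, a `settings.py` following the list above might look like the sketch below. The repository's actual file is authoritative; values marked ASSUMED are illustrative placeholders, while the rest are the defaults documented here.

```python
# settings.py — sketch mirroring the documented options; not the repository's file.

logger_path = "./log/"            # directory for application log files
logger_level = 20                 # 10 DEBUG, 20 INFO, 30 WARNING, 40 ERROR

feeds_update_freq = 1             # ASSUMED: minimum hours between source updates

vector_store_engine = "chroma"    # only "chroma" is currently valid
chromadb_collection = "ragfeed"   # ASSUMED collection name
chromadb_path = "./data/chroma/"

database_engine = "sqlite"        # only "sqlite" is currently valid
sqlite_path = "./data/sqlite/ragfeed.db"

model_source = "ollama"           # only "ollama" is currently valid
ollama_url = "http://localhost:11434/"
ollama_llm = "llama3.1:8b"
ollama_embeddings = "snowflake-arctic-embed2"
```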
Currently RagFeed is implemented to be used with Ollama, SQLite and ChromaDB, but the code was written with the possibility of implementing alternative providers in mind.
If you want to use it with a different LLM, database or vector store, you can rewrite the required controller for the provider:
- `src/chromaVectorStore.py`: Contains all the required Vector Store methods.
- `src/SqliteDatabase.py`: Contains all the required database methods.
- `src/ollamaModel.py`: Contains all the required GenAI methods.
Inheritance was not created for the MVP, but could be implemented later (based on the files mentioned above) if more providers are added.
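If that refactor is done, one option is a small abstract base class per provider family. Below is a hedged sketch for the database side; the method names (`get_sources`, `save_article`) are illustrative, not the actual ones in `src/SqliteDatabase.py`.

```python
from abc import ABC, abstractmethod

class Database(ABC):
    """Possible common interface for database providers (method names illustrative)."""

    @abstractmethod
    def get_sources(self) -> list:
        """Return all configured RSS sources."""

    @abstractmethod
    def save_article(self, article: dict) -> None:
        """Persist a fetched article."""

class SqliteDatabase(Database):
    """The existing SQLite controller would then subclass the interface."""

    def __init__(self, path: str):
        self.path = path
        self._articles = []  # stand-in for real persistence

    def get_sources(self) -> list:
        return []  # stub

    def save_article(self, article: dict) -> None:
        self._articles.append(article)
```

With this in place, an alternative provider (e.g. a Postgres controller) only needs to implement the same abstract methods to be a drop-in replacement.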
If you change the database provider, keep in mind that you must also define the database creation script (`db/ragfeed_schema.sql` for SQLite).
Class initialization is done in the `RagFeed.__init__()` method (`RagFeed.py`) based on the options defined in `settings.py`. This will also need to be updated if new modules are added.
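The settings-driven selection described above can be sketched as a simple mapping from setting values to controller classes. The class names mirror the controllers documented here, but the actual `__init__()` logic in `RagFeed.py` may differ; `DummySettings` stands in for the real `settings.py` module.

```python
# Sketch of settings-driven provider selection, as described for RagFeed.__init__().

class DummySettings:
    database_engine = "sqlite"
    vector_store_engine = "chroma"
    model_source = "ollama"

# Stub controllers standing in for the real classes in src/
class SqliteDatabase: ...
class ChromaVectorStore: ...
class OllamaModel: ...

DATABASES = {"sqlite": SqliteDatabase}
VECTOR_STORES = {"chroma": ChromaVectorStore}
MODELS = {"ollama": OllamaModel}

def build_controllers(settings):
    """Instantiate one controller per provider family; fail on unknown engines."""
    try:
        db = DATABASES[settings.database_engine]()
        vs = VECTOR_STORES[settings.vector_store_engine]()
        model = MODELS[settings.model_source]()
    except KeyError as exc:
        raise ValueError(f"Unsupported provider: {exc}") from exc
    return db, vs, model
```

Adding a new module then means registering its class in the corresponding mapping, which keeps the initialization change to a single line.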
This project was born as the final project of a Data Science & Machine Learning Bootcamp I attended between May and July 2025.
All the details about the project implementation can be found in the presentation located in docs/RagFeed.pdf.
There is a Jupyter Notebook with a brief Exploratory Data Analysis done on test data during the development in docs/ragfeed.ipynb.
The branch `ironhack` will keep a version of the code as of the project delivery date.




