- Clone the repository: https://github.com/stadiello/article_extractor
- Navigate to the project directory: `cd article_extractor`
- Install dependencies using pyproject.toml: `pip install .`
- (Alternative) Install dependencies using poetry: `poetry install`

Launch the app: `streamlit run src/extractor/main.py`

Connect at the URL and follow the instructions.
To add a new question, put it on its own line in questions.txt in the data folder.
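As a rough illustration of that step, the sketch below appends a question on its own line. The path `data/questions.txt` (relative to the repository root) and the helper name `add_question` are assumptions made for this example; editing the file by hand works just as well.

```python
from pathlib import Path


def add_question(question: str, questions_file: Path = Path("data/questions.txt")) -> None:
    """Append one question on its own line, creating the file if needed.

    The default path is an assumption based on "questions.txt in the data folder".
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")

    questions_file.parent.mkdir(parents=True, exist_ok=True)
    existing = questions_file.read_text(encoding="utf-8") if questions_file.exists() else ""

    # If the file does not end with a newline, terminate the last line first
    # so the new question does not get glued onto the previous one.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with questions_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{prefix}{question}\n")
```

Calling `add_question("What is the main conclusion of the article?")` from the project root would add that line to the end of the file.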
- Ollama (in `bot.py`) with the model `deepseek-r1:8b`
- Selenium for web scraping
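For orientation, here is a minimal sketch of sending a prompt to a local Ollama server with the model named above. It is not taken from `bot.py`, whose actual implementation may differ. It assumes Ollama is running on its default local address (`http://localhost:11434`) and that the model has already been pulled. The function names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed: Ollama's default local endpoint
MODEL = "deepseek-r1:8b"  # model named in this README


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running and the model available locally (for example after `ollama pull deepseek-r1:8b`), `ask("Summarize this article in one sentence.")` should return the model's answer as a string.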
- Intel Core i5/AMD Ryzen 5 (minimum 4 cores)
- 2.5 GHz or higher
- RAM: minimum recommended 16 GB
- SSD with at least 20 GB of free space (for models and dependencies)
- GPU: not mandatory but recommended
- If using a GPU: NVIDIA with at least 4 GB VRAM
- Without a GPU: The project will work but may run slower
The project can run without a GPU because:
- Ollama can execute on a CPU
- Streamlit and Selenium do not require a GPU

However, using a GPU will significantly improve performance.
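To compare a machine against the figures above before installing, a small check along these lines may help. It is only a sketch: the function name `check_hardware` is hypothetical, the core count reported is logical rather than physical, RAM is not checked because the standard library has no portable way to read it, and the presence of `nvidia-smi` on the PATH is used as a rough hint that an NVIDIA GPU is available.

```python
import os
import shutil

MIN_CORES = 4     # from the CPU requirement listed above
MIN_FREE_GB = 20  # from the storage requirement listed above


def check_hardware(path: str = ".") -> dict:
    """Return a small report comparing this machine to the listed minimums."""
    cores = os.cpu_count() or 0  # logical cores; may exceed physical cores
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cores": cores,
        "cores_ok": cores >= MIN_CORES,
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_GB,
        # Heuristic only: the NVIDIA driver tools are on PATH.
        "nvidia_gpu_likely": shutil.which("nvidia-smi") is not None,
    }
```

Running `print(check_hardware())` from the project directory prints the report. A `False` for `nvidia_gpu_likely` simply means the project would run on the CPU.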
- macOS
- Linux
- Windows (WSL recommended)
For questions or support, please contact the development team at tadiello.sebastien@gmail.com.