Woopenbaar

Woopenbaar is a Retrieval-Augmented Generation (RAG) system that combines document processing, semantic search, and large language models (LLMs) to answer user queries with responses grounded in retrieved documents. It offers both standard and adaptive (multi-step) RAG modes through a web interface.
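In a RAG system, the document chunks most similar to a query are retrieved and placed in the prompt sent to the language model. The sketch below is a toy illustration under stated assumptions, not Woopenbaar's code: a bag-of-words vector stands in for the trained embedding model, and embed, cosine, retrieve, and build_prompt are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks, numbered, in front of the question."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context, start=1))
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    "The ministry published the vaccine contracts in 2021.",
    "Woopenbaar indexes documents from several sources.",
    "The weather in Delft was rainy.",
]
question = "When were the vaccine contracts published?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In the real system, the trained embedders and rerankers in trained_models/ would presumably take the place of the toy embed function, and the prompt wording would come from backend/conf/prompts.py.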

Components

Core System

Backend: Python application with Flask API, retrieval engine, and LLM integration
Frontend: Next.js web interface with chat and document visualization

Data Processing Components

Scraper: Web scraping tools for collecting documents from various sources
Standard Data Format: System for converting documents to a standardized markdown format
Trained Models: Pre-trained models for embeddings, chunk reranking, and safety checks
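Because documents are converted to markdown before they are indexed and chunked, one plausible chunking strategy is to split at headings and cap each piece at a fixed word count. This is an assumption for illustration only: chunk size is listed among the backend's retrieval parameters, and the actual chunking logic may work differently. chunk_markdown, HEADING, and max_words are hypothetical names.

```python
import re

HEADING = re.compile(r"#{1,6}\s")  # a markdown ATX heading, e.g. "## Title"


def chunk_markdown(text: str, max_words: int = 200) -> list[str]:
    """Split markdown at headings, then cap each section at max_words words."""
    sections: list[list[str]] = []
    for line in text.splitlines():
        # Start a new section at every heading (and for any text before the first one).
        if HEADING.match(line) or not sections:
            sections.append([])
        sections[-1].append(line)

    chunks: list[str] = []
    for section in sections:
        words = " ".join(section).split()
        # Long sections are cut into consecutive windows of at most max_words words.
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```

Splitting at headings keeps each chunk within one topic of the document; the word cap keeps chunks small enough for the embedder and the prompt.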

Features

Backend

Document Processing: Index and chunk documents for efficient retrieval
Multi-Source Support: Organize and search across different data sources
Semantic Search: Find content based on meaning, not just keywords
Multiple LLM Integrations: Support for local models via vLLM, Google's Gemini, and DeepSeek
Adaptive Reasoning: Multi-step retrieval with progressive refinement for complex queries
Safety Filters: Content validation and filtering
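Adaptive reasoning is described as multi-step retrieval with progressive refinement. A minimal sketch of such a loop, assuming a search function and a next_query function (in practice presumably an LLM call) that either proposes a refined query or returns None once enough context has been gathered. All names are hypothetical, and the step limit is an arbitrary safeguard against endless refinement.

```python
from __future__ import annotations

from typing import Callable, Optional


def adaptive_retrieve(
    question: str,
    search: Callable[[str], list[str]],
    next_query: Callable[[str, list[str]], Optional[str]],
    max_steps: int = 3,
) -> list[str]:
    """Search repeatedly, refining the query until next_query returns None."""
    gathered: list[str] = []
    query: Optional[str] = question
    for _ in range(max_steps):
        if query is None:  # the refinement step decided the context is sufficient
            break
        for chunk in search(query):
            if chunk not in gathered:  # keep each chunk once, in discovery order
                gathered.append(chunk)
        query = next_query(question, gathered)
    return gathered


# Toy stand-ins for the retriever and the LLM-driven refinement step.
index = {
    "vaccine contracts": ["Contract A was signed in 2021."],
    "supplier of contract A": ["Contract A was awarded to supplier X."],
}


def toy_next_query(question: str, gathered: list[str]) -> Optional[str]:
    return "supplier of contract A" if len(gathered) < 2 else None


print(adaptive_retrieve("vaccine contracts", lambda q: index.get(q, []), toy_next_query))
```

Each step can show the user what was searched and found, which is the kind of intermediate reasoning the frontend's Adaptive Mode displays.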

Frontend

Intuitive Chat Interface: User-friendly chat UI with conversation history
Adaptive Mode: View reasoning steps in real-time as the system processes complex queries
Document Explorer: View and navigate through source documents and chunks
Advanced Filtering: Filter by document type and time period
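Filtering by document type and time period amounts to a predicate over chunk metadata. The sketch assumes hypothetical doc_type and published fields on a hypothetical Chunk class; the metadata schema Woopenbaar actually uses is not described in this README.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Chunk:
    text: str
    doc_type: str    # hypothetical metadata field, e.g. "email" or "report"
    published: date  # hypothetical metadata field


def filter_chunks(
    chunks: list[Chunk],
    doc_types: Optional[set[str]] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> list[Chunk]:
    """Keep chunks with an allowed type whose date falls within [start, end].

    A filter left as None is not applied.
    """
    return [
        c
        for c in chunks
        if (doc_types is None or c.doc_type in doc_types)
        and (start is None or c.published >= start)
        and (end is None or c.published <= end)
    ]
```

Applying such a filter before similarity ranking keeps excluded documents from occupying top-k slots; whether Woopenbaar filters before or after ranking is not stated here.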

Prerequisites

Backend:
- Python 3.9+
- CUDA-compatible GPU(s) if you run the LLM locally
- Python package manager that supports pyproject.toml (pip, uv, poetry, etc.)
Frontend:
- Node.js 18+
- npm or yarn

Installation

1. Backend Setup

Create a virtual environment and install dependencies using pyproject.toml:

# Enter project directory
cd woopenbaar

# Set up virtual environment
cd backend
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies from pyproject.toml
# Using pip:
pip install -e .

# Or if you prefer uv:
# uv pip install -e .

# Or if you prefer poetry:
# poetry install

2. Frontend Setup

From the repository root, install the Node.js dependencies:

cd frontend
npm install

Running the Application

You'll need to run both the backend and frontend servers in separate terminals.

1. Start the Backend Server

cd backend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python app.py

By default, the Flask API server starts on http://localhost:5000.

2. Start the Frontend Server

cd frontend
npm run dev

The Next.js development server will start on http://localhost:3000
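Once both servers are running, a quick way to confirm they are reachable is to open a TCP connection to each port. This helper is not part of the repository; it uses only the Python standard library and the default ports given above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    for name, port in [("backend (Flask)", 5000), ("frontend (Next.js)", 3000)]:
        status = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

On recent macOS versions the AirPlay Receiver also listens on port 5000, so an open port there does not by itself prove that Flask is running.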

Configuration

Backend Configuration

Configure the backend through:

Environment variables for sensitive settings (API keys, etc.)
Settings in backend/conf/config.py
LLM prompts in `backend/conf/prompts.py`

Key configuration options:

Language model selection (LOCAL, GEMINI, DEEPSEEK)
Model parameters (temperature, max tokens, etc.)
Retrieval parameters (chunk size, document pool size, etc.)
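As an illustration of how provider selection and model parameters might be read from environment variables, here is a sketch. The variable names (LLM_PROVIDER, GEMINI_API_KEY, DEEPSEEK_API_KEY, LLM_TEMPERATURE, LLM_MAX_TOKENS), the default values, and load_llm_settings are all hypothetical; consult backend/conf/config.py for the names the backend actually uses.

```python
from __future__ import annotations

import os
from enum import Enum
from typing import Any, Mapping


class LLMProvider(Enum):
    LOCAL = "local"  # local model served via vLLM
    GEMINI = "gemini"
    DEEPSEEK = "deepseek"


# Hypothetical environment-variable names for the cloud providers' API keys.
API_KEY_VARS = {
    LLMProvider.GEMINI: "GEMINI_API_KEY",
    LLMProvider.DEEPSEEK: "DEEPSEEK_API_KEY",
}


def load_llm_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Read the provider choice and model parameters from the environment."""
    provider = LLMProvider[env.get("LLM_PROVIDER", "LOCAL").upper()]
    settings: dict[str, Any] = {
        "provider": provider,
        "temperature": float(env.get("LLM_TEMPERATURE", "0.2")),  # hypothetical default
        "max_tokens": int(env.get("LLM_MAX_TOKENS", "1024")),  # hypothetical default
    }
    key_var = API_KEY_VARS.get(provider)
    if key_var is not None:  # cloud providers need an API key; LOCAL does not
        if not env.get(key_var):
            raise RuntimeError(f"{key_var} must be set when LLM_PROVIDER={provider.name}")
        settings["api_key"] = env[key_var]
    return settings
```

Reading keys from the environment follows the advice above to keep sensitive settings such as API keys out of config.py.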

Frontend Configuration

Create a .env.local file in the frontend directory:

NEXT_PUBLIC_API_URL=http://localhost:5000

Development

Backend Development

cd backend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python app.py

For detailed backend development instructions, see the Backend README.

Frontend Development

cd frontend
npm run dev

This starts the development server with hot-reloading enabled.

For detailed frontend development instructions, see the Frontend README.

Project Structure

woopenbaar/
├── backend/                # Backend Python application
│   ├── app.py              # Application entry point
│   ├── conf/               # Configuration and prompts
│   ├── pyproject.toml      # Python project metadata and dependencies
│   ├── src/                # Source code
│   │   ├── api/            # API endpoints
│   │   ├── data_classes/   # Data models
│   │   └── services/       # Business logic
│   └── tests/              # Backend tests
├── frontend/               # Frontend Next.js application
│   ├── public/             # Static assets
│   ├── src/                # Source code
│   │   ├── components/     # UI components
│   │   ├── lib/            # Utility libraries
│   │   ├── pages/          # Next.js pages
│   │   ├── services/       # API clients
│   │   └── types/          # TypeScript definitions
│   └── tests/              # Frontend tests
├── scraper/                # Document collection scripts
│   ├── scrape_vws.py       # VWS document scraper
│   ├── scrape_documents.py # General document scraper
│   └── website_specific.py # Website-specific scraping logic
├── standard_data_format/   # Document standardization system
│   ├── config/             # Configuration files
│   ├── scripts/            # Automation scripts
│   ├── src/                # Processing pipeline
│   └── utils/              # Utility functions
├── trained_models/         # Pre-trained ML models
│   ├── embedders/          # Text embedding models
│   ├── chunk_rerankers/    # Document reranker models
│   └── safety/             # Content safety models
├── data/                   # Shared data directory
└── evaluation/             # Evaluation module

Data Collection and Processing

Document Collection - Use the scraper to gather documents:

cd scraper
python scrape_vws.py  # For VWS documents
python scrape_documents.py  # For other document sources

Document Standardization - Process collected documents:

cd standard_data_format
# First divide metadata into chunks for processing
./scripts/divide_metadata.sh
# Then process with chosen configuration
./scripts/run_gemini_worker.sh  # For cloud-based processing
# or
./scripts/run_local_worker.sh  # For local processing
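This README does not document what divide_metadata.sh does internally. Assuming it splits the collected metadata into roughly equal parts so that several workers can process them in parallel, the core step could look like the Python sketch below; divide_into_shards is a hypothetical name, not a function from the repository.

```python
def divide_into_shards(records: list, n_workers: int) -> list[list]:
    """Split records into n_workers contiguous shards whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(len(records), n_workers)
    shards: list[list] = []
    start = 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more record
        shards.append(records[start:start + size])
        start += size
    return shards


print(divide_into_shards(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Contiguous shards preserve the original record order within each worker's input, and balanced sizes keep the workers finishing at roughly the same time.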

For more details on data processing, see:

Tests

Backend Tests

cd backend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pytest

Contributors

This repository was created by Team Epoch V, based in the Dream Hall of the Delft University of Technology.

Kenzo Heijman, Marcin Jarosz, Felipe Bononi Bello, Laura Kaczmarzyk, Maxim van Emmerik

GitHub repository: TeamEpochGithub/Woopenbaar