Note: This repository serves as a technical showcase for a proprietary application. Due to NDA and IP restrictions, the source code is not public.
AutoCrisp is a low-code, agentic orchestration platform designed to democratize the data science lifecycle. It enables domain experts to execute end-to-end data mining workflows—from hypothesis generation to predictive modeling—without writing raw code.
AutoCrisp follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. It uses a multi-agent architecture to enforce rigor in the Business Understanding and Data Preparation phases, which guards against the common "garbage-in, garbage-out" failure mode of autonomous AI.
- Automated Domain Research: Leverages N8N workflows to perform meta-analyses and identify relevant external datasets before modeling begins.
- Semantic Data Exploration: Allows users to query complex datasets using natural language, translating intent into executable SQL/Pandas operations.
- Human-in-the-Loop (HITL) Governance: Critical code execution steps (cleaning, training) require user review before they run, which reduces the risk of acting on hallucinated LLM output.
- Stateful Memory: Agents retain context across the entire project lifecycle via encrypted cloud storage.
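
The human-in-the-loop gate described above can be illustrated with a minimal sketch. Since the source is not public, this is not AutoCrisp's implementation: `run_with_review`, `StepResult`, and the `reviewer` callback are hypothetical names, and `exec` stands in for whatever sandboxed runner the real system uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    approved: bool
    namespace: Optional[dict] = None  # variables produced by the step, if it ran


def run_with_review(code: str, reviewer: Callable[[str], bool]) -> StepResult:
    """Execute agent-generated code only after the reviewer approves it."""
    if not reviewer(code):
        # Rejected: nothing is executed.
        return StepResult(approved=False)
    namespace: dict = {}
    # exec() is used for brevity; a real system would run this in a sandbox.
    exec(code, namespace)
    return StepResult(approved=True, namespace=namespace)


# Hypothetical cleaning step proposed by an agent.
proposed = "rows = [r for r in [1, None, 3] if r is not None]"

rejected = run_with_review(proposed, reviewer=lambda code: False)
accepted = run_with_review(proposed, reviewer=lambda code: True)
print(rejected.approved, accepted.namespace["rows"])
```

In a Streamlit frontend the `reviewer` callback would presumably be backed by an approve/reject control rather than a lambda. The point of the sketch is only that execution is unreachable without an explicit approval.
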
The system integrates a hybrid stack of deterministic workflows and agents:
- AG2 (AutoGen): The core agentic framework handling the "heavy lifting" of code generation, error correction, and iterative reasoning.
- N8N: Handles orchestration of deterministic tasks (web search, API connectors) during the Business Understanding phase.
- Streamlit: Provides the interactive frontend, rendering agent dialogues and data visualizations.
- Supabase: Serves as the backend for vector memory, dataset storage, and encrypted user authentication.
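
To make the "code generation, error correction, and iterative reasoning" loop concrete, here is a framework-agnostic sketch. It deliberately does not use the AG2 API. `generate_and_repair` and `fake_generator` are hypothetical, and the generator is a stub standing in for an LLM call.

```python
import traceback
from typing import Callable, Optional


def generate_and_repair(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_attempts: int = 3,
) -> dict:
    """Ask `generate` for code, run it, and feed any traceback back in."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, error)
        namespace: dict = {}
        try:
            exec(code, namespace)  # illustration only; sandbox in practice
        except Exception:
            # Keep the traceback so the next draft can react to it.
            error = traceback.format_exc()
            continue
        return {"attempts": attempt, "namespace": namespace}
    raise RuntimeError(f"no working code after {max_attempts} attempts:\n{error}")


def fake_generator(task: str, error: Optional[str]) -> str:
    # Stand-in for an LLM call: the first draft has a bug, the retry fixes it.
    if error is None:
        return "mean = sum(values) / len(values)"  # fails: values is undefined
    return "values = [2, 4, 6]\nmean = sum(values) / len(values)"


result = generate_and_repair(fake_generator, "compute the mean")
print(result["attempts"], result["namespace"]["mean"])
```

Capping the number of attempts keeps a persistently failing generation from looping forever. After the cap is reached, the last traceback is surfaced to the caller.
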
AutoCrisp maps specific AI agents to the phases of the industry-standard data mining lifecycle:
The system initiates with an Orchestration Agent that scopes the problem. It delegates tasks to research agents capable of literature review and dataset discovery.
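
As a rough illustration of this delegation pattern, the sketch below routes scoped tasks to registered research agents. The `Orchestrator` class, the task names, and the lambda "agents" are all assumptions made for the example. The actual agents are LLM-backed and their interfaces are not documented here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Orchestrator:
    # Maps a task type to the (hypothetical) agent function that handles it.
    agents: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, task_type: str, agent: Callable[[str], str]) -> None:
        self.agents[task_type] = agent

    def scope(self, problem: str) -> List[Tuple[str, str]]:
        # A real orchestrator would ask an LLM to break down the problem;
        # here every problem yields the same two research tasks.
        return [("literature_review", problem), ("dataset_discovery", problem)]

    def run(self, problem: str) -> Dict[str, str]:
        # Delegate each scoped task to the agent registered for its type.
        return {kind: self.agents[kind](payload) for kind, payload in self.scope(problem)}


orchestrator = Orchestrator()
orchestrator.register("literature_review", lambda p: f"papers about {p}")
orchestrator.register("dataset_discovery", lambda p: f"datasets about {p}")
findings = orchestrator.run("customer churn")
print(findings)
```
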
Once data is ingested, specialized AutoGen agents take over. These agents operate in a conversable loop to perform:
- Sanitization: Detecting missing values and anomalies.
- Feature Engineering: Proposing and creating new variables based on domain context.
- Training & Evaluation: Running scikit-learn/PyTorch models and interpreting confusion matrices for the user.
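
The kind of code these agents might generate for sanitization and evaluation can be sketched with the standard library alone. This is an assumption-laden example, not AutoCrisp output: `profile_column` and `summarize_confusion` are hypothetical helpers, the 1.5 × IQR rule is just one common outlier heuristic, and the sample values are made up.

```python
from statistics import quantiles
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, object]:
    """Count missing entries and flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartile cut points
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return {
        "missing": len(values) - len(present),
        "outliers": [v for v in present if v < low or v > high],
    }


def summarize_confusion(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Turn binary confusion-matrix counts into the usual headline metrics.

    Assumes at least one predicted positive and one actual positive,
    otherwise precision or recall would divide by zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


report = profile_column([10.0, 12.0, None, 11.0, 13.0, 12.0, 95.0])
metrics = summarize_confusion(tp=40, fp=10, fn=20, tn=30)
print(report)
print(metrics)
```

With these sample inputs the profile reports one missing value and flags `95.0` as an outlier. An agent could then turn the precision and recall figures into a plain-language explanation for the user.
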
# Clone the repository
git clone https://github.com/jackvandervall/autocrisp.git
# Install dependencies (a virtual environment is recommended)
pip install -r requirements.txt
# Configure Environment Variables
# Create a .env file containing your LLM provider keys and Supabase credentials
cp .env.example .env
# Launch the Application
streamlit run app.py
Developed by Jack van der Vall in collaboration with a leading European academic research center for data science and AI based in Rotterdam.

