TolgaTD/darkdata-hunter

🕷️ DarkData Hunter

Zero-Trust Sustainability Intelligence Platform

Hackathon Winner Candidate 🏆 | Doğuş Teknoloji Green Intelligence

DarkData Hunter is a privacy-first, AI-powered audit tool designed to identify, classify, and eliminate "Dark Data" (Redundant, Obsolete, Trivial files) within corporate environments. By reducing digital waste, it directly lowers carbon emissions and cloud storage costs.

🌟 Key Features

  • 🔒 Zero-Trust Architecture: All analysis runs on-premise using local LLMs (Ollama). No data ever leaves your secure network.
  • 🛡️ PII Masking Engine: Automatically detects and redacts personal information (KVKK/GDPR) before processing.
  • 🧠 AI Analyst: Ask questions about your data in natural language (e.g., "What is the biggest source of waste?").
  • 🕸️ Duplicate Network: Visualizes the spread of duplicate files as a network graph.
  • 📉 Future Projection: Simulates cost and carbon savings over 5 years.
  • 🏭 Enterprise Ready: Integrated authentication, role-based access, and audit logging (SQLite).
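To make the PII masking idea concrete, here is a minimal sketch of regex-based redaction. It is not the project's actual engine: the function name `mask_pii`, the two patterns, and the placeholder tokens are assumptions chosen for illustration, and a real KVKK/GDPR redactor would need much broader coverage.

```python
import re

# Hypothetical patterns for illustration only.
# Email addresses in the common local@domain.tld shape.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Any 11-digit number not starting with 0, which is the shape of a Turkish
# national ID number. No checksum validation is attempted here.
ID_RE = re.compile(r"\b[1-9]\d{10}\b")


def mask_pii(text: str) -> str:
    """Replace matched personal data with fixed placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = ID_RE.sub("[ID_NUMBER]", text)
    return text
```

Running the masking step before any text reaches the model keeps the raw identifiers out of prompts and logs, which is the point of doing it "before processing".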

🚀 Installation

Prerequisites

  1. Python 3.10+
  2. Ollama: Must be installed and running.
    • Pull a model: ollama pull llama3.1:8b (or gemma2)
  3. Graphviz: Required for network visualization. Install it from the official Graphviz download page.

Setup (Quick Start)

# 1. Clone the repository
git clone https://github.com/TolgaTD/darkdata-hunter.git
cd darkdata-hunter

# 2. Install Dependencies
pip install -r requirements.txt

# 3. Start the local AI server (in a separate terminal)
ollama serve

# 4. Launch the Platform
python -m streamlit run app.py
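Before launching the platform, it can help to confirm that the Ollama server from step 3 is reachable. The sketch below assumes Ollama's default address (`http://localhost:11434`) and queries its `/api/tags` endpoint, which lists locally pulled models. The helper name `ollama_is_up` is hypothetical and not part of this repository.

```python
import http.client
import json
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers with valid JSON on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # the endpoint returns a JSON list of local models
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return False
```

If `ollama_is_up()` returns False, start `ollama serve` in a separate terminal and try again.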

📖 Usage Guide

  1. Register/Login: Create a secure admin account on the first launch.
  2. Configure: In the sidebar, select the Target Directory to scan.
  3. Audit: Click "Start Audit Scan". The system will index files, check for duplicates, and analyze content usefulness using AI.
  4. Analyze & Act:
    • Review the Green Score and ROI Metrics.
    • Use the Duplicate Network tab to find file clusters.
    • Auto-Archive redundant files with a single click.
    • Download the official PDF Green Audit Certificate.
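The duplicate check in the audit step could be approached by grouping files on a content hash. The following is a minimal sketch under that assumption, using only the standard library. The names `file_digest` and `find_duplicates` are illustrative, and the project's own scanner may work differently.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[Path]]:
    """Return groups of files under root that share identical content."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    # Keep only hashes seen more than once, i.e. actual duplicate clusters.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each returned group is one cluster of identical files. Those clusters are the natural input for a graph view, with one node per file and edges between members of the same group.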

🛠️ Tech Stack

  • Frontend: Streamlit (Custom Glassmorphism UI)
  • AI Core: Ollama (Llama 3.1 / Gemma 2)
  • Backend: Python, Pandas, NetworkX
  • Database: SQLite
  • Visualization: Plotly, Graphviz

🌍 Impact

Every gigabyte of dark data stored consumes energy 24/7. DarkData Hunter empowers organizations to turn off the lights on digital waste.
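The arithmetic behind a multi-year savings projection can be sketched as follows. This is an assumed linear model, not the tool's actual simulation: `project_savings`, the `Projection` type, and all parameter names are hypothetical, and every rate must be supplied by the user since the README does not state the factors the platform uses.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    cost_saved: float        # in the currency of cost_per_gb_month
    energy_saved_kwh: float
    co2_saved_kg: float


def project_savings(
    reclaimed_gb: float,
    cost_per_gb_month: float,
    kwh_per_gb_year: float,
    kg_co2_per_kwh: float,
    years: int = 5,
) -> Projection:
    """Linear projection: the reclaimed storage stays freed for the whole period."""
    cost = reclaimed_gb * cost_per_gb_month * 12 * years
    energy = reclaimed_gb * kwh_per_gb_year * years
    return Projection(cost, energy, energy * kg_co2_per_kwh)
```

For example, with purely illustrative placeholder rates, `project_savings(1000, 0.02, 1.0, 0.5)` projects the effect of freeing 1000 GB over the default 5 years. The rates here are made up for the example and should be replaced with figures from your own storage provider and grid.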


Developed for Doğuş Teknoloji Hackathon 2026.
