
The Automated Data Preprocessing Toolkit streamlines the data preprocessing stage in machine learning by automating tasks like handling missing values, encoding categorical features, and normalizing data. With a user-friendly interface for easy dataset uploads, it enhances data quality and improves model performance efficiently.


Siddiha/AutoEDA-Automated-Data-Preprocessing-Toolkit

 
 

⚡ AutoEDA - Automated Data Preprocessing Toolkit


⚡ Automate tedious data cleaning and focus on insights, not pipelines.



🔍 Overview

AutoEDA is a lightweight yet powerful toolkit that streamlines data preprocessing for Exploratory Data Analysis (EDA) and Machine Learning.

It automates routine cleaning tasks such as missing value treatment, type correction, and feature engineering, helping data scientists and analysts unlock insights faster and with less friction.

✨ Key Features

  • ✅ Seamless CSV upload & schema validation
  • ✅ Null value imputation & type inference
  • ✅ Smart duplicate detection & cleanup
  • ✅ Feature extraction and transformation
  • ✅ REST API support for integration
  • ✅ Modern React + Vite frontend
  • ✅ Dockerized deployment for easy setup
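
The cleaning features listed above can be illustrated with a short pandas sketch. This is not AutoEDA's actual implementation: the helper name `basic_clean`, the median/mode imputation strategy, and the "convert only if every value parses" rule are assumptions chosen for illustration.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: infer types, drop duplicates, impute nulls."""
    out = df.copy()  # never mutate the caller's frame

    # Type inference: convert text columns to numeric only when every
    # non-null value parses as a number; otherwise leave them unchanged.
    for col in out.columns:
        is_text = (pd.api.types.is_object_dtype(out[col])
                   or pd.api.types.is_string_dtype(out[col]))
        if is_text:
            converted = pd.to_numeric(out[col], errors="coerce")
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted

    # Duplicate cleanup: keep the first occurrence of each identical row.
    out = out.drop_duplicates().reset_index(drop=True)

    # Null imputation: median for numeric columns, most frequent value otherwise.
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out
```

Calling `basic_clean(pd.read_csv("data.csv"))` on an uploaded file (the file name is a placeholder) returns a new frame and leaves the original untouched. Deduplicating before imputing is a deliberate choice here, so that repeated rows do not skew the median or mode.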

🧱 Project Architecture

🧠 Backend (Python)

  • Modular design for cleaning, transforming, and preprocessing datasets
  • Built with FastAPI for high-speed async REST APIs
  • Easily extendable for custom ML workflows
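
One way to picture a modular, extendable backend is a pipeline of small, independent steps. The sketch below is an assumption about how such a design could look, not the project's actual module layout; `Step`, `run_pipeline`, `normalize_column_names`, and `drop_empty_columns` are hypothetical names.

```python
from typing import Callable, Iterable

import pandas as pd

# A step is any function that takes a DataFrame and returns a new one.
Step = Callable[[pd.DataFrame], pd.DataFrame]


def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Trim, lower-case, and underscore-join column names."""
    return df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))


def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove columns in which every value is missing."""
    return df.dropna(axis="columns", how="all")


def run_pipeline(df: pd.DataFrame, steps: Iterable[Step]) -> pd.DataFrame:
    """Apply each step in order, starting from a copy of the input."""
    result = df.copy()
    for step in steps:
        result = step(result)
    return result
```

Under this design, a custom ML workflow adds behavior by writing another function with the same signature and appending it to the list passed to `run_pipeline`, without touching existing steps.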

🎨 Frontend (React + Vite)

  • Sleek UI for uploading, previewing, and processing datasets
  • Designed for responsiveness and ease-of-use
  • Optional sections for documentation, help, and dataset history

🐳 Docker Support

  • Docker & Docker Compose configuration for cross-platform deployment
  • One command to launch the full app stack

📦 Requirements

  • 🧑‍💻 Frontend: React.js, Vite
  • 🐍 Backend: Python 3.x, FastAPI, Pandas
  • 🐳 Containerization: Docker & Docker Compose

⚠️ Remember to configure your .gitignore and environment variables!
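
As a rough illustration of environment-based configuration, the sketch below reads settings from environment variables with safe defaults. The variable names `AUTOEDA_ALLOWED_ORIGINS` and `AUTOEDA_MAX_UPLOAD_MB`, the `Settings` class, and the default values are all hypothetical; the project may use different names or a different mechanism. The default origin is the address Vite's dev server uses out of the box.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    allowed_origins: Tuple[str, ...]  # origins the frontend is served from
    max_upload_mb: int                # upper bound for uploaded CSV size


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    raw_origins = env.get("AUTOEDA_ALLOWED_ORIGINS", "http://localhost:5173")
    origins = tuple(o.strip() for o in raw_origins.split(",") if o.strip())
    max_mb = int(env.get("AUTOEDA_MAX_UPLOAD_MB", "50"))
    return Settings(allowed_origins=origins, max_upload_mb=max_mb)
```

Keeping such values in a local `.env` file that is listed in `.gitignore` prevents machine-specific or sensitive settings from being committed.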

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/Nidhi-Satyapriya/AutoEDA-Automated-Data-Preprocessing-Toolkit
cd AutoEDA-Automated-Data-Preprocessing-Toolkit

2️⃣ Run the backend

cd backend
pip install -r requirements.txt
uvicorn main:app --reload

3️⃣ Launch the frontend (in a second terminal, from the repository root)

cd frontend
npm install
npm run dev

4️⃣ [Optional] Run everything with Docker

docker-compose up --build
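
Once the backend is running, a quick reachability check can confirm the setup worked. The sketch below requests the interactive docs page that FastAPI serves at `/docs` by default. It assumes uvicorn's default address `http://127.0.0.1:8000` and that the project has not disabled or moved the docs route; `backend_is_up` is a hypothetical helper, and the port may differ under Docker depending on the Compose configuration.

```python
import urllib.request


def backend_is_up(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/docs answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_up())
```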

🤝 How to Contribute

We 💖 community contributions! Here’s how you can make an impact:

🔧 Frontend

🧪 Model Pipeline

⚙️ Backend

📢 New to open source? Start here: CONTRIBUTING.md

✨ Looking for ideas? Explore Good First Issues

🛡 License

This project uses a Modified MIT License.

🔒 Please read the LICENSE file carefully before using or contributing.

📬 Contact

We'd love to hear from you!

Built with ❤️ by passionate developers — for the community, by the community.

✨ If you found this project useful...

Please consider ⭐ starring the repository and sharing it with your team or on social media.


"Keep pushing boundaries — even small steps can lead to powerful transformations. 🌱"

"Believe in the process, trust your curiosity, and let every dataset take you one step closer to mastery. 💡📊"

