This repository contains a curated collection of end-to-end Machine Learning and Data Engineering projects that demonstrate my experience in building scalable, reproducible, and production-style ML systems.
The projects emphasize:
- Practical machine learning algorithms
- Data preprocessing and feature engineering
- Model evaluation using appropriate metrics
- Clean software engineering practices
- Real-world datasets and problem statements
Each project is organized as an independent, runnable module with its own documentation and scripts.
Folder: personalized-recsys/
A Netflix-style hybrid recommendation system that combines:
- Collaborative Filtering (Matrix Factorization using SVD)
- Content-Based Filtering (TF-IDF on movie metadata)
- Hybrid ranking with cold-start handling
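The hybrid ranking step can be pictured roughly as follows. This is a minimal sketch, not the code in personalized-recsys/: the function name `hybrid_rank`, the blend weight `alpha`, and the `min_ratings` threshold are hypothetical. It assumes both score dictionaries are already normalized to a comparable range (for example 0 to 1), with collaborative scores coming from the SVD model and content scores from TF-IDF similarity.

```python
from typing import Dict, List, Tuple


def hybrid_rank(
    cf_scores: Dict[str, float],
    content_scores: Dict[str, float],
    n_user_ratings: int,
    alpha: float = 0.7,      # hypothetical weight on the collaborative score
    min_ratings: int = 5,    # hypothetical cold-start threshold for users
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Blend collaborative and content-based scores into one top-k ranking."""
    # Cold-start user: too little rating history for the collaborative model.
    user_is_cold = n_user_ratings < min_ratings

    ranked = []
    for item in set(cf_scores) | set(content_scores):
        content = content_scores.get(item, 0.0)
        if user_is_cold or item not in cf_scores:
            # Cold-start user or item: no usable collaborative signal,
            # so fall back to the content-based score alone.
            score = content
        else:
            score = alpha * cf_scores[item] + (1.0 - alpha) * content
        ranked.append((item, score))

    # Highest score first; ties are broken by item id for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]
```

With this layout, an item that has no ratings yet can still be ranked through its metadata, and a user with little history is served from content scores until enough ratings accumulate.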
Key Highlights
- End-to-end ML pipeline (data ingestion → training → evaluation)
- Offline ranking metrics: Precision@K, Recall@K, NDCG@K, MAP@K
- Modular, production-style Python codebase
- Demo script to inspect real recommendations
- Built on the MovieLens 1M dataset
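For reference, the offline ranking metrics listed above can be computed per user along these lines. This is an illustrative sketch under stated assumptions, not the project's evaluation module: relevance is treated as binary, the function names are hypothetical, and average precision is normalized by min(number of relevant items, K), which is one common convention among several.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant (k must be > 0)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of precision values taken at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for pos, item in enumerate(ranked[:k]):
        if item in relevant:
            hits += 1
            total += hits / (pos + 1)
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def map_at_k(rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]], k: int) -> float:
    """MAP@K: average precision at k, averaged over all users."""
    if not rankings:
        return 0.0
    scores = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)
```

Each function takes one user's ranked list and that user's set of held-out relevant items; `map_at_k` averages over users, and the other metrics can be averaged the same way.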
➡️ See personalized-recsys/README.md for full details.
Tech stack:
- Languages: Python
- ML & Data: NumPy, Pandas, Scikit-learn
- Evaluation: Ranking metrics (Precision@K, Recall@K, NDCG@K, MAP@K)
- Engineering: Modular code, configuration-driven pipelines, Git
- Datasets: MovieLens (GroupLens)
This repository serves as:
- A technical portfolio for Machine Learning and Data Engineering internship applications
- A demonstration of problem-solving and system design
- A foundation for experimenting with scalable ML systems
Future projects will extend into:
- Distributed systems (Spark, Kafka)
- Graph-based ML (Neo4j)
- Advanced ML models and pipelines
Uttam
Master’s Student – Data Science (Computing and Decision Analytics)
Actively seeking Machine Learning Engineer and Data Engineering Intern roles.
Notes:
- Large datasets and trained models are excluded from version control.
- Each project is self-contained and reproducible.