
Student Stress Prediction — End-to-End ML Project

Links
  • Slides: Student Stress Prediction
  • Streamlit app: Student Stress Prediction — Model Explorer (Dropdown Inputs)

Overview

This project predicts a student’s stress level (multi-class target: 0, 1, 2) from ~20 numeric survey features spanning psychological, physiological, environmental, academic, and social factors.
It includes reproducible EDA, leak-free ML pipelines, model comparisons, and a Streamlit app for interactive scoring and evaluation.
The goal is a full analytics & ML workflow that’s easy for data-analytics students to follow and extend.


What You’ll Find

  • Clean EDA & Prep: distributions, outliers, correlations, multicollinearity checks (incl. VIF), Mutual Information, and a brief PCA summary.
  • Modeling: consistent pipelines (shared preprocessing), multiple classifiers, optional resampling (SMOTE), and feature importance/coefficients helpers.
  • Comparisons: accuracy + macro precision/recall/F1, classification reports, confusion matrices, and a clear Top-2 model selection.
  • Deployment: export of fitted pipelines and a Streamlit app for single/batch predictions and on-app evaluation (pattern inspired by a prior “fraud scoring” project template).

Data Snapshot

From the EDA:

  • Rows: ~1,100
  • Features: ~21 (all numeric)
  • Missing values: none
  • Duplicates: none
  • Target distribution (stress_level): {0: 373, 1: 358, 2: 369} → roughly balanced, so accuracy is a fair primary metric.

Note: The dataset is for learning; values reflect typical student self-reports (not clinical labels).
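
To verify the snapshot yourself, a minimal check (the CSV path is an assumption; adjust it to the actual file under data/raw/):

    import pandas as pd

    df = pd.read_csv("data/raw/student_stress.csv")  # assumed path

    print(df.shape)                           # expect ~ (1100, 21)
    print(df.isna().sum().sum())              # expect 0 missing values
    print(df.duplicated().sum())              # expect 0 duplicate rows
    print(df["stress_level"].value_counts())  # expect roughly {0: 373, 1: 358, 2: 369}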


Repo Structure

final_project/
├── data/
│   ├── raw/                           # original files
│   └── clean/                         # cleaned/processed artifacts
├── figures/                           # saved figures from EDA
├── notebooks/
│   ├── _main_dataset_analysis.ipynb   # EDA & preparation
│   └── _main_model_training.ipynb     # model training, evaluation, export
├── lib/
│   └── functions.py                   # helpers (run_models_with_importances, importance tables, plotting)
├── my_streamlit_app/
│   ├── app.py                         # Streamlit app (single/batch predict + evaluation)
│   ├── pages/01_Compare_Models.py     # optional: side-by-side model comparison page
│   └── models/
│       ├── *.pkl                      # exported fitted pipelines (incl. tuned “(best)” variants)
│       ├── feature_names.pkl          # full training schema (ALL features)
│       ├── test_set.csv               # held-out test set for on-app evaluation
│       └── metrics.json               # quick accuracy per model
├── README_StudentStress_DA_UPDATED.md          # detailed EDA notes & findings
├── README_model_training_results_UPDATED.md    # modeling results & selection details
└── README.md                                   # this file

Environment & Installation (uv)

This project targets Python 3.13 (see requires-python = ">=3.13"). If your system default is lower, create the venv with an explicit 3.13 interpreter.

  1. Check you have uv

    uv --version

  2. Create & activate a virtual environment

  • Windows (PowerShell)

    uv venv .venv --python 3.13
    .\.venv\Scripts\Activate.ps1
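
  • macOS/Linux (bash/zsh)

    uv venv .venv --python 3.13
    source .venv/bin/activate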

  3. Install dependencies (choose one profile)

    A) Production (app + inference only)

    • Installs the minimal set to run the Streamlit app and unpickle pipelines:

      uv pip install -r requirements-dev.txt

    B) Full environment from pyproject.toml

    • Installs everything declared there (includes Jupyter and ipykernel). Use this only if you want to run the notebooks in the same env:

      uv sync

  4. Optional system dependency

  • The Graphviz system binary (needed for certain visualizations) may have to be installed separately from the graphviz Python bindings, depending on your OS.

End-to-End Workflow

  1. EDA & Preparation (notebooks/_main_dataset_analysis.ipynb)

    • Univariate distributions (skew, mean vs median)
    • Bivariate feature vs stress (box/violin plots, group stats)
    • Correlation (feature–feature & feature–target) + multicollinearity scanning
    • Mutual Information (non-linear relevance) (see the MI/VIF sketch after this list)
    • Outliers (IQR / z-score)
    • PCA for variance explained (optional dimensionality reduction)
      All steps include concise what/why/how notes for teaching clarity.
  2. Modeling (notebooks/_main_model_training.ipynb)

    • Leak-free pipelines with shared preprocessing, evaluated across:
      KNN, Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, AdaBoost, and LogReg + SMOTE (see the pipeline sketch after this list).
    • Unified evaluation (accuracy, classification report, confusion matrix) plus feature importances/coefficients where available.
  3. Export & Deploy

    • A final cell saves each fitted pipeline (*.pkl), the training feature schema (feature_names.pkl), the test set CSV, and a metrics JSON.
    • The Streamlit app loads these artifacts for single/batch prediction and on-app evaluation.
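
The multicollinearity and Mutual Information checks from step 1 boil down to a few lines. A minimal sketch, assuming the cleaned DataFrame is df and the target column is stress_level (the notebook's own helpers may differ):

    import pandas as pd
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.linear_model import LinearRegression

    X = df.drop(columns=["stress_level"])
    y = df["stress_level"]

    # Mutual Information: non-linear relevance of each feature to the target.
    mi = pd.Series(mutual_info_classif(X, y, random_state=42), index=X.columns)
    print(mi.sort_values(ascending=False).head(10))

    # VIF_j = 1 / (1 - R^2) from regressing feature j on all other features;
    # values well above ~5-10 flag multicollinearity.
    def vif(X: pd.DataFrame) -> pd.Series:
        scores = {}
        for col in X.columns:
            others = X.drop(columns=[col])
            r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
            scores[col] = 1.0 / (1.0 - r2) if r2 < 1.0 else float("inf")
        return pd.Series(scores).sort_values(ascending=False)

    print(vif(X).head(10))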
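
Step 2's leak-free pattern, in miniature: preprocessing and resampling live inside the pipeline, so they are fit on the training fold only and the test fold never leaks into scaling or SMOTE. A sketch continuing with X and y from above (not the notebook's exact run_models_with_importances helper):

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # imblearn's Pipeline supports resampling steps
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    pipe = Pipeline(steps=[
        ("scaler", StandardScaler()),       # fit on the training fold only
        ("smote", SMOTE(random_state=42)),  # applied during fit, never at predict time
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipe.fit(X_train, y_train)
    print(classification_report(y_test, pipe.predict(X_test)))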

Results (short version)

Test Accuracy (top models):

  • Random Forest → best overall balance (Acc. 0.8955, Macro F1 0.8953)
  • Logistic Regression + RUS → Acc. 0.8909; simplest to explain, good class-0 recall
  • Logistic Regression + SMOTE → Acc. 0.8909; best recall on the stressed classes (1, 2)
  • Gradient Boosting (hyperparameter-tuned) → Acc. 0.8955

Observations:

  • Confusion matrices are mostly diagonal; the few errors are 1 ↔ 2 confusions.
  • Top features are consistent across models.

What the top features say (recurring signals):
High association with stress appeared for: blood_pressure, sleep_quality, social_support, anxiety_level, depression, self_esteem, bullying, academic_performance, study_load, future_career_concerns.
(Direction depends on feature; e.g., self-esteem often decreases as stress increases.)

Why these two?
The Top-2 models showed the best overall balance across classes (macro metrics), with strong recall on the stressed classes while maintaining solid precision.


How to Reproduce

Environment: Python 3.13 (matching requires-python above) · pandas · numpy · scikit-learn · imbalanced-learn · matplotlib · joblib · streamlit

Steps

  1. Run EDA: open _main_dataset_analysis.ipynb, execute all cells, check that the snapshot matches the counts above.
  2. Train models: open _main_model_training.ipynb, run the training & evaluation cells (e.g., run_models_with_importances(...)) and review comparisons.
  3. Export for Streamlit: run the final “Streamlit Export Setup (Stress)” cell → writes *.pkl, feature_names.pkl, test_set.csv, and metrics.json into my_streamlit_app/models/.
  4. Launch the app (local):
    cd my_streamlit_app
    pip install -r requirements.txt
    streamlit run app.py
    

Run the export after training so all fitted pipelines (e.g., AdaBoost) are in memory to be saved.
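
For reference, the export step boils down to something like the sketch below, where fitted is a hypothetical dict mapping display names to fitted pipelines (the notebook's variable names and pickle filenames may differ):

    import json
    from pathlib import Path

    import joblib
    from sklearn.metrics import accuracy_score

    out = Path("my_streamlit_app/models")
    out.mkdir(parents=True, exist_ok=True)

    metrics = {}
    for name, pipe in fitted.items():           # e.g. {"Random Forest": pipe_rf, ...}
        joblib.dump(pipe, out / f"{name}.pkl")  # exported fitted pipeline
        metrics[name] = accuracy_score(y_test, pipe.predict(X_test))

    joblib.dump(list(X_train.columns), out / "feature_names.pkl")     # full training schema
    X_test.assign(stress_level=y_test).to_csv(out / "test_set.csv", index=False)
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))  # quick accuracy per model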


Streamlit App

Local usage

cd my_streamlit_app
pip install -r requirements.txt
streamlit run app.py
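
Under the hood, the app's single-prediction path is essentially the following. A sketch, assuming one of the exported pickle filenames (any fitted pipeline works) and placeholder inputs in place of the dropdown values:

    import joblib
    import pandas as pd

    pipe = joblib.load("models/Random Forest.pkl")           # assumed filename
    feature_names = joblib.load("models/feature_names.pkl")  # full training schema (column order matters)

    # One survey response keyed by the training schema; real values come from the app's dropdowns.
    row = pd.DataFrame([dict.fromkeys(feature_names, 0)])
    print(pipe.predict(row))        # predicted stress level: 0, 1, or 2
    print(pipe.predict_proba(row))  # class probabilities, where the model supports them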

Authors:

  • Luis Pablo Aiello — Data Analytics Student (Cohort Sep-2025)

License

Educational use within the bootcamp cohort; the dataset is survey-based and used for learning purposes.
