A compact and standardized dataset examining how lifestyle, environmental, and genetic factors influence five common cancer types. Contains 2,000 individual records × 21 features, fully numerical and ready for EDA, dashboarding, and multiclass ML tasks.
Explore risk distributions and correlations across lifestyle factors.
Build visual dashboards for population-level cancer risk.
Train multiclass models on Cancer_Type with balanced evaluation (macro-F1, accuracy).
Practice class imbalance handling and interpretability.
Primary target: Cancer_Type ∈ {Lung, Breast, Colon, Prostate, Skin}. Ideal for multiclass classification (use macro-F1, accuracy, confusion matrix).
Optional target:
Risk_Level ∈ {Low, Medium, High}. Derived from Overall_Risk_Score thresholds: Low < 0.35 | Medium = 0.35–0.65 | High > 0.65
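The threshold rule for Risk_Level can be written as a small function. This is a minimal sketch, not the dataset author's code: the function name `risk_level` is hypothetical, and treating both boundary values (0.35 and 0.65) as Medium is an assumption read off the stated "0.35–0.65 = Medium" range.

```python
def risk_level(score: float) -> str:
    """Map an Overall_Risk_Score to a Risk_Level label.

    Assumption: the boundaries 0.35 and 0.65 both count as Medium.
    """
    if score < 0.35:
        return "Low"
    if score <= 0.65:
        return "Medium"
    return "High"


# Illustrative scores only, not values taken from the dataset.
example_scores = [0.10, 0.35, 0.50, 0.65, 0.80]
labels = [risk_level(s) for s in example_scores]
print(labels)  # ['Low', 'Medium', 'Medium', 'Medium', 'High']
```

Recomputing the label this way and comparing it with the stored Risk_Level column is a quick consistency check before modeling.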
Categorical Features:
Analyzed variables like Cancer_Type, Risk_Level, Gender, H_Pylori_Infection, and BRCA_Mutation using bar charts.
Most patients fall into Medium or High risk groups.
H_Pylori_Infection and BRCA_Mutation are strongly associated with specific cancer types.
Chi-Square Test:
Found statistically significant associations (p < 0.05) of Cancer_Type, H_Pylori_Infection, and BRCA_Mutation with overall risk level.
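A chi-square test of independence like the one summarized above could be run roughly as follows. This is a sketch under stated assumptions: the records are toy values invented so the code runs and are not drawn from the dataset, and Cramér's V is an addition here, included because a p-value shows that an association is detectable, not how strong it is.

```python
import math

import pandas as pd
from scipy.stats import chi2_contingency

# Toy records (NOT from the dataset), only to make the sketch runnable.
toy = pd.DataFrame({
    "BRCA_Mutation": [1] * 30 + [0] * 30,
    "Risk_Level": (["High"] * 22 + ["Medium"] * 8            # carriers
                   + ["High"] * 6 + ["Medium"] * 10 + ["Low"] * 14),  # non-carriers
})

# Contingency table: rows = BRCA_Mutation, columns = Risk_Level.
table = pd.crosstab(toy["BRCA_Mutation"], toy["Risk_Level"])

chi2, p_value, dof, expected = chi2_contingency(table)

# Cramér's V as an effect size: sqrt(chi2 / (n * (min(rows, cols) - 1))).
n = table.to_numpy().sum()
cramers_v = math.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.2g}, V={cramers_v:.2f}")
```

The same pattern applies to Cancer_Type or H_Pylori_Infection by swapping the first argument of `pd.crosstab`.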
Numerical Features:
Variables: Age, BMI, Smoking, Alcohol_Use, Air_Pollution, Physical_Activity, etc.
High-risk patients → higher BMI, Smoking, and Air Pollution.
Low-risk patients → better Physical Activity and Diet Quality.
Correlation & Feature Importance:
Top correlated features with overall risk:
1. Smoking  2. Air Pollution  3. BMI  4. Age
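A ranking like this one can be produced by sorting each feature's correlation with the risk score by absolute value. The sketch below uses synthetic data with made-up weights, so its ordering says nothing about the real dataset. The column names follow the ones listed above, and Pearson correlation (the pandas default) is an assumption about the method used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic features (NOT the real data), each uniform on [0, 1].
toy = pd.DataFrame({
    "Smoking": rng.uniform(0, 1, n),
    "Air_Pollution": rng.uniform(0, 1, n),
    "Physical_Activity": rng.uniform(0, 1, n),
})

# Synthetic score with arbitrary weights, so the ranking has something to find.
toy["Overall_Risk_Score"] = (
    0.5 * toy["Smoking"]
    + 0.3 * toy["Air_Pollution"]
    - 0.2 * toy["Physical_Activity"]
    + rng.normal(0, 0.05, n)
)

# Pearson correlation of every feature with the score, ordered by magnitude
# while keeping the sign (negative = associated with lower risk).
corr = toy.corr()["Overall_Risk_Score"].drop("Overall_Risk_Score")
ranking = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranking)
```

Correlation only captures linear, pairwise relationships, so it is worth comparing this ranking against model-based feature importances.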
Data Preparation:
✔ Dropped non-predictive IDs
✔ One-hot encoded categorical variables
✔ Train-test split (80/20)
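The three preparation steps above might look like this with pandas and scikit-learn. This is a sketch, not the original notebook: the ID column name `Patient_ID` is hypothetical (the source does not name it), the rows are toy values, and stratifying the split on the target is an added assumption that keeps class proportions equal in both partitions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

cancer_types = ["Lung", "Breast", "Colon", "Prostate", "Skin"]

# Toy frame (NOT the real data): 50 rows, 10 per cancer type.
toy = pd.DataFrame({
    "Patient_ID": range(50),            # hypothetical ID column name
    "Gender": ["F", "M"] * 25,
    "Age": [30 + (i % 40) for i in range(50)],
    "Cancer_Type": cancer_types * 10,
})

# 1) Drop the non-predictive ID and separate the target.
X = toy.drop(columns=["Patient_ID", "Cancer_Type"])
y = toy["Cancer_Type"]

# 2) One-hot encode the categorical column(s).
X = pd.get_dummies(X, columns=["Gender"])

# 3) 80/20 train-test split, stratified on the target (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (40, 3) (10, 3)
```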
🤖 Machine Learning Models
Random Forest Classifier:
Accuracy: 96%
Top Predictors: Overall_Risk_Score, Smoking, Air_Pollution, BMI, Age
Logistic Regression:
Accuracy: 85%
Random Forest outperformed Logistic Regression across all metrics (F1 ≈ 0.95).
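A comparison of the two models could be set up as sketched below. The data come from scikit-learn's `make_classification` purely as a stand-in with five classes, so the scores it prints will not match the 96% and 85% reported above. The hyperparameters are library defaults or arbitrary choices, not the original settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic 5-class stand-in for the real feature matrix and Cancer_Type target.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model and record accuracy plus macro-averaged F1,
# which weights all five classes equally regardless of their size.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, macro-F1={scores['macro_f1']:.3f}")
```

One caveat when adapting this: if Risk_Level is the target, keeping Overall_Risk_Score among the features leaks the label, because Risk_Level is derived from that score by thresholding.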
In this dataset, environmental and lifestyle factors dominate cancer risk.
Smoking, Air Pollution, and Obesity (high BMI) are the top contributors.
Physical activity and a healthy diet are associated with lower risk.