This project predicts California housing prices with two models: Linear Regression and a Random Forest Regressor. The dataset is the California Housing dataset from `sklearn.datasets`. The goal is to predict median house prices from features such as income level, house age, and geographical location.
- Dataset Loading: The California Housing dataset is loaded using `fetch_california_housing(as_frame=True)`.
- Feature Engineering:
  - Standardization of the dataset using `StandardScaler`.
  - Splitting the data into training (80%) and testing (20%) sets.
  - Checking dataset structure and statistics.
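
The loading, splitting, and standardization steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the helper names `prepare_data` and `load_california_housing` are hypothetical, the `random_state=42` for the split is an assumption, and fitting the scaler on the training split only is a design choice made here to avoid leaking test-set statistics.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def load_california_housing():
    """Load features and target as pandas objects (downloads on first use)."""
    from sklearn.datasets import fetch_california_housing

    housing = fetch_california_housing(as_frame=True)
    return housing.data, housing.target


def prepare_data(X, y, test_size=0.2, random_state=42):
    """Split into 80/20 train/test sets, then standardize the features."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Assumption: the scaler is fitted on the training split only and then
    # applied to the test split, so test data never influences the scaling.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled, y_train, y_test
```

With these helpers, `X, y = load_california_housing()` followed by `X.info()` and `X.describe()` would cover the structure and statistics check, and `prepare_data(X, y)` would return the scaled splits.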
- Model Training: Uses `LinearRegression()` to fit the training data.
- Predictions: Predictions are made on the test set.
- Evaluation Metrics:
  - Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual prices.
  - Mean Squared Error (MSE): Averages the squared differences, so large errors are penalized more heavily than small ones.
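
As an illustration of the Linear Regression steps above, the following sketch fits the model and computes MAE and MSE on held-out data. The function name `fit_and_score_linear` and the dictionary it returns are assumptions made for this example, not taken from the project.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error


def fit_and_score_linear(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares and report test-set MAE and MSE."""
    model = LinearRegression()
    model.fit(X_train, y_train)
    # Predictions are made on the held-out test set only.
    y_pred = model.predict(X_test)
    scores = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }
    return model, scores
```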
- Model Training: Uses `RandomForestRegressor(n_estimators=100, random_state=42)`.
- Predictions: Predictions are made using the trained Random Forest model.
- Evaluation Metrics: MSE, MAE, and R² score are calculated to compare model performance.
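
The Random Forest steps above might look like the sketch below. The hyperparameters are the ones stated in the list; the function name `fit_and_score_forest` and the returned dictionary are hypothetical and only serve this example.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def fit_and_score_forest(X_train, y_train, X_test, y_test):
    """Fit a 100-tree random forest and report test-set MSE, MAE, and R²."""
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = {
        "mse": mean_squared_error(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return model, scores
```

Tree-based models do not depend on feature scale, so the forest should behave much the same on standardized or raw features; reusing the standardized splits simply keeps the comparison with Linear Regression on identical inputs.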
| Model | Mean Squared Error (MSE) | Mean Absolute Error (MAE) | R² Score |
|---|---|---|---|
| Linear Regression | 0.55 | 0.53 | - |
| Random Forest | 0.26 | 0.33 | 0.81 |
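Conclusion: The Random Forest model outperforms Linear Regression, with a lower MSE (0.26 vs. 0.55) and a lower MAE (0.33 vs. 0.53). Its R² score of 0.81 indicates that it explains roughly 81% of the variance in house prices. No R² score was recorded for Linear Regression.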
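
For reference, the three metrics in the table follow the standard definitions below, where $y_i$ is the actual price, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean of the actual prices, and $n$ the number of test samples. The notation is chosen here for illustration and is not taken from the project.

```latex
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```

Under this definition, an R² of 0.81 means the residual sum of squares is 19% of the total sum of squares around the mean.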
To run this project, install the required dependencies:

    pip install pandas numpy scikit-learn matplotlib

1️⃣ Clone the repository:

    git clone https://github.com/your-repo/California_Housing_Price_Prediction.git
    cd California_Housing_Price_Prediction

2️⃣ Run the Python script in a Jupyter Notebook:

    jupyter notebook

3️⃣ Execute the cells step by step to see the data processing, model training, and evaluation.
- Code Crafters Bm: Project development and implementation.
- Inspired by `sklearn.datasets` and regression modeling techniques.