Multiple Linear Regression Analysis: Predicting Highway MPG

Project Overview

This project applies Multiple Linear Regression (MLR) to predict a car's highway fuel economy in miles per gallon (the Fuel Information.Highway MPG column) from various vehicle features. The dataset includes technical specifications of cars, such as engine type, fuel type, and transmission details.

Objective

Identify key factors affecting highway fuel efficiency.
Build a robust regression model to predict highway mpg (Fuel Information.Highway MPG).
Evaluate the model’s performance and optimize feature selection.

Project Structure

📂 MLR Analysis/
│
├── 📄 LICENSE
│
├── 📄 README.md
│
├── 📂 data/
│   └── cars.csv  # Your dataset
│
└── 📄 MLR_analysis.ipynb  # Google Colab notebook with the full analysis

Dataset Information

The dataset includes the following features:

Vehicle Specifications: Identification.Model Year, Dimensions.Height, Dimensions.Width, etc.
Engine Details: Engine Information.Engine Type, Torque, Horsepower, etc.
Fuel Information: Fuel Type, City MPG, etc.
Transmission Details: Number of Forward Gears, Transmission Type.

Data Preprocessing Steps

Handling Duplicates: Removed duplicate rows.
Encoding Categorical Variables:
- Applied Target Encoding to 'Identification.Make', 'Identification.Model Year', 'Engine Information.Engine Type', 'Engine Information.Driveline'
- Applied One-Hot Encoding to 'Fuel Information.Fuel Type', 'Engine Information.Transmission', 'Identification.Classification'
Feature Scaling: Standardized numerical features where necessary.
Outlier Treatment: Used the IQR method to remove extreme values.
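The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows are made up, the helper names `target_encode` and `remove_iqr_outliers` are hypothetical, and the 1.5 × IQR fence is the conventional default rather than a value confirmed by the notebook. Note that target encoding on the full dataset leaks the target into the features; in practice the category means are usually fitted on the training split only.

```python
import pandas as pd


def target_encode(df, column, target):
    """Replace each category with the mean of the target within that category."""
    means = df.groupby(column)[target].mean()
    return df[column].map(means)


def remove_iqr_outliers(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Tiny made-up sample standing in for cars.csv (values are illustrative only).
cars = pd.DataFrame({
    "Identification.Make": ["Audi", "Audi", "BMW", "BMW", "Ford", "Ford", "Ford"],
    "Fuel Information.Fuel Type": ["Gasoline", "Gasoline", "Diesel fuel", "Gasoline",
                                   "E85", "Gasoline", "Gasoline"],
    "Fuel Information.City mpg": [18, 18, 22, 17, 15, 16, 60],
    "Fuel Information.Highway MPG": [25, 25, 30, 24, 21, 23, 70],
})
target = "Fuel Information.Highway MPG"

# 1. Remove duplicate rows (the first two sample rows are identical).
cars = cars.drop_duplicates()

# 2. Target-encode a high-cardinality categorical column.
cars["Identification.Make_encoded"] = target_encode(cars, "Identification.Make", target)

# 3. One-hot encode a low-cardinality categorical column.
cars = pd.get_dummies(cars, columns=["Fuel Information.Fuel Type"], dtype=int)

# 4. Drop rows outside the IQR fences for the chosen numeric column.
cars = remove_iqr_outliers(cars, ["Fuel Information.City mpg"])
```

On this sample, the duplicate Audi row and the 60 mpg row are dropped, leaving five rows.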

Exploratory Data Analysis (EDA)

Correlation Heatmap

A heatmap was generated to analyze the correlation between independent variables and the target variable (Fuel Information.Highway MPG).

Key Observations:

Fuel Information.City mpg is highly correlated with Highway MPG.
Some features exhibit multicollinearity, requiring feature selection.
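A correlation analysis of this kind might look like the sketch below. The data is synthetic, and 'Engine Information.Horsepower' is an assumed column name used only for illustration. The plotting helper `plot_heatmap` is hypothetical and is defined but not called, so the ranking can be inspected without a display.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned dataset (not real measurements).
rng = np.random.default_rng(0)
n = 200
city = rng.normal(20, 4, n)
df = pd.DataFrame({
    "Fuel Information.City mpg": city,
    # Assumed column name, for illustration only.
    "Engine Information.Horsepower": 400 - 8 * city + rng.normal(0, 20, n),
    "Fuel Information.Highway MPG": 1.3 * city + rng.normal(0, 1, n),
})
target = "Fuel Information.Highway MPG"

# Pearson correlation matrix of all numeric columns.
corr = df.corr()

# Rank predictors by absolute correlation with the target.
ranked = corr[target].drop(target).abs().sort_values(ascending=False)


def plot_heatmap(corr_matrix):
    """Draw an annotated heatmap of a correlation matrix (requires seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.tight_layout()
    plt.show()
```

Calling `plot_heatmap(corr)` in a notebook cell renders the heatmap. Large off-diagonal values between two predictors are the usual sign of multicollinearity.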

Feature Selection

To select the best predictors:

Recursive Feature Elimination (RFE): Selected the most impactful features.
Multicollinearity Check (VIF): Removed features with VIF > 10 to avoid redundancy.
Correlation Analysis: Chose features highly correlated with the target variable but uncorrelated with each other.
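As a rough sketch of how the VIF filter and RFE could be combined: the code below computes VIF directly from its definition with scikit-learn (statsmodels offers an equivalent function), drops the worst offender until every VIF is at most 10, and then runs RFE on what remains. The iterative dropping strategy, the helper names, the synthetic columns `x1` to `x4`, and the choice of two final features are all assumptions made for illustration; the notebook may proceed differently.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def compute_vif(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j on the others."""
    vif = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        vif[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vif, name="VIF")


def drop_high_vif(X, threshold=10.0):
    """Repeatedly drop the column with the largest VIF until all VIFs are <= threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vif = compute_vif(X)
        if vif.max() <= threshold:
            break
        X = X.drop(columns=vif.idxmax())
    return X


# Synthetic data: x2 is nearly a copy of x1 (collinear), x4 is unrelated to y.
rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=n),
    "x3": rng.normal(size=n),
    "x4": rng.normal(size=n),
})
y = 2.0 * X["x1"] + 1.0 * X["x3"] + rng.normal(scale=0.5, size=n)

# Step 1: remove redundant predictors.
reduced = drop_high_vif(X, threshold=10.0)

# Step 2: let RFE keep the two predictors with the largest coefficients.
selector = RFE(estimator=LinearRegression(), n_features_to_select=2).fit(reduced, y)
selected = list(reduced.columns[selector.support_])
```

RFE with a linear model ranks features by coefficient magnitude, so it is only meaningful when the features are on comparable scales, which is one reason to standardize first.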

Final Selected Features:

['Fuel Information.City mpg', 'Identification.Model Year_encoded', 
 'Fuel Information.Fuel Type_Diesel fuel', 'Fuel Information.Fuel Type_E85', 
 'Fuel Information.Fuel Type_Gasoline']

Model Building

Model Used: Multiple Linear Regression
Train-Test Split: 80% Training, 20% Testing
Evaluation Metrics:
- R² Score: Indicates how well the model explains variance.
- VIF Analysis: Ensures low multicollinearity.
- RMSE: Measures average error in predictions.
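The split, fit, and metrics described above could be reproduced along these lines. The data here is synthetic and the `random_state` value is arbitrary; only the 80/20 split and the choice of metrics come from this README.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for the selected features and target.
rng = np.random.default_rng(42)
n_samples, n_features = 500, 5
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([3.0, 0.5, -1.0, 0.0, 2.0])
y = X @ true_coef + 25.0 + rng.normal(scale=0.5, size=n_samples)

# 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R2: share of target variance explained on the held-out set.
r2 = r2_score(y_test, y_pred)

# RMSE: square root of the mean squared prediction error, in target units.
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))

# Adjusted R2 penalizes R2 for the number of predictors p given n observations.
n, p = X_test.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

With this definition, adjusted R² is never larger than R² when both are computed on the same data, and RMSE depends on the scale of the target, so values are only comparable between runs that use the same target scaling.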

Original vs Predicted data

Here’s the comparison between the actual values and the predicted values:

Results & Model Evaluation

| Metric | Before Handling Outliers | After Handling Outliers |
|---|---|---|
| R² Score | 0.9121 | 0.9495 |
| RMSE | 0.0070 | 0.0432 |
| Adjusted R² | 0.9357 | 0.9529 |

After handling outliers, R² rose from 0.9121 to 0.9495 and adjusted R² from 0.9357 to 0.9529, which suggests a better fit; the reported RMSE, however, rose from 0.0070 to 0.0432.

Residual Analysis

To check model assumptions, we plotted the residuals to see whether they follow a normal distribution and exhibit homoscedasticity.

Key Takeaways from Residual Analysis:

The residuals approximately follow a normal distribution.
No clear heteroscedasticity, indicating stable variance.
Together, these checks suggest that the model meets the linear regression assumptions.
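Alongside the plots, a few numeric summaries can back up the same conclusions. The sketch below uses synthetic data and simple proxies chosen for illustration: skewness and excess kurtosis near zero as a hint of normality, and a near-zero correlation between absolute residuals and fitted values as a crude sign of constant variance. These are heuristics, not formal tests, and not necessarily what the notebook does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with normally distributed, constant-variance noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 10.0 + rng.normal(scale=1.0, size=400)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Normality proxies: both are close to 0 for normally distributed residuals.
z = (residuals - residuals.mean()) / residuals.std()
skewness = float(np.mean(z**3))
excess_kurtosis = float(np.mean(z**4) - 3.0)

# Crude homoscedasticity check: if the residual spread grows or shrinks with
# the fitted value, |residuals| will correlate with the fitted values.
spread_corr = float(np.corrcoef(np.abs(residuals), fitted)[0, 1])
```

A histogram or Q-Q plot of `residuals` and a scatter plot of `residuals` against `fitted` are the usual visual counterparts of these numbers.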

Key Takeaways

Feature Engineering Matters: Proper encoding and selection significantly improved model performance.
Controlling Multicollinearity is Crucial: Removing high-VIF features led to more stable coefficients.
Outlier Handling is Important: Post-cleaning, the model showed better predictive accuracy.
Business Impact: This model helps automobile manufacturers understand which factors most influence fuel efficiency.

How to Run Your Code

Open the Google Colab Notebook

Click the link below to open the project in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aparna-analyst/Machine-Learning/blob/main/MLR_analysis.ipynb)

Upload the Dataset

Make sure to upload your dataset (cars.csv) to Colab by:
- Clicking Files on the left sidebar.
- Clicking Upload and selecting your CSV file.

Install Dependencies in Colab

If needed, install dependencies directly in Colab by running:

!pip install pandas numpy scikit-learn matplotlib seaborn

Run the Code

Execute all the cells in the notebook step by step by pressing Shift + Enter.

Save Results

Download your output files by right-clicking on them in the Files section and choosing Download.

Next Steps

Try Ridge/Lasso Regression to add regularization.
Perform Cross-Validation for better generalization.
Test on new unseen vehicle data to validate real-world performance.
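One possible starting point for the first two items is sketched below: plain linear regression, Ridge, and Lasso compared with 5-fold cross-validation. The data is synthetic, and the `alpha` values and fold count are arbitrary placeholders that would need tuning on the real dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected features and the highway mpg target.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, 0.0, -1.0, 0.0, 2.0]) + 25.0 + rng.normal(scale=0.5, size=300)

# Placeholder hyperparameters; tune alpha on the real data.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

cv_r2 = {}
for name, estimator in candidates.items():
    # Scaling inside the pipeline is refitted on each training fold,
    # so no information from the validation fold leaks into the scaler.
    pipeline = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
    cv_r2[name] = float(scores.mean())
```

`cv_r2` then maps each model name to its mean cross-validated R², which is a more stable basis for comparison than a single train-test split.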

References

Scikit-Learn Documentation: https://scikit-learn.org/
Pandas Data Manipulation: https://pandas.pydata.org/

Contributors

Aparna S - LinkedIn
