Skip to content

A Python script to clean and preprocess house price data from Excel, removing invalid and missing values for better analysis.

Notifications You must be signed in to change notification settings

FaNa-AI/Data-exploration-and-preprocessing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 

Repository files navigation

🏠 House Price Data Cleaning Script

Clean and preprocess the house price dataset from an Excel file, preparing it for further analysis or modeling by removing invalid, missing, or non-numeric data.


πŸ“‚ Dataset

  • Input: HousePricePrediction.xlsx (Excel file, Sheet1)
  • Output: Cleaned_HousePricePrediction.xlsx (cleaned and saved Excel file)

🧹 Data Cleaning Steps

  1. Initial Exploration Display dataset info and summary statistics.

  2. Filter Invalid Records Remove rows with zero or negative values in critical columns:

    • LotArea
    • YearBuilt
    • YearRemodAdd
    • TotalBsmtSF
    • SalePrice
  3. Handle Missing Data Drop rows containing any NaN values.

  4. Keep Numeric Columns Only Remove all non-numeric columns to ensure clean data for modeling.

  5. Save Cleaned Dataset Export the processed data to Cleaned_HousePricePrediction.xlsx.


πŸš€ How to Use

  1. Place your input Excel file at the correct path (file_path).

  2. Install dependencies:

    pip install pandas openpyxl
  3. Run the cleaning script:

    python house_price_cleaning.py

πŸ“Š Output

  • A clean, preprocessed Excel file: Cleaned_HousePricePrediction.xlsx.
  • Console report showing the count of records remaining after cleaning.

🎯 Why Use This Script?

  • Automates tedious data cleaning steps.
  • Ensures dataset integrity by removing invalid or missing data.
  • Prepares data perfectly for machine learning models or further analysis.

About

A Python script to clean and preprocess house price data from Excel, removing invalid and missing values for better analysis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors