Skip to content

ventikaa/missing_value_imputation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

missing_value_imputation

Missing value imputation refers to replacing missing data with substituted values in a dataset. This project explores python implementation of the various missing value imputation methods.

MEAN/MEDIAN/MODE IMPUTATION: These are statistical methods of imputation to replace missing values with the mean, median, or mode of the available values in a dataset. Mean Imputation: Replaces missing values with the mean (average) of the available values. This method is suitable for numerical data that does not have outliers, as outliers can significantly affect the mean. Median Imputation: Replaces missing values with the median of the available values. It is more robust than mean imputation, especially for data with outliers or a non-normal distribution. Mode Imputation: Replaces missing values with the mode (the most frequently occurring value). This method is used for categorical data.

PREDICTIVE IMPUTATION Predictive imputation involves using statistical models to predict and fill in missing values based on the relationships observed in the rest of the data. Some methods include: Regression Imputation: Uses a regression model to predict missing values based on other, related variables in the data. K-Nearest Neighbors (KNN) Imputation: Identifies ‘k’ samples in the dataset that are similar to the observation with missing data and imputes values based on the average (or majority) of these ‘k’ neighbours.

LAST OBS CARRIED FORWARD (LOCF) AND NEXT OBS CARRIED BACKWARD (NCOB) MODEL These are imputation methods typically used in time series data or longitudinal studies where the ordering of observations is meaningful. Last Observation Carried Forward (LOCF): Replaces a missing value with the last observed value prior to the missing one. It is based on the assumption that the best guess for a missing value is the one that was most recently observed. Next Observation Carried Backward (NOCB): It is the reverse of LOCF. It replaces a missing value with the next observed value after the missing one.

About

Missing value imputation refers to replacing missing data with substituted values in a dataset. This project explores python implementation of the various missing value imputation methods

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages