Can we predict the Tenure duration of our customers with the variables we have?
The goal of this analysis is to predict customer Tenure based on the information we have about them such as their patterns of use of the services we offer like Bandwidth, Device Protection, and Streaming Movies, as well as the type of contract they have made.
The method used in this data analysis is Multiple Linear Regression. Multiple Linear Regression attempts to model the relationship between two or more predictor variables with the target variable by fitting a linear equation to the observed data.
The Assumptions of the Multiple Regression Model are:
a) There is a linear relationship between the dependent and the independent variables
b) The independent variables are not highly correlated to each other (multicollinearity)
c) The residuals are normally distributed
d) The observations are independent of each other
The tool for analysis was Python. Python and its packages are very efficient for visualizations, quick statistical summaries, feature selection methods, and linear regression models. Regression analysis was an appropriate technique to answer my research question since I had a bimodal continuous dependent variable (Tenure) and multiple numeric and categorical independent variables with various amounts values (such as Bandwidth, StreamingTV, InternetServices, etc.). Also, not only did I want to find out about the correlation and the magnitude of the relationship between my predictor and target variables, but also I wanted to utilize this model for future predictions.
The Heatmap helps to quickly visualize the strength relationship of variables across two axes.
Q-Q Plots (Quantile Quantile Plots) are visual representations of a sample distribution against a theoretical distribution. Q-Q plots help us determine if our dataset is following a particular type of probability distribution, such as normal, or exponantial.
R-squared
f-statistic
Residual standard error(RSE)

