This simple program fits a multiple linear regression model, using gradient descent as the optimization method.
The code can be applied to many numeric tabular datasets; one example is provided in the results section. That dataset is available on Kaggle (https://www.kaggle.com/datasets/akshaydattatraykhare/data-for-admission-in-the-university).
The regression model is fitted on 70 percent of the records (the Training Set), and the remaining 30 percent (the Test Set) are used to evaluate the model.
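The following is a minimal pure-Python sketch of the technique described above. It is not the code in Code.py. It assumes a zero-initialized coefficient vector, a fixed learning rate `alpha`, a fixed number of iterations, and, for anyone starting from a single CSV file, a shuffled 70/30 split. The names `split_train_test`, `predict`, `cost`, and `gradient_descent` are hypothetical. Gradient descent minimizes the cost J(theta) = 1/(2m) * sum_i (h(x^(i)) - y^(i))^2 by updating every coefficient simultaneously with theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), where x_0 = 1 for the intercept.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict(theta, x):
    """Hypothesis h(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(X)
    return sum((predict(theta, x) - target) ** 2 for x, target in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent; returns the final theta and the cost after each iteration."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)  # theta[0] is the intercept
    history = []
    for _ in range(iterations):
        errors = [predict(theta, x) - target for x, target in zip(X, y)]
        # Partial derivative of J with respect to the intercept (x_0 = 1)
        grad = [sum(errors) / m]
        # Partial derivatives with respect to each feature coefficient
        for j in range(n):
            grad.append(sum(e * x[j] for e, x in zip(errors, X)) / m)
        # Simultaneous update of all coefficients
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y))
    return theta, history
```

A typical call would split the rows with `split_train_test(rows)`, fit `gradient_descent` on the training features and targets, and then evaluate `cost` on the test rows. Features on very different scales usually need standardizing or a smaller `alpha`; otherwise the updates can diverge.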
To use the program:
- Download the Python file ("Code.py").
- Pass the Training Set (CSV format) as the input of the "importData" function.
- Specify a file path for the total cost in the "reportTotalCost" function.
- Specify a file path for the final coefficients of the multiple linear regression in the "reportTheta" function.
- In the "findAnswer" function, specify a file path for the Test Set, whose results will be predicted, and a file path for reporting those results.
- Run the program.
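The last three configuration steps write the program's outputs to files. As a hedged sketch of what such reporting could look like, the block below writes the cost per iteration, the fitted coefficients, and actual-versus-predicted values for the Test Set to CSV files with Python's standard `csv` module. These are hypothetical helpers, not the actual `reportTotalCost`, `reportTheta`, and `findAnswer` functions in Code.py, whose signatures are not shown here. The sketch assumes `theta` stores the intercept first.

```python
import csv


def write_cost_history(path, history):
    """Write one row per iteration: the iteration number and the total cost."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["iteration", "cost"])
        for i, c in enumerate(history, start=1):
            writer.writerow([i, c])


def write_theta(path, theta):
    """Write the fitted coefficients, intercept (theta_0) first."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["coefficient", "value"])
        for j, t in enumerate(theta):
            writer.writerow([f"theta_{j}", t])


def write_predictions(path, theta, X_test, y_test):
    """Predict each test row, write actual vs. predicted values, and return the test MSE."""
    if not X_test:
        raise ValueError("the test set is empty")
    squared_errors = []
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["actual", "predicted"])
        for x, actual in zip(X_test, y_test):
            # Linear hypothesis: intercept plus weighted sum of the features
            predicted = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
            writer.writerow([actual, predicted])
            squared_errors.append((predicted - actual) ** 2)
    return sum(squared_errors) / len(squared_errors)
```

Returning the test mean squared error alongside the written file gives a single number for comparing runs with different learning rates or iteration counts.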
The dataset files must be in CSV format.
Many real-world datasets in CSV format are available on data science websites such as Kaggle (www.kaggle.com).
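Because the input must be CSV, a loader along the following lines could turn a file into feature rows and targets. This is a minimal sketch, not the author's `importData`; the name `load_csv` is hypothetical. It assumes a header row, only numeric values, and the target in the last column, so any identifier or non-numeric column would need to be removed first.

```python
import csv


def load_csv(path):
    """Read a numeric CSV with a header row; the last column is treated as the target."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        X, y = [], []
        for row in reader:
            if not row:
                continue  # skip blank lines
            if len(row) != len(header):
                raise ValueError(f"line {reader.line_num}: expected {len(header)} columns, got {len(row)}")
            try:
                values = [float(v) for v in row]
            except ValueError:
                raise ValueError(f"line {reader.line_num}: non-numeric value in {row}") from None
            X.append(values[:-1])  # feature columns
            y.append(values[-1])   # target column
    return header, X, y
```

Failing early on a malformed row, with its line number, is usually easier to debug than a numeric error deep inside gradient descent.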
The entire program was written by Ashkan Fouladi (fooladiashkang@gmail.com).