Algorithm | Linear Regression Example as House Price Prediction System
The linear regression algorithm predicts a continuous-valued output from a labeled training set, i.e. it is a supervised learning algorithm.
It is a well-known algorithm in both machine learning and statistics.
If you are new to machine learning, do read the introduction to machine learning for beginners.
For further explanation, let us consider a Linear Regression Example:
We have a training set for a house price prediction system. We use the features of a house, e.g. area, number of bedrooms, etc., as input, and its price is our output.
How does the linear regression algorithm function?
(1) We have a labeled training set of houses where the input is a matrix with ‘m’ rows and ‘n’ columns. ‘m’ corresponds to the number of houses in the training set, each having ‘n’ features. The output of this training set is a vector of size ‘m’.
We will use ‘i’ as the index denoting the row number and ‘j’ as the index denoting the column number. ‘i’ denotes the ith training example and ‘j’ denotes the jth feature.
(2) The linear regression algorithm is fed this training set and outputs a function ‘h’, where ‘h’ stands for hypothesis. When ‘h’ is given a new input, it tries to predict the corresponding output. Thus ‘h’ is a function that maps X to Y.
Y = h(X)
h(X) = a1x1 + a2x2 + … + aNxN
x1, x2, x3, … xN are the features, e.g. x1 = size of house, x2 = number of bedrooms, x3 = number of bathrooms, etc.
a1, a2, a3, … aN are the coefficients. Normally ‘theta’ is used for the coefficients; just for my convenience I have used ‘a’ instead of ‘theta’.
Y is the price of the house.
(3) Now we need to find the values of the coefficients, that is, a1, a2, a3, … aN, so that we can use those values to predict the output for a given input. To find the values of the coefficients we use the gradient descent algorithm.
(4) Once we have the values of all the coefficients, we can predict the output of the linear regression example for unseen inputs.
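As a rough illustration of step (4), the hypothesis can be sketched in a few lines of Python. The function name predict, the coefficient values and the example house are all made up for this sketch; real coefficients would come from training, not from hand-picked numbers.

```python
def predict(coefficients, features):
    """Hypothesis h(X) = a1*x1 + a2*x2 + ... + aN*xN for one house."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return sum(a * x for a, x in zip(coefficients, features))


# Hypothetical coefficients a1, a2, a3 (illustrative only, not learned from data).
coefficients = [100.0, 5000.0, 3000.0]
# One unseen house: x1 = size, x2 = number of bedrooms, x3 = number of bathrooms.
new_house = [2000.0, 3.0, 2.0]

predicted_price = predict(coefficients, new_house)
print(predicted_price)  # 100*2000 + 5000*3 + 3000*2 = 221000.0
```

Each feature is simply multiplied by its coefficient and the products are added up, so predicting for a new house is cheap once the coefficients are known.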
How does the gradient descent algorithm function in the above linear regression example?
(1) Gradient descent is an optimization algorithm in which we find the best values of the coefficients by minimizing a cost function (cost).
(2) The cost function (cost) takes the coefficients as its parameters. It is shaped like a bowl, where any point on the bowl can be imagined as having the coefficients as its coordinates. The minimum value of the cost function is at the bottom of the bowl, so that is where we find the best set of coefficients.
(3) We initialize the coefficients with random values. The cost is calculated by plugging those coefficient values into a function ‘f’.
Cost = f(coefficient)
(4) Then we calculate the derivative of the cost function. The derivative gives the slope at a point, which tells us the direction in which to move the coefficient value in order to get a lower cost in the next iteration.
Delta = derivative(cost)
(5) Once we know which direction is downhill, we keep updating the values of the coefficients until the derivative of the cost function approaches zero. A learning rate parameter (alpha) controls how much each coefficient changes on each update.
coefficient = coefficient - (alpha * delta)
(6) To speed up the calculation of the coefficients, scale the features, i.e. try to bring the values of all features into a single range, e.g. -1 < feature < 1.
So if our feature is the size of a house, we scale it by dividing each value by 5000 (the range of house sizes). Therefore, if the original size of a house is 2000, we use 2000/5000 = 0.4 as the value of the new feature.
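The six steps above can be put together in a small Python sketch. Several things here are assumptions and not taken from the article: the cost is taken to be the halved mean squared error (the article does not name a specific cost function), the coefficients start at zero instead of random values so that runs are repeatable, the training data is made up, and the bedroom count is divided by 5 as a guessed range. Only the 5000 used for house size comes from the text.

```python
def scale(row, scales):
    """Divide each feature by its (assumed) range so values fall in a small interval."""
    return [value / s for value, s in zip(row, scales)]


def predict(coefficients, features):
    """Hypothesis h(X) = a1*x1 + ... + aN*xN for one house."""
    return sum(a * x for a, x in zip(coefficients, features))


def cost(coefficients, X, y):
    """Assumed cost: squared error averaged over the m houses, then halved."""
    m = len(X)
    return sum((predict(coefficients, row) - target) ** 2
               for row, target in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha=0.5, iterations=5000):
    m, n = len(X), len(X[0])
    # Zeros rather than random values, only to keep this sketch repeatable.
    coefficients = [0.0] * n
    for _ in range(iterations):
        errors = [predict(coefficients, row) - target for row, target in zip(X, y)]
        # delta_j: derivative of the assumed cost with respect to coefficient j.
        deltas = [sum(errors[i] * X[i][j] for i in range(m)) / m for j in range(n)]
        # coefficient = coefficient - (alpha * delta), applied to every coefficient.
        coefficients = [a - alpha * d for a, d in zip(coefficients, deltas)]
    return coefficients


# Made-up training set: [size, bedrooms] per house, price in thousands.
houses = [[1000.0, 3.0], [2000.0, 2.0], [3000.0, 4.0], [4000.0, 3.0]]
prices = [160.0, 240.0, 380.0, 460.0]
scales = [5000.0, 5.0]  # 5000 follows the text; 5 for bedrooms is an assumption

X = [scale(row, scales) for row in houses]
learned = gradient_descent(X, prices)

print("cost at start:", cost([0.0, 0.0], X, prices))
print("cost at end:  ", cost(learned, X, prices))
print("prediction:   ", predict(learned, scale([2500.0, 3.0], scales)))
```

The toy prices were generated from an exact linear rule, so the final cost ends up very close to zero; with real house data the cost would settle at some positive value. A new house has to be scaled with the same factors as the training set before it is passed to predict.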
This is a simple linear regression example meant to give a clear understanding of the algorithm, which is very useful in machine learning.