Linear Regression
Linear regression is one of the most basic and widely used types of predictive analysis.
Content
- Definition
- Finding the best fit line
- Types of Linear Regression
- Assumptions of Linear Regression
- Evaluation Metrics
- Points to Remember
- Applications
- References
Definition
Linear regression is one of the simplest supervised machine learning algorithms. It finds the relationship between one or more independent variables (predictors), denoted X, and a dependent variable (target), denoted y.
y (the left-hand side) is also known as the dependent variable, response variable, or outcome variable.
X (the right-hand side) is also known as the independent variable, explanatory variable, or predictor variable.
In a scatter plot of y against X, there is rarely a single straight line that runs through all the data points. So the main aim is to fit a regression line that minimizes the error between the actual and predicted values.
Finding the best fit line
We find the best-fit line by minimizing the distance (i.e., the error) between the data points and the regression line. This distance can be measured in different ways, such as the sum of squared errors, the sum of absolute errors, or the root mean squared error.
Our main aim is to minimize the cost function by updating the values of the parameters θ. The values of θ that minimize the cost function give us the best-fit regression line for our dataset.
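As a concrete illustration, simple linear regression with a squared-error cost has a closed-form solution, so θ₀ and θ₁ can be computed directly. A minimal sketch with made-up toy data (the numbers are purely illustrative):

```python
import numpy as np

# Toy data: roughly y = 2x, with a little noise (illustrative values).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Closed-form least-squares estimates for y = theta0 + theta1 * x:
#   theta1 = cov(X, y) / var(X)
#   theta0 = mean(y) - theta1 * mean(X)
theta1 = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
theta0 = y.mean() - theta1 * X.mean()

y_pred = theta0 + theta1 * X  # predictions from the fitted line
```

The same θ values could also be reached iteratively (e.g., by gradient descent on the cost function); the closed form is just the shortest path for one predictor.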
Types of Linear Regression
Linear regression is generally divided into two types:
- Simple Linear Regression :- In simple linear regression we have only one explanatory variable X and a corresponding response variable y.
- Multiple Linear Regression :- In multiple linear regression we have two or more explanatory variables and a corresponding response variable y.
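To make the distinction concrete, the sketch below fits a multiple linear regression with two explanatory variables using ordinary least squares via `np.linalg.lstsq`. The data is synthetic and the true coefficients (1, 2, −3) are chosen purely for illustration:

```python
import numpy as np

# Synthetic data: y = 1 + 2*x1 - 3*x2 + small noise (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # two explanatory variables
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

# Prepend a column of ones so the intercept is estimated too.
A = np.column_stack([np.ones(len(X)), X])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
# theta ≈ [intercept, coefficient of x1, coefficient of x2]
```

Simple linear regression is just the special case where `X` has a single column.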
Assumptions of Linear Regression
- Normality :- For any fixed value of X, y is normally distributed.
- Linearity :- The relationship between X and y is linear.
- Independence :- Observations are independent of each other.
- Homoscedasticity :- The variance of the residuals is the same for any value of X.
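Some of these assumptions can be checked roughly from the residuals of a fit: with an intercept, least-squares residuals average to zero, and under homoscedasticity their spread should not change with X. A minimal sketch with synthetic data and illustrative thresholds:

```python
import numpy as np

# Synthetic data satisfying the assumptions: linear trend, constant noise.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 10.0, 200)
y = 3.0 + 0.5 * X + rng.normal(scale=1.0, size=X.size)

# Fit by closed-form least squares, then inspect the residuals.
theta1 = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
theta0 = y.mean() - theta1 * X.mean()
residuals = y - (theta0 + theta1 * X)

mean_resid = abs(residuals.mean())  # should be ~0 for an OLS fit

# Compare residual spread on the lower vs. upper half of X;
# a ratio far from 1 would suggest heteroscedasticity.
lo, hi = residuals[: X.size // 2], residuals[X.size // 2 :]
spread_ratio = hi.std() / lo.std()
```

In practice a residual-vs-X plot is the usual diagnostic; the numeric check above is just a quick stand-in.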
Evaluation Metrics in Linear Regression
Following are some evaluation metrics for linear regression:
- Mean Squared Error (MSE) :- MSE gives us the average squared difference between the predicted and actual values. It is convex and penalizes large errors heavily.
- Mean Absolute Error (MAE) :- MAE gives us the average absolute difference between the target and predicted values.
- Root Mean Squared Error (RMSE) :- RMSE is the square root of the average squared difference between the predicted and actual values.
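All three metrics are straightforward to compute by hand. A minimal sketch with illustrative values (not from any real dataset):

```python
import numpy as np

# Illustrative actual vs. predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # average squared difference
mae = np.mean(np.abs(y_true - y_pred))  # average absolute difference
rmse = np.sqrt(mse)                     # square root of the MSE
```

Note how the single largest error (1.0 on the last point) contributes 1.0 to the MSE sum but only 1.0 to the MAE sum before averaging, so MSE weights it four times as heavily as a 0.5 error, while MAE weights it only twice as heavily.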
Points to Remember
- Linear regression is used to solve regression problems.
- The response variable is continuous in nature.
- Linear regression is sensitive to outliers.
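The outlier sensitivity is easy to demonstrate: because squared error grows quadratically, a single extreme point can pull the fitted slope far from the trend of the rest of the data. A small sketch with synthetic data:

```python
import numpy as np

def fit_slope(X, y):
    # Closed-form least-squares slope for y = theta0 + theta1 * x.
    return np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)

X = np.arange(1.0, 11.0)  # 1, 2, ..., 10
y = 2.0 * X               # points lying exactly on a line of slope 2

slope_clean = fit_slope(X, y)

# Add one extreme outlier and refit: the slope shifts substantially.
X_out = np.append(X, 10.0)
y_out = np.append(y, 100.0)
slope_outlier = fit_slope(X_out, y_out)
```

This is why outliers are usually inspected (and sometimes removed or down-weighted, e.g. with robust regression) before trusting a least-squares fit.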
Applications of Linear Regression
Following are a few real-life applications of linear regression in different domains.
- Business :- advertising spending and revenue
- Medicine :- drug dosage and patients' blood pressure
- Agriculture :- effect of fertilizer and water on crop yields
References
- Wikipedia: Linear regression
- Towards Data Science blog