What is Regression Analysis?
Regression analysis is a way to find trends in data. For example, if you want to know whether there is a connection between how much you eat and how much you weigh, regression analysis can help you find it.
Regression analysis provides an equation for a graph so that you can make predictions about your data. This regression equation is called a regression model. The outcome variable (generally denoted by ‘y’) is called the response or dependent variable, and the predictor variable (generally denoted by ‘x’) is called the explanatory or independent variable.
For example, if someone has been putting on weight over the last few years, a regression model can predict how much he or she will weigh in five or ten years if the same trend continues. Here the number of years is the explanatory or independent variable, and weight is the response or dependent variable.
Regression Model and Machine Learning
Regression analysis is a way of modeling the relationship between a dependent variable and one or more independent variables in order to make predictions. It is one of the most common kinds of model in machine learning. It differs from classification because it estimates a numerical value, whereas classification models identify which category an observation belongs to. The main uses of regression analysis are forecasting, time series modeling, and exploring possible cause-and-effect relationships between variables.
Types of Regression Models
There are numerous types of regression models that you can use. The choice often depends on the kind of data you have for the dependent variable and the type of model that provides the best fit. The most commonly used regression models are discussed below:
1. Linear Regression
It is the simplest regression model used for predictive analysis. It comprises a predictor variable and a dependent variable related to each other in a linear fashion.
The general form of a linear regression equation is y = ax + b, where y is the dependent variable, x is the independent variable, a is the slope (or gradient) of the line, and b is the y-intercept (the point where the line crosses the y-axis).
You should use linear regression when your variables are linearly related, for example when you are forecasting the effect of increased advertising spend on sales.
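As a minimal sketch of how such a line can be fitted, the snippet below uses ordinary least squares, reading the equation as y = ax + b with a as the slope and b as the y-intercept. The function name fit_line and the advertising figures are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: advertising spend and resulting sales (made-up units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

a, b = fit_line(ad_spend, sales)
predicted = a * 6.0 + b  # forecast sales for a spend of 6.0
print(a, b, predicted)
```

Because the made-up sales figures lie exactly on a straight line, the fitted slope and intercept reproduce that line; real data would scatter around it.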
2. Logistic Regression
When your dependent variable is binary, a logistic regression model is used. A binary variable is a discrete variable that can take one of two values (either 0 or 1, true or false, black or white, spam or not spam, and so on).
Logistic regression uses a sigmoid curve to show the relationship between the dependent variable and the independent variable(s). It works best with large data sets in which the two values of the dependent variable occur about equally often. The general form of the logistic regression equation is P = 1/(1 + e^(-(a + bx))).
The logistic model is used to model the probability of a certain class or event, such as pass/fail, win/lose, alive/dead, or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each class would be assigned a probability between 0 and 1, with the probabilities summing to one.
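The sigmoid relationship P = 1/(1 + e^(-(a + bx))) can be sketched in a few lines. The example below estimates a and b by plain gradient descent on the log-loss, which is one simple fitting method among several and is an assumption here, as are the function names, the learning rate, the step count, and the study-hours data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate a and b in P = sigmoid(a + b*x) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error (predicted probability minus label) per observation.
        errors = [sigmoid(a + b * x) - y for x, y in zip(xs, ys)]
        grad_a = sum(errors) / n
        grad_b = sum(e * x for e, x in zip(errors, xs)) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(hours, passed)
p_low = sigmoid(a + b * 1)   # estimated pass probability after 1 hour
p_high = sigmoid(a + b * 8)  # estimated pass probability after 8 hours
print(p_low, p_high)
```

The fitted curve assigns a low pass probability to few study hours and a high one to many, which is the S-shaped behavior the sigmoid is meant to capture.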
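One common way to obtain class probabilities that each lie between 0 and 1 and sum to one is the softmax function. The text does not name it, so treat it as an assumption; the scores for the three classes below are made up for illustration.

```python
import math


def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to one."""
    # Subtracting the largest score keeps exp() from overflowing
    # and does not change the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the classes cat, dog, and lion.
classes = ["cat", "dog", "lion"]
probs = softmax([2.0, 1.0, 0.1])

for name, p in zip(classes, probs):
    print(name, round(p, 3))
```

The class with the highest raw score receives the highest probability, and the remaining probability is shared among the other classes.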
3. Ridge Regression
Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least-squares estimates are unbiased, but their variances are large, so they may be far from the true values. Ridge regression adds a degree of bias to the regression estimates, which reduces the standard errors.
In simple words, a regression model sometimes becomes too complex and starts to overfit, so it is worthwhile to reduce the variance of the model. Ridge regression does this by shrinking the size of the coefficients.
Ridge regression acts as a remedial measure that eases collinearity among the predictors of a model. When the model includes correlated features, shrinking their coefficients makes the final model more stable.
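The shrinking effect can be sketched with the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy, written out by hand for two features and no intercept. The function name, the penalty value λ = 1, and the two nearly collinear features are assumptions made for illustration.

```python
def ridge_two_features(x1, x2, y, lam):
    """Solve (X^T X + lam*I) w = X^T y for two features without an intercept."""
    s11 = sum(a * a for a in x1) + lam  # penalty is added on the diagonal
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    w1 = (t1 * s22 - t2 * s12) / det
    w2 = (t2 * s11 - t1 * s12) / det
    return w1, w2


# Hypothetical centered data; x2 is almost a copy of x1 (multicollinearity).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.2, -1.8, 0.1, 2.1, 3.9]

ols = ridge_two_features(x1, x2, y, lam=0.0)    # lam = 0 is ordinary least squares
ridge = ridge_two_features(x1, x2, y, lam=1.0)  # lam > 0 shrinks the coefficients

ols_size = ols[0] ** 2 + ols[1] ** 2
ridge_size = ridge[0] ** 2 + ridge[1] ** 2
print(ols, ridge)
```

With a positive penalty, the squared size of the coefficient vector is smaller than in the unpenalized fit. A larger λ shrinks the coefficients further at the cost of more bias.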
4. Lasso Regression
Like ridge regression, lasso (Least Absolute Shrinkage and Selection Operator) regression is another regularization technique that reduces the model’s complexity. It does so by penalizing the absolute size of the regression coefficients. This can shrink some coefficients all the way to exactly zero, which does not happen with ridge regression.
The advantage of this model is that it performs feature selection: by keeping only the required features and setting the coefficients of the rest to zero, lasso regression builds the model from a subset of the features in the dataset, which helps to avoid overfitting.
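The way lasso sets coefficients to exactly zero can be sketched with coordinate descent and a soft-thresholding step. This is one common solver and not necessarily the one any given library uses. The objective assumed here is (1/(2n)) · Σ(y − Xw)² + λ · Σ|w|, and the function names, the penalty value, and the data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Minimize (1/(2n)) * sum((y - Xw)^2) + lam * sum(|w|), no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared length of feature j
            for row, target in zip(X, y):
                partial = sum(row[k] * w[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Hypothetical data: y depends on the first feature only; the second is noise.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
y = [-6.1, -2.9, 0.2, 3.1, 5.9]

w = lasso_coordinate_descent(X, y, lam=0.5)
print(w)
```

The coefficient of the irrelevant second feature ends up at exactly zero, while the coefficient of the first feature is shrunk but stays clearly positive. This is the feature-selection behavior described above.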
5. Polynomial Regression
Polynomial regression models a non-linear dataset using a linear model. It is the equivalent of making a square peg fit into a round hole. It works in a similar way to multiple linear regression (which is just linear regression with multiple independent variables), but the fitted relationship between x and y is a curve rather than a straight line. It is used when the data points follow a non-linear pattern.
The model transforms the data points into polynomial features of a given degree and fits them with a linear model. The result is a best-fit polynomial curve rather than the straight line seen in linear regression.
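The two steps, expanding each x into polynomial features and then fitting an ordinary linear model on them, can be sketched as follows. The helper names, the choice of the normal equations with Gaussian elimination as the solver, and the quadratic data are assumptions made for illustration.

```python
def polynomial_features(x, degree):
    """Expand a single value x into [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]


def solve_linear_system(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(M[i][c] * w[c] for c in range(i + 1, n))
        w[i] = (M[i][n] - known) / M[i][i]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares fit: linear regression on the polynomial features."""
    rows = [polynomial_features(x, degree) for x in xs]
    m = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 2x^2 - 3x + 1 (no noise).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2 * x * x - 3 * x + 1 for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)  # [constant, x term, x^2 term]
print(coeffs)
```

The model is still linear in its coefficients, which is why a linear solver suffices even though the fitted curve is not a straight line.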