Types of Regression Models in Machine Learning

Regression analysis is among the most widely used supervised machine learning methods: it learns from labelled data in order to predict continuous values. Because there are many types of regression algorithms, it is crucial to select the one appropriate for the data and for the problem the model is meant to solve.

Regression analysis is a predictive statistical method that assesses the relationship between a dependent variable (the target) and one or more independent variables (the predictors). Regression has numerous practical uses: it is central to any machine learning problem whose output is a continuous quantity. Examples include, but are not restricted to, the following:

1. Weather prediction
2. Age prediction
3. Market movements
4. Estimation of house prices
5. Estimation of stock prices
6. Sales and marketing prediction

Types of Regression Models

[Figure: types of regression models]

1. Linear Regression

Linear regression, which models a linear relationship between an independent variable and the target variable, is among the most basic kinds of machine learning regression. It should be used when the variables are linearly related to one another.

For example, by performing a linear regression analysis on the sales reports with monthly sales, a business might predict sales in the months to come if its sales have been growing gradually every month for the last several years.

Linear regression can be represented by the equation below:

Y = bX + c

Where,
Y represents the target variable,
X represents the predictor variable,
and b and c are the linear coefficients (the slope and the intercept, respectively).

[Figure: linear regression of house price vs. house size]

In this illustration, the market value of a home increases as the house size increases. Such a model can forecast house prices when new data is supplied in the future.
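As a minimal sketch of what this could look like in practice, assuming scikit-learn and a small made-up house-price dataset:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size (sq ft) vs. market value
X = np.array([[800], [1000], [1200], [1500], [1800]])
y = np.array([150000, 180000, 210000, 260000, 300000])

model = LinearRegression().fit(X, y)

# b (slope) and c (intercept) from the equation Y = bX + c
print("slope b:", model.coef_[0])
print("intercept c:", model.intercept_)

# Forecast the price of a 2000 sq ft house
print("predicted price:", model.predict([[2000]])[0])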

2. Support Vector Regression

The Support Vector Machine is a supervised learning approach that can be applied to both classification and regression problems; when used for regression-related problems, it is known as Support Vector Regression (SVR).

The goal of SVR is to find a hyperplane with a maximal margin that accommodates the greatest number of data points. In other words, the best-fit line is the one that keeps as many data points as possible within the boundary lines. The image below is a representation of SVR.

[Figure: support vector regression]
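A minimal SVR sketch, assuming scikit-learn and synthetic one-dimensional data; the epsilon parameter (a chosen value here, not a universal default) controls the width of the margin tube around the hyperplane:

import numpy as np
from sklearn.svm import SVR

# Hypothetical noisy one-dimensional data
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(50)

# epsilon sets the width of the margin ("tube") around the hyperplane;
# points inside the tube contribute no loss
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)

print("prediction at x = 5:", model.predict([[5.0]])[0])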

3. Lasso Regression

The Lasso Regression algorithm carries out both regularization and feature selection. It reduces the complexity of the model by penalizing the absolute value of the regression coefficients, which drives the coefficient values toward zero.

Lasso regression performs L1 regularization: a penalty term equal to the absolute value of the coefficient magnitudes is added to the loss. Lasso regression can be quite helpful when several of the predictor variables are irrelevant.

Because it performs feature selection, users can pick a subset of features from the dataset and build the model from them. Lasso regression prevents overfitting by keeping only the necessary features and shrinking the coefficients of the rest to zero.
Lasso regression is given by:

Min(sum of the squared residuals + 𝛂 * |slope|)

Let’s look at an example. Suppose there’s a dataset with employee salaries in the first column and experience (in years) in the other. We need to create a model that, given experience as input, can predict an employee’s salary.

[Figure: lasso regression]

Lasso regression works by trading a small increase in bias for a reduction in variance, which changes the slope of the fitted line. As the value of alpha increases, the slope of the regression line decreases and becomes more horizontal, so the model grows less sensitive to variations in the independent variable (experience).
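A small sketch of this effect, assuming scikit-learn and made-up experience/salary data; the alpha values are illustrative only:

import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: experience (years) vs. salary
X = np.array([[1], [2], [3], [5], [7], [10]])
y = np.array([40000, 45000, 52000, 65000, 78000, 98000])

# alpha plays the role of 𝛂 in the objective above; larger values
# flatten the slope, making the model less sensitive to experience
for alpha in (1.0, 10000.0, 100000.0):
    model = Lasso(alpha=alpha).fit(X, y)
    print("alpha =", alpha, "-> slope =", round(model.coef_[0], 2))

With a large enough alpha, the slope is driven all the way to zero, which is exactly the feature-selection behaviour described above.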

4. Ridge Regression

In order to obtain superior long-term forecasts, ridge regression, one of the most reliable types of linear regression, introduces a small degree of bias.

Ridge regression is a specific method for analyzing multiple-regression data that suffer from multicollinearity. Although it is a basic regularization technique, it is not widely employed, owing to the complicated statistics involved in its development. If there is significant collinearity between the independent variables, an ordinary linear or even polynomial regression will fail; in this case, ridge regression can be employed to address the issue.

The advantage of using ridge regression is that its estimates, although deliberately biased, approximate the true population values closely enough to be relatively accurate.

Ridge regression and lasso regression are related, and a slight difference between the two can be seen in their equations. Ridge regression is given by:

Min(sum of the squared residuals + 𝛂 * slope²)

When the two models are compared over all of the data points, it can be observed that the ridge regression line approximates the data more closely than the plain linear regression line. Ridge regression therefore helps optimize the model fit.
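A brief sketch of ridge handling multicollinearity, assuming scikit-learn and synthetic data in which two predictors are nearly identical:

import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data with two highly collinear predictors
rng = np.random.RandomState(0)
x1 = rng.rand(30)
x2 = x1 + 0.01 * rng.randn(30)   # nearly a copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + 0.1 * rng.randn(30)

# alpha = 0 is ordinary least squares; a small positive alpha
# adds the slope² penalty and stabilises the coefficients
for alpha in (0.0, 1.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print("alpha =", alpha, "-> coefficients =", model.coef_)

With alpha = 0, the collinear predictors receive erratic coefficients; the small penalty shrinks them toward stable, similar values.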

5. Logistic Regression

Logistic regression is a supervised learning approach used to address classification problems, i.e., problems whose dependent variable is discrete or binary (such as 0 or 1). It is a predictive-analysis algorithm that relies on the principle of probability. Although it is a form of regression, logistic regression differs from the linear regression algorithm in how it is applied.

In logistic regression, the data are modelled using the sigmoid function (also called the logistic function), on which its cost function is built. The logistic function can be shown as follows:

f(x) = 1 / (1 + e^(-x))

Where,
f(x) represents the output (a value between 0 and 1),
x represents the input to the function,
and e is the base of the natural logarithm.

[Figure: logistic regression]
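A minimal sketch, assuming scikit-learn and a made-up pass/fail dataset, showing both the sigmoid itself and a fitted classifier:

import numpy as np
from sklearn.linear_model import LogisticRegression

# The sigmoid function f(x) = 1 / (1 + e^(-x))
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5, the midpoint of the 0-to-1 output range

# Hypothetical binary data: hours studied vs. pass (1) / fail (0)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
print("P(pass | 3.5 hours):", model.predict_proba([[3.5]])[0, 1])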

6. Polynomial Regression

Polynomial regression is a regression technique that uses a linear model to represent a non-linear dataset. The original features are converted into polynomial features of a chosen degree and are then modelled with a linear model; in other words, a polynomial curve is fitted to the data points.
Note that in polynomial regression the power of the independent variable is greater than 1. Polynomial regression is represented by the equation below:

Y = b + cX²

Where,
Y represents the dependent/target variable,
X represents the independent/predictor variable,
and b and c are the regression coefficients.
Polynomial regression differs from multiple linear regression in that a single variable is raised to varying degrees, rather than several variables each appearing with the same degree.
Note also that overfitting can occur when a higher-degree polynomial is fitted in pursuit of a smaller error. Plotting the relationships lets you assess how well the curve fits and ensures that it suits the situation.

7. Decision Tree Regression

A decision tree is a supervised learning approach that can be used to solve both kinds of problems, i.e., classification and regression. It can also handle both categorical and numerical data.
As its name implies, a decision tree operates on the concept of conditions, and it provides effective, powerful predictive analysis. Its main components are internal nodes, terminal (leaf) nodes, and branches: each internal node represents a test on a feature, each branch denotes the outcome of that test, and each leaf node denotes the final decision or prediction. Construction starts at the root node, which holds the entire dataset; the dataset is split into subsets (left and right child nodes), those child nodes are split in turn, and each split node becomes the parent of the nodes beneath it.
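As an illustration, here is a minimal sketch, assuming scikit-learn and a made-up house-price dataset:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: house size (sq ft) vs. price
X = np.array([[800], [1000], [1200], [1500], [1800], [2200]])
y = np.array([150000, 180000, 210000, 260000, 300000, 360000])

# max_depth limits how many times the dataset is split into child nodes
model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# The prediction is the mean target value of the leaf the input lands in
print("predicted price:", model.predict([[1600]])[0])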

Conclusion

In this article, we have studied the various machine learning regression models. Clearly, each regression model yields a more effective solution when it is applied under the conditions it suits. Therefore, to build a model and attain the best accuracy, one can draw on these different types of regression analysis in data science and machine learning projects.
