Logistic Regression in Machine Learning

Logistic regression is a supervised learning algorithm used for classification. It predicts the probability of a categorical dependent variable (the target variable) from a given set of independent variables (the predictor variables). For example, it can predict whether a plant is diseased based on independent variables such as the colour of its leaves, spots on the leaves, and leaf shape, with the outcome being either “Yes, the plant is diseased” or “No, the plant is not diseased”.

It is called logistic regression because, like linear regression, it fits a weighted combination of the predictors; the difference is that the fitted values are probabilities, which are then used to classify samples. For that reason it is treated as a classification algorithm.

Rather than a straight regression line, logistic regression fits an S-shaped logistic function whose output is bounded by two limiting values, 0 and 1. The curve gives the probability of an outcome, for instance whether a person is diabetic, based on their blood test report.
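To make the idea of fitting an S-shaped curve concrete, here is a minimal sketch that fits a one-variable logistic model with plain gradient descent on the log-loss. The fitting method, the learning rate, the number of epochs, and the toy data are all assumptions made for illustration; the article itself does not say how the curve is fitted, and libraries typically use more sophisticated solvers.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Hypothetical toy data (not real measurements): one predictor, binary label.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit(xs, ys)
for x in (1.0, 2.5, 4.0):
    print(f"x = {x}: predicted probability = {sigmoid(w * x + b):.3f}")
```

Small values of x should come out with a probability close to 0 and large values close to 1, with the S-shaped transition in between.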

What is a Logistic Function?

  • A logistic function, also called a sigmoid or S-shaped curve, is a mathematical function that maps predicted values to probabilities.
  • It maps any real value to a value between 0 and 1. The output of logistic regression must therefore fall within this range, and because the curve flattens as it approaches each limit without going beyond it, it takes the shape of an “S”.
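The mapping described above can be sketched in a few lines of Python. The function name `sigmoid` and the sample inputs are choices made for this illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Two algebraically equivalent branches avoid overflow in math.exp
    # when z is a large negative number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

for z in (-6, -2, 0, 2, 6):
    print(f"z = {z:>2}  ->  probability = {sigmoid(z):.4f}")
```

Large negative inputs land near 0, large positive inputs land near 1, and an input of 0 gives exactly 0.5, which is the midpoint of the “S”.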


  • Logistic regression uses a threshold value to turn a predicted probability into a class of 0 or 1. Probabilities above the threshold are assigned to class 1, and probabilities below it are assigned to class 0. For example, with a threshold of 0.5:

          output = 0 when predicted probability < 0.5, and

          output = 1 when predicted probability > 0.5
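The threshold rule can be written as a small helper. This is a sketch: the name `classify` is hypothetical, and sending a probability exactly equal to the threshold to class 0 is a choice made here, since the rule above leaves that case open.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 if the predicted probability is above the threshold, else 0."""
    return 1 if probability > threshold else 0

for p in (0.10, 0.49, 0.51, 0.90):
    print(f"probability = {p:.2f}  ->  class {classify(p)}")
```

The threshold does not have to be 0.5. Lowering it, for example `classify(0.3, threshold=0.2)`, makes the model assign class 1 more readily.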

Assumptions of Logistic Regression

1. The dependent variable must be categorical.
2. The independent variables should not be highly correlated with one another, that is, there should be little or no multicollinearity.
3. Only relevant variables should be included in the model.
4. A large sample size is recommended for logistic regression.
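One rough way to screen for the multicollinearity mentioned in point 2 is to compute the pairwise correlation between predictors. The sketch below uses the Pearson correlation coefficient on made-up columns; the variable names and values are hypothetical. A pairwise check only catches two-variable redundancy, so it is a first look rather than a complete diagnostic.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predictor columns for five plants.
leaf_spots = [1, 3, 4, 6, 8]
spot_area = [2, 6, 8, 12, 16]    # exactly 2 * leaf_spots, so it adds no new information
leaf_length = [5, 3, 6, 2, 4]

print("spots vs. area:  ", round(pearson(leaf_spots, spot_area), 3))
print("spots vs. length:", round(pearson(leaf_spots, leaf_length), 3))
```

A correlation close to +1 or -1, as with the first pair, suggests that one of the two predictors is redundant and could be dropped.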

Equation of Logistic Regression

The equation for logistic regression is given as

           log[Y/(1-Y)] = a₀ + a₁x₁ + a₂x₂ + … + aₙxₙ

where,

Y = predicted probability that the target variable belongs to class 1,

x₁, x₂, …, xₙ = predictor variables,

a₀ = bias term,

a₁, a₂, …, aₙ = weights for x₁, x₂, …, xₙ.
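Solving the equation for Y gives Y = 1 / (1 + e^(-z)), where z is the weighted sum a₀ + a₁x₁ + … + aₙxₙ. The sketch below computes Y for one sample. The function name `predict_probability` and the bias, weights, and inputs are hypothetical values picked for illustration, not fitted parameters.

```python
import math

def predict_probability(x, weights, bias):
    """Return Y = 1 / (1 + exp(-(a0 + a1*x1 + ... + an*xn)))."""
    z = bias + sum(a * xi for a, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters and one sample with two predictors.
bias = -1.0            # a0
weights = [0.8, -0.5]  # a1, a2
x = [2.0, 1.0]         # x1, x2

p = predict_probability(x, weights, bias)
print(f"predicted probability Y = {p:.4f}")
print(f"log[Y/(1-Y)]            = {math.log(p / (1 - p)):.4f}")
```

The second printed value recovers the weighted sum (here -1.0 + 0.8·2.0 - 0.5·1.0 = 0.1), which is exactly what the logistic regression equation states.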

  • The equation can be derived from the linear regression equation, which represents a straight line and is given as

                         Y = a₀ + a₁x₁ + a₂x₂ + … + aₙxₙ

  • In logistic regression Y can only lie between 0 and 1, so we divide Y by (1-Y) to obtain the odds, which range from 0 to infinity:

                         Y/(1-Y),  {0 for Y=0; infinity for Y=1}

  • A range from -infinity to +infinity is needed to match the right-hand side, so we take the logarithm of the odds, which gives the final equation:

                          log[Y/(1-Y)] = a₀ + a₁x₁ + a₂x₂ + … + aₙxₙ
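The two transformations in this derivation, and the fact that the logistic function undoes them, can be checked numerically. The helper names below are chosen for this sketch.

```python
import math

def odds(y):
    """Y / (1 - Y): ranges from 0 to infinity as Y goes from 0 to 1."""
    return y / (1.0 - y)

def log_odds(y):
    """log[Y / (1 - Y)]: ranges from -infinity to +infinity."""
    return math.log(odds(y))

def inverse_log_odds(z):
    """The logistic function, which maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

for y in (0.1, 0.5, 0.9):
    z = log_odds(y)
    print(f"Y = {y}: odds = {odds(y):.3f}, log-odds = {z:+.3f}, "
          f"back to Y = {inverse_log_odds(z):.3f}")
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, while probabilities below and above 0.5 give negative and positive log-odds respectively.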

Types of Logistic Regression

1. Binary or Binomial Logistic Regression: There are only two possible outcomes, as in the initial example of predicting whether a plant is diseased or not.
2. Multinomial Logistic Regression: There are three or more possible outcomes with no natural order, such as extending the previous example to determine whether the plant is diseased, healthy, or nutrient-deficient.
3. Ordinal Logistic Regression: There are three or more possible outcomes with a natural order, such as rating the severity of a plant’s disease infection as low, medium, or high.
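For the multinomial case, a common formulation (not spelled out in this article) computes one linear score per class and converts the scores to probabilities with the softmax function, which generalises the logistic function to more than two classes. The sketch below assumes that formulation; the class scores are hypothetical numbers, not the output of a trained model.

```python
import math

def softmax(scores):
    """Convert a list of real-valued class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum keeps math.exp from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["diseased", "healthy", "nutrient-deficient"]
scores = [2.0, 0.5, -1.0]  # hypothetical linear scores, one per class

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]

for name, p in zip(classes, probs):
    print(f"{name:>18}: {p:.3f}")
print("predicted class:", predicted)
```

The predicted class is simply the one with the highest probability, so no single threshold is needed as in the binary case.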

Use cases of Logistic Regression

Some applications of logistic regression are listed below.

  1. Predicting the risk of a heart attack.
  2. Predicting a customer’s tendency to end a subscription or buy a product.
  3. Predicting a process’s or product’s likelihood of failure.
  4. Predicting whether a transaction is fraudulent.
  5. Predicting the response of a target audience in marketing.

Conclusion

Logistic regression is a widely used machine learning technique because it can both output probabilities and classify new data, and it works with continuous as well as discrete predictors.

When using logistic regression, keep in mind which type of logistic regression suits the problem, what kinds of predictor variables are involved, and how much training data is available.
