Report Logistic Regression

What is Logistic Regression:

Logistic regression is a statistical method used to analyze a data set in which one or more independent variables determine the outcome. The outcome is measured as a dichotomous variable (only two possible outcomes). It is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is binary, meaning it only contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
Understanding Logistic Regression:

In logistic regression, we are essentially trying to find the weights that transform the input data (independent variables) into predictions (dependent variables). Using a logistic function (also called a sigmoid function), we ensure that these predictions lie in the range 0 to 1. This function maps any real number to a value between 0 and 1; in the case of logistic regression, it converts the output of linear regression into probabilities.

The logistic function has an S-shaped curve, defined by the following formula:

    Event probability = 1 / (1 + e^(-y))

where y is a linear combination of the input features, weighted by the model coefficients, "e" is the base of the natural logarithm, and y is the equation of the straight line (y = mx + b in simple linear regression).
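The formula above can be sketched directly in Python. The slope and intercept values below are arbitrary, chosen only to illustrate how the linear output y is squashed into a probability:

```python
import math

def sigmoid(y):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

# y = m*x + b is the linear part; m and b here are illustrative, not fitted
m, b = 2.0, -1.0
for x in (-2.0, 0.0, 2.0):
    y = m * x + b
    print(f"x={x:+.1f}  y={y:+.1f}  probability={sigmoid(y):.3f}")
```

Note how large negative y values give probabilities near 0 and large positive y values give probabilities near 1, with y = 0 mapping exactly to 0.5.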

- It is used to predict a categorical dependent variable using a specific set of independent variables.
- Logistic regression predicts the output of a categorical dependent variable, so the result must be a categorical or discrete value.
- It can be yes or no, 0 or 1, true or false, etc., but instead of giving exact values of 0 and 1, it returns a probability value between 0 and 1.
- Apart from its usage, logistic regression is very similar to linear regression: linear regression is used to solve regression problems, while logistic regression is used to solve classification problems.
- In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function that predicts the two extreme values (0 or 1).
- The logistic function curve gives the probability of whether a cell becomes cancerous, whether a mouse is obese (based on body weight), etc.
- Logistic regression is an important machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete data sets.
- Logistic regression can be used to classify observations based on different types of data, and the most effective variables for classification can be easily determined.
Logistic Function (Sigmoid Function):

- The sigmoid function is a mathematical function that maps predicted values to probabilities.
- It maps any real value to a value in the range 0 to 1.
- The values for logistic regression must lie between 0 and 1 and cannot exceed this limit, which gives the curve its "S" shape.
- The S-shaped curve is called the sigmoid function or logistic function.
- In logistic regression, we use the concept of a threshold to decide between the classes 0 and 1.
- For example, values above the threshold are mapped to 1 and values below the threshold are mapped to 0.
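The thresholding step described above can be sketched in a few lines of Python. The probability values and the 0.5 threshold are illustrative; the threshold is a modelling choice, not a fixed rule:

```python
# Turn predicted probabilities into class labels using a threshold.
# 0.5 is a common default; other values trade off false positives vs negatives.
def classify(p, threshold=0.5):
    return 1 if p >= threshold else 0

probs = [0.12, 0.48, 0.51, 0.97]   # example model outputs
labels = [classify(p) for p in probs]
print(labels)  # [0, 0, 1, 1]
```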
Differences b/w Linear and Logistic Regression:

- Linear regression solves regression problems (continuous output); logistic regression solves classification problems (categorical output).
- Linear regression fits a straight line; logistic regression fits an S-shaped sigmoid curve.
- Linear regression output is unbounded; logistic regression output is a probability between 0 and 1.
Terminologies involved in Logistic Regression:

- Independent variables: the input features or predictor variables used to predict the dependent variable.
- Dependent variable: the target variable we want to predict in the logistic regression model.
- Logistic function: a formula that expresses how the independent and dependent variables relate to each other. It converts the input variables into probability values between 0 and 1, representing the probability that the dependent variable is 1 or 0.
- Odds: the ratio between the probability that an event happens and the probability that it does not. Odds differ from probability, which is the ratio of what happens to what could happen.
- Log odds: also known as the logit function, the natural logarithm of the odds. In logistic regression, the log odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.
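A tiny Python illustration of odds and log odds as defined above (the probability values are arbitrary examples):

```python
import math

def odds(p):
    """Odds: probability of the event divided by probability of no event."""
    return p / (1.0 - p)

def log_odds(p):
    """Logit: natural log of the odds; this is what logistic regression models linearly."""
    return math.log(odds(p))

print(odds(0.8))       # ~4.0: the event is about 4 times as likely to happen as not
print(log_odds(0.5))   # 0.0: even odds give a log odds of zero
print(log_odds(0.8))   # positive, since the event is more likely than not
```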
How does Logistic Regression work?

Logistic regression converts the continuous-valued output of a linear regression function into a categorical-valued output using the sigmoid function, which maps each real-valued combination of the independent input variables to a value between 0 and 1. This function is called the logistic function.

Let X = (x1, x2, ..., xn) be the independent input features, and let the dependent variable Y take only the binary values 0 or 1. Then apply the multi-linear function to the input variables X:

    z = w1*x1 + w2*x2 + ... + wn*xn + b

Here xi is the ith input feature, W = (w1, w2, w3, ..., wn) are the weights or coefficients, and b is the bias term, also known as the intercept. This can be written compactly as the dot product of the weights and the inputs, plus the bias:

    z = W · X + b

Everything discussed up to this point is linear regression.
Sigmoid Function

Now we apply the sigmoid function, with z as input, to obtain a probability between 0 and 1, i.e. the predicted y:

    σ(z) = 1 / (1 + e^(-z))

The sigmoid function converts the continuous value z into a probability between 0 and 1. The probability of belonging to each class can then be measured as:

    P(y = 1 | X) = σ(z),    P(y = 0 | X) = 1 - σ(z)

Logistic Regression Equation

Odds are the ratio of something happening to something not happening:

    odds = p(X) / (1 - p(X))

Odds differ from probability, which is the ratio of what happens to what could happen. Applying the natural log to the odds gives the log odds, which logistic regression models linearly:

    log( p(X) / (1 - p(X)) ) = z = W · X + b

Then the final regression equation is:

    p(X) = 1 / (1 + e^(-(W · X + b)))

Likelihood function for logistic Regression

The predicted probability is p(X; b, w) = p(x) for y = 1, and for y = 0 the predicted probability is 1 - p(X; b, w) = 1 - p(x). The likelihood of the observed data is therefore:

    L(b, w) = Π_i p(xi)^yi * (1 - p(xi))^(1 - yi)

Taking the natural log on both sides gives the log-likelihood:

    log L(b, w) = Σ_i [ yi * log p(xi) + (1 - yi) * log(1 - p(xi)) ]

The gradient of the log-likelihood function with respect to each weight wj is:

    ∂ log L / ∂ wj = Σ_i (yi - p(xi)) * xij
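The gradient expression above can be used to fit a model by gradient ascent on the log-likelihood. Below is a minimal NumPy sketch on synthetic toy data (the data-generating process, learning rate, and iteration count are all illustrative assumptions, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): one feature; the label depends on its sign plus noise
X = rng.normal(size=(200, 1))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient ascent on the log-likelihood:
#   dL/dw = X^T (y - p),  dL/db = sum(y - p)
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w += lr * X.T @ (y - p) / len(y)
    b += lr * np.sum(y - p) / len(y)

preds = (sigmoid(X @ w + b) >= 0.5).astype(float)
print("training accuracy:", np.mean(preds == y))
```

Since the labels here are driven by the sign of the feature, the fitted weight comes out positive and the model recovers most of the labels.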

The assumptions of logistic regression are as follows:

- Independent observations: each observation is independent of the others, meaning there is no correlation between observations.
- Binary dependent variable: the dependent variable must be binary or dichotomous, meaning it can only take on two values. (For more than two categories, the SoftMax function is used.)
- Linear relationship between the independent variables and the log odds: the relationship between the independent variables and the log odds of the dependent variable should be linear.
- No outliers: the dataset should not contain extreme outliers.
- Large sample size: the sample size should be large enough.
Type of logistic Regression:

On the basis of the categories, logistic regression can be classified into three types:

1. Binomial: in binomial logistic regression, the dependent variable can only be of two possible types, such as 0 or 1, pass or fail, etc.

2. Multinomial: in multinomial logistic regression, there can be three or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".

3. Ordinal: in ordinal logistic regression, there can be three or more possible ordered types of the dependent variable, such as "low", "medium", or "high".

Code implementation for logistic Regression

Binomial Logistic regression:

The target variable can only have two possible types, "0" or "1", which can represent "win" vs. "lose", "pass" vs. "fail", "dead" vs. "alive", etc. We use the sigmoid function, which was discussed above. Import the necessary libraries according to the requirements of the model.

This Python code shows how to implement a logistic regression model for classification using a breast cancer dataset.
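The original code listing is not reproduced here, so below is a minimal sketch of the kind of implementation the text describes, assuming scikit-learn and its built-in breast cancer dataset:

```python
# Binomial logistic regression on scikit-learn's breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load features X and binary labels y (malignant vs. benign)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter is raised because the default (100) may not converge on this data
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```

`predict_proba` can be used instead of `predict` when the probability values themselves, rather than hard 0/1 labels, are needed.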

Multinomial Logistic Regression:

The target variable can have three or more unordered possible types (that is, the types have no quantitative meaning), such as "Disease A" vs. "Disease B" vs. "Disease C". In this case, we use the SoftMax function instead of the sigmoid function. The SoftMax function for class i among K classes is:

    softmax(z)_i = e^(z_i) / Σ_j e^(z_j)

Here, K represents the number of elements in the vector z, and i, j iterate over the elements of the vector. The probability of class i is then:

    P(y = i | X) = softmax(z)_i
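A minimal NumPy sketch of the SoftMax function described above (the class scores are arbitrary example values):

```python
import numpy as np

def softmax(z):
    """SoftMax over K classes: exp(z_i) / sum_j exp(z_j).
    Subtracting max(z) first is a standard numerical-stability trick;
    it does not change the result."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Three class scores, e.g. for "Disease A", "Disease B", "Disease C"
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)        # largest score gets the largest probability
print(probs.sum())  # sums to 1 (up to floating-point error)
```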

Ordinal Logistic Regression:

It deals with target variables that have ordered categories. For example, a test score can be categorized as "very poor", "poor", "good", or "very good". Here, each category can be given a score like 0, 1, 2, or 3.

What is an activation function:

In an artificial neural network, a node's activation function defines the output of that node or neuron for a specific input or set of inputs. That output is then used as input to the next node, and so on until the desired solution to the original problem is found. The activation function maps the resulting value to the desired range, e.g., between 0 and 1, or -1 and 1, etc. This depends on the choice of activation function; for example, using a logistic activation function maps all inputs to the real number range from 0 to 1.

Example of a binary classification problem:

In a binary classification problem, we have an input x, say an image, and we have to classify it as either containing the correct object or not. If it contains the correct object, we assign it a 1, else 0. So here, we have only two outputs: either the image contains a valid object or it does not. This is an example of a binary classification problem. We multiply each of the features with a weight (w1, w2, ..., wm) and sum them all together; the node output is then activation(weighted sum of inputs).

Types of activation Function:

Activation functions are of two basic types:

1. Linear Activation Function
   Equation: f(x) = x
   Range: (-infinity, infinity)

2. Non-linear Activation Function
   - It makes it easy for the model to generalize to a variety of data and to differentiate between the outputs.
   - It has been shown, both in simulation and in practice, that ReLUs result in much faster training for large networks.
   - Non-linear means that the output cannot be reproduced from a linear combination of the inputs.

The main terminologies needed to understand non-linear functions are:

- Derivative: the change along the y-axis with respect to the change along the x-axis; also known as the slope.
- Monotonic function: a function that is either entirely non-increasing or entirely non-decreasing.
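The two families of activation functions above can be sketched as plain Python functions; ReLU is included as a common concrete example of a non-linear activation (the sample inputs are arbitrary):

```python
import math

def linear(x):
    """Linear activation: f(x) = x, range (-inf, inf)."""
    return x

def relu(x):
    """ReLU, a widely used non-linear activation: max(0, x)."""
    return max(0.0, x)

def logistic(x):
    """Logistic activation: non-linear, maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  linear={linear(x):+.1f}  relu={relu(x):.1f}  logistic={logistic(x):.3f}")
```

Note that ReLU and the logistic function are both monotonic (non-decreasing), but only the logistic function bounds its output to a fixed range.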
In conclusion, logistic regression is a powerful statistical technique that allows us to model the probability of a binary event based on a set of input variables. It is widely used in machine learning and data analysis, and its interpretability makes it a valuable tool for understanding the relationship between input variables and output probabilities.
