09_23ECE216_LogisticRegression
The logistic regression model for a single feature $x$:

$$ y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} $$
Important Points
• Logistic regression models the probability of the default class (e.g. the first class).
• Note: we do not explicitly model the second class; if the probability that a sample belongs to the first class is small, the sample by default has a correspondingly higher probability of belonging to the second class.
• For example, if we are modeling people's gender as male or female from their height, then the first class could be male and the logistic regression model could be written as the probability of male given a person's height, or more formally:

$$ P(\text{gender} = \text{male} \mid \text{height}) $$
• Written another way, we are modeling the probability that an input $X$ belongs to the default class ($y = 1$), which we can write formally as:

$$ P(X) = P(Y = 1 \mid X) $$
Summary of Logistic Regression
3. Model Representation
In logistic regression, we predict the probability that the output $y$ is 1 given the input features $\boldsymbol{x}$:

$$ P(y = 1 \mid \boldsymbol{x}) = \sigma(\boldsymbol{w}^T\boldsymbol{x} + b) = \frac{1}{1 + e^{-(\boldsymbol{w}^T\boldsymbol{x} + b)}} $$

where:
• $\boldsymbol{w}$ is the weight vector.
• $\boldsymbol{x}$ is the feature vector.
• $b$ is the bias term.
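A minimal sketch of this model in code (NumPy-based; the names sigmoid and predict_proba are our own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """P(y = 1 | x) = sigma(w^T x + b) for weight vector w, bias b, features x."""
    return sigmoid(np.dot(w, x) + b)
```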
Note that $P(y = 1 \mid \boldsymbol{x})$ can also be represented using the letter $p$.

Taking natural logarithms, we can also represent the function $P(y = 1 \mid \boldsymbol{x}) = \frac{1}{1 + e^{-(\boldsymbol{w}^T\boldsymbol{x} + b)}}$ as:

$$ \ln\frac{p}{1-p} = \boldsymbol{w}^T\boldsymbol{x} + b $$

where:
• $\boldsymbol{w}$ is the weight vector.
• $\boldsymbol{x}$ is the feature vector.
• $b$ is the bias term.
• $\frac{p}{1-p}$ is called the odds (the ratio of the probability of success, i.e. getting 1, to the probability of failure).
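The log-odds form follows directly from solving the sigmoid for $\boldsymbol{w}^T\boldsymbol{x} + b$:

$$ p = \frac{1}{1 + e^{-(\boldsymbol{w}^T\boldsymbol{x} + b)}} \;\Rightarrow\; \frac{1-p}{p} = e^{-(\boldsymbol{w}^T\boldsymbol{x} + b)} \;\Rightarrow\; \ln\frac{p}{1-p} = \boldsymbol{w}^T\boldsymbol{x} + b $$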
4. Cost Function

The cost function for logistic regression is the log-loss (binary cross-entropy):

$$ J(\boldsymbol{w}, b) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y_i \ln h_w(x_i) + (1 - y_i)\ln\big(1 - h_w(x_i)\big) \right] $$

where:
• $m$ = the number of training samples
• $h_w(x) = \sigma(\boldsymbol{w}^T\boldsymbol{x} + b) = \frac{1}{1 + e^{-(\boldsymbol{w}^T\boldsymbol{x} + b)}}$ = the hypothesis
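A minimal sketch of the log-loss in code (NumPy-based; the name log_loss is our own):

```python
import numpy as np

def log_loss(y, h):
    """Binary cross-entropy J(w, b), averaged over the m samples.

    y: array of true labels (0 or 1); h: array of predicted P(y = 1 | x).
    """
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```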
5. Gradient Descent
$$ \boldsymbol{w} = \boldsymbol{w} - \alpha\,\frac{\partial J(\boldsymbol{w}, b)}{\partial \boldsymbol{w}} $$

$$ b = b - \alpha\,\frac{\partial J(\boldsymbol{w}, b)}{\partial b} $$

$$ \frac{\partial J(\boldsymbol{w}, b)}{\partial \boldsymbol{w}} = \frac{1}{m}\sum_{i=1}^{m}\big(h_w(x_i) - y_i\big)\,x_i $$

$$ \frac{\partial J(\boldsymbol{w}, b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(h_w(x_i) - y_i\big) $$
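These gradients and updates translate directly to code (a sketch under the same notation; the names gradients and update are our own):

```python
import numpy as np

def gradients(w, b, X, y):
    """Gradients of the log-loss for feature matrix X (m, n) and labels y (m,)."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # hypothesis h_w(x_i) for every sample
    dw = (X.T @ (h - y)) / m                 # (1/m) * sum (h_i - y_i) x_i
    db = np.mean(h - y)                      # (1/m) * sum (h_i - y_i)
    return dw, db

def update(w, b, X, y, alpha):
    """One gradient-descent step with learning rate alpha."""
    dw, db = gradients(w, b, X, y)
    return w - alpha * dw, b - alpha * db
```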
7. Prediction
To make a prediction, we compute the probability:

$$ P(y = 1 \mid x) = \sigma(\boldsymbol{w}^T\boldsymbol{x} + b) $$

and assign class 1 when this probability exceeds 0.5, and class 0 otherwise (as in the worked example below).
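A matching prediction sketch (threshold 0.5, as used in the example below; the name predict is our own):

```python
import numpy as np

def predict(w, b, x, threshold=0.5):
    """Return class 1 when P(y = 1 | x) exceeds the threshold, else 0."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
    return int(p > threshold)
```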
Q. Let's consider the simple example with one feature (x) and a binary outcome (y) below. Use logistic regression to model this dataset and find the class to which each of the samples in the training set is assigned by the model (in-sample testing).

Feature (x)   Class (y)
2             0
3             0
5             1
7             1
Steps:
1. Initialize the weight and bias: $w = 0$, $b = 0$.
2. Initialize the learning rate: $\alpha = 0.1$.
3. The hypothesis: $h_w(x) = \sigma(wx + b) = \frac{1}{1 + e^{-(wx + b)}}$
(Note that the two classes are represented using the numbers 0 and 1. A code sketch of this setup follows these steps.)
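A minimal code sketch of this setup (NumPy-based; variable names are our own):

```python
import numpy as np

# Training set from the example: one feature, binary class labels 0 and 1.
x = np.array([2.0, 3.0, 5.0, 7.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b = 0.0, 0.0   # Step 1: initialize weight and bias to zero
alpha = 0.1       # Step 2: learning rate

def sigmoid(z):
    """Step 3: the hypothesis h_w(x) = sigma(w*x + b) uses this function."""
    return 1.0 / (1.0 + np.exp(-z))
```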
Example
Iteration 1:

Step 1.1: Compute $z$ for each sample:
$z_1 = w \cdot x_1 + b = 0 \cdot 2 + 0 = 0$
$z_2 = w \cdot x_2 + b = 0 \cdot 3 + 0 = 0$
$z_3 = w \cdot x_3 + b = 0 \cdot 5 + 0 = 0$
$z_4 = w \cdot x_4 + b = 0 \cdot 7 + 0 = 0$

Step 1.2: Compute the hypothesis $h$ for each $z_i$:
$h_1 = \sigma(z_1) = \frac{1}{1 + e^{-0}} = 0.5$
$h_2 = \sigma(z_2) = \frac{1}{1 + e^{-0}} = 0.5$
$h_3 = \sigma(z_3) = \frac{1}{1 + e^{-0}} = 0.5$
$h_4 = \sigma(z_4) = \frac{1}{1 + e^{-0}} = 0.5$
Step 1.3: Compute the cost. Since every $h_i = 0.5$, each sample contributes $\ln 0.5$ regardless of its label:

$$ J(w, b) = -\frac{1}{4}\sum_{i=1}^{4}\left[ y_i \ln h_i + (1 - y_i)\ln(1 - h_i) \right] = -\frac{1}{4}\left[ \ln 0.5 + \ln 0.5 + \ln 0.5 + \ln 0.5 \right] = 0.6931 $$
Step 1.4: Compute the gradients:

$$ \frac{\partial J(w, b)}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}(h_i - y_i)\,x_i = \frac{1}{4}\left[ (0.5 - 0) \cdot 2 + (0.5 - 0) \cdot 3 + (0.5 - 1) \cdot 5 + (0.5 - 1) \cdot 7 \right] = \frac{1}{4}(1 + 1.5 - 2.5 - 3.5) = -0.875 $$

$$ \frac{\partial J(w, b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(h_i - y_i) = \frac{1}{4}\left[ (0.5 - 0) + (0.5 - 0) + (0.5 - 1) + (0.5 - 1) \right] = \frac{1}{4}(0.5 + 0.5 - 0.5 - 0.5) = 0 $$
Step 1.5: Update the weights:

$$ w = w - \alpha\,\frac{\partial J(w, b)}{\partial w} = 0 - 0.1 \cdot (-0.875) = 0.0875 $$

$$ b = b - \alpha\,\frac{\partial J(w, b)}{\partial b} = 0 - 0.1 \cdot 0 = 0 $$
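The whole of Iteration 1 can be reproduced with a few vectorized lines (a sketch; the setup is repeated so it runs standalone):

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, alpha = 0.0, 0.0, 0.1

z = w * x + b                                          # Step 1.1: all z_i = 0
h = 1.0 / (1.0 + np.exp(-z))                           # Step 1.2: all h_i = 0.5
J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))  # Step 1.3: J = 0.6931
dw = np.mean((h - y) * x)                              # Step 1.4: dJ/dw = -0.875
db = np.mean(h - y)                                    #           dJ/db = 0.0
w, b = w - alpha * dw, b - alpha * db                  # Step 1.5: w = 0.0875, b = 0.0
```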
Iteration 2:

Step 2.1: Compute $z$ for each sample with the updated parameters ($w = 0.0875$, $b = 0$): $z_1 = 0.175$, $z_2 = 0.2625$, $z_3 = 0.4375$, $z_4 = 0.6125$.

Step 2.2: Compute the hypothesis $h$ for each $z_i$:
$h_1 = \sigma(z_1) = \frac{1}{1 + e^{-0.175}} = 0.5436$
$h_2 = \sigma(z_2) = \frac{1}{1 + e^{-0.2625}} = 0.5652$
$h_3 = \sigma(z_3) = \frac{1}{1 + e^{-0.4375}} = 0.6077$
$h_4 = \sigma(z_4) = \frac{1}{1 + e^{-0.6125}} = 0.6485$
Step 2.3: Compute the cost (for the $y_i = 0$ samples the surviving term is $\ln(1 - h_i)$, i.e. $\ln 0.4564$ and $\ln 0.4348$):

$$ J(w, b) = -\frac{1}{4}\sum_{i=1}^{4}\left[ y_i \ln h_i + (1 - y_i)\ln(1 - h_i) \right] = -\frac{1}{4}\left[ \ln 0.4564 + \ln 0.4348 + \ln 0.6077 + \ln 0.6485 \right] = 0.6372 $$
Step 2.4: Compute the gradients:

$$ \frac{\partial J(w, b)}{\partial w} = \frac{1}{4}\left[ (0.5436 - 0) \cdot 2 + (0.5652 - 0) \cdot 3 + (0.6077 - 1) \cdot 5 + (0.6485 - 1) \cdot 7 \right] = \frac{1}{4}(1.0872 + 1.6956 - 1.9615 - 2.4545) = -0.4083 $$

$$ \frac{\partial J(w, b)}{\partial b} = \frac{1}{4}\left[ (0.5436 - 0) + (0.5652 - 0) + (0.6077 - 1) + (0.6485 - 1) \right] = 0.09075 $$
Step 2.5: Update the weights:

$$ w = w - \alpha\,\frac{\partial J(w, b)}{\partial w} = 0.0875 - 0.1 \cdot (-0.4083) = 0.1283 $$

$$ b = b - \alpha\,\frac{\partial J(w, b)}{\partial b} = 0 - 0.1 \cdot 0.09075 = -0.0091 $$
Iteration 3:

Repeating Steps 3.1 and 3.2 with $w = 0.1283$ and $b = -0.0091$ gives $h_1 = 0.5615$, $h_2 = 0.5928$, $h_3 = 0.6508$, $h_4 = 0.7045$.

Step 3.3: Compute the cost:

$$ J(w, b) = -\frac{1}{4}\left[ 0 \cdot \ln 0.5615 + (1 - 0)\ln(1 - 0.5615) + 0 \cdot \ln 0.5928 + (1 - 0)\ln(1 - 0.5928) + 1 \cdot \ln 0.6508 + (1 - 1)\ln(1 - 0.6508) + 1 \cdot \ln 0.7045 + (1 - 1)\ln(1 - 0.7045) \right] $$

$$ = -\frac{1}{4}\left[ \ln 0.4385 + \ln 0.4072 + \ln 0.6508 + \ln 0.7045 \right] = 0.6257 $$
Step 3.4: Compute the gradients:

$$ \frac{\partial J(w, b)}{\partial w} = \frac{1}{4}\left[ (0.5615 - 0) \cdot 2 + (0.5928 - 0) \cdot 3 + (0.6508 - 1) \cdot 5 + (0.7045 - 1) \cdot 7 \right] = -0.2283 $$

$$ \frac{\partial J(w, b)}{\partial b} = \frac{1}{4}\left[ (0.5615 - 0) + (0.5928 - 0) + (0.6508 - 1) + (0.7045 - 1) \right] = 0.1247 $$
Step 3.5: Update the weights:

$$ w = w - \alpha\,\frac{\partial J(w, b)}{\partial w} = 0.1283 - 0.1 \cdot (-0.2283) = 0.1511 $$

$$ b = b - \alpha\,\frac{\partial J(w, b)}{\partial b} = -0.0091 - 0.1 \cdot 0.1247 = -0.02157 $$
• And so on…
• Note that the 149th iteration is taken just for illustration; the steepest-descent algorithm converges at a much later iteration. The parameters reported at that point, $w = 0.5771$ and $b = -1.9037$, are used for the predictions below. A sketch of the full training loop follows.
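The remaining iterations are the same five steps in a loop (a sketch; the printed values should land near the slide's reported parameters, with exact agreement depending on the slide's rounding):

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, alpha = 0.0, 0.0, 0.1

for i in range(149):                          # repeat Steps x.1 through x.5
    h = 1.0 / (1.0 + np.exp(-(w * x + b)))    # hypothesis for all samples
    w -= alpha * np.mean((h - y) * x)         # weight update
    b -= alpha * np.mean(h - y)               # bias update

print(w, b)  # expected to be near the slide's w = 0.5771, b = -1.9037
```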
• $P(y = 1 \mid x = 2) = \frac{1}{1 + e^{-(0.5771 \cdot 2 - 1.9037)}} = \frac{1}{1 + e^{0.7495}} = 0.3209$. Since $0.3209 < 0.5$, we have $\hat{y} = 0$.
• $P(y = 1 \mid x = 3) = \frac{1}{1 + e^{-(0.5771 \cdot 3 - 1.9037)}} = \frac{1}{1 + e^{0.1724}} = 0.4570$. Since $0.4570 < 0.5$, we have $\hat{y} = 0$.
• $P(y = 1 \mid x = 5) = \frac{1}{1 + e^{-(0.5771 \cdot 5 - 1.9037)}} = \frac{1}{1 + e^{-0.9818}} = 0.7275$. Since $0.7275 > 0.5$, we have $\hat{y} = 1$.
• $P(y = 1 \mid x = 7) = \frac{1}{1 + e^{-(0.5771 \cdot 7 - 1.9037)}} = \frac{1}{1 + e^{-2.137}} = 0.8944$. Since $0.8944 > 0.5$, we have $\hat{y} = 1$.

We see from the above results that the logistic regression model correctly predicts the class of all four samples. Hence, in-sample testing is completed.
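The in-sample predictions can be checked with a short script (using the slide's final parameters; the formatting is our own):

```python
import numpy as np

w, b = 0.5771, -1.9037   # final parameters reported on the slides

for xi, yi in zip([2, 3, 5, 7], [0, 0, 1, 1]):
    p = 1.0 / (1.0 + np.exp(-(w * xi + b)))   # P(y = 1 | x)
    y_hat = int(p > 0.5)                      # threshold at 0.5
    print(f"x={xi}: P(y=1|x)={p:.4f} -> predicted {y_hat}, actual {yi}")
```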