06LogisticRegression
Machine Learning
Dr. Muhammad Amjad Iqbal
Associate Professor
University of Central Punjab, Lahore.
[email protected]
https://sites.google.com/a/ucp.edu.pk/mai/iml/
Slides adapted from Prof. Dr. Andrew Ng (Stanford) and Dr. Humayoun
Logistic Regression
A Classification Algorithm
Classification
Logistic regression: a learning algorithm for the classification task (despite its name, it is used for classification, not regression).
Hypothesis Representation
Logistic Regression Model
Want 0 ≤ h_θ(x) ≤ 1.
h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e⁻ᶻ).
g is called the sigmoid function (or logistic function); it maps any real z into (0, 1) and crosses 0.5 at z = 0.
[Figure: sigmoid curve g(z) rising from 0 to 1, with g(0) = 0.5]
We need to select the parameters θ so that the hypothesis fits the data; we do this with an algorithm later.
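As a concrete illustration, here is a minimal NumPy sketch of the sigmoid and the hypothesis (the function names `sigmoid` and `hypothesis` are our own, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): the model's estimated probability that y = 1."""
    return sigmoid(theta @ x)
```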
Interpretation of Hypothesis Output
h_θ(x) = estimated probability that y = 1 on a new input x, i.e. h_θ(x) = P(y = 1 | x; θ).
Suppose we predict "y = 1" if h_θ(x) ≥ 0.5, and "y = 0" if h_θ(x) < 0.5.
If z is positive then g(z) ≥ 0.5, so we predict y = 1 whenever θᵀx ≥ 0.
If z is negative then g(z) < 0.5, so we predict y = 0 whenever θᵀx < 0.
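A sketch of that 0.5 decision rule, reusing the hypothetical `sigmoid`/`hypothesis` helpers sketched above:

```python
def predict(theta, x):
    """Predict y = 1 when h_theta(x) >= 0.5, which happens exactly when theta^T x >= 0."""
    return 1 if hypothesis(theta, x) >= 0.5 else 0
```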
For any example with features x₁, x₂ that satisfies θᵀx ≥ 0, the model predicts y = 1.
Decision Boundary
Predict "y = 1" if θ₀ + θ₁x₁ + θ₂x₂ ≥ 0; the line where θᵀx = 0 separates the two predictions and is called the decision boundary.
[Figure: training examples in the (x₁, x₂) plane separated by a straight-line decision boundary]
Non-linear decision boundaries
With polynomial features, e.g. h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂²), the decision boundary can be non-linear.
Predict "y = 1" if θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂² ≥ 0 (for suitable θ this gives, e.g., a circular boundary x₁² + x₂² = 1).
[Figure: a circular decision boundary in the (x₁, x₂) plane crossing the axes at ±1]
With higher-order polynomial features, even more complex boundaries are possible.
[Figure: a more complex, curved decision boundary in the (x₁, x₂) plane]
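A small sketch of how added polynomial features produce a curved boundary; the feature mapping and the parameter values below are illustrative assumptions, chosen to give the unit-circle boundary mentioned above:

```python
import numpy as np

def map_features(x1, x2):
    """Feature vector [1, x1, x2, x1^2, x2^2] for a quadratic decision boundary."""
    return np.array([1.0, x1, x2, x1**2, x2**2])

# Illustrative parameters: predict y = 1 when -1 + x1^2 + x2^2 >= 0, i.e. outside the unit circle.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

print(int(theta @ map_features(0.2, 0.3) >= 0))  # 0: inside the circle
print(int(theta @ map_features(1.5, 1.0) >= 0))  # 1: outside the circle
```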
Cost function
How do we fit the parameters θ?
Training set: {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)} with m examples, y ∈ {0, 1}.
Hypothesis: h_θ(x) = 1 / (1 + e^(−θᵀx)).
Logistic regression cost function
Cost(h_θ(x), y) = −log(h_θ(x))   if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x))   if y = 0
If y = 1: Cost = 0 if h_θ(x) = 1, but as h_θ(x) → 0, Cost → ∞ (a confident wrong prediction is penalized very heavily).
If y = 0: Cost = 0 if h_θ(x) = 0, but as h_θ(x) → 1, Cost → ∞.
[Figures: Cost plotted against h_θ(x) on [0, 1] for the cases y = 1 and y = 0]
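A literal NumPy sketch of this per-example cost (the helper name `cost` is ours):

```python
import numpy as np

def cost(h, y):
    """Per-example cost: -log(h) if y = 1, and -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

print(cost(0.99, 1))  # ~0.01: confident and correct, almost no cost
print(cost(0.01, 1))  # ~4.6:  confident and wrong, very large cost
```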
To fit parameters θ: want min over θ of J(θ).
Repeat {
  θⱼ := θⱼ − α · ∂/∂θⱼ J(θ)
} (simultaneously update all θⱼ)
where
∂/∂θⱼ J(θ) = (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
Gradient Descent
Cost function:
J(θ) = −(1/m) Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ]
with h_θ(x) = 1 / (1 + e^(−θᵀx)).
Want min over θ of J(θ).
Repeat {
  θⱼ := θⱼ − α · (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
}
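A compact, vectorized sketch of this gradient-descent loop (the function name and the default `alpha` and `num_iters` are illustrative; X is assumed to already include the column of ones for θ₀):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Repeat theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij,
    updating every component of theta simultaneously."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)          # h_theta(x^(i)) for all m examples
        grad = X.T @ (h - y) / m        # vector of partial derivatives dJ/dtheta_j
        theta = theta - alpha * grad
    return theta
```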
Multi-class classification
One-vs-all algorithm
Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Binary classification vs. multi-class classification:
[Figure: two scatter plots in the (x₁, x₂) plane, one with two classes and one with three classes]
One-vs-all (one-vs-rest):
Turn the multi-class problem into separate binary problems, one per class.
Class 1: train a classifier h_θ⁽¹⁾(x) with class 1 as the positive class and all other classes as negative.
Class 2: h_θ⁽²⁾(x), class 2 vs. the rest.
Class 3: h_θ⁽³⁾(x), class 3 vs. the rest.
[Figure: a three-class data set in the (x₁, x₂) plane split into three binary classification problems]
One-vs-all
Train a logistic regression classifier h_θ⁽ⁱ⁾(x) for each class i to estimate the probability that y = i.
On a new input x, pick the class i that maximizes h_θ⁽ⁱ⁾(x).
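A sketch of one-vs-all training and prediction (illustrative names; each per-class classifier is trained with the same gradient-descent updates as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.1, num_iters=1000):
    """Fit one binary logistic regression classifier per class (class i vs. the rest)."""
    m, n = X.shape
    all_theta = np.zeros((num_classes, n))
    for i in range(num_classes):
        yi = (y == i).astype(float)      # relabel: 1 for class i, 0 for every other class
        theta = np.zeros(n)
        for _ in range(num_iters):
            h = sigmoid(X @ theta)
            theta -= alpha * X.T @ (h - yi) / m
        all_theta[i] = theta
    return all_theta

def predict_one_vs_all(all_theta, x):
    """Pick the class whose classifier reports the highest probability h_theta^(i)(x)."""
    return int(np.argmax(sigmoid(all_theta @ x)))
```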
Regularization
The problem of overfitting
• So far we have seen a few learning algorithms.
• They work well for many applications, but can suffer from the problem of overfitting.
Overfitting with linear regression
Example: Linear regression (housing prices)
[Figure: three fits of Price vs. Size, ranging from underfitting through a good fit to overfitting]
The same problem occurs with logistic regression (g = sigmoid function):
[Figure: three decision boundaries in the (x₁, x₂) plane, from a too-simple straight line to an overly contorted boundary]
Addressing overfitting:
Example: predicting house Price from many features:
― size of house
― no. of bedrooms
― no. of floors
― age of house
― average income in neighborhood
― kitchen size
[Figure: Price vs. Size with an overfit high-order polynomial]
• Plotting the hypothesis is one way to decide whether overfitting occurs.
• But with lots of features and little data we cannot visualize the fit, and therefore it is:
― Hard to select the degree of the polynomial
― Hard to decide which features to keep and which to drop
Addressing overfitting:
Options:
1. Reduce the number of features (but this means losing information).
― Manually select which features to keep.
― Model selection algorithm (later in the course).
2. Regularization.
― Keep all the features, but reduce the magnitude/values of the parameters θⱼ.
― Works well when we have a lot of features, each of which contributes a bit to predicting y.
Cost function
Intuition
[Figure: Price vs. Size of house, a well-fitting quadratic next to an overfit high-order polynomial]
Penalize large parameter values by adding a regularization term to the cost:
J(θ) = (1/2m) [ Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ⁿ θⱼ² ]   (the second sum is the regularization term)
• We get a much smoother curve which fits the data and gives a much better hypothesis.
λ is the regularization parameter.
It controls a trade-off between our two goals:
1) Fit the training set well.
2) Keep the parameters small.
In regularized linear regression, we choose θ to minimize this J(θ).
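A sketch of this regularized squared-error cost in NumPy (by convention θ₀ is excluded from the penalty; names are illustrative):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [ sum_i (h(x_i) - y_i)^2 + lam * sum_{j>=1} theta_j^2 ]."""
    m = len(y)
    err = X @ theta - y                      # h_theta(x) = theta^T x for linear regression
    return (err @ err + lam * np.sum(theta[1:] ** 2)) / (2 * m)
```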
Regularized linear regression
Gradient descent
Repeat {
  θ₀ := θ₀ − α · (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₀⁽ⁱ⁾   (same as before; θ₀ is not regularized)
  θⱼ := θⱼ − α [ (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ + (λ/m) θⱼ ]   (regularized, j = 1, …, n)
}
The θⱼ update can be rewritten as
θⱼ := θⱼ (1 − α λ/m) − α · (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
The interesting term is (1 − α λ/m): the learning rate α is usually small and m is large, so this factor is just slightly less than 1, and each update shrinks θⱼ a little before taking the usual gradient step.
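A sketch of one regularized update step for linear regression (vectorized; the shrinkage term is skipped for θ₀, matching the two update rules above; names are illustrative):

```python
import numpy as np

def regularized_step(theta, X, y, alpha, lam):
    """theta_j := theta_j - alpha * [ (1/m) * sum_i (h(x_i) - y_i) * x_ij + (lam/m) * theta_j ]."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m     # unregularized gradient of the squared-error cost
    penalty = (lam / m) * theta
    penalty[0] = 0.0                     # theta_0 is not regularized
    return theta - alpha * (grad + penalty)
```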
Regularized logistic regression
[Figure: a non-linear decision boundary in the (x₁, x₂) plane that becomes less contorted when regularization is applied]
Cost function:
J(θ) = −(1/m) Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] + (λ/2m) Σⱼ₌₁ⁿ θⱼ²
Gradient descent
Repeat {
  θ₀ := θ₀ − α · (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₀⁽ⁱ⁾
  θⱼ := θⱼ − α [ (1/m) Σᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ + (λ/m) θⱼ ]   (regularized, j = 1, …, n)
}
The updates look the same as for regularized linear regression, but here h_θ(x) = 1 / (1 + e^(−θᵀx)).
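A sketch of the regularized logistic regression cost and gradient (illustrative names; X is assumed to carry a leading column of ones, and θ₀ is not penalized):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradient(theta, X, y, lam):
    """Regularized cost J(theta) and its gradient for logistic regression."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m \
        + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return J, grad
```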
End