Intro to Logistic Regression

Logistic Regression
Jacquelyn Victoria & Tamer Wahba
1

Slide Ownership
Jacquelyn Victoria - 3 to 9
Tamer Wahba - 10 to 15
2

Regression Analysis + Classification
How can we predict a nominal class using regression analysis?
Consider a binary class:
Each instance x is a vector of feature values
Our output values or class labels are restricted to 0 or 1, i.e. f(x) ∈ {0, 1}
We need an h(x) where: 0 < h(x) < 1
We need a function which exhibits this behavior
3

Logistic Functions
Sigmoid function: σ(x) = 1 / (1 + e^(-x))
Asymptotes at y = 1 and y = 0
Easy to specify a threshold (σ(0) = .5)
Output is interpreted as P(y = 1)
As a result: hθ(x) = σ(θᵀx) = 1 / (1 + e^(-θᵀx))
Where θ is a vector of weights
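A minimal Python sketch of the sigmoid and the hypothesis hθ(x) = σ(θᵀx) may help make this concrete. The function names sigmoid and h are illustrative, and x is assumed to be a plain list of feature values the same length as θ.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def h(theta, x):
    """Hypothesis h_theta(x) = sigmoid(theta . x); theta and x are equal-length lists."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

print(sigmoid(0))                   # the threshold point noted on the slide
print(h([0.5, -1.0], [1.0, 2.0]))   # hypothetical weights and features
```

The two branches in sigmoid compute the same value; the split only keeps math.exp from being called with a large positive argument.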
4

Cost Function
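Need to find hθ(x) that is a logistic function that represents our data
Need to find θ to fit our data
Cost per example: -log(hθ(x)) if y = 1, -log(1 - hθ(x)) if y = 0
J(θ) = -(1/m) Σᵢ [ yᵢ log(hθ(xᵢ)) + (1 - yᵢ) log(1 - hθ(xᵢ)) ], summed over the m training examples
[Plot: -log(x) and -log(1 - x)]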
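The -log(x) and -log(1 - x) curves suggest the standard cross-entropy (log loss) cost. Assuming that is the cost intended here, a short Python sketch looks like the following. The names cost and J are illustrative, and predictions are assumed to lie strictly between 0 and 1.

```python
import math

def cost(h, y):
    """Per-example cost: -log(h) when y == 1, -log(1 - h) when y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

def J(predictions, labels):
    """Mean cost over all training examples."""
    return sum(cost(h, y) for h, y in zip(predictions, labels)) / len(labels)

# Confident and correct predictions are cheap; confident and wrong ones are expensive.
print(cost(0.9, 1), cost(0.1, 1))
print(J([0.9, 0.2, 0.7], [1, 0, 1]))
```

The cost grows without bound as a prediction approaches the wrong extreme, which is what pushes the fitted hθ(x) toward the labels.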
5

Gradient Descent
To find the θ that minimizes J(θ), we can use the partial derivatives of J(θ)
do {
    θj := θj - α ∂J(θ)/∂θj    (update every θj simultaneously)
} until θ converges
Where α is the learning rate (almost always between 0 and 1; .1-.3 is usually a good range)
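The loop can be sketched in Python as below. This assumes J(θ) is the mean cross-entropy cost, so that ∂J/∂θj = (1/m) Σ (hθ(x) - y) xj. The function name, the convergence tolerance, the iteration cap and the toy data are all illustrative choices, not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, tol=1e-6, max_iters=10000):
    """Batch gradient descent for logistic regression.

    X is a list of feature rows (include a leading 1.0 for the intercept),
    y is a list of 0/1 labels. Stops when no weight moves by more than tol.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(max_iters):
        preds = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
        # Partial derivative of the mean cross-entropy cost for each theta_j
        grad = [sum((p - yi) * row[j] for p, yi, row in zip(preds, y, X)) / m
                for j in range(n)]
        new_theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta

# Toy, non-separable data: one feature plus an intercept column of 1.0
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
theta = gradient_descent(X, y)
print(theta)
```

All weights are computed from the same set of predictions before any of them is changed, which is what "simultaneous update" means in practice.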
6

Maximum Likelihood Estimation
7
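Minimizing J(θ) is equivalent to maximizing the log-likelihood of the data
do {
    θj := θj - (α/m) Σᵢ (hθ(xᵢ) - yᵢ) xᵢj    (for every j simultaneously)
} until θ converges
Can also be calculated using: Iteratively Reweighted Least Squares
Multinomial data uses Softmax Regression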
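For the multinomial case, softmax turns one score per class into probabilities that sum to 1. A minimal sketch follows; the function name and the example scores are illustrative. In softmax regression each score would be θkᵀx for class k.

```python
import math

def softmax(scores):
    """Map a list of real-valued class scores to probabilities that sum to 1."""
    shift = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])   # hypothetical scores for three classes
print(probs)

# With two classes and scores [z, 0], softmax reduces to the sigmoid of z.
print(softmax([2.0, 0.0])[0])
```

The two-class check shows why softmax regression is usually described as the generalization of logistic regression to more than two classes.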

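Interpreting the Hypothesis
8
Recall that σ(0) = .5 and that hθ(x) = σ(θᵀx)
So hθ(x) ≥ .5 exactly when θᵀx ≥ 0; the decision boundary is where θᵀx = 0
[Figure: plot with axes x1 and x2]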
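Because σ(0) = .5, thresholding hθ(x) at .5 is the same as checking the sign of θᵀx. A small sketch with two features x1 and x2 follows. The weights are hypothetical, and each x is assumed to carry a leading 1.0 for the intercept term.

```python
def predict(theta, x):
    """Return 1 when theta . x >= 0, i.e. when sigmoid(theta . x) >= 0.5."""
    score = sum(t * xi for t, xi in zip(theta, x))
    return 1 if score >= 0 else 0

# Hypothetical weights: the boundary theta . x = 0 is the line x1 + x2 = 3
theta = [-3.0, 1.0, 1.0]
print(predict(theta, [1.0, 2.0, 2.0]))   # x1 + x2 = 4, on the positive side
print(predict(theta, [1.0, 1.0, 1.0]))   # x1 + x2 = 2, on the negative side
```

The sigmoid never has to be evaluated to classify a point; it is only needed when the probability itself is wanted.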

Interpreting hθ
I want to create a model to give me the
probability that I will pass a test given how
many hours I have studied
Hours 0.50 0.75 1.00 1.25 1.50 1.75 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 4.00 4.25 4.50 4.75 5.00 5.50
Pass 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 1
Using the model fitted to this data, calculate my probability of passing given that I have studied 3 hours
P(passing | study time = 3) = .61
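The slide does not say how the model was fitted. As one possible sketch, the code below fits an unregularized logistic regression to the table above by maximum likelihood, using Newton's method (the same iteration that Iteratively Reweighted Least Squares performs). The function and variable names are illustrative.

```python
import math

hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_newton(xs, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton's method on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0             # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0     # entries of the negated Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_newton(hours, passed)
p_pass = sigmoid(b0 + b1 * 3.0)
print(f"intercept={b0:.4f}, slope={b1:.4f}, P(pass | 3 hours)={p_pass:.2f}")
```

Under these assumptions the fitted probability for 3 hours should round to the .61 quoted above. A library that applies regularization by default may give a somewhat different number.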
9

Logistic Regression Compared to Other Classifiers
Naive Bayes
Support Vector Machines
Decision Trees
10

vs Decision Tree
Assumptions
DT: decision boundaries parallel to the axes
LR: one smooth (linear) boundary
Decision trees can be used when there are multiple decision boundaries
11

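vs Naive Bayes
Feature Weights
NB: each weight is set independently, depending only on the class
LR: weights are set together such that the decision function tends to be high for positive classes and low for negative classes
Correlated features: NB counts their evidence more than once; LR fits the weights jointly, so it does not give them undue weight (though strongly correlated features can make individual weights unstable)
12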

vs Support Vector Machine
13
Both attempt to find a hyperplane separating the training samples
SVM: find the solution with maximum margin
LR: find the solution that maximizes the likelihood of the training labels, with no margin requirement
SVM is a hard classifier while LR is probabilistic

Advantages
Works well with diagonal decision boundaries
Does not give undue weight to correlated features
Probabilistic outcomes
Disadvantages
Requires large sample size for stable results
14

Use Cases
Categorical outcomes
Large sample data
Minimal preprocessing
15

For more info...
Helpful links to go into more depth with Logistic Regression
Stanford Open Course (Logit regression section)
Logit Regression Tutorial (exercises in MATLAB)
Logit Regression Tutorial (no code)
How to use Logit Regression in Python
How to use Logit Regression in R
How to use Logit Regression in Java using Weka
16
