Lecture 9
Linear Discriminant Analysis
Dr. Jianjun Hu
mleg.cse.sc.edu/edu/csce833
Linear discriminant:
$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$$
Advantages:
- Simple: O(d) space/computation.
- Knowledge extraction: the discriminant is a weighted sum of attributes, so the signs and magnitudes of the weights are interpretable (e.g., credit scoring).
- Optimal when the p(x|Ci) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable.
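A minimal NumPy sketch of evaluating these discriminants for all K classes at once (the array names, shapes, and helper functions are illustrative assumptions, not from the lecture):

```python
import numpy as np

def discriminants(X, W, w0):
    """g_i(x) = w_i^T x + w_i0 for every class at once.
    X: (N, d) inputs, W: (K, d) weight vectors, w0: (K,) biases."""
    return X @ W.T + w0

def choose_class(X, W, w0):
    # Choose C_i if g_i(x) = max_j g_j(x) (the decision rule below)
    return np.argmax(discriminants(X, W, w0), axis=1)
```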
Generalized Linear Model
Quadratic discriminant:
$$g_i(\mathbf{x} \mid \mathbf{W}_i, \mathbf{w}_i, w_{i0}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

Two classes: with $g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$,
$$\text{choose} \begin{cases} C_1 & \text{if } g(\mathbf{x}) > 0 \\ C_2 & \text{otherwise} \end{cases}$$
Multiple classes: choose $C_i$ if
$$g_i(\mathbf{x}) = \max_{j=1}^{K} g_j(\mathbf{x})$$
Classes are linearly separable.
Pairwise separation:
$$g_{ij}(\mathbf{x}) \begin{cases} > 0 & \text{if } \mathbf{x} \in C_i \\ < 0 & \text{if } \mathbf{x} \in C_j \\ \text{don't care} & \text{otherwise} \end{cases}$$
choose $C_i$ if $\forall j \neq i,\ g_{ij}(\mathbf{x}) > 0$ (a sketch follows below).
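A sketch of this pairwise decision rule, assuming a K-by-K table `g` of pairwise discriminant functions (the table layout and names are entirely illustrative):

```python
def choose_pairwise(x, g):
    """Pairwise separation: g[i][j](x) separates C_i from C_j.
    Choose C_i if g_ij(x) > 0 for all j != i."""
    K = len(g)
    for i in range(K):
        if all(g[i][j](x) > 0 for j in range(K) if j != i):
            return i
    return None  # no class wins every pairwise test
```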
From Discriminants to Posteriors
When $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma})$:
$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$
$$\mathbf{w}_i = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i + \log P(C_i)$$

For two classes, let $y \equiv P(C_1 \mid \mathbf{x})$, so that $P(C_2 \mid \mathbf{x}) = 1 - y$. The following decision rules are equivalent:
choose $C_1$ if $y > 0.5$ and $C_2$ otherwise
choose $C_1$ if $y/(1-y) > 1$ and $C_2$ otherwise
choose $C_1$ if $\log[y/(1-y)] > 0$ and $C_2$ otherwise
$$\operatorname{logit} P(C_1 \mid \mathbf{x}) = \log\frac{P(C_1 \mid \mathbf{x})}{1 - P(C_1 \mid \mathbf{x})} = \log\frac{P(C_1 \mid \mathbf{x})}{P(C_2 \mid \mathbf{x})} = \log\frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} + \log\frac{P(C_1)}{P(C_2)}$$

For Gaussian classes with a shared covariance matrix:
$$\operatorname{logit} P(C_1 \mid \mathbf{x}) = \log\frac{(2\pi)^{-d/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)\right]}{(2\pi)^{-d/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)\right]} + \log\frac{P(C_1)}{P(C_2)} = \mathbf{w}^T\mathbf{x} + w_0$$

where
$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \qquad w_0 = -\frac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + \log\frac{P(C_1)}{P(C_2)}$$
The inverse of the logit:
$$\log\frac{P(C_1 \mid \mathbf{x})}{1 - P(C_1 \mid \mathbf{x})} = \mathbf{w}^T\mathbf{x} + w_0$$
$$P(C_1 \mid \mathbf{x}) = \operatorname{sigmoid}(\mathbf{w}^T\mathbf{x} + w_0) = \frac{1}{1 + \exp[-(\mathbf{w}^T\mathbf{x} + w_0)]}$$
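A minimal NumPy sketch of this Gaussian-classes-to-sigmoid-posterior mapping (the function and variable names are illustrative assumptions):

```python
import numpy as np

def lda_posterior(x, mu1, mu2, Sigma, p1=0.5):
    """P(C1|x) = sigmoid(w^T x + w0) for Gaussian classes with
    means mu1, mu2, shared covariance Sigma, and prior P(C1) = p1."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2)
          + np.log(p1 / (1.0 - p1)))
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))
```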
Sigmoid (Logistic) Function
Gradient descent: starts from a random $\mathbf{w}$ and updates $\mathbf{w}$ iteratively in the direction of the negative gradient,
$$\mathbf{w}^{t+1} = \mathbf{w}^t - \eta \nabla_{\mathbf{w}} E(\mathbf{w}^t)$$
with learning rate $\eta$.
[Figure: error surface $E(\mathbf{w})$; a step of size $\eta$ from $\mathbf{w}^t$ to $\mathbf{w}^{t+1}$ reduces $E(\mathbf{w}^t)$ to $E(\mathbf{w}^{t+1})$]
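A minimal sketch of this update rule on a toy quadratic error (the toy E, `w_star`, and all other names are assumptions for illustration only):

```python
import numpy as np

# Toy error E(w) = ||w - w_star||^2, whose gradient is 2 (w - w_star)
w_star = np.array([2.0, -1.0])
grad_E = lambda w: 2.0 * (w - w_star)

w = np.random.randn(2)          # start from a random w
eta = 0.1                       # learning rate
for _ in range(100):
    w = w - eta * grad_E(w)     # w^{t+1} = w^t - eta * grad E(w^t)
```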
Logistic Discrimination
Two classes: assume the log likelihood ratio is linear and estimate $\mathbf{w}, w_0$ directly from data:
$$y = \hat{P}(C_1 \mid \mathbf{x}) = \frac{1}{1 + \exp[-(\mathbf{w}^T\mathbf{x} + w_0)]}$$
Training: Two Classes
$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_t, \qquad r^t \mid \mathbf{x}^t \sim \text{Bernoulli}(y^t)$$
$$y = P(C_1 \mid \mathbf{x}) = \frac{1}{1 + \exp[-(\mathbf{w}^T\mathbf{x} + w_0)]}$$

Likelihood:
$$l(\mathbf{w}, w_0 \mid \mathcal{X}) = \prod_t (y^t)^{r^t}(1 - y^t)^{1 - r^t}$$

Error is the negative log likelihood (cross-entropy), $E = -\log l$:
$$E(\mathbf{w}, w_0 \mid \mathcal{X}) = -\sum_t \left[r^t \log y^t + (1 - r^t)\log(1 - y^t)\right]$$

If $y = \operatorname{sigmoid}(a)$, then $\dfrac{dy}{da} = y(1 - y)$, so gradient descent gives
$$\Delta w_j = -\eta\frac{\partial E}{\partial w_j} = \eta\sum_t\left[\frac{r^t}{y^t} - \frac{1 - r^t}{1 - y^t}\right]y^t(1 - y^t)x_j^t = \eta\sum_t(r^t - y^t)\,x_j^t, \qquad j = 1, \ldots, d$$
$$\Delta w_0 = -\eta\frac{\partial E}{\partial w_0} = \eta\sum_t(r^t - y^t)$$
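A minimal NumPy implementation of these batch updates (the function name `train_logistic` and the hyperparameter defaults are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, r, eta=0.1, epochs=1000):
    """Batch gradient descent for two-class logistic discrimination.

    X : (N, d) inputs; r : (N,) labels in {0, 1}.
    Implements the updates Dw_j = eta * sum_t (r^t - y^t) x_j^t
    and Dw_0 = eta * sum_t (r^t - y^t) from the slide above.
    """
    N, d = X.shape
    w = np.random.randn(d) * 0.01
    w0 = 0.0
    for _ in range(epochs):
        y = sigmoid(X @ w + w0)     # y^t = P(C1 | x^t)
        w += eta * X.T @ (r - y)    # Dw  = eta * sum (r - y) x
        w0 += eta * np.sum(r - y)   # Dw0 = eta * sum (r - y)
    return w, w0
```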
[Figure: evolution of the fit over training iterations (10, 100, 1000)]
K>2 Classes

$$\log\frac{p(\mathbf{x} \mid C_i)}{p(\mathbf{x} \mid C_K)} = \mathbf{w}_i^T\mathbf{x} + w_{i0}$$

Softmax:
$$y_i = \hat{P}(C_i \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_i^T\mathbf{x} + w_{i0})}{\sum_{j=1}^{K}\exp(\mathbf{w}_j^T\mathbf{x} + w_{j0})}, \qquad i = 1, \ldots, K$$

Gradient descent updates:
$$\Delta\mathbf{w}_j = \eta\sum_t (r_j^t - y_j^t)\,\mathbf{x}^t, \qquad \Delta w_{j0} = \eta\sum_t (r_j^t - y_j^t)$$
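A corresponding sketch for the K-class case (the names, the one-hot label encoding, and the defaults are assumptions, not from the lecture):

```python
import numpy as np

def softmax(A):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    A = A - A.max(axis=1, keepdims=True)
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)

def train_softmax(X, R, eta=0.1, epochs=1000):
    """Batch gradient descent for K > 2 classes.

    X : (N, d) inputs; R : (N, K) one-hot (0/1) labels.
    Implements Dw_j = eta * sum_t (r_j^t - y_j^t) x^t per class j.
    """
    N, d = X.shape
    K = R.shape[1]
    W = np.random.randn(K, d) * 0.01
    w0 = np.zeros(K)
    for _ in range(epochs):
        Y = softmax(X @ W.T + w0)   # y_i^t = P_hat(C_i | x^t)
        W += eta * (R - Y).T @ X    # Dw_j  = eta * sum (r_j - y_j) x
        w0 += eta * (R - Y).sum(axis=0)
    return W, w0
```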
Example
Treating the labels as continuous targets with Gaussian noise, $r^t \mid \mathbf{x}^t \sim \mathcal{N}(y^t, \sigma^2)$:

$$l(\mathbf{w}, w_0 \mid \mathcal{X}) = \prod_t \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(r^t - y^t)^2}{2\sigma^2}\right]$$
$$E(\mathbf{w}, w_0 \mid \mathcal{X}) = \frac{1}{2}\sum_t (r^t - y^t)^2$$
$$\Delta\mathbf{w} = \eta\sum_t (r^t - y^t)\, y^t (1 - y^t)\, \mathbf{x}^t$$
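A short sketch of this squared-error update for a sigmoid output (all names are illustrative):

```python
import numpy as np

def train_sigmoid_lsq(X, r, eta=0.1, epochs=1000):
    """Squared-error training of a sigmoid output.
    Implements Dw = eta * sum_t (r^t - y^t) y^t (1 - y^t) x^t."""
    N, d = X.shape
    w = np.zeros(d)
    w0 = 0.0
    for _ in range(epochs):
        y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))
        g = (r - y) * y * (1.0 - y)   # per-pattern gradient factor
        w += eta * X.T @ g
        w0 += eta * g.sum()
    return w, w0
```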
Optimal Separating Hyperplane

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_t \quad \text{where} \quad r^t = \begin{cases} +1 & \text{if } \mathbf{x}^t \in C_1 \\ -1 & \text{if } \mathbf{x}^t \in C_2 \end{cases}$$

Find $\mathbf{w}$ and $w_0$ such that
$$\mathbf{w}^T\mathbf{x}^t + w_0 \geq +1 \quad \text{for } r^t = +1$$
$$\mathbf{w}^T\mathbf{x}^t + w_0 \leq -1 \quad \text{for } r^t = -1$$
which can be rewritten as
$$r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \geq +1$$
To maximize the margin, minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to these constraints. The primal Lagrangian, with multipliers $\alpha^t \geq 0$, is
$$L_p = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^{N}\alpha^t r^t(\mathbf{w}^T\mathbf{x}^t + w_0) + \sum_{t=1}^{N}\alpha^t$$

Setting the derivatives to zero:
$$\frac{\partial L_p}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{t=1}^{N}\alpha^t r^t \mathbf{x}^t, \qquad \frac{\partial L_p}{\partial w_0} = 0 \;\Rightarrow\; \sum_{t=1}^{N}\alpha^t r^t = 0$$

Substituting back gives the dual, maximized with respect to the $\alpha^t$:
$$L_d = -\frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_t \alpha^t = -\frac{1}{2}\sum_t\sum_s \alpha^t\alpha^s r^t r^s (\mathbf{x}^t)^T\mathbf{x}^s + \sum_t \alpha^t$$
Soft error: when the classes are not linearly separable, introduce slack variables $\xi^t \geq 0$ with $r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \geq 1 - \xi^t$ and penalize the soft error $\sum_t \xi^t$. The new primal is
$$L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_t \xi^t - \sum_t \alpha^t\left[r^t(\mathbf{w}^T\mathbf{x}^t + w_0) - 1 + \xi^t\right] - \sum_t \mu^t \xi^t$$
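In practice this quadratic program is solved by a library rather than by hand; a usage sketch with scikit-learn's SVC, whose C parameter is exactly the soft-margin penalty above (the toy data are made up):

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data; labels in {-1, +1} as in the slides
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]])
r = np.array([-1, -1, +1, +1])

clf = SVC(C=1.0, kernel="linear")   # C = soft-margin penalty
clf.fit(X, r)
print(clf.support_)                 # indices of support vectors (alpha^t > 0)
print(clf.decision_function(X))     # g(x) = w^T x + w0
```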
Kernel Machines
Map the input to a new space with basis functions $\boldsymbol{\phi}(\mathbf{x})$; the discriminant stays linear in that space:
$$g(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}), \qquad \mathbf{w} = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)$$
$$g(\mathbf{x}) = \sum_t \alpha^t r^t \boldsymbol{\phi}(\mathbf{x}^t)^T\boldsymbol{\phi}(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x})$$
Kernel Functions

Polynomials of degree q:
$$K(\mathbf{x}^t, \mathbf{x}) = (\mathbf{x}^T\mathbf{x}^t + 1)^q$$
For example, with $d = 2$ and $q = 2$:
$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T\mathbf{y} + 1)^2 = (x_1 y_1 + x_2 y_2 + 1)^2 = 1 + 2x_1 y_1 + 2x_2 y_2 + 2x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2$$
which corresponds to the basis functions $\boldsymbol{\phi}(\mathbf{x}) = [1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1 x_2, x_1^2, x_2^2]^T$.

Radial-basis functions:
$$K(\mathbf{x}^t, \mathbf{x}) = \exp\left[-\frac{\|\mathbf{x}^t - \mathbf{x}\|^2}{2\sigma^2}\right]$$

Sigmoidal functions:
$$K(\mathbf{x}^t, \mathbf{x}) = \tanh(2\mathbf{x}^T\mathbf{x}^t + 1)$$

(Cherkassky and Mulier, 1998)
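Sketches of these three kernels and of the kernel discriminant $g(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x})$ from the previous slide (helper names and the sigma default are assumptions):

```python
import numpy as np

def poly_kernel(xt, x, q=2):
    return (x @ xt + 1.0) ** q

def rbf_kernel(xt, x, sigma=1.0):
    return np.exp(-np.sum((xt - x) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(xt, x):
    return np.tanh(2.0 * x @ xt + 1.0)

def kernel_discriminant(x, support_X, support_r, alphas, kernel=rbf_kernel):
    """g(x) = sum_t alpha^t r^t K(x^t, x) over the support vectors."""
    return sum(a * r * kernel(xt, x)
               for a, r, xt in zip(alphas, support_r, support_X))
```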
SVM for Regression