Chapter 2 Slides
Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY
[Figure: Accuracy vs. amount of data — deep learning overtakes conventional machine learning as the amount of data grows]
[Figure: Loss as a function of W · X for a positive class instance]
[Figure: Network with input layer (x1, x2, x3), hidden layer with RBF activation plus a bias neuron (+1) in the hidden layer, and output layer producing y]
– With less data and more noise, conventional machine learning tends to work better.
[Figure: Linear regression as a neural network — input nodes with weights W feeding an output node with linear activation, trained with squared loss]
• For yi ∈ {−1, +1}, we use the same squared loss (yi − ŷi)², and the update W ⇐ W + α (yi − ŷi) Xi, where (yi − ŷi) is the delta (error).
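A minimal NumPy sketch of this delta-rule (Widrow-Hoff) update; the learning rate α and the toy data below are illustrative assumptions, not from the slides:

```python
import numpy as np

def widrow_hoff_update(W, X_i, y_i, alpha=0.01):
    """One delta-rule step: W <= W + alpha * (y_i - y_hat_i) * X_i,
    where y_hat_i = W . X_i is the linear-activation output."""
    y_hat = np.dot(W, X_i)                  # linear activation: prediction
    return W + alpha * (y_i - y_hat) * X_i  # (y_i - y_hat) is the delta

# Illustrative usage on synthetic data with labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))
W = np.zeros(3)
for X_i, y_i in zip(X, y):
    W = widrow_hoff_update(W, X_i, y_i)
```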
[Figure: Logistic regression as a neural network — input nodes with weights W feeding an output node with sigmoid activation; ŷ = probability of +1, y = observed value (+1 or −1); trained by log likelihood]
• Update: W ⇐ W + α · yi Xi / (1 + exp[yi (W · Xi)])
Interpreting the Logistic Update
• The factor 1/(1 + exp[yi (W · Xi)]) is 1 − ŷi for positive instances and ŷi for negative instances ⇒ the probability of a mistake!
• Interpret as: W ⇐ W + α [Probability of mistake on (Xi, yi)] (yi Xi)
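A sketch of this mistake-weighted update in NumPy; the learning rate α is an illustrative assumption:

```python
import numpy as np

def logistic_update(W, X_i, y_i, alpha=0.01):
    """W <= W + alpha * P(mistake on (X_i, y_i)) * (y_i * X_i),
    where P(mistake) = 1 / (1 + exp(y_i * W . X_i))."""
    p_mistake = 1.0 / (1.0 + np.exp(y_i * np.dot(W, X_i)))
    return W + alpha * p_mistake * y_i * X_i
```

Confidently classified points have p_mistake near 0 and barely move W; misclassified points dominate the update.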
Comparing Updates of Different Models
[Figure: Penalty vs. prediction W · X for X in the positive class — perceptron (surrogate), Widrow-Hoff/Fisher, SVM hinge, and logistic losses, with the decision boundary at W · X = 0]
• All the models discussed so far address only the binary-class setting, in which the class label is drawn from {−1, +1}.
• Cross-entropy loss is −vc + log[∑j=1…k exp(vj)], where c is the index of the true class.
Loss Derivative of Softmax
• Differentiate the loss value −vc + log[∑j=1…k exp(vj)] with respect to vi = Wi · X.
• The result is ∂Loss/∂vi = ŷi − 1 when i = c (the true class), and ŷi otherwise.
[Figure: Softmax layer — each vi = Wi · X feeds a softmax producing ŷi = exp(vi)/[∑j exp(vj)]; with class 2 as the true class, the loss is −log(ŷ2)]
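A minimal NumPy sketch of this softmax layer and its loss gradient; the max-shift for numerical stability and the toy usage are my assumptions:

```python
import numpy as np

def softmax_loss_and_grad(W, X, c):
    """W: (k, d) matrix with class weights W_i as rows; X: (d,) input;
    c: true class index.  Returns the cross-entropy loss
    -v_c + log(sum_j exp(v_j)) and its gradient with respect to v."""
    v = W @ X                      # v_i = W_i . X
    e = np.exp(v - v.max())        # shift exponents for numerical stability
    y_hat = e / e.sum()            # y_hat_i = exp(v_i) / sum_j exp(v_j)
    loss = -np.log(y_hat[c])
    grad_v = y_hat.copy()
    grad_v[c] -= 1.0               # dLoss/dv_i = y_hat_i - I(i == c)
    return loss, grad_v

# Illustrative usage: 3 classes, 4 features, true class 1
rng = np.random.default_rng(0)
loss, grad = softmax_loss_and_grad(rng.normal(size=(3, 4)),
                                   rng.normal(size=4), c=1)
```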
[Figure: Autoencoder — original data X passes through an encoder, function F(·) (multilayer neural network), into a constricted code F(X) in the middle layers, then through a decoder, function G(·) (multilayer neural network), to the reconstructed data X′ = (G ∘ F)(X)]
D ≈ UVᵀ
[Figure: Shallow autoencoder realizing D ≈ UVᵀ — the output of the hidden layer provides the reduced representation]
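A gradient-descent sketch of the factorization D ≈ UVᵀ that this shallow linear autoencoder computes; the rank k, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def factorize(D, k=2, lr=0.01, epochs=500, seed=0):
    """Minimize ||D - U V^T||_F^2 by full-gradient descent.
    U: (n, k) row factors (reduced representation), V: (d, k) column factors."""
    rng = np.random.default_rng(seed)
    n, d = D.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(d, k))
    for _ in range(epochs):
        E = D - U @ V.T      # residual matrix
        U += lr * E @ V      # gradient of loss w.r.t. U is -2 E V (2 folded into lr)
        V += lr * E.T @ U    # gradient of loss w.r.t. V is -2 E^T U
    return U, V
```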
[Figure: Points A, B, and C shown in the original space and in the reduced representation]
[Figure: Factorization of an incomplete user-item ratings matrix into U and Vᵀ — users Alice, Bob, Sayani, John; items E.T., Nixon, Shrek, Gandhi, Nero; entries are observed ratings or MISSING]
[Figure: Training instances for the recommender network — observed ratings (Sayani): E.T., Shrek; observed ratings (Bob): E.T., Nixon, Gandhi, Nero; missing entries do not contribute]
• For k hidden nodes, there are k paths between each user and each item identifier.
• For binary data, logistic outputs can be added to obtain logistic matrix factorization (sketched below).
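A sketch of this recommender-style factorization that trains only on observed entries; with binary=True the outputs pass through a sigmoid (logistic matrix factorization), and under log-likelihood loss on {0, 1} data the gradient signal is still the residual R − P. The rating matrix R, the mask, and all hyperparameters are assumptions for illustration:

```python
import numpy as np

def mf_observed(R, mask, k=2, lr=0.01, epochs=2000, binary=False, seed=0):
    """Factor R ~ U V^T using only observed entries (mask == True).
    binary=False: squared loss on ratings.  binary=True: logistic outputs
    sigmoid(U V^T) with log-likelihood loss on {0, 1} data.  Both losses
    yield (R - P) as the gradient signal on the observed entries."""
    rng = np.random.default_rng(seed)
    n, d = R.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(d, k))
    for _ in range(epochs):
        P = U @ V.T
        if binary:
            P = 1.0 / (1.0 + np.exp(-P))  # probability that an entry is 1
        E = np.where(mask, R - P, 0.0)    # zero gradient on missing entries
        U += lr * E @ V
        V += lr * E.T @ U
    return U, V
```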
[Figure: Neural architecture for matrix factorization — a d-dimensional row x1 … xd maps through encoder weights U = [ujq] (d × p matrix) to hidden units h1 … hp, then through decoder weights V = [vqj] (p × d matrix) to reconstructed outputs yj1 … yjd; a second panel shows the same network reconstructing another row ym1 … ymd]