Greedy-Layerwise in Deep Learning
Greedy Layer-wise
Unsupervised Pretraining
Sargur N. Srihari
[email protected]
Deep Learning Srihari
Unsupervised Pretraining
Greedy Algorithm
• Greedy algorithms break a problem into many
components, then solve for the optimal version
of each component in isolation
• Unfortunately, combining the individually
optimal components is not guaranteed to yield
an optimal complete solution
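The gap between per-component and joint optimality can be seen in a toy two-stage problem. The sketch below is only an illustration: the cost tables and the option names ("a", "b", "x", "y") are made up, and the second-stage cost is assumed to depend on the first-stage choice.

```python
from itertools import product

# Hypothetical costs, invented for illustration only.
# Stage 1 has options "a" and "b"; stage 2 has options "x" and "y",
# and the stage-2 cost depends on which stage-1 option was chosen.
STAGE1_COST = {"a": 1, "b": 2}
STAGE2_OPTIONS = ("x", "y")
STAGE2_COST = {("a", "x"): 10, ("a", "y"): 9, ("b", "x"): 1, ("b", "y"): 5}


def total_cost(first, second):
    """Cost of a complete two-stage solution."""
    return STAGE1_COST[first] + STAGE2_COST[(first, second)]


def greedy_solution():
    """Solve each stage in isolation, in order, never revisiting a choice."""
    first = min(STAGE1_COST, key=STAGE1_COST.get)
    second = min(STAGE2_OPTIONS, key=lambda s: STAGE2_COST[(first, s)])
    return first, second, total_cost(first, second)


def joint_solution():
    """Search all complete solutions and keep the cheapest one."""
    first, second = min(
        product(STAGE1_COST, STAGE2_OPTIONS),
        key=lambda pair: total_cost(*pair),
    )
    return first, second, total_cost(first, second)


print("greedy:", greedy_solution())
print("joint: ", joint_solution())
```

Here the greedy procedure commits to the cheapest first stage and is then stuck with expensive second-stage options, while the joint search accepts a slightly worse first stage to reach a much cheaper overall solution.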
Formal Algorithm
Algorithm: Greedy Layer-wise Unsupervised Pretraining Protocol
– Given: an unsupervised feature learning algorithm L
• L takes a training set of examples as input and returns an encoder or feature function f
– The raw input data is X, with one row per example; f^(1)(X) is the output of the first-stage encoder on X
– When fine-tuning is performed, we use a learner T, which takes an initial function f and input examples X (and, in the supervised fine-tuning case, associated targets Y) and returns a tuned function
– The number of stages is m
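One way to read the protocol is sketched below, under stated assumptions. Each stage trains L on the output of the previous stage and the resulting encoders are composed in order; if a learner T is supplied, the composed function is passed to it for fine-tuning. The names greedy_layerwise_pretrain and center_learner are hypothetical. center_learner is only a stand-in for L (it learns column means and returns an encoder that subtracts them), not a realistic feature learner such as an autoencoder or RBM.

```python
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]
Encoder = Callable[[Matrix], Matrix]


def center_learner(X: Matrix) -> Encoder:
    """Hypothetical stand-in for L: learn column means, return a centering encoder."""
    n_rows = len(X)
    means = [sum(column) / n_rows for column in zip(*X)]

    def encoder(X_in: Matrix) -> Matrix:
        return [[v - mu for v, mu in zip(row, means)] for row in X_in]

    return encoder


def greedy_layerwise_pretrain(
    L: Callable[[Matrix], Encoder],
    X: Matrix,
    m: int,
    T: Optional[Callable[[Encoder, Matrix, Optional[Sequence]], Encoder]] = None,
    Y: Optional[Sequence] = None,
) -> Encoder:
    """Train m stages one at a time, then optionally fine-tune with T."""
    stages: List[Encoder] = []
    X_tilde = X  # current representation of the training data
    for _ in range(m):
        f_k = L(X_tilde)        # train this stage on the previous stage's output
        stages.append(f_k)      # earlier stages stay fixed from here on
        X_tilde = f_k(X_tilde)  # representation fed to the next stage

    def f(X_in: Matrix) -> Matrix:
        # Composition of the stage encoders, applied first stage first.
        for stage in stages:
            X_in = stage(X_in)
        return X_in

    if T is not None:
        # Fine-tuning step; Y is only needed in the supervised case.
        f = T(f, X, Y)
    return f
```

For example, greedy_layerwise_pretrain(center_learner, X, m=2) returns a single function that applies both learned stages to new rows with the same number of columns as X. The greedy aspect is that each stage is trained without revisiting the stages before it.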
Learning trajectories
• Each point refers to a neural network at a
particular time in its training
• Pretraining reaches a different part of function space:
– With pretraining: training halts in one region of function space
– Without pretraining: training halts in another region