Deep Generative Models
• Deep Generative Models: Boltzmann Machines, Deep
Belief Networks, Deep Boltzmann Machines, Generative
Stochastic Networks, Generative Adversarial
Networks, and evaluating Generative Models.
Introduction
• A Generative Model is a powerful way of
learning any kind of data distribution using
unsupervised learning and has achieved
tremendous success.
• All types of generative models aim at learning
the true data distribution of the training set so
as to generate new data points with some variations.
• Deep generative models (DGMs) are neural networks
with many hidden layers trained to approximate
complicated, high-dimensional probability distributions.
Introduction
• These models have gained significant attention in
recent years due to their ability to learn complex data
distributions and generate new samples from those
distributions.
• When trained successfully, we can use DGMs to estimate
the likelihood of each observation and create new samples
from the underlying distribution.
• The two most popular approaches for deep generative
modeling are:
1. Variational Autoencoders (VAE)
2. Generative Adversarial Networks (GAN).
Introduction
1. Variational Autoencoders (VAE):
VAEs are probabilistic graphical models rooted in
Bayesian inference. VAEs aim to learn a low-
dimensional latent representation of training data,
which can be used to generate new data points.
VAEs combine an encoder and a decoder network.
The encoder maps input data to a latent space, and the
decoder generates samples from this latent space.
VAEs are commonly used for generative tasks and
representation learning.
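• As a concrete illustration, below is a minimal VAE sketch in PyTorch showing
the encoder, the reparameterization step, and the decoder; the layer sizes
(784-400-20) and activations are arbitrary assumptions chosen for illustration.

# Minimal VAE sketch (illustrative; sizes and activations are assumptions).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        # Encoder: maps the input to the mean and log-variance of a latent Gaussian.
        self.enc = nn.Linear(input_dim, 400)
        self.mu = nn.Linear(400, latent_dim)
        self.logvar = nn.Linear(400, latent_dim)
        # Decoder: maps a latent sample back to the data space.
        self.dec = nn.Sequential(nn.Linear(latent_dim, 400), nn.ReLU(),
                                 nn.Linear(400, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

• Training would minimize a reconstruction loss plus the KL divergence between the
encoder's Gaussian and the prior; new data points are generated by decoding latent
samples drawn from the prior.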
Introduction
2. Generative Adversarial Networks (GAN): GANs
consist of a generator and a discriminator.
• The generator generates data samples, while the
discriminator evaluates whether a given sample is
real or generated.
• The training process involves an adversarial game
between the generator and discriminator, leading to
the generator learning to produce realistic data
samples.
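• Below is a minimal sketch of one GAN training step in PyTorch; the network
sizes, learning rates, and the shape of the real data batch (flattened
784-dimensional vectors) are assumptions made only for illustration.

# Minimal GAN training-step sketch (illustrative; sizes and data are assumptions).
import torch
import torch.nn as nn

latent_dim = 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, 784), nn.Tanh())                 # generator
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())                # discriminator
bce = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):                          # real: (batch, 784) tensor
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    # 1) Discriminator: real samples should score 1, generated samples 0.
    fake = G(torch.randn(batch, latent_dim)).detach()
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # 2) Generator: try to make the discriminator output 1 for generated samples.
    fake = G(torch.randn(batch, latent_dim))
    loss_G = bce(D(fake), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()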
Differentiate between deterministic
and probabilistic neural networks
Feature                  | Deterministic Neural Networks         | Probabilistic Neural Networks
Output Type              | Fixed values                          | Probability distributions
Weight Nature            | Fixed weights after training          | Probabilistic weights
Prediction Consistency   | Consistent for the same input         | Varies for the same input
Uncertainty Estimation   | Not available                         | Provides uncertainty estimation
Implementation           | Simpler                               | More complex
Example Models           | FNNs, CNNs, RNNs                      | BNNs, VAEs, Dropout-based models
Use Cases                | General prediction and classification | Tasks requiring uncertainty handling
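• One way to see the difference in behaviour is Monte Carlo dropout, which makes
an ordinary network produce a distribution of predictions; the sketch below is a
hypothetical example (input size and architecture are arbitrary assumptions).

# Deterministic vs. probabilistic behaviour via Monte Carlo dropout (sketch).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 1))
x = torch.randn(1, 10)

net.eval()                                  # deterministic: dropout off,
print(net(x), net(x))                       # identical output for the same input

net.train()                                 # keep dropout active at prediction time
samples = torch.stack([net(x) for _ in range(100)])
print(samples.mean(), samples.std())        # predictive mean and an uncertainty estimate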
Boltzmann machine
• A Boltzmann machine is designed to learn
probability distributions over its set of inputs.
• There are three key concepts to know about
Boltzmann machine:
1. Stochasticity : Unlike traditional deterministic
neural networks, Boltzmann machines incorporate
randomness.
• The state of each neuron (node) in the network is
determined probabilistically, based on the states of
the neighboring neurons and a temperature parameter.
Boltzmann machine
2. Energy Function: The Boltzmann machine assigns an
energy to each possible state of the system.
• Lower energy states are more probable. The energy
function typically involves weights between nodes and
biases.
3. Equilibrium: The machine aims to reach a thermal
equilibrium where the distribution of states follows the
Boltzmann distribution.
• This distribution specifies that the probability of the system
being in a certain state decreases exponentially with the energy of that state.
Boltzmann machine
• A Boltzmann machine is essentially a fully connected,
two-layer neural network.
• These two layers are the visible layer and the hidden layer.
• The visible layer is analogous to the input layer in
feedforward neural networks.
• Although a Boltzmann machine has a hidden layer, it functions
more like an output layer.
• In other words, there is no separate hidden layer between
the input and output layers.
Boltzmann machine
• The basic units of a Boltzmann machine are binary
neurons that can be in one of two states: on (1) or off (0).
• There are two types of units in a Boltzmann machine:
• Visible units: Correspond to the input data.
• Hidden units: Capture dependencies and abstract
features that are not directly observed.
• Weights: Represent connections between pairs of units.
These can be symmetric (i.e., the weight from unit i to
unit j is the same as from unit j to unit i).
• Biases: Represent the threshold for each unit.
Boltzmann machine
• The figure below shows the very simple structure of a
Boltzmann machine.
• This Boltzmann machine has three hidden neurons
and four visible neurons.
• A Boltzmann machine is fully connected because every
neuron has a connection to every other neuron. However,
no neuron is connected to itself.
Boltzmann machine
• Types of Boltzmann Machines
1. Restricted Boltzmann Machines (RBMs): A simplified
version of the Boltzmann machine where the network is
restricted to a bipartite graph, meaning there are no
connections within the visible units or the hidden units.
• The figure below shows an RBM, which is not fully
connected. Every hidden neuron is connected to each
visible neuron.
Boltzmann machine
• There are no connections among the hidden neurons, nor
are there connections among the visible neurons.
2. Deep Belief Networks (DBNs): Composed of multiple
layers of RBMs. These networks can learn hierarchical
representations of the data.
3. Deep Boltzmann Machines (DBMs): A Deep Boltzmann
Machine (DBM) is an advanced type of Boltzmann machine
designed to model complex, high-dimensional data.
It extends the idea of a Restricted Boltzmann Machine (RBM)
by stacking multiple layers of hidden units, creating a deep
architecture that can capture intricate patterns and
dependencies in data.
Restricted Boltzmann machine (RBM)
• A Restricted Boltzmann Machine (RBM) is a simplified
version of a Boltzmann machine with certain
restrictions that make it easier to train and more
practical for many applications.
• Structure of Restricted Boltzmann Machines
A Restricted Boltzmann Machine
(RBM) is a generative,
stochastic, and 2-layer artificial
neural network that can learn a
probability distribution over its
set of inputs.
Restricted Boltzmann machine (RBM)
• Visible Units (V): These units represent the input
data. The number of visible units corresponds to the
number of features in the input data.
• Hidden Units (H): These units capture the
dependencies and patterns in the input data. The
number of hidden units is a hyper-parameter that
can be tuned.
• Weights (W): Each visible unit is connected to
every hidden unit with a symmetric weight. The
weight matrix W defines these connections.
Restricted Boltzmann machine (RBM)
• Biases: There are bias terms for both the visible units (a)
and the hidden units (b). These biases help in adjusting the
activation thresholds of the units.
• The restriction in a Restricted Boltzmann Machine is
that there is no intra-layer communication (nodes of
the same layer are not connected).
• Visible units are not connected to other visible units,
and hidden units are not connected to other hidden
units.
• This restriction allows for more efficient training
algorithms than in the general class of Boltzmann machines.
Restricted Boltzmann machine (RBM)
• Energy function in RBM
• The energy of a configuration (a state of visible and hidden
units) in an RBM is defined as:
• E(v, h) = − ∑_i a_i v_i − ∑_j b_j h_j − ∑_{i,j} v_i W_ij h_j
• where:
• v_i is the state of visible unit i,
• h_j is the state of hidden unit j,
• a_i is the bias of visible unit i,
• b_j is the bias of hidden unit j,
• W_ij is the weight between visible unit i and hidden unit j.
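• As a sketch, this energy function translates directly into NumPy; the variable
names follow the notation above, and the sizes and random initialisation are
illustrative assumptions.

# E(v, h) = -a.v - b.h - v^T W h, computed for one binary configuration.
import numpy as np

def rbm_energy(v, h, a, b, W):
    return -np.dot(a, v) - np.dot(b, h) - np.dot(v, W @ h)

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
v = rng.integers(0, 2, n_visible)           # binary visible state
h = rng.integers(0, 2, n_hidden)            # binary hidden state
a, b = np.zeros(n_visible), np.zeros(n_hidden)
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
print(rbm_energy(v, h, a, b, W))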
Restricted Boltzmann machine (RBM)
• Probabilistic Activation
• The states of the units are binary (0 or 1) and are
activated probabilistically based on their energies.
• The probability that a hidden unit h_j is activated (i.e.,
set to 1) given the visible units v is:
• P(h_j = 1 | v) = σ(b_j + ∑_i v_i W_ij)
• Similarly, the probability that a visible unit v_i is
activated given the hidden units h is:
• P(v_i = 1 | h) = σ(a_i + ∑_j h_j W_ij)
Restricted Boltzmann machine (RBM)
• where σ(x) is the logistic sigmoid function:
• σ(x) = 1 / (1 + e^(−x))
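• The conditional probabilities can be written as small NumPy helpers (a sketch;
W is assumed to be an n_visible × n_hidden matrix, with a and b the visible and
hidden bias vectors).

# Conditional activation probabilities and Bernoulli sampling for an RBM.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given_visible(v, b, W):
    return sigmoid(b + v @ W)               # P(h_j = 1 | v) for every j

def p_visible_given_hidden(h, a, W):
    return sigmoid(a + W @ h)               # P(v_i = 1 | h) for every i

def sample(p, rng):
    return (rng.random(p.shape) < p).astype(float)   # sample binary states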
Training RBMs
• Training an RBM involves adjusting the weights and
biases to minimize the difference between the
observed data distribution and the distribution
modeled by the RBM.
• The primary algorithm used for this purpose is
Contrastive Divergence (CD).
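• A single Contrastive Divergence step (CD-1) can be sketched as follows; this is
an illustrative NumPy version that assumes a mini-batch of binary rows V, and it
is only one variant of the algorithm.

# One CD-1 update of the RBM parameters (W, a, b) from a data batch V.
import numpy as np

def cd1_update(V, a, b, W, lr=0.1, rng=np.random.default_rng(0)):
    # Positive phase: hidden probabilities driven by the data.
    ph0 = 1.0 / (1.0 + np.exp(-(b + V @ W)))
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visible units, then recompute hidden probabilities.
    pv1 = 1.0 / (1.0 + np.exp(-(a + h0 @ W.T)))
    ph1 = 1.0 / (1.0 + np.exp(-(b + pv1 @ W)))
    # Update: difference between data-driven and reconstruction-driven statistics.
    n = V.shape[0]
    W += lr * (V.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (V - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b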
Restricted Boltzmann machine (RBM)
• Working of Restricted Boltzmann Machine
• An RBM uses two sets of biases:
• The hidden biases help the RBM produce the
activations on the forward pass, while
• The visible biases help the RBM learn the
reconstructions on the backward pass.
• Forward pass
• The following figure shows the working of the RBM in the
forward pass.
Restricted Boltzmann machine (RBM)
• The forward pass is the first step in training an RBM with
multiple inputs.
• The inputs are multiplied by the weights and then added to
the bias.
Restricted Boltzmann machine (RBM)
• The result is then passed through a sigmoid
activation function and the output determines if the
hidden state gets activated or not.
• Weights will be a matrix with the number of input
nodes as the number of rows and the number of
hidden nodes as the number of columns.
• The first hidden node will receive the vector
multiplication of the inputs multiplied by the first
column of weights before the corresponding bias term
is added to it.
Restricted Boltzmann machine (RBM)
• The sigmoid function is given by:
• σ(x) = 1 / (1 + e^(−x))
• So the equation that we get in this step would be:
• h(1) = σ(W^T v(0) + b)
• where h(1) and v(0) are the corresponding vectors
(column matrices) for the hidden and the visible
layers, with the superscript as the iteration (v(0)
means the input that we provide to the network), and
b is the hidden layer bias vector.
Restricted Boltzmann machine (RBM)
• Backward pass
• The backward pass is the reverse or the
reconstruction phase.
• It is similar to the first pass but in the opposite
direction as shown below:
Restricted Boltzmann machine (RBM)
• The corresponding equation for the reconstruction is:
• v(1) = σ(W h(1) + a)
• where v(1) and h(1) are the corresponding vectors
(column matrices) for the visible and the hidden layers,
with the superscript as the iteration, and a is the
visible layer bias vector.
Applications of RBM
• RBMs have been used in various applications,
including:
1. Dimensionality Reduction: Learning compact
representations of data.
2. Feature Learning: Extracting useful features from
raw data.
3. Collaborative Filtering: Building
recommendation systems.
4. Pre-training Deep Networks: Initializing the
weights of deep networks in a layer-wise manner.
Deep Belief Neural Networks
• A Restricted Boltzmann Machine (RBM) is a type of
generative stochastic artificial neural network that
can learn a probability distribution from its inputs.
• Deep belief networks, in particular, can be created
by “stacking” RBMs and fine-tuning the resulting
deep network via gradient descent and
backpropagation.
• DBNs belong to the family of unsupervised learning
algorithms and are known for their ability to learn
hierarchical representations from data.
Deep Belief Neural Networks
• DBNs differ in operation from autoencoders and RBMs:
while autoencoders and RBMs work directly with raw input
data, a DBN operates on an input layer with one neuron per
input feature and passes the data through numerous levels
before arriving at the final layer.
• The final outputs are produced using probabilities
acquired from the earlier layers.
Deep Belief Neural Networks
• The Architecture of DBN
• The top two layers form an associative memory, and
the bottom layer contains the visible units.
• The arrows pointing towards the layer closest to the
data indicate the directed, top-down relationships between
all of the lower layers.
Deep Belief Neural Networks
• Directed acyclic connections in the lower layers
translate associative memory to observable
variables.
• The lowest layer of visible units receives the input data
as binary or real-valued data.
• Like RBM, there are no intralayer connections in DBN.
• The hidden units represent features that encapsulate
the data’s correlations.
• A matrix of proportional weights W connects two
layers.
Deep Belief Neural Networks
• The “Input Layer” represents the initial layer, which has one
neuron for each input feature.
• “Hidden Layer 1” is the first Restricted Boltzmann
Machine (RBM), which learns the fundamental structure of the
data.
Deep Belief Neural Networks
• “Hidden Layer 2” and subsequent layers are additional
RBMs that learn higher-level features as we move through
the network.
• We can have multiple hidden layers depending on the
complexity of the task.
• “Output Layer” is used for supervised learning tasks like
classification or regression.
• The arrows indicate the flow of information from one layer
to the next, and the connections between neurons in
adjacent layers represent the weights that are learned
during training.
Deep Belief Neural Networks
• Training the RBMs:
• One of the unique aspects of DBNs is that each RBM
is trained independently using a technique
called contrastive divergence.
• This method allows us to approximate the gradient
of the log-likelihood of the data with respect to the
RBM’s parameters.
• After training, the output of one RBM becomes
the input for the next, creating a stacked
structure of RBMs.
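• The stacking idea can be sketched as greedy layer-wise pre-training; the code
below is illustrative NumPy that reuses the cd1_update step sketched earlier, and
the layer sizes, epochs, and learning rate are arbitrary assumptions.

# Greedy layer-wise pre-training: train one RBM, feed its hidden activations
# to the next RBM, and repeat for every layer of the DBN.
import numpy as np

def pretrain_dbn(X, layer_sizes, epochs=10, lr=0.1, rng=np.random.default_rng(0)):
    rbms, data = [], X
    for n_hidden in layer_sizes:
        n_visible = data.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        a, b = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            W, a, b = cd1_update(data, a, b, W, lr=lr, rng=rng)
        rbms.append((W, a, b))
        # The hidden probabilities of this RBM become the input to the next one.
        data = 1.0 / (1.0 + np.exp(-(b + data @ W)))
    return rbms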
Deep Belief Neural Networks
• Fine-Tuning for Supervised Learning:
• After the DBN has been assembled through the training of
its RBMs, it can be fine-tuned for supervised learning tasks.
• This fine-tuning process entails adjusting the
weights of the final layer using supervised learning
techniques like backpropagation.
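• A simplified fine-tuning sketch: push the data through the pre-trained stack and
train a logistic-regression output layer by gradient descent, as a stand-in for
full backpropagation through every layer (all names and sizes are illustrative
assumptions, and y is assumed to be a vector of 0/1 labels).

# Fine-tuning sketch: DBN features plus a logistic-regression output layer.
import numpy as np

def forward_stack(X, rbms):
    for W, _, b in rbms:                            # rbms from pretrain_dbn above
        X = 1.0 / (1.0 + np.exp(-(b + X @ W)))
    return X

def finetune_head(X, y, rbms, epochs=100, lr=0.1):
    H = forward_stack(X, rbms)                      # features from the DBN
    w, c = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + c)))      # predicted probabilities
        grad = p - y                                # gradient of the cross-entropy loss
        w -= lr * H.T @ grad / len(y)
        c -= lr * grad.mean()
    return w, c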
• DBNs have gained popularity for their impressive
performance across various applications.
• From image and speech recognition to natural
language processing, they have consistently
delivered state-of-the-art results.
Deep Belief Neural Networks
• One of the main advantages of DBNs is their ability to
learn features from the data in an unsupervised manner.
• 1. DBNs can also learn a hierarchical representation of the
data, with each layer learning increasingly sophisticated
features from the lower layers to the higher layers.
• 2. DBNs have proven resistant to overfitting, thanks to model
regularisation and to the fact that only a small amount of
labelled data is needed during the fine-tuning phase.
• 3. DBNs can handle missing data, which occurs frequently in
many real-world applications where some data is corrupted
or absent.
Deep Boltzmann Machine
• A Deep Boltzmann Machine (DBM) is a type of
generative stochastic neural network that is used in
deep learning to model complex distributions over
high-dimensional data.
• It is an extension of the Boltzmann Machine and
Restricted Boltzmann Machine (RBM) that introduces
multiple layers of hidden units, allowing it to capture
intricate patterns and dependencies in the data.
• A DBM analyzes data and learns how to produce new
examples that are similar to the original data.
Architecture of DBM
• The key concepts are :
• Architecture:
1. Visible Layer: This layer represents the observed data.
2. Hidden Layers: These are multiple layers of hidden units
(neurons) that interact with each other and with the visible
layer.
Architecture of DBM
• Energy-Based Model:
• DBMs define a joint probability distribution over
visible and hidden variables using an energy function.
• The energy of a state (combination of visible and
hidden units) determines its probability.
• The energy function for a DBM with two hidden layers
can be written as:
• E(v, h¹, h²) = − ∑_i v_i b_i − ∑_j h¹_j b¹_j − ∑_k h²_k b²_k
− ∑_{i,j} v_i W_ij h¹_j − ∑_{j,k} h¹_j W′_jk h²_k
Architecture of DBM
• where v is the visible layer, h¹ and h² are the hidden
layers, b, b¹, and b² are the biases, and W and W′ are
the weights connecting the layers.
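• As a sketch, this two-hidden-layer energy function translates directly into
NumPy; the shapes follow the notation above (W is |v| × |h¹|, W′ is |h¹| × |h²|).

# Energy of one (v, h1, h2) configuration in a two-hidden-layer DBM.
import numpy as np

def dbm_energy(v, h1, h2, b, b1, b2, W, W2):
    return (-np.dot(b, v) - np.dot(b1, h1) - np.dot(b2, h2)
            - np.dot(v, W @ h1) - np.dot(h1, W2 @ h2))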
• Training:
• Greedy Layer-Wise Training: Initially, each layer is
trained as an RBM, layer by layer. This simplifies the
learning process.
• Fine-Tuning: After pre-training, the entire network is
fine-tuned using algorithms like Stochastic Gradient
Descent (SGD) to adjust the weights and minimize the error.
Architecture of DBM
• Contrastive Divergence: Often used in training, it
approximates the gradients needed to update the
weights.
• Advantages:
• Representation Power: With multiple layers, DBMs
can learn deep hierarchical representations of data,
capturing complex dependencies.
• Generative Capability: DBMs can generate new
samples from the learned distribution, making them
useful for tasks like data generation and reconstruction.
Architecture of DBM
• Feature Learning: They can learn useful features
for tasks such as classification, making them
versatile in various applications.