Introduction To ANN
Dr.Srinath.S
Pre-requisites
Linear Algebra
Elementary Probability and Statistics
Machine Learning / Pattern Recognition.
Programming skills – Python preferred
Course Outcomes
After completing this course, students should be able to:
Text/Reference Books/Web Resources/CO Mapping
Text Book:
1. Aurelien Geron, Hands-On Machine Learning with Scikit-Learn & TensorFlow, O'Reilly, 2019.
Reference Books:
1. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
2. Charu C. Aggarwal, Neural Networks and Deep Learning, Springer International Publishing, 2018.
3. Andrew W. Trask, Grokking Deep Learning, Manning Publications.
4. Sudharsan Ravichandran, Hands-On Deep Learning Algorithms with Python.
Web Resources:
1. https://onlinecourses.nptel.ac.in/noc20_cs62/preview
2. https://nptel.ac.in/courses/106/105/106105215/
CO-PO Mapping:
CO-1: 1 2 3 1 3 3 1 3 3 3 3 3 2 3 3 3
CO-2: 3 3 3 3 3 2 2 2 1 2 3 2 1 2 3 2
CO-3: 3 1 2 3 1 2 2 2 3 3 3 2 3 2 3 3
0 -- No association, 1 -- Low association, 2 -- Moderate association, 3 -- High association
Assessment                                  Weightage in Marks
Class Test – I                              10
Quiz/Mini Projects/Assignments/Seminars     10
Class Test – II                             10
Quiz/Mini Projects/Assignments/Seminars     10
Class Test – III                            10
Total                                       50
Question Paper Pattern
Semester End Examination (SEE)
Semester End Examination (SEE) is a written examination of three hours' duration, carrying 100 marks with 50% weightage.
Note:
• The question paper consists of TWO parts PART- A and PART- B.
Source:
Material is based on Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools and Techniques (by Aurelien Geron), Wikipedia, and other sources.
UNIT – 1 Introduction to ANN:
Quick look into ML
MACHINE LEARNING
Associate Professor, Department of CS&E
Sri Jayachamarajendra College of Engineering,
JSS Science and Technology University, Mysuru- 570006
Introduction
Artificial Intelligence (AI)
Machine Learning (ML)
Deep Learning (DL)
Data Science
Artificial Intelligence
Artificial intelligence is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals, including humans.
Machine Learning
Machine Learning – a statistical tool to explore the data.
Variants of Machine Learning:
Supervised
Unsupervised
Semi-supervised
Reinforcement Learning
Deep Learning
It is a subset of ML that mimics the human brain.
Summary:
Introduction to ANN
Biological Neural Network to ANN
Biological Neural Network (BNN)
BNN parts
A BNN is composed of a cell body, many branching extensions called dendrites, and one long extension called the axon.
Primarily, the parts of a BNN are:
Cell body
Dendrites – input part
Axon – output part
At its end, the axon splits off into many branches called telodendria; the tips of these branches are called synaptic terminals, or simply synapses.
The synapses of one neuron are connected to the dendrites of other neurons.
Electric impulses called signals are passed from one neuron to another.
A BNN is a collection of billions of neurons, and each neuron is typically connected to thousands of other neurons.
Another view of BNN interconnection
Multiple layers in a biological network
Artificial Neural Network (ANN)
Logical Computations with Neurons
ANNs performing simple logical computations
The first network on the left is simply the identity function: if neuron A is activated, then neuron C gets activated as well (since it receives two input signals from neuron A); but if neuron A is off, then neuron C is off as well.
The second network performs a logical AND: neuron C is activated only when both neurons A and B are activated (a single input signal is not enough).
The third network performs a logical OR: neuron C gets activated if either neuron A or neuron B is activated (or both).
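A minimal sketch of these networks in Python, assuming (as in the source figure) that a neuron fires when it receives at least two incoming signals:

# Threshold neuron: fires (outputs 1) when the sum of incoming signals >= 2.
def neuron(inputs, threshold=2):
    return 1 if sum(inputs) >= threshold else 0

for A in (0, 1):
    for B in (0, 1):
        identity = neuron([A, A])          # two connections from A: copies A
        logical_and = neuron([A, B])       # one connection from each: A AND B
        logical_or = neuron([A, A, B, B])  # two connections from each: A OR B
        print(A, B, identity, logical_and, logical_or)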
A perceptron consists of 4 parts (see the sketch after this list):
input values
weights and a constant/bias
a weighted sum, and
a step function / activation function
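A minimal sketch of these four parts in Python (the input, weight, and bias values are illustrative, not from the slides):

import numpy as np

def perceptron_output(x, w, b):
    # 1-3: weighted sum of the input values plus the constant/bias
    z = np.dot(w, x) + b
    # 4: step/activation function -- fire 1 if the sum crosses the threshold
    return 1 if z >= 0 else 0

x = np.array([1.0, 0.5])   # input values (hypothetical)
w = np.array([0.6, 0.4])   # weights (hypothetical)
b = -0.5                   # constant/bias (hypothetical)
print(perceptron_output(x, w, b))  # 0.6 + 0.2 - 0.5 = 0.3 >= 0, so prints 1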
Linear threshold unit (LTU)
A perceptron can have multiple inputs and a single output, as shown in the previous diagram (a single LTU).
A perceptron is simply composed of a single layer of LTUs.
For example, a 2-input, 3-output perceptron is shown in the next slide.
However, a single-layer perceptron has no hidden layer.
Working of Perceptron
The perceptron works in these simple steps: multiply each input by its weight, sum the weighted inputs together with the bias, and apply the step/activation function to the result.
Step activation function
Comparison between BNN and ANN
Equation for the perceptron learning rule
Perceptron learning rule (weight update)
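The update rule, as given in the source text (Geron), in LaTeX:

w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \, (y_j - \hat{y}_j) \, x_i

where w_{i,j} is the weight between the i-th input and the j-th output neuron, x_i is the i-th input value, \hat{y}_j is the actual output of the j-th output neuron, y_j is its target output, and \eta is the learning rate.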
A perceptron is simply composed of a single layer
of LTUs, with each neuron connected to all the inputs.
MLP : Simplified view
Example for ANN
Shallow or Deep ANN
An MLP can be either shallow or deep.
It is called shallow when it has only one hidden layer (i.e. one layer between input and output).
It is called deep when it has more than one hidden layer (two or more).
This is where the expression DNN (Deep Neural Network) comes from.
So a DNN is a variant of ANN having two or more hidden layers.
Summary
Perceptron: a single-layer neural network without hidden layers. It has only an input and an output layer.
Example: a neural network to find whether the given input is a square, circle, or triangle
CNN
CNN (Convolutional Neural Network):
They are designed specifically for computer vision (though they are sometimes applied elsewhere).
Their name comes from convolutional layers.
They were invented to receive and process pixel data.
RNN
RNN (Recurrent Neural Network):
They are the "time series version" of ANNs.
They are meant to process sequences of data.
They are at the basis of forecast models and language models.
Backward propagation
Measure the network's output error (the difference between the actual and the obtained output).
Tweak the weights to correct, i.e. to reduce, the error.
Move from the output layer to the input layer one step at a time.
Compute how much each neuron in the last hidden layer contributed to each output neuron's error.
Then move on to the next hidden layer in the reverse direction, and so on until the input layer, updating the weights along the way.
Tweaking the weights to reduce the error is called a gradient descent step; a numeric sketch follows.
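A minimal numeric sketch of these steps (an assumed toy example, not from the slides): a one-hidden-layer network with sigmoid activations, trained on XOR with squared error and gradient descent.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden weights
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 1.0                                        # learning rate

for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Measure the output error, then move backward one layer at a time
    d_out = (out - y) * out * (1 - out)   # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # hidden layer's contribution to it
    # Gradient descent step: tweak the weights to reduce the error
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;   b1 -= eta * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]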
Summary… and moving toward activation functions
Linear and Non-linear part of neuron
Need of activation function:
They are used in the hidden and output layers.
An activation function is a function added to an artificial neural network in order to help the network learn complex patterns in the data.
The activation function decides what is to be fired to the next neuron.
Can an ANN work without an activation function?
Common activation functions:
1. Step function
2. Sign function
3. Linear function
4. ReLU
5. Leaky ReLU
6. Tanh
7. Sigmoid
8. Softmax
1. Step Function
2. Sign function
3. Linear function
4. ReLU function
ReLU (Rectified Linear Unit)
It produces the same output for positive values, and 0 for all negative values.
5. Leaky Rectified Linear Unit
Leaky ReLU
Leaky Rectified Linear Unit, or Leaky ReLU,
is a type of activation function based on a
ReLU, but it has a small slope for negative
values instead of a flat slope.
6. Tanh (Hyperbolic Tangent): outputs any value between -1 and +1.
7. Sigmoid Function
Sigmoid: It is used for classification
8. Softmax function: a variant of the sigmoid function used for multi-class classification.
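The activation functions listed above can be sketched in a few lines of NumPy (an illustrative summary, not from the slides):

import numpy as np

def step(z):       return np.where(z >= 0, 1, 0)       # 1. step
def sign_fn(z):    return np.sign(z)                   # 2. sign
def linear(z):     return z                            # 3. linear
def relu(z):       return np.maximum(0, z)             # 4. ReLU
def leaky_relu(z, alpha=0.01):                         # 5. leaky ReLU
    return np.where(z > 0, z, alpha * z)
def tanh_fn(z):    return np.tanh(z)                   # 6. tanh: -1 to +1
def sigmoid(z):    return 1.0 / (1.0 + np.exp(-z))     # 7. sigmoid: 0 to 1
def softmax(z):                                        # 8. softmax: sums to 1
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), softmax(z))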
Logistic Regression
Linear regression is used to handle regression problems, whereas logistic regression is used to handle classification problems.
Linear regression provides a continuous output, but logistic regression provides a discrete output.
Compare linear vs Logistic regression
Linear Regression
• Linear regression is one of the simplest machine learning algorithms; it comes under the supervised learning technique and is used for solving regression problems.
• It is used for predicting a continuous dependent variable with the help of independent variables.
• The goal of linear regression is to find the best-fit line that can accurately predict the output for the continuous dependent variable.
Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms; it comes under supervised learning techniques.
• It can be used for classification as well as for regression problems, but it is mainly used for classification problems.
• Logistic regression is used to predict a categorical dependent variable with the help of independent variables.
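A minimal scikit-learn sketch of the contrast (hypothetical toy data):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)      # one independent variable
y_cont = 2.5 * X.ravel() + 1.0        # continuous dependent variable
y_cls = (X.ravel() > 4).astype(int)   # categorical (0/1) dependent variable

print(LinearRegression().fit(X, y_cont).predict([[12]]))   # continuous output
print(LogisticRegression().fit(X, y_cls).predict([[12]]))  # discrete class label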
Training an MLP with TensorFlow's TF.Learn
A typical data split:
Training (60%)
Validation (20%)
Testing (20%)
TensorFlow and Scikit-learn (sklearn)
How to work with DL algorithms?
You need a programming language; Python is preferred.
Lots of libraries are available, including TensorFlow.
Others are Keras, Theano, Torch, and DL4J.
TensorFlow is from Google, and Keras is now embedded into TensorFlow.
TensorFlow also supports traditional ML algorithms.
What is TensorFlow?
It is from Google.
It was originally developed for large numerical computations.
Later, ML and DL algorithms were introduced into it.
It accepts data in multidimensional arrays called "tensors".
TensorFlow works on the basis of dataflow graphs
In TensorFlow, graphs are created and then executed by creating sessions.
All the external data is fed in via what are known as placeholders, constants, and variables.
To summarize, TensorFlow first builds a computational graph, and in the next step it executes the computational graph.
Tensors
Ranks (dimensions) of tensor
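A quick sketch of tensor ranks (TensorFlow 1.x style, matching the session-based examples later in these slides):

import tensorflow as tf

scalar = tf.constant(3)                         # rank 0: a single number
vector = tf.constant([1, 2, 3])                 # rank 1: a 1-D array
matrix = tf.constant([[1, 2], [3, 4]])          # rank 2: a 2-D array
cube   = tf.constant([[[1], [2]], [[3], [4]]])  # rank 3: a 3-D array
print(scalar.shape, vector.shape, matrix.shape, cube.shape)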
Why use TensorFlow?
Components of TensorFlow:
Constants
Variables
Placeholders
Variables
In tf.Variable, the V must be a capital letter.
The value of a variable can be changed, but that of a constant cannot.
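A minimal sketch (TensorFlow 1.x API, as used elsewhere in these slides):

import tensorflow as tf

c = tf.constant(5)   # value is fixed once created
v = tf.Variable(5)   # note the capital V; the value can be changed

init = tf.global_variables_initializer()  # variables must be initialized
with tf.Session() as sess:
    sess.run(init)
    sess.run(v.assign(10))           # a variable's value can be changed...
    print(sess.run(v), sess.run(c))  # ...but a constant's cannot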
Placeholder
They are used to feed in data from outside: say from a file, an image file, a CSV file, and so on.
feed_dict is popularly used to feed data to a placeholder.
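A minimal placeholder/feed_dict sketch (TensorFlow 1.x API):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None,))  # data will be fed from outside
doubled = x * 2

with tf.Session() as sess:
    # feed_dict supplies the placeholder's value at run time,
    # e.g. values read from a CSV or image file
    print(sess.run(doubled, feed_dict={x: [1.0, 2.0, 3.0]}))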
Constants, Variables and Placeholders…
Create a graph using the above; then you create a session (a session object) and run it.
Every computation you perform is a node in the graph.
Initially, a tf object is created, which is the default graph; it does not yet contain any constants, variables, etc.
Running a session in TensorFlow
The multiplication of 'a' and 'b' is done while running the session (the last statement), as in the sketch below.
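A sketch of that code in the TensorFlow 1.x session API (variable names 'a' and 'b' follow the slide):

import tensorflow as tf

a = tf.constant(3)
b = tf.constant(4)
c = a * b               # only builds a node in the graph; nothing runs yet

with tf.Session() as sess:
    print(sess.run(c))  # the multiplication happens here -> 12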
TensorFlow – where to execute?
When you create a new notebook on colab.research.google.com, TensorFlow comes pre-installed and optimized for the hardware being used. Just import tensorflow as tf and start coding.
TensorFlow can also be executed in a Jupyter Notebook.
Inside the notebook, you can import TensorFlow with the tf alias. Click to run.
Training an MLP with TensorFlow's TF.Learn
The simplest way to train an MLP with TensorFlow is to use the high-level API TF.Learn.
The DNNClassifier class makes it trivial to train a
deep neural network with any number of hidden
layers, and a softmax output layer to output
estimated class probabilities.
For example, the following code trains a DNN for
classification with two hidden layers (one with 300
neurons, and the other with 100 neurons) and a
softmax output layer with 10 neurons.
Piece of code for training an MLP
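A sketch following the TF.Learn example in Geron's book (TensorFlow 1.x contrib API; X_train and y_train are assumed to hold the training data, e.g. MNIST):

import tensorflow as tf

# Create a set of real-valued feature columns from the training set
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)

# DNN with two hidden layers (300 and 100 neurons) and a
# softmax output layer with 10 neurons
dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300, 100],
                                         n_classes=10,
                                         feature_columns=feature_columns)

# Train for 40,000 steps on batches of 50 instances
dnn_clf.fit(x=X_train, y=y_train, batch_size=50, steps=40000)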
Here tf is TensorFlow.
The code creates a set of real-valued feature columns from the training set.
Then it creates the DNNClassifier, with two hidden layers of 300 and 100 neurons and an output layer of 10 neurons.
Finally, the model is trained for 40,000 steps with batches of 50 instances.
Fine-tuning NN Hyperparameters –
Up and Running with TensorFlow
In a simple MLP you can change the number of layers, the number of neurons per layer, the type of activation function, and also the weight-initialization logic.
These are the hyperparameters to be fine-tuned in a neural network.
Number of Hidden Layers
For many problems, you can just begin with a single hidden
layer and you will get reasonable results.
It has actually been shown that an MLP with just one hidden
layer can model even the most complex functions provided
it has enough neurons.
For a long time, these facts convinced researchers that
there was no need to investigate any deeper neural
networks.
But they overlooked the fact that deep networks have a
much higher parameter efficiency than shallow ones.
They can model complex functions using exponentially
fewer neurons than shallow nets, making them much faster
to train
Number of Hidden Layers (contd.)
Very complex tasks, such as large image
classification or speech recognition, typically
require networks with dozens of layers (or even
hundreds) and they need a huge amount of training
data.
However, you will rarely have to train such networks
from scratch: it is much more common to reuse parts
of a pretrained state-of-the-art network that
performs a similar task. Training will be a lot faster
and require much less data
Number of Neurons per Hidden Layers
Usually the number of neurons in the input and output layers is determined by the type of input and output your task requires.
For the hidden layers, the common practice was to size them to form a funnel, with fewer and fewer neurons at each layer.
For example, a typical neural network might have two hidden layers, the first with 300 neurons and the second with 100.
However, this practice is not as common now, and you may simply use the same size for all hidden layers; for example, all hidden layers with 150 neurons.
The number of neurons can be gradually increased until the network starts overfitting.
Activation Functions
In most cases you can use the ReLU activation function in the hidden layers. It is a bit faster to compute than other activation functions.
For the output layer, the softmax activation function
is generally a good choice for classification tasks.
End of Unit - 1