
R22 Machine Learning Lecture Notes

UNIT-II
Multi-Layer Perceptron: Going Forwards, Going Backwards, Back Propagation Error, Multi-
layer perceptron in practice, Examples of using the MLP, Deriving Back-propagation.
Radial Basis Functions and Splines: Concepts, RBF Network, Curse of Dimensionality,
Interpolations and Basis Functions, Support Vector Machine

Multi-Layer Perceptron (MLP):


 The Multilayer Perceptron is a neural network where the mapping between inputs and
output is non-linear.
 A Multilayer Perceptron has input and output layers, and one or more hidden layers
with many neurons stacked together.
 Multilayer Perceptron can use any arbitrary activation function.
 Multilayer Perceptron falls under the category of feedforward algorithms, because
inputs are combined with the initial weights in a weighted sum and subjected to the
activation function just like in the Perceptron.
 But the difference is that each linear combination is propagated to the next layer.
 Each layer feeds the next with the result of its computation, its internal representation of the data. This goes all the way through the hidden layers to the output layer.

Example - XOR Problem:


Input: (A, B) = (1, 0)
At Neuron C:
1x1 + 0x1 + 1x(-0.5) = 1 + 0 - 0.5 = 0.5 > Threshold 0
Neuron C fires, so its output is 1.
At Neuron D:
1x1 + 0x1 + 1x(-1) = 1 + 0 - 1 = 0, which is not greater than Threshold 0
Neuron D does not fire, so its output is 0.
At Neuron E:
1x1 + 0x(-1) + 1x(-0.5) = 1 + 0 - 0.5 = 0.5 > Threshold 0
Neuron E fires, so the output is 1.
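
A short Python sketch of this worked example, using simple threshold neurons and the weights assumed in the calculation above:

```python
def fires(weighted_sum, threshold=0.0):
    # Threshold neuron: output 1 only if the weighted sum exceeds the threshold
    return 1 if weighted_sum > threshold else 0

A, B = 1, 0
C = fires(A * 1 + B * 1 + 1 * (-0.5))      # 0.5 > 0  -> fires
D = fires(A * 1 + B * 1 + 1 * (-1.0))      # 0.0      -> does not fire
E = fires(C * 1 + D * (-1) + 1 * (-0.5))   # 0.5 > 0  -> fires
print(C, D, E)   # 1 0 1  (XOR of 1 and 0 is 1)
```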
Going Forwards:
 Training the MLP consists of two parts: working out what the outputs are for the given
inputs and the current weights, and then updating the weights according to the error, which
is a function of the difference between the outputs and the targets.
 These are generally known as going forwards and backwards through the network.
 Each neuron in the network (whether in a hidden layer or the output layer) has one extra input with a fixed value; this extra input is called the bias.
Going Backwards- Back Propagation of Error:
 Back-propagation of error makes it clear that the errors are sent backwards through the
network.
 It is a form of gradient descent.
 The problem is that when we try to adapt the weights of the Multi-layer Perceptron, we
have to work out which weights caused the error.
 This could be the weights connecting the inputs to the hidden layer, or the weights
connecting the hidden layer to the output layer.
 We use the sum-of-squares error function, which calculates the difference between y and t for each output node, squares them, and adds them all together:

E = (1/2) Σ_k (y_k - t_k)²

where y is the output, t is the target, and the sum runs over the N output nodes k = 1, ..., N.


 If we differentiate a function, it tells us the gradient of that function, which is the direction along which it increases and decreases most rapidly.
 If we differentiate the error function, we get the gradient of the error.
 Since the purpose of learning is to minimise the error, we follow the error function downhill.


 We need an activation function that looks like a threshold function but is differentiable
so that we can compute the gradient.
Activation Functions:
 The activation function basically decides whether a neuron should be activated or not.
 The activation function is a non-linear transformation that we do over the input before
sending it to the next layer of neurons or finalizing it as output.
Sigmoid Function:

sigmoid(x) = 1 / (1 + e^(-x))
 The Sigmoid activation function, also known as the logistic activation function,
takes inputs and turns them into outputs ranging between 0 and 1.
 For this reason, sigmoid is referred to as the “squashing function” and is
differentiable.
 Larger, more positive inputs should produce output values close to 1.0, with
smaller, more negative inputs producing outputs closer to 0.0.
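
A minimal Python/NumPy sketch of the sigmoid and its derivative (the identity σ'(x) = σ(x)(1 − σ(x)) is the form used later in back-propagation):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Uses the identity sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~[0.0067, 0.5, 0.9933]
```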


Hyperbolic Tangent Function:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

 The Tanh function is very similar to the sigmoid/logistic activation function and has the same S-shape, with the difference that its output range is -1 to 1.
 In Tanh, the larger the input (more positive), the closer the output value will be to
1.0, whereas the smaller the input (more negative), the closer the output will be to
-1.0.

Multi-Layer Perceptron Algorithm:


1. An input vector is put into the input nodes
2. The inputs are fed forward through the network
o The inputs and the first-layer weights (here labelled as v) are used to decide whether
the hidden nodes fire or not.


o The activation function g(·) is the sigmoid function


o The outputs of these neurons and the second-layer weights (labelled as w) are used
to decide if the output neurons fire or not
3. The error is computed as the sum-of-squares difference between the network outputs and the
targets
4. This error is fed backwards through the network in order to
o First update the second-layer weights and then afterwards, the first-layer weights
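
A compact NumPy sketch of one forward and backward pass for this two-layer MLP, with sigmoid hidden and output units and the sum-of-squares error. The weight names v and w match the notes; the learning rate eta, the bias handling and the layer sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def train_step(x, t, v, w, eta=0.25):
    # --- Going forwards ---
    x_b = np.append(x, 1.0)            # input plus bias node
    a = sigmoid(v @ x_b)               # hidden activations (first-layer weights v)
    a_b = np.append(a, 1.0)            # hidden activations plus bias node
    y = sigmoid(w @ a_b)               # network outputs (second-layer weights w)

    # --- Going backwards (back-propagation of error) ---
    delta_o = (y - t) * y * (1.0 - y)                   # output-layer error terms
    delta_h = a * (1.0 - a) * (w[:, :-1].T @ delta_o)   # hidden-layer error terms

    w -= eta * np.outer(delta_o, a_b)   # update second-layer weights first
    v -= eta * np.outer(delta_h, x_b)   # then update first-layer weights
    error = 0.5 * np.sum((y - t) ** 2)  # sum-of-squares error for monitoring
    return v, w, error

# Example: one training step on an XOR-style input
rng = np.random.default_rng(0)
v = rng.uniform(-0.5, 0.5, size=(2, 3))   # 2 hidden neurons, 2 inputs + bias
w = rng.uniform(-0.5, 0.5, size=(1, 3))   # 1 output neuron, 2 hidden + bias
v, w, err = train_step(np.array([1.0, 0.0]), np.array([1.0]), v, w)
print(err)
```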


Improvements for the MLP Algorithm:


Initializing the weights:
 The MLP algorithm suggests that the weights are initialised to small random numbers,
both positive and negative.

 A common trick is to set the weights in the range -1/√n < w < 1/√n, where n is the number of nodes in the input layer that feed into those weights.
 We use random values for the initialisation so that the learning starts off from different places for each run, and we keep them all about the same size because we want all of the weights to reach their final values at about the same time. This is known as uniform learning.
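
A one-line NumPy illustration of this initialisation trick (the layer sizes are arbitrary):

```python
import numpy as np

n_in, n_hidden = 4, 5                     # example layer sizes (assumed)
limit = 1.0 / np.sqrt(n_in)
v = np.random.uniform(-limit, limit, size=(n_hidden, n_in + 1))  # +1 for the bias weight
```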
Different Output Activation Functions:
 The sigmoid activation function in the output layer is fine for binary classification problems.
 For regression problems, we use a linear activation function in the output layer.
 For multi-class classification, we use the softmax activation function.

Sequential and Batch Training:


 In Batch Training, we only update the weights once for each iteration of the algorithm,
which means that the weights are moved in the direction that most of the inputs want
them to move, rather than being pulled around by each input individually.
 The batch method performs a more accurate estimate of the error gradient, and will thus
converge to the local minimum more quickly.
 In Sequential Training, the errors are computed and the weights updated after each input.
 This is not guaranteed to be as efficient in learning, but it is simpler to program.
 Since it does not converge as well, it can also sometimes avoid local minima, thus
potentially reaching better solutions.
Local Minima:
 A loss function is a function that measures the error between a model’s predictions and
the ground truth.
 The goal of machine learning is to find a model that minimizes the loss function.


 A local minimum is a point in the parameter space where the loss function is minimized
in a local neighborhood.
 A global minimum is a point in the parameter space where the loss function is
minimized globally.

Picking Up Momentum:
 Momentum in neural networks is a parameter optimization technique that accelerates
gradient descent by adding a fraction of the previous weight update to the current
weight update.
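
As a worked form of this rule, the update with momentum can be written as Δw(t) = −η ∇E + α Δw(t−1), where α is the momentum coefficient (commonly around 0.9). A minimal sketch, with the gradient assumed to come from elsewhere:

```python
import numpy as np

def momentum_update(w, grad, velocity, eta=0.1, alpha=0.9):
    # Add a fraction (alpha) of the previous update to the current gradient step
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

w = np.zeros(3)
velocity = np.zeros_like(w)
w, velocity = momentum_update(w, grad=np.array([0.2, -0.1, 0.05]), velocity=velocity)
```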


Minibatches and Stochastic Gradient Descent:


 The Minibatch method finds a middle way between the batch and sequential algorithms: the training set is split into random batches, the gradient is estimated on one batch and a weight update performed, then the next batch is used to estimate a new gradient and perform the next update, and so on until all of the training set has been used.
 If the batches are small, then there is often a reasonable degree of error in the gradient
estimate, and so the optimisation has the chance to escape from local minima.
 In Stochastic Gradient Descent method, a single input vector is chosen from the training
set, and the output and hence the error for that one vector computed, and this is used to
estimate the gradient and so update the weights.
 A new random input vector is then chosen and the process repeated. This is known as
stochastic gradient descent.
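
A minimal NumPy sketch of the minibatch loop; the update_weights callable is a stand-in for a routine like the train_step sketched earlier, handling the forward/backward pass and weight update for one batch:

```python
import numpy as np

def run_epoch(X, T, batch_size, update_weights):
    # Shuffle the training set, then process it one random minibatch at a time
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        update_weights(X[idx], T[idx])   # estimate gradient on this batch and update

# Stochastic gradient descent is the special case batch_size = 1:
# run_epoch(X, T, batch_size=1, update_weights=...)
```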
Multi-layer perceptron in practice:
 Here we can discuss choices that can be made about the network in order to use it for
solving real problems.
Amount of Training Data:
 For the MLP with one hidden layer there are (L + 1) × M + (M + 1) × N weights, where L, M, N are the number of nodes in the input, hidden, and output layers, respectively.
 The extra +1s come from the bias nodes, which also have adjustable weights.
 This is a potentially huge number of adjustable parameters that we need to set during
the training phase.
 Setting the values of these weights is the job of the back-propagation algorithm, which
is driven by the errors coming from the training data.
 Clearly, the more training data there is, the better for learning, although the time that
the algorithm takes to learn increases.
 Unfortunately, there is no way to compute what the minimum amount of data required
is, since it depends on the problem.
 A rule of thumb is that you should use a number of training examples that is at least 10 times the number of weights. For example, a network with L = 4 inputs, M = 5 hidden nodes and N = 3 outputs has (4 + 1) × 5 + (5 + 1) × 3 = 43 weights, suggesting roughly 430 training examples.
 This is probably going to be a very large number of examples, so neural network
training is a fairly computationally expensive operation, because we need to show the
network all of these inputs lots of times.

Number of Hidden Layers:


 Two Choices
o The number of hidden nodes
o The number of hidden layers
 It is possible to show mathematically that one hidden layer with lots of hidden nodes
is sufficient. This is known as the Universal Approximation Theorem.
 We will never normally need more than two layers (that is, one hidden layer and the output layer).


When to stop Learning:

 The training of the MLP requires that the algorithm runs over the entire dataset many
times, with the weights changing as the network makes errors in each iteration.
 Two options
o Predefined number of Iterations
o Predefined minimum error reached
 Using both of these options together can help, as can terminating the learning once the
error stops decreasing.
 We train the network for some predetermined amount of time, and then use the
validation set to estimate how well the network is generalising.
 We then carry on training for a few more iterations, and repeat the whole process.
 At some stage the error on the validation set will start increasing again, because the
network has stopped learning about the function that generated the data, and started to
learn about the noise that is in the data itself.
 At this stage we stop the training. This technique is called early stopping.
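
A minimal sketch of early stopping; train_one_epoch and validation_error are hypothetical callables standing in for the network being trained:

```python
def train_with_early_stopping(train_one_epoch, validation_error, patience=10, max_epochs=1000):
    # train_one_epoch() runs one pass over the training data and updates the weights;
    # validation_error() returns the current error on the held-out validation set.
    best_error = float("inf")
    stale = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_error = validation_error()
        if val_error < best_error:
            best_error, stale = val_error, 0
        else:
            stale += 1
        if stale >= patience:
            break  # validation error has stopped improving: stop early
    return best_error
```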

Examples of using the MLP:


 We will apply the MLP to four different types of problem: regression, classification, time-series prediction, and data compression.
Regression:
 Regression is a statistical technique that is used for predicting continuous outcomes.
 If you want to predict a single value, you only need a single output neuron and if you want
to predict multiple values, you can add multiple output neurons.
 In general, we don't apply any activation function to the output layer of an MLP when dealing with regression tasks; it just does the weighted sum and sends the output.
 But if you want the value to lie in a given range, for example between -1 and +1, you can use an activation like the Tanh (Hyperbolic Tangent) function.


 The loss functions that can be used in Regression MLP include Mean Squared Error(MSE)
and Mean Absolute Error(MAE).
 MSE can be used on datasets with few outliers, while MAE is a better measure on datasets with more outliers.
 Example: Rainfall prediction, Stock price prediction
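
A hedged scikit-learn sketch of a regression MLP (scikit-learn is not part of the notes; MLPRegressor uses a linear/identity output activation and minimises the squared error, as described above; the hidden-layer size and synthetic data are arbitrary):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy regression problem: learn y = sin(x) from noisy samples
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict([[1.0]]))   # should be close to sin(1.0) ~ 0.84
```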

Classification:
 If the output variable is categorical, then we have to use classification for prediction.
Example: Iris Flower classification

 The aim is to classify iris flowers among three species (Setosa, Versicolor, or Virginica)
from the sepals’ and petals’ length and width measurements.
 The above neural network has one input layer, two hidden layers and one output layer.
 In the hidden layers we use sigmoid as an activation function for all neurons.
 In the output layer, we use softmax as an activation function for the three output
neurons.
 In this regard, all outputs are between 0 and 1, and their sum is 1.
 The neural network has three outputs since the target variable contains three classes
(Setosa, Versicolor, and Virginica).
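
A hedged scikit-learn sketch of this kind of Iris classifier (not part of the notes: MLPClassifier applies softmax at the output layer for multi-class problems, and the 'logistic' option gives sigmoid hidden units as described above; the two hidden layers of 10 neurons are an arbitrary choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)   # 4 features; 3 classes: Setosa, Versicolor, Virginica
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(10, 10), activation='logistic',
                    max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))       # classification accuracy
print(clf.predict_proba(X_test[:1]))   # softmax outputs: three probabilities summing to 1
```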


Working of Softmax:

 The input object belongs to Class 2 (66.4%)
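
A small Python sketch of the softmax computation itself (the example scores are hypothetical, not the ones from the figure):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, then normalise the exponentials
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 0.5])))   # three probabilities that sum to 1
```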


Time series Prediction:
 There is a common data analysis task known as time-series prediction, where we have
a set of data that show how something varies over time, and we want to predict how the
data will vary in the future.
 The problem is that even if there is some regularity in the time-series, it can appear over
many different scales. For example, there is often seasonal variation in temperatures.
 Example: A typical time-series problem is to predict the ozone levels into the future
and see if you can detect an overall drop in the mean ozone level.
Data Compression / Data denoising:
 We train the network to reproduce the inputs at the output layer; this is called auto-associative learning.
 These networks are known as autoencoders.
 The network is trained so that whatever you give as the input is reproduced at the output,
which doesn’t seem very useful at first, but suppose that we use a hidden layer that has
fewer neurons than the input layer.
 This bottleneck hidden layer has to represent all of the information in the input, so that
it can be reproduced at the output.
 It therefore performs some compression of the data, representing it using fewer
dimensions than were used in the input.


 The network finds a different representation of the input data, one that extracts the important components of the data and ignores the noise.
 This auto-associative network can be used to compress images and other data.
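
One hedged way to sketch this idea with scikit-learn's MLPRegressor (an illustrative choice, not the notes' method): train with the inputs as their own targets, using a hidden layer smaller than the input as the bottleneck. The data and layer sizes below are arbitrary:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))        # hidden 3-D structure
X = latent @ rng.normal(size=(3, 8))      # embedded in 8 dimensions

# Auto-associative training: the targets are the inputs themselves,
# and the 3-neuron hidden layer is the compression bottleneck.
autoencoder = MLPRegressor(hidden_layer_sizes=(3,), max_iter=5000, random_state=0)
autoencoder.fit(X, X)

reconstruction = autoencoder.predict(X)
print(np.mean((X - reconstruction) ** 2))   # reconstruction error
```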

Deriving Back-propagation:
Things to know:
1. The derivative of ½x² is x.
2. Chain rule: if y = f(g(x)), then dy/dx = (df/dg) · (dg/dx).
3. The derivative of a constant is zero (anything that does not depend on x differentiates to zero).

The Network output and the Error:


 The output of the neural network (the end of the forward phase of the algorithm) is a
function of three things:
 the current input (x)
 the activation function g(·) of the nodes of the network
 the weights of the network (v for the first layer and w for the second)
 We can’t change the inputs, since they are what we are learning about, nor can we
change the activation function as the algorithm learns.
 So the weights are the only things that we can vary to improve the performance of the
network, i.e., to make it learn.

 Here, x represents inputs


 v represents first set of weights
 w represents second set of weights
 y represents outputs


 Note that i is an index over the input nodes, j is an index over the hidden layer neurons,
and k is an index over the output neurons.
The Error of the Network:
 The error function E(v, w) reminds us that the only things that we can change are the weights v and w.
 We will choose the sum-of-squares error function:

E(v, w) = (1/2) Σ_k (y_k - t_k)²
 We are going to use a gradient descent algorithm that adjusts each weight.
 The gradient that we want to know is how the error function changes with respect to
the different weights

Requirements of an Activation Function:


In order to model a neuron we want an activation function that has the following properties:

 it must be differentiable so that we can compute the gradient


 it should saturate (become constant) at both ends of the range, so that the neuron
either fires or does not fire
 it should change between the saturation values fairly quickly in the middle


There is a family of functions, called sigmoid functions because they are S-shaped, that satisfy all of these criteria.

Back propagation of Error:


 To see how the error changes as the second-layer weights vary, we need the chain rule:

∂E/∂w_jk = (∂E/∂h_k) · (∂h_k/∂w_jk)

where h_k is the summed input to output neuron k. Since we don't know much about the input to a neuron, only about its output y_k = g(h_k), we can use the chain rule again:

∂E/∂h_k = (∂E/∂y_k) · (∂y_k/∂h_k)


The important thing that we need to remember is that the inputs to the output-layer neurons come from the activations of the hidden-layer neurons multiplied by the second-layer weights:

h_k = Σ_j a_j w_jk

where a_j is the activation of hidden neuron j (including the bias node).
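Carrying this through with a sigmoid output activation gives the familiar update rules (a sketch of the standard result, with η the learning rate, x_i the inputs, a_j the hidden activations, and δ the error terms):

```latex
\delta_{o,k} = (y_k - t_k)\, y_k (1 - y_k),
\qquad w_{jk} \leftarrow w_{jk} - \eta\, \delta_{o,k}\, a_j
\\[4pt]
\delta_{h,j} = a_j (1 - a_j) \sum_k w_{jk}\, \delta_{o,k},
\qquad v_{ij} \leftarrow v_{ij} - \eta\, \delta_{h,j}\, x_i
```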

The output Activation Functions:


 For regression case, we use linear activation function.
 For multiclass classification , we use softmax activation function.

For softmax, y_k = exp(h_k) / Σ_j exp(h_j), and its derivative is:

∂y_k/∂h_j = y_k (δ_kj - y_j), where δ_kj is 1 if k = j and 0 otherwise.

Radial Basis Functions and Splines:


Receptive Fields:
 The Receptive Field (RF) is defined as the size of the area in the input that creates the
feature.
 It is essentially a measure of the relationship of an output feature (of any layer) with
the input area (patch).
Radial Basis Function(RBF) Network:
 Radial Basis Function (RBF) Networks are a specialized type of Artificial Neural
Network (ANN) used primarily for function approximation tasks.
 Radial Basis Function (RBF) Networks are a special category of feed-forward neural
networks comprising three layers:
o Input Layer: Receives input data and passes it to the hidden layer.
o Hidden Layer/RBF Layer: The core computational layer where RBF neurons
process the data.
o Output Layer: Produces the network’s predictions, suitable for classification or
regression tasks.

 RBF Networks are conceptually similar to K-Nearest Neighbor (k-NN) models, though
their implementation is distinct.
 The fundamental idea is that an item’s predicted target value is influenced by nearby
items with similar predictor variable values.
 Here’s how RBF Networks operate:
o Input Vector: The network receives an n-dimensional input vector that needs
classification or regression.
o RBF Neurons: Each neuron in the hidden layer represents a prototype vector
(center, radius/spread) from the training set. The network computes the
Euclidean distance between the input vector and each neuron’s center.
o Activation Function: The Euclidean distance is transformed using a Radial
Basis Function (typically a Gaussian function) to compute the neuron’s
activation value. This value decreases exponentially as the distance increases.


o Output Nodes: Each output node calculates a score based on a weighted sum of
the activation values from all RBF neurons. For classification, the category with
the highest score is chosen.
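
A minimal NumPy sketch of the RBF network forward pass described above, assuming the centres and the spread σ have already been chosen (for example by k-means or by picking training points) and the output weights trained separately:

```python
import numpy as np

def rbf_forward(x, centres, sigma, out_weights):
    # Gaussian activations: large when x is close to a centre, near zero far away
    distances = np.linalg.norm(centres - x, axis=1)
    activations = np.exp(-(distances ** 2) / (2.0 * sigma ** 2))
    # Output layer: a weighted sum of the RBF activations
    return out_weights @ activations

centres = np.array([[0.0, 0.0], [1.0, 1.0]])   # two prototype vectors (assumed)
out_weights = np.array([[0.5, -0.2]])          # one output node
print(rbf_forward(np.array([0.1, 0.0]), centres, sigma=0.5, out_weights=out_weights))
```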

Advantages of RBF Networks:


 Universal Approximation: RBF Networks can approximate any continuous function
with arbitrary accuracy given enough neurons.
 Faster Learning: The training process is generally faster compared to other neural
network architectures.
 Simple Architecture: The straightforward, three-layer architecture makes RBF
Networks easier to implement and understand.
Interpolation:
 Interpolation is estimating or measuring an unknown quantity between two known
quantities.
 Interpolation uses the values of nearby known points to predict the value at an unknown point.
 The process of interpolation involves fitting a smooth curve (in the simplest case a straight line) between known data points.
 In linear interpolation, the unknown value is read off the straight line joining the two surrounding points.
 Interpolation is useful because it lets us estimate values at positions that were never measured directly.
 The (linear) interpolation formula is:

y = y1 + (x - x1) · (y2 - y1) / (x2 - x1)

Example: if a child's height was measured at age 5 and age 6, interpolation could be
used to estimate the child's height at age 5.5.
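
A tiny Python sketch of this example, using the linear interpolation formula above with assumed heights at ages 5 and 6:

```python
def linear_interpolate(x, x1, y1, x2, y2):
    # Straight-line estimate of y at a point x between (x1, y1) and (x2, y2)
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

# Hypothetical heights: 108 cm at age 5 and 114 cm at age 6
print(linear_interpolate(5.5, 5, 108, 6, 114))   # 111.0 cm at age 5.5
```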
Basis Function:
 Radial basis functions and several other machine learning algorithms can be written as a weighted sum of basis functions:

f(x) = Σ_i w_i φ_i(x)

where the φ_i are fixed basis functions and the w_i are learned weights.
The Cubic Spline:


 We can continue to make the functions more complicated, with the important point
being how many degrees of continuity we require at the boundaries between the points.
 These functions are known as splines, and the most common one to use is the cubic
spline.
 Cubic Spline Interpolation: this type of interpolation connects successive data points with piecewise cubic polynomials, chosen so that the curve and its first and second derivatives are continuous at the joins (knots).
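
A hedged SciPy sketch of cubic-spline interpolation (scipy.interpolate.CubicSpline; the sample points are arbitrary):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)                      # known values at the data points

spline = CubicSpline(x, y)         # piecewise cubic with continuous 1st/2nd derivatives
print(spline(1.5), np.sin(1.5))    # interpolated value vs. the true value
```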


Curse of Dimensionality:
 The Curse of Dimensionality refers to the phenomenon where the efficiency and effectiveness of algorithms deteriorate rapidly as the dimensionality of the data increases.
 It is crucial to understand this concept because as the number of features or dimensions
in a dataset increases, the amount of data we need to generalize accurately grows
exponentially.
 Dimensions refer to the features or attributes of data.
 For instance, if we consider a dataset of houses, the dimensions could include the
house's price, size, number of bedrooms, location, and so on.
What problems does it cause?

1. Data sparsity. As mentioned, data becomes sparse, meaning that most of the high-
dimensional space is empty. This makes clustering and classification tasks challenging.
2. Increased computation. More dimensions mean more computational resources and
time to process the data.


3. Overfitting. With higher dimensions, models can become overly complex, fitting to
the noise rather than the underlying pattern. This reduces the model's ability to
generalize to new data.
4. Distances lose meaning. In high dimensions, the difference in distances between data
points tends to become negligible, making measures like Euclidean distance less
meaningful.
5. Performance degradation. Algorithms, especially those relying on distance
measurements like k-nearest neighbors, can see a drop in performance.
6. Visualization challenges. High-dimensional data is hard to visualize, making
exploratory data analysis more difficult.

How to Solve the Curse of Dimensionality?


 The primary solution to the curse of dimensionality is "dimensionality reduction."
 It's a process that reduces the number of random variables under consideration by
obtaining a set of principal variables.
 By reducing the dimensionality, we can retain the most important information in the
data while discarding the redundant or less important features.
Dimensionality Reduction Methods:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Factor Analysis
 Independent Component Analysis(ICA)
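
A short scikit-learn sketch of PCA, the first of the methods listed above (the digits dataset and two components are arbitrary illustrative choices):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 64-dimensional inputs (8x8 images)
X_reduced = PCA(n_components=2).fit_transform(X)
print(X.shape, X_reduced.shape)            # (1797, 64) -> (1797, 2)
```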
Support Vector Machine:
 Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.
 However, primarily, it is used for Classification problems in Machine Learning.
 The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.
 SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine.


 SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM:

 Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be separated into two classes by a single straight line, it is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
 Non-linear SVM: Non-Linear SVM is used for non-linearly separable data; if a dataset cannot be separated by a straight line, it is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.

Hyperplane:

 There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of the SVM.
 The dimension of the hyperplane depends on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line.
 And if there are 3 features, the hyperplane is a 2-dimensional plane.


 We always choose the hyperplane with the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of each class.

Support Vectors:

 The data points or vectors that are the closest to the hyperplane and which affect the
position of the hyperplane are termed as Support Vector. Since these vectors support
the hyperplane, hence called a Support vector.

Linear SVM:

 Suppose we have a dataset that has two tags (green and blue), and the dataset has two
features x1 and x2. We want a classifier that can classify the pair(x1, x2) of coordinates
in either green or blue. Consider the below image:

 As this is a 2-D space, we can easily separate these two classes just by using a straight line. But there can be multiple lines that can separate these classes. Consider the below image:


 Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called as a hyperplane.
 The SVM algorithm finds the points of both classes that are closest to the line. These points are called support vectors.
 The distance between the vectors and the hyperplane is called as margin.
 And the goal of SVM is to maximize this margin.
 The hyperplane with maximum margin is called the optimal hyperplane.
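
As a sketch of the optimisation behind this (the standard hard-margin formulation, not spelled out in these notes): with class labels t_i ∈ {−1, +1}, the hyperplane is w·x + b = 0, the margin is 2/‖w‖, and maximising the margin is equivalent to

```latex
\min_{\mathbf{w}, b} \ \tfrac{1}{2}\|\mathbf{w}\|^2
\quad \text{subject to} \quad
t_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \ \ \text{for all } i
```

The support vectors are exactly the points for which this constraint holds with equality.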

Non-Linear SVM:

 If data is linearly arranged, then we can separate it by using a straight line, but for non-
linear data, we cannot draw a single straight line. Consider the below image:


 So to separate these data points, we need to add one more dimension. For linear data,
we have used two dimensions x and y, so for non-linear data, we will add a third
dimension z. It can be calculated as:

z = x² + y²

 By adding the third dimension, the sample space will become as below image:

 So now, SVM will divide the datasets into classes in the following way. Consider the
below image:


Kernels:
 The most interesting feature of SVM is that it can even work with a non-linear dataset; for this, we use the "Kernel Trick", which makes it easier to classify the points.
Suppose we have a dataset like this:

 Here we see we cannot draw a single line or say hyperplane which can classify the
points correctly.
 So we convert this lower dimension space to a higher dimension space using some
quadratic functions which will allow us to find a decision boundary that clearly divides
the data points.
 The functions which help us to do this are called Kernels and which kernel to use is
purely determined by hyperparameter tuning.

Different Kernel functions:


 Polynomial Kernel

K(x, y) = (x · y + 1)^d

 Here d is the degree of the polynomial, which we need to specify manually.


 Suppose we have two features X1 and X2 and output variable as Y, so using
polynomial kernel we can write it as:

 So we basically need to find X1², X2² and X1·X2, and now we can see that 2 dimensions got converted into 5 dimensions.


 Sigmoid Kernel

 It simply takes the inputs and maps them to values between 0 and 1 so that they can be separated by a simple straight line.
 RBF Kernel

 It creates non-linear combinations of the features to lift the samples onto a higher-dimensional feature space where a linear decision boundary can separate the classes.
 It is the most used kernel in SVM classification; the following formula expresses it mathematically:

K(x, y) = exp(-γ ‖x - y‖²)

where γ controls the width of the Gaussian.
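
A hedged scikit-learn sketch comparing these kernels on a toy non-linearly separable dataset (make_circles and the parameter values are arbitrary illustrative choices):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel, degree=3, gamma="scale", C=1.0)
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))   # the RBF kernel should do best here
```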
The Support Vector Machine Algorithm:

– identify the support vectors as those that are within some specified distance of the
closest point and dispose of the rest of the training data
– compute b* using equation


Advantages of SVM:
 SVM works well when the data is linearly separable
 It is more effective in high dimensions
 With the help of the kernel trick, we can solve any complex problem
 SVM is not sensitive to outliers
 Can help us with Image classification

Disadvantages of SVM:
 Choosing a good kernel is not easy
 It doesn’t show good results on a large dataset
 The main SVM hyperparameters are the cost C and gamma; it is not easy to fine-tune these hyperparameters, and it is hard to visualize their impact.

******

