MACHINE LEARNING WITH PYTHON
SEMESTER 5 - UNIT 3
HI COLLEGE
ARTIFICIAL NEURAL NETWORKS
HEBBNET
HebbNet, also known as a Hebbian network, is a type of artificial neural
network (ANN) that follows the Hebbian learning rule.
The Hebbian learning rule states that "neurons that fire together wire
together". It means that when two connected neurons have correlated
activities, the strength of their connection is increased. This rule captures
the idea that the connections between neurons are strengthened through
repeated activation.
In HebbNet, the weights between neurons are adjusted based on the correlation
of their activities. If two connected neurons are active at the same time, the
weight between them is increased. Conversely, if their activities are opposite,
the weight is decreased.
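For each training pattern, the Hebb rule updates the weights as delta_w = x * t
(and the bias as delta_b = t). Below is a minimal NumPy sketch, using bipolar
(-1/+1) inputs and the logical AND function as a toy task; the data and names
are illustrative and not part of the original notes.

import numpy as np

# Hebbian learning sketch: a single neuron learns logical AND
# with bipolar (-1/+1) inputs and targets.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])   # input patterns
t = np.array([1, -1, -1, -1])                        # AND targets

w = np.zeros(2)   # weights
b = 0.0           # bias

for x_i, t_i in zip(X, t):
    w += x_i * t_i        # Hebb rule: delta_w = x * t
    b += t_i              # delta_b = t

print("weights:", w, "bias:", b)
print("outputs:", np.sign(X @ w + b))   # matches the targets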
PERCEPTRON
A perceptron is a simple type of artificial neural network (ANN) used for
binary classification tasks. It computes a weighted sum of its inputs, applies
a threshold (step) activation, and adjusts its weights whenever a training
sample is misclassified.
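The perceptron rule only changes the weights when a sample is misclassified:
w = w + lr * (y - y_hat) * x. A small sketch follows, learning the logical OR
function; the data, learning rate, and epoch count are assumed for illustration.

import numpy as np

# Perceptron sketch: binary classification of the logical OR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])       # OR targets

w = np.zeros(2)
b = 0.0
lr = 0.1                         # learning rate (assumed)

for epoch in range(10):
    for x_i, y_i in zip(X, y):
        y_hat = 1 if x_i @ w + b > 0 else 0      # step (threshold) activation
        w += lr * (y_i - y_hat) * x_i            # update only on mistakes
        b += lr * (y_i - y_hat)

print([1 if x_i @ w + b > 0 else 0 for x_i in X])   # expected [0, 1, 1, 1]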
ADALINE
Adaline (Adaptive Linear Neuron) is another type of artificial neural network
(ANN) that is closely related to the perceptron. While the perceptron learns
from the thresholded (binary) output, Adaline learns from the continuous linear
output by minimizing the mean squared error (the delta rule), which also makes
it suitable for regression or continuous output prediction.
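Adaline's delta rule moves the weights along the negative gradient of the mean
squared error computed on the linear output. The sketch below fits a simple
linear relation on synthetic data; the data, learning rate, and epoch count are
assumptions made for illustration.

import numpy as np

# Adaline (Widrow-Hoff / delta rule) sketch: fit y = w*x + b by minimizing MSE.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=100)   # synthetic targets

w = np.zeros(1)
b = 0.0
lr = 0.5

for epoch in range(200):
    y_hat = X @ w + b                  # linear (identity) activation
    error = y - y_hat
    w += lr * (X.T @ error) / len(X)   # delta rule: follow the negative MSE gradient
    b += lr * error.mean()

print("learned w ~", w[0], ", b ~", b)   # approaches 3.0 and 0.5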
MULTILAYER PERCEPTRON (MLP)
A multilayer perceptron is a feedforward ANN consisting of an input layer, one
or more hidden layers, and an output layer of interconnected neurons.
Activation function: Each neuron in the hidden layers and the output layer
applies an activation function to its inputs, transforming the input values
into a more useful form. Common activation functions used in MLPs include
sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).
Training: The weights and biases of the network are initially assigned
randomly, and the network goes through a training process to adjust these
parameters. Backpropagation, a popular training algorithm, is commonly
used in MLPs. It calculates the gradient of the network's error with respect to
its weights and biases and then updates these parameters iteratively to
minimize the error.
1. Input Layer: The input layer accepts the input data and passes it to the next
layer. The number of neurons in the input layer depends on the number of
input features or variables.
2. Hidden Layers: Hidden layers are intermediary layers between the input and
output layers. They play a crucial role in capturing and learning complex
patterns and representations from the input data. The number of hidden layers
and the number of neurons in each hidden layer are design choices and
depend on the specific problem and data.
3. Output Layer: The output layer produces the final predictions or outputs of
the neural network. The number of neurons in the output layer depends on the
type of task. For example, in binary classification, there may be a single neuron
with a sigmoid activation function, while in multi-class classification, there may
be multiple neurons with softmax activation.
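As a quick illustration of this layer structure in Python, the sketch below
builds a small MLP with scikit-learn (assuming scikit-learn is available; the
dataset, layer size, and other settings are arbitrary choices, not from the
notes).

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy binary classification data: 4 input features, labels from a simple rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Input layer: one neuron per feature; one hidden layer of 8 ReLU neurons;
# a single logistic (sigmoid) output neuron for the binary decision.
clf = MLPClassifier(hidden_layer_sizes=(8,), activation='relu',
                    max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))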
Leaky ReLU - Formula: f(x) = max(0.01x, x) (or any other small positive slope
instead of 0.01)
Softmax - Formula: f(x_i) = exp(x_i) / sum_j(exp(x_j)) for each element x_i in
the input vector x
Linear (identity) - Formula: f(x) = x
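The activation functions mentioned in this unit can be written directly in
NumPy; the following sketch is illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):       # f(x) = max(alpha*x, x)
    return np.maximum(alpha * x, x)

def softmax(x):
    e = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return e / e.sum()

def linear(x):                       # identity: f(x) = x
    return x

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), leaky_relu(z), softmax(z))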
LOSS FUNCTION
A loss function, also known as a cost function or objective function, measures
the discrepancy between the predicted output of a model and the true target
values. It quantifies how well the model is performing and guides the learning
process by updating the model's parameters to minimize the loss.
The choice of the loss function depends on the type of task being solved. Here
are some commonly used loss functions for different types of problems:
1. Mean Squared Error (MSE): MSE is a common loss function for regression
problems, where the aim is to predict continuous values. It calculates the
average squared difference between the predicted and true values.
2. Mean Absolute Error (MAE): MAE is another loss function for regression tasks.
It calculates the average absolute difference between the predicted and true
values. MAE is easier to interpret than MSE and is less sensitive to outliers
(see the sketch after this list).
3. Hinge Loss: Hinge loss is used in binary and multi-class classification tasks,
particularly in support vector machines (SVMs). It encourages correct
classification by penalizing misclassifications based on a margin.
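The loss functions above translate directly into NumPy; the sketch below is for
illustration, with made-up predictions and targets.

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)        # mean squared error

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))       # mean absolute error

def hinge(y_true, y_pred):
    # y_true in {-1, +1}; y_pred is the raw margin score
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

y_true = np.array([1.0, -1.0, 1.0])
y_pred = np.array([0.8, -0.5, -0.2])
print(mse(y_true, y_pred), mae(y_true, y_pred), hinge(y_true, y_pred))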
HYPERPARAMETERS
1. Learning Rate: This hyperparameter controls the step size or rate at which the
model updates its parameters during training. A larger learning rate can lead to
faster convergence, but if it is too large, the model may overshoot the optimal
solution. Conversely, a too small learning rate can slow down or prevent
convergence.
2. Batch Size: During training, the data is divided into batches for efficiency. The
batch size hyperparameter specifies the number of samples to be processed
before updating the model's parameters. Larger batch sizes can accelerate
training, but smaller batch sizes can lead to better generalization and faster
convergence for certain datasets.
GRADIENT DESCENT
Gradient descent is an optimization algorithm commonly used in machine
learning and deep learning to minimize the loss function and find the optimal
values for the model's parameters. It works by iteratively adjusting the
parameters in the direction of the steepest descent of the loss function.
The main idea behind gradient descent is to compute the gradient of the loss
function with respect to each parameter. The gradient represents the direction
of the steepest ascent of the loss function. Since we want to minimize the loss,
we move in the opposite direction of the gradient, which is the direction of
steepest descent.
Gradient descent is an iterative process that continues until the loss function is
minimized or until a stopping criterion is met. By updating the parameters in
the direction of the negative gradient, the algorithm slowly converges towards
the optimal parameter values that minimize the loss function and improve the
model's performance.
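A minimal sketch of this idea on a one-parameter problem: minimize
L(w) = (w - 3)^2, whose gradient is 2(w - 3). The starting point and learning
rate are arbitrary choices for illustration.

def grad(w):
    return 2.0 * (w - 3.0)       # dL/dw for L(w) = (w - 3)^2

w = 0.0                          # initial parameter value
lr = 0.1                         # learning rate
for step in range(100):
    w -= lr * grad(w)            # step against the gradient (steepest descent)

print(w)                         # converges towards the minimizer w = 3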
BACKPROPAGATION
Backpropagation, short for "backward propagation of errors," is a key algorithm
used to train artificial neural networks. It is a technique to compute the
gradients of the loss function with respect to the parameters of a neural
network efficiently. These gradients are then used to update the parameters
during the optimization process, typically using gradient descent.
1. Forward Propagation:
During forward propagation, the input data is passed through the neural
network, layer by layer, to compute the predicted output. Each layer
performs a weighted sum of the inputs, applies an activation function, and
passes the result to the next layer.
The computed output is compared with the ground truth labels to calculate
the loss (or cost) function, which quantifies the discrepancy between the
predicted and actual outputs.
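The sketch below runs one forward pass and one backpropagation step for a tiny
network with a single hidden layer (sigmoid hidden units, linear output, MSE
loss). The data, layer sizes, and learning rate are assumptions made for
illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # continuous targets

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)    # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)    # hidden -> output
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation: weighted sums and activations, layer by layer
z1 = X @ W1 + b1
a1 = sigmoid(z1)
y_hat = a1 @ W2 + b2                 # linear output layer
loss = np.mean((y_hat - y) ** 2)     # MSE loss

# Backward propagation: apply the chain rule from the output back to the input
d_yhat = 2 * (y_hat - y) / len(X)    # dLoss/dy_hat
dW2 = a1.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_a1 = d_yhat @ W2.T
d_z1 = d_a1 * a1 * (1 - a1)          # multiply by the sigmoid derivative
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# Gradient descent update of all parameters
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print("loss before update:", loss)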
VARIANTS OF BACKPROPAGATION
1. Stochastic Backpropagation:
In traditional backpropagation, the gradients are computed using the entire
training dataset (batch gradient descent). In stochastic backpropagation,
the gradients are computed and updated for each individual training
sample.
This variant is computationally more efficient and works well for large
datasets. However, the updates can be noisy as they are based on individual
samples, which can introduce more variability in the optimization process.
2. Mini-Batch Backpropagation:
Mini-batch backpropagation is a compromise between batch gradient
descent and stochastic backpropagation.
Instead of using the entire training dataset or just a single sample, this
variant computes the gradients and updates the parameters using a small
batch of training samples.
This approach balances the computational efficiency of stochastic
backpropagation with the stability of batch gradient descent (see the
training-loop sketch after this list).
3. Hessian Backpropagation:
Hessian backpropagation takes into account the curvature information of
the loss function by computing and using the Hessian matrix during the
parameter updates.
The Hessian matrix provides additional information about the shape of the
loss function and can lead to more efficient optimization.
However, computing and manipulating the Hessian matrix can be
computationally expensive, especially for large neural networks.
4. Levenberg-Marquardt Algorithm:
The Levenberg-Marquardt algorithm is a specialized training method that blends
gradient descent with the Gauss-Newton method for non-linear least-squares
problems.
It approximates the Hessian matrix from the Jacobian of the errors (computed
via backpropagation) to produce efficient weight updates.
It is especially useful in situations where the neural network is trained on
continuous-valued output data and the underlying model is non-linear.
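To make the distinction between the batch, stochastic, and mini-batch variants
concrete, here is a generic mini-batch training loop: batch_size=1 gives
stochastic updates and batch_size=len(X) gives full-batch gradient descent. The
compute_gradients helper and the toy linear model are placeholders invented for
this sketch.

import numpy as np

def train(X, y, params, compute_gradients, lr=0.01, batch_size=32, epochs=10):
    rng = np.random.default_rng(0)
    n = len(X)
    for epoch in range(epochs):
        order = rng.permutation(n)                  # reshuffle samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # one mini-batch of indices
            grads = compute_gradients(params, X[idx], y[idx])
            params = [p - lr * g for p, g in zip(params, grads)]   # one update per batch
    return params

# Toy usage: fit a linear model y = X @ w with mini-batches of 16 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def lin_grads(params, Xb, yb):
    (w,) = params
    return [2 * Xb.T @ (Xb @ w - yb) / len(Xb)]     # MSE gradient on the batch

(w,) = train(X, y, [np.zeros(3)], lin_grads, lr=0.1, batch_size=16, epochs=20)
print(w)     # approaches w_true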
REGULARIZATION TECHNIQUES
1. L1 and L2 Regularization:
L1 and L2 regularization are two widely used weight-penalty techniques (see
the sketches after this list).
L1 regularization adds the sum of the absolute values of the model weights
multiplied by a regularization parameter to the loss function. It encourages
the model to have sparse weights, i.e., many weights become close to zero,
effectively performing feature selection.
L2 regularization adds the sum of squares of the model weights multiplied
by a regularization parameter to the loss function. It encourages the model
to have small weights, spreading the influence of each feature across
multiple weights.
Both regularization techniques help prevent overfitting by reducing the
model's reliance on any single feature and mitigating the impact of noisy or
irrelevant features.
2. Early Stopping:
Early stopping is a simple yet effective regularization technique based on
monitoring the model's performance on a validation dataset during training.
The training process is stopped early if the model's performance on the
validation dataset starts deteriorating.
This prevents overfitting by stopping at the point where further training
would begin to hurt generalization.
Early stopping saves computation time as the model does not have to
continue training until convergence but rather stops at an optimal point.
3. Data Augmentation:
Data augmentation is a technique commonly used in computer vision tasks
to prevent overfitting.
It involves generating additional training examples by applying random
transformations to the existing data, such as random rotations, flips,
translations, or adding noise.
Data augmentation increases the size and diversity of the training dataset,
making the model more robust and less prone to overfitting to specific
patterns in the original data.
4. Batch Normalization:
Batch normalization is a regularization technique often used in deep neural
networks.
It normalizes the intermediate outputs of the network by subtracting the
batch mean and dividing by the batch standard deviation.
Batch normalization reduces the internal covariate shift, helping the model
converge faster and making it less sensitive to the weight initialization.
By providing a form of regularization, batch normalization can also help
prevent overfitting.
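Short NumPy sketches of three of the techniques above follow; all data,
function names, and settings are illustrative placeholders, not part of the
original notes.

L1/L2 penalties added to a base loss:

import numpy as np

def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
    l1_penalty = l1 * np.sum(np.abs(weights))    # pushes weights towards sparsity
    l2_penalty = l2 * np.sum(weights ** 2)       # pushes weights towards small values
    return base_loss + l1_penalty + l2_penalty

w = np.array([0.5, -1.2, 0.0, 2.0])
print(regularized_loss(base_loss=0.42, weights=w, l1=0.01, l2=0.001))

Early stopping around a hypothetical training loop (train_one_epoch and
validation_loss stand in for the real routines):

def fit_with_early_stopping(train_one_epoch, validation_loss,
                            max_epochs=100, patience=5):
    best = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val = validation_loss()
        if val < best:
            best, bad_epochs = val, 0          # validation still improving
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                          # stop before overfitting sets in
    return best

# Toy run with simulated validation losses that start rising after a while:
losses = iter([1.0, 0.8, 0.6, 0.5, 0.45, 0.44, 0.46, 0.48, 0.5, 0.55, 0.6, 0.7])
print(fit_with_early_stopping(lambda: None, lambda: next(losses)))   # 0.44

Batch normalization forward step for one layer's activations:

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # normalize each feature
    return gamma * x_hat + beta                # learned rescale and shift

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # approx 0 and 1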
APPLICATIONS OF NEURAL NETWORKS
Recommender Systems:
Neural networks power recommender systems predicting user
preferences for personalized recommendations.
Collaborative Filtering methods like matrix factorization and neural
collaborative filtering analyze user-item interactions, offering precise
suggestions.
Applications encompass product recommendations, movie/music
suggestions, and personalized content delivery.
Financial Forecasting:
Neural networks find applications in financial markets for tasks like stock
market prediction, asset price forecasting, and credit scoring.
Models employing recurrent or feedforward neural networks capture
patterns and trends in financial data.
Applications span stock market prediction, algorithmic trading, credit
risk assessment, fraud detection, and investment portfolio optimization.