MACHINE LEARNING WITH PYTHON
SEMESTER 5 - UNIT 3
HI COLLEGE
ARTIFICIAL NEURAL NETWORKS
HEBBNET
HebbNet, also known as a Hebbian network, is a type of artificial neural
network (ANN) that follows the Hebbian learning rule.
The Hebbian learning rule states that "neurons that fire together wire
together". It means that when two connected neurons have correlated
activities, the strength of their connection is increased. This rule captures
the idea that the connections between neurons are strengthened through
repeated activation.
In HebbNet, the weights between neurons are adjusted based on the correlation
of their activities. If two connected neurons are active at the same time, the
weight between them is increased. Conversely, if their activities are opposite,
the weight is decreased.
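For each training pattern, the Hebb rule updates the weights as delta_w = x * t
(and the bias as delta_b = t). Below is a minimal NumPy sketch, using bipolar
(-1/+1) inputs and the logical AND function as a toy task; the data and names
are illustrative and not part of the original notes.

import numpy as np

# Hebbian learning sketch: a single neuron learns logical AND
# with bipolar (-1/+1) inputs and targets.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])   # input patterns
t = np.array([1, -1, -1, -1])                        # AND targets

w = np.zeros(2)   # weights
b = 0.0           # bias

for x_i, t_i in zip(X, t):
    w += x_i * t_i        # Hebb rule: delta_w = x * t
    b += t_i              # delta_b = t

print("weights:", w, "bias:", b)
print("outputs:", np.sign(X @ w + b))   # matches the targets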
PERCEPTRON
A perceptron is a simple type of artificial neural network (ANN) used for
binary classification tasks. It computes a weighted sum of its inputs, applies
a threshold (step) activation, and adjusts its weights whenever a training
sample is misclassified.
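The perceptron rule only changes the weights when a sample is misclassified:
w = w + lr * (y - y_hat) * x. A small sketch follows, learning the logical OR
function; the data, learning rate, and epoch count are assumed for illustration.

import numpy as np

# Perceptron sketch: binary classification of the logical OR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])       # OR targets

w = np.zeros(2)
b = 0.0
lr = 0.1                         # learning rate (assumed)

for epoch in range(10):
    for x_i, y_i in zip(X, y):
        y_hat = 1 if x_i @ w + b > 0 else 0      # step (threshold) activation
        w += lr * (y_i - y_hat) * x_i            # update only on mistakes
        b += lr * (y_i - y_hat)

print([1 if x_i @ w + b > 0 else 0 for x_i in X])   # expected [0, 1, 1, 1]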
ADALINE
Adaline (Adaptive Linear Neuron) is another type of artificial neural network
(ANN) that is closely related to the perceptron. While the perceptron learns
from the thresholded (binary) output, Adaline learns from the continuous linear
output by minimizing the mean squared error (the delta rule), which also makes
it suitable for regression or continuous output prediction.
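Adaline's delta rule moves the weights along the negative gradient of the mean
squared error computed on the linear output. The sketch below fits a simple
linear relation on synthetic data; the data, learning rate, and epoch count are
assumptions made for illustration.

import numpy as np

# Adaline (Widrow-Hoff / delta rule) sketch: fit y = w*x + b by minimizing MSE.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=100)   # synthetic targets

w = np.zeros(1)
b = 0.0
lr = 0.5

for epoch in range(200):
    y_hat = X @ w + b                  # linear (identity) activation
    error = y - y_hat
    w += lr * (X.T @ error) / len(X)   # delta rule: follow the negative MSE gradient
    b += lr * error.mean()

print("learned w ~", w[0], ", b ~", b)   # approaches 3.0 and 0.5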
MULTILAYER PERCEPTRON (MLP)
A multilayer perceptron is a feedforward ANN consisting of an input layer, one
or more hidden layers, and an output layer of interconnected neurons.
Activation function: Each neuron in the hidden layers and the output layer
applies an activation function to its inputs, transforming the input values
into a more useful form. Common activation functions used in MLPs include
sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).
Training: The weights and biases of the network are initially assigned
randomly, and the network goes through a training process to adjust these
parameters. Backpropagation, a popular training algorithm, is commonly
used in MLPs. It calculates the gradient of the network's error with respect to
its weights and biases and then updates these parameters iteratively to
minimize the error.
1. Input Layer: The input layer accepts the input data and passes it to the next
layer. The number of neurons in the input layer depends on the number of
input features or variables.
2. Hidden Layers: Hidden layers are intermediary layers between the input and
output layers. They play a crucial role in capturing and learning complex
patterns and representations from the input data. The number of hidden layers
and the number of neurons in each hidden layer are design choices and
depend on the specific problem and data.
3. Output Layer: The output layer produces the final predictions or outputs of
the neural network. The number of neurons in the output layer depends on the
type of task. For example, in binary classification, there may be a single neuron
with a sigmoid activation function, while in multi-class classification, there may
be multiple neurons with softmax activation.
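As a quick illustration of this layer structure in Python, the sketch below
builds a small MLP with scikit-learn (assuming scikit-learn is available; the
dataset, layer size, and other settings are arbitrary choices, not from the
notes).

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy binary classification data: 4 input features, labels from a simple rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Input layer: one neuron per feature; one hidden layer of 8 ReLU neurons;
# a single logistic (sigmoid) output neuron for the binary decision.
clf = MLPClassifier(hidden_layer_sizes=(8,), activation='relu',
                    max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))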
Leaky ReLU - Formula: f(x) = max(0.01x, x) (or any other small positive slope
instead of 0.01)
Softmax - Formula: f(x_i) = exp(x_i) / sum_j(exp(x_j)) for each element x_i in
the input vector x
Linear (identity) - Formula: f(x) = x
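The activation functions mentioned in this unit can be written directly in
NumPy; the following sketch is illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):       # f(x) = max(alpha*x, x)
    return np.maximum(alpha * x, x)

def softmax(x):
    e = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return e / e.sum()

def linear(x):                       # identity: f(x) = x
    return x

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), leaky_relu(z), softmax(z))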
LOSS FUNCTION
A loss function, also known as a cost function or objective function, measures
the discrepancy between the predicted output of a model and the true target
values. It quantifies how well the model is performing and guides the learning
process by updating the model's parameters to minimize the loss.
The choice of the loss function depends on the type of task being solved. Here
are some commonly used loss functions for different types of problems:
1. Mean Squared Error (MSE): MSE is a common loss function for regression
problems, where the aim is to predict continuous values. It calculates the
average squared difference between the predicted and true values.
2. Mean Absolute Error (MAE): MAE is another loss function for regression tasks.
It calculates the average absolute difference between the predicted and true
values. MAE is easier to interpret than MSE and is less sensitive to outliers
(see the sketch after this list).
3. Hinge Loss: Hinge loss is used in binary and multi-class classification tasks,
particularly in support vector machines (SVMs). It encourages correct
classification by penalizing misclassifications based on a margin.
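The loss functions above translate directly into NumPy; the sketch below is for
illustration, with made-up predictions and targets.

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)        # mean squared error

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))       # mean absolute error

def hinge(y_true, y_pred):
    # y_true in {-1, +1}; y_pred is the raw margin score
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

y_true = np.array([1.0, -1.0, 1.0])
y_pred = np.array([0.8, -0.5, -0.2])
print(mse(y_true, y_pred), mae(y_true, y_pred), hinge(y_true, y_pred))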
HYPERPARAMETERS
1. Learning Rate: This hyperparameter controls the step size or rate at which the
model updates its parameters during training. A larger learning rate can lead to
faster convergence, but if it is too large, the model may overshoot the optimal
solution. Conversely, a too small learning rate can slow down or prevent
convergence.
2. Batch Size: During training, the data is divided into batches for efficiency. The
batch size hyperparameter specifies the number of samples to be processed
before updating the model's parameters. Larger batch sizes can accelerate
training, but smaller batch sizes can lead to better generalization and faster
convergence for certain datasets.
GRADIENT DESCENT
Gradient descent is an optimization algorithm commonly used in machine
learning and deep learning to minimize the loss function and find the optimal
values for the model's parameters. It works by iteratively adjusting the
parameters in the direction of the steepest descent of the loss function.
The main idea behind gradient descent is to compute the gradient of the loss
function with respect to each parameter. The gradient represents the direction
of the steepest ascent of the loss function. Since we want to minimize the loss,
we move in the opposite direction of the gradient, which is the direction of
steepest descent.
Gradient descent is an iterative process that continues until the loss function is
minimized or until a stopping criterion is met. By updating the parameters in
the direction of the negative gradient, the algorithm slowly converges towards
the optimal parameter values that minimize the loss function and improve the
model's performance.
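A minimal sketch of this idea on a one-parameter problem: minimize
L(w) = (w - 3)^2, whose gradient is 2(w - 3). The starting point and learning
rate are arbitrary choices for illustration.

def grad(w):
    return 2.0 * (w - 3.0)       # dL/dw for L(w) = (w - 3)^2

w = 0.0                          # initial parameter value
lr = 0.1                         # learning rate
for step in range(100):
    w -= lr * grad(w)            # step against the gradient (steepest descent)

print(w)                         # converges towards the minimizer w = 3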
BACKPROPAGATION
Backpropagation, short for "backward propagation of errors," is a key algorithm
used to train artificial neural networks. It is a technique to compute the
gradients of the loss function with respect to the parameters of a neural
network efficiently. These gradients are then used to update the parameters
during the optimization process, typically using gradient descent.
1. Forward Propagation:
During forward propagation, the input data is passed through the neural
network, layer by layer, to compute the predicted output. Each layer
performs a weighted sum of the inputs, applies an activation function, and
passes the result to the next layer.
The computed output is compared with the ground truth labels to calculate
the loss (or cost) function, which quantifies the discrepancy between the
predicted and actual outputs.
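The sketch below runs one forward pass and one backpropagation step for a tiny
network with a single hidden layer (sigmoid hidden units, linear output, MSE
loss). The data, layer sizes, and learning rate are assumptions made for
illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # continuous targets

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)    # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)    # hidden -> output
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation: weighted sums and activations, layer by layer
z1 = X @ W1 + b1
a1 = sigmoid(z1)
y_hat = a1 @ W2 + b2                 # linear output layer
loss = np.mean((y_hat - y) ** 2)     # MSE loss

# Backward propagation: apply the chain rule from the output back to the input
d_yhat = 2 * (y_hat - y) / len(X)    # dLoss/dy_hat
dW2 = a1.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_a1 = d_yhat @ W2.T
d_z1 = d_a1 * a1 * (1 - a1)          # multiply by the sigmoid derivative
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# Gradient descent update of all parameters
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print("loss before update:", loss)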
VARIANTS OF BACKPROPAGATION
1. Stochastic Backpropagation:
In traditional backpropagation, the gradients are computed using the entire
training dataset (batch gradient descent). In stochastic backpropagation,
the gradients are computed and updated for each individual training
sample.
This variant is computationally more efficient and works well for large
datasets. However, the updates can be noisy as they are based on individual
samples, which can introduce more variability in the optimization process.
2. Mini-Batch Backpropagation:
Mini-batch backpropagation is a compromise between batch gradient
descent and stochastic backpropagation.
Instead of using the entire training dataset or just a single sample, this
variant computes the gradients and updates the parameters using a small
batch of training samples.
This approach balances the computational efficiency of stochastic
backpropagation with the stability of batch gradient descent (see the
training-loop sketch after this list).
3. Hessian Backpropagation:
Hessian backpropagation takes into account the curvature information of
the loss function by computing and using the Hessian matrix during the
parameter updates.
The Hessian matrix provides additional information about the shape of the
loss function and can lead to more efficient optimization.
However, computing and manipulating the Hessian matrix can be
computationally expensive, especially for large neural networks.
4. Levenberg-Marquardt Algorithm:
The Levenberg-Marquardt algorithm is a specialized training method that blends
gradient descent with the Gauss-Newton method for non-linear least-squares
problems.
It approximates the Hessian matrix from the Jacobian of the errors (computed
via backpropagation) to produce efficient weight updates.
It is especially useful in situations where the neural network is trained on
continuous-valued output data and the underlying model is non-linear.
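To make the distinction between the batch, stochastic, and mini-batch variants
concrete, here is a generic mini-batch training loop: batch_size=1 gives
stochastic updates and batch_size=len(X) gives full-batch gradient descent. The
compute_gradients helper and the toy linear model are placeholders invented for
this sketch.

import numpy as np

def train(X, y, params, compute_gradients, lr=0.01, batch_size=32, epochs=10):
    rng = np.random.default_rng(0)
    n = len(X)
    for epoch in range(epochs):
        order = rng.permutation(n)                  # reshuffle samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # one mini-batch of indices
            grads = compute_gradients(params, X[idx], y[idx])
            params = [p - lr * g for p, g in zip(params, grads)]   # one update per batch
    return params

# Toy usage: fit a linear model y = X @ w with mini-batches of 16 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def lin_grads(params, Xb, yb):
    (w,) = params
    return [2 * Xb.T @ (Xb @ w - yb) / len(Xb)]     # MSE gradient on the batch

(w,) = train(X, y, [np.zeros(3)], lin_grads, lr=0.1, batch_size=16, epochs=20)
print(w)     # approaches w_true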
REGULARIZATION TECHNIQUES
1. L1 and L2 Regularization:
L1 and L2 regularization are two widely used weight-penalty techniques (see
the sketches after this list).
L1 regularization adds the sum of the absolute values of the model weights
multiplied by a regularization parameter to the loss function. It encourages
the model to have sparse weights, i.e., many weights become close to zero,
effectively performing feature selection.
L2 regularization adds the sum of squares of the model weights multiplied
by a regularization parameter to the loss function. It encourages the model
to have small weights, spreading the influence of each feature across
multiple weights.
Both regularization techniques help prevent overfitting by reducing the
model's reliance on any single feature and mitigating the impact of noisy or
irrelevant features.
2. Early Stopping:
Early stopping is a simple yet effective regularization technique based on
monitoring the model's performance on a validation dataset during training.
The training process is stopped early if the model's performance on the
validation dataset starts deteriorating.
This prevents overfitting by stopping at the point where further training
would begin to hurt generalization.
Early stopping saves computation time as the model does not have to
continue training until convergence but rather stops at an optimal point.
3. Data Augmentation:
Data augmentation is a technique commonly used in computer vision tasks
to prevent overfitting.
It involves generating additional training examples by applying random
transformations to the existing data, such as random rotations, flips,
translations, or adding noise.
Data augmentation increases the size and diversity of the training dataset,
making the model more robust and less prone to overfitting to specific
patterns in the original data.
4. Batch Normalization:
Batch normalization is a regularization technique often used in deep neural
networks.
It normalizes the intermediate outputs of the network by subtracting the
batch mean and dividing by the batch standard deviation.
Batch normalization reduces the internal covariate shift, helping the model
converge faster and making it less sensitive to the weight initialization.
By providing a form of regularization, batch normalization can also help
prevent overfitting.
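Short NumPy sketches of three of the techniques above follow; all data,
function names, and settings are illustrative placeholders, not part of the
original notes.

L1/L2 penalties added to a base loss:

import numpy as np

def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
    l1_penalty = l1 * np.sum(np.abs(weights))    # pushes weights towards sparsity
    l2_penalty = l2 * np.sum(weights ** 2)       # pushes weights towards small values
    return base_loss + l1_penalty + l2_penalty

w = np.array([0.5, -1.2, 0.0, 2.0])
print(regularized_loss(base_loss=0.42, weights=w, l1=0.01, l2=0.001))

Early stopping around a hypothetical training loop (train_one_epoch and
validation_loss stand in for the real routines):

def fit_with_early_stopping(train_one_epoch, validation_loss,
                            max_epochs=100, patience=5):
    best = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val = validation_loss()
        if val < best:
            best, bad_epochs = val, 0          # validation still improving
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                          # stop before overfitting sets in
    return best

# Toy run with simulated validation losses that start rising after a while:
losses = iter([1.0, 0.8, 0.6, 0.5, 0.45, 0.44, 0.46, 0.48, 0.5, 0.55, 0.6, 0.7])
print(fit_with_early_stopping(lambda: None, lambda: next(losses)))   # 0.44

Batch normalization forward step for one layer's activations:

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # normalize each feature
    return gamma * x_hat + beta                # learned rescale and shift

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # approx 0 and 1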
APPLICATIONS OF NEURAL NETWORKS
Recommender Systems:
Neural networks power recommender systems predicting user
preferences for personalized recommendations.
Collaborative Filtering methods like matrix factorization and neural
collaborative filtering analyze user-item interactions, offering precise
suggestions.
Applications encompass product recommendations, movie/music
suggestions, and personalized content delivery.
Financial Forecasting:
Neural networks find applications in financial markets for tasks like stock
market prediction, asset price forecasting, and credit scoring.
Models employing recurrent or feedforward neural networks capture
patterns and trends in financial data.
Applications span stock market prediction, algorithmic trading, credit
risk assessment, fraud detection, and investment portfolio optimization.