Least Mean-Squares Algorithm in Neural Networks
The Least Mean-Squares (LMS) algorithm is a widely used adaptive filter technique in neural networks, signal processing, and control systems. Developed by Bernard Widrow and Ted Hoff in 1960, the LMS algorithm is a stochastic gradient descent method that iteratively updates filter coefficients to minimize the mean square error between the desired and actual signals. This article provides a detailed technical overview of the LMS algorithm, its applications, and its significance in neural networks.
Introduction to Least Mean-Squares (LMS) Algorithm
The Least Mean Squares (LMS) method is an adaptive algorithm widely used to find the coefficients of a filter that minimize the mean square error between the desired signal and the actual signal. It underlies gradient-descent-style training, in which the network approximates a target function by iteratively adjusting its weights with respect to the error between predicted and actual outputs.
Neural networks are composed of simple input/output units called neurons. The units in a neural network are interconnected, and each connection has an associated weight. Such networks can be used for both classification and regression. In this article, we discuss the least mean-square algorithm and how to construct a neural network based on it.
Key Concepts:
- Adaptive Filtering: Adaptive filters adjust their coefficients based on the input signal. The LMS algorithm is an example of an adaptive filter.
- Mean Square Error (MSE): This is the criterion the LMS algorithm aims to minimize. MSE is the expectation of the square of the error signal.
- Error Signal (e(n)): The difference between the desired signal (d(n)) and the output of the filter (y(n)). e(n) = d(n) - x^T(n)w(n)
- Filter Coefficients (w(n)): The parameters of the filter that are updated iteratively to minimize the MSE.
Mathematical Foundation of LMS algorithm
The LMS algorithm is based on the instantaneous value of the cost function \varepsilon(w), which can be expressed in terms of the error signal as:
\varepsilon(w) = \frac{1}{2}e^2(n)
where,
- e(n) is the error, i.e. the difference between the desired and actual output values.
- e^2(n) is the squared error, and the factor \frac{1}{2} is included to simplify the derivative.
Error e(n) = d(n) - x^T(n)w(n)
where
- d(n) - desired output value
- x^T(n) - transpose of the input vector x(n)
- w(n) - weight vector
So, cost function can be written as:
\varepsilon (w) = \frac{1}{2}(d(n)-x^T(n)w(n))^2
Differentiating the cost function \varepsilon(w) with respect to the weight vector w(n):
- \frac{\partial\varepsilon(w)}{\partial w(n)} = e(n)\frac{\partial e(n)}{\partial w(n)}
- \frac{\partial e(n)}{\partial w(n)} = -x(n)
- So, \frac{\partial \varepsilon(w)}{\partial w(n)} = -x(n)e(n) = g'(n)
where g'(n) is the instantaneous estimate of the gradient vector g(n).
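As a quick numeric sketch (the vectors and values below are illustrative, not taken from the article), the instantaneous cost and gradient estimate can be computed as:
Python
import numpy as np

# Minimal numeric sketch (illustrative values, not from the article).
x = np.array([1.0, 2.0, -1.0])   # input vector x(n)
w = np.array([0.5, -0.3, 0.1])   # current weight vector w(n)
d = 1.2                          # desired output d(n)

e = d - x @ w                    # error signal e(n) = d(n) - x^T(n) w(n)
cost = 0.5 * e ** 2              # instantaneous cost 1/2 e^2(n)
g_hat = -e * x                   # gradient estimate g'(n) = -x(n) e(n)

print("error:", e, "cost:", cost, "gradient estimate:", g_hat)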
LMS Algorithm in Steepest Descent Method
Steepest descent method is a general optimization technique used to find the minimum of a function. It iteratively updates the parameters in the direction of the negative gradient of cost function.
In this method, the weights are updated as:
w(n+1) = w(n) - \eta g(n), where \eta is the learning rate parameter and g(n) is the gradient evaluated at the point w(n)
\Delta w(n) = w(n+1) - w(n) = -\eta g(n)
The LMS algorithm is a specific, stochastic implementation of the steepest descent method in which the true gradient g(n) is replaced by its instantaneous estimate g'(n); it is mostly used for adaptive filtering and linear regression.
Accordingly, we can write the weight update rule in LMS as:
w'(n+1) = w'(n) - \eta g'(n)
Substituting the value of g'(n) calculated before, we get:
w'(n+1) = w'(n) + \eta x(n)e(n)
where,
- w'(n) is the estimate of the weight vector.
- \eta is the learning rate parameter.
Now substituting the value of error signal e(n):
w'(n+1) = w'(n) + \eta x(n)[d(n)-x^T(n)w'(n)]
Collecting the terms in w'(n), the update can be rewritten as:
w'(n+1) = [I - \eta x(n)x^T(n)] w'(n) + \eta x(n)d(n)
where,
- I is the identity matrix, i.e. a square matrix with 1 on all diagonal entries and 0 elsewhere.
- w'(n) = z^{-1}[w'(n+1)]
- where z^{-1} is a unit delay operator. It represents a delay of one time step.
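As a small check (with illustrative values of our own choosing), the direct update w'(n) + \eta x(n)e(n) and the rearranged form [I - \eta x(n)x^T(n)]w'(n) + \eta x(n)d(n) give the same result:
Python
import numpy as np

# Small numeric check (illustrative values) that the two forms of the
# LMS update are equivalent:
#   w'(n+1) = w'(n) + eta * x(n) * e(n)
#   w'(n+1) = (I - eta * x(n) x^T(n)) w'(n) + eta * x(n) d(n)
eta = 0.05
x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.3, 0.1])
d = 1.2

e = d - x @ w
w_direct = w + eta * e * x
w_rearranged = (np.eye(3) - eta * np.outer(x, x)) @ w + eta * d * x

print(np.allclose(w_direct, w_rearranged))  # True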
Convergence Consideration of LMS algorithm
Convergence refers to the algorithm's ability to reach a steady state in which the error signal becomes minimal. For the LMS algorithm, the convergence behaviour depends on the input vector x(n) and the value of the learning rate parameter \eta.
Conditions for Convergence in Mean-Square:
The LMS algorithm is convergent in the mean square when \eta satisfies the following conditions:
- 0 < \eta < \frac{2}{\lambda_{max}}, where \lambda_{max} is the largest eigenvalue of the correlation matrix R_x
- 0 < \eta < \frac{2}{tr[R_x]}, where tr[R_x] is the trace of the correlation matrix R_x
Since tr[R_x] \geq \lambda_{max}, the trace-based bound is the stricter of the two; it is also the more practical one, because the trace can be estimated directly from the input signal power.
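As a rough sketch (the data below is synthetic and only for illustration), these bounds can be estimated from samples of the input signal by forming the sample correlation matrix:
Python
import numpy as np

# Sketch: estimating the learning-rate bounds from sample data (illustrative).
np.random.seed(0)
X = np.random.randn(1000, 3)             # rows are input vectors x(n)

R = (X.T @ X) / X.shape[0]               # sample correlation matrix R_x
lam_max = np.linalg.eigvalsh(R).max()    # largest eigenvalue of R_x
trace = np.trace(R)                      # trace of R_x

print("2 / lambda_max :", 2 / lam_max)
print("2 / tr[R_x]    :", 2 / trace)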
Stability of the LMS Algorithm
Stability refers to the algorithm's ability to produce bounded outputs over time. For the LMS algorithm, stability is closely related to the learning rate parameter value \eta.
For stability, \eta must satisfy the following condition:
0<\eta<\frac{1}{\lambda_{max}}
This condition ensures that the weight updates do not diverge and remain bounded.
Stability also involves a trade-off between convergence speed and steady-state error. A larger \eta leads to faster convergence but higher steady-state error, while a smaller \eta results in slower convergence but lower steady-state error.
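As a rough illustration of this trade-off (the toy regression setup below is our own, not from the article), one can run LMS on the same data with a large and a small step size and compare the early and late squared errors:
Python
import numpy as np

# Toy illustration of the learning-rate trade-off (setup is illustrative).
np.random.seed(1)
X = np.random.randn(500, 3)
true_w = np.array([1.0, -2.0, 0.5])
d = X @ true_w + 0.1 * np.random.randn(500)

def lms_run(eta):
    w = np.zeros(3)
    sq_errors = []
    for x_n, d_n in zip(X, d):
        e = d_n - x_n @ w            # error signal e(n)
        w += eta * e * x_n           # LMS weight update
        sq_errors.append(e ** 2)
    # average squared error early in training vs. near the end
    return np.mean(sq_errors[:100]), np.mean(sq_errors[-100:])

print("eta = 0.5  (early, late):", lms_run(0.5))
print("eta = 0.01 (early, late):", lms_run(0.01))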
Workings of Least Mean-Squares Algorithm in Neural Networks
1. Initialization: Set the initial filter coefficients to zero and define the other necessary parameters, such as the learning rate \eta and the number of iterations.
2. Iteration (for each time step n):
- Compute the Filter Output: y(n) = w^T(n) x(n). This is the output of the filter for the input signal x(n) using the current filter coefficients w(n).
- Compute the Error Signal: e(n) = d(n) - y(n). The error signal is the difference between the desired signal d(n) and the actual output y(n) of the filter.
- Update the Filter Coefficients: w(n+1) = w(n) + \eta e(n) x(n). The filter coefficients are updated using the error signal, the step-size parameter, and the input signal. This update rule is derived from the gradient descent optimization method, aiming to minimize the mean squared error. A minimal loop implementing these steps is sketched below.
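A minimal sketch of these three steps in an adaptive-filter setting is shown below (the signal, filter length, and step size are illustrative choices, not from the article):
Python
import numpy as np

# Minimal sketch of the three steps above in an adaptive-filter setting
# (filter length, signal, and step size are illustrative choices).
np.random.seed(0)
num_taps, num_samples, eta = 3, 200, 0.05

x_sig = np.random.randn(num_samples + num_taps)    # input signal
true_w = np.array([0.8, -0.4, 0.2])                # unknown system to identify
w = np.zeros(num_taps)                             # 1. initialization

for n in range(num_samples):                       # 2. iterate over time steps
    x_n = x_sig[n:n + num_taps]                    # input vector x(n)
    d_n = true_w @ x_n                             # desired signal d(n)
    y_n = w @ x_n                                  # filter output y(n) = w^T(n) x(n)
    e_n = d_n - y_n                                # error signal e(n) = d(n) - y(n)
    w += eta * e_n * x_n                           # update: w(n+1) = w(n) + eta e(n) x(n)

print("estimated filter coefficients:", w)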
Let's discuss the signal flow graph. The diagram is shown below:
Signal flow graph
Explanation of the Signal Flow Graph:
- The signal \eta x(n) d(n) represents the scaled product of the input signal and the desired signal.
- The summing junction calculates the error signal e(n) by subtracting the filter output x^T(n)w'(n) from the desired signal d(n).
- The product \eta x(n) x^T(n) is used to update the filter coefficients w(n+1).
- The delay element z^{-1} stores the updated coefficients so that w'(n+1) becomes the current weight vector at the next time step.
Important points about LMS algorithm:
- The feedback loop around the weight vector w'(n) acts as a low-pass filter: it lets low-frequency components of the error signal pass while attenuating high-frequency ones.
- The average time constant of this filtering action is inversely proportional to \eta.
- In the steepest-descent algorithm, w(n) follows a well-defined trajectory, whereas in the LMS algorithm w'(n) follows a random (stochastic) path.
- For this reason, the LMS algorithm is also known as the "stochastic gradient algorithm".
Implementing Least Mean-Squares Algorithm for Linear Regression
Let's implement a simple neural network based on the Least Mean Squares (LMS) algorithm.
1. Define the LMS Learning Algorithm:
- The training routine takes the input features (X), the target values (y), the learning rate, and the number of epochs.
- It initializes random weights, iterates over the dataset for the specified number of epochs, computes predictions and errors, and updates the weights using the LMS update rule.
Initialization:
# get the number of features
num_features = X.shape[1]
# Initialize weights randomly
weights = np.random.randn(num_features)
# learning rate
learning_rate=0.01
# no of iterations
epochs=100
We initialize random weights for each feature and set the learning rate and the number of iterations.
LMS Rule:
# prediction
prediction = np.dot(X[i], weights) # forward pass
# error calculation
error = y[i] - prediction
# update weight
weights += learning_rate * error * X[i] # backward propagation (LMS weight update)
The neural network iterates over the dataset for the specified number of epochs and applies the LMS rule to each sample.
2. Implement Neural Network Based on LMS Algorithm
Let's create a class-based neural network built on the LMS algorithm. The code is as follows:
Python
import numpy as np


class NeuralNet(object):
    def __init__(self, num_features):
        # Initialize weights randomly
        self.weights = np.random.randn(num_features)

    def forward_pass(self, X):
        # Implement the forward pass
        prediction = self.predict(X)
        return prediction

    def error_function(self, y, prediction):
        # Implement the error function
        error = y - prediction
        return error

    def backward_prop(self, error, learning_rate, X):
        # Implement the backward propagation (LMS weight update)
        self.weights += learning_rate * error * X
        return self.weights

    def train(self, X, y, learning_rate=0.01, epochs=100):
        # Train the neural network using the LMS algorithm
        for epoch in range(epochs):
            for i in range(X.shape[0]):
                prediction = self.forward_pass(X[i])
                error = self.error_function(y[i], prediction)
                weights = self.backward_prop(error, learning_rate, X[i])
        return weights

    def predict(self, X):
        prediction = np.dot(X, self.weights)
        return prediction
The neural network trains for the specified number of iterations (epochs), applying the forward pass, error computation, and weight update based on the LMS algorithm to each sample.
3. Generate Random Dataset
We can generate a random dataset using numpy.
- Generate synthetic data using NumPy.
- X is a matrix of 100 samples with 2 features.
- true_weights represent the true weights used for generating y.
- Noise is added to y to simulate real data.
Python
# Generate some synthetic training data
np.random.seed(42)  # Fix the random seed for reproducibility
X = np.random.randn(100, 2) # 100 samples with 2 features
# Different true weights for generating y
true_weights = np.array([4, -2])
# Add noise to simulate real data
y = np.dot(X, true_weights) + np.random.randn(100) * 0.5
4. Add a Bias Term
To account for the intercept term in the linear model, we add a column of ones to X.
Python
# Add a bias term (constant feature) to X
X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
5. Apply the LMS based Neural Network to Learn Weights
The train method of the NeuralNet class is called with the generated data X and y to learn the weights over the specified number of epochs.
Python
# LMS based Neural Network
neural_net = NeuralNet(X.shape[1])
learned_weights = neural_net.train(X, y)
print("True Weights:", true_weights)
print("Learned Weights:", learned_weights)
Full Implementation Code
The complete code is as follows:
Python
import numpy as np


class NeuralNet(object):
    def __init__(self, num_features):
        # Initialize weights randomly
        self.weights = np.random.randn(num_features)

    def forward_pass(self, X):
        # Implement the forward pass
        prediction = self.predict(X)
        return prediction

    def error_function(self, y, prediction):
        # Implement the error function
        error = y - prediction
        return error

    def backward_prop(self, error, learning_rate, X):
        # Implement the backward propagation (LMS weight update)
        self.weights += learning_rate * error * X
        return self.weights

    def train(self, X, y, learning_rate=0.01, epochs=100):
        # Train the neural network using the LMS algorithm
        for epoch in range(epochs):
            for i in range(X.shape[0]):
                prediction = self.forward_pass(X[i])
                error = self.error_function(y[i], prediction)
                weights = self.backward_prop(error, learning_rate, X[i])
        return weights

    def predict(self, X):
        prediction = np.dot(X, self.weights)
        return prediction
# Generate some synthetic training data
np.random.seed(42)  # Fix the random seed for reproducibility
X = np.random.randn(100, 2) # 100 samples with 2 features
# Different true weights for generating y
true_weights = np.array([4, -2])
# Add noise to simulate real data
y = np.dot(X, true_weights) + np.random.randn(100) * 0.5
# Add a bias term (constant feature) to X
X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
# Neural Network
neural_net = NeuralNet(X.shape[1])
learned_weights = neural_net.train(X, y)
print("True Weights:", true_weights)
print("Learned Weights:", learned_weights)
Output:
True Weights: [ 4 -2]
Learned Weights: [ 4.10457699 -2.08895075 0.03630361]
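As a quick sanity check (this snippet is our own addition and reuses the variables defined in the code above), we can compare the learned model's predictions with the targets and report the mean squared error:
Python
# Our own addition: evaluate the fit of the learned weights
predictions = neural_net.predict(X)
mse = np.mean((y - predictions) ** 2)
print("Mean squared error:", mse)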
Advantages and Limitations of LMS algorithm
Advantages:
- Simplicity: The LMS algorithm is straightforward to implement, making it accessible for various applications without the need for complex computations.
- Robustness: It is robust and performs well even in the presence of noise, making it suitable for real-world signal processing tasks.
- Adaptability: The LMS algorithm is model-independent, so it adapts directly to the incoming data without requiring a statistical model of the signal.
- Optimality: It can work in both stationary and non-stationary environments.
Limitations:
- Convergence Rate: The LMS algorithm has a relatively slow convergence rate, necessitating many iterations to reach the optimal solution.
- Sensitivity to Eigenvalue Spread: The algorithm is sensitive to the eigenvalue spread of the correlation matrix R_x; a large spread slows convergence.
Conclusion
The LMS algorithm is fundamental in many practical applications due to its balance between simplicity, computational efficiency, and effectiveness in adaptive filtering tasks. By iteratively adjusting filter coefficients to minimize the mean square error, the LMS algorithm finds widespread use in areas such as noise cancellation, system identification, and channel equalization.