Least Mean-Squares Algorithm in Neural Networks

The Least Mean-Squares (LMS) algorithm is a widely used adaptive filter technique in neural networks, signal processing, and control systems. Developed by Bernard Widrow and Ted Hoff in 1960, the LMS algorithm is a stochastic gradient descent method that iteratively updates filter coefficients to minimize the mean square error between the desired and actual signals. This article provides a detailed technical overview of the LMS algorithm, its applications, and its significance in neural networks.

Introduction to Least Mean-Squares (LMS) Algorithm

The Least Mean Squares (LMS) method is an adaptive algorithm widely used to find the coefficients of a filter that minimize the mean square error between the desired signal and the actual signal. It underlies training schemes based on gradient descent, in which the network approximates a target function by iteratively adjusting its weights with respect to the error between predicted and actual outputs.

Neural networks are composed of simple input/output units called neurons. The units in a neural network are interconnected, and each connection has an associated weight. Such a network can be used for both classification and regression. In this article, we discuss the least mean-square algorithm and how to construct a neural network based on it.

Key Concepts:

  • Adaptive Filtering: Adaptive filters adjust their coefficients based on the input signal. The LMS algorithm is an example of an adaptive filter.
  • Mean Square Error (MSE): This is the criterion the LMS algorithm aims to minimize. MSE is the expectation of the square of the error signal.
  • Error Signal (e(n)): The difference between the desired signal d(n) and the output of the filter y(n), i.e. e(n) = d(n) - x^T(n)w(n). (A small numeric illustration follows this list.)
  • Filter Coefficients (w(n)): The parameters of the filter that are updated iteratively to minimize the MSE.
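
As a quick numeric illustration of these quantities, the sketch below (all values are made up for demonstration) computes the filter output, the error signal, and the instantaneous squared error for a single time step:

Python
import numpy as np

# Toy values for a single time step n (illustrative only)
x_n = np.array([0.5, -1.0, 2.0])    # input vector x(n)
w_n = np.array([0.1, 0.4, -0.2])    # current filter coefficients w(n)
d_n = 1.0                           # desired signal d(n)

y_n = x_n @ w_n            # filter output y(n) = x^T(n) w(n)
e_n = d_n - y_n            # error signal e(n) = d(n) - y(n)
print(e_n, 0.5 * e_n**2)   # error and instantaneous squared-error cost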

Mathematical Foundation of LMS algorithm

The LMS algorithm is based on the instantaneous value of the cost function \varepsilon(w), which can be expressed in terms of the error signal as:

\varepsilon(w) = \frac{1}{2}e^2(n)

where,

  • e(n) is the error, i.e. the difference between the desired and actual output values.
  • e^2(n) appears because we work with the squared error, and the factor \frac{1}{2} is included only to simplify the differentiation.

Error e(n) = d(n) - x^T(n)w(n)

where

  • d(n) - desired output value
  • x^T(n) - transpose of the input vector x(n)
  • w(n) - weight vector

So, cost function can be written as:

\varepsilon (w) = \frac{1}{2}(d(n)-x^T(n)w(n))^2

Differentiating the cost function \varepsilon(w) with respect to the weight vector w(n):

  • \frac{\partial\varepsilon(w)}{\partial w(n)} = e(n)\frac{\partial e(n)}{\partial w(n)}
  • \frac{\partial e(n)}{\partial w(n)} = -x(n)
  • So, \frac{\partial \varepsilon(w)}{\partial w(n)} = -x(n)e(n) = g'(n)

where g'(n) is the instantaneous estimate of the gradient vector. A quick numerical check of this result is sketched below.
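
The sign of this gradient estimate can be verified numerically. The hedged sketch below (with made-up values) compares the analytic expression -x(n)e(n) against a central finite-difference approximation of \varepsilon(w):

Python
import numpy as np

def cost(w, x, d):
    # Instantaneous cost: 0.5 * (d - x^T w)^2
    return 0.5 * (d - x @ w) ** 2

x = np.array([0.5, -1.0, 2.0])   # illustrative x(n)
w = np.array([0.1, 0.4, -0.2])   # illustrative w(n)
d = 1.0                          # illustrative d(n)

e = d - x @ w
analytic = -x * e                # gradient estimate g'(n) = -x(n) e(n)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (cost(w + eps * np.eye(3)[i], x, d) - cost(w - eps * np.eye(3)[i], x, d)) / (2 * eps)
    for i in range(3)
])
print(analytic, numeric)         # the two vectors should agree closely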

LMS Algorithm in Steepest Descent Method

The steepest descent method is a general optimization technique used to find the minimum of a function. It iteratively updates the parameters in the direction of the negative gradient of the cost function.

In this method, the weights are updated as:

w(n+1) = w(n) - \eta g(n), where \eta is the learning-rate parameter and g(n) is the gradient evaluated at the point w(n)

\Delta w(n) = w(n+1) - w(n) = -\eta g(n)
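
As a minimal sketch of this generic update (synthetic data and illustrative variable names, not part of the original derivation), the loop below applies steepest descent to a mean-squared-error cost and converges toward the weights used to generate the desired signal:

Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # illustrative inputs
d = X @ np.array([1.0, -2.0, 0.5])    # illustrative desired signal
eta = 0.1                             # learning-rate parameter

w = np.zeros(3)
for _ in range(100):
    # exact gradient of the mean squared error over the whole data set
    g = -(X.T @ (d - X @ w)) / len(X)
    w = w - eta * g                   # w(n+1) = w(n) - eta * g(n)

print(w)                              # approaches [1, -2, 0.5]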

The LMS algorithm is a specific implementation of the steepest descent method, used mostly for adaptive filtering and linear regression. In other words, LMS is an application of steepest descent in which the true gradient g(n) is replaced by its instantaneous estimate g'(n).

Accordingly, the LMS weight-update rule can be written as:

w'(n+1) = w'(n) - \eta g'(n)

Substituting the value of g'(n) calculated before, we get:

w'(n+1) = w'(n) + \eta x(n)e(n)

where,

  • w'(n) is the estimate of weight vector.
  • and \eta is the learning rate parameter.

Now substituting the value of error signal e(n):

w'(n+1) = w'(n) + \eta x(n)[d(n)-x^T(n)w'(n)]

Collecting the terms in w'(n), the update can be rewritten as:

w'(n+1) = [I - \eta x(n)x^T(n)] w'(n) + \eta x(n)d(n)

where,

  • I is the identity matrix, i.e. a square matrix with ones on its diagonal and zeros elsewhere.
  • w'(n) = z^{-1}[w'(n+1)], where z^{-1} is the unit-delay operator, representing a delay of one time step.
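
Both forms of the update produce the same w'(n+1); the short sketch below (toy values only) verifies the equivalence numerically:

Python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # x(n), illustrative
w = np.array([0.1, 0.4, -0.2])   # w'(n), illustrative
d, eta = 1.0, 0.01               # d(n) and learning rate

# Form 1: w'(n+1) = w'(n) + eta * x(n) * e(n)
e = d - x @ w
w_next_1 = w + eta * x * e

# Form 2: w'(n+1) = [I - eta * x(n) x^T(n)] w'(n) + eta * x(n) d(n)
I = np.eye(len(x))
w_next_2 = (I - eta * np.outer(x, x)) @ w + eta * x * d

print(np.allclose(w_next_1, w_next_2))   # True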

Convergence Consideration of LMS algorithm

Convergence refers to the algorithm's ability to reach a steady state in which the error signal becomes minimal. For the LMS algorithm, the convergence behavior depends on the statistics of the input vector x(n) and on the value of the learning-rate parameter \eta.

Conditions for Convergence in Mean-Square:

The LMS algorithm is convergent in the mean square when \eta satisfies the following conditions:

  • 0<\eta < \frac{2}{\lambda_{max}}, where \lambda_{max} is the largest eigenvalue of the input correlation matrix R_x (this is also the condition for convergence in the mean);
  • 0<\eta<\frac{2}{tr[R_x]}, where tr[R_x] is the trace of the correlation matrix R_x. Since tr[R_x] \geq \lambda_{max}, this is the more conservative of the two bounds. A sketch of how both bounds can be computed from data follows.
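
In practice R_x is estimated from data, and both bounds can then be evaluated directly, as in the hedged sketch below (synthetic inputs; the variable names are illustrative). The trace-based bound is also the cheaper to compute, since the trace equals the total input power:

Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))            # synthetic input vectors x(n)

R = (X.T @ X) / X.shape[0]                # sample estimate of R_x = E[x(n) x^T(n)]
lambda_max = np.linalg.eigvalsh(R).max()  # largest eigenvalue of R_x

print("eta bound from eigenvalue:", 2.0 / lambda_max)
print("eta bound from trace:     ", 2.0 / np.trace(R))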

Stability of the LMS Algorithm

Stability refers to the algorithm's ability to produce bounded outputs over time. For the LMS algorithm, stability is closely related to the learning rate parameter value \eta.

For stability, \eta must satisfy the following condition:

0<\eta<\frac{1}{\lambda_{max}}

This condition ensures that the weight updates do not diverge and remain bounded.

Stability also involves a trade-off between convergence speed and steady-state error: a larger \eta leads to faster convergence but a higher steady-state error, while a smaller \eta results in slower convergence but a lower steady-state error, as the sketch below demonstrates.
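
This trade-off can be observed empirically by running LMS on the same data with two different step sizes, as in the hedged sketch below (synthetic data; the particular values of \eta are illustrative):

Python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))                 # synthetic inputs
w_true = np.array([1.0, -2.0, 0.5])            # weights used to generate d
d = X @ w_true + 0.5 * rng.normal(size=2000)   # noisy desired signal

def steady_state_mse(eta):
    # One pass of LMS; average the squared error over the final
    # 500 samples as a rough steady-state estimate.
    w = np.zeros(3)
    sq_err = []
    for x_n, d_n in zip(X, d):
        e_n = d_n - x_n @ w
        w += eta * e_n * x_n
        sq_err.append(e_n ** 2)
    return np.mean(sq_err[-500:])

# A larger step size typically settles faster but to a higher error floor.
print("eta = 0.01:", steady_state_mse(0.01))
print("eta = 0.20:", steady_state_mse(0.20))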

Workings of Least Mean-Squares Algorithm in Neural Networks

1. Initialization: Set the initial filter coefficients to zero and define other necessary parameters such as learning rate parameter \eta and number of iterations.

2. Iteration (for each time step n):

  1. Compute the Filter Output: y(n) = w^T(n)x(n). This is the output of the filter for the input signal x(n) using the current filter coefficients w(n).
  2. Compute the Error Signal: e(n) = d(n) - y(n). The error signal is the difference between the desired signal d(n) and the actual filter output y(n).
  3. Update the Filter Coefficients: w(n+1) = w(n) + \eta e(n) x(n). The coefficients are updated using the error signal, the step-size parameter, and the input signal. This rule is derived from gradient descent and aims to minimize the mean squared error. (A minimal code sketch of these steps follows this list.)
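
Putting these steps together, a minimal one-pass LMS loop might look like the following sketch (synthetic data; variable names are illustrative):

Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # input vectors x(n)
d = X @ np.array([2.0, -1.0, 0.5])       # desired signal d(n)

eta = 0.05
w = np.zeros(X.shape[1])                 # step 1: initialize coefficients

for n in range(len(X)):                  # step 2: iterate over time steps
    y_n = w @ X[n]                       # 2.1 filter output y(n) = w^T(n) x(n)
    e_n = d[n] - y_n                     # 2.2 error signal e(n) = d(n) - y(n)
    w = w + eta * e_n * X[n]             # 2.3 coefficient update

print(w)                                 # approaches the generating weights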

Let's discuss the signal flow graph. The diagram is shown below:

(Figure: signal flow graph of the LMS algorithm)

Explanation of the Signal Flow Graph:

  • The branch \eta x(n) d(n) represents the scaled product of the input signal and the desired signal, which drives the weight update.
  • The summing junction forms the error signal e(n) by subtracting the filter output x^T(n)w'(n) from the desired signal d(n).
  • The feedback branch \eta x(n) x^T(n) scales the current weight vector in the closed-loop update of w'(n+1).
  • The delay element z^{-1} holds the coefficients for one time step, so that w'(n) is the delayed version of w'(n+1).

Important points about LMS algorithm:

  1. The feedback loop around the weight vector w'(n) acts as a low-pass filter: it passes low-frequency components of the error signal while attenuating high-frequency ones.
  2. The average time constant of this filtering action is inversely proportional to \eta.
  3. In the steepest-descent algorithm, w(n) follows a well-defined path, whereas in the LMS algorithm, w'(n) follows a random path, as illustrated in the sketch after this list.
  4. For this reason, the LMS algorithm is also known as the "stochastic gradient algorithm".
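
The contrast in point 3 can be seen by tracking the weight trajectories: steepest descent, which uses the full gradient, settles onto a smooth path, while the LMS weights keep fluctuating because each update uses a single noisy sample. The hedged sketch below (synthetic data) compares how much each trajectory still moves near the end of training:

Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
d = X @ np.array([1.0, -1.0]) + 0.2 * rng.normal(size=300)
eta = 0.05

w_sd = np.zeros(2)    # steepest descent (full gradient each step)
w_lms = np.zeros(2)   # LMS (one sample per step)
sd_path, lms_path = [], []

for n in range(len(X)):
    w_sd = w_sd + eta * (X.T @ (d - X @ w_sd)) / len(X)   # smooth, well-defined path
    w_lms = w_lms + eta * (d[n] - X[n] @ w_lms) * X[n]    # noisy, random path
    sd_path.append(w_sd.copy())
    lms_path.append(w_lms.copy())

# Average step length over the last 100 iterations:
sd_moves = np.linalg.norm(np.diff(np.array(sd_path)[-100:], axis=0), axis=1)
lms_moves = np.linalg.norm(np.diff(np.array(lms_path)[-100:], axis=0), axis=1)
print(sd_moves.mean(), lms_moves.mean())   # LMS keeps jittering; steepest descent has settled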

Implementing Least Mean-Squares Algorithm for Linear Regression

Let's implement a neural network based on Least Mean Square (LMS) algorithm.

1. Define the LMS Learning Algorithm:

  • The training routine (implemented below as the train method of the NeuralNet class) takes the input features (X), the target values (y), the learning rate, and the number of epochs.
  • It initializes random weights, iterates over the dataset for the specified number of epochs, computes predictions and errors, and updates the weights using the LMS update rule.

Initialization:

# get the number of features 
num_features = X.shape[1]
# Initialize weights randomly
weights = np.random.randn(num_features)
# learning rate
learning_rate=0.01
# no of iterations
epochs=100

Initialize random weights for each feature and set the learning rate and the number of iterations.

LMS Rule:

# prediction
prediction = np.dot(X[i], weights) # forward pass
# error calculation
error = y[i] - prediction
# update weight
weights += learning_rate * error * X[i] # backward propagation (LMS update)

The neural network iterates over the dataset for the specified number of epochs and applies the LMS rule to each sample.

2. Implement Neural Network Based on LMS Algorithm

Let's create a class-based neural network built on the LMS algorithm. The code is as follows:

Python
import numpy as np

class NeuralNet(object):

    def __init__(self, num_features):
        # Initialize weights randomly
        self.weights = np.random.randn(num_features)

    def forward_pass(self, X):
        # Implement the forward pass
        prediction = self.predict(X)
        return prediction

    def error_function(self, y, prediction):
        # Implement the error function
        error = y - prediction
        return error

    def backward_prop(self, error, learning_rate, X):
        # Implement the backward propagation
        self.weights += learning_rate * error * X
        return self.weights

    def train(self, X, y, learning_rate=0.01, epochs=100):
        # Train the neural network using the LMS algorithm
        for epoch in range(epochs):
            for i in range(X.shape[0]):
                prediction = self.forward_pass(X[i])
                error = self.error_function(y[i], prediction)
                weights = self.backward_prop(error, learning_rate, X[i])
        return weights

    def predict(self, X):
        prediction = np.dot(X, self.weights)
        return prediction

The neural network trains for the specified number of iterations (epochs); for each sample it applies the forward pass, computes the error, and performs the weight update based on the LMS algorithm.

3. Generate Random Dataset

We can generate a random dataset using numpy.

  • Generate synthetic data using NumPy.
  • X is a matrix of 100 samples with 2 features.
  • true_weights represent the true weights used for generating y.
  • Noise is added to y to simulate real data.
Python
# Generate some synthetic training data
np.random.seed(42)  # fix the seed for reproducibility
X = np.random.randn(100, 2)  # 100 samples with 2 features
# Different true weights for generating y
true_weights = np.array([4, -2])
# Add noise to simulate real data
y = np.dot(X, true_weights) + np.random.randn(100) * 0.5

4. Add a Bias Term

To account for the intercept term in the linear model, we add a column of ones to X.

Python
# Add a bias term (constant feature) to X
X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)

5. Apply the LMS based Neural Network to Learn Weights

The train method of the NeuralNet class is called with the generated data X and y to learn the weights over the specified number of epochs.

Python
# LMS based Neural Network
neural_net = NeuralNet(X.shape[1])
learned_weights = neural_net.train(X, y)

print("True Weights:", true_weights)
print("Learned Weights:", learned_weights)

Full Implementation Code

The complete code is as follows:

Python
import numpy as np

class NeuralNet(object):

    def __init__(self, num_features):
        # Initialize weights randomly
        self.weights = np.random.randn(num_features)

    def forward_pass(self, X):
        # Implement the forward pass
        prediction = self.predict(X)
        return prediction

    def error_function(self, y, prediction):
        # Implement the error function
        error = y - prediction
        return error

    def backward_prop(self, error, learning_rate, X):
        # Implement the backward propagation
        self.weights += learning_rate * error * X
        return self.weights

    def train(self, X, y, learning_rate=0.01, epochs=100):
        # Train the neural network using the LMS algorithm
        for epoch in range(epochs):
            for i in range(X.shape[0]):
                prediction = self.forward_pass(X[i])
                error = self.error_function(y[i], prediction)
                weights = self.backward_prop(error, learning_rate, X[i])
        return weights

    def predict(self, X):
        prediction = np.dot(X, self.weights)
        return prediction


# Generate some synthetic training data
np.random.seed(42)  # fix the seed for reproducibility
X = np.random.randn(100, 2)  # 100 samples with 2 features
# Different true weights for generating y
true_weights = np.array([4, -2]) 
# Add noise to simulate real data
y = np.dot(X, true_weights) + np.random.randn(100) * 0.5

# Add a bias term (constant feature) to X
X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)

# Neural Network
neural_net = NeuralNet(X.shape[1])
learned_weights = neural_net.train(X, y)

print("True Weights:", true_weights)
print("Learned Weights:", learned_weights)

Output:

True Weights: [ 4 -2]
Learned Weights: [ 4.10457699 -2.08895075 0.03630361]

Advantages and Limitations of LMS algorithm

Advantages:

  1. Simplicity and Implementation: The LMS algorithm is straightforward to implement, making it accessible for a wide range of applications without the need for complex computations.
  2. Robustness: It is robust and performs well even in the presence of noise, making it suitable for real-world signal processing tasks.
  3. Adaptability: The LMS algorithm is model-independent and adapts its coefficients directly from the incoming data.
  4. Tracking: It can operate in both stationary and non-stationary environments.

Limitations:

  1. Convergence Rate: The LMS algorithm has a relatively slow convergence rate, requiring many iterations to reach the optimal solution.
  2. Sensitivity to Eigenvalue Spread: The algorithm is sensitive to the eigenvalue spread of the input correlation matrix R_x; a large spread slows convergence, as sketched below.
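
The effect in point 2 can be quantified by the eigenvalue spread (condition number) of R_x. The hedged sketch below estimates it from two synthetic input streams, one with balanced feature scales and one with a badly scaled second feature:

Python
import numpy as np

rng = np.random.default_rng(0)
X_balanced = rng.normal(size=(1000, 2))           # similar feature scales
X_spread = X_balanced * np.array([1.0, 20.0])     # second feature 20x larger

for name, X in [("balanced", X_balanced), ("spread", X_spread)]:
    R = (X.T @ X) / X.shape[0]                    # sample correlation matrix R_x
    eig = np.linalg.eigvalsh(R)
    print(name, "eigenvalue spread:", eig.max() / eig.min())

A large spread means the slowest mode of adaptation is governed by the smallest eigenvalue, so overall convergence becomes slow even when \eta is chosen close to its upper bound.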

Conclusion

The LMS algorithm is fundamental in many practical applications due to its balance between simplicity, computational efficiency, and effectiveness in adaptive filtering tasks. By iteratively adjusting filter coefficients to minimize the mean square error, the LMS algorithm finds widespread use in areas such as noise cancellation, system identification, and channel equalization.

