
Building Artificial Neural Networks (ANN) from Scratch

Last Updated : 03 Jun, 2025

An Artificial Neural Network (ANN) is a collection of interconnected layers of neurons. It includes:

  • Input Layer: Receives input features.
  • Hidden Layers: Process information through weighted connections and activation functions.
  • Output Layer: Produces the final prediction.
  • Weights and Biases: Trainable parameters that adjust during learning.
  • Activation Functions: Introduce non-linearity, which allows the network to learn complex patterns.

Let's build an ANN from scratch using Python and NumPy, without relying on deep learning libraries such as TensorFlow or PyTorch. Building it by hand gives a clearer picture of how a neural network actually works.

Neural Network

Step 1: Importing Necessary Libraries

We will use NumPy to handle numerical computations efficiently.

Python
import numpy as np

Step 2: Initializing the Neural Network

  • Sets initial weights and biases for a two-layer neural network.
  • Uses np.random.seed(42) for reproducible results.
  • Weights (W1, W2) initialized with small random values scaled by 0.01 to avoid large initial weights.
  • W1 shape: (hidden layer size, input layer size).
  • W2 shape: (output layer size, hidden layer size).
  • Biases (b1, b2) initialized to zero vectors matching their layer sizes.
Python
def initialize_parameters(input_size, hidden_size, output_size):
    np.random.seed(42)  # For reproducibility
    parameters = {
        "W1": np.random.randn(hidden_size, input_size) * 0.01,
        "b1": np.zeros((hidden_size, 1)),
        "W2": np.random.randn(output_size, hidden_size) * 0.01,
        "b2": np.zeros((output_size, 1))
    }
    return parameters
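
As a quick sanity check (this snippet is illustrative and not part of the original steps), the returned shapes can be inspected for a 2-input, 4-hidden-unit, 1-output network:

Python
params = initialize_parameters(input_size=2, hidden_size=4, output_size=1)
for name, value in params.items():
    print(name, value.shape)
# Expected: W1 (4, 2), b1 (4, 1), W2 (1, 4), b2 (1, 1)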

Step 3: Defining Activation Functions

Activation functions introduce non-linearity into the model, helping it learn complex patterns. Here we use ReLU for the hidden layer, sigmoid for the output layer, and the derivative of ReLU, which is needed during backpropagation:

Python
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def relu_derivative(Z):
    return (Z > 0).astype(int)
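
For intuition, here is how these functions behave on a small sample (an illustrative check, not part of the original tutorial):

Python
Z = np.array([[-2.0, 0.0, 3.0]])
print(sigmoid(np.array([[0.0]])))   # [[0.5]]
print(relu(Z))                      # [[0. 0. 3.]]
print(relu_derivative(Z))           # [[0 0 1]]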

Step 4: Forward Propagation

In forward propagation, the function computes the output of the neural network for a given input X and the current parameters.

  • First, it calculates the linear combination Z1 for the hidden layer by multiplying the input X with the weights W1 and adding bias b1.
  • It then applies the ReLU activation function to Z1 producing the hidden layer activations A1.
  • Next, it calculates the linear combination Z2 for the output layer by multiplying A1 with W2 and adding b2.
  • The sigmoid activation function is applied to Z2 to produce the final output A2.
  • The function returns the output A2 along with a cache containing intermediate values needed for backpropagation.
Python
def forward_propagation(X, parameters):
    W1, b1, W2, b2 = parameters["W1"], parameters["b1"], parameters["W2"], parameters["b2"]
    
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
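
To make the shapes concrete (an illustrative snippet, assuming the 2-4-1 architecture used later), with X of shape (2, m) the hidden activations A1 have shape (4, m) and the output A2 has shape (1, m), one prediction per example:

Python
X_demo = np.array([[0, 1], [1, 0]])                  # shape (2, 2): two examples
params = initialize_parameters(2, 4, 1)
A2_demo, cache_demo = forward_propagation(X_demo, params)
print(A2_demo.shape)                                 # (1, 2): one probability per example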

Step 5: Computing the Cost

The cost function calculates the binary cross-entropy loss, which measures how well the neural network's predictions A2 match the true labels Y. The loss is cost = -(1/m) Σ [y·log(a) + (1 - y)·log(1 - a)], averaged over all examples.

  • m is the number of examples.
  • np.squeeze removes any extra dimensions, returning the cost as a scalar.
Python
def compute_cost(Y, A2):
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return np.squeeze(cost)
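
A small worked example (the values here are illustrative, not from the original article): for Y = [[1, 0]] and predictions A2 = [[0.9, 0.2]], the loss is -(log 0.9 + log 0.8) / 2 ≈ 0.164.

Python
Y_demo = np.array([[1, 0]])
A2_demo = np.array([[0.9, 0.2]])
print(compute_cost(Y_demo, A2_demo))   # ≈ 0.1643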

Step 6: Backpropagation

Backpropagation computes the gradients needed to update the network parameters during training.

  • It calculates the error at the output layer (dZ2) as the difference between predicted outputs (A2) and true labels (Y).
  • Using this error, it computes gradients of the weights (dW2) and biases (db2) for the output layer.
  • Then, it backpropagates the error to the hidden layer by multiplying with the transpose of W2 and element-wise with the derivative of the ReLU activation (relu_derivative).
  • Finally, it calculates gradients for the hidden layer weights (dW1) and biases (db1).
  • All gradients are averaged over the number of examples m to ensure stable updates.
Python
def backward_propagation(X, Y, parameters, cache):
    m = X.shape[1]
    W2 = parameters["W2"]
    
    dZ2 = cache["A2"] - Y
    dW2 = np.dot(dZ2, cache["A1"].T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    
    dZ1 = np.dot(W2.T, dZ2) * relu_derivative(cache["Z1"])
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads
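
As an optional sanity check (this helper is our own addition, not part of the original tutorial), the analytical gradient from backpropagation can be compared against a finite-difference estimate for a single entry of W2:

Python
def finite_diff_dW2_00(X, Y, parameters, eps=1e-7):
    # Perturb W2[0, 0] up and down and re-evaluate the cost.
    p_plus = {k: v.copy() for k, v in parameters.items()}
    p_minus = {k: v.copy() for k, v in parameters.items()}
    p_plus["W2"][0, 0] += eps
    p_minus["W2"][0, 0] -= eps
    cost_plus = compute_cost(Y, forward_propagation(X, p_plus)[0])
    cost_minus = compute_cost(Y, forward_propagation(X, p_minus)[0])
    return (cost_plus - cost_minus) / (2 * eps)

# Usage (after running forward and backward propagation on some X, Y):
# grads = backward_propagation(X, Y, parameters, cache)
# print(finite_diff_dW2_00(X, Y, parameters), grads["dW2"][0, 0])  # should be close

The two numbers should agree closely; a large mismatch usually indicates a bug in the backpropagation formulas.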

Step 7: Updating Parameters

Gradient descent updates the parameters using the computed gradients and a learning rate.

Python
def update_parameters(parameters, grads, learning_rate):
    # Gradient keys mirror parameter keys: "W1" -> "dW1", "b1" -> "db1", etc.
    for key in parameters.keys():
        parameters[key] -= learning_rate * grads["d" + key]
    return parameters

Step 8: Training the Neural Network

We train the neural network over multiple iterations, updating parameters using backpropagation and gradient descent.

Python
def train_neural_network(X, Y, input_size, hidden_size, output_size, epochs=1000, learning_rate=0.01):
    parameters = initialize_parameters(input_size, hidden_size, output_size)
    
    for i in range(epochs):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(Y, A2)
        grads = backward_propagation(X, Y, parameters, cache)
        parameters = update_parameters(parameters, grads, learning_rate)
        
        if i % 100 == 0:
            print(f"Epoch {i}: Cost = {cost}")
    
    return parameters

Step 9: Making Predictions

The trained model predicts outputs by performing forward propagation and applying a threshold of 0.5.

Python
def predict(X, parameters):
    A2, _ = forward_propagation(X, parameters)
    return (A2 > 0.5).astype(int)

Step 10: Testing the Model

We test the model using an AND logic gate dataset.

Python
# Example data (AND logic gate)
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
Y = np.array([[0, 0, 0, 1]])

trained_parameters = train_neural_network(X, Y, input_size=2, hidden_size=4, output_size=1, epochs=10000, learning_rate=0.1)

predictions = predict(X, trained_parameters)
print("Predictions:", predictions)

Output:

(Training log: the cost printed every 100 epochs decreases steadily; the final line reads Predictions: [[0 0 0 1]])

The neural network started with random weights and a high error. Over 10,000 epochs, it optimized its weights and biases using gradient descent. The cost decreased throughout training, confirming effective learning. The final predictions match the expected AND gate truth table, showing that the network has successfully learned the AND logic.
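
For completeness, the predicted labels can be compared against the AND truth table programmatically (a small check added here for illustration, not part of the original code):

Python
expected = np.array([[0, 0, 0, 1]])   # AND gate truth table outputs
print("Matches AND gate:", np.array_equal(predictions, expected))   # Expected: True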
