Project Report 4th Year

ABSTRACT

This project focuses on building a neural network entirely from scratch using NumPy, with the
primary aim of gaining a deep understanding of the mathematical principles and algorithms that
power modern machine learning models—most notably, the backpropagation algorithm. Unlike
popular machine learning libraries such as TensorFlow, PyTorch, or Keras that abstract away
much of the underlying complexity, this project exposes and implements the inner workings of
neural networks manually. The network includes key components such as forward and backward
passes, gradient-based optimization, and parameter updates, all of which were constructed
without the use of high-level machine learning APIs. Through this hands-on approach, we
demystify how neural networks actually "learn" from data.

To evaluate and visualize the learning process, the implemented model was tested on both simple
(logic gate) and more complex datasets obtained from Kaggle, covering both classification and
regression tasks. A wide variety of plots were generated—including weight trajectories, gradient
updates, accuracy/R² scores, and contour plots of the loss landscape—to illustrate the behavior
and dynamics of training. Additionally, Principal Component Analysis (PCA) was implemented
from scratch to visualize changes in the model’s output across epochs. This comprehensive
exploration not only validates the effectiveness of our custom-built neural network but also
strengthens our conceptual and practical grasp of foundational machine learning techniques.

Table of Contents

1 Introduction
   1.1 Problem Statement
   1.2 Motivation
   1.3 Objectives
2 Literature Survey
   2.1 Existing Work
   2.2 Limitations of Existing Work
3 Software Requirements Specifications
   3.1 Overall Description
   3.2 Operating Environment
   3.3 Functional Requirements
   3.4 Non-Functional Requirements
4 Design
   4.1 Use Case Diagram
   4.2 Class Diagram
   4.3 Sequence Diagram
   4.4 Data Flow Diagram
   4.5 System Architecture
5 Implementation
   5.1 Sample Code
6 Testing
   6.1 Test Cases
7 Screenshots
   7.1 AND gate dataset – Classification Task
   7.2 Kaggle dataset – Classification Task
   7.3 Kaggle dataset – Regression Task
   7.4 Project Structure
8 Conclusion & Future Scope
References

1. INTRODUCTION

1.1 Problem Statement


In the domain of machine learning, neural networks are widely used for solving complex tasks
such as classification, regression, and pattern recognition. While powerful, the process by which
these networks learn from data involves intricate mathematical operations, particularly in the
context of training through optimization algorithms like backpropagation. Most widely-used
frameworks abstract these details, offering simplified interfaces that conceal the underlying
computations.

This abstraction creates a gap in understanding the fundamental mechanics of neural networks,
especially for learners and practitioners aiming to grasp how learning actually occurs at the level
of individual operations—such as computing gradients, updating weights, and propagating
errors. Without direct exposure to these internal processes, it becomes challenging to build a
strong foundational intuition about model behavior, performance, and limitations.

1.2 Motivation
The primary motivation behind this project is to develop a deeper, hands-on understanding of
how neural networks operate at a mathematical and algorithmic level—beyond the abstraction
offered by modern libraries. By implementing each component manually, especially the
backpropagation algorithm, we aimed to bridge the conceptual gap left by high-level tools. This
exercise not only reinforces our theoretical knowledge but also enhances our ability to diagnose,
interpret, and optimize models in practical machine learning tasks by understanding what
happens beneath the surface during training.

1.3 Objectives

The goal of this project is to break away from high-level abstractions and understand, at a
granular level, how neural networks learn from data. To achieve this, we defined the following
specific objectives:

●​ To implement a feedforward neural network from scratch using only NumPy, covering
core components such as forward pass, backward pass, and parameter updates.

●​ To study and manually apply the mathematical foundations behind the backpropagation
algorithm, including calculus-based gradient derivations.

●​ To visualize the internal behavior of the model during training using detailed plots of
gradients, weights, scores, and loss landscapes.

●​ To test the neural network on both simple datasets (e.g., logic gates) and more complex
datasets, across classification and regression tasks.

2. LITERATURE SURVEY

2.1 Existing Work


This section presents a review of recent research efforts that explore various applications and
advancements in neural networks. The emphasis is on how neural networks have been applied
across different problem domains and the practical implications of these approaches.

Mellah et al. [1] proposed a neural network-based estimator for brushed DC machines capable
of simultaneously predicting speed, armature temperature, and resistance using only voltage and
current measurements. To enhance learning efficiency, the authors implemented a
Cascade-Forward Neural Network (CFNN) trained using the Resilient Backpropagation (RBP)
algorithm, known for its fast convergence and robustness. Unlike traditional estimators—which
typically target a single parameter and are often prone to instability and noise sensitivity—the
proposed method eliminates the need for physical speed and thermal sensors. Comparative
results demonstrated that the neural estimator closely matches model predictions and offers
improved estimation accuracy, making it suitable for thermal monitoring and high-performance
motor drive applications.

Bülte et al. [2] introduced a graph neural network (GNN)-based framework to post-process
ensemble precipitation forecasts with a focus on extreme weather events. Unlike traditional
post-processing methods that often overlook complex spatial dependencies and tail behaviors in
precipitation data, this approach directly targets extremes in the distribution. By leveraging the
structure of GNNs, the model captures spatial correlations and enhances the accuracy of
probabilistic forecasts, particularly for extreme precipitation. Experimental comparisons showed
improved performance over standard baselines, highlighting the framework’s potential in
reducing flood risks and informing climate resilience strategies. Future directions include
integrating this approach with end-to-end forecasting systems and refining extreme-value
modeling techniques.

Karabayir et al. [3] introduced a novel optimization approach called the Evolved Gradient
Direction Optimizer (EVGO), designed to address the vanishing gradient issue commonly
encountered in training deep neural networks (DNNs). The method leverages both first-order
gradients and a specially constructed hyperplane to guide weight updates. The authors
benchmarked EVGO against several established gradient-based optimizers, including Adam,
RMSProp, and gradient descent, across datasets such as MNIST, CIFAR-10, and CIFAR-100
using well-known architectures like AlexNet and ResNet. Their experiments showed that EVGO
consistently outperformed the other optimizers in terms of accuracy and convergence, even in
deeper or narrower networks.

Yang et al. [4] proposed a hybrid training strategy called GEMONN, which integrates gradient
information into evolutionary algorithms to improve the training of deep neural networks
(DNNs). The key innovation lies in a specialized genetic operator that guides the search using
gradient directions while also optimizing for network sparsity, helping reduce complexity and
prevent overfitting. Through experiments on various architectures — including autoencoders,
LSTMs, and CNNs — GEMONN demonstrated superior performance compared to both
traditional evolutionary methods and standard gradient-based optimizers like SGD and Adam.
Although it showed slightly lower performance on CNNs, the approach still offers strong
potential, especially for resource-constrained environments where sparse networks are beneficial.

Na et al. [5] addressed the vanishing gradient problem in deep neural networks (DNNs) by
integrating batch normalization (BN) layers before each sigmoid activation layer. Their approach
was specifically applied to the modeling of microwave components, a domain known for its
complex non-linear behavior. By normalizing layer inputs with additional scaling and shifting,
the BN layers improved gradient flow and training stability. Additionally, they employed an
Automated Model Generation (AMG) algorithm to dynamically configure the network
architecture, including the number of hidden and BN layers. This combination of BN and AMG
contributed to a more robust and efficient training process for deep networks in engineering
applications.

Li et al. [6] investigated pruning techniques for Binary Neural Networks (BNNs), which are
already compact due to their binary weights and activations. Recognizing that existing pruning
methods for full-precision networks are unsuitable for BNNs, the authors introduced a novel
pruning strategy based on weight flipping frequency — a measure of how often weights change
during training. This metric serves as an indicator of a weight's sensitivity to model accuracy.
Experiments on binary versions of AlexNet and a 9-layer Network-in-Network (NIN), using the
CIFAR-10 dataset, demonstrated that their method could reduce binary operations by 20–40%
with only minimal accuracy loss. The approach also achieved significant runtime improvements,
highlighting its effectiveness for optimizing BNNs without sacrificing performance.

2.2 Limitations of Existing Work


While these studies offer valuable applications and insights, they also reveal certain limitations
and open challenges that remain unaddressed:

1.​ Lack of Foundational Understanding in Existing Implementations​


Most surveyed works rely heavily on pre-built neural network frameworks (e.g., TensorFlow,
Keras), which abstract away the underlying mathematical operations. This leads to a gap in
foundational comprehension of neural network internals, especially the backpropagation
algorithm and gradient flow mechanics.

2.​ Limited Exploration of Neural Network Construction from Scratch​
Few studies focus on implementing neural networks from the ground up using only low-level
tools like NumPy. This leaves a gap in hands-on understanding of core components such as
forward pass, backpropagation, loss functions, and optimization routines.

3.​ Overemphasis on Application, Underemphasis on Intuition​


The reviewed literature largely applies neural networks to specific tasks (e.g., forecasting,
classification, parameter tuning), prioritizing performance over interpretability. This creates a
gap in pedagogical or learning-oriented projects that emphasize why neural networks behave
the way they do.

4.​ High-Level Libraries Mask Model Behavior​


Optimization algorithms like Adam or SGD are often used out-of-the-box. The internal
workings — like how gradients update weights or how learning rates impact convergence —
are not typically analyzed or re-implemented. This obscures a full understanding of model
training dynamics.

3. SOFTWARE REQUIREMENTS SPECIFICATION

3.1 Overall Description

This project implements a neural network system from scratch using NumPy. The system is
composed of modular components, including classes for building models, layers, optimizers,
training routines, and plotting tools. It supports classification and regression tasks, allows custom
dataset integration, and includes visualization capabilities to monitor various training metrics.
The intended users of this system are students, educators, or developers who want to study or
demonstrate the internal workings of neural networks in a transparent and configurable
environment.

This specification outlines the functional and non-functional requirements of the system, its
operating conditions, and a summary of its design structure.

3.2 Operating Environment

Software Requirements

● Operating System: Windows 10 / Windows 11 / Ubuntu 20.04+ / macOS (any modern version)
● Programming Language: Python 3.8 or higher
●​ Development Environment: Jupyter Notebook, Visual Studio Code
●​ Libraries and Dependencies:
○​ NumPy (for numerical computations)
○​ Matplotlib (for plotting and visualization)
○​ Pandas (for dataset handling)
●​ Notebook Runtime: Jupyter Lab / Jupyter Notebook
●​ Dataset Sources: Local .csv files and synthetic datasets generated via scripts
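
The Python dependencies listed above can be installed in one step (a typical setup; the report does not pin exact versions, so recent releases are assumed):

pip install numpy matplotlib pandas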

Hardware Requirements

●​ Processor: Dual-core CPU (Intel i3 or equivalent, minimum)


●​ RAM: 4 GB (minimum), 8 GB or more recommended
●​ Storage: Minimum 500 MB of free disk space for code, logs, and plots
●​ Graphics: No dedicated GPU required (CPU-based training)

3.3 Functional Requirements

The system should provide the following core functionalities for users working within Python
notebooks or scripts:

1.​ Data Handling


●​ Generate simple datasets programmatically (e.g., AND gate)
●​ Load and preprocess external datasets (e.g., standardization, class balancing)
●​ Perform representative train-test splits for both classification and regression tasks

2.​ Model Construction and Configuration


●​ Define and configure custom neural network architectures by stacking layers
●​ Set activation functions (e.g., ReLU, Sigmoid) and loss functions
●​ Initialize model weights and biases

3.​ Training and Optimization


●​ Perform forward propagation to compute model predictions
●​ Compute loss and apply backward propagation to calculate gradients
●​ Update model parameters using optimization algorithms (e.g., SGD)
●​ Log training metrics (loss, accuracy, etc.) to a file for later reference

4.​ Visualization and Debugging


●​ Plot gradients, weight updates, and performance metrics across epochs
●​ Visualize output predictions and loss landscapes

5.​ Model Evaluation and Testing


●​ Assess model performance using classification accuracy or R² score
●​ Save and load trained model weights for reuse or analysis
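
For reference, the two evaluation metrics named above are standard: classification accuracy is the fraction of correctly predicted labels, and the R² score used for regression is the coefficient of determination,

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

where $\hat{y}_i$ are the model's predictions and $\bar{y}$ is the mean of the true targets.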

3.4 Non-Functional Requirements

1.​ Usability
The system should be intuitive and user-friendly for individuals familiar with Python
programming and Jupyter notebooks. Users should be able to interact with the system by
importing its components as Python modules and using them with minimal setup. The codebase
should follow clear naming conventions and be well-documented to support learning and
experimentation. The interface should encourage educational use and provide clarity over
automation.

2.​ Modularity and Maintainability


The system should be designed in a modular fashion, with separate files or packages responsible
for distinct functionalities such as data processing, model definition, training, and plotting. This
modular structure should make it easy to isolate, modify, or extend specific components without
affecting others. The architecture should support long-term maintainability and allow new
features or improvements to be integrated with minimal disruption.

3.​ Performance
The system should perform efficiently for small to medium-sized datasets on CPU-based
machines. It should support mini-batch training to manage computational load and allow
flexibility in tuning performance during training. While not optimized for large-scale use, the
system should maintain acceptable responsiveness and throughput during experimentation.

4.​ Scalability
Although the system is not intended for industrial-scale deployment, it should be scalable within
the context of academic or prototype-level tasks. It should allow users to build deeper models,
modify layer configurations, and process larger batches without requiring structural code
changes. The design should make it possible to scale complexity upward for controlled
experimentation.

5.​ Reproducibility and Transparency


The system should provide full visibility into the training process through logging and
visualizations. Key training metrics such as loss, gradients, accuracy scores, and model
configurations should be recorded and reproducible across runs. Users should be able to
reproduce experimental results, e.g. through consistent random seeds and saved weights. The
system should offer visual tools to trace how model parameters and outputs evolve over time.

4. DESIGN

4.1 Use Case Diagram

4.2 Class Diagram

4.3 Sequence Diagram

4.4 Data Flow Diagram

4.5 System Architecture
5. IMPLEMENTATION

5.1 Sample Code

1. dataset_utils.py

import numpy as np
from numpy.random import default_rng
import pandas as pd

def get_vector(seed = 1, upper_bound = 10, n_samples = 10):
    generator = default_rng(seed)
    vector = upper_bound * generator.random((n_samples, 1))
    return vector

def and_gate_dataset(positive_samples = 100, seed = 1000):
    generator = default_rng(seed)
    seeds = generator.choice(1000, 4, replace = False)
    # features: both inputs positive, one input zero (in either position), both inputs zero
    both_positive = np.hstack((get_vector(seed = seeds[0], n_samples = positive_samples),
                               get_vector(seed = seeds[1], n_samples = positive_samples)))
    one_zero_1 = np.hstack((np.zeros((positive_samples//2, 1)),
                            get_vector(seed = seeds[2], n_samples = positive_samples//2)))
    one_zero_2 = np.hstack((get_vector(seed = seeds[3], n_samples = positive_samples//2),
                            np.zeros((positive_samples//2, 1))))
    both_zero = np.array([0, 0])
    features = np.vstack((both_positive, one_zero_1, one_zero_2, both_zero))
    # labels: 1 for the both-positive rows, 0 for everything else
    labels_positive = np.ones((positive_samples, 1), dtype = int)
    labels_negative = np.zeros((positive_samples//2 + positive_samples//2 + 1, 1), dtype = int)
    labels = np.vstack((labels_positive, labels_negative))

    dataset = np.hstack((features, labels))
    generator.shuffle(dataset)
    return dataset[:, [0, 1]], dataset[:, [2]]

def get_minibatch(features, targets, batch_size = 1, start_at = 0):
    # deterministic slicing: the same arguments always yield the same batch
    if start_at >= features.shape[0]:
        print("invalid start_at for get_minibatch")
        return None, None
    return (features[start_at:min(features.shape[0], start_at + batch_size)],
            targets[start_at:min(targets.shape[0], start_at + batch_size)])

def standardize_data(features, include_indices = [], exclude_indices = [], from_means = None,
                     from_stds = None): # in-place operation
    means = list()
    stds = list()
    from_counter = 0
    for i in range(features.shape[1]):
        if (i in include_indices and i not in exclude_indices) or (len(include_indices) == 0 and
                                                                   len(exclude_indices) == 0):
            mean = np.mean(features[:, i]) if from_means is None else from_means[from_counter]
            std = np.std(features[:, i]) if from_stds is None else from_stds[from_counter]
            features[:, i] = (features[:, i] - mean) / std
            means.append(mean)
            stds.append(std)
            from_counter += 1
    return means, stds
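
A quick sanity check of these utilities (a sketch; it assumes the modules are arranged as the nn package shown in Section 7.4):

import numpy as np
from nn.dataset_utils import and_gate_dataset, get_minibatch, standardize_data

X, y = and_gate_dataset(positive_samples = 100, seed = 1)
print(X.shape, y.shape)              # (201, 2) and (201, 1): 100 positive rows, 101 negative

X_batch, y_batch = get_minibatch(X, y, batch_size = 16, start_at = 0)
print(X_batch.shape)                 # (16, 2)

means, stds = standardize_data(X)    # standardizes X in place, column by column
print(np.round(X.mean(axis = 0), 6), np.round(X.std(axis = 0), 6))  # ~0 and ~1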

2. functions.py

import numpy as np

@np.vectorize
def relu(x):
    return x if x > 0 else 0

def leaky_relu(leak = 0.1, alpha = 1):
    @np.vectorize
    def func(x):
        return alpha * x if x > 0 else x * leak # leak is accessed through a 'closure'
    return func

def der_leaky_relu(leak = 0.1, alpha = 1):
    @np.vectorize
    def func(x):
        return alpha if x > 0 else leak
    return func

@np.vectorize
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

@np.vectorize
def round_off(x):
    return 1 if x >= 0.5 else 0

@np.vectorize
def mirror(x):
    return x

class MSE:
    def calculate_loss(self, labels, predictions):
        labels = labels.reshape(predictions.shape)
        return (1 / labels.size) * np.sum((labels - predictions)**2)

    def der_loss(self, label, output):
        return 2 * (output - label)

    def error_output_layer(self, layer, label):
        first_term = self.der_loss(label, layer.a_[0][0])
        second_term = layer.der_activate()
        return first_term * second_term

class BinaryLoss(MSE):
    def calculate_loss(self, labels, predictions, epsilon = 1e-7):
        total = predictions.size
        labels = labels.reshape(predictions.shape)
        summation = 0

        # clip to avoid log(0): predictions of exactly 0 or 1 would make the loss infinite
        predictions = np.clip(predictions, epsilon, 1 - epsilon)
        iterator = zip(labels, predictions)
        for label, prediction in iterator:
            summation += label * np.log(prediction) + (1 - label) * np.log(1 - prediction)
        return (-1 / total) * summation

    def der_loss(self, label, output, epsilon = 1e-7): # output belongs to range [0, 1]
        output = np.clip(output, epsilon, 1 - epsilon)
        return ((1 - label)/(1 - output)) - (label / output)
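
The notebooks later in this section also import der_sigmoid and der_mirror from nn.functions; they are not part of the excerpt above, but minimal definitions consistent with its conventions would be:

@np.vectorize
def der_sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)   # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

@np.vectorize
def der_mirror(x):
    return 1             # the identity activation has derivative 1 everywhere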

3. model_classes.py

import numpy as np
from numpy.random import default_rng
from math import sqrt
import os

class Layer:

    def __init__(self, n_neurons, activation = None, der_activation = None):
        self.n_neurons = n_neurons
        self.activation = activation
        self.der_activation = der_activation
        self.z_ = np.zeros((n_neurons, 1))           # pre-activation values
        self.del_ = np.zeros((n_neurons, 1))         # error term (delta) used by backprop
        self.b_gradients = np.zeros((n_neurons, 1))
        self.a_ = np.zeros((n_neurons, 1))           # activations
        self.b_ = np.zeros((n_neurons, 1))           # biases

    def init_biases(self):
        self.b_ = np.zeros((self.n_neurons, 1))

    def activate(self):
        self.a_ = self.activation(self.z_)

    def der_activate(self):
        return self.der_activation(self.z_)

class Weights:

    def __init__(self, layer_1, layer_2, seed = 1000):
        self.rows = layer_2.n_neurons
        self.cols = layer_1.n_neurons
        self.seed = seed
        self.layer_1 = layer_1
        self.layer_2 = layer_2
        self.matrix = self.init_weights(self.rows, self.cols, self.seed)
        self.gradients = np.zeros((self.rows, self.cols))

    def init_weights(self, destination_neurons, source_neurons, seed): # source_neurons = fan_in
        std = sqrt(2 / source_neurons) # standard deviation for 'He' initialization
        generator = default_rng(seed)
        weights = generator.standard_normal((destination_neurons, source_neurons)) * std
        return weights

class Model:

    def __init__(self, loss, seed = 1000):
        self.layers = list()
        self.weights = list()
        self.loss = loss # 'class' for the loss function
        self.seed = seed

    def add_layer(self, layer):
        self.layers.append(layer)

    def compile(self):
        generator = default_rng(seed = self.seed)
        seeds = generator.integers(0, len(self.layers)*100, (len(self.layers)-1,))

        self.weights = list() # allows re-compilation

        for i in range(len(self.layers)-1):
            self.weights.append(Weights(self.layers[i], self.layers[i+1], seed = seeds[i]))

        for layer in self.layers:
            layer.init_biases()
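
As a small illustration of how Layer, Weights, and Model fit together (a sketch, again assuming the nn package layout used in the notebooks): compile() creates one Weights object per adjacent pair of layers, shaped (destination neurons, source neurons).

from nn.model_classes import Model, Layer
from nn.functions import BinaryLoss, leaky_relu, der_leaky_relu, sigmoid

model = Model(BinaryLoss(), seed = 1)
model.add_layer(Layer(2))                                   # input layer
model.add_layer(Layer(2, leaky_relu(), der_leaky_relu()))   # hidden layer
model.add_layer(Layer(1, sigmoid, None))                    # output layer (derivative omitted for this check)
model.compile()

for w in model.weights:
    print(w.matrix.shape)   # (2, 2), then (1, 2)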

4. optimizers.py

import numpy as np

# the optimizer gives the trainer the gradients to apply to the weights and biases.
class SGD():
    def set_model(self, model):
        self.model = model

    def gradient_weights(self, weight_index):
        return self.model.weights[weight_index].gradients

    def gradient_biases(self, layer_index):
        return self.model.layers[layer_index].b_gradients

    def current_gradient(self, weight_index):
        # dC/dW = delta of the destination layer times activations of the source layer (transposed)
        return np.matmul(self.model.weights[weight_index].layer_2.del_,
                         np.transpose(self.model.weights[weight_index].layer_1.a_))

    def error_output_layer(self, label):
        return self.model.loss.error_output_layer(self.model.layers[-1], label)

    def error_layer(self, this_index, weight_index): # weights connecting this layer to next layer
        return np.matmul(
            np.transpose(self.model.weights[weight_index].matrix),
            self.model.layers[this_index+1].del_) * self.model.layers[this_index].der_activate()

    def on_pass(self):
        # reset gradients
        for weight in self.model.weights:
            weight.gradients = np.zeros((weight.rows, weight.cols))

        # reset errors
        for layer in self.model.layers:
            layer.del_ = np.zeros((layer.n_neurons, 1))
            layer.b_gradients = np.zeros((layer.n_neurons, 1))

    def update_biases(self, layer_index, l_rate):
        value = l_rate * self.gradient_biases(layer_index)
        self.model.layers[layer_index].b_ -= value
        return value

    def update_weights(self, weight_index, l_rate):
        value = l_rate * self.gradient_weights(weight_index)
        self.model.weights[weight_index].matrix -= value
        return value
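
For reference, these methods implement the standard backpropagation equations (see the backpropagation chapter cited in the Bibliography); each layer's del_ attribute stores the error term $\delta^l$:

$$\delta^L = \nabla_a C \odot f'(z^L), \qquad \delta^l = \left( (W^{l+1})^\top \delta^{l+1} \right) \odot f'(z^l),$$

$$\frac{\partial C}{\partial b^l} = \delta^l, \qquad \frac{\partial C}{\partial W^l} = \delta^l \, (a^{l-1})^\top.$$

error_output_layer computes the first relation (through the loss class), error_layer the second, and current_gradient the weight-gradient outer product.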

5. plotter.py

import numpy as np
from numpy.random import default_rng
import matplotlib.pyplot as plt
from nn.trainer import Logger
import os, time

class Plotter:
    def read_file(self, path):
        self.data = Logger.load_data(path)
        if self.data is None:
            print("unsuccessful.")
        else:
            print("file read.")

    def plot_gradients(self, dir, name = None, n_points = None):
        if not os.access(dir, os.F_OK):
            print(f'cannot access {dir}')
            return
        time_str = time.strftime("%I-%M-%S_%p", time.localtime(time.time()))
        name = "/gradients_" + (time_str if name is None else name) + ".png"

        weights_gradients = dict()
        bias_gradients = dict()

        # collect the logged gradients, update by update
        for i in range(self.data['n-epochs']):
            epoch = f'epoch-{i}'
            for j in range(self.data[epoch]['n-updates']):
                update = f'update-{j}'
                for w in self.data['n_weights']:
                    weight = f'weights-gradient-{w}'
                    if weight not in weights_gradients:
                        weights_gradients[weight] = list()
                    array = np.asarray(self.data[epoch][update][weight]).ravel()[:4]
                    if array.shape[0] > 1:
                        array = array.reshape((2, -1))
                    else:
                        array = array.reshape((1, 1))
                    weights_gradients[weight].append(array)
                for b in self.data['n_weights']:
                    b += 1
                    bias = f'bias-gradient-{b}'
                    if bias not in bias_gradients:
                        bias_gradients[bias] = list()
                    array = np.asarray(self.data[epoch][update][bias]).ravel()[:4]
                    array = array.reshape((-1, 1))
                    bias_gradients[bias].append(array)

        fig, axs = plt.subplots(2, len(self.data['n_weights']),
                                figsize = (7 * len(self.data['n_weights']), 8),
                                gridspec_kw = {'wspace': 0.2, 'hspace': 0.3}, squeeze = False)
        fig.suptitle('Gradients', size = 'xx-large')

        # weights
        for ax in range(len(self.data['n_weights'])):
            weight = f"weights-gradient-{self.data['n_weights'][ax]}"
            # convert the entire history of gradients into one numpy array
            weights_gradients[weight] = np.asarray(weights_gradients[weight])
            if n_points is not None:
                weights_gradients[weight] = weights_gradients[weight][:n_points]
            for r in range(weights_gradients[weight][0].shape[0]):
                for c in range(weights_gradients[weight][0].shape[1]):
                    axs[0, ax].plot(weights_gradients[weight][:, r, c],
                                    label = f'{r*2 + c}', linewidth = 0.7)
            axs[0, ax].legend(title = 'elements')
            axs[0, ax].set_title(f"weights-{self.data['n_weights'][ax]}")
            axs[0, ax].set_xlabel("Updates")

        # bias
        for ax in range(len(self.data['n_weights'])):
            bias = f"bias-gradient-{self.data['n_weights'][ax] + 1}"
            bias_gradients[bias] = np.asarray(bias_gradients[bias])
            if n_points is not None:
                bias_gradients[bias] = bias_gradients[bias][:n_points]
            for r in range(bias_gradients[bias][0].shape[0]):
                axs[1, ax].plot(bias_gradients[bias][:, r, 0], label = f'{r}', linewidth = 0.7)
            axs[1, ax].legend(title = 'elements')
            axs[1, ax].set_title(f"bias-{self.data['n_weights'][ax] + 1}")
            axs[1, ax].set_xlabel("Updates")

        fig.savefig(dir + name, bbox_inches = "tight")
        print(f'plot saved at {dir + name}')
        plt.show()

6. trainer.py

import numpy as np
from numpy.random import default_rng
from nn.dataset_utils import get_minibatch
from nn.functions import round_off
import json, os, time

class Trainer:

    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.optimizer.set_model(self.model)
        self.logger = Logger()  # Logger is defined elsewhere in trainer.py (omitted from this excerpt)

    def backward_pass(self, label): # single instance
        # delta for the output layer, then propagate the error backwards;
        # the input layer (index 0) carries no error term
        self.model.layers[-1].del_ = self.optimizer.error_output_layer(label)
        for i in range(len(self.model.layers)-2, 0, -1):
            self.model.layers[i].del_ = self.optimizer.error_layer(i, i)

        # accumulate gradients for this instance; SGD.on_pass() resets them
        # at the start of the next pass
        for i in range(len(self.model.weights)):
            self.model.weights[i].gradients += self.optimizer.current_gradient(i)
        for i in range(1, len(self.model.layers)):
            self.model.layers[i].b_gradients += self.model.layers[i].del_

    def forward_pass(self, input_data):
        # input_data shape = (1, n_cols)
        self.model.layers[0].a_ = input_data.reshape((input_data.shape[-1], 1))
        for i in range(1, len(self.model.layers)):
            self.model.layers[i].z_ = np.matmul(self.model.weights[i-1].matrix,
                                                self.model.layers[i-1].a_) + self.model.layers[i].b_
            self.model.layers[i].activate()

    def confusion_matrix(self, labels, predictions):
        predictions = predictions.reshape((predictions.size, 1))
        labels = labels.reshape((labels.size, 1))
        stack = np.hstack((labels, predictions))
        matrix = {'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0}

        for i in range(stack.shape[0]):
            if stack[i][0] == 1:
                if stack[i][1] == 1:
                    matrix['tp'] += 1
                else:
                    matrix['fn'] += 1
            else:
                if stack[i][1] == 1:
                    matrix['fp'] += 1
                else:
                    matrix['tn'] += 1
        return matrix

    def update_biases(self, layer_index):
        # returns the final, applied value (includes scaling by the learning rate)
        return self.optimizer.update_biases(layer_index, self.learning_rate)

    def update_weights(self, weight_index):
        return self.optimizer.update_weights(weight_index, self.learning_rate)
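
For reference, forward_pass implements the usual layer recurrence

$$z^l = W^l a^{l-1} + b^l, \qquad a^l = f(z^l), \qquad l = 1, \dots, L,$$

with $a^0$ set to the reshaped input vector, matching the column-vector conventions used by Layer and Weights.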

7. classification.ipynb (AND gate)

from nn.model_classes import Model, Layer
from nn.functions import BinaryLoss, leaky_relu, der_leaky_relu, sigmoid, der_sigmoid
from nn.trainer import Trainer
from nn.plotter import Plotter
from nn.dataset_utils import and_gate_dataset
from nn.optimizers import SGD
import numpy as np

model = Model(BinaryLoss(), 1)
model.add_layer(Layer(2))
model.add_layer(Layer(2, leaky_relu(), der_leaky_relu()))
model.add_layer(Layer(1, sigmoid, der_sigmoid))
model.compile()

trainer = Trainer(model, SGD())

X_train, y_train = and_gate_dataset(100, 1)
dataset = np.hstack((X_train, y_train))
print(dataset[:10])

X_test, y_test = and_gate_dataset(50, 2)
test_set = np.hstack((X_test, y_test))
print(test_set[:10])

plotter = Plotter()

model.compile()

print('weights:')
model.show_weights()
print('biases:')
model.show_biases()

trainer.train(X_train, y_train, 1, 0.02, epochs = 120)

trainer.save_history('./logs', 'batch_size_1')

_ = trainer.predict(X_test, y_test, True)

model.save_weights('./models', 'batch_size_1')

plotter.read_file('./logs/batch_size_1.txt')
plotter.plot_gradients('./plots', 'gradients_batch_size_1', 700)
plotter.plot_weights('./plots', 'weights_batch_size_1', 700)
plotter.plot_score('./plots', 'accuracy_batch_size_1', 700)

8. dataset_classification.ipynb (Kaggle – classification)

import pandas as pd
import numpy as np
from nn.model_classes import Model, Layer
from nn.functions import BinaryLoss, leaky_relu, der_leaky_relu, sigmoid, der_sigmoid
from nn.trainer import Trainer
from nn.plotter import Plotter
from nn.dataset_utils import pca, standardize_data, split_classes
from nn.optimizers import SGD

dataset = pd.read_csv(r'datasets\raisin_Sruthi.csv')

dataset.info()

# one-hot encode the target column ('Kecimen' becomes the positive class)
dataset = pd.get_dummies(dataset, columns = ["Class"], prefix = "", prefix_sep = "",
                         drop_first = True, dtype = int)
dataset.head()

D_train, D_test = split_classes(dataset, 'Kecimen')

X_train = D_train.drop(columns = ['Kecimen'])
y_train = D_train['Kecimen']

X_train = X_train.to_numpy()
X_means, X_stds = standardize_data(X_train)
X_train

model = Model(BinaryLoss(), 5)
model.add_layer(Layer(7))
model.add_layer(Layer(7, leaky_relu(), der_leaky_relu()))
model.add_layer(Layer(7, leaky_relu(), der_leaky_relu()))
model.add_layer(Layer(7, leaky_relu(), der_leaky_relu()))
model.add_layer(Layer(1, sigmoid, der_sigmoid))
model.compile()

trainer = Trainer(model, SGD())

y_train = y_train.to_numpy()
y_train[:5]

trainer.train(X_train, y_train, 16, 0.02, 20)

trainer.save_history('./logs', 'dataset_classification')
model.save_weights("./models", "dataset_classification")

plotter = Plotter()
plotter.read_file(r'logs\dataset_classification.txt')

plotter.plot_gradients("./plots", "dataset_classification", 700)
plotter.plot_weights("./plots", "dataset_classification", 700)
plotter.plot_score("./plots", "dataset_classification", 700, False)

9. dataset_regression.ipynb (Kaggle – regression)

import pandas as pd
from nn.model_classes import Model, Layer
from nn.functions import MSE, leaky_relu, der_leaky_relu, sigmoid, der_sigmoid, mirror, der_mirror
from nn.trainer import RegressionTrainer
from nn.plotter import Plotter
from nn.dataset_utils import pca, standardize_data, split_data
from nn.optimizers import SGD

dataset = pd.read_csv(r'datasets\expenses_Sruthi.csv')

dataset.info()

dataset = pd.get_dummies(dataset, drop_first = True, dtype = int)
dataset.head()

D_train, D_test = split_data(dataset, 0.3, 10)

X_train = D_train.drop(columns = ['charges'])
y_train = D_train['charges']

model = Model(MSE(), 5)
model.add_layer(Layer(8))
model.add_layer(Layer(8, leaky_relu(), der_leaky_relu()))
model.add_layer(Layer(8, leaky_relu(), der_leaky_relu()))
model.add_layer(Layer(1, mirror, der_mirror)) # identity activation for the regression output
model.compile()

trainer = RegressionTrainer(model, SGD())

X_train = X_train.to_numpy()
y_train = y_train.to_numpy()
y_train = y_train.reshape((y_train.shape[0], 1))
X_means, X_stds = standardize_data(X_train, [0, 1])
y_means, y_stds = standardize_data(y_train)

trainer.train(X_train, y_train, 4, 0.02, 25)

trainer.save_history('./logs', 'dataset_regression')

plotter = Plotter()
plotter.read_file(r'logs\dataset_regression.txt')
plotter.plot_gradients("./plots", "dataset_regression", 700)
plotter.plot_weights("./plots", "dataset_regression", 700)
plotter.plot_score("./plots", "dataset_regression", 700, False)
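
Both Kaggle notebooks import a pca helper from nn.dataset_utils that is not included in the excerpts above. A minimal from-scratch version matching the abstract's description (a sketch, not necessarily the report's exact implementation) could look like:

import numpy as np

def pca(features, n_components = 2):
    # center the data, then project onto the top principal directions
    centered = features - np.mean(features, axis = 0)
    covariance = np.cov(centered, rowvar = False)
    eigvals, eigvecs = np.linalg.eigh(covariance)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest eigenvalues
    return np.matmul(centered, eigvecs[:, order])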

6. TESTING

To ensure the correctness and reliability of the neural network system, several forms of testing
were conducted. Given that this is not a traditional software application with UI elements or
discrete functional modules, the focus of testing shifted toward verifying the correctness of
mathematical computations (forward and backward passes), observing learning behavior over
time, and confirming that the system could make accurate predictions on given datasets.

The testing approach involved:

●​ Verifying outputs of individual components (such as activation and loss functions).


●​ Using simple datasets like the AND gate to confirm basic learning behavior.
●​ Applying the network to more complex datasets to validate generalization.
●​ Visualizing training metrics such as loss, gradients, and model outputs to monitor internal
behavior during training.
● Generating visual plots with matplotlib (covered in Section 7), which played a central role
in debugging and confirming system behavior.

6.1 Test Cases

Test Case-1 Forward Pass Output (AND Gate)

Description Check that a trained model can output correct values for AND gate
inputs after training.

Input Four 2D binary combinations: [0,0], [0,1], [1,0], [1,1]

Expected Output Approximately 0 for first three inputs, and near 1 for [1,1]

Observed Result Final predictions after training: [0.03, 0.06, 0.07, 0.95]

Test Case-2 Gradient Flow Consistency

Description Ensure that gradients computed during backpropagation decrease in
magnitude over time and don’t vanish or explode.

Input A small classification dataset (AND gate, batch size 1) trained over 100
epochs.

Expected Output Smooth, converging gradient trajectories; no abrupt spikes.

Observed Result Plots of gradients confirmed a steady downward trend and numerical
stability during updates.
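
One way to strengthen this test is a finite-difference gradient check: perturb a single weight, recompute the loss, and compare the numerical slope with the analytic gradient produced by backward_pass. A sketch (compute_loss is a hypothetical helper that runs a forward pass and returns the loss on a fixed batch):

import numpy as np

def numerical_gradient(compute_loss, matrix, row, col, eps = 1e-5):
    # central difference on one weight entry; the matrix is restored afterwards
    matrix[row, col] += eps
    loss_plus = compute_loss()
    matrix[row, col] -= 2 * eps
    loss_minus = compute_loss()
    matrix[row, col] += eps
    return (loss_plus - loss_minus) / (2 * eps)

# compare against model.weights[0].gradients[row, col] after one backward pass;
# agreement to small relative error indicates a correct implementation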

Test Case-3 Classification Accuracy on Kaggle Dataset

Description Evaluate classification performance on a moderately complex dataset
after full training.

Input Preprocessed dataset (features and labels), 80/20 train-test split.

Expected Output Training and test accuracy should be >85% with no overfitting or
underfitting symptoms.

Observed Result Final training accuracy: 85%, test accuracy: 81%

Test Case-4 Loss Curve Behavior

Description Check that loss value decreases consistently across epochs during
training.

Input Any dataset with a known ground truth; typically trained over 100+
epochs.

Expected Output Smooth downward loss curve; convergence after sufficient iterations.

Observed Result All training sessions produced typical exponential decay loss patterns;
no signs of divergence or instability.

7. SCREENSHOTS

This section presents a collection of visual outputs and screenshots that highlight key aspects of
the project. The majority of the images consist of graphs generated during model training, which
served as critical tools for observing and validating the behavior of the neural network.
Additionally, this section includes selected screenshots of the development environment (VS
Code and Jupyter notebooks), showcasing how different modules interact during training and
evaluation. Together, these visuals offer a comprehensive look into both the internal dynamics of
the neural network and the engineering effort behind the system.

7.1 AND gate dataset – Classification Task

[Plots: gradients, weight trajectories, and accuracy across training updates for the AND gate model.]

7.2 Kaggle dataset – Classification Task

[Plots: gradients, weight trajectories, and accuracy for the raisin classification dataset.]

7.3 Kaggle dataset – Regression Task

[Plots: gradients, weight trajectories, and R² score for the medical insurance regression dataset.]

7.4 Project Structure (VS Code)

[Screenshot: the project’s module layout in the VS Code explorer.]
8. CONCLUSION

The primary goal of this project was to gain a foundational understanding of neural networks by
implementing one from scratch without relying on high-level machine learning libraries.
Through this process, we successfully constructed a fully functional feedforward neural network
using only NumPy and Python, carefully designing every component including the forward pass,
backpropagation, loss computation, gradient updates, and training loop. The network was
evaluated on both simple and moderately complex datasets, where it demonstrated strong
learning capabilities and competitive performance. Visual tools played a central role in our
workflow, providing crucial insights into the training dynamics and helping us verify the
correctness of the implementation.

This hands-on approach allowed us to go beyond the surface-level use of popular frameworks
and engage directly with the mathematical and algorithmic core of deep learning. We not only
implemented the foundational equations but also handled data preprocessing, training utilities,
modular design, and real-time monitoring. This comprehensive experience helped solidify our
understanding of key concepts such as weight initialization, gradient descent, overfitting, and
loss landscapes. Overall, the project was successful in achieving its pedagogical objectives while
also delivering a practical and testable neural network system.

Future Scope
While the current project successfully delivered a functional neural network system built from
first principles, there remains significant room for both horizontal and vertical expansion. Future
work can further improve the system’s learning capabilities, flexibility, and applicability across
different problem domains — all while remaining grounded in the context of fundamental
feedforward neural networks.

Potential directions for future work include:

1.​ Enhancing the core system:

● Implementing advanced optimization algorithms from scratch, such as Adam, RMSProp,
or momentum-based SGD.
●​ Incorporating regularization techniques like L1/L2 weight penalties or dropout to
improve generalization.
●​ Exploring alternative, gradient-free optimization techniques such as Particle Swarm
Optimization (PSO), Genetic Algorithms (GA), or Differential Evolution (DE) for
training the network. These methods could offer insights into global search behavior,
robustness to local minima, and performance in cases where gradient descent struggles.

●​ Adding support for early stopping, dynamic learning rate schedules, and other training
control mechanisms.

2.​ Application-focused extensions:

●​ Designing an online learning module that allows the model to train incrementally on
streaming data, enabling use in dynamic environments such as anomaly detection.
●​ Developing a use-case-specific application — for example, a cyberattack detection
system — where data is generated or labeled in real time, simulating a high-stakes
decision-making scenario.
●​ Applying the network to sensor-based domains such as predictive maintenance or activity
recognition, where the temporal structure of incoming data can still be explored with
feedforward models using engineered features.
●​ Building a simple graphical interface or API endpoint so that non-technical users can
input data and view model predictions interactively.

These extensions would not only make the system more robust and versatile but also highlight
the potential of even basic neural architectures when carefully engineered and applied to
domain-specific challenges.

REFERENCES
[1]​ Mellah, Hacene, Kamel Eddine Hemsas, and Rachid Taleb. "Cascade-Forward Neural
Network Based on Resilient Backpropagation for Simultaneous Parameters and State Space
Estimations of Brushed DC Machines." arXiv preprint arXiv:2104.04348 (2021).
[2]​ Bülte, Christopher, et al. "Graph Neural Networks for Enhancing Ensemble Forecasts of
Extreme Rainfall." arXiv preprint arXiv:2504.05471 (2025).
[3]	Karabayir, Ibrahim, Oguz Akbilgic, and Nihat Tas. "A novel learning algorithm to optimize
deep neural networks: Evolved gradient direction optimizer (EVGO)." IEEE Transactions
on Neural Networks and Learning Systems 32.2 (2020): 685-694.
[4]​ Yang, Shangshang, et al. "A gradient-guided evolutionary approach to training deep neural
networks." IEEE Transactions on Neural Networks and Learning Systems 33.9 (2021):
4861-4875.
[5]​ Na, Weicong, et al. "Deep neural network with batch normalization for automated
modeling of microwave components." 2020 IEEE MTT-S International Conference on
Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO). IEEE,
2020.
[6]	Li, Yixing, and Fengbo Ren. "BNN pruning: Pruning binary neural network guided by
weight flipping frequency." 2020 21st International Symposium on Quality Electronic
Design (ISQED). IEEE, 2020.

Bibliography
●​ Documentation: NumPy – https://numpy.org/doc/2.2/
●​ Documentation: Matplotlib – https://matplotlib.org/stable/index.html
●​ Documentation: Pandas – https://pandas.pydata.org/docs/
●​ Documentation: Python 3.9 – https://docs.python.org/3.9/
●​ Article: “Neural Networks Representation” –
https://www.jeremyjordan.me/intro-to-neural-networks/
●​ Article: “Mechanics of a Simple Neural Network” –
https://shotlefttodatascience.com/2020/08/17/the-mechanics-of-a-simple-neural-network/
●	Article: “Neural Networks Structure” –
https://python-course.eu/machine-learning/neural-networks-structure-weights-and-matrices.php
●​ Article: “Weight Initialization Techniques in Neural Networks” –
https://www.pinecone.io/learn/weight-initialization/
●​ Article: “How the Backpropagation Algorithm Works” –
http://neuralnetworksanddeeplearning.com/chap2.html
●​ Article: “Gradient Accumulation” –
https://medium.com/@harshit158/gradient-accumulation-307de7599e87

●​ Article: “Visualizing the Loss Landscape of a Neural Network” –
https://mathformachines.com/posts/visualizing-the-loss-landscape/#the-linear-case
●​ Article: “Principal Component Analysis” –
https://www.turing.com/kb/guide-to-principal-component-analysis#step-4:-feature-vector
●​ Kaggle Regression dataset: “Medical Insurance” –
https://www.kaggle.com/datasets/harshsingh2209/medical-insurance-payout
●​ Kaggle Classification dataset – “Raisin Binary Classification” –
https://www.kaggle.com/datasets/nimapourmoradi/raisin-binary-classification
