
COMSATS UNIVERSITY ISLAMABAD

Project # 02: Implemented MobileNet on PyTorch

Name: Awais Mazhar Shafi
Registration Number: CIIT/FA20-BEE-036/ISB
Instructor's Name: Dr. Faisal Siddiqui
Submission Date: 02 Dec 2023


Mobile Net (V1) Architecture:

Type / Stride        Filter Shape            Input Size
Conv / s2            3 x 3 x 3 x 32          224 x 224 x 3
Conv dw / s1         3 x 3 x 32 dw           112 x 112 x 32
Conv / s1            1 x 1 x 32 x 64         112 x 112 x 32
Conv dw / s2         3 x 3 x 64 dw           112 x 112 x 64
Conv / s1            1 x 1 x 64 x 128        56 x 56 x 64
Conv dw / s1         3 x 3 x 128 dw          56 x 56 x 128
Conv / s1            1 x 1 x 128 x 128       56 x 56 x 128
Conv dw / s2         3 x 3 x 128 dw          56 x 56 x 128
Conv / s1            1 x 1 x 128 x 256       28 x 28 x 128
Conv dw / s1         3 x 3 x 256 dw          28 x 28 x 256
Conv / s1            1 x 1 x 256 x 256       28 x 28 x 256
Conv dw / s2         3 x 3 x 256 dw          28 x 28 x 256
Conv / s1            1 x 1 x 256 x 512       14 x 14 x 256
5 x Conv dw / s1     3 x 3 x 512 dw          14 x 14 x 512
5 x Conv / s1        1 x 1 x 512 x 512       14 x 14 x 512
Conv dw / s2         3 x 3 x 512 dw          14 x 14 x 512
Conv / s1            1 x 1 x 512 x 1024      7 x 7 x 512
Conv dw / s1         3 x 3 x 1024 dw         7 x 7 x 1024
Conv / s1            1 x 1 x 1024 x 1024     7 x 7 x 1024
Avg Pool / s1        Pool 7 x 7              7 x 7 x 1024
FC / s1              1024 x 10               1 x 1 x 1024
Softmax / s1         Classifier              1 x 1 x 10

Table 1. MobileNet body architecture. The pointwise layers are 1 x 1 convolutions, and the final FC layer is sized 1024 x 10 for the 10 MNIST classes.


Depthwise Conv and 3x3 Conv:

Figure 1. Left: standard convolutional layer with batchnorm and ReLU. Right: depthwise separable convolution with depthwise and pointwise layers, each followed by batchnorm and ReLU.
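To see why depthwise separable convolutions are cheaper, the following short check (illustrative only, not part of the original report) compares the parameter counts of a standard 3x3 convolution and its depthwise separable counterpart for 64 input and 128 output channels:

from torch import nn

# Standard 3x3 convolution: 128 * 64 * 3 * 3 = 73,728 weights (bias omitted)
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)

# Depthwise 3x3 (one filter per input channel) + pointwise 1x1:
# 64 * 3 * 3 + 128 * 64 = 576 + 8,192 = 8,768 weights
depthwise = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=False)
pointwise = nn.Conv2d(64, 128, kernel_size=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))                      # 73728
print(count(depthwise) + count(pointwise))  # 8768, roughly 8x fewer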

Image Padding:
Image padding is a technique that increases the size of an image (or feature map) by adding pixels of zero intensity around the edge of the original image. It is commonly used in computer vision tasks such as convolutional neural networks (CNNs) to keep feature maps at the desired spatial size after convolution.

Figure 2 shows a 3 x 3 input image; after adding padding of 1 on each side, the output size we get is 5 x 5.
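As a quick check of this behaviour (illustrative only), zero-padding a 3 x 3 tensor by one pixel on each side in PyTorch gives the 5 x 5 result described above:

import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 3, 3)       # one 3 x 3 single-channel image
padded = F.pad(x, (1, 1, 1, 1))  # one pixel of zeros on the left, right, top, bottom
print(padded.shape)              # torch.Size([1, 1, 5, 5])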

Stride:
Stride is a parameter used in convolutional neural networks to set the step size with which the kernel moves across the input image. The stride value determines how much successive kernel placements overlap.
A stride of 1x1 means that the kernel moves one pixel at a time in both the horizontal and vertical directions (it first sweeps across a row, then moves down to the next row). The kernel therefore covers every pixel position in the input image, and with appropriate padding the output of the convolution has the same spatial size as the input.

Figure 3 shows a 5 x 5 input image with a 1 x 1 stride applied: the kernel moves one pixel at a time in both the horizontal and vertical directions.

On the other hand, a stride of 2x2 means that the kernel moves two pixels at a time in both directions (again row by row). The kernel therefore skips every other position in the input image, reducing the number of computations required to perform the convolution. The output of the convolution is smaller than the input, since the kernel skips positions in the input image.

Figure 4 shows a 5 x 5 input image with a 2 x 2 stride applied: the kernel moves two pixels at a time in both the horizontal and vertical directions.
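In general, output size = floor((input + 2 x padding - kernel) / stride) + 1. A minimal check of the stride 1x1 and stride 2x2 cases (illustrative only, using the same 3x3 kernel and padding of 1 as the MobileNet layers):

import torch
from torch import nn

x = torch.randn(1, 3, 224, 224)
conv_s1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
conv_s2 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
print(conv_s1(x).shape)  # torch.Size([1, 32, 224, 224]) -- same spatial size
print(conv_s2(x).shape)  # torch.Size([1, 32, 112, 112]) -- spatial size halved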
Pooling Layer:
A pooling layer is a type of layer commonly used in convolutional neural networks (CNNs) for image recognition tasks. The purpose of the pooling layer is to reduce the spatial dimensions of the input feature maps while retaining the most important information.
Depending on the method used, there are several types of pooling operations; each summarizes the features generated by a convolutional layer. In Max Pooling, the largest element is taken from each region of the feature map. Average Pooling calculates the average of the elements in a predefined image section, and Sum Pooling computes the total sum of the elements in that section. The pooling layer usually serves as a bridge between the convolutional layers and the FC layer. It generalizes the features extracted by the convolutional layers, helping the network recognize features independently of their exact position, and it also reduces the amount of computation in the network. A pooling window is the small region of the feature map over which one pooling operation is applied.

Figure 5 shows a 3 x 3 input image to which 2 x 2 and 2 x 3 pooling windows are applied: each window picks the maximum or the average value from that part of the input (depending on whether max pooling or average pooling is used).

The window of a pooling layer is always smaller than the feature map. Usually it is a 2x2 square (patch) that is compressed into one value. A 2x2 window with stride 2 reduces the number of pixels in each feature map to one quarter: if the input feature map is 10x10, the output map is 5x5.
Multiple different functions can be used for pooling. These are the most frequent:

- Maximum Pooling: calculates the maximum value for each patch of the feature map.
- Average Pooling: calculates the average value for each patch of the feature map.

Figure 6 shows the result of applying max pooling and average pooling to an image matrix.
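The following small example (illustrative only) applies 2 x 2 max pooling and 2 x 2 average pooling with stride 2 to a 4 x 4 matrix:

import torch
from torch import nn

x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(max_pool(x))  # [[ 6.,  8.], [14., 16.]]
print(avg_pool(x))  # [[ 3.5,  5.5], [11.5, 13.5]]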

ReLU:
ReLU, which stands for Rectified Linear Unit, is an activation function commonly
used in artificial neural networks, particularly in deep learning models. The function
introduces non-linearity to the network, allowing it to learn from complex patterns
and relationships in the data.

The mathematical expression for the ReLU activation function is simple:

f(x) = max(0,x)

where x is the input to the function and f(x) is the output.

In other words, for any input x, the output of the ReLU function is the maximum of
0 and x. If x is positive, the function returns x and if x is negative, it returns 0.
The ReLU function introduces non-linearity to the model, enabling it to learn and
approximate complex, non-linear relationships within the data.

ReLU has several advantages over other activation functions, such as its ability to
address the vanishing gradient problem, which can occur in deep neural networks
when using other activation functions like sigmoid or tanh. It also tends to perform
well in many applications and is easy to implement.

Overall, ReLU is a powerful tool in the field of deep learning and has contributed to the success of many state-of-the-art neural architectures.

Figure 7 shows the ReLU function: if the input value is positive, the function returns the input value; if the input value is negative, it returns 0.
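The function is available directly in PyTorch; a one-line check (illustrative only):

import torch
from torch import nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])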
Implementation

Implemented Mobile Net (V1) on PyTorch:


In this section we discuss the PyTorch code for training and testing step by step. To implement MobileNet we can use Anaconda or the online Google Colab; we use Google Colab.
First we set the device to the GPU for faster processing.
Setting up Device-Agnostic Code (GPU):
# set up a device agnostic code
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
device

Then we import the necessary PyTorch libraries.


Import Important Libraries:
# Import Libraries
import torch
from torch import nn
import torchvision
from torchvision import datasets
import torchvision.transforms as transforms
# Import matplotlib for visualization
import matplotlib.pyplot as plt

# Check versions
print(torch.__version__)
print(torchvision.__version__)

In PyTorch, the MNIST dataset can be downloaded using the datasets.MNIST module. The following code demonstrates how to download the MNIST dataset using this module.
Loading the Dataset and Preprocessing:
# Loading the dataset and preprocessing
train_dataset = datasets.MNIST(
    root = './data',
    train = True,
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize(mean = (0.5,), std = (0.5,))]),
    download = True
)

test_dataset = datasets.MNIST(
    root = './data',
    train = False,
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Grayscale(num_output_channels=3),
        transforms.ToTensor(),
        transforms.Normalize(mean = (0.5,), std = (0.5,))]),
    download = True
)

The following code demonstrates how to view the first training example of the MNIST dataset.
See First Training Examples:
# See the first training examples
image, label = train_dataset[0]
image.shape, label

The following code demonstrates how to wrap the MNIST dataset with the DataLoader module; we turn train_dataset and test_dataset into batches for faster processing.
Turn our Data Into Batches:
from torch.utils.data import DataLoader

# Set up a batch size hyperparameter
batch_size = 30

# Turn datasets into iterables (batches)
train_loader = DataLoader(
    dataset = train_dataset,
    batch_size = batch_size,
    shuffle = True)

test_loader = DataLoader(
    dataset = test_dataset,
    batch_size = batch_size,
    shuffle = True
)

The following code demonstrates how many batches were created from the training and test datasets.
Check out What we have Created:
# Let's check out what we have created
print(f"DataLoader: {train_loader, test_loader}")
print(f"Length of train_loader: {len(train_loader)} batches of {batch_size}...")
print(f"Length of test_loader: {len(test_loader)} batches of {batch_size}...")

The following code demonstrates how to inspect the features inside the training dataset.
Check out What’s inside the training features:
# Check out what's inside the training features
train_features_batch, train_labels_batch = next(iter(train_loader))
train_features_batch.shape, train_labels_batch.shape

The following code demonstrates how to check the class names inside the training dataset.
Check the Class names in our dataset:
class_names = train_dataset.classes
class_names
The following code demonstrates how to plot a random image from the training features.
Plot the Random Image from Train Features:
import matplotlib.pyplot as plt
from torchvision.transforms.functional import to_pil_image

# Show a sample
torch.manual_seed(42)
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
img, label = train_features_batch[random_idx], train_labels_batch[random_idx]

# Assuming img is your PyTorch tensor with shape (3, 224, 224)
img_pil = to_pil_image(img)
plt.imshow(img_pil)
plt.title(class_names[label])
print(f"Image Size: {img.shape}")
print(f"Label: {label}, label size: {label.shape}")

The following code defines the DepthwiseSeparableConv class, which inherits from nn.Module. In the constructor (__init__), we define the layers of the network using PyTorch modules such as nn.Conv2d(), nn.BatchNorm2d(), and nn.ReLU(). The block consists of two convolutional layers (a depthwise and a pointwise nn.Conv2d()), two nn.BatchNorm2d() layers, and a ReLU activation after each.
Depthwise Convolution Code:
class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(DepthwiseSeparableConv, self).__init__()

        # Depthwise convolution: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1, groups=in_channels)
        self.bn_depthwise = nn.BatchNorm2d(in_channels)

        # Pointwise convolution: 1x1 filters that combine the channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   stride=1, padding=0)
        self.bn_pointwise = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Depthwise convolution
        x = self.depthwise(x)
        x = self.bn_depthwise(x)
        x = self.relu(x)

        # Pointwise convolution
        x = self.pointwise(x)
        x = self.bn_pointwise(x)
        x = self.relu(x)

        return x
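A quick shape check (illustrative only, not part of the original report) confirms the block behaves like the corresponding rows of Table 1: with stride 1 the pointwise layer changes the channel count while the spatial size is kept, and with stride 2 the spatial size is halved.

block_s1 = DepthwiseSeparableConv(32, 64, stride=1)
block_s2 = DepthwiseSeparableConv(64, 128, stride=2)
x = torch.randn(1, 32, 112, 112)
y = block_s1(x)
print(y.shape)            # torch.Size([1, 64, 112, 112])
print(block_s2(y).shape)  # torch.Size([1, 128, 56, 56])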

The following code defines the SimpleConv class, which inherits from nn.Module. In the constructor (__init__), we define the layers of the network using PyTorch modules such as nn.Conv2d(), nn.BatchNorm2d(), and nn.ReLU(). The block consists of one convolutional layer (nn.Conv2d()), one nn.BatchNorm2d() layer, and one nn.ReLU() activation.
Simple Convolution Code (3x3):
class SimpleConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(SimpleConv, self).__init__()

        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              stride=stride, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

MobileNet Code:
class MobileNet(nn.Module):
    def __init__(self):
        super(MobileNet, self).__init__()
        self.model = nn.Sequential(
            SimpleConv(3, 32, stride=2),
            DepthwiseSeparableConv(32, 64, stride=1),
            DepthwiseSeparableConv(64, 128, stride=2),
            DepthwiseSeparableConv(128, 128, stride=1),
            DepthwiseSeparableConv(128, 256, stride=2),
            DepthwiseSeparableConv(256, 256, stride=1),
            DepthwiseSeparableConv(256, 512, stride=2),

            # The following block is repeated 5 times
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),

            DepthwiseSeparableConv(512, 1024, stride=2),
            DepthwiseSeparableConv(1024, 1024, stride=1),

            nn.AvgPool2d(7)
        )

        self.classifier = nn.Sequential(
            nn.Linear(1024, 10)
        )

Here we define the MobileNet class, which inherits from nn.Module. In the constructor (__init__), we assemble the network from the blocks defined above together with PyTorch modules such as nn.AvgPool2d() and nn.Linear(). Specifically, the network consists of two parts (self.model and self.classifier). The self.model part contains one SimpleConv block, 13 depthwise separable convolution blocks, and one average-pooling layer (nn.AvgPool2d()); the self.classifier part contains one fully connected layer. BatchNorm2d() and ReLU() activations follow the depthwise, pointwise, and simple convolution layers inside the blocks.
    def forward(self, x):
        x = self.model(x)
        # print(x.shape)
        x = x.view(x.size(0), -1)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x
In the forward method, we define the forward pass of the network. Here we apply the simple convolution, the depthwise separable convolutions, and the average-pooling layer, then reshape the output tensor to a 1D vector per sample using the view method, and finally apply the fully connected layer. The view method flattens the tensor so that it can be fed to the fully connected layer, and the output of the network is returned.
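As a sanity check (illustrative only, not part of the original report), pushing a dummy batch of 224 x 224 images through the network confirms that the output has one logit per MNIST class:

model = MobileNet()
dummy = torch.randn(2, 3, 224, 224)  # a batch of two 3-channel images
print(model(dummy).shape)            # torch.Size([2, 10])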
Check our Model:
torch.manual_seed(42)
model_2 = MobileNet().to(device)

print(model_2)

Here we print model_2 to verify the structure of the model we created.


Set up a Loss function and Optimizer:
# Setting the loss function
loss_fn = nn.CrossEntropyLoss()

# Setting the optimizer with model parameters and learning rate
optimizer = torch.optim.SGD(params = model_2.parameters(),
                            lr = 0.01,
                            momentum = 0.9)

Here we instantiate the network by creating an object of the MobileNet class. We also define the loss function and optimizer to be used during training: the cross-entropy loss function (nn.CrossEntropyLoss) and the stochastic gradient descent optimizer (torch.optim.SGD), with the learning rate set to 0.01 and the momentum set to 0.9.
Check the Loss in the Training Batches:
# This is defined to print how many steps remain during training
total_step = len(train_loader)

# Set the number of epochs
num_epochs = 5

# Move the model to the device once before training
model_2 = model_2.to(device)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model_2(images)
        loss = loss_fn(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 400 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(
                epoch+1, num_epochs, i+1, total_step, loss.item()))

In this block of code, we train the network. We loop over the training data for a specified number of epochs (in this case, 5). In each epoch, we loop over the mini-batches of the training data. For each mini-batch, we first move the input images and corresponding labels to the device. We pass the input data through the network (outputs = model_2(images)) to obtain the output of the network, and compute the loss using the cross-entropy loss function (loss = loss_fn(outputs, labels)). We then set the gradients to zero with optimizer.zero_grad(), which is necessary to avoid accumulating gradients from previous mini-batches, compute the gradients with loss.backward(), and finally update the network weights with optimizer.step(). We also print the current loss every 400 mini-batches so that training progress can be followed.
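If an averaged (running) loss per epoch is preferred over the per-step snapshots above, a minimal variant of the training loop (a sketch, not part of the original code) is:

for epoch in range(num_epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        loss = loss_fn(model_2(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()  # accumulate the per-batch loss
    print('Epoch [{}/{}], Average Loss: {:.4f}'.format(
        epoch+1, num_epochs, running_loss / len(train_loader)))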
Check the Accuracy of test images:
import torch

model_2.eval()  # put batchnorm layers in evaluation mode (an addition; recommended, but not in the original code)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model_2(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(
        100 * correct / total))

Finally, we test the network on the test data. We loop over the test data and pass the input images through the network (outputs = model_2(images)) to obtain its output. We then compute the predicted class by finding the index of the maximum element in the output tensor (_, predicted = torch.max(outputs.data, 1)). We compare the predicted class with the true class and count the number of correct predictions. The accuracy of the network is the number of correct predictions divided by the total number of test images, multiplied by 100. The print statement above reports the accuracy of the network on the test data (99.35%).
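To make the torch.max step concrete, here is a small example with invented logit values:

import torch

outputs = torch.tensor([[0.1, 2.5, -0.3],
                        [1.2, 0.0, 3.1]])
values, predicted = torch.max(outputs, 1)
print(predicted)  # tensor([1, 2]) -- index of the largest value in each row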
Prediction Model:
def make_predictions(model: torch.nn.Module, data: list, device: torch.device = device):
    pred_probs = []
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # Prepare sample: add a batch dimension and send the sample to the device
            sample = torch.unsqueeze(sample, dim=0).to(device)

            # Forward pass (model outputs raw logits)
            pred_logit = model(sample)

            # Get prediction probability (logit -> prediction probability)
            # Note: perform softmax on the "logits" dimension, not the "batch"
            # dimension (here the batch size is 1, so dim=0 works after squeeze)
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)

            # Get pred_prob off the GPU for further calculations
            pred_probs.append(pred_prob.cpu())

    # Stack the pred_probs to turn the list into a tensor
    return torch.stack(pred_probs)

Here we create the make_predictions function, which takes a PyTorch model and a list of data samples, performs forward passes through the model in inference mode, and returns a tensor containing the predicted probability of each class for every sample.
Check random test_samples and test_labels:
import random
# random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_dataset), k=9):
    test_samples.append(sample)
    test_labels.append(label)

# View the first test sample shape and label
print(f"Test sample image shape: {test_samples[0].shape}\nTest sample label: {test_labels[0]} ({class_names[test_labels[0]]})")

The above code randomly selects a subset of 9 test samples from the given dataset (test_dataset), stores them in lists, and prints the image shape and label of the first sample. Note that the selection changes between runs; uncommenting the random.seed(42) line would fix the random seed and make the selection reproducible.
Making Predictions:
# Move model to GPU
model_2 = model_2.to(device)
test_samples = [tensor.to(device) for tensor in test_samples]

# Make predictions on test samples with model 2
pred_probs = make_predictions(model=model_2, data=test_samples).to(device)

# View the first two prediction probabilities
pred_probs[:2]

The above code ensures that both the model (model_2) and the test samples are on the GPU, makes predictions on the test samples, and then displays the predicted probabilities for the first two samples. The .to(device) calls are what move tensors and the model between the CPU and GPU according to the specified device.
Check Test Labels:
test_labels

Here we check the true labels of the nine randomly selected test samples.

Check out Predictions:
# Convert predictions prob to labels
pred_classes = pred_probs.argmax(dim=1)
pred_classes

The above code converts the predicted probabilities obtained from the model (pred_probs) into predicted class labels (pred_classes) by selecting the class with the highest probability for each sample. The argmax(dim=1) method is applied to the predicted-probabilities tensor (pred_probs); it returns the indices of the maximum values along the specified dimension (dim=1), which here is the class dimension. Each element of pred_classes is the predicted class label for the corresponding test sample: if pred_classes[0] is 2, the model predicts class 2 for the first test sample.
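Because softmax is a monotonic function, taking the argmax of the raw logits would give the same classes as taking the argmax of the softmax probabilities; the softmax step in make_predictions is only needed when the probabilities themselves are of interest. A small check with invented values:

import torch

logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 3.0, 1.2]])
probs = torch.softmax(logits, dim=1)
print(logits.argmax(dim=1))  # tensor([0, 1])
print(probs.argmax(dim=1))   # tensor([0, 1]) -- identical classes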
Plot the Test Images and Predicted Images:
import matplotlib.pyplot as plt
# Plot predictions
plt.figure(figsize=(9, 9))
nrows = 3
ncols = 3
for i, sample in enumerate(test_samples):
    # Create a subplot
    plt.subplot(nrows, ncols, i+1)

    # Plot the target image (moved back to the CPU, since to_pil_image needs a CPU tensor)
    sample_pil = to_pil_image(sample.cpu())
    plt.imshow(sample_pil)

    # Find the prediction in text form, e.g. "7 - seven"
    pred_label = class_names[pred_classes[i]]

    # Get the truth label
    truth_label = class_names[test_labels[i]]

    # Create a title
    title_text = f"Pred: {pred_label} | Truth: {truth_label}"

    # Check for equality between pred and truth and change the title colour
    if pred_label == truth_label:
        plt.title(title_text, fontsize=10, c="g")  # green text if the prediction matches the truth
    else:
        plt.title(title_text, fontsize=10, c="r")  # red text otherwise

    plt.axis(False)
Results:
