
02_asl.ipynb (4) - JupyterLab

This document outlines the process of image classification using a dataset of American Sign Language (ASL) hand gestures. It details the steps for data preparation, model creation, and training, including loading the dataset from CSV files, normalizing the image data, and building a neural network model using PyTorch. The document also emphasizes the use of Kaggle as a resource for datasets and deep learning projects.

Uploaded by

Hoàng Bảo
Copyright © All Rights Reserved

3/13/25, 9:53 AM 02_asl


2. Image Classification of an American Sign Language Dataset


In this section, we will perform the data preparation, model creation, and model training steps we observed in the last section using
a different dataset: images of hands making letters in American Sign Language.

2.1 Objectives

Prepare image data for training
Create and compile a simple model for image classification
Train an image classification model and observe the results

In [1]: import torch.nn as nn
import pandas as pd
import torch
from torch.optim import Adam
from torch.utils.data import Dataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.is_available()

Out[1]: True

2.2 American Sign Language Dataset


The American Sign Language alphabet contains 26 letters. Two of those letters (j and z) require movement, so they are not included
in the training dataset.


100.28.217.36/lab/lab/tree/02_asl.ipynb 1/15

2.2.1 Kaggle
This dataset is available from the website Kaggle, which is a fantastic place to find datasets and other deep learning resources. In
addition to providing resources like datasets and "kernels" that are like these notebooks, Kaggle hosts competitions that you can
take part in, competing with others in training highly accurate models.

If you're looking to practice or see examples of many deep learning projects, Kaggle is a great site to visit.

2.3 Loading the Data


This dataset is not available via TorchVision in the same way that MNIST is, so let's learn how to load custom data. By the end of this
section we will have x_train, y_train, x_valid, and y_valid variables.

2.3.1 Reading in the Data


The sign language dataset is in CSV (Comma Separated Values) format, a plain-text table that spreadsheet programs such as Microsoft
Excel and Google Sheets can open. It is a grid of rows and columns with labels at the top, as you can see in the train and valid
datasets (they may take a moment to load).

To load and work with the data, we'll be using a library called Pandas, which is a highly performant tool for loading and
manipulating data. We'll read the CSV files into a format called a DataFrame.

Pandas has a read_csv function that takes the path to a CSV file and returns a DataFrame:

In [2]: train_df = pd.read_csv("data/asl_data/sign_mnist_train.csv")
valid_df = pd.read_csv("data/asl_data/sign_mnist_valid.csv")
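To see what read_csv produces without the real files, here is a minimal sketch that parses a tiny, made-up CSV held in a string. The column names mimic the ASL files (label, pixel1, ...), but the values and the three-pixel width are illustrative assumptions only.

```python
import io

import pandas as pd

# A made-up CSV with the same layout as the ASL files: a header row,
# then one row per image (label first, then pixel values).
csv_text = """label,pixel1,pixel2,pixel3
3,107,118,127
6,155,157,156
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.shape)          # (2, 4): 2 images, 1 label column + 3 pixel columns
print(list(toy_df.columns))  # ['label', 'pixel1', 'pixel2', 'pixel3']
```

The real files follow the same pattern, with 784 pixel columns instead of 3.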

2.3.2 Exploring the Data


Let's take a look at our data. We can use the head method to print the first few rows of the DataFrame. Each row is one image: a
label column followed by 784 columns holding the image's pixel values, just like with the MNIST dataset. Note that the labels are
currently numerical values, not letters of the alphabet:


In [3]: train_df.head()

Out[3]: label pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel775 pixel776 pixel777 pixel778 pixel7
0 3 107 118 127 134 139 143 146 150 153 ... 207 207 207 207 2
1 6 155 157 156 156 156 157 156 158 158 ... 69 149 128 87
2 2 187 188 188 187 187 186 187 188 187 ... 202 201 200 199 1
3 2 211 211 212 212 211 210 211 210 210 ... 235 234 233 231 2
4 12 164 167 170 172 176 179 180 184 185 ... 92 105 105 108 1

5 rows × 785 columns

2.3.3 Extracting the Labels


Let's store our training and validation labels in y_train and y_valid variables. We can use the pop method to remove a
column from our DataFrame and assign the removed values to a variable.

In [4]: y_train = train_df.pop('label')
y_valid = valid_df.pop('label')
y_train

Out[4]: 0 3
1 6
2 2
3 2
4 12
..
27450 12
27451 22
27452 17
27453 16
27454 22
Name: label, Length: 27455, dtype: int64
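A minimal sketch of what pop does, using a made-up three-row DataFrame with the same column layout: it removes the column from the DataFrame in place and returns it as a Series.

```python
import pandas as pd

# Toy frame with the same layout as train_df (values are made up).
toy_df = pd.DataFrame({
    "label": [3, 6, 2],
    "pixel1": [107, 155, 187],
    "pixel2": [118, 157, 188],
})

# pop removes the column from toy_df *in place* and returns it as a Series.
toy_labels = toy_df.pop("label")

print(toy_labels.tolist())   # [3, 6, 2]
print(list(toy_df.columns))  # ['pixel1', 'pixel2']
```

Because pop mutates the DataFrame, running the In [4] cell a second time raises a KeyError: the label column is already gone.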


2.3.4 Extracting the Images


Next, let's store our training and validation images in x_train and x_valid variables. Here we create those variables:

In [5]: x_train = train_df.values
x_valid = valid_df.values
x_train

Out[5]: array([[107, 118, 127, ..., 204, 203, 202],
       [155, 157, 156, ..., 103, 135, 149],
       [187, 188, 188, ..., 195, 194, 195],
       ...,
       [174, 174, 174, ..., 202, 200, 200],
       [177, 181, 184, ..., 64, 87, 93],
       [179, 180, 180, ..., 205, 209, 215]])
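The pandas documentation recommends DataFrame.to_numpy() over the .values attribute. For an all-integer frame like the pixel columns here, both give the same 2D NumPy array, one row per image. The two-column frame below is a made-up stand-in:

```python
import numpy as np
import pandas as pd

pixels_df = pd.DataFrame({"pixel1": [107, 155], "pixel2": [118, 157]})

# Both return the underlying data as a 2D NumPy array (rows = images).
a = pixels_df.values
b = pixels_df.to_numpy()

print(type(b).__name__, b.shape)  # ndarray (2, 2)
print(np.array_equal(a, b))       # True
```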

2.3.5 Summarizing the Training and Validation Data


We now have 27,455 images with 784 pixels each for training...

In [6]: x_train.shape

Out[6]: (27455, 784)

...as well as their corresponding labels:

In [7]: y_train.shape

Out[7]: (27455,)

For validation, we have 7,172 images...

In [8]: x_valid.shape

Out[8]: (7172, 784)

...and their corresponding labels:


In [9]: y_valid.shape

Out[9]: (7172,)

2.4 Visualizing the Data


To visualize the images, we will again use the matplotlib library. We don't need to worry about the details of this visualization, but if
interested, you can learn more about matplotlib at a later time.

Note that we'll have to reshape the data from its current 1D shape of 784 pixels to a 2D shape of 28x28 pixels to make sense of the
image:

In [10]: import matplotlib.pyplot as plt

plt.figure(figsize=(40,40))

num_images = 20
for i in range(num_images):
    row = x_train[i]
    label = y_train[i]

    image = row.reshape(28,28)
    plt.subplot(1, num_images, i+1)
    plt.title(label, fontdict={'fontsize': 30})
    plt.axis('off')
    plt.imshow(image, cmap='gray')
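reshape(28, 28) fills the grid in row-major order, so each run of 28 consecutive pixel values becomes one row of the image. A quick check on a fake "image" whose pixel k simply stores the value k:

```python
import numpy as np

# A fake flattened "image": pixel k simply stores the value k.
row = np.arange(784)

image = row.reshape(28, 28)  # row-major (C order) by default

# Flat index k lands at (k // 28, k % 28).
k = 100
r, c = divmod(k, 28)
print(r, c)              # 3 16
print(image[r, c] == k)  # True
```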

2.4.1 Normalize the Image Data


As we did with the MNIST dataset, we are going to normalize the image data so that the pixel values lie between 0 and 1, instead of
between 0 and 255 as they currently do:

In [11]: x_train.min()


Out[11]: 0

In [12]: x_train.max()

Out[12]: 255

In the previous lab, we used ToTensor to convert the images and scale them to the 0 to 1 range, but we can also scale our data ourselves before turning it into a tensor:

In [13]: x_train = train_df.values / 255
x_valid = valid_df.values / 255
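Dividing by 255 maps the 8-bit range [0, 255] onto [0, 1]. In Python 3, / is true division, so an integer array becomes float64. A small check with made-up pixel values:

```python
import numpy as np

# Made-up 8-bit pixel values covering the full 0-255 range.
pixels = np.array([[0, 51, 255], [102, 204, 153]])

scaled = pixels / 255  # true division -> float64 array

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled[0, 1])                # 0.2
```

The float64 result is why MyDataset calls .float() to convert the tensor to float32, the dtype that PyTorch's Linear layers use by default.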

2.4.2 Custom Datasets


We can use PyTorch's Dataset class to create our own dataset. __init__ runs once when the class is initialized, __getitem__
returns the image and label at a given index, and __len__ returns the number of samples.

Since our dataset is small, we can store all of it on our GPU for faster processing. In the previous lab, we sent our data to the GPU
as each batch was drawn. Here, we will send it to the GPU once, in the __init__ function.

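In [14]: class MyDataset(Dataset):
    def __init__(self, x_df, y_df):
        # Convert to tensors once and move them to the GPU (if available) up front
        self.xs = torch.tensor(x_df).float().to(device)
        self.ys = torch.tensor(y_df).to(device)

    def __getitem__(self, idx):
        # Return the image and label stored at position idx
        x = self.xs[idx]
        y = self.ys[idx]
        return x, y

    def __len__(self):
        # Number of samples in the dataset
        return len(self.xs)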

A custom PyTorch dataset works just like a prebuilt one. It should be passed to a DataLoader for model training.

In [15]: BATCH_SIZE = 32

train_data = MyDataset(x_train, y_train)
train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
train_N = len(train_loader.dataset)

In [16]: valid_data = MyDataset(x_valid, y_valid)
valid_loader = DataLoader(valid_data, batch_size=BATCH_SIZE)
valid_N = len(valid_loader.dataset)
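Conceptually, a DataLoader with shuffle=True draws a fresh random order of dataset indices on each pass and cuts it into batches. The NumPy sketch below imitates that for illustration only. ToyDataset and iterate_batches are hypothetical names, not PyTorch code. PyTorch's default loader fetches one index at a time and stacks the results, while this sketch indexes a whole batch at once for brevity. It also shows that 27,455 training images split into 858 batches of 32, with a last batch of 31.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in with the same protocol as MyDataset."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        return self.xs[idx], self.ys[idx]


def iterate_batches(dataset, batch_size, shuffle, rng):
    """Hypothetical helper: yield (x, y) batches in the spirit of a DataLoader."""
    n = len(dataset)
    # A new index order every time the generator is started (a new epoch).
    order = rng.permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        yield dataset[order[start:start + batch_size]]


rng = np.random.default_rng(0)
n_images = 27455                   # size of the ASL training set
dataset = ToyDataset(np.arange(n_images),        # stand-in "images": row numbers
                     np.arange(n_images) % 24)   # stand-in labels 0..23

batch_sizes = [len(yb) for _, yb in iterate_batches(dataset, 32, True, rng)]
print(len(batch_sizes))                 # 858
print(batch_sizes[0], batch_sizes[-1])  # 32 31
```

The same arithmetic for the unshuffled validation set (7,172 images) gives 225 batches, the last holding 4 images.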

We can verify the DataLoader works as expected with the code below. We'll make the DataLoader iterable with iter, and then call next
to draw the first batch, like drawing the first hand from a shuffled deck.

In [17]: train_loader

Out[17]: <torch.utils.data.dataloader.DataLoader at 0x7f77fd871960>

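Try running the cell below a few times. Because train_loader was created with shuffle=True, each new iterator draws a different batch, so the values should change each time.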

In [18]: batch = next(iter(train_loader))
batch

Out[18]: [tensor([[0.6078, 0.6078, 0.6078, ..., 0.7255, 0.7176, 0.7098],
         [0.5137, 0.5294, 0.5490, ..., 0.7882, 0.7412, 0.6471],
         [0.5843, 0.6039, 0.6157, ..., 0.2275, 0.1843, 0.1725],
         ...,
         [0.3373, 0.4784, 0.2667, ..., 0.8784, 0.8941, 0.8941],
         [0.0863, 0.0941, 0.1059, ..., 0.2745, 0.1922, 0.1412],
         [0.6745, 0.6824, 0.6902, ..., 0.8863, 0.8745, 0.8745]],
        device='cuda:0'),
 tensor([ 2, 16, 20, 5, 9, 20, 23, 4, 3, 13, 4, 11, 14, 11, 11, 9, 23, 7,
         21, 7, 8, 2, 3, 12, 22, 0, 14, 22, 14, 9, 14, 11],
        device='cuda:0')]

Notice the batch is a list of two tensors. The first is our x, and the second is our y. The first dimension of each has 32 values,
which is the batch_size.

In [19]: batch[0].shape

Out[19]: torch.Size([32, 784])

In [20]: batch[1].shape


Out[20]: torch.Size([32])

2.5 Build the Model


We've created our DataLoaders; now it's time to build our model.

Exercise

For this exercise we are going to build a sequential model. Just like last time, build a model that:

Has a flatten layer.
Has a dense (nn.Linear) input layer. This layer should contain 512 neurons and use the ReLU activation function.
Has a second dense layer with 512 neurons, which uses the ReLU activation function.
Has a dense output layer with neurons equal to the number of classes.

We will define a few variables to get started:

In [21]: input_size = 28 * 28
n_classes = 24

Do your work in the cell below, creating a model variable to store the model. We've imported torch.nn as nn, which provides the
Sequential model class and the Linear layer class, to get you started. Reveal the solution below for a hint:

In [22]: model = nn.Sequential(

Solution

In [23]: # SOLUTION
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(input_size, 512),  # Input
    nn.ReLU(),  # Activation for input
    nn.Linear(512, 512),  # Hidden
    nn.ReLU(),  # Activation for hidden
    nn.Linear(512, n_classes)  # Output
)

This time, we'll combine compiling the model and sending it to the GPU in one step:

In [24]: model = torch.compile(model.to(device))
model

Out[24]: OptimizedModule(
(_orig_mod): Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): Linear(in_features=784, out_features=512, bias=True)
(2): ReLU()
(3): Linear(in_features=512, out_features=512, bias=True)
(4): ReLU()
(5): Linear(in_features=512, out_features=24, bias=True)
)
)
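Flatten and ReLU have no learnable parameters, so all of the model's weights live in the three Linear layers. Each nn.Linear(n_in, n_out) with its default bias=True (as the printout shows) holds n_in * n_out weights plus n_out biases. A quick worked count:

```python
# Parameters of Linear(n_in, n_out): an n_in x n_out weight matrix plus n_out biases.
def linear_params(n_in, n_out):
    return n_in * n_out + n_out

input_size, hidden, n_classes = 28 * 28, 512, 24

layers = [
    linear_params(input_size, hidden),  # 784 -> 512
    linear_params(hidden, hidden),      # 512 -> 512
    linear_params(hidden, n_classes),   # 512 -> 24
]
print(layers)       # [401920, 262656, 12312]
print(sum(layers))  # 676888
```

In the notebook, sum(p.numel() for p in model.parameters()) should report the same total.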

Since categorizing these ASL images is similar to categorizing MNIST's handwritten digits, we will use the same loss_function
(Categorical CrossEntropy) and optimizer (Adam). nn.CrossEntropyLoss applies the (log-)softmax internally, so the model should output
raw scores (logits). It is also computationally faster when the targets are class indices rather than class probabilities.

In [25]: loss_function = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters())
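The statement that nn.CrossEntropyLoss "includes the softmax" can be checked by hand. The NumPy sketch below is an illustration, not PyTorch's code. It computes the same quantity with the default mean reduction: a log-softmax over the raw scores, then the average negative log-probability of each target class index.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean cross-entropy from raw scores and integer class indices (NumPy sketch)."""
    # Log-softmax, shifted by the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class in each row, then average.
    picked = log_probs[np.arange(len(targets)), targets]
    return -picked.mean()


# With all-zero logits, each of the 24 classes gets probability 1/24,
# so the loss is ln(24) whatever the targets are.
logits = np.zeros((2, 24))
targets = np.array([3, 17])
print(round(cross_entropy(logits, targets), 4))  # 3.1781
```

An untrained model with near-zero outputs should therefore start with a per-batch loss of roughly ln(24), about 3.18.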

2.6 Training the Model


This time, let's look at our train and validate functions in more detail.

2.6.1 The Train Function


This code is almost the same as in the previous notebook, but we no longer send x and y to our GPU because MyDataset already moved
the data there in its __init__ function.

Before looping through the DataLoader, we call model.train() to put the model in training mode. This matters for layers such as
dropout and batch normalization, which behave differently during training and evaluation. To make it easier for us to follow training
progress, we'll keep track of the total loss and accuracy.


Then, for each batch in our train_loader, we will:

1. Get an output prediction from the model
2. Set the gradient to zero with the optimizer's zero_grad function
3. Calculate the loss with our loss_function
4. Compute the gradient with backward
5. Update our model parameters with the optimizer's step function
6. Update the loss and accuracy totals

In [26]: def train():
    loss = 0
    accuracy = 0

    model.train()
    for x, y in train_loader:
        output = model(x)
        optimizer.zero_grad()
        batch_loss = loss_function(output, y)
        batch_loss.backward()
        optimizer.step()

        loss += batch_loss.item()
        accuracy += get_batch_accuracy(output, y, train_N)
    print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))
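To make the six steps concrete without PyTorch's autograd, here is a minimal NumPy sketch that trains a single linear layer (softmax regression) on random stand-in data. Everything in it is an assumption for illustration: the data, the learning rate, the one-layer model, and the use of plain gradient descent instead of Adam. The gradient is derived by hand rather than computed by backward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): 64 "images" of 784 pixels in [0, 1], 24 classes.
x = rng.random((64, 784))
y = rng.integers(0, 24, size=64)

# Simplified model (assumption): one linear layer, i.e. softmax regression.
W = np.zeros((784, 24))
b = np.zeros(24)
lr = 0.005  # small enough for plain gradient descent to be stable on this data


def loss_and_grad(logits, targets):
    """Mean cross-entropy and its gradient with respect to the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(targets))
    loss = -np.log(probs[rows, targets]).mean()
    grad = probs.copy()
    grad[rows, targets] -= 1.0  # (softmax - one_hot) ...
    return loss, grad / len(targets)  # ... divided by the batch size


losses = []
for step in range(20):
    logits = x @ W + b                            # 1. prediction (raw scores)
    # 2. These NumPy gradients are recomputed from scratch every step; PyTorch
    #    instead accumulates into .grad, which is why train() calls zero_grad().
    loss, grad_logits = loss_and_grad(logits, y)  # 3. loss
    grad_W = x.T @ grad_logits                    # 4. gradients (backward's job)
    grad_b = grad_logits.sum(axis=0)
    W -= lr * grad_W                              # 5. parameter update
    b -= lr * grad_b
    losses.append(loss)                           # 6. track the loss

print(losses[-1] < losses[0])  # True
```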

2.6.2 The Validate Function


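The model does not learn during validation, so the validate function is simpler than the train function above.

One key difference is that we set the model to evaluation mode with model.eval() and run the loop inside torch.no_grad(). Evaluation
mode switches layers such as dropout and batch normalization to their inference behavior, and no_grad stops PyTorch from tracking
gradients. Because validate never calls backward or the optimizer's step, no parameters are updated.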

In [27]: def validate():
    loss = 0
    accuracy = 0

    model.eval()
    with torch.no_grad():
        for x, y in valid_loader:
            output = model(x)

            loss += loss_function(output, y).item()
            accuracy += get_batch_accuracy(output, y, valid_N)
    print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))

2.6.3 Calculating the Accuracy


Both the train and validate functions use get_batch_accuracy, but we have not defined that in this notebook yet.

Exercise

The function below has three FIXMEs. Each one corresponds to one of the function's input arguments. Can you replace each FIXME with
the correct argument?

It may help to view the documentation for argmax, eq, and view_as.

In [28]: def get_batch_accuracy(output, y, N):
    pred = FIXME.argmax(dim=1, keepdim=True)
    correct = pred.eq(FIXME.view_as(pred)).sum().item()
    return correct / FIXME

Solution

Click the ... below for the solution.

In [29]: # SOLUTION
def get_batch_accuracy(output, y, N):
    pred = output.argmax(dim=1, keepdim=True)
    correct = pred.eq(y.view_as(pred)).sum().item()
    return correct / N
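get_batch_accuracy divides by N, the size of the whole dataset, not by the batch size. As a result, adding up its return value over every batch in an epoch gives the accuracy over the full dataset. A NumPy mirror of the solution, run on two made-up batches from a 5-sample, 3-class dataset (batch_accuracy is an illustrative name):

```python
import numpy as np


def batch_accuracy(output, y, n_total):
    """NumPy mirror of get_batch_accuracy: correct predictions / dataset size."""
    pred = output.argmax(axis=1)  # index of the highest score in each row
    correct = int((pred == y).sum())
    return correct / n_total


N = 5
batches = [
    # batch 1: predictions [0, 1] vs labels [0, 2] -> 1 correct
    (np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]]), np.array([0, 2])),
    # batch 2: predictions [2, 0, 1] vs labels [2, 0, 1] -> 3 correct
    (np.array([[0.1, 0.2, 3.0], [1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]), np.array([2, 0, 1])),
]

epoch_accuracy = sum(batch_accuracy(out, y, N) for out, y in batches)
print(epoch_accuracy)  # 0.8  (4 of 5 correct)
```

The PyTorch version keeps an extra dimension (keepdim=True) and uses view_as so that pred and y have matching shapes before eq. The NumPy version compares two flat arrays directly.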

2.6.4 The Training Loop


Let's bring it all together! Run the cell below to train the model for 20 epochs.

In [30]: epochs = 20

for epoch in range(epochs):
    print('Epoch: {}'.format(epoch))
    train()
    validate()

Epoch: 0
Train - Loss: 1585.5740 Accuracy: 0.3956
Valid - Loss: 323.2296 Accuracy: 0.5409
Epoch: 1
Train - Loss: 729.8412 Accuracy: 0.7089
Valid - Loss: 273.1569 Accuracy: 0.6051
Epoch: 2
Train - Loss: 392.3283 Accuracy: 0.8469
Valid - Loss: 229.9985 Accuracy: 0.6694
Epoch: 3
Train - Loss: 213.5582 Accuracy: 0.9214
Valid - Loss: 223.7834 Accuracy: 0.7375
Epoch: 4
Train - Loss: 133.0160 Accuracy: 0.9525
Valid - Loss: 206.5498 Accuracy: 0.7670
Epoch: 5
Train - Loss: 70.6900 Accuracy: 0.9776
Valid - Loss: 272.6428 Accuracy: 0.7375
Epoch: 6
Train - Loss: 90.4645 Accuracy: 0.9663
Valid - Loss: 267.6985 Accuracy: 0.7270
Epoch: 7
Train - Loss: 40.0967 Accuracy: 0.9885
Valid - Loss: 226.3849 Accuracy: 0.7741
Epoch: 8
Train - Loss: 53.9879 Accuracy: 0.9817
Valid - Loss: 219.3163 Accuracy: 0.7987
Epoch: 9
Train - Loss: 50.4869 Accuracy: 0.9823
Valid - Loss: 220.8031 Accuracy: 0.8133
Epoch: 10
Train - Loss: 2.9627 Accuracy: 0.9999
Valid - Loss: 235.2547 Accuracy: 0.8108
Epoch: 11
Train - Loss: 79.4522 Accuracy: 0.9765
Valid - Loss: 291.3555 Accuracy: 0.7305
Epoch: 12
Train - Loss: 17.9432 Accuracy: 0.9954
Valid - Loss: 247.6815 Accuracy: 0.7829
Epoch: 13
Train - Loss: 57.6867 Accuracy: 0.9796
Valid - Loss: 229.8657 Accuracy: 0.8002
Epoch: 14
Train - Loss: 2.0873 Accuracy: 1.0000
Valid - Loss: 237.5475 Accuracy: 0.8115
Epoch: 15
Train - Loss: 67.7856 Accuracy: 0.9788
Valid - Loss: 220.0935 Accuracy: 0.8069
Epoch: 16
Train - Loss: 3.8451 Accuracy: 0.9997
Valid - Loss: 245.0788 Accuracy: 0.8021
Epoch: 17
Train - Loss: 55.6508 Accuracy: 0.9807
Valid - Loss: 222.9318 Accuracy: 0.8150
Epoch: 18
Train - Loss: 1.8120 Accuracy: 1.0000
Valid - Loss: 245.3994 Accuracy: 0.8134
Epoch: 19
Train - Loss: 1.0882 Accuracy: 1.0000
Valid - Loss: 250.8158 Accuracy: 0.8116

2.6.5 Discussion: What happened?


We can see that the training accuracy reached 1.0000 (100%), but the validation accuracy peaked at about 0.815 and finished near 0.81. What happened here?

Think about it for a bit before clicking on the '...' below to reveal the answer.

# SOLUTION
This is an example of the model learning to categorize the training data but performing poorly on new data that it has not been
trained on. Essentially, it is memorizing the dataset rather than gaining a robust and general understanding of the problem. This is a
common issue called overfitting. We will discuss overfitting in the next two lectures, as well as some ways to
address it.
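The losses in the training log are sums, not averages. train() and validate() add up each batch's mean loss (nn.CrossEntropyLoss averages over the batch by default), so the totals grow with the number of batches. Dividing by the batch count, assuming BATCH_SIZE = 32 as above, gives a rough per-batch average that is easier to compare between training and validation. The values below are copied from the log.

```python
import math

BATCH_SIZE = 32
train_batches = math.ceil(27455 / BATCH_SIZE)  # 858
valid_batches = math.ceil(7172 / BATCH_SIZE)   # 225

# (train total, valid total) for the first and last epochs, copied from the log.
log = {0: (1585.5740, 323.2296), 19: (1.0882, 250.8158)}

for epoch, (train_total, valid_total) in log.items():
    print(epoch,
          round(train_total / train_batches, 4),  # average training loss per batch
          round(valid_total / valid_batches, 4))  # average validation loss per batch
```

By epoch 19 the average training loss is roughly 0.001 per batch, while the validation loss is roughly 1.1. That gap is another sign of the overfitting described above. The figures are approximate because the last batch of each epoch is smaller than 32.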

2.7 Summary
In this section you built your own neural network for image classification, reaching roughly 81% accuracy on the validation set. Congrats!

At this point we should be getting somewhat familiar with the process of loading data (including labels), preparing it, creating a
model, and then training the model with prepared data.


2.7.1 Clear the Memory


Before moving on, please execute the following cell to clear up the GPU memory. This is required to move on to the next notebook.

In [31]: import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Out[31]: {'status': 'ok', 'restart': True}

2.7.2 Next
Now that you have built some very basic, somewhat effective models, we will begin to learn about more sophisticated models,
including Convolutional Neural Networks.
