
Image Classification from scratch

Author: fchollet
Further Explanation: Huighlet Eyram Zowonu
Date created: 2020/04/27
Last Modified: 2023/11/09
Description: Training an image classifier from scratch on Kaggle Cats and Dogs dataset.

Teaching a Computer to See Cats and Dogs

Imagine you have a robot friend who wants to learn how to recognize pictures of cats and dogs. It’s
never seen a cat or dog before, so it doesn’t know what they look like! Today, we’re going to teach it
by showing it lots of pictures and helping it practise until it gets really good at telling them apart.

We have a big photo album with lots of pictures of cats and dogs. This photo album is called a
dataset, and it’s just like a photo book where each picture is labelled. One side has cats, and the other
side has dogs. This specific album is called the Cats vs. Dogs Dataset, and it’s going to help our robot
learn to tell the difference between these two animals.

Now, we’re starting from scratch, which means our robot has no idea what a cat or dog is yet. Some
robots already know how to recognize objects like cars, trees, or people, but our robot doesn’t have
that experience—it’s a blank slate. So, we’ll be teaching it everything it needs to know about cats and
dogs, one picture at a time.

Organising the Photos

To make things easier for our robot, we’ll use a special tool called image_dataset_from_directory.
Imagine you have folders labelled “Cats” and “Dogs,” and each folder is filled with pictures. This tool
will gather all the photos from these folders and make sure the robot knows which photos are cats and
which ones are dogs. It’s like helping your robot friend keep its photo album nice and organised so it
doesn’t get confused!

Prepping the Pictures

Just like we need to get ready before doing something new, we need to prepare these photos for the
robot. This preparation is called preprocessing. Here’s how we do it:

1. Standardising the Images: First, we make sure all the photos are the same size and colour
format. It’s like cutting out all the photos so they’re the same shape—this makes it much
easier for the robot to look at them without getting distracted by different sizes.
2. Adding Extra Practice Photos (Data Augmentation): If we don’t have enough pictures, we
can make some more by creating variations of each photo. We might rotate or flip a picture of
a cat or dog so the robot sees it from different angles. This is like showing the robot the same
dog but sometimes upside down or turned to the side. The more it practises, the better it gets
at recognizing them from different viewpoints!

Teaching the Robot

Once all the photos are organised and prepared, we can start the real learning! Here’s what happens
next:

● Each time the robot sees a photo, it tries to guess if it’s a cat or a dog.
● At first, it might get a lot of them wrong, but that’s okay! Every time it makes a mistake, it
learns a little more.
● After looking at hundreds of pictures, the robot starts getting better and better at telling cats
from dogs.

Soon, it can look at a new picture, one it’s never seen before, and say with confidence, “This is a cat!”
or “This is a dog!” It becomes a Cat-and-Dog Expert, ready to recognize furry friends in photos!

The Tools We’re Importing


import os

The os module helps us work with files and folders on our computer. Think of it as a helper
that knows where to find all of our cat and dog photos. We’ll use os to help our computer look
in the right places when it needs to find the images.
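
A tiny, hypothetical sketch of the kind of thing os does for us (the PetImages folder only exists after we download the dataset later on):

import os

# os.path.join builds a file path in a way the operating system understands.
cat_folder = os.path.join("PetImages", "Cat")
print(cat_folder)   # "PetImages/Cat" on Linux/macOS, "PetImages\Cat" on Windows
print(os.getcwd())  # the folder this notebook is currently working in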

import numpy as np

numpy (imported as np for short) is a powerful tool for working with numbers. Computers see
pictures as grids of numbers, where each number represents the colour of a tiny dot (or pixel)
in the picture. numpy helps us handle all those numbers easily, which is important when we’re
showing lots of photos to our computer friend.
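
As a small made-up illustration (not from the tutorial itself), here is what a "picture" looks like to numpy: just a grid of numbers.

import numpy as np

# A made-up 2x2 "image" with 3 colour values (red, green, blue) per pixel, from 0 to 255.
tiny_image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype="uint8",
)
print(tiny_image.shape)  # (2, 2, 3): height, width, colour channels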

import keras and from keras import layers

keras is like a recipe book for building "neural networks," which are special types of
computer models that can learn by example. It provides us with all the ingredients we need to
create a model that can recognize images. The layers we’re importing are like building blocks,
allowing us to stack different pieces to build our image-recognizing robot.

from tensorflow import data as tf_data

tensorflow is a powerful library that helps us create and train our models. Here, we’re
importing a tool called data from TensorFlow (and calling it tf_data for short). This tool will
help us load our images in a way that the computer can easily process. Think of it as a
conveyor belt that brings each image to our model so it can learn from them one by one.

import matplotlib.pyplot as plt


matplotlib is like a drawing tool that lets us create graphs and pictures. We’re calling it plt to
make it easier to type. Once our model starts learning, we can use plt to show how well it’s
doing by creating charts, or even by displaying some of the cat and dog images!
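
A quick, purely illustrative sketch of plt in action, drawing a made-up picture of random pixels:

import numpy as np
import matplotlib.pyplot as plt

noise = np.random.randint(0, 256, size=(180, 180, 3)).astype("uint8")  # random 180x180 picture
plt.imshow(noise)   # draw the grid of pixels
plt.axis("off")     # hide the axis ticks
plt.show()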

Setup
AC1
import os

import numpy as np

import keras

from keras import layers

from tensorflow import data as tf_data

import matplotlib.pyplot as plt

Downloading and Preparing the Data


Downloading the Dataset
!curl -O https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip

The curl command is like asking the internet for a specific file. Here, it’s downloading a big
ZIP file (around 786MB) that contains lots of cat and dog images.

The -O flag just means we want to keep the original file name (kagglecatsanddogs_5340.zip).

Note: The ! at the beginning tells the computer this is a command for the shell (not Python),
which is why we can download the file directly.

Unzipping the File

!unzip -q kagglecatsanddogs_5340.zip

Now that we’ve downloaded the ZIP file, it’s like a packed suitcase that we need to unpack to
see everything inside.
The unzip command does this unpacking for us. The -q flag (quiet) keeps the unzipping
process from showing too much information on the screen.

Listing the Files


!ls

This command lets us take a quick look at what’s inside our current folder, so we can confirm
that our images have been unpacked correctly.

Raw data download


AC2
!curl -O https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip

AC3
!unzip -q kagglecatsanddogs_5340.zip

AC4
!ls

Now that we’ve unzipped the file, we have a folder called PetImages. Inside this folder, we’ll find two
other folders: Cat and Dog. Each of these folders contains images of cats and dogs, respectively.

Listing the PetImages Folder Contents


AC5
!ls PetImages
Here’s a breakdown of what each part of the command does:

● The !ls command lists all the files and folders in a specific location. Here, PetImages is the
folder we’re checking, so this command will show us its contents.
● By running this command, we’ll see two subfolders named Cat and Dog.
Each subfolder contains a bunch of image files (like .jpg files) that we’ll use to teach our model. We
now know where all of our cat and dog photos are.
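
If you prefer to do the same check from Python rather than the shell, here is a small optional sketch using the os module we imported earlier (run it only after the ZIP has been unpacked):

import os

for folder_name in ("Cat", "Dog"):
    folder_path = os.path.join("PetImages", folder_name)
    print(folder_name, "->", len(os.listdir(folder_path)), "files")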

In this part, we’re cleaning up our cat and dog image folders to make sure every image is usable.
Sometimes, images get corrupted, meaning they’re broken or unreadable by our model. Here’s how
this code finds and removes these corrupted images:

Checking for Corrupted Images


num_skipped = 0

for folder_name in ("Cat", "Dog"):
    folder_path = os.path.join("PetImages", folder_name)
    for fname in os.listdir(folder_path):
        fpath = os.path.join(folder_path, fname)
        try:
            fobj = open(fpath, "rb")  # Open image file in binary mode (as bytes)
            is_jfif = b"JFIF" in fobj.peek(10)  # Check if "JFIF" appears in the first 10 bytes
        finally:
            fobj.close()

1. Counting Skipped Files: num_skipped = 0 starts a counter to keep track of how many
corrupted images we delete.
2. Looping Through Folders: The for folder_name in ("Cat", "Dog") loop goes through the Cat
and Dog folders in PetImages.
3. Checking Each Image File:
○ We create folder_path, the path to each image folder, and then fpath, the path to each
image file.
○ We open each file as fobj in "rb" (read-binary) mode, allowing us to look at the raw
data in the file.

Detecting Corruption

The trick here is to check if the file contains a special pattern of characters, JFIF, which usually
appears in valid JPEG images. If it’s missing, the file might be corrupted.

● is_jfif = b"JFIF" in fobj.peek(10) checks if JFIF appears in the first 10 bytes of the file. If it
doesn’t, the image may be broken.

Removing Corrupted Images


        if not is_jfif:
            num_skipped += 1
            os.remove(fpath)  # Delete corrupted image

● If an image doesn’t have JFIF, we add one to our num_skipped counter and delete the file.

Finally, we print how many images were deleted:

print(f"Deleted {num_skipped} images.")

This cleaning step is crucial to ensure our model won’t have trouble when trying to "look at" each
picture. After running this, our folders will only contain healthy, readable images, ready for training!

AC6

num_skipped = 0

for folder_name in ("Cat", "Dog"):
    folder_path = os.path.join("PetImages", folder_name)
    for fname in os.listdir(folder_path):
        fpath = os.path.join(folder_path, fname)
        try:
            fobj = open(fpath, "rb")
            is_jfif = b"JFIF" in fobj.peek(10)
        finally:
            fobj.close()

        if not is_jfif:
            num_skipped += 1
            # Delete corrupted image
            os.remove(fpath)

print(f"Deleted {num_skipped} images.")

Now that our cat and dog images are cleaned up, we’re ready to turn them into a dataset that our
model can understand and learn from. Here’s what each line does in this code to make that happen:

Creating the Dataset

image_size = (180, 180)

batch_size = 128

1. Setting Image Size: We define image_size = (180, 180). This will resize every image to 180
pixels by 180 pixels, making each picture the same size. This helps the model process images
consistently, no matter the original dimensions.
2. Batch Size: We define batch_size = 128, which means our images will be grouped into
batches of 128. Processing images in batches helps speed up the training and reduces the
computer's memory usage.

Loading the Dataset

train_ds, val_ds = keras.utils.image_dataset_from_directory(
    "PetImages",
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)

Let’s break down each parameter in image_dataset_from_directory:


1. Directory: "PetImages" tells our program where to find the images.
2. Validation Split: validation_split=0.2 means we’re setting aside 20% of the images for
validation. Validation images are used to check the model’s performance on new data, while
the remaining 80% will be used for training.
3. Subset: "both" tells the function to create both the training and validation datasets at the same
time.
4. Seed: seed=1337 ensures the images are shuffled in a consistent way each time we run the
code, so we get the same train/validation split.
5. Image Size: image_size=image_size resizes each image to the 180x180 size we defined.
6. Batch Size: batch_size=batch_size groups the images into batches of 128.

AC7
image_size = (180, 180)
batch_size = 128

train_ds, val_ds = keras.utils.image_dataset_from_directory(
    "PetImages",
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)

Resulting Datasets

After running this, we’ll have two datasets:

● train_ds: This is our training dataset, containing 80% of the images. Our model will use these
images to learn how to identify cats and dogs.
● val_ds: This is our validation dataset, containing 20% of the images. The model won’t see
these images during training, so we’ll use them to test how well it can recognize cats and dogs
in new images.
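
If you want to peek at what these dataset objects actually yield, here is a small optional check (note that the last batch in the dataset may hold fewer than 128 images):

for images, labels in train_ds.take(1):
    print(images.shape)  # (128, 180, 180, 3): 128 images of 180x180 pixels with 3 colour channels
    print(labels.shape)  # (128,): one label per image, 0 for Cat and 1 for Dog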

Now that we have our dataset ready, let’s take a look at some of the images in it! This code snippet
will display the first 9 images in the training dataset, giving us a preview of what our model will be
learning from.
Visualising the First 9 Images in the Training Dataset

plt.figure(figsize=(10, 10))

● We’re creating a figure (or canvas) that’s 10 inches by 10 inches in size. This will be the grid
where we display our images.

for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(np.array(images[i]).astype("uint8"))
        plt.title(int(labels[i]))
        plt.axis("off")

1. Accessing Images and Labels:


○ for images, labels in train_ds.take(1): grabs the first batch of images and labels from
train_ds.
○ images is a batch of pictures, and labels tells us whether each image is a cat or a dog.
2. Looping Through the First 9 Images:
○ for i in range(9): loops through the first 9 images in the batch.
○ Each image is displayed in a 3x3 grid layout, created by plt.subplot(3, 3, i + 1).
3. Displaying Each Image:
○ plt.imshow(np.array(images[i]).astype("uint8")) shows each image by converting it to
uint8 format (a standard format for displaying images).
○ plt.title(int(labels[i])) adds a title above each image. Each label is an integer: typically
0 for cats and 1 for dogs.
○ plt.axis("off") removes the axes around each image, giving a cleaner look.

AC8
plt.figure(figsize=(10, 10))

for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(np.array(images[i]).astype("uint8"))
        plt.title(int(labels[i]))
        plt.axis("off")

After running this code, we’ll see a 3x3 grid of the first 9 images from the training dataset. Each
image will have a title indicating whether it’s a 0 (cat) or a 1 (dog). This visualisation step helps us
confirm that our dataset is loaded correctly and gives us an idea of what the model will be "seeing"
during training!

To help our model learn better, especially when we have a small dataset, we can use data
augmentation. This means creating slightly modified versions of each image to make our dataset
appear larger and more varied. By flipping, rotating, or otherwise adjusting images, we expose the
model to different "views" of the data, which helps prevent it from "memorizing" specific images—a
problem called overfitting.

Setting Up Data Augmentation

data_augmentation_layers = [
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
]

1. Random Horizontal Flip: layers.RandomFlip("horizontal") will flip each image horizontally,
meaning left-to-right. This is useful because a dog or cat can face either direction, and
flipping helps the model learn both orientations.
2. Random Rotation: layers.RandomRotation(0.1) rotates each image by up to 10% of a full
circle (36 degrees) in either direction. Small rotations add variety without changing the image
too much.

These small, random changes create new variations of each image without needing additional data.
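
Keras ships several other augmentation layers in the same family. They are not used in this tutorial, but as a hedged sketch, the list could be extended like this:

from keras import layers

extra_augmentation_layers = [
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),      # optional: zoom in or out by up to 10%
    layers.RandomContrast(0.1),  # optional: jitter the contrast by up to 10%
]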

Creating the Augmentation Function

def data_augmentation(images):
    for layer in data_augmentation_layers:
        images = layer(images)
    return images

● This function data_augmentation(images) applies each transformation in
data_augmentation_layers to a batch of images.
● Each layer (flip or rotation) modifies the images slightly, and then the function returns the
newly augmented images.

Visualising Augmented Images

To see what our augmented images look like, we can apply data_augmentation to the first few images
in the training dataset:

plt.figure(figsize=(10, 10))

for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(np.array(augmented_images[0]).astype("uint8"))
        plt.axis("off")

1. Setting Up the Figure: plt.figure(figsize=(10, 10)) creates a blank 10x10 grid for our images.
2. Looping Through Images:
○ for images, _ in train_ds.take(1): retrieves one batch of images.
○ for i in range(9): selects the first 9 images from that batch.
3. Applying Data Augmentation:
○ augmented_images = data_augmentation(images) applies our flip and rotation
transformations to the batch.
4. Displaying the Augmented Image:
○ Each transformed image is displayed in a 3x3 grid using plt.subplot(3, 3, i + 1).
○ plt.imshow(np.array(augmented_images[0]).astype("uint8")) converts the image to a
displayable format.
○ plt.axis("off") hides the axes to keep the grid tidy.

AC9
data_augmentation_layers = [
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
]


def data_augmentation(images):
    for layer in data_augmentation_layers:
        images = layer(images)
    return images

AC10
plt.figure(figsize=(10, 10))

for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(np.array(augmented_images[0]).astype("uint8"))
        plt.axis("off")

After running this code, you'll see 9 images with small random transformations. These transformed
versions help the model learn to recognize cats and dogs in slightly different orientations, which will
make it better at recognizing them in real-world situations!

To help our model understand the image data better, we’ll standardize our images to make sure the
values are in a range that’s easier for the model to work with. We’ll also look at two different ways to
apply data augmentation and standardization.

Why Standardize Image Data?

Currently, each image has pixel values between 0 and 255 for each color channel (red, green, and
blue). Neural networks perform better when input values are smaller, so we’ll scale these values to a
range of 0 to 1. To do this, we’ll use a special layer in Keras called Rescaling.

● Rescaling Layer: Divides each pixel value by 255 so it falls between 0 and 1. This will be
one of the first steps in our model, making the data suitable for learning.
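
As a quick illustration with made-up pixel values, the Rescaling layer really is just a division by 255:

import numpy as np
from keras import layers

scaler = layers.Rescaling(1.0 / 255)
pixels = np.array([[0.0, 127.0, 255.0]])  # three example pixel values
print(scaler(pixels))                     # roughly [[0.0, 0.498, 1.0]]
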
Two Ways to Preprocess the Data

When it comes to combining data augmentation and rescaling, we have two choices:

Option 1: Part of the Model (GPU-Friendly)

In this option, we include data augmentation and rescaling as layers within the model itself:

AC11
inputs = keras.Input(shape=input_shape)

x = data_augmentation(inputs) # Applies augmentation

x = layers.Rescaling(1./255)(x) # Scales pixel values to [0, 1]

# ... continue building the rest of the model

How it Works:

● Data augmentation happens on-device (i.e., on the GPU if available), alongside the rest of the
model.
● The transformations are applied only during training (when calling fit()), not when testing or
predicting (evaluate() or predict()).

This option is great if you have a GPU because it lets data augmentation happen on the same device,
benefiting from GPU acceleration.

Option 2: Outside the Model (CPU-Friendly)

Here, we apply data augmentation separately to the dataset itself, rather than including it in the model.

AC12
augmented_train_ds = train_ds.map(
    lambda x, y: (data_augmentation(x), y)  # our data_augmentation function only takes the images
)


How it Works:

● The augmented dataset, augmented_train_ds, is prepared asynchronously on the CPU. This
means images are processed and ready in advance without slowing down the model.
● The augmented images are stored in a buffer and fed to the model continuously, ensuring
smooth training.

This option is ideal if you’re training on a CPU, as it keeps the augmentation process separate from
the model and prevents any delays.

To make our model training faster and more efficient, we can configure the dataset for better
performance. This involves two main steps:

1. Applying Data Augmentation: We add slight modifications to each image, like flipping or
rotating them. By applying data_augmentation to our training images, we’re giving the model
more varied examples to learn from, which can improve its generalization.
2. Using Prefetching: Prefetching is like setting up the data in advance so that it’s ready exactly
when the model needs it, reducing delays during training. It helps us pull data from the disk
(storage) without slowing down the training process.

AC13

# Step 1: Apply `data_augmentation` to training images
train_ds = train_ds.map(
    lambda img, label: (data_augmentation(img), label),
    num_parallel_calls=tf_data.AUTOTUNE,  # Automatically chooses the best number of parallel calls
)

# Step 2: Prefetching for smoother, faster training
train_ds = train_ds.prefetch(tf_data.AUTOTUNE)
val_ds = val_ds.prefetch(tf_data.AUTOTUNE)

How This Works

● num_parallel_calls=tf_data.AUTOTUNE: This tells the system to decide automatically the
best number of parallel calls to apply data_augmentation on the images. It helps speed up the
loading and processing of data.
● prefetch(tf_data.AUTOTUNE): Prefetching loads a batch of data into memory ahead of
time, so when the model finishes processing one batch, the next one is ready. This reduces
any waiting time, especially when training on a GPU.

Why This Matters

By augmenting and prefetching data, we keep the model constantly supplied with images, maximizing
GPU utilization and speeding up training. This approach makes sure that data loading doesn’t become
a bottleneck.

To build our image classification model, we're creating a smaller version of the well-known Xception
architecture. This model architecture uses special layers and techniques to help it learn patterns in
images efficiently.

Here's how the model is structured:

1. Data Standardization: We start with a Rescaling layer that scales image pixel values to the
[0, 1] range. This is important for helping the model learn effectively, as it makes pixel values
smaller and more manageable for the neural network.
2. Convolutional Blocks: Convolutional layers are the heart of a CNN model. They help the
model learn spatial patterns in images, like edges or textures. In this model, we use a special
type of convolution called Separable Convolution to make it both fast and efficient. Each
block of layers applies these convolutions, followed by:
○ Batch Normalization: Stabilizes the learning process by normalizing the outputs of
convolutional layers.
○ Activation (ReLU): Adds non-linearity, enabling the model to learn more complex
patterns.
○ Max Pooling: Reduces the size of the data, helping the model focus on the most
important features.
3. Residual Connections: Residual connections (also called "skip connections") add the
previous block’s output back into the current layer's output. This helps preserve information
across layers and makes training faster and more stable.
4. Global Pooling & Dropout: After all the convolutional layers, we use a Global Average
Pooling layer to compress each feature map into a single value. This helps reduce the total
number of parameters, making the model faster and more efficient. We also use a Dropout
layer to prevent overfitting by randomly ignoring parts of the model during training.
5. Output Layer: Finally, a Dense layer gives the output for classification. Since we’re working
with two classes (Cats vs. Dogs), we use a single unit in this layer to get our classification
result.

Here’s the code to build this model:


1. Defining the Function make_model

The make_model function constructs the architecture of the CNN. It takes two arguments:

● input_shape: Specifies the shape of the input images, e.g., (180, 180, 3) for
180x180 RGB images.
● num_classes: Number of output classes for classification. Here, it’s set to 2 for a binary
classification problem (cats vs. dogs).

2. Input Layer

inputs = keras.Input(shape=input_shape)

● This line creates the input layer, where we define the expected shape of the input data.
● inputs will be the entry point of data into our model.

3. Image Rescaling

x = layers.Rescaling(1.0 / 255)(inputs)

● Rescaling normalizes the pixel values to be between 0 and 1 (from the original range of 0 to
255).
● This is helpful for the model to learn efficiently, as it standardizes the inputs.

4. First Convolutional Block

x = layers.Conv2D(128, 3, strides=2, padding="same")(x)

x = layers.BatchNormalization()(x)

x = layers.Activation("relu")(x)

● Conv2D(128, 3, strides=2, padding="same"): This layer applies 128 filters to


the input, with a kernel size of 3x3, and a stride of 2, meaning it reduces the spatial size of the
output by half. padding="same" ensures that the output has the same dimensions as the
input.
● BatchNormalization(): This layer normalizes the output of the convolution to stabilize
and speed up training.
● Activation("relu"): ReLU (Rectified Linear Unit) introduces non-linearity, allowing
the network to learn complex features.
● The result is stored in previous_block_activation for later use as a residual
connection.

5. Intermediate Convolutional Blocks with Residual Connections

The code then enters a loop that repeats a pattern three times with increasing filter sizes (256, 512,
728).
for size in [256, 512, 728]:
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)

    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)

    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

● SeparableConv2D: This is a depthwise separable convolution, which reduces computation
by separating the spatial and channel dimensions. It’s efficient and is a key feature of
Xception.
● BatchNormalization and ReLU Activation are again used to stabilize and improve
learning.
● MaxPooling2D(3, strides=2, padding="same"): This layer reduces the spatial
dimensions of the output by taking the maximum value from each 3x3 block.

Residual Connection

    residual = layers.Conv2D(size, 1, strides=2, padding="same")(
        previous_block_activation
    )
    x = layers.add([x, residual])  # Add back residual
    previous_block_activation = x

● Residual Connection: This takes the output of a previous layer (stored in
previous_block_activation) and adds it to the output of the current block. It helps
retain information and allows the model to train deeper layers more effectively.
● Conv2D: A 1x1 convolution adjusts the dimensions of the residual to match the current
block’s output for successful addition.

6. Final Convolution and Pooling Layers

x = layers.SeparableConv2D(1024, 3, padding="same")(x)
x = layers.BatchNormalization()(x)

x = layers.Activation("relu")(x)

x = layers.GlobalAveragePooling2D()(x)

● SeparableConv2D(1024, 3): This is the final separable convolution layer, with 1024
filters to extract detailed features.
● GlobalAveragePooling2D: This layer computes the average of each feature map,
resulting in a single number per feature map, reducing the data size significantly and helping
the model generalize well.

7. Dropout for Regularization

x = layers.Dropout(0.25)(x)

● Dropout: Dropout randomly sets 25% of the activations to zero, preventing the model from
overfitting by making it more robust.

8. Output Layer

outputs = layers.Dense(units, activation=None)(x)

● Dense Layer: This is the final layer, with units = 1 since we have two classes (cats and
dogs). The activation=None setting outputs raw logits instead of applying a function
like sigmoid or softmax.

9. Model Creation

model = make_model(input_shape=image_size + (3,), num_classes=2)

● Model Instantiation: Here, we create the model using our defined function make_model.

10. Visualizing the Model

keras.utils.plot_model(model, show_shapes=True)

● This line generates a visual representation of the model’s structure, displaying the shapes of
each layer’s output.

AC14
from keras import layers


def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)

    # Entry block
    x = layers.Rescaling(1.0 / 255)(inputs)
    x = layers.Conv2D(128, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    # Convolutional blocks with residual connections
    for size in [256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Residual connection
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    # Final convolution layers
    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.GlobalAveragePooling2D()(x)

    # Configure units for binary classification (Cats vs Dogs)
    units = 1 if num_classes == 2 else num_classes

    x = layers.Dropout(0.25)(x)  # Dropout for regularization
    outputs = layers.Dense(units, activation=None)(x)
    return keras.Model(inputs, outputs)


# Create model with input shape and number of classes
model = make_model(input_shape=image_size + (3,), num_classes=2)

# Visualize the model structure
keras.utils.plot_model(model, show_shapes=True)

In this model:

● The Conv2D and SeparableConv2D layers do most of the feature extraction.
● BatchNormalization and ReLU activations help the model learn more stably.
● The residual connections and dropout help the model generalise well on new images.

This model is now ready to be trained on our Cats vs. Dogs dataset!
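
Alongside plot_model, a plain-text overview of the architecture is also available; this is standard Keras rather than anything specific to this tutorial:

model.summary()  # prints each layer's output shape and parameter count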

Train the model


1. Setting up Epochs and Callbacks

epochs = 25

callbacks = [
    keras.callbacks.ModelCheckpoint("save_at_{epoch}.keras"),
]

● epochs = 25: The model will be trained for 25 epochs, meaning the model will go through
the entire training dataset 25 times.
● callbacks: This defines a list of callback functions that will be executed during training.
○ ModelCheckpoint: This callback saves the model weights after each epoch. The
model is saved in a file with the format save_at_{epoch}.keras, where
{epoch} is replaced by the epoch number. This is useful in case training is
interrupted and you want to resume from the last saved checkpoint.
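
For example (a hedged sketch, assuming training stopped after epoch 5 and a file named save_at_5.keras was written), a saved checkpoint can be reloaded and training continued:

resumed_model = keras.models.load_model("save_at_5.keras")  # load the model saved after epoch 5
resumed_model.fit(train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds)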

2. Compiling the Model


model.compile(
    optimizer=keras.optimizers.Adam(3e-4),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy(name="acc")],
)

● optimizer=keras.optimizers.Adam(3e-4): The Adam optimizer is used to
minimize the loss function. The learning rate is set to 3e-4 (0.0003), which is a typical
starting value for training.
● loss=keras.losses.BinaryCrossentropy(from_logits=True): Since we
are dealing with a binary classification problem (cats vs. dogs), the loss function used is
binary cross-entropy. The from_logits=True argument indicates that the model outputs
raw logits (before applying an activation like sigmoid), so this option ensures proper
computation of the binary cross-entropy loss.
● metrics=[keras.metrics.BinaryAccuracy(name="acc")]: This sets the
evaluation metric to binary accuracy, which is the percentage of correct classifications for
binary classification tasks.

3. Training the Model

model.fit(
    train_ds,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=val_ds,
)

● train_ds: This is the training dataset that yields batches of augmented and rescaled
images.
● epochs=epochs: The number of epochs the model will train for (25 epochs as set earlier).
● callbacks=callbacks: The ModelCheckpoint callback is passed here so the model
weights will be saved after every epoch.
● validation_data=val_ds: This specifies the validation dataset that will be used to
evaluate the model at the end of each epoch. It helps monitor how well the model generalizes
to unseen data during training.

AC15
epochs = 25

callbacks = [
    keras.callbacks.ModelCheckpoint("save_at_{epoch}.keras"),
]

model.compile(
    optimizer=keras.optimizers.Adam(3e-4),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy(name="acc")],
)

model.fit(
    train_ds,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=val_ds,
)

Run inference on new data
Note that data augmentation and dropout are inactive at inference time.

1. Loading the Image

img = keras.utils.load_img("PetImages/Cat/6779.jpg",
target_size=image_size)

plt.imshow(img)

● keras.utils.load_img(): This function loads an image from the given
file path ("PetImages/Cat/6779.jpg"). It resizes the image to the target
size of image_size (180x180 pixels in this case).
● target_size=image_size: Ensures that the image is resized to
180x180 pixels to match the input size expected by the model.
● plt.imshow(img): Displays the image using matplotlib.

2. Preparing the Image for Prediction

img_array = keras.utils.img_to_array(img)
img_array = keras.ops.expand_dims(img_array, 0)  # Create batch axis

● keras.utils.img_to_array(img): Converts the image (img) into a
NumPy array, which is needed for model input. This converts the image into an
array with shape (height, width, channels), where height and
width are the dimensions of the image (180x180), and channels is 3 for
RGB images.
● keras.ops.expand_dims(img_array, 0): Expands the array by
adding a batch dimension at the start. This is necessary because models expect
batches of images as input, even if the batch size is 1. The resulting shape will
be (1, 180, 180, 3), indicating one image with 180x180 pixels and 3
channels (RGB).

3. Making Predictions
predictions = model.predict(img_array)

● model.predict(img_array): This makes a prediction using the trained
model. The model takes the image (now a batch of size 1) and returns a
prediction. Since this is a binary classification task (cat or dog), the model will
output a score (logits) indicating how likely the image is a "dog."
● The shape of predictions will be (1, 1) because it's a single prediction
for one image, where the single value represents the model's confidence that
the image is a dog.

4. Post-processing the Prediction

score = float(keras.ops.sigmoid(predictions[0][0]))

● keras.ops.sigmoid(predictions[0][0]): The model's output is a
raw logit (a real number). To convert this logit into a probability between 0 and
1, the sigmoid activation function is applied. The sigmoid function squashes
the logit into a range of [0, 1].
● predictions[0][0]: The prediction for the single image is accessed from
the result. Since the result is an array of shape (1, 1),
predictions[0][0] extracts the scalar value.
● float(): Converts the result into a floating-point number.
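
A small worked example with a made-up logit value shows what sigmoid does here:

import math

logit = 2.0                               # an assumed raw model output
probability = 1 / (1 + math.exp(-logit))  # the sigmoid formula
print(round(probability, 3))              # 0.881 -> about 88% "dog", 12% "cat"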

5. Displaying the Results

print(f"This image is {100 * (1 - score):.2f}% cat and


{100 * score:.2f}% dog.")

● 100 * (1 - score): The probability that the image is a cat is 1 - score. This is because
the model output corresponds to the likelihood of the image being a dog, so 1 - score gives
the probability of it being a cat.
● 100 * score: The probability that the image is a dog.
● print(f"..."): This prints the result as a percentage. The format :.2f
ensures that the values are displayed with two decimal places.
AC16
img = keras.utils.load_img("PetImages/Cat/6779.jpg",
target_size=image_size)

plt.imshow(img)

img_array = keras.utils.img_to_array(img)

img_array = keras.ops.expand_dims(img_array, 0) # Create batch axis

predictions = model.predict(img_array)

score = float(keras.ops.sigmoid(predictions[0][0]))
print(f"This image is {100 * (1 - score):.2f}% cat and {100 * score:.2f}%
dog.")

Our Robot’s Journey Continues

And that’s how we taught a computer to tell cats and dogs apart! Now, anytime it sees a picture of a
cat or dog, it can shout out its answer. Thanks to our careful preparation and practice, our robot has
learned to see in a whole new way. Who knows what it’ll learn to recognize next!
