How can TensorFlow be used to standardize the data using Python?

Last Updated : 11 Jul, 2022

In this article, we will see how to standardize data using TensorFlow in Python.

What is Data Standardization?

The process of converting the organizational structure of various datasets into a single, standard data format is known as data standardization. It is concerned with modifying datasets after they are collected from various sources and before they are loaded into target systems. It can take a significant amount of time and iteration to complete, but it makes the later integration and development work more accurate and efficient.

How can TensorFlow be used to standardize the data?

We use the flower dataset to show how TensorFlow can be used to standardize data in Python. The dataset contains several thousand images of flowers, organized into five sub-directories, one per class. After it is downloaded with the 'get_file' method, the dataset is loaded into the environment for use.
Before downloading the flower dataset, we need to import some Python libraries. The code below was run in Google Colaboratory (Colab).

Import libraries

In the first step, we import the TensorFlow and Python libraries that are used in the rest of the article.

Python

import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib as pt

Download the Dataset

We are using the flower dataset, which contains five sub-directories, one for each class. To use the dataset, we first need to download it, and for that we use the get_file() method.

Python3

dataset_url = "https://storage.googleapis.com/\
download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', 
                                   origin=dataset_url, 
                                   untar=True)
data_dir = pt.Path(data_dir)

After the download, you should have a local copy of the dataset containing 3,670 images in total. You can count the images in the dataset with the code below:

Python3

img_count = len(list(data_dir.glob('*/*.jpg')))
print(img_count)

Output:

3670

The dataset has five categories of flowers: roses, tulips, daisy, dandelion, and sunflowers. You can inspect the images of a category through its directory name, for example by opening the first image in the roses directory:

Python3

roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[0]))
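
The per-category lookup above can be generalized to count the images in every class directory. The following is a minimal sketch using only the standard library; the helper name count_images_per_class is hypothetical, and the small directory tree it runs on is a made-up stand-in so that the example works without the download.

```python
import pathlib
import tempfile


def count_images_per_class(root):
    """Return {sub-directory name: number of .jpg files directly inside it}."""
    root = pathlib.Path(root)
    return {
        class_dir.name: len(list(class_dir.glob("*.jpg")))
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()  # skip plain files that sit next to the class folders
    }


# Tiny stand-in directory tree with empty placeholder files, used only so
# the sketch runs on its own; the class names and counts are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = pathlib.Path(tmp)
    for class_name, n_files in [("roses", 3), ("tulips", 2)]:
        (demo_root / class_name).mkdir()
        for i in range(n_files):
            (demo_root / class_name / f"{i}.jpg").touch()
    demo_counts = count_images_per_class(demo_root)

print(demo_counts)  # {'roses': 3, 'tulips': 2}
```

With the real dataset, calling count_images_per_class(data_dir) should give one entry per flower category.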

Load the Dataset

To load the dataset, you need to define some parameters for the loader. We also need to split the dataset: here validation_split=0.4 reserves 40% of the flower images for validation and uses the remaining 60% for training.

Python3

batch_size = 32
img_height = 180
img_width = 180

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.4,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

Output:

Found 3670 files belonging to 5 classes.
Using 2202 files for training.
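
The file counts in this output can be checked by hand. The short sketch below reproduces the arithmetic, assuming the loader takes the validation count as the product of the split fraction and the file count, truncated to an integer; the variable names are illustrative only.

```python
total_files = 3670        # "Found 3670 files" in the output above
validation_split = 0.4    # value passed to image_dataset_from_directory

# Assumption: the validation count is the product truncated to an integer,
# and the training subset receives all remaining files.
num_validation = int(validation_split * total_files)
num_training = total_files - num_validation

print(num_training, num_validation)  # 2202 1468
```

Calling the loader a second time with subset="validation" and the same seed would return the remaining 1468 files.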

Standardize the dataset

The RGB channel values are in the range [0, 255]. This is not ideal for a neural network; in general, you should try to keep your input values small.

We can standardize the values to fall in the range [0, 1] by using a rescaling layer (tensorflow.keras.layers.Rescaling) with a scale of 1/255.
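
As a minimal sketch of this step, the code below applies a Rescaling layer through Dataset.map. To keep it runnable without the download, it uses a small batch of random pixel values as a stand-in for the flower images; the names demo_ds, normalization_layer, and normalized_ds are illustrative and not part of the original code.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for flower images: 8 RGB images of 180x180 pixels
# with integer values in [0, 255] (random data, not the real dataset).
rng = np.random.default_rng(123)
images = rng.integers(0, 256, size=(8, 180, 180, 3)).astype("float32")
labels = rng.integers(0, 5, size=(8,))

# Batched dataset of (images, labels) pairs, similar in structure to the
# output of image_dataset_from_directory.
demo_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

# Rescaling multiplies every input value by `scale`, so a scale of 1/255
# maps the range [0, 255] onto [0, 1].
normalization_layer = tf.keras.layers.Rescaling(1.0 / 255)

# Apply the layer to the images and leave the labels untouched.
normalized_ds = demo_ds.map(lambda x, y: (normalization_layer(x), y))

# Inspect the value range of the first rescaled batch.
image_batch, labels_batch = next(iter(normalized_ds))
min_value = float(np.min(image_batch.numpy()))
max_value = float(np.max(image_batch.numpy()))
print(min_value, max_value)
```

With the real data, the same map call can be applied to train_ds instead of demo_ds. Another option is to place the Rescaling layer at the start of the model, so that the rescaling is carried out as part of the model itself.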