How can Tensorflow be used to standardize the data using Python?
Last Updated :
11 Jul, 2022
In this article, we are going to see how to standardize data using TensorFlow in Python.
What is Data Standardization?
Data standardization is the process of converting datasets with differing structures into a single, common format. It is applied after the data has been collected from its various sources and before it is loaded into the target systems. Doing this well takes considerable time and iteration, but it makes later integration and development work more accurate and efficient.
How can Tensorflow be used to standardize the data?
We use the flower dataset to show how TensorFlow can be used to standardize data in Python. The dataset contains several thousand images of flowers, organized into five sub-directories, one per class. It is downloaded with the get_file method and then loaded into the environment.
Before downloading the flower dataset, we need to import some Python libraries. The code below was run in Google Colab.
Import libraries
In the first step, we import the TensorFlow and Python libraries that are used in the rest of the article.
Python
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import pathlib as pt
Download the Dataset
We are using the flower dataset, which contains five sub-directories, one for each class. To use the dataset, we first need to download it with the get_file() method.
Python3
# official TensorFlow download location of the flower photos archive
dataset_url = ("https://storage.googleapis.com/"
               "download.tensorflow.org/example_images/flower_photos.tgz")
data_dir = tf.keras.utils.get_file('flower_photos',
                                   origin=dataset_url,
                                   untar=True)
data_dir = pt.Path(data_dir)
After downloading, you should have a local copy of the dataset, which contains a total of 3,670 images. You can count the images with the code below:
Python3
img_count = len(list(data_dir.glob('*/*.jpg')))
print(img_count)
Output:
3670
The dataset has five categories of flowers: roses, tulips, daisy, dandelion, and sunflowers. You can inspect the images of a category by its directory name, as in the code below:
Python3
roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[0]))
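Because each class lives in its own sub-directory, the per-class image counts can be checked with pathlib alone. The following is a minimal sketch: the helper name count_images_per_class and the tiny temporary directory are illustrative assumptions, not part of the original article or of the real dataset.

```python
import pathlib
import tempfile


def count_images_per_class(root, pattern="*.jpg"):
    """Return {class_name: image_count} for each sub-directory of root."""
    root = pathlib.Path(root)
    return {
        class_dir.name: len(list(class_dir.glob(pattern)))
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()  # skip loose files that sit next to the class folders
    }


# Demo on a small stand-in directory (hypothetical files, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in [("roses", 3), ("tulips", 2)]:
        class_dir = pathlib.Path(tmp) / name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f"img_{i}.jpg").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

With the real dataset, calling count_images_per_class(data_dir) should return one entry per flower category, and the counts should add up to the 3,670 images reported above.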
Load the Dataset
To load the dataset, we need to define some parameters for the loader. We also need to split the dataset; here we set validation_split=0.4, so 60% of the flower dataset is used for training and the remaining 40% is held out for validation.
Python3
batch_size = 32
img_height = 180
img_width = 180
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.4,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
Output:
Found 3670 files belonging to 5 classes.
Using 2202 files for training.
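The file counts reported by the loader follow directly from the split fraction. The sketch below reproduces the arithmetic; the helper split_sizes is hypothetical, and it assumes the held-out count is the fraction of the total rounded down, which is consistent with the output above.

```python
def split_sizes(total_files, validation_split):
    """Return (training_count, held_out_count) for a fractional split.

    Assumption: the held-out count is the fraction of the total, rounded down.
    """
    held_out_count = int(total_files * validation_split)
    return total_files - held_out_count, held_out_count


train_count, val_count = split_sizes(3670, 0.4)
print(train_count, val_count)  # 2202 1468
```

The remaining 1,468 files are the ones that would be returned by requesting subset="validation" with the same seed and validation_split.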
Standardize the dataset
The RGB channel values are in the range [0, 255]. This is not ideal for a neural network; in general, you should try to keep your input values small.
We can rescale the values to fall in the range [0, 1] by using a rescaling layer (tensorflow.keras.layers.Rescaling).
Python3
# create the rescaling (normalization) layer
nrmlztn_layer = layers.Rescaling(1./255)
print("The map function is used to apply "
      "this layer to the dataset. ")
# apply the layer to every image batch; labels pass through unchanged
nrmlztn_ds = train_ds.map(lambda x, y: (nrmlztn_layer(x), y))
image_batch, labels_batch = next(iter(nrmlztn_ds))
first_image = image_batch[0]
# pixel values are now in the range [0, 1]
print("minimum pixel value:", np.min(first_image),
      " maximum pixel value:", np.max(first_image))
Output:
The map function is used to apply this layer to the dataset.
minimum pixel value: 0.0  maximum pixel value: 0.87026095
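The rescaling layer used above computes output = input * scale + offset for every pixel value. The following pure-Python sketch mirrors that formula on a few sample values so the effect is easy to verify by hand; the function rescale and the sample pixel values are illustrative assumptions, not part of TensorFlow.

```python
def rescale(pixels, scale, offset=0.0):
    """Apply output = input * scale + offset to every value."""
    return [p * scale + offset for p in pixels]


raw_pixels = [0, 51, 255]  # hypothetical 8-bit channel values

# scale = 1/255 maps [0, 255] onto [0, 1], as in the layer above
unit_range = rescale(raw_pixels, scale=1.0 / 255)

# scale = 1/127.5 with offset = -1 maps [0, 255] onto [-1, 1] instead
signed_range = rescale(raw_pixels, scale=1.0 / 127.5, offset=-1.0)

print(unit_range)
print(signed_range)
```

The maximum printed for a real image can be below 1.0, as in the output above, simply because that particular image contains no pixel at full intensity.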