TensorFlow in a Nutshell
Adapted from the Udacity Self-Driving Car Engineer Nanodegree class
Christos Kyrkou, PhD
Install
OS X or Linux
Prerequisites
Intro to TensorFlow requires Python 3.4 or higher and Anaconda. If you don't meet all of
these requirements, please install the appropriate package(s).
Install TensorFlow
You're going to use an Anaconda environment. If you're unfamiliar with Anaconda
environments, check out the official documentation.
Run the following commands to set up your environment:
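The commands themselves were not preserved in this copy; a typical Anaconda setup along the lines the course used would be (environment name is an assumption):

conda create --name=IntroToTensorFlow python=3 anaconda
source activate IntroToTensorFlow
conda install -c conda-forge tensorflow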
Windows
Install Docker
Download and install Docker from the official Docker website.
Hello, world!
Try running the following code in your Python console to make sure you have
TensorFlow properly installed. The console will print "Hello, world!" if TensorFlow is
installed. Don’t worry about understanding what it does. You’ll learn about it in the next
section.
import tensorflow as tf
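# Create TensorFlow object called hello_constant (a minimal version of the
# check described above, assuming the standard TF 1.x hello-world)
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)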
Errors
If you're getting the
error tensorflow.python.framework.errors.InvalidArgumentError: Placeholder:0
is both fed and fetched, you're running an older version of TensorFlow. Uninstall
TensorFlow, and reinstall it using the instructions above. For more solutions, check out
the Common Problems section.
Tensor
In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated
in an object called a tensor. In the case of hello_constant = tf.constant('Hello
World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety
of sizes as shown below:
# A is a 0-dimensional int32 tensor
A = tf.constant(1234)
# B is a 1-dimensional int32 tensor
B = tf.constant([123, 456, 789])
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])
tf.constant() is one of many TensorFlow operations you will use in this lesson. The
tensor returned by tf.constant() is called a constant tensor, because the value of the
tensor never changes.
Session
TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a
mathematical process. Let’s take the TensorFlow code you ran and turn that into a graph:
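A TensorFlow Session is an environment for running a graph: it allocates the operations to GPU(s) and/or CPU(s) and evaluates the tensors you ask for. A minimal sketch, evaluating the hello_constant tensor created above:

with tf.Session() as sess:
    # Evaluate the tensor and return the result
    output = sess.run(hello_constant)
    print(output)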
Input
In the last section, you passed a tensor into a session and it returned the result. What if
you want to use a non-constant value? This is where tf.placeholder() and feed_dict come
into play. In this section, you'll go over the basics of feeding data into TensorFlow.
tf.placeholder()
Sadly you can’t just set x to your dataset and put it in TensorFlow, because over time
you'll want your TensorFlow model to take in different datasets with different parameters.
You need tf.placeholder()!
tf.placeholder() returns a tensor that gets its value from data passed to
the tf.Session.run() function, allowing you to set the input right before the session
runs.
Session’s feed_dict
x = tf.placeholder(tf.string)
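A minimal sketch of feeding the placeholder at run time:

with tf.Session() as sess:
    # Feed the string 'Hello World' into placeholder x via feed_dict
    output = sess.run(x, feed_dict={x: 'Hello World'})

Use the feed_dict parameter of tf.Session.run() to set the placeholder tensor; a single feed_dict can set more than one placeholder.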
TensorFlow Math
Getting the input is great, but now you need to use it. You're going to use basic math
functions that everyone knows and loves - add, subtract, multiply, and divide - with
tensors. (There are many more math functions you can check out in the documentation.)
Addition
x = tf.add(5, 2)  # 7
You’ll start with the add function. The tf.add() function does exactly what you expect it
to do. It takes in two numbers, two tensors, or one of each, and returns their sum as a
tensor.
x = tf.subtract(10, 4)  # 6
y = tf.multiply(2, 5)  # 10
(In TensorFlow versions before 1.0, these functions were named tf.sub() and tf.mul().)
The x tensor will evaluate to 6, because 10 - 4 = 6. The y tensor will evaluate to 10,
because 2 * 5 = 10. That was easy!
tf.Variable()
x = tf.Variable(5)
The tf.Variable class creates a tensor with an initial value that can be modified, much like
a normal Python variable. This tensor stores its state in the session, so you must initialize
the state of the tensor manually. You'll use
the tf.global_variables_initializer() function to initialize the state of all the
Variable tensors.
Initialization
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
The tf.global_variables_initializer() call returns an operation that will initialize all
TensorFlow variables from the graph. You call the operation using a session to initialize all
the variables as shown above. Using the tf.Variable class allows us to change the weights
and bias, but an initial value needs to be chosen.
Initializing the weights with random numbers from a normal distribution is good practice.
Randomizing the weights helps prevent the model from becoming stuck in the same place every
time you train it. You'll learn more about this in the next lesson, when you study gradient
descent.
Similarly, choosing weights from a normal distribution prevents any one weight from
overwhelming other weights. You'll use the tf.truncated_normal() function to generate
random numbers from a normal distribution.
tf.truncated_normal()
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
The tf.truncated_normal() function returns a tensor with random values from a normal
distribution whose magnitude is no more than 2 standard deviations from the mean.
Since the weights are already helping prevent the model from getting stuck, you don't need
to randomize the bias. Let's use the simplest solution, setting the bias to 0.
tf.zeros()
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
The tf.zeros() function returns a tensor with all zeros.
Left: Weights for labeling 0. Middle: Weights for labeling 1. Right: Weights for labeling 2.
The images above are trained weights for each label (0, 1, and 2). The weights display the
unique properties of each digit they have found.
Linear Function
Function y = Wx
Let’s derive the function y = Wx + b. We want to translate our input, x, to labels, y.
For example, imagine we want to classify images as digits.
x would be our list of pixel values, and y would be the logits, one for each digit. Let's
take a look at y = Wx, where the weights, W, determine the influence of x at predicting
each y.
y = Wx allows us to segment the data into their respective labels using a line.
However, this line has to pass through the origin, because whenever x equals 0,
then y is also going to equal 0.
We want the ability to shift the line away from the origin to fit more complex data. The
simplest solution is to add a number to the function, which we call “bias”.
Function y = Wx + b
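In TensorFlow this linear function is the familiar xW + b pattern; a minimal sketch using the weights and bias Variables from the previous section (x is assumed to be a placeholder of input features):

# x: [batch, n_features], weights: [n_features, n_labels], bias: [n_labels]
logits = tf.add(tf.matmul(x, weights), bias)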
Softmax Function
Softmax
The next step is to assign a probability to each label, which you can then use to classify the
data. Use the softmax function to turn your logits into probabilities.
In the one dimensional case, the array is just a single set of logits. In the two dimensional
case, each column in the array is a set of logits. The softmax(x) function should return a
NumPy array of the same shape as x.
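A common NumPy implementation is a sketch like the following; it computes exp(x_i) / sum_j exp(x_j) for each set of logits, and summing over axis 0 covers both the 1-D case and the per-column 2-D case:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)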
TensorFlow Mini-batching
In order to use mini-batching, you must first divide your data into batches.
Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal
size. For example, imagine you'd like to create batches of 128 samples each from a
dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with
7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)
In that case, the size of the batches would vary, so you need to take advantage of
TensorFlow's tf.placeholder() function to receive the varying batch sizes.
Continuing the example, if each sample had n_input = 784 features and n_classes =
10 possible labels, the dimensions for features would be [None,
n_input] and labels would be [None, n_classes].
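A simple helper along these lines can produce the batches (a sketch; the function name and structure are illustrative):

def batches(batch_size, features, labels):
    """Create batches of at most batch_size from features and labels."""
    assert len(features) == len(labels)
    output_batches = []

    for start_i in range(0, len(features), batch_size):
        end_i = start_i + batch_size
        output_batches.append(
            [features[start_i:end_i], labels[start_i:end_i]])

    return output_batches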
TensorFlow ReLUs
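In TensorFlow, a rectified linear unit is applied with tf.nn.relu(); a minimal sketch of turning a linear layer into a hidden ReLU layer (the tensor names here are illustrative):

# Hidden layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

# Output layer (logits)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)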
Step by Step
In the following walkthrough, we'll step through TensorFlow code written to classify the
digits in the MNIST database. You can find this and many more examples of TensorFlow
at Aymeric Damien's GitHub repository.
Code
TensorFlow MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
You'll use the MNIST dataset provided by TensorFlow, which batches and One-Hot encodes
the data for you.
Learning Parameters
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
Input
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])
Multilayer Perceptron
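A sketch of the two-layer perceptron this section builds, following the original lab (n_hidden_layer and the dictionary keys are assumed names):

n_hidden_layer = 256  # layer number of features

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

# Flatten the 28x28x1 input images into 784-feature vectors
x_flat = tf.reshape(x, [-1, n_input])

# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),
                 biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)

# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])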
Optimizer
# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)
Session
# Initializing the variables
init = tf.global_variables_initializer()
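Launching the graph and running the training cycle; a sketch following the parameters above (batch_x and batch_y are illustrative names):

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})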
That's it! Going from one layer to two is easy. Adding more layers to the network allows you
to solve more complicated problems. In the next section, you'll see how changing the number
of layers can affect your network.
Training a model can take hours, and once you close your TensorFlow session you lose all the
trained weights and biases. Fortunately, TensorFlow gives you the ability to save your progress
using a class called tf.train.Saver. This class provides the functionality to save any
tf.Variable to your file system.
Saving Variables
Let's start with a simple example of saving weights and bias Tensors. For the first
example, you'll just save two variables. Later examples will save all the weights in a
practical model.
import tensorflow as tf
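# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias (a minimal sketch of the
# two-variable example described above)
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights: {}'.format(sess.run(weights)))
    print('Bias: {}'.format(sess.run(bias)))

    # Save the model
    saver.save(sess, save_file)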
Loading Variables
Now that the Tensor Variables are saved, let's load them back into a new model.
# Remove the previous weights and bias
tf.reset_default_graph()
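# Two Variables: weights and bias (they must be created before restoring;
# a minimal sketch matching the save example above)
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias - no initialization needed
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weights: {}'.format(sess.run(weights)))
    print('Bias: {}'.format(sess.run(bias)))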
You'll notice you still need to create the weights and bias Tensors in Python.
The tf.train.Saver.restore() function loads the saved data into weights and bias.
Since tf.train.Saver.restore() sets all the TensorFlow Variables, you don't need to
call tf.global_variables_initializer().
learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Features and Labels (assumed definitions, following the original lab)
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
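Training also needs a loss and an optimizer; a sketch mirroring the earlier Optimizer section:

# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)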
Let's train that model, then save the weights:
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(optimizer,
                     feed_dict={features: batch_features,
                                labels: batch_labels})

    # Save the trained model
    saver.save(sess, save_file)
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

    print('Test Accuracy: {}'.format(test_accuracy))
That's it! You now know how to save and load a trained model in TensorFlow. Let's look at
loading weights and biases into modified models in the next section.
However, loading saved Variables directly into a modified model can generate errors. Let's
go over how to avoid these problems.
Naming Error
TensorFlow uses a string identifier for Tensors and Operations called name. If a name is not
given, TensorFlow will create one automatically. TensorFlow will give the first node the
name <Type>, and then give the name <Type>_<number> for the subsequent nodes. Let's see
how this can affect loading a model with a different order of weights and bias:
import tensorflow as tf

save_file = 'model.ckpt'

# Two Tensor Variables: weights first, then bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()
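Saving with this graph, then rebuilding it with bias defined before weights, reproduces the problem (a sketch along the lines of the original example):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: bias and weights - defined in reverse order
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)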
...
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.
...
You'll notice that the name properties for weights and bias are different than when you
saved the model. This is why the code produces the "Assign requires shapes of both tensors
to match" error. The code saver.restore(sess, save_file) is trying to load weight data
into bias and bias data into weights.
Instead of letting TensorFlow set the name property, let's set it manually:
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Name the Tensors explicitly, so saved data is matched by name
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

With explicit names, the Variables load correctly even if they are defined in a different order.
TensorFlow Dropout
Figure 1: Taken from the paper "Dropout: A Simple Way to Prevent Neural Networks from
Overfitting" (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)
Dropout is a regularization technique for reducing overfitting. The technique temporarily
drops units (artificial neurons) from the network, along with all of those units' incoming
and outgoing connections. Figure 1 illustrates how dropout works.
TensorFlow provides the tf.nn.dropout() function, which you can use to implement
dropout.
Let's look at an example of how to use tf.nn.dropout().
keep_prob = tf.placeholder(tf.float32) # probability to keep units
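A sketch of how the placeholder is typically wired into a hidden layer (the surrounding tensor names are illustrative):

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)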
keep_prob allows you to adjust the number of units to drop. In order to compensate for
dropped units, tf.nn.dropout() multiplies all units that are kept (i.e. not dropped)
by 1/keep_prob.
During training, a good starting value for keep_prob is 0.5.
During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of
the model.
...
...
sess.run(optimizer, feed_dict={
    features: batch_features,
    labels: batch_labels,
    keep_prob: 0.5})
Dimensionality
From what we've learned so far, how can we calculate the number of neurons of each
layer in our CNN?
Given our input layer has a volume of W, our filter has a volume (height * width *
depth) of F, we have a stride of S, and a padding of P, the following formula gives us the
volume of the next layer: (W − F + 2P)/S + 1. For example, a 32x32 input with an 8x8
filter, a padding of 1, and a stride of 2 gives (32 − 8 + 2·1)/2 + 1 = 14, so the next
layer is 14x14 spatially, with depth equal to the number of filters.
Knowing the dimensionality of each additional layer helps us understand how large our
model is and how our decisions around filter size and stride affect the size of our
network.
# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5
k_output = 64  # number of filters (assumed output depth for this example)

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))
# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
The code above uses the tf.nn.conv2d() function to compute the convolution
with weight as the filter and [1, 2, 2, 1] for the strides. TensorFlow uses a stride for
each input dimension, [batch, input_height, input_width, input_channels]. We
are generally always going to set the stride for batch and input_channels (i.e. the first and
fourth element in the strides array) to be 1.
You'll focus on changing input_height and input_width while
setting batch and input_channels to 1. The input_height and input_width strides are
for striding the filter over input. This example code uses a stride of 2 with a 5x5 filter
over input.
The tf.nn.bias_add() function adds a 1-d bias to the last dimension in a matrix.
...
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)

# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')
The tf.nn.max_pool() function performs max pooling with the ksize parameter as the
size of the filter and the strides parameter as the length of the stride. 2x2 filters with a
stride of 2x2 are common in practice.
The ksize and strides parameters are structured as 4-element lists, with each element
corresponding to a dimension of the input tensor ([batch, height, width, channels]).
For both ksize and strides, the batch and channel dimensions are typically set to 1.
The structure of this network follows the classic structure of CNNs, which is a mix of
convolutional layers and max pooling, followed by fully-connected layers.
The code you'll be looking at is similar to what you saw in the segment on Deep Neural
Network in TensorFlow, except we restructured the architecture of this network as a CNN.
Just like in that segment, here you'll study the line-by-line breakdown of the code. If you
want, you can even download the code and run it yourself.
Thanks to Aymeric Damien for providing the original TensorFlow model on which this
segment is based.
Time to dive in!
Dataset
You've seen this section of code in previous lessons. Here we're importing the MNIST
dataset and using a convenient TensorFlow function to batch, scale, and One-Hot encode the
data.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf
# Parameters
learning_rate = 0.00001
epochs = 10
batch_size = 128
# Network Parameters
n_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units
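The biases below pair with a weights dictionary; a sketch following the original Aymeric Damien example (the 5x5 filter and layer sizes are assumed, matching the layer dimensions in conv_net() below):

weights = {
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    'out': tf.Variable(tf.random_normal([1024, n_classes]))}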
biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))}
Convolutions
The middle two elements are the strides for height and width respectively. I've mentioned
stride as one number because you usually have a square stride where height = width.
When someone says they are using a stride of 3, they usually mean tf.nn.conv2d(x, W,
strides=[1, 3, 3, 1]).
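The conv_net() model below calls a conv2d() helper that wraps tf.nn.conv2d(), tf.nn.bias_add(), and tf.nn.relu(); a minimal version, following the original example:

def conv2d(x, W, b, strides=1):
    # Convolution, then bias, then ReLU activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)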
To make life easier, the code is using tf.nn.bias_add() to add the bias.
Using tf.add() doesn't work when the tensors aren't the same shape.
Max Pooling
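conv_net() likewise calls a maxpool2d() helper; a minimal version, following the original example:

def maxpool2d(x, k=2):
    # k x k max pooling with a k x k stride
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')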
Model
In the code below, we're creating 3 layers alternating between convolutions and max
pooling, followed by fully connected and output layers. The transformation of each layer to
new dimensions is shown in the comments. For example, the first layer shapes the images
from 28x28x1 to 28x28x32 in the convolution step. The next step applies max pooling,
turning each sample into 14x14x32. All the layers are applied from conv1 to output,
producing 10 class predictions.
def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)
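    # Layer 2 - 14*14*32 to 7*7*64 (the remaining layers below follow the
    # original example; the excerpt above stops after layer 1)
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output Layer - class prediction - 1024 to 10
    return tf.add(tf.matmul(fc1, weights['out']), biases['out'])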
Session
Now let's run it!
# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)
# Model
logits = conv_net(x, weights, biases, keep_prob)
# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
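The loss, optimizer, and training loop complete the example; a sketch that follows the pattern of the earlier Optimizer and Session sections:

# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples // batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={
                x: batch_x, y: batch_y, keep_prob: dropout})

    # Evaluate on the test set, keeping all units (keep_prob = 1.0)
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images, y: mnist.test.labels, keep_prob: 1.0})
    print('Testing Accuracy: {}'.format(test_acc))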