Introduction to TensorFlow

TensorFlow is an open-source software library from Google, designed for dataflow programming across a range of tasks. It is a symbolic math library, used primarily for machine learning applications such as neural networks. It was originally developed by the Google Brain team for internal use at Google; as the AI research community grew more and more collaborative, TensorFlow was released under the Apache 2.0 open source license.

TensorFlow and its component Keras are widely used for implementing Deep Learning algorithms. Like most machine learning libraries, TensorFlow is "concept-heavy and code-lite": the syntax is not very difficult to learn, but the concepts behind it are very important.

What is a Tensor?

According to Wikipedia, "A tensor is a geometric object that maps in a multi-linear manner geometric vectors, scalars, and other tensors to a resulting tensor. Thereby, vectors and scalars themselves, often used already in elementary physics and engineering applications, are considered as the simplest tensors. Additionally, vectors from the dual space of the vector space, which supplies the geometric vectors, are also included as tensors. Geometric in this context is chiefly meant to emphasize independence of any selection of a coordinate system."

Don't worry - it is not that complicated. When working on a problem with multiple variables, it is often convenient to collect them in the form of a vector or a matrix, so that it is easier to perform linear operations on them. Most of machine learning is based on such matrix operations - a set of input values is processed together to get a set of output values.

For example, in the good old loan sanction problem, we consider several parameters of the applicant (amount of loans taken in the past, time taken to repay them, etc.) and sum them up with appropriate weights - to get an output number called the credit rating. This is implemented as a simple matrix multiplication.
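
The weighted sum can be sketched in a few lines of plain Python (the parameters and weights here are made up purely for illustration):

```python
# Hypothetical loan-scoring example: weigh each applicant parameter and sum.
past_loans = 2.0        # made-up: number of loans taken in the past
repayment_speed = 0.8   # made-up: how quickly past loans were repaid
income = 5.0            # made-up: normalized income

inputs = [past_loans, repayment_speed, income]
weights = [0.5, 2.0, 0.3]   # made-up weights

# The credit rating is the weighted sum - a 1x3 by 3x1 matrix multiplication.
credit_rating = sum(w * x for w, x in zip(weights, inputs))
print(credit_rating)
```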

Such a matrix multiplication gives us the result for just one case. When we want to train a neural network with data for a million such cases, we cannot multiply them one by one. That is where Tensors come in. A Tensor is a data structure that represents a collection of matrices or vectors, and it allows us to perform an operation on all the data samples at the same time. This gives us a great performance improvement.
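
To see the idea of operating on many samples at once, here is the same weighted-sum applied to a whole batch (a plain-Python sketch with made-up numbers; TensorFlow performs the equivalent as a single matrix operation on optimized hardware):

```python
# Three applicants stacked into one batch - a 3x3 matrix of input parameters.
batch = [
    [2.0, 0.8, 5.0],   # applicant 1 (made-up numbers)
    [0.0, 1.0, 3.0],   # applicant 2
    [4.0, 0.2, 1.0],   # applicant 3
]
weights = [0.5, 2.0, 0.3]   # made-up weights, shared across all applicants

# One matrix-vector product yields all three credit ratings at once.
ratings = [sum(w * x for w, x in zip(weights, row)) for row in batch]
print(ratings)
```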

A Tensor could be an input value, a constant, a variable, or just a reference to a mathematical operation on some other tensors.

A Tensor may be 3D (a collection of matrices), 2D (a collection of vectors), 1D (a collection of numbers) or even 0D (a single number). The number of dimensions does not make a Tensor - what is important is the concept of simultaneous operations on multiple entities.

Rank of a Tensor

The number of dimensions of a Tensor is called its Rank. Hence we have several possible ranks.

Rank    Math entity
0       Scalar (magnitude only)
1       Vector (magnitude and direction)
2       Matrix (table of numbers)
3       3-Tensor (cube of numbers)
n       n-Tensor (n-dimensional structure)
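
A rough analogy with nested Python lists (plain Python, not TensorFlow - just to visualize the ranks):

```python
scalar = 5                            # rank 0: a single number
vector = [1, 2, 3]                    # rank 1: a list of numbers
matrix = [[1, 2], [3, 4]]             # rank 2: a table of numbers
cube = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]             # rank 3: a cube of numbers

def rank(t):
    # Nesting depth of the list structure corresponds to the tensor's rank.
    return 1 + rank(t[0]) if isinstance(t, list) else 0

print(rank(scalar), rank(vector), rank(matrix), rank(cube))  # 0 1 2 3
```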

Constant Tensor

The simplest Tensor is one with a constant value. We can define it with explicit values, or by using the methods provided for the frequently used ones.

t1 = tf.constant(1729)        # Constant Tensor with one element with value 1729
t2 = tf.constant([1,8,27,64])         # Constant Tensor of shape 4 with the given values 

t3 = tf.zeros([10,20], tf.float32)    # Constant Tensor of shape [10,20] - with each element set to 0
t4 = tf.zeros_like(t2)                # Constant Tensor of shape and datatype same as t2, with all elements set to 0

t5 = tf.ones([5,6], tf.int32)   # Constant Tensor of shape [5,6] with each value set to 1
t6 = tf.ones_like(t3)           # Constant Tensor of shape and datatype same as t3, with each element set to 1

t7 = tf.eye(10)                 # Identity matrix of size 10

t8 = tf.linspace(1.0, 3.0,5)    # [1.0, 1.5, 2.0, 2.5, 3.0] - Constant tensor with 5 equally spaced values from 1.0 to 3.0
t9 = tf.range(1.0,3.5, 0.5)     # [1.0, 1.5, 2.0, 2.5, 3.0] - Same as Python range. Note that Range excludes last element

t11 = tf.random_normal([4,5], mean = 5.0, stddev = 4, seed=1)   # Constant Tensor of shape [4,5] with random values of defined normal distribution
t12 = tf.random_uniform([4,5], maxval = 4.0, seed = 1)          # Constant Tensor with random values with defined uniform distribution.

Note that this does not assign the constant values to the Tensor. It only creates the Tensor that can be evaluated when required.

Variable Tensors

Constants allow us to create predefined values that can be used in computations. But no computation is complete without variables. Training a neural network requires variables to represent the weights to be learnt in the process. Such variables can be created using the class tf.Variable.

weights = tf.Variable(tf.random_normal([10,10], stddev=1))

This generates a Variable Tensor - weights - that can be trained. But, we can also have Variable Tensors that cannot be altered - just like a constant.

constant = tf.Variable(tf.zeros([10,10]), trainable=False)

What is the use of doing something like that? Why not just define a constant? For a 10x10 Tensor, it does make more sense to create a constant Tensor. But when working with huge sizes, one should prefer variables, because TensorFlow manages variables much more efficiently.


Placeholders

Constant Tensors and Variable Tensors are intuitively similar to constants and variables in any programming language, so they do not take long to understand. Placeholders define Tensors that get a value just before the code runs. In that sense, a placeholder can be compared to an input parameter.

x = tf.placeholder(tf.int32)

This generates a Tensor x - with an assurance that its value will be provided just before the code actually runs.

Lazy Execution

By design, TensorFlow is based on lazy execution (though we can force eager execution). That means it does not actually process the available data until it has to. It just gathers all the information we feed into it, and processes it only when we finally ask it to.

Such laziness (ironically) provides a huge improvement in the processing speed. To understand how, we need to understand the Nodes and Graphs of TensorFlow.


This is the textbook view of a neural network. As we can see, we have several inputs X1 - Xn. These form the first layer of the network. The second (hidden) layer is obtained as a dot product of each of these with the weight matrix, followed by an activation function such as sigmoid or relu.

The third layer is just one value that is obtained as a dot product of its weight matrix with the output of the second layer.

For TensorFlow, each of these individual entities is a Node. The first layer has n+1 nodes (n inputs and 1 constant). The second layer has k+1 nodes, and the third layer has 1 node. Each of these nodes is represented by a Tensor.


We can see that some nodes have a constant value (e.g. the bias 1). Some of them have variable values like the weight matrix - we start with a random initialization and tune it through the process. And we have some nodes whose value is just based on some computation on the other nodes - these are dependent nodes - we cannot get their values until we have the values of the previous nodes.

In this network, we have k nodes in the middle layer and 1 node in the last layer that depend on other nodes - we have k+1 dependent nodes and k variables that we need to tune.


When we create individual Tensors, we just create individual nodes and define the relations between them - these relations are yet to be implemented. Once we are done with the definitions, we invoke the compile() method, which identifies the graph connecting the nodes.

This is an important step in the whole process. If we have circular dependencies or any other reason that could break the graph, the error is identified at this point.


Sessions

TensorFlow computations are always executed in a "session". A Session is essentially an environment with a status of its own. A Session is not a thread; but if we have two independent computations that need to run together - without influencing each other - we can use two sessions.

Here, the values A and C are evaluated under session 1 and see one environment, while B and D are evaluated under session 2 and see another.

Once we have defined the nodes and compiled the graph, we can finally run the command to get the value of a particular node in the graph. When we do so, TensorFlow looks back to identify all the nodes that the requested node depends on. Only those nodes are evaluated, in the appropriate order. Thus, a node in the graph is evaluated only when needed - and only if it is needed.

This has a great impact on the processing speed and is a major advantage of TensorFlow.

TensorFlow Code Example

To understand TensorFlow, it is very important to understand the core concepts of Constants, Variables, Placeholders and Sessions. Let us now work out an example that displays all of these concepts at once.

  • Of course, we start by importing the TensorFlow module
import tensorflow as tf

Now, let us define a few Tensors. Here, t1 and t2 are constants, t3 is a placeholder and t4 is a variable.

t1 = tf.ones([4,5])
t2 = tf.random_uniform([5,4], maxval = 4.0, seed = 2)
t3 = tf.placeholder(tf.float32)
t4 = tf.get_variable("t4", [4,4], initializer = tf.ones_initializer)

Here, we define t1 as a Constant Tensor of size 4x5, with all the values set to 1. t2 is a Constant Tensor of size 5x4, with random values.

t3 is a placeholder with 0 dimensions - a single float32 number.

Along with this, we define a variable t4 of shape 4x4. The initializer is set to ones_initializer. That means, whenever we initialize the variable, its values will be set to 1. Note that this will happen only when we initialize the variable - not now.

  • Next, we can define the Tensor expression
exp = tf.assign(t4, tf.matmul(t1,t2) * t3 + t4)

This code takes a dot product of t1 and t2, multiplies it with the scalar t3 and then adds it to t4. The outcome of this is then assigned to t4. Thus, the value of t4 changes on every execution of this expression. Note that t3 is a placeholder, so we will have to provide the value of t3 when we want to process this expression.

Again, this code only defines the expression. It is not executed right away.

  • With everything in place, we can now get the session and start working with the Tensors
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # runs the ones_initializer for t4
    print("Initialized variables\n----------------------------------------------")
    print(t4.eval())
    sess.run(exp, feed_dict={t3: 1})
    print("\nAfter First Run\n--------------------------------------------------")
    print(t4.eval())
    sess.run(exp, feed_dict={t3: 1})
    print("\nAfter Second Run\n-------------------------------------------------")
    print(t4.eval())
    sess.run(exp, feed_dict={t3: 1})
    print("\nAfter Third Run\n--------------------------------------------------")
    print(t4.eval())

Here, we actually run the code. We start by initiating the session. Then we have to initialize the variables - so far, t4 was only declared, not initialized. Now the initializer code is actually executed; in this case, it happens to be tf.ones_initializer. So t4 starts as a 4x4 Tensor with all values set to 1.

Next, we run the expression along with the feed_dict. Remember that the expression has a placeholder t3. It will not evaluate unless we give it a value for t3. This value is passed through feed_dict. Each run updates t4 and assigns a new value to it.

The above code generates this output:

Initialized variables
    [[1. 1. 1. 1.]
    [1. 1. 1. 1.]
    [1. 1. 1. 1.]
    [1. 1. 1. 1.]]

After First Run
    [[11.483105 10.39291  11.380319  9.601738]
    [11.483105 10.39291  11.380319  9.601738]
    [11.483105 10.39291  11.380319  9.601738]
    [11.483105 10.39291  11.380319  9.601738]]

After Second Run
    [[20.483215 16.11417  19.363663 15.015686]
    [20.483215 16.11417  19.363663 15.015686]
    [20.483215 16.11417  19.363663 15.015686]
    [20.483215 16.11417  19.363663 15.015686]]

After Third Run
    [[32.022038 26.227888 28.65984  23.72137 ]
    [32.022038 26.227888 28.65984  23.72137 ]
    [32.022038 26.227888 28.65984  23.72137 ]
    [32.022038 26.227888 28.65984  23.72137 ]]

One can work out the matrix arithmetic by hand for the three runs and verify that the output is what we expect.

A Simple Network

Sometimes, a few lines of code say a lot more than many pages of theory. Especially in a subject like Machine Learning, one is often left with the feeling that "it all sounds great, but how do I do it!". Let's have a look at a simple implementation of a classification algorithm using TensorFlow. Classification is one of the frequent problems we work on in AI. Typically we have a set of inputs that must be classified into different categories. We can use TensorFlow to train a model for this task. Let us see how.


Like any Python module, TensorFlow is an external library that needs to be imported into the script before we can use it.

import tensorflow as tf

Along with TensorFlow, we typically import a few other libraries that make our life simpler. Keras is a part of TensorFlow. It helps us develop high-level models very easily. We could create the models using TensorFlow alone; however, Keras simplifies our job.

from tensorflow import keras

NumPy is a default import in any machine learning task - we cannot live without it. Almost all data manipulations require NumPy.

import numpy as np

Another important module is matplotlib. It is very important to visualize the available data, to get a feel for what hides in it. No amount of algorithmic analysis can give us what we get by just looking at the data in graphical form.

import matplotlib.pyplot as plt

TensorFlow has gone through a lot of changes over the versions. The concepts have not changed much, but some of the methods have. It is a good practice to check the version we are using: if we run into a problem, we can then consult the help for that specific version. Many developers have faced problems due to version conflicts, so it is a lot simpler to search the forums for problems in a specific version.


All the code below is based on version 1.12.0.

MNIST Dataset

To introduce the basic ideas, we can look at an implementation of a simple problem. MNIST (Modified National Institute of Standards and Technology) provides a good dataset of handwritten digits from 0 to 9. We can use it to train a neural network - and build a model that can read and decode handwritten digits.

This problem is often called the "Hello World" of Deep Learning. Of course, we need a lot more for developing "real" applications. This is good to give you an introduction to the topic. Of course, there is a lot more to TensorFlow. If you are interested in a detailed study, you can take up an online course.

Load Data

The first step is to load the available data. TensorFlow provides a good set of test datasets that we can use for learning and testing, and the MNIST dataset is among them. So the job of fetching the training and test data is quite simple in this case. In real-life problems, accumulating, cleaning and loading such data is a major part of the work. Here we do it in just one line of code.

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

This gives us four Tensors - train_images, train_labels, test_images and test_labels. The load_data() method itself takes care of splitting the available data into the train and test sets. In a real-life problem, we would have to do this ourselves. But for the example code, we can use the utility methods available along with the TensorFlow datasets.

Check the Data

As a best practice, one should always take a peek at the data available. Let us check the shapes of the arrays.

print(train_images.shape)
print(test_images.shape)

This prints:

(60000, 28, 28)
(10000, 28, 28)

We know that we are working on an image classification problem. So let us try to check out how these images look.


Data Sanitization

The next step is to alter the available data to make it more suitable for training the model. There is a lot we can do in this stage, but it makes very little sense for data like this, which is already cleaned and sanitized.

The image data is naturally two dimensional. That may be very good for viewing graphics, but for training this neural network we need single-dimensional records. This requires "flattening" the data. Keras provides easy ways to flatten the data within the model, but for generic preprocessing it is much better to flatten it right away. NumPy's reshape provides for that.

train_images = train_images.reshape(-1,784)
test_images = test_images.reshape(-1,784)

The labels we have are the numbers 0-9. But as outputs of a neural network, these values have no numerical sequence. That is, 1 is less than 9 as a number; but when we read the images, this relation is not significant at all. The numerical relation between these outputs is just incidental. For the application, they are just 10 different labels - a categorical output.

To work with this, we need to map the labels into 10 different independent arrays of 1's and 0's. We need to map each label to an array of 10 binary numbers - 0 everywhere and 1 at the position of the specific value. For example, 1 will be mapped to [0,1,0,0,0,0,0,0,0,0]; 8 will be mapped to [0,0,0,0,0,0,0,0,1,0]; and so on. That is called a "Categorical" output.
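
The mapping itself is simple. Here is a plain-Python sketch of what to_categorical does (a conceptual illustration, not the Keras implementation):

```python
def to_one_hot(labels, num_classes=10):
    # Each label becomes a list of num_classes zeros with a single 1
    # at the index given by the label.
    return [[1 if i == label else 0 for i in range(num_classes)]
            for label in labels]

print(to_one_hot([1, 8]))
# [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]
```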

test_labels = tf.keras.utils.to_categorical(test_labels)
train_labels = tf.keras.utils.to_categorical(train_labels)

Another very important task is to normalize the data. The activation functions - relu, sigmoid, tanh - all work optimally when the input numbers are small, typically between 0 and 1. This is an important step in any neural network; missing it has a very bad impact on model efficiency.

train_images = train_images / 255.0
test_images = test_images / 255.0

This is a very simple but important step in training a neural network.

Data Augmentation

The resources we have are always less than what we need - and data is no exception. To achieve better outcomes, we need a lot more data than we have. Fortunately, we can generate more data from what is available, using our knowledge about it. For example, we know that the digit in an image does not change if the entire image is shifted slightly to either side.

Thus, each image in the input set can generate four more images by shifting it slightly in each direction. The shifted images appear almost unchanged to our eye, but for the neural network model they are a new input set. This simple observation gives us 5 times as much data. Let's do that.

# Shift each image up one row: drop the first 28 pixels (the top row)
# and append 28 zeros (a blank bottom row) to each flattened image.
A = np.delete(train_images, np.s_[:28:], 1)
A = np.insert(A, [A.shape[1]] * 28, [0]*28, 1)
augmented_images = np.append(train_images, A, axis=0)
augmented_labels = np.append(train_labels, train_labels, axis=0)

# Shift down one row: drop the last row, insert a blank row at the top.
A = np.delete(train_images, np.s_[-28:], 1)
A = np.insert(A, [0] * 28, [0] * 28, 1)
augmented_images = np.append(augmented_images, A, axis=0)
augmented_labels = np.append(augmented_labels, train_labels, axis=0)

# Shift right two pixels (in the flattened representation): drop the
# last two pixels, insert two zeros at the front.
A = np.delete(train_images, np.s_[-2:], 1)
A = np.insert(A, [0, 0], [0, 0], 1)
augmented_images = np.append(augmented_images, A, axis=0)
augmented_labels = np.append(augmented_labels, train_labels, axis=0)

# Shift left two pixels: drop the first two pixels, append two zeros at the end.
A = np.delete(train_images, np.s_[:2:], 1)
A = np.insert(A, [A.shape[1],A.shape[1]], [0, 0], 1)
augmented_images = np.append(augmented_images, A, axis=0)
augmented_labels = np.append(augmented_labels, train_labels, axis=0)

Don't worry if you are not able to understand the code above. Refer to the NumPy blogs for details of working with NumPy arrays.

Essentially, this code deletes pixels from one side of each image and inserts 0's on the opposite side. It does this for all four sides of each image in the training data, and appends the results to a new array called augmented_images. Along with that, it also builds the augmented_labels array.

Now, we can use augmented_images and augmented_labels instead of train_images and train_labels for training our model. But, wait a minute. If we think this over, the data is not random anymore: we have a huge chunk of data with the original images, followed by huge chunks with images shifted in each direction. Such data does not create good models. We need to shuffle the data well.

But this is not so simple. We now have an array of images and an array of labels, and we have to shuffle both of them without losing the correspondence between them. After the shuffle, an image of 5 should still point to the label 5!

NumPy does provide us an elegant way of doing this.

train_data = np.c_[augmented_images.reshape(len(augmented_images), -1), augmented_labels.reshape(len(augmented_labels), -1)]
np.random.shuffle(train_data)
augmented_images = train_data[:, :augmented_images.size//len(augmented_images)].reshape(augmented_images.shape)
augmented_labels = train_data[:, augmented_images.size//len(augmented_images):].reshape(augmented_labels.shape)

Essentially, we stack the images and labels side by side into a single array, shuffle that array, and then split it back - so the images and labels move together.
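
The same idea can be seen in plain Python, using zip instead of NumPy's column-stacking (a conceptual sketch with toy data, not the code above):

```python
import random

# Toy data: image i is [i, i], and its label is i.
images = [[0, 0], [1, 1], [2, 2], [3, 3]]
labels = [0, 1, 2, 3]

# Glue each image to its label, shuffle the pairs, then split them apart again.
pairs = list(zip(images, labels))
random.shuffle(pairs)
images, labels = (list(x) for x in zip(*pairs))

# The order changed, but the correspondence survived the shuffle.
print(all(img[0] == lab for img, lab in zip(images, labels)))  # True
```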

Train the Model

With everything in place, we can now start training a model. We start by creating a Keras Sequential model.

model = tf.keras.Sequential()

Let us add three layers to this model. Keras allows us to add many more, but for a problem like this, 3 layers should be good enough.

model.add(tf.keras.layers.Dense(50, activation=tf.nn.relu, input_shape=(784,)))
model.add(tf.keras.layers.Dense(25, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

The input shape of 784 is defined by the size of each entity in the input data - we have flattened images of 784 pixels each. So that is where we start. And the output size has to be 10, because we have 10 possible outcomes. A typical network has a softmax activation in the last layer and relu in the inner hidden layers.

The size of each layer is just an estimate based on judgement. We can develop this judgement with practice and experience and understanding of how the Neural Networks work. You can try playing around with these to see the effect each has on the model efficiency.

Next, we compile and train the model with the data available.

model.compile(loss="categorical_crossentropy", optimizer=tf.train.AdamOptimizer(), metrics=['accuracy'])
model.fit(augmented_images, augmented_labels, epochs=20)

The parameters to compile() - loss="categorical_crossentropy" and optimizer=tf.train.AdamOptimizer() - may seem hazy. You can check the blogs on Deep Learning to understand them better.

The fit() method does the actual work of training the model with the data we have.

This generates an output:

Epoch 1/20
300000/300000 [==============================] - 18s 58us/step - loss: 0.2248 - acc: 0.9335
Epoch 2/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.1131 - acc: 0.9661
Epoch 3/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0926 - acc: 0.9722
Epoch 4/20
300000/300000 [==============================] - 18s 59us/step - loss: 0.0814 - acc: 0.9751
Epoch 5/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0737 - acc: 0.9775
Epoch 6/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0676 - acc: 0.9792
Epoch 7/20
300000/300000 [==============================] - 17s 56us/step - loss: 0.0636 - acc: 0.9804
Epoch 8/20
300000/300000 [==============================] - 17s 56us/step - loss: 0.0596 - acc: 0.9811
Epoch 9/20
300000/300000 [==============================] - 17s 56us/step - loss: 0.0573 - acc: 0.9821
Epoch 10/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0543 - acc: 0.9831
Epoch 11/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0518 - acc: 0.9837
Epoch 12/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0498 - acc: 0.9843
Epoch 13/20
300000/300000 [==============================] - 17s 56us/step - loss: 0.0482 - acc: 0.9848
Epoch 14/20
300000/300000 [==============================] - 17s 56us/step - loss: 0.0466 - acc: 0.9854
Epoch 15/20
300000/300000 [==============================] - 16s 55us/step - loss: 0.0454 - acc: 0.9855
Epoch 16/20
300000/300000 [==============================] - 17s 55us/step - loss: 0.0440 - acc: 0.9860
Epoch 17/20
300000/300000 [==============================] - 17s 55us/step - loss: 0.0427 - acc: 0.9864
Epoch 18/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0416 - acc: 0.9866
Epoch 19/20
300000/300000 [==============================] - 17s 57us/step - loss: 0.0397 - acc: 0.9872
Epoch 20/20
300000/300000 [==============================] - 17s 56us/step - loss: 0.0394 - acc: 0.9873

We can see the loss reducing and the accuracy increasing with each epoch. Note that your outputs may not match these numbers exactly - that is because of the random nature of the shuffle and the training. But the trend should be similar.

Evaluate the Model

Now that we have a trained model, we need to evaluate how good it is. The first simple step is to check it against the training data, using the model's own evaluate() method.

model.evaluate(augmented_images, augmented_labels)

300000/300000 [==============================] - 7s 25us/step
[0.032485380109701076, 0.9895766666666667]

That is quite good. Considering the amount of data we had, this is good accuracy. But this test is not enough - very high accuracy could also mean overfitting. So we must check it with the test data.

model.evaluate(test_images, test_labels)

10000/10000 [==============================] - 0s 28us/step
[0.062173529548419176, 0.9834]

One can notice that the accuracy is slightly lower on the test data, which suggests slight overfitting. But it is not so bad, and we can live with it for now. In a real-life project, depending on the requirements, one could tweak the model shape and other hyperparameters to get better results.


That may not give us all the confidence we need. We can manually check a few samples to see how our model has performed.

First, create the prediction for all the test images.

prediction = model.predict(test_images)

Now, prediction is an array with an output for each image in the test set. We can peek at its first element:

prediction[0]

array([4.2578703e-14, 3.1028571e-13, 8.5658702e-10, 9.5759439e-08,
        1.0654272e-18, 2.6685609e-10, 7.1893772e-19, 9.9999988e-01,
        1.4853034e-10, 1.4370636e-08], dtype=float32)

We can see that each element in this set is an array of 10 values - the probabilities that the input image belongs to each label. When we check the zeroth element of the prediction set, we see that the probability is very low for every label except element 7, which is almost 1. Hence, for image 0, one would predict 7.
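
Picking the predicted digit from such an array is just an argmax. A plain-Python sketch, using values rounded from the prediction output above:

```python
# Probabilities for image 0 (rounded from the prediction output above).
probs = [4.3e-14, 3.1e-13, 8.6e-10, 9.6e-08, 1.1e-18,
         2.7e-10, 7.2e-19, 0.99999988, 1.5e-10, 1.4e-08]

# The predicted label is the index holding the largest probability.
predicted_digit = max(range(len(probs)), key=probs.__getitem__)
print(predicted_digit)  # 7
```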

Let us now see what the test image looks like. Before that, we need to reshape the images array (remember we had flattened it into single-dimensional records for building the model).

prediction_images = test_images.reshape(10000,28,28)

Now let us check the zeroth element


This is 7 indeed!

Having one correct answer is certainly not enough to verify the accuracy of the model. But one thing we can check at this point is the values in the prediction[0] array. The value for element 7 is far greater than the other values. That means the model is quite certain about the outcome - there is no doubt or confusion. This is an important symptom of a good model.