Introduction to PyTorch

PyTorch is an open source machine learning library used for developing and training neural-network-based deep learning models. It is primarily developed by Facebook's AI Research group. PyTorch can be used from Python as well as C++; naturally, the Python interface is more polished. PyTorch (backed by big names like Facebook, Microsoft, Salesforce and Uber) is immensely popular in research labs. It is not yet as common on production servers, which are ruled by frameworks like TensorFlow (backed by Google), but PyTorch is picking up fast.

Unlike most other popular deep learning frameworks, such as TensorFlow, which use static computation graphs, PyTorch uses dynamic computation graphs, which allow greater flexibility in building complex architectures. PyTorch uses core Python concepts like classes, conditionals and loops, which are familiar to our eyes and hence a lot more intuitive to understand. This makes it a lot simpler than frameworks like TensorFlow that bring in their own programming style.
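A small sketch (not part of the original example) of what "dynamic" means in practice: the graph is rebuilt on every forward pass, so an ordinary Python if on the data decides which operations it contains.

```python
import torch

def forward(x):
    # The graph is built as this Python code runs, so a plain
    # `if` on the data decides which operations become part of it.
    if x.sum() > 0:
        return x * 2
    return x - 1

a = torch.tensor([1.0, 2.0], requires_grad=True)
out = forward(a).sum()
out.backward()    # gradients flow through the branch that actually ran
print(a.grad)     # tensor([2., 2.]) -- the `x * 2` branch was taken
```

In a static-graph framework, this branch would have to be expressed through special graph operations instead of plain Python.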

To get a feel of this library, let us have a look at a basic implementation with PyTorch.

Install PyTorch

PyTorch does not yet have a good version for Windows. In fact, if you are really interested in AI, you had better grow out of your Windows. Jump over to SageMaker. And if you don't want to spend a penny, try out Google Colab.

But if you must, you can install the CPU-only build of PyTorch in your Anaconda environment using

conda install pytorch-cpu -c pytorch

Now, let us start off with building a neural network. Let us pick up our good old MNIST data set.

Import the Modules

The first step is of course to import the relevant libraries.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

Gather the Data

Of course, the first step in the process is to gather the data required to train the model. Here, we will use the MNIST dataset that ships with torchvision.

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=200, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=100, shuffle=True)

These two blocks of code are packed with functionality. Essentially, each picks the dataset from the torchvision.datasets module, converts the images to tensors, normalizes them, and then packs the result into a data loader.
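The Normalize((0.1307,), (0.3081,)) transform simply standardizes every pixel: output = (pixel - mean) / std, where 0.1307 and 0.3081 are the mean and standard deviation of the MNIST training images. A minimal sketch of the arithmetic in plain Python:

```python
# Normalize((0.1307,), (0.3081,)) computes (pixel - mean) / std
# for every pixel; 0.1307 and 0.3081 are the MNIST training-set
# mean and standard deviation (on the 0..1 scale).
def normalize(pixel, mean=0.1307, std=0.3081):
    return (pixel - mean) / std

print(normalize(0.0))   # background pixels map to about -0.424
print(normalize(1.0))   # full-intensity pixels map to about 2.821
```

Normalized inputs keep the activations in a well-behaved range, which helps the network converge faster.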

Build the Network

Having done this, we start off with the real code. As mentioned before, PyTorch uses basic, familiar programming paradigms rather than inventing its own. A neural network in PyTorch is an object: an instance of a class that defines the network and inherits from torch.nn.Module.

The torch.nn.Module class provides the core functionality required for developing, training, testing and deploying a neural network. When we subclass it, we typically override two of its methods - the constructor and forward(). The framework takes care of the rest.

Here is what I mean.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

As we can see, these two methods give us all that we need to define a neural network. This network has four layers (two hidden). The first two are convolutional layers and the next two are linear, fully connected layers. The activation function for the first three layers is ReLU, and the output layer applies a log-softmax.

The constructor builds the network. The forward() method defines how data moves forward through it - the forward pass.
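To see the two methods in action on a smaller scale, here is a toy two-layer classifier (not the MNIST network above) pushed through with a random input:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        # constructor: declare the layers (the learnable parts)
        self.fc1 = nn.Linear(784, 32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        # forward(): describe how data flows through those layers
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

net = TinyNet()
out = net(torch.randn(3, 784))   # a batch of 3 fake flattened "images"
print(out.shape)                 # torch.Size([3, 10]) -- one score per digit
```

Note that we call net(...), not net.forward(...); nn.Module's __call__ invokes forward() for us, along with some framework bookkeeping.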

Train the Network

Now that the model is ready, we have to work on training it with the data available to us. This is done by a method train().

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 50 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

The parameters it receives are the model (the network we instantiated), device (the kind of device, gpu/cpu, that is running the load), train_loader (for the training data) and the optimizer (used to train the model), along with the epoch number (used only for display in the logs).

The first line in the method above invokes model.train() - a method inherited from nn.Module that puts the network into training mode. Then we loop over the batches of training data we extract from the loader, moving each batch to the target device.

We then clear the optimizer's accumulated gradients with optimizer.zero_grad(). The next line takes care of the forward propagation - we calculate the output for the input data, based on the current model.

The next two lines take care of the backward propagation. We calculate the loss by comparing the output and the target, call loss.backward() to compute the gradients, and then update the model with optimizer.step(). We do this for the entire data set.
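The forward/backward cycle can also be seen in isolation. Here is a toy single training step on a stand-in one-layer model with made-up data (not the MNIST loop itself):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

model = nn.Linear(4, 3)                      # stand-in for the real network
optimizer = optim.SGD(model.parameters(), lr=0.1)

data = torch.randn(8, 4)                     # one fake batch of 8 samples
target = torch.randint(0, 3, (8,))           # fake class labels

optimizer.zero_grad()                        # clear gradients from the last batch
output = F.log_softmax(model(data), dim=1)   # forward pass
loss = F.nll_loss(output, target)            # compare output with target
loss.backward()                              # backward pass: compute gradients
optimizer.step()                             # update the weights
```

Every iteration of the training loop is exactly this five-step dance, repeated for each batch.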

Test the Network

Similarly, we have a test method that verifies the performance of the network based on the given test data set.

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Essentially, this method just loops over the entire test data to find the overall loss. It counts the number of correct predictions and then prints a formatted log.
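The counting logic in the loop above, shown on a tiny hand-made batch:

```python
import torch

# Fake scores for a batch of 3 samples over 4 classes
output = torch.tensor([[0.1, 0.9, 0.0, 0.0],   # predicts class 1
                       [0.8, 0.1, 0.0, 0.1],   # predicts class 0
                       [0.0, 0.2, 0.7, 0.1]])  # predicts class 2
target = torch.tensor([1, 0, 3])               # the last one is wrong

pred = output.argmax(dim=1, keepdim=True)      # index of the max score per row
correct = pred.eq(target.view_as(pred)).sum().item()
print(correct)                                 # 2 of 3 predictions match
```

The view_as() call just reshapes the targets to match pred's column shape so the element-wise eq() comparison lines up.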

Put it Together

With the skeleton in place, we have to start stitching these pieces into an application that can build, train and validate the neural network model.

We start with seeding the random number generator, so that runs are reproducible (the seed value itself is arbitrary).

torch.manual_seed(1)
Of course, PyTorch (or any ML library) is meant for GPUs, but it can be trimmed down for CPUs as well. To indicate the mode of execution, we instantiate the device.

device = torch.device("cpu")

The next step is to create an instance of the Neural Network model we defined above.

model = Net().to(device)

The next step is to create an optimizer instance. For this example, we will use stochastic gradient descent to train the model, with a learning rate of 0.01 and a momentum of 0.9.

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

With the stage set, all that remains is to train the network with the data we have.

Just as we learnt in the textbooks, this is a for loop that runs the forward and backward propagation over and over. At the end of each epoch, we test the model we have so far.

for epoch in range(1, 10 + 1):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

This is certainly a lot more intuitive than TensorFlow. It carries a minor performance cost, but researchers prefer the clarity that PyTorch offers.

When we run the above code, we get an output similar to this:

Train Epoch: 1 [0/60000 (0%)]	Loss: 2.307263
Train Epoch: 1 [10000/60000 (17%)]	Loss: 0.333380
Train Epoch: 1 [20000/60000 (33%)]	Loss: 0.251793
Train Epoch: 1 [30000/60000 (50%)]	Loss: 0.114570
Train Epoch: 1 [40000/60000 (67%)]	Loss: 0.098489
Train Epoch: 1 [50000/60000 (83%)]	Loss: 0.119806

Test set: Average loss: 0.0820, Accuracy: 9741/10000 (97%)

Train Epoch: 2 [0/60000 (0%)]	Loss: 0.097875
Train Epoch: 2 [10000/60000 (17%)]	Loss: 0.072659
Train Epoch: 2 [20000/60000 (33%)]	Loss: 0.047402
Train Epoch: 2 [30000/60000 (50%)]	Loss: 0.072867
Train Epoch: 2 [40000/60000 (67%)]	Loss: 0.049930
Train Epoch: 2 [50000/60000 (83%)]	Loss: 0.089879

Test set: Average loss: 0.0460, Accuracy: 9844/10000 (98%)

Train Epoch: 3 [0/60000 (0%)]	Loss: 0.035898
Train Epoch: 3 [10000/60000 (17%)]	Loss: 0.061759
Train Epoch: 3 [20000/60000 (33%)]	Loss: 0.073666
Train Epoch: 3 [30000/60000 (50%)]	Loss: 0.053371
Train Epoch: 3 [40000/60000 (67%)]	Loss: 0.093279
Train Epoch: 3 [50000/60000 (83%)]	Loss: 0.086022

Test set: Average loss: 0.0457, Accuracy: 9845/10000 (98%)

Train Epoch: 4 [0/60000 (0%)]	Loss: 0.008744
Train Epoch: 4 [10000/60000 (17%)]	Loss: 0.032438
Train Epoch: 4 [20000/60000 (33%)]	Loss: 0.015062
Train Epoch: 4 [30000/60000 (50%)]	Loss: 0.016227
Train Epoch: 4 [40000/60000 (67%)]	Loss: 0.005343
Train Epoch: 4 [50000/60000 (83%)]	Loss: 0.077479

Test set: Average loss: 0.0460, Accuracy: 9853/10000 (99%)

Train Epoch: 5 [0/60000 (0%)]	Loss: 0.018869
Train Epoch: 5 [10000/60000 (17%)]	Loss: 0.030547
Train Epoch: 5 [20000/60000 (33%)]	Loss: 0.023935
Train Epoch: 5 [30000/60000 (50%)]	Loss: 0.049631
Train Epoch: 5 [40000/60000 (67%)]	Loss: 0.043962
Train Epoch: 5 [50000/60000 (83%)]	Loss: 0.062427

Test set: Average loss: 0.0315, Accuracy: 9894/10000 (99%)

Train Epoch: 6 [0/60000 (0%)]	Loss: 0.065866
Train Epoch: 6 [10000/60000 (17%)]	Loss: 0.006404
Train Epoch: 6 [20000/60000 (33%)]	Loss: 0.018278
Train Epoch: 6 [30000/60000 (50%)]	Loss: 0.004236
Train Epoch: 6 [40000/60000 (67%)]	Loss: 0.007372
Train Epoch: 6 [50000/60000 (83%)]	Loss: 0.036266

Test set: Average loss: 0.0320, Accuracy: 9896/10000 (99%)

Train Epoch: 7 [0/60000 (0%)]	Loss: 0.007709
Train Epoch: 7 [10000/60000 (17%)]	Loss: 0.026968
Train Epoch: 7 [20000/60000 (33%)]	Loss: 0.033189
Train Epoch: 7 [30000/60000 (50%)]	Loss: 0.014810
Train Epoch: 7 [40000/60000 (67%)]	Loss: 0.026795
Train Epoch: 7 [50000/60000 (83%)]	Loss: 0.069040

Test set: Average loss: 0.0289, Accuracy: 9906/10000 (99%)

Train Epoch: 8 [0/60000 (0%)]	Loss: 0.005131
Train Epoch: 8 [10000/60000 (17%)]	Loss: 0.016827
Train Epoch: 8 [20000/60000 (33%)]	Loss: 0.016882
Train Epoch: 8 [30000/60000 (50%)]	Loss: 0.004744
Train Epoch: 8 [40000/60000 (67%)]	Loss: 0.010695
Train Epoch: 8 [50000/60000 (83%)]	Loss: 0.007329

Test set: Average loss: 0.0288, Accuracy: 9907/10000 (99%)

Train Epoch: 9 [0/60000 (0%)]	Loss: 0.023235
Train Epoch: 9 [10000/60000 (17%)]	Loss: 0.012297
Train Epoch: 9 [20000/60000 (33%)]	Loss: 0.020852
Train Epoch: 9 [30000/60000 (50%)]	Loss: 0.007066
Train Epoch: 9 [40000/60000 (67%)]	Loss: 0.005871
Train Epoch: 9 [50000/60000 (83%)]	Loss: 0.014924

Test set: Average loss: 0.0264, Accuracy: 9915/10000 (99%)

Train Epoch: 10 [0/60000 (0%)]	Loss: 0.028316
Train Epoch: 10 [10000/60000 (17%)]	Loss: 0.012635
Train Epoch: 10 [20000/60000 (33%)]	Loss: 0.005901
Train Epoch: 10 [30000/60000 (50%)]	Loss: 0.014768
Train Epoch: 10 [40000/60000 (67%)]	Loss: 0.012478
Train Epoch: 10 [50000/60000 (83%)]	Loss: 0.007248

Test set: Average loss: 0.0299, Accuracy: 9900/10000 (99%)

We can see the model training on the data passed in. We can also note that the model improved consistently up to the 9th epoch and then saturated - perhaps overfitting the training set. We could tweak the hyperparameters to improve this model, but 99% should be good for our purpose. We could use "early stopping" to pick up the model at epoch 9 itself.
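Early stopping can be as simple as tracking the best test loss and keeping the epoch that achieved it. A minimal sketch in plain Python (the loss values are taken from the run above, just for illustration):

```python
# Per-epoch test losses, shaped like the run above:
# improving until epoch 9, then getting worse.
test_losses = [0.0820, 0.0460, 0.0457, 0.0460, 0.0315,
               0.0320, 0.0289, 0.0288, 0.0264, 0.0299]

best_epoch, best_loss = 0, float('inf')
for epoch, loss in enumerate(test_losses, start=1):
    if loss < best_loss:
        best_loss, best_epoch = loss, epoch
        # in real code we would save a checkpoint of the model here

print(best_epoch, best_loss)   # 9 0.0264 -- the model we would keep
```

Real implementations usually add a "patience" threshold too, stopping training entirely once the loss has failed to improve for several epochs in a row.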

Finally, we save the trained model so it can be deployed in the field.
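Saving typically means persisting the learned parameters with torch.save(); a minimal sketch, using a stand-in model and an example file name of my choosing:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # stand-in for the trained network

# Save only the learned parameters (the recommended approach),
# not the whole Python object. The file name is arbitrary.
torch.save(model.state_dict(), "mnist_cnn.pt")

# Later, on the deployment machine, rebuild the network and load them back:
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("mnist_cnn.pt"))
```

Because only the state dict is saved, the deployment code must define the same network class before it can load the weights.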

Of course, this was just a glimpse. PyTorch is loaded with functionality for different kinds of models, networks and use cases. Tons of books, blogs and videos can give you more detail about it.

PyTorch's own website is quite good for learning the subject.