AWS SageMaker


SageMaker is one of AWS's core machine learning offerings, and it helps us through all stages of the machine learning pipeline - build, train, tune and deploy. It provides a simple Jupyter Notebook UI that we can use to script Python code. It can train models on data dumped into S3 buckets, or on a streaming data source like Kinesis shards. Once models are trained, SageMaker lets us deploy them into production with minimal effort.

Setup


Let us now try out an example that shows the real power of SageMaker. We will work on the good old MNIST data set, going through all the phases of development on SageMaker - right from training the model on the downloaded data, up to deploying a Lambda function that can leverage the predictor endpoint created by SageMaker.

S3 Bucket


SageMaker uses an S3 bucket to dump its model artifacts as it works. It is also convenient to dump the training data into an S3 bucket. Hence, the first step is to choose an S3 bucket for the purpose. I have a bucket dedicated to SageMaker. I call it sagemaker-dumps. You can choose a name that suits your purpose.

This bucket does not need any public access, as we work on it only from within SageMaker. A life-cycle rule can move the files into Infrequent Access or Glacier - because we rarely use the training data after the model is finalized.
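If you prefer to script this step instead of clicking through the console, a minimal boto3 sketch along these lines should do it (the bucket name and the transition periods here are only placeholders - bucket names are globally unique, so pick your own):

import boto3

s3 = boto3.client('s3')
region = boto3.Session().region_name

# Create the bucket (in us-east-1, omit the CreateBucketConfiguration argument)
s3.create_bucket(Bucket='sagemaker-dumps',
                 CreateBucketConfiguration={'LocationConstraint': region})

# Life-cycle rule: move files to Infrequent Access after 30 days, then Glacier after 90
s3.put_bucket_lifecycle_configuration(
    Bucket='sagemaker-dumps',
    LifecycleConfiguration={'Rules': [{
        'ID': 'archive-training-data',
        'Status': 'Enabled',
        'Filter': {'Prefix': ''},
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},
            {'Days': 90, 'StorageClass': 'GLACIER'}
        ]
    }]}
)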

Notebook Instance


After finalizing the S3 bucket, let's get back to SageMaker. The next step is to provision a Notebook instance.

Open the SageMaker service from the AWS console and navigate to Notebook Instances. Click on the button "Create Notebook Instance" to get started. It will prompt for Notebook Instance Settings. Give it a good name and choose the ml.t2.medium instance type. That is the smallest and will serve our purpose for now. We can choose bigger ones when working on "real" models. Don't worry about the cost. AWS promises us pay-per-use pricing.

Leave the rest on the default and move forward.

Next, we need to create an IAM role for our Notebook. Remember that our data and models need to live in the S3 bucket that we just created. So the Notebook should have access to that bucket. We do not need any other particular privilege for the Notebook.
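The console wizard creates this role for us, but for reference, this is roughly what it amounts to in boto3 (the role name, policy name and bucket ARN below are made up for illustration):

import json
import boto3

iam = boto3.client('iam')

# Trust policy letting SageMaker assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(RoleName='SageMakerNotebookRole',
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Inline policy restricting S3 access to our bucket
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::sagemaker-dumps",
                     "arn:aws:s3:::sagemaker-dumps/*"]
    }]
}

iam.put_role_policy(RoleName='SageMakerNotebookRole',
                    PolicyName='S3BucketAccess',
                    PolicyDocument=json.dumps(s3_policy))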

Now, we can go ahead and "Create". AWS takes some time to get the Notebook ready. We can see on the console that the Notebook Instance is "Pending".

Let it work its way, as AWS provisions the underlying instance and prepares it for our purpose. Once it is ready, the console says the Notebook instance is "InService".
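If you would rather wait for this from code than keep refreshing the console, a small boto3 poll like the following works (the instance name is whatever you typed in the settings page):

import time
import boto3

sm = boto3.client('sagemaker')

# Poll until the notebook instance reports InService (or fails)
while True:
    status = sm.describe_notebook_instance(
        NotebookInstanceName='learn-mnist-notebook')['NotebookInstanceStatus']
    print(status)
    if status in ('InService', 'Failed'):
        break
    time.sleep(30)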

JupyterLab


Once the Notebook instance is ready, we can open JupyterLab. On the Notebook instance console, we can see two links in the Actions column - Open Jupyter and Open JupyterLab. Anyone working on AI certainly knows what a Jupyter Notebook is. JupyterLab is just that, with some AWS flavor on it. So, let's click on Open JupyterLab.

That opens the JupyterLab interface.

It provides several ready-to-use templates, and also allows us to create our own notebook. This time, we would like to build our own. So let us take that path. Click on File -> New -> Notebook.

It will prompt you to choose a Kernel for the Jupyter Notebook that opens. I love Anaconda Python 3. So I choose that. It should be good enough for the purpose of this example. The curious can try playing around with the other Kernels.

With this, the familiar Jupyter Notebook shows up. I do not like the "Untitled" Notebooks. So I rename it to something more meaningful.

Prepare the Data


With the framework in place, we can now go ahead with building the model. The first step in any Python script is, of course, importing the relevant modules.

# The Usual Python Modules 
import os
import re
import copy
import time
import struct
import io
import csv
from time import gmtime, strftime

import pickle, gzip, urllib.request, json
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

# AWS Python SDK modules
import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker import get_execution_role

Next, we can create a few placeholder variables that make our life simpler.

role = get_execution_role()
region = boto3.Session().region_name

bucket='sagemaker-dumps' # Put your s3 bucket name here
prefix = 'sagemaker/learn-mnist2' # Used as part of the path in the bucket where you store data
# customize to your bucket where you will store data
bucket_path = 'https://s3-{}.amazonaws.com/{}'.format(region,bucket)

Now that the groundwork is ready, we can download the data we use to build our models. We download it from deeplearning.net. Once we have the mnist.pkl.gz file, we extract it and split the data into our train, validation and test sets.

# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

Let us now have a look at the data we have gathered.

print(train_set[0].shape)
print(train_set[1].shape)

This gives us the output:

(50000, 784)
(50000,)

We have 50,000 records in the train set. Each of these is a 28x28 image, flattened into 784 pixel values. And each of these images maps to a label - the digit represented by the image.

plt.rcParams["figure.figsize"] = (2,10)

img = train_set[0][0]
label = train_set[1][0]
img_reshape = img.reshape((28,28))
imgplot = plt.imshow(img_reshape, cmap='gray')
print('This is a {}'.format(label))
plt.show()

This shows the output:

This is a 5

We have the data to train the model. But it is not yet in the shape our job needs. The built-in XGBoost algorithm expects CSV input with the label as the first column, so we need to play around a bit with the data structures to make it ready for training.

Having done that, we dump it into our S3 bucket.

data_partitions = [('train', train_set), ('validation', valid_set), ('test', test_set)]

for data_partition_name, data_partition in data_partitions:
    labels = [t.tolist() for t in data_partition[1]]
    features = [t.tolist() for t in data_partition[0]]

    # For train and validation, XGBoost expects the label as the first column
    if data_partition_name != 'test':
        examples = np.insert(features, 0, labels, axis=1)
    else:
        examples = features

    # Write the partition as CSV and upload it under the chosen prefix
    np.savetxt('data.csv', examples, delimiter=',')

    key = "{}/{}/examples".format(prefix, data_partition_name)
    boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_file('data.csv')
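We can quickly verify that all three partitions landed in the bucket. A short listing like this, using the same bucket and prefix variables, should show the three keys:

# List what we just uploaded under the prefix
s3_bucket = boto3.Session().resource('s3').Bucket(bucket)
for obj in s3_bucket.objects.filter(Prefix=prefix):
    print(obj.key, obj.size)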

Build the Model


Now that the data is ready, let us proceed with building and deploying the model that we can use for our job.

In the code above, we dumped our data into the S3 bucket. We can use a reference to that data to feed into our model. So let us start with creating the data channels that SageMaker will use for the purpose.

train_data = 's3://{}/{}/{}'.format(bucket, prefix, 'train')
validation_data = 's3://{}/{}/{}'.format(bucket, prefix, 'validation')
s3_output_location = 's3://{}/{}/{}'.format(bucket, prefix, 'learn_model_sdk')

train_channel = sagemaker.session.s3_input(train_data, content_type='text/csv')
valid_channel = sagemaker.session.s3_input(validation_data, content_type='text/csv')

data_channels = {'train': train_channel, 'validation': valid_channel}

Note that the "data_channels" holds reference to all the data we need for training our model.

AWS gives us a channel that connects SageMaker directly with S3. That makes our job pretty simple: the training data is streamed from S3, so we do not have to burden the instance RAM with it.

SageMaker also helps us with the common training algorithms. These come pre-built as Docker images hosted in ECR. SageMaker gives us a direct link to these images. All we need is an API call to get one for ourselves!

So, all we do is: get a container, instantiate a model, and train it.

container = get_image_uri(boto3.Session().region_name, 'xgboost')

xgb_model = sagemaker.estimator.Estimator(container,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.m4.10xlarge',
                                         train_volume_size = 5,
                                         output_path=s3_output_location,
                                         sagemaker_session=sagemaker.Session())

xgb_model.set_hyperparameters(max_depth = 5,
                              eta = .2,
                              gamma = 4,
                              min_child_weight = 6,
                              silent = 0,
                              objective = "multi:softmax",
                              num_class = 10,
                              num_round = 10)

xgb_model.fit(inputs=data_channels,  logs=True)

The final xgb_model.fit() call prints the training logs as the model is built over the boosting rounds. When it completes, the trained estimator is ready for us.

Deploy Model


Deploying the model is as easy as the rest of the process. All we do is invoke one command within the Notebook.

xgb_predictor = xgb_model.deploy(initial_instance_count=1,
                                instance_type='ml.m4.xlarge',
                                )

This deploys the model and creates an endpoint that can be invoked from our code.
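Before wiring the endpoint to anything else, we can smoke-test it right from the Notebook. Here is a minimal sketch, assuming the version 1 SageMaker SDK used above - we attach its CSV serializer and send one test image:

from sagemaker.predictor import csv_serializer

# The XGBoost endpoint expects CSV rows of 784 pixel values
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer

# Send the first test image and compare with its true label
result = xgb_predictor.predict(test_set[0][0])
print('Predicted: {}, Actual: {}'.format(result.decode('utf-8'), test_set[1][0]))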

We can navigate to the SageMaker - Models console. There, we can see our model ready for us.

We can also check the SageMaker - Endpoint console. Here, we can see the Endpoint waiting for us.
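The same check can be done programmatically. Something along these lines works with the SDK version used here, where the predictor exposes the endpoint name as xgb_predictor.endpoint:

# Ask SageMaker for the endpoint status; it should print 'InService'
sm = boto3.client('sagemaker')
desc = sm.describe_endpoint(EndpointName=xgb_predictor.endpoint)
print(desc['EndpointStatus'])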

Lambda Function


The rest is trivial. Once we have the model and endpoint deployed, we can create a Lambda function that can be invoked via API Gateway. The Lambda function could be as simple as this:

import os
import boto3
import json

# Grab environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    data = json.loads(json.dumps(event))
    payload = data['data']
    print(payload)

    # Invoke the SageMaker endpoint with the CSV payload
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)

    result = response['Body'].read().decode('ascii')

    return 'Predicted label is {}.'.format(result)

This is feasible in our case because the images we use are small - 784 pixel values. It is not healthy to send hefty images as payloads to Lambda functions. In such a case, we can create a small architecture where we dump the images into an S3 bucket - which triggers the Lambda function to decode them, invoke the endpoint, and then publish an SNS notification back to the invoking Amplify app.
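As a rough sketch of that variant - the SNS topic ARN environment variable, and the assumption that the uploaded object is already a CSV row of pixel values, are mine for illustration:

import os
import boto3

s3 = boto3.client('s3')
sns = boto3.client('sns')
runtime = boto3.client('runtime.sagemaker')

ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
TOPIC_ARN = os.environ['TOPIC_ARN']

def lambda_handler(event, context):
    # Triggered by an S3 put; the object is assumed to already be a CSV row of pixels
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    payload = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')

    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)
    result = response['Body'].read().decode('ascii')

    # Notify the front end (e.g. the Amplify app) via SNS
    sns.publish(TopicArn=TOPIC_ARN,
                Message='Predicted label for {} is {}'.format(key, result))
    return result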