
Machine Learning Model Deployment Using AWS Sagemaker Batch Transformation Job

In this tutorial, a solution for deploying Machine Learning models is implemented using the S3, Sagemaker, and Lambda AWS services. The model to be deployed is a K-means model saved in a pickle file, and the OS used was Linux.

The goal is to end up with a system in which inferences are made on a regular basis.

Pre-requisites

AWS Credentials and IAM Roles with the right permissions
Docker and Python installed on your local machine

Architecture

Fig. 1: Solution architecture

1. Building and pushing the Docker Image

To build a Docker image, you need to write a Dockerfile and use a standard container folder structure so the image can be pushed to AWS ECR. The structure must be as shown below:

Fig. 2: Container folder structure

This allows the container to know where to look for the installed programs.
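
Fig. 2 is not reproduced in text here, but in the scikit_bring_your_own sample that this setup follows (see the references), the container folder typically looks roughly like the sketch below; the kmeans folder takes the place of the sample's model folder, and requirements.txt is implied by the pip install step in the Dockerfile:

container/
    Dockerfile
    kmeans/
        nginx.conf          # NGINX configuration for the inference server
        predictor.py        # Flask app exposing the /invocations (and usually /ping) routes
        serve               # launcher script that starts the server processes
        wsgi.py             # WSGI entry point that imports the Flask app
        requirements.txt    # Python packages installed by the Dockerfile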

In the Dockerfile, you can specify the requirements and dependencies to be installed in the container (such as Python, NGINX, and Scikit-learn). Next, you need a line in the Dockerfile to copy the program folder to the container's WORKDIR, which is also defined in the Dockerfile. See the Dockerfile below:

FROM ubuntu:16.04

MAINTAINER fealbuqu

# 1. Define the packages required in our environment.
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    python \
    python3 \
    nginx \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# 2. Define the folder (kmeans) where our inference code is located and set the working directory.
COPY kmeans /opt/program
WORKDIR /opt/program

# 3. Here we define all python packages we want to include in our environment.
RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
    pip install -r requirements.txt && \
    rm -rf /root/.cache

# 4. Set some environment variables.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

The container then runs the programs included in the kmeans folder to start the inference server. The prediction code lives in the file predictor.py, more specifically inside the transformation() function, as follows:

# Note: `app` (the Flask application) and ScoringService are defined earlier in predictor.py.
import json

import flask
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

@app.route('/invocations', methods=['POST'])
def transformation():
    """Do an inference on a single batch of data. In this sample server, we take the
    data as JSON, convert it to a pandas data frame for internal use, and then return
    the predictions as JSON (one predicted label per input record)."""
    # Get the input JSON data and convert it to a DataFrame
    input_json = flask.request.get_json()
    input_json = json.dumps(input_json)
    input_df = pd.read_json(input_json)

    print('Invoked with {} records'.format(input_df.shape[0]))

    # Transforming the data: keep only the rows belonging to the most recent date
    input_df.fecha = pd.to_datetime(input_df.fecha)
    input_df.fecha = input_df.fecha.dt.date
    fecha_predict = input_df.fecha.max()
    predict_cross_selling = input_df[input_df.fecha == fecha_predict]

    # Scaling (all columns except the id and date columns)
    feature_cols = ~predict_cross_selling.columns.isin(['tracab_idusua', 'fecha'])
    scaler = StandardScaler()
    scaler.fit(predict_cross_selling.loc[:, feature_cols])
    cross_scale_predict = scaler.transform(predict_cross_selling.loc[:, feature_cols])
    cross_scale_predict = pd.DataFrame(cross_scale_predict)
    cross_scale_predict.columns = list(predict_cross_selling.loc[:, feature_cols].columns)

    # PCA: reduce the scaled features to 10 components
    pca_predict = PCA(n_components=10)
    pca_predict.fit(cross_scale_predict)
    cross_scale_predict = pca_predict.transform(cross_scale_predict)
    cross_scale_predict = pd.DataFrame(cross_scale_predict)

    # Do the prediction
    predictions = ScoringService.predict(cross_scale_predict)

    # Transform the predictions to JSON
    result = {'output': []}
    list_out = []
    for label in predictions:
        row_format = {'label': '{}'.format(label)}
        list_out.append(row_format)
    result['output'] = list_out
    result = json.dumps(result)
    return flask.Response(response=result, status=200, mimetype='application/json')
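
The transformation() function calls ScoringService.predict, which is defined elsewhere in predictor.py and not shown in this tutorial. A minimal sketch of what it might look like, following the scikit_bring_your_own pattern, is below; the pickle file name kmeans_model.pkl is an assumption:

import os
import pickle

# Sagemaker extracts model.tar.gz into /opt/ml/model inside the container
MODEL_PATH = '/opt/ml/model'

class ScoringService(object):
    model = None  # cached K-means model

    @classmethod
    def get_model(cls):
        """Load the pickled K-means model once and cache it for later invocations."""
        if cls.model is None:
            with open(os.path.join(MODEL_PATH, 'kmeans_model.pkl'), 'rb') as f:
                cls.model = pickle.load(f)
        return cls.model

    @classmethod
    def predict(cls, input_df):
        """Return the predicted cluster label for each row of the input DataFrame."""
        return cls.get_model().predict(input_df)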

The next step is to build and push the Docker image to an AWS ECR repository. So the first thing is to create a repository.

Fig. 3: Create repository

Now let's build and push the image to the created repository. For this, configure the AWS CLI on your local machine so you can interact with your account programmatically. Install the AWS CLI using pip install awscli and then set the AWS credentials with aws configure in the terminal.

With everything set up, just click on your repository, then View push commands, and follow the steps.

Fig. 4: Steps to build and push image

Note: the commands below must be run in a terminal inside the container folder.
1. Retrieve an authentication token and authenticate your Docker client to your registry:

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <user>.dkr.ecr.<region>.amazonaws.com/<repository>

2. Build your Docker image:

docker build -t <image> .

3. After the build completes, tag your image so you can push the image to this repository:

docker tag <image>:latest <user>.dkr.ecr.<region>.amazonaws.com/<repository>:latest

4. Run the following command to push this image to your newly created AWS repository:

docker push <user>.dkr.ecr.<region>.amazonaws.com/<repository>:latest

At this point the image has been pushed to the repository.

2. Creating AWS Sagemaker's model

Now, with the inference code and the image in an AWS ECR repository, we can create a Sagemaker model that will use this image.

For this, we need to compress the model's pickle file with gzip. Use the following command to compress an entire directory or a single file on Linux. It will also compress every other directory inside the directory you specify; in other words, it works recursively.

tar -czvf name-of-archive.tar.gz /path/to/directory-or-file

Here’s what those switches actually mean:

-c: Create an archive
-z: Compress the archive with gzip
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The v is always optional in these commands, but it’s helpful.
-f: Allows you to specify the filename of the archive.
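
If you prefer to create the archive from Python instead of the shell, a minimal sketch is shown below; the pickle file name kmeans_model.pkl is a placeholder for your own model file:

import tarfile

# Create model.tar.gz containing the pickled K-means model
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('kmeans_model.pkl', arcname='kmeans_model.pkl')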

The next step is to upload the .tar.gz file to S3; you can do that using the AWS CLI or through the Console. With the S3 path to the model.tar.gz and the ECR path to the image, you can create a Sagemaker model, as shown below.
Fig. 5: Create Sagemaker model

Fig. 6: Name it, set the permissions and the right paths then create
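
Figs. 5 and 6 show the Console flow; the same two steps (uploading the archive and creating the model) can also be done with boto3. The sketch below uses placeholder values for the bucket name, object key, image URI, and role ARN, so replace them with your own:

import boto3

bucket = 'my-bucket'                      # assumption: your S3 bucket
model_key = 'models/model.tar.gz'         # assumption: where the archive will live
image_uri = '<user>.dkr.ecr.<region>.amazonaws.com/<repository>:latest'
role_arn = 'arn:aws:iam::<account-id>:role/<sagemaker-execution-role>'

# 1. Upload the compressed model to S3
boto3.client('s3').upload_file('model.tar.gz', bucket, model_key)

# 2. Create the Sagemaker model pointing at the ECR image and the model artifact
sm = boto3.client('sagemaker')
sm.create_model(
    ModelName='crossSellKmeans',
    PrimaryContainer={
        'Image': image_uri,
        'ModelDataUrl': 's3://{}/{}'.format(bucket, model_key)
    },
    ExecutionRoleArn=role_arn
)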

Now you should have a model set up. With this model you can create an Endpoint or a Batch Transformation Job for the inferences.

3. Creating Lambda function to start a Batch Transformation Job

To make the scheduled inferences, assume that a cron job regularly uploads a .json file to a specific S3 path, containing the samples to be predicted with the model created before.

We can then create a Lambda function, triggered when the input data is uploaded, that runs a script to get this input data from S3 and start a Batch Transformation Job in Sagemaker, whose predicted values will be stored in a specified folder in S3. So let's start by creating the Lambda function.

Fig. 7: Create lambda function

Our code is written in Python, so select Python 3.8 as the runtime, name the function, and choose/create an IAM role with permissions to access the files in S3 and to start a Batch Transformation Job in Sagemaker.

Fig. 8: Lambda's function configuration

The inference code shown before receives JSON as input; consequently, the function must be triggered when a .json object is created in S3.

Fig. 9: Add the trigger
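
Fig. 9 shows the trigger being added in the Console. If you prefer to configure it programmatically, a sketch with boto3 is below; the bucket name, Lambda ARN, and the input/ prefix are assumptions, and it presumes S3 has already been granted permission to invoke the function:

import boto3

bucket = 'my-bucket'  # assumption: the bucket where the cron job drops the .json files
lambda_arn = 'arn:aws:lambda:<region>:<account-id>:function:<function-name>'

s3 = boto3.client('s3')
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': lambda_arn,
            'Events': ['s3:ObjectCreated:*'],
            # Only fire for .json objects under the input/ prefix
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'input/'},
                        {'Name': 'suffix', 'Value': '.json'}
                    ]
                }
            }
        }]
    }
)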


To start a Batch Transformation Job we can use the Python module boto3: we instantiate a Sagemaker client and call it with the path of the uploaded file, a specified output path, and other keyword arguments to start the transform job. See the code below:

import json
from datetime import datetime

import boto3


def find_indices(lst, condition):
    """Return the indices of the elements of lst that satisfy condition."""
    return [i for i, elem in enumerate(lst) if condition(elem)]


def lambda_handler(event, context):

    for record in event['Records']:
        # Bucket and key of the .json object that triggered the function
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        # Everything in the key before the 'input' folder is kept as the output prefix
        tmp = key.split('/')
        out_key = '/'.join(tmp[:find_indices(tmp, lambda e: e == 'input')[0]])

        sm = boto3.client('sagemaker')

        stringNow = datetime.now().strftime("%d%m%Y-%H%M%S")
        data_path = "s3://{}/{}".format(bucket, key)
        output_path = "s3://{}/{}output/{}".format(bucket, out_key, stringNow)
        print(output_path)

        response = sm.create_transform_job(
            TransformJobName='crossSellKmeans-' + stringNow,
            ModelName='crossSellKmeans',
            MaxConcurrentTransforms=1,
            MaxPayloadInMB=100,
            BatchStrategy='MultiRecord',
            TransformInput={
                'DataSource': {
                    'S3DataSource': {
                        'S3DataType': 'S3Prefix',
                        'S3Uri': data_path
                    }
                },
                'SplitType': 'Line',
                'ContentType': 'application/json'
            },
            TransformOutput={
                'S3OutputPath': output_path,
                'Accept': 'application/json'
            },
            TransformResources={
                'InstanceType': 'ml.m4.xlarge',
                'InstanceCount': 1
            }
        )

    r = {
        'status': 200,
        'body': response
    }
    return r

Just save the Lambda function with the script above, setting the paths and desired keyword args (e.g. InstanceType) for the job.
Fig. 10: Save the changes made

After all this, we can upload a .json file to trigger our function and check in Sagemaker's Batch Transformation Jobs whether the job started and/or completed successfully.

Fig. 11: Batch transformation jobs
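
Besides the Console view in Fig. 11, you can also poll the job status with boto3; a small sketch, where the job name is the timestamped name produced by the Lambda function:

import boto3

sm = boto3.client('sagemaker')
job_name = 'crossSellKmeans-<ddmmyyyy-hhmmss>'  # assumption: the name printed by the Lambda

desc = sm.describe_transform_job(TransformJobName=job_name)
print(desc['TransformJobStatus'])               # InProgress, Completed, Failed, Stopping or Stopped
print(desc.get('FailureReason', 'no failure reason reported'))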

References

https://towardsdatascience.com/brewing-up-custom-ml-models-on-aws-sagemaker-e09b64627722
https://github.com/leongn/model_to_api
https://aws.amazon.com/pt/blogs/machine-learning/train-and-host-scikit-learn-models-in-amazon-sagemaker-by-building-a-scikit-docker-container/
https://github.com/aws-samples/serverless-ai-workshop/tree/master/Lab%202%20-%20SageMaker%20Batch%20Transform
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own/
