
Machine Learning Model Deployment Using AWS Sagemaker Batch Transformation Job

In this tutorial, a solution for deploying Machine Learning models is implemented using the S3, Sagemaker, and Lambda AWS services. The model to be deployed is a K-means model saved in a pickle file, and the OS used was Linux.

The goal is to end up with a system in which inferences are made on a regular basis.

Pre-requisites

AWS Credentials and IAM Roles with the right permissions
Docker and Python installed on your local machine

Architecture

Fig. 1: Solution architecture

1. Building and pushing the Docker Image

To build a Docker image, you need to write a Dockerfile and use a standard container folder structure so the image can be pushed to AWS ECR. The structure must be as shown below:

Fig. 2: Container folder structure

This allows the container to know where to look for the installed programs.
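
Fig. 2 is not reproduced in text here, but in the scikit_bring_your_own sample that this setup follows (see the references), the container folder typically looks roughly like the sketch below; the kmeans folder takes the place of the sample's model folder, and requirements.txt is implied by the pip install step in the Dockerfile:

container/
    Dockerfile
    kmeans/
        nginx.conf          # NGINX configuration for the inference server
        predictor.py        # Flask app exposing the /invocations (and usually /ping) routes
        serve               # launcher script that starts the server processes
        wsgi.py             # WSGI entry point that imports the Flask app
        requirements.txt    # Python packages installed by the Dockerfile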

In the Dockerfile, you can specify the requirements and dependencies to be installed in the container (such as Python, NGINX, and Scikit-learn). Next, you need a line in the Dockerfile to copy the program folder to the container's WORKDIR, which is also defined in the Dockerfile. See the Dockerfile below:

FROM ubuntu:16.04

MAINTAINER fealbuqu

# 1. Define the packages required in our environment.
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    python \
    python3 \
    nginx \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# 2. Define the folder (kmeans) where our inference code is located and set the working directory.
COPY kmeans /opt/program
WORKDIR /opt/program

# 3. Here we define all python packages we want to include in our environment.
RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
    pip install -r requirements.txt && \
    rm -rf /root/.cache

# 4. Set some environment variables.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

The container then runs the programs included in the kmeans folder to start the inference server. The prediction code lives in the file predictor.py, more specifically inside the transformation() function, as follows:

# Note: `app` (the Flask application) and ScoringService are defined earlier in predictor.py.
import json

import flask
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

@app.route('/invocations', methods=['POST'])
def transformation():
    """Do an inference on a single batch of data. In this sample server, we take the
    data as JSON, convert it to a pandas data frame for internal use, and then return
    the predictions as JSON (one predicted label per input record)."""
    # Get the input JSON data and convert it to a DataFrame
    input_json = flask.request.get_json()
    input_json = json.dumps(input_json)
    input_df = pd.read_json(input_json)

    print('Invoked with {} records'.format(input_df.shape[0]))

    # Transforming the data: keep only the rows belonging to the most recent date
    input_df.fecha = pd.to_datetime(input_df.fecha)
    input_df.fecha = input_df.fecha.dt.date
    fecha_predict = input_df.fecha.max()
    predict_cross_selling = input_df[input_df.fecha == fecha_predict]

    # Scaling (all columns except the id and date columns)
    feature_cols = ~predict_cross_selling.columns.isin(['tracab_idusua', 'fecha'])
    scaler = StandardScaler()
    scaler.fit(predict_cross_selling.loc[:, feature_cols])
    cross_scale_predict = scaler.transform(predict_cross_selling.loc[:, feature_cols])
    cross_scale_predict = pd.DataFrame(cross_scale_predict)
    cross_scale_predict.columns = list(predict_cross_selling.loc[:, feature_cols].columns)

    # PCA: reduce the scaled features to 10 components
    pca_predict = PCA(n_components=10)
    pca_predict.fit(cross_scale_predict)
    cross_scale_predict = pca_predict.transform(cross_scale_predict)
    cross_scale_predict = pd.DataFrame(cross_scale_predict)

    # Do the prediction
    predictions = ScoringService.predict(cross_scale_predict)

    # Transform the predictions to JSON
    result = {'output': []}
    list_out = []
    for label in predictions:
        row_format = {'label': '{}'.format(label)}
        list_out.append(row_format)
    result['output'] = list_out
    result = json.dumps(result)
    return flask.Response(response=result, status=200, mimetype='application/json')
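
The transformation() function calls ScoringService.predict, which is defined elsewhere in predictor.py and not shown in this tutorial. A minimal sketch of what it might look like, following the scikit_bring_your_own pattern, is below; the pickle file name kmeans_model.pkl is an assumption:

import os
import pickle

# Sagemaker extracts model.tar.gz into /opt/ml/model inside the container
MODEL_PATH = '/opt/ml/model'

class ScoringService(object):
    model = None  # cached K-means model

    @classmethod
    def get_model(cls):
        """Load the pickled K-means model once and cache it for later invocations."""
        if cls.model is None:
            with open(os.path.join(MODEL_PATH, 'kmeans_model.pkl'), 'rb') as f:
                cls.model = pickle.load(f)
        return cls.model

    @classmethod
    def predict(cls, input_df):
        """Return the predicted cluster label for each row of the input DataFrame."""
        return cls.get_model().predict(input_df)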

The next step is to build and push the Docker image to an AWS ECR repository. So the first thing is to create a repository.

Fig. 3: Create repository

Now let's build and push the image to the created repository. For this, configure the AWS CLI on your local machine so you can interact with your account programmatically. Install the AWS CLI using pip install awscli and then set the AWS credentials with aws configure in the terminal.

With everything set up, just click on your repository, then View push commands, and follow the steps.

Fig. 4: Steps to build and push image

Note: the commands below must be run in a terminal inside the container folder.
1. Retrieve an authentication token and authenticate your Docker client to your registry:

aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <user>.dkr.ecr.<region>.amazonaws.com/<repository>

2. Build your Docker image:

docker build -t <image> .

3. After the build completes, tag your image so you can push the image to this repository:

docker tag <image>:latest <user>.dkr.ecr.<region>.amazonaws.com/<repository>:latest

4. Run the following command to push this image to your newly created AWS repository:

docker push <user>.dkr.ecr.<region>.amazonaws.com/<repository>:latest

At this point the image has been pushed to the repository.

2. Creating AWS Sagemaker's model

Now, with the inference code and the image in an AWS ECR repository, we can create a Sagemaker model that will use this image.

For this, we need to compress the model's pickle file with gzip. Use the following command to compress an entire directory or a single file on Linux. It will also compress every other directory inside the directory you specify; in other words, it works recursively.

tar -czvf name-of-archive.tar.gz /path/to/directory-or-file

Here’s what those switches actually mean:

-c: Create an archive
-z: Compress the archive with gzip
-v: Display progress in the terminal while creating the archive, also known as “verbose” mode. The v is always optional in these commands, but it’s helpful.
-f: Allows you to specify the filename of the archive.
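
If you prefer to create the archive from Python instead of the shell, a minimal sketch is shown below; the pickle file name kmeans_model.pkl is a placeholder for your own model file:

import tarfile

# Create model.tar.gz containing the pickled K-means model
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('kmeans_model.pkl', arcname='kmeans_model.pkl')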

The next step is to upload the .tar.gz file to S3; you can do that using the AWS CLI or through the Console. With the S3 path to the model.tar.gz and the ECR path to the image, you can create a Sagemaker model, as shown below.
Fig. 5: Create Sagemaker model

Fig. 6: Name it, set the permissions and the right paths then create
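
Figs. 5 and 6 show the Console flow; the same two steps (uploading the archive and creating the model) can also be done with boto3. The sketch below uses placeholder values for the bucket name, object key, image URI, and role ARN, so replace them with your own:

import boto3

bucket = 'my-bucket'                      # assumption: your S3 bucket
model_key = 'models/model.tar.gz'         # assumption: where the archive will live
image_uri = '<user>.dkr.ecr.<region>.amazonaws.com/<repository>:latest'
role_arn = 'arn:aws:iam::<account-id>:role/<sagemaker-execution-role>'

# 1. Upload the compressed model to S3
boto3.client('s3').upload_file('model.tar.gz', bucket, model_key)

# 2. Create the Sagemaker model pointing at the ECR image and the model artifact
sm = boto3.client('sagemaker')
sm.create_model(
    ModelName='crossSellKmeans',
    PrimaryContainer={
        'Image': image_uri,
        'ModelDataUrl': 's3://{}/{}'.format(bucket, model_key)
    },
    ExecutionRoleArn=role_arn
)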

Now you should have a model set up. With this model you can create an Endpoint or a Batch Transformation Job for the inferences.

3. Creating Lambda function to start a Batch Transformation Job

To make the scheduled inferences, assume that a cron job regularly uploads a .json file to a specific S3 path, containing the samples to be predicted with the model created before.

We can then create a Lambda function, triggered when the input data is uploaded, that runs a script to get this input data from S3 and start a Batch Transformation Job in Sagemaker, whose predicted values will be stored in a specified folder in S3. So let's start by creating the Lambda function.

Fig. 7: Create lambda function

Our code is written in Python, so select Python 3.8 as the runtime, name the function, and choose/create an IAM role with permissions to access the files in S3 and to start a Batch Transformation Job in Sagemaker.

Fig. 8: Lambda's function configuration

The inference code shown before receives JSON as input; consequently, the function must be triggered when a .json object is created in S3.

Fig. 9: Add the trigger
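
Fig. 9 shows the trigger being added in the Console. If you prefer to configure it programmatically, a sketch with boto3 is below; the bucket name, Lambda ARN, and the input/ prefix are assumptions, and it presumes S3 has already been granted permission to invoke the function:

import boto3

bucket = 'my-bucket'  # assumption: the bucket where the cron job drops the .json files
lambda_arn = 'arn:aws:lambda:<region>:<account-id>:function:<function-name>'

s3 = boto3.client('s3')
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': lambda_arn,
            'Events': ['s3:ObjectCreated:*'],
            # Only fire for .json objects under the input/ prefix
            'Filter': {
                'Key': {
                    'FilterRules': [
                        {'Name': 'prefix', 'Value': 'input/'},
                        {'Name': 'suffix', 'Value': '.json'}
                    ]
                }
            }
        }]
    }
)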


To start a Batch Transformation Job we can use the Python module boto3: we instantiate a Sagemaker client and call it with the path of the uploaded file, a specified output path, and other keyword arguments to start the transform job. See the code below:

import json
from datetime import datetime

import boto3


def find_indices(lst, condition):
    """Return the indices of the elements of lst that satisfy condition."""
    return [i for i, elem in enumerate(lst) if condition(elem)]


def lambda_handler(event, context):

    for record in event['Records']:
        # Bucket and key of the .json object that triggered the function
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        # Everything in the key before the 'input' folder is kept as the output prefix
        tmp = key.split('/')
        out_key = '/'.join(tmp[:find_indices(tmp, lambda e: e == 'input')[0]])

        sm = boto3.client('sagemaker')

        stringNow = datetime.now().strftime("%d%m%Y-%H%M%S")
        data_path = "s3://{}/{}".format(bucket, key)
        output_path = "s3://{}/{}output/{}".format(bucket, out_key, stringNow)
        print(output_path)

        response = sm.create_transform_job(
            TransformJobName='crossSellKmeans-' + stringNow,
            ModelName='crossSellKmeans',
            MaxConcurrentTransforms=1,
            MaxPayloadInMB=100,
            BatchStrategy='MultiRecord',
            TransformInput={
                'DataSource': {
                    'S3DataSource': {
                        'S3DataType': 'S3Prefix',
                        'S3Uri': data_path
                    }
                },
                'SplitType': 'Line',
                'ContentType': 'application/json'
            },
            TransformOutput={
                'S3OutputPath': output_path,
                'Accept': 'application/json'
            },
            TransformResources={
                'InstanceType': 'ml.m4.xlarge',
                'InstanceCount': 1
            }
        )

    r = {
        'status': 200,
        'body': response
    }
    return r

Just save the Lambda function with the script above, setting the paths and desired keyword args (e.g. InstanceType) for the job.
Fig. 10: Save the changes made

After all this, we can upload a .json file to trigger our function and check in Sagemaker's Batch Transformation Jobs whether the job started and/or completed successfully.

Fig. 11: Batch transformation jobs
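
Besides the Console view in Fig. 11, you can also poll the job status with boto3; a small sketch, where the job name is the timestamped name produced by the Lambda function:

import boto3

sm = boto3.client('sagemaker')
job_name = 'crossSellKmeans-<ddmmyyyy-hhmmss>'  # assumption: the name printed by the Lambda

desc = sm.describe_transform_job(TransformJobName=job_name)
print(desc['TransformJobStatus'])               # InProgress, Completed, Failed, Stopping or Stopped
print(desc.get('FailureReason', 'no failure reason reported'))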

References

https://towardsdatascience.com/brewing-up-custom-ml-models-on-aws-sagemaker-e09b64627722
https://github.com/leongn/model_to_api
https://aws.amazon.com/pt/blogs/machine-learning/train-and-host-scikit-learn-models-in-amazon-sagemaker-by-building-a-scikit-docker-container/
https://github.com/aws-samples/serverless-ai-workshop/tree/master/Lab%202%20-%20SageMaker%20Batch%20Transform
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own/
