Documentation Sagemaker
Transformation Job
In this tutorial, a solution for deploying Machine Learning models is implemented using the S3, Sagemaker and Lambda AWS
services. The model to be deployed is a K-means model saved in a pickle file, and the OS used was Linux.
The goal is to end up with a system in which inferences are made on a regular basis.
Pre-requisites
Scheme
To build a Docker image, you need to write the Dockerfile and use a standard container folder structure in order to push this
image to AWS ECR. The structure must be as shown below:
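A typical layout, following the scikit_bring_your_own sample referenced at the end of this document (the kmeans folder matches the Dockerfile below; the file names inside it are the standard ones from that sample and may differ in your project):

container/
├── Dockerfile
└── kmeans/
    ├── nginx.conf     # reverse proxy in front of the app server
    ├── serve          # program SageMaker runs to start the inference server
    ├── train          # optional, only needed if you also train inside SageMaker
    ├── wsgi.py        # wraps the Flask app for gunicorn
    └── predictor.py   # the Flask app with the /ping and /invocations routes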
This allows the container to know where to look for the installed programs.
In the Dockerfile, you can specify requirements and dependencies to be installed in the container (such as Python, NGINX, and
scikit-learn). Next, you need a line in the Dockerfile to copy the program folder to the container's WORKDIR, which is also defined
in the Dockerfile. See the Dockerfile below:
FROM ubuntu:16.04
MAINTAINER fealbuqu
# 1. Install the serving stack and the libraries used by predictor.py (exact versions may need pinning)
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip nginx && \
    pip3 install flask gunicorn gevent pandas scikit-learn
# 2. Define the folder (kmeans) where our inference code is located and set the working directory
COPY kmeans /opt/program
WORKDIR /opt/program
# Make the programs in /opt/program (such as the serve script) discoverable
ENV PATH="/opt/program:${PATH}"
The container then executes the programs included in the kmeans folder to start the server. The prediction code lives in
predictor.py, more specifically inside the transformation() function, as follows:
@app.route('/invocations', methods=['POST'])
def transformation():
    """Do an inference on a single batch of data. In this sample server, we take the data as JSON,
    convert it to a pandas DataFrame for internal use, and then return the predictions as JSON
    (one prediction per record).
    """
    # Get the input JSON data and convert it to a DataFrame
    input_json = flask.request.get_json()
    input_json = json.dumps(input_json)
    input_df = pd.read_json(input_json)
    predict_cross_selling = input_df  # the frame the preprocessing below operates on

    # Scaling: standardize every column except the identifier and date columns
    feature_cols = ~predict_cross_selling.columns.isin(['tracab_idusua', 'fecha'])
    scaler = StandardScaler()
    scaler.fit(predict_cross_selling.loc[:, feature_cols])
    cross_scale_predict = scaler.transform(predict_cross_selling.loc[:, feature_cols])
    cross_scale_predict = pd.DataFrame(cross_scale_predict)
    cross_scale_predict.columns = list(predict_cross_selling.loc[:, feature_cols].columns)

    # PCA: project the scaled features onto 10 principal components
    pca_predict = PCA(n_components=10)
    pca_predict.fit(cross_scale_predict)
    cross_scale_predict = pca_predict.transform(cross_scale_predict)
    cross_scale_predict = pd.DataFrame(cross_scale_predict)

    # Do the prediction
    predictions = ScoringService.predict(cross_scale_predict)

    # Return the cluster assignments as JSON (minimal completion of the excerpt)
    return flask.Response(response=json.dumps(predictions.tolist()),
                          status=200, mimetype='application/json')
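Before pushing the image, you can sanity-check the container locally. A minimal sketch, assuming the image is tagged kmeans-inference, the serve script in the kmeans folder is executable, and sample.json holds a few input records (all three names are placeholders):

docker build -t kmeans-inference .
docker run --rm -p 8080:8080 kmeans-inference serve
# in a second terminal, send a request to the invocations endpoint
curl -X POST -H "Content-Type: application/json" -d @sample.json http://localhost:8080/invocations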
The next step is to build and push the Docker image to an AWS ECR repository, so the first thing to do is create a repository.
Now let's build and push the image to the created repository. For this, configure the AWS CLI on your local machine so you can
interact with your account programmatically. Install the AWS CLI using pip install awscli and then set your AWS credentials with
aws configure in a terminal.
With everything set up, click on your repository, then View push commands, and follow the steps.
Note: the commands below must be run in a terminal inside the container folder.
1. Retrieve an authentication token and authenticate your Docker client to your registry.
2. Build your Docker image.
3. After the build completes, tag your image so you can push it to this repository.
4. Push the image to your newly created AWS repository (the commands are sketched below).
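The console fills these commands in with your own account ID, region and repository name; they typically look like the sketch below (123456789012, us-east-1 and kmeans-inference are placeholders, and older AWS CLI versions use aws ecr get-login instead of get-login-password):

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker build -t kmeans-inference .
docker tag kmeans-inference:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/kmeans-inference:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/kmeans-inference:latest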
Now, with the inference code and the image in an AWS ECR repository, we can create a Sagemaker model that will use this
image.
For this, we need to compress the model's pickle file into a gzipped tar archive. Use the following command to compress an entire
directory or a single file on Linux; it also compresses every other directory inside the directory you specify, in other words, it works
recursively.
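A sketch of the compression step, with an optional AWS CLI upload to S3 (model.pkl, model.tar.gz and the bucket/prefix are placeholders):

# compress the pickle file (directories passed to tar are archived recursively)
tar -czvf model.tar.gz model.pkl
# optional: upload the archive to S3 from the terminal
aws s3 cp model.tar.gz s3://your-bucket/models/crossSellKmeans/model.tar.gz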
The next step is to upload the .tar.gz file to S3; you can do that using the AWS CLI (as in the sketch above) or through the Console.
With the S3 path to the model.tar.gz and the ECR path to the image, you can create a Sagemaker model, as shown below.
Fig. 5: Create Sagemaker model
Fig. 6: Name it, set the permissions and the right paths then create
Now you should have a model set up; with this model you can create an Endpoint or a Batch Transformation Job for the
inferences.
In order to make the scheduled inferences, we can assume that a cron job regularly loads a .json file into a specific S3 path,
containing the samples to be predicted with the model created before.
We can then create a Lambda function, triggered when the input data is uploaded, that runs a script to get this input data from S3
and start a Batch Transformation Job in Sagemaker; the predicted values will be stored in a specified folder in S3.
So let's start creating a Lambda function.
Our code is written in Python, so select Python 3.8 as the runtime, name the function, and choose/create the IAM role with
permissions to access the files in S3 and to start a Batch Transformation Job in Sagemaker.
The inference code shown before receives a JSON as input; consequently, the function must be triggered when a .json object is
created in S3.
import json
import boto3
from datetime import datetime

# SageMaker client used to start the Batch Transform job
sm = boto3.client('sagemaker')

def lambda_handler(event, context):
    # The S3 event notification carries the bucket and key of the .json file that triggered the function
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    out_key = 'kmeans/'  # placeholder output prefix, adjust to your bucket layout

    stringNow = datetime.now().strftime("%d%m%Y-%H%M%S")
    data_path = "s3://{}/{}".format(bucket, key)
    output_path = "s3://{}/{}output/{}".format(bucket, out_key, stringNow)
    print(output_path)

    response = sm.create_transform_job(
        TransformJobName='crossSellKmeans-' + stringNow,
        ModelName='crossSellKmeans',
        MaxConcurrentTransforms=1,
        MaxPayloadInMB=100,
        BatchStrategy='MultiRecord',
        TransformInput={
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': data_path
                }
            },
            'SplitType': 'Line',
            'ContentType': 'application/json'
        },
        TransformOutput={
            'S3OutputPath': output_path,
            'Accept': 'application/json'
        },
        TransformResources={
            'InstanceType': 'ml.m4.xlarge',
            'InstanceCount': 1
        }
    )

    r = {
        'status': 200,
        'body': response
    }
    return r
Just save the Lambda function with the script above, setting the paths and desired keyword args (e.g. InstanceType) for the job.
Fig. 10: Save the changes made
After all this, we can upload a .json file to trigger our function and check in Sagemaker Batch Transformation Jobs whether the job
started and/or completed successfully.
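For example, the test upload can be done from the terminal (the bucket and key are placeholders and must match the prefix/suffix configured in the S3 trigger):

aws s3 cp samples.json s3://your-bucket/input/samples.json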
References
https://towardsdatascience.com/brewing-up-custom-ml-models-on-aws-sagemaker-e09b64627722
https://github.com/leongn/model_to_api
https://aws.amazon.com/pt/blogs/machine-learning/train-and-host-scikit-learn-models-in-amazon-sagemaker-by-building-a-scikit-docker-container/
https://github.com/aws-samples/serverless-ai-workshop/tree/master/Lab%202%20-%20SageMaker%20Batch%20Transform
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own/