Unit-3 Packaging ML Model

Packaging ML Model

Trainer: Ms. Nidhi Grover Raheja


Package an ML Model
• A package for an ML model refers to a structured bundle of
files and code that includes the trained machine learning
model, its dependencies, and necessary configurations.
• This package is designed to be easily distributable and
reusable across different environments or platforms.
• Packaging an ML model ensures that others can load the
model, process inputs, and make predictions without
needing to re-train the model from scratch.
Why Package an ML Model?
• Reusability: You can reuse the model across different
projects and environments.
• Consistency: Ensures that the same model and code are
used consistently across different platforms.
• Ease of Deployment: Makes it easier to deploy the model in
production, whether as an API or through Docker.
• Version Control: You can version different iterations of the
model and code.
Key Concepts of Packaging
• Packaging a machine learning (ML) model for production involves
wrapping the model and its dependencies into a reusable and
distributable form, allowing it to be deployed and used in various
environments.
• Below is a breakdown of the key concepts:
❑Model Serialization
❑Dependency Management
❑Project Structure
❑Configuration Files
❑Building the Package
❑Distribution and Installation
❑Prediction at Runtime
❑Deployment
1. Model Serialization: Train and save the model using joblib.
2. Dependency Management: Create a requirements.txt file listing
the dependencies.
3. Project Structure: Organize the code into separate directories for
model code, data, and configurations.
4. Configuration: Use YAML or JSON files to store environment-
specific settings.
5. Build the Package: Define the setup process using setup.py and
install the package.
6. Distribute and Install: Build the package using setuptools and install
it in any Python environment.
7. Prediction: Load the model at runtime and run inference.
8. Deployment: Optionally, package the entire project in Docker or
deploy it as a web service.
Step 1: Model Serialization
• After training your ML model, the first step is to serialize (or save) it
into a file so it can be loaded later for predictions without retraining.
• This allows you to "package" the model as a file.
• Serialization Tools are given below:
✓joblib and pickle: Commonly used for serializing ML models in
Python.
✓ONNX: An open standard format for exporting ML models across
different frameworks.
Example: Saving a trained model using joblib
# Step 1: Train and serialize the model
import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load dataset and train a model


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier().fit(X_train, y_train)

# Save the trained model as a .pkl file


joblib.dump(model, "iris_model.pkl")
print("Model saved!")

Outcome: The model is saved in iris_model.pkl, which can be loaded later for predictions.
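
The save-then-load round trip can also be sketched with pickle from the standard library, which the list above names alongside joblib. This is only an illustration under stated assumptions: the dictionary toy_model and the helpers predict_with, save_model, and load_model are hypothetical stand-ins for a trained estimator, used so the snippet runs without scikit-learn.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: a petal-length threshold rule.
toy_model = {"feature_index": 2, "threshold": 2.5, "labels": ["setosa", "other"]}


def predict_with(model, sample):
    """Apply the toy threshold rule to one sample (a list of feature values)."""
    value = sample[model["feature_index"]]
    return model["labels"][0] if value < model["threshold"] else model["labels"][1]


def save_model(model, path):
    """Serialize the model object to a file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Deserialize a model object from a file."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "toy_model.pkl"
    save_model(toy_model, model_file)
    restored = load_model(model_file)

# The restored object behaves like the original, with no retraining.
result = predict_with(restored, [5.1, 3.5, 1.4, 0.2])
print(result)
```

With joblib the corresponding calls are joblib.dump(model, path) and joblib.load(path). Only load pickle or joblib files from a trusted source, because unpickling can execute arbitrary code.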
Step 2: Dependency Management
• Every ML model relies on libraries like scikit-learn, numpy, or
pandas.
• Packaging requires managing these dependencies to ensure
the model works in any environment.
✓requirements.txt: This file lists the libraries required to
run the ML model.
✓environment.yml: If using Conda, this file captures the
Python version, dependencies, and system packages.
Example requirements.txt:

scikit-learn==1.0.2
joblib==1.1.0
numpy==1.21.0
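
One way to confirm that an environment matches the pinned versions is to compare them against the installed packages. The sketch below uses importlib.metadata from the standard library; the helpers parse_requirements and check_environment are hypothetical names introduced for illustration, and the parser only handles simple name==version lines.

```python
from importlib import metadata


def parse_requirements(text):
    """Return {package: pinned_version} for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_environment(pins):
    """Compare pinned versions with what is installed; return a list of problems."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (wanted {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}=={installed} installed, wanted {wanted}")
    return problems


requirements_text = """\
scikit-learn==1.0.2
joblib==1.1.0
numpy==1.21.0
"""

pins = parse_requirements(requirements_text)
for problem in check_environment(pins):
    print(problem)
```

An empty list from check_environment means every pinned package is installed at exactly the requested version.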
Step 3: Project Structure
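• Organizing your code, model, and resources is crucial for packaging.
• A well-structured project ensures code maintainability and makes it easy to build the package.
• A typical layout keeps setup.py and requirements.txt at the project root, the package code (such as train_model.py and predict.py) in its own directory, and the serialized model in a data directory.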
Step 4: Configuration Files
• Configuration files, typically in formats like .yaml, .json, or .ini, store environment-
specific settings (e.g., paths, thresholds, or model hyperparameters) so that the
package can be easily reconfigured in different environments.
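Example config.yaml:

model_path: "data/iris_model.pkl"
input_columns: ["sepal_length", "sepal_width", "petal_length", "petal_width"]
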
Usage: The config.yaml file defines where the model is stored and which features are used for
prediction
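
A minimal sketch of reading such settings at runtime is given here. It uses JSON, one of the formats listed above, because the json module ships with Python; reading YAML would need the third-party PyYAML package (yaml.safe_load). The keys mirror the config.yaml used later in this unit, while load_config and REQUIRED_KEYS are hypothetical names introduced for illustration.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("model_path", "input_columns")  # keys this sketch expects


def load_config(path):
    """Read a JSON config file and check that the expected keys are present."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    return config


# Write a small example config to a temporary file, then read it back.
example = {
    "model_path": "data/iris_model.pkl",
    "input_columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
}
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps(example, indent=2), encoding="utf-8")
    config = load_config(config_file)

print(config["model_path"])
print(len(config["input_columns"]))
```

Checking for required keys at load time surfaces a misconfigured environment immediately instead of at the first prediction.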
Step 5: Building the Package
• Once the model and scripts are in place, create a setup.py file to define how your
project can be installed as a Python package.
• The setup file specifies the package name, version, dependencies, and entry points
(if needed).
Example setup.py:
from setuptools import setup, find_packages

setup(
    name="ml_model_package",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "scikit-learn",
        "numpy",
        "joblib",
    ],
    entry_points={
        "console_scripts": [
            "predict=ml_model_package.predict:main",
        ]
    },
)
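
The console_scripts entry has the form command=package.module:function. Roughly speaking, the launcher that pip generates imports the module and calls the named function. The sketch below imitates that lookup; resolve_entry_point is a hypothetical helper, not part of setuptools, and it is demonstrated on a standard-library function so that it runs anywhere.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'command=package.module:function' and import the target function."""
    command, target = spec.split("=", 1)
    module_name, func_name = target.split(":", 1)
    module = importlib.import_module(module_name.strip())
    return command.strip(), getattr(module, func_name.strip())


# Demonstrated on a standard-library function; for the package above the
# spec would be "predict=ml_model_package.predict:main".
command, func = resolve_entry_point("dumps=json:dumps")
print(command, func({"ok": True}))
```

The function named after the colon must exist in the module, otherwise the installed command fails at startup with an import or attribute error.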
Step 6: Distribution and Installation

Once the setup file is ready, you can build and install the package.

1. Install Locally:
pip install .
This command installs the package in your Python environment.

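2. Build the Package: To create a distributable package, use:

python setup.py sdist bdist_wheel

This generates distribution archives (a .tar.gz source archive and a .whl wheel) in the dist/
directory. These can be uploaded to PyPI or shared directly. Recent setuptools releases
deprecate calling setup.py directly; python -m build (from the build package) produces the
same archives.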
Step 7: Prediction at Runtime
• The packaged model can now be used to make real-time predictions by loading the serialized model and running inference.
Example: predict.py (a version of this script is listed under Step 3 of the step-by-step process below)

We can now run predictions by executing:

python ml_model_package/predict.py
Example Dockerfile:
Build and run the container:
Step-by-Step Process to Build an ML Package:
Step 1: Create a Virtual Environment
• Before packaging, create a virtual environment to ensure that all
dependencies are isolated.
1. Open Command Prompt and create a virtual environment:

python -m venv ml_env

2. Activate the virtual environment (Windows Command Prompt):

ml_env\Scripts\activate

On Linux or macOS the equivalent command is: source ml_env/bin/activate
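
To confirm that the environment is actually active before installing anything, a short check can be run with the activated interpreter. This is a small sketch; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints False, packages installed with pip would go into the base Python installation rather than ml_env.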
Step 2: Create a Project Directory

1. Create a project directory for your package.


2. Inside this directory, create the following subdirectories.
Project Directory Structure
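
The layout can also be created with a short script. The sketch below is an assumption based on the file and directory names used elsewhere in this unit (ml_package as the root, ml_model_package for the code, data for the saved model); scaffold and LAYOUT are hypothetical names, and placing data/ at the project root is a guess that matches the relative path data/iris_model.pkl.

```python
from pathlib import Path
import tempfile

# Assumed layout, inferred from the file names used elsewhere in this unit.
LAYOUT = [
    "setup.py",
    "requirements.txt",
    "data/",  # the serialized model goes here
    "ml_model_package/__init__.py",
    "ml_model_package/train_model.py",
    "ml_model_package/predict.py",
    "ml_model_package/config.yaml",
]


def scaffold(root):
    """Create empty files and directories for the layout under root."""
    root = Path(root)
    for entry in LAYOUT:
        target = root / entry
        if entry.endswith("/"):
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch(exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# Built in a temporary directory here; pass a real path to create the project.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "ml_package")

for path in created:
    print(path)
```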
Step 3: Write the ML Model Code

• Train the Model: Create a script train_model.py in the ml_model_package
directory to train and save the ML model. On Windows, an empty file can be created
with the command echo.> train_model.py
• We'll use the Iris dataset for this example.

train_model.py →

# train_model.py
import os

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load dataset and hold out 20% of the samples for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Ensure the output directory exists (relative to the current working directory)
model_dir = "data"
os.makedirs(model_dir, exist_ok=True)

# Save the model
joblib.dump(model, os.path.join(model_dir, "iris_model.pkl"))
print("Model saved at 'data/iris_model.pkl'")
Prediction Code: Create a script predict.py to load the model and make predictions on new data.

Create predict.py using the command echo.> predict.py

predict.py →

# predict.py
import joblib

# Load the trained model (the path is relative to the current working directory)
model = joblib.load("data/iris_model.pkl")


def predict_iris():
    """Predict the class of one sample Iris measurement and print it."""
    # Sample input: sepal length, sepal width, petal length, petal width
    sample_input = [[5.1, 3.5, 1.4, 0.2]]

    # Make prediction
    prediction = model.predict(sample_input)
    print(f"Prediction: {prediction}")


if __name__ == "__main__":
    predict_iris()
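
Because predict.py opens data/iris_model.pkl relative to the current working directory, it only finds the model when started from the directory that contains data. One possible way around this is to anchor the path at a known base directory. The sketch below is an illustration, not part of the original script; resolve_model_path is a hypothetical helper.

```python
from pathlib import Path


def resolve_model_path(relative_path, base_dir=None):
    """Return an absolute path for relative_path, anchored at base_dir.

    When base_dir is omitted, the directory containing this file is used,
    falling back to the current working directory if __file__ is unavailable.
    """
    if base_dir is None:
        try:
            base_dir = Path(__file__).resolve().parent
        except NameError:  # e.g. in an interactive session
            base_dir = Path.cwd()
    return (Path(base_dir) / relative_path).resolve()


model_path = resolve_model_path("data/iris_model.pkl")
print(model_path)
```

In predict.py the result could be passed straight to joblib.load, assuming the data directory is kept next to the module.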
Step 4: Define the Package with setup.py

The setup.py file defines how your project will be packaged and installed.

Create the setup.py file in the root directory (ml_package):


setup.py

import sys
sys.path.append('.')

from setuptools import setup, find_packages

setup(
    name="ml_model_package",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "scikit-learn",
        "joblib",
        "numpy",
    ],
    entry_points={
        "console_scripts": [
            "predict=ml_model_package.predict:predict_iris",
        ]
    },
)
Step 5: Create __init__.py in ml_model_package

The __init__.py file is used to mark a directory as a Python package. It also allows you to
initialize or configure things when the package is imported. For your ml_model_package,
the __init__.py file can remain simple or can be used to import functions or classes to
make them accessible at the package level.
Example Contents for __init__.py

Suppose you want to make the predict_iris function from the predict.py module directly accessible
when someone imports the package. If you have multiple useful functions across different
modules (e.g., train_model.py and predict.py), you can import them in __init__.py to provide
access to everything at the top level of your package.

You can add an import statement like this in __init__.py:

from .predict import predict_iris

Now, if someone imports your package, they can access the function as ml_model_package.predict_iris.
Step 6: Create config.yaml file
Create a config.yaml file in ml_model_package.

Ensure the config.yaml file has the correct path to the model. Add the following to config.yaml:

# config.yaml

model_path: "data/iris_model.pkl"
input_columns: ["sepal_length", "sepal_width", "petal_length", "petal_width"]
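
The input_columns list can be used to check incoming data before it reaches the model. The following is a minimal sketch under the assumption that each sample arrives as a plain list of values in the same order as the columns; to_named_features is a hypothetical helper, and the column list is copied from the config above rather than read from the file.

```python
INPUT_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def to_named_features(values, columns=INPUT_COLUMNS):
    """Pair raw feature values with column names, rejecting wrong-sized input."""
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, (float(v) for v in values)))


features = to_named_features([5.1, 3.5, 1.4, 0.2])
print(features)
```

A sample with too few or too many values raises ValueError instead of producing a misleading prediction.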
Step 7: Install the Package Locally

Now we can install the package locally for testing.

Run the following command from the project root (the directory that contains setup.py) to build and install the package:

pip install .

This builds the package, installs it in your virtual environment, and installs the
dependencies listed in install_requires.
Step 8: Run the Prediction Script

Once the package is installed, you can use the predict command directly from
the terminal:

predict
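
The predict command above always scores the same hard-coded sample. A possible extension is to read the four measurements from the command line. The sketch below shows one way to do that with argparse; fake_predict is a hypothetical stand-in for model.predict so that the snippet runs without a trained model, and main is an assumed function name that the console_scripts entry would have to point to.

```python
import argparse


def fake_predict(samples):
    """Hypothetical stand-in for model.predict; returns a fixed class index."""
    return [0 for _ in samples]


def main(argv=None):
    """Parse four feature values from the command line and print a prediction."""
    parser = argparse.ArgumentParser(prog="predict")
    parser.add_argument(
        "features",
        nargs=4,
        type=float,
        help="sepal length, sepal width, petal length, petal width",
    )
    args = parser.parse_args(argv)
    prediction = fake_predict([args.features])
    print(f"Prediction: {prediction}")
    return 0  # a console script passes this return value to sys.exit


# Called with an explicit argument list here; the installed command would
# call main() with no arguments and read sys.argv instead.
exit_code = main(["5.1", "3.5", "1.4", "0.2"])
```

With such an entry point installed, the command would be invoked as predict 5.1 3.5 1.4 0.2.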
