Save and Load Machine Learning Models in Python with scikit-learn
Last Updated :
02 Aug, 2022
In this article, let's learn how to save and load a machine learning model in Python with scikit-learn.
Once we create a machine learning model, our job doesn't end there. We can save the trained model to disk and reuse it later without retraining. Either the pickle or the joblib library can be used for this purpose. In both, the dump() function serializes (saves) the model and the load() function reads the saved model back. The two libraries use the same function names, but their signatures differ slightly: pickle.dump() writes to an open binary file object, while joblib.dump() also accepts a filename and an optional compress argument. Now let's demonstrate how to do it.
Syntax of the dump() method:
pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
Parameters:
- obj: The Python object to be pickled (here, the trained model).
- file: An open file or buffer, in binary write mode, to which the pickled object is written.
- protocol: The pickle protocol version to use, as an integer. When it is None, pickle.DEFAULT_PROTOCOL is used.
- fix_imports: Only relevant when the protocol is lower than 3. If True (the default), pickle tries to map the new Python 3 names to the old module names used in Python 2, so that the pickled data can be read by Python 2. It is a keyword-only argument, so it must be passed by name.
Syntax of the load() method:
pickle.load(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)
The load() method reads the pickled representation of an object from the open file object file and returns the reconstructed object hierarchy stored in it.
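Before moving on to a real model, the dump() and load() calls described above can be sketched on a plain Python object. This is a minimal illustration, not part of the original examples: the dictionary params is a hypothetical stand-in for a trained model, and an in-memory io.BytesIO buffer is used in place of a file on disk.

```python
import io
import pickle

# A plain dictionary stands in for a trained model (illustrative values only).
params = {"coef": [0.25, 1.5], "intercept": 3.0}

# dump() writes the pickled bytes to any binary file-like object.
buffer = io.BytesIO()
pickle.dump(params, buffer, protocol=pickle.HIGHEST_PROTOCOL)

# Rewind the buffer, then load() rebuilds the object from those bytes.
buffer.seek(0)
restored = pickle.load(buffer)

# The restored object is equal to the original but is a separate copy.
is_equal = restored == params
is_same_object = restored is params
print(is_equal, is_same_object)
```

The same two calls work unchanged with a file opened in 'wb' or 'rb' mode, which is what the examples in this article do.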
Example 1: Saving and loading models using pickle
Pickle is Python's built-in module for serializing objects. Pickling encodes a trained model as a byte stream that can be saved to a file; loading that file later deserializes the model so it can make new predictions. The example below trains a linear regression model on the training split, then uses dump() to write the fitted model to a file opened in binary write mode. The model is read back with load(), and the test data is used to generate predictions from the loaded model. The root mean squared error (RMSE) metric evaluates those predictions.
Python3
# import packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import pickle
# import the dataset: every column except the last is a feature,
# and the last column is the target
dataset = pd.read_csv('headbrain1.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, -1].values
# train test split
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)
# create and fit a linear regression model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# save the fitted model; the with block closes the file afterwards
filename = 'linear_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(regressor, f)
# load the model back from disk
with open(filename, 'rb') as f:
    load_model = pickle.load(f)
# predict on the test set with the loaded model
y_pred = load_model.predict(X_test)
print('root mean squared error : ', np.sqrt(
    metrics.mean_squared_error(y_test, y_pred)))
Output:
root mean squared error : 72.11529287182815
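One way to confirm that a pickled model survives the round trip is to compare the original and the loaded model directly. The sketch below is an illustration under stated assumptions: it uses synthetic data generated with NumPy instead of headbrain1.csv, and it writes to a temporary directory so that no file is left behind.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption, standing in for headbrain1.csv):
# y is roughly 3x + 2 with a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)

# Save and reload inside a temporary directory; each with block
# closes its file handle as soon as the call finishes.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_model.sav")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should carry the same fitted parameters
# and produce the same predictions as the original.
same_coef = np.array_equal(model.coef_, restored.coef_)
same_pred = np.allclose(model.predict(X), restored.predict(X))
print(same_coef, same_pred)
```

A check like this is useful after changing the file format or the pickle protocol, because it compares the models themselves and does not rely on a single error metric.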
Example 2: Saving and loading models using joblib
Joblib is a separate library (installed as a dependency of scikit-learn) that offers tools for pipelining Python jobs. It provides utilities for efficiently saving and loading Python objects that contain large NumPy arrays. This is helpful for models that store the complete training dataset (k-nearest neighbors, for example) or have many parameters. Let's look at a simple example where we save and load a linear regression model. The steps are the same as with pickle; joblib.dump() and joblib.load() also accept a file path directly.
Python3
# import packages
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import joblib
# import the dataset: every column except the last is a feature,
# and the last column is the target
dataset = pd.read_csv('headbrain1.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, -1].values
# train test split
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)
# create and fit a linear regression model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# save the fitted model; joblib opens and closes the file itself
filename = 'linear_model_2.sav'
joblib.dump(regressor, filename)
# load the model back from disk
load_model = joblib.load(filename)
# predict on the test set with the loaded model
y_pred = load_model.predict(X_test)
print('root mean squared error : ', np.sqrt(
    metrics.mean_squared_error(y_test, y_pred)))
Output:
root mean squared error : 72.11529287182815
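joblib.dump() also takes an optional compress argument, which can reduce file size for models that hold large arrays. The sketch below is a hedged illustration, not part of the original example: it uses synthetic data, a temporary directory, and a hypothetical file name, and the compression level 3 is an arbitrary choice.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption, standing in for headbrain1.csv).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 4.0

model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name chosen for this sketch.
    path = os.path.join(tmp_dir, "linear_model_2.joblib")
    # compress accepts an integer from 0 (no compression) to 9
    # (smallest file, slowest to write).
    joblib.dump(model, path, compress=3)
    restored = joblib.load(path)

# The reloaded model should predict the same values as the original.
preds_match = np.allclose(model.predict(X), restored.predict(X))
print(preds_match)
```

For a small linear model the size difference is negligible; compression matters more for estimators that store large arrays.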