
Predicting Customer Class using Customer Lifetime Value with Random Forest Algorithm

Than Than Win (1), Khin Sundee Bo (2)

University of Information Technology, Myanmar
Email: [email protected] (1), [email protected] (2)

Abstract

As online retailers boom in the e-commerce industry of the Internet age, the need to maintain a competitive advantage has drawn attention to customer relationship management (CRM). Building a successful CRM strategy requires knowing each individual customer's class, which can be derived from Customer Lifetime Value (CLV): the monetary value of a customer's purchases from the business during their lifetime. CLV modelling allows us to identify a customer's predicted business value and helps retailers allocate resources effectively. This predictive model is built on the global Superstore retail dataset with almost ten thousand transactions. Our model predicts each customer's class for the next year based on CLV, which helps the online retailer decide which customers to invest in for long-term CRM. The Random Forest (RF) algorithm is used to train the model, and Random Search tuning is conducted to achieve the best predictive accuracy. An experimental analysis compares the model with the AdaBoost algorithm on the same dataset.

Key Words - Customer Lifetime Value, Random Forests, AdaBoost

1. Introduction

In today's Internet age, technology is well developed and people prefer shopping online to visiting physical stores: shopping online is very convenient, and shoppers can compare prices by browsing dozens of different websites to find the best deal. This change has driven a large increase in the number of retailers in the online retail industry. Since acquiring a new customer is more expensive than retaining an existing one, online retailers should focus on their existing customers. But there will be some customers whose costs of marketing, selling, and servicing exceed the profit they bring in. Therefore, online retailers should focus only on their highest business value customers to maintain long-term CRM. The business value of a customer is often expressed with CLV, which represents the total amount of money a customer is expected to spend with the business during their lifetime. CLV helps solve many problems, such as decisions related to segmenting, addressing, retaining, and acquiring customers, or issues concerning a company's long-term value [7].

Most researchers have treated CLV prediction as a regression task, but predicting categories of CLV (customer classes) as a classification problem is more informative for decision making and helps the retailer allocate marketing spend effectively in its business strategy. The CLV of an individual customer can be calculated from purchase behaviour captured in historical transaction data [1]. Our prediction model is built on a retail transaction dataset and uses the RF classification algorithm to forecast the business class of customers for the next year. The dataset used to train the model consists of four years (2011 to 2014) of transaction data from www.kaggle.com.

To build this predictive model, the dataset is first preprocessed: the data are cleaned, intervals are set (feature and target periods for training and testing), the data are aggregated, the target variable is discretized, and predictor variables are extracted. The target variable is obtained by discretizing CLV. The feature- and target-period datasets are then merged for training and testing respectively. After that, feature selection is performed on the training set to choose the features that contribute most to the predictand variable. Initially, the model is trained with the RF classifier's default hyperparameters; the training target and features are used to learn the parameters of the model. The parameters from the training period are then applied to the testing features to predict the target variable for the testing period. Afterwards, Random Search cross-validation is conducted to find the optimal hyperparameter values of the Random Forest and achieve the best accuracy. Finally, the performance of our model is evaluated against another ensemble method, the AdaBoost algorithm, on the test dataset.


We have organised the rest of this paper as follows: Section 2 reviews work related to CLV prediction. Section 3 describes the dataset, the data pre-processing, and the methods used in this paper: the feature selection method, the RF algorithm, and Random Search for tuning the RF hyperparameters. The performance measures and results of our model are presented in Section 4. A discussion of our model can be found in Section 5.

2. Related work

(Bernat, 2019) compared the predictive power of three different models, Pareto/NBD, Cox proportional hazards, and gradient tree boosting, for predicting CLV for individual customers [1]. The models are trained to predict customer spend in the next year from historical transactions, using other covariates in addition to RFM variables. Among them, the extended Pareto/NBD model performed best.

(Jasek P, 2018) discussed the predictive abilities of CLV models by comparing the performance of different predictive models, the extended Pareto/NBD model, a Markov chain model, and a status quo model, on six datasets [7]. In that paper, the performance of the models was evaluated over both long and short periods. The EP/NBD model outperformed the other models on a majority of evaluation metrics.

(Chamberlain, 2017) developed a CLV prediction system for a UK-based global e-commerce company that uses rich features over the past three years of data to predict customers' net spend in the next year, modelled with the RF regression algorithm [2]. It addressed two problems, CLV and churn prediction, and the results were evaluated. They then used feature learning to improve the model, experimenting with a hybrid model that combines logistic regression with a deep neural network.

(Nicolas Glady, 2008) defined a churner as someone whose CLV is decreasing [9]. The contribution of that paper is a new loss function wherein the loss incurred by the CLV decrease is used to assess the cost of misclassifying a customer. They compared the performance of five classifiers: logistic regression, a multi-layer perceptron neural network, a decision tree, a cost-sensitive decision tree, and AdaCost boosting. The AdaCost classifier and the cost-sensitive tree achieved the best results in their empirical application, while the neural network and the decision tree gave the best results by AUROC.

As we want to generate accurate predictions of individual customer class, we chose the Random Forest algorithm because it provides better predictive performance than many other supervised learning algorithms and can give good results even when the dataset has many features. We explored many datasets and finally chose the US Superstore dataset, which has such characteristics: it contains 24 features and about 25000 transaction records.

3. Model Implementation

This section explains the steps by which our prediction model was developed, as illustrated in Figure 1.

Figure 1 Overview design of the proposed system development

First of all, the dataset is described and cleaned. Next, the data are prepared into a format suitable for the models. The target variable is then discretized to identify a more informative customer class. After that, applying the RF classifier algorithm, conducting hyperparameter tuning, and a brief overview of the development environment are described.

3.1. Description and cleaning of the dataset

The global Superstore dataset from Kaggle is adopted to compute the individual CLV of the customers. It consists of 51300 order-line rows comprising 25000 unique order transactions made by 1500 customers within a purchase period of four years, from 1 Jan 2011 to 12 Dec 2014.

The original dataset contains 24 attributes: Row ID, Order ID, Order Date, Ship Date, Ship Mode, Customer ID, Customer Name, Segment, City, State, Country, Postal Code, Market, Region, Product ID, Category, Sub-Category, Product Name, Sales, Quantity, Discount, Profit, Shipping Cost, and Order Priority. To obtain a set of customer behavioural features for our model, we drop the order-related attributes, such as Order Priority, Category, and Product Name, that cannot improve the model's performance. If a categorical attribute has too many unique values, it can diminish the predictive power of tree-based algorithms. We explored the unique values of each categorical attribute and dropped the attributes with high numbers of unique values, as described in Table 1.
Table 1 Count of unique values of categorical attributes

Categorical Attributes    Unique Values
Segment                   3
Country                   141
City                      2592
State                     41
Region                    13
We dropped 'Country', 'City', 'State' and 'Region'. To gain more insight into customers' purchase behaviour, Figure 2 shows that the number of orders increases slightly every year, and Figure 3 shows the number of customers per purchase frequency.

Figure 2 Number of orders per year

Figure 3 Number of customers per frequency

3.2. Data pre-processing

Initially, the Order Date of the dataset is converted into datetime format, because we need to perform date arithmetic to calculate the recency (in days) of each customer's purchases. We set the feature and target periods for training and testing: (2011-01-01 to 2012-12-31), (2013-01-01 to 2013-12-31), (2012-01-01 to 2013-12-31) and (2014-01-01 to 2014-12-31) are created as new datasets for each period, as shown in Figure 4.

Figure 4 Training and testing period of our model: the training period pairs a two-year feature window with the following one-year target, and the testing period shifts both windows forward by one year.

For all four period datasets, we create an 'Orders' dataframe consisting of all unique orders from the original dataset by grouping order lines by Order ID; the variable Total_Amount is created as the sum of the Sales column (the products' selling prices), along with the sum of Profit as Total_Profit and the sum of Discount as Total_Discount. Using the Orders dataset, we create a new dataframe, 'Customers', by grouping by Customer ID to obtain unique customers.
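As a concrete illustration, the aggregation just described might be sketched in pandas as follows; the file name, the exact column labels, and the period chosen are assumptions based on Section 3.1, not the authors' code.

```python
import pandas as pd

# Load the raw order-line data; file name and column labels are assumed.
df = pd.read_csv("global_superstore.csv", parse_dates=["Order Date"])

# Restrict to one feature period, e.g. the training feature window.
mask = (df["Order Date"] >= "2011-01-01") & (df["Order Date"] <= "2012-12-31")
feat = df.loc[mask]

# 'Orders' dataframe: one row per unique order, aggregating its order lines.
orders = feat.groupby(["Customer ID", "Order ID"], as_index=False).agg(
    Total_Amount=("Sales", "sum"),
    Total_Profit=("Profit", "sum"),
    Total_Discount=("Discount", "sum"),
    Order_Date=("Order Date", "max"),
)

# 'Customers' dataframe: one row per unique customer, built from Orders.
customers = orders.groupby("Customer ID", as_index=False).agg(
    Revenue=("Total_Amount", "sum"),
    Profit=("Total_Profit", "sum"),
    Discount=("Total_Discount", "sum"),
    Frequency=("Order ID", "count"),
    Last_Purchase=("Order_Date", "max"),
)
```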


We then calculate each customer's Recency (the number of days between the customer's last purchase and the last day of the observation period), Frequency (the number of orders the customer made during the observation period) and Monetary value (the sum of Total_Amount over all orders made by the customer). The dependent variable, CLV, is calculated from the Customers dataset using the following equations (1) to (7).


CLV = (Customer Value / Churn Rate) * Profit Margin (1)

Customer Value = Average Order Value * Purchase Frequency (2)

Churn Rate = 1 - Repeat Rate (3)

Profit Margin = Total Revenue * Profit Rate (%) (4)

Average Order Value = Total Revenue / No. of Transactions of an individual customer (5)

Purchase Frequency = Total No. of Transactions of all customers / No. of Customers (6)

Repeat Rate = Proportion of customers whose transaction count is greater than one (7)
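Building on the 'customers' dataframe sketched earlier, equations (1) to (7) translate roughly into the following; the profit rate and the discretization cut point used in the next step are not reported in the paper, so both are placeholders.

```python
# A sketch of equations (1)-(7) applied to the 'customers' dataframe above.
total_revenue = customers["Revenue"].sum()
total_transactions = customers["Frequency"].sum()
n_customers = len(customers)

# Eq. (5): average order value of an individual customer.
customers["Avg_Order_Value"] = customers["Revenue"] / customers["Frequency"]

# Eq. (6): overall purchase frequency.
purchase_frequency = total_transactions / n_customers

# Eq. (7) and (3): repeat rate and churn rate.
repeat_rate = (customers["Frequency"] > 1).mean()
churn_rate = 1 - repeat_rate

# Eq. (2): customer value.
customers["Customer_Value"] = customers["Avg_Order_Value"] * purchase_frequency

# Eq. (4): profit margin; the profit rate is not reported, so 5% is a placeholder.
PROFIT_RATE = 0.05
profit_margin = total_revenue * PROFIT_RATE

# Eq. (1): customer lifetime value.
customers["CLV"] = customers["Customer_Value"] / churn_rate * profit_margin

# Discretize CLV into the binary label described next; the paper does not
# state the cut point, so the median is assumed (0 = High, 1 = Low class).
customers["Customer_Class"] = (customers["CLV"] < customers["CLV"].median()).astype(int)
```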

The resulting continuous CLV values are discretized to classify each customer as '0' (High Class customer) or '1' (Low Class customer), which is used as the label for our model. That column describes which customer class should be invested in for the retention strategy in the upcoming years. Three rows randomly selected from the preprocessed data are shown in Table 2.
Table 2 Three rows with discretization of the target variable

Customer ID      AB/10015    AB/10105    AB/10255
Revenue          5922.034    5832.223    3845.484
Profit           279.4568    1277.752    -885.905
Discount         4.52        6.3         2.64
Shipping Cost    989.79      975.65      421.02
Frequency        14          14          13
Segment          Consumer    Consumer    Home Office
Recency          3           13          2
Customer Class   0           0           1

Because our feature set contains some categorical variables, we applied one-hot encoding to convert categorical data into a numerical format the model can use. One-hot encoding spreads the values of a column across multiple flag columns and assigns 0 or 1 to each; these binary values express the relationship between the grouped and encoded columns.
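With pandas this step can be sketched in one line (get_dummies is one common utility for it; the paper does not name the one the authors used):

```python
import pandas as pd

# One-hot encode the remaining categorical feature(s); 'Segment' is the
# example kept in Table 2. Each unique value becomes a 0/1 flag column,
# e.g. Segment -> Segment_Consumer, Segment_Corporate, Segment_Home Office.
customers = pd.get_dummies(customers, columns=["Segment"], prefix="Segment")
```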
The last step of pre-processing is feature selection: choosing the most relevant features, i.e. those that contribute most to the predictand variable. We applied feature importance to rank the variables in our feature set using the Random Forest feature importance method, which ranks each feature's contribution to the prediction of the predictand variable '0' or '1'. Six features were selected from the full feature set; their importance values are listed in Table 3. We used these features to train our model.

Table 3 Feature importance of selected features

Selected Features    Importance
Revenue              0.203783
Shipping Cost        0.180218
Frequency            0.164207
Profit               0.144875
Recency              0.137844
Discount             0.124139
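The ranking step might look like the following scikit-learn sketch; the merged training dataframe train_df is an assumption, and the candidate list is abbreviated here to the six features that end up in Table 3, whereas in practice the importances would be computed over the full feature set.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# 'train_df' is an assumed merged training dataframe (features + label).
feature_cols = ["Revenue", "Shipping Cost", "Frequency", "Profit", "Recency", "Discount"]
X_train, y_train = train_df[feature_cols], train_df["Customer_Class"]

rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Mean-decrease-in-impurity importances, as reported in Table 3.
importances = pd.Series(rf.feature_importances_, index=feature_cols)
print(importances.sort_values(ascending=False))
```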

3.3. Random forest

Random forest is a supervised machine learning algorithm that can be used for both regression and classification problems. It is an ensemble learning method that combines multiple decision trees: it trains each tree on a different sample of the data drawn with replacement, splits the nodes in each tree while considering only a limited number of the features, obtains a predicted outcome from each decision tree, and tallies the votes over the predicted outcomes. The majority vote (for classification) or the average (for regression) of the individual trees' predictions gives a more accurate final prediction.

The Random Forest algorithm has two stages: (1) random forest creation, and (2) making a prediction from the random forest classifier created in the first stage.

Random Forest creation pseudocode:

- First, randomly select 'k' features from all features.
- Among the 'k' features, calculate the node 'd' using the best split point.
- Split the node into child nodes using the best split.
- Repeat the three steps above until 'l' nodes have been reached.
- Build the forest by repeating the steps above 'n' times to create 'n' trees.

Random forest prediction pseudocode:

- Take the test features and use the rules of each randomly created decision tree to predict the outcome; store each predicted outcome (target).
- Calculate the votes for each predicted target.
- Take the predicted target with the most votes as the final prediction of the random forest algorithm.

A scikit-learn sketch of these two stages follows.
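In scikit-learn terms (the library named in Section 3.5), the two stages reduce to fit and predict; this is a minimal sketch, with X_train, y_train and X_test standing for the training and testing features and labels prepared above:

```python
from sklearn.ensemble import RandomForestClassifier

# Stage 1: build the forest from training-period features and labels
# (default hyperparameters, as in the paper's first experiment).
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Stage 2: each tree votes on the testing-period features; the majority
# vote becomes the predicted customer class (0 = High, 1 = Low).
y_pred = model.predict(X_test)
```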
max_features 'auto', 'sqrt' 'sqrt'
3.4. Hyperparameter tuning

For machine learning models, optimizing hyperparameters is a key step in obtaining accurate results. A model's parameters are values estimated during the training process that specify how to transform the input data into the desired output; in contrast, hyperparameters define the structure of the model and can affect its accuracy and computational efficiency. A model can have many hyperparameters, and finding the best combination of them is called hyperparameter tuning.

The hyperparameters of random forest are max_samples (the number of samples used to train each decision tree), max_features (the number of features to consider when looking for the best split), n_estimators (the number of trees in the forest), criterion (the function measuring the quality of a split), max_depth (the maximum depth of a tree), min_samples_leaf (the minimum number of samples required at a leaf node) and min_samples_split (the minimum number of samples required to split a node). We used Random Search to explore the best hyperparameter sets for our model because of its improved exploratory power, its ability to find optimal values for the critical hyperparameters, and its much lower runtime. Random search is a technique in which random combinations of the hyperparameters are tried to find the best hyperparameter set for the model; it tends to yield better results than Grid Search tuning. The initialization ranges and the optimal set of hyperparameters are described in Table 4. After the best hyperparameter values are found, the model is trained once again on the training set. Once the model is trained, the testing set is fed into the model to obtain the class of each customer.

Table 4 Initialized values and optimal hyperparameter values from random search

Hyperparameter       Initialization        Best Parameter
criterion            'entropy', 'gini'     'gini'
max_depth            10-110                50
max_features         'auto', 'sqrt'        'sqrt'
min_samples_leaf     1, 2, 4               2
min_samples_split    5, 10, 20             20
n_estimators         200-2000              200
bootstrap            True, False           True
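This step corresponds to scikit-learn's RandomizedSearchCV; a sketch using the initialization ranges of Table 4 follows. The number of sampled candidates and cross-validation folds are not reported in the paper, so they are assumptions, and 'auto' as a max_features option assumes a scikit-learn version contemporary with the paper (it has since been removed).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Search space taken from the 'Initialization' column of Table 4;
# step sizes within the numeric ranges are assumed.
param_distributions = {
    "criterion": ["entropy", "gini"],
    "max_depth": list(range(10, 111, 10)),
    "max_features": ["auto", "sqrt"],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [5, 10, 20],
    "n_estimators": list(range(200, 2001, 200)),
    "bootstrap": [True, False],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=100,   # number of sampled combinations: assumed, not reported
    cv=3,         # number of cross-validation folds: assumed, not reported
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_)  # compare with the 'Best Parameter' column of Table 4
```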
3.5. Development environment

The Python programming language with the Spyder Integrated Development Environment was used to implement our predictive model. The Scikit-learn library, which provides a range of supervised and unsupervised learning algorithms, was utilized for the necessary functions in our development.


4. Performance measure and results

Our model predicts whether a customer will be high class or low class, in order to decide how much of an offer each customer class should receive in upcoming promotion and marketing campaigns. Effectively allocating marketing spend to each customer class can reduce costs considerably compared with making the same offer to all customers. The performance of our model is evaluated with precision, recall and accuracy. We first trained the model with default parameters and then assessed the predictive accuracy of the model trained with the optimal hyperparameter set tuned by Random Search on the test dataset, using the classification report, a key summary of the quality of a classification algorithm's predictions that gives the scores for precision, recall, f1-score and accuracy. Precision describes what proportion of predicted positives are truly positive; recall expresses what proportion of actual positives are correctly classified; f1-score is the harmonic mean of precision and recall; and accuracy is the proportion of all positive and negative samples that are correctly classified. Their equations are as follows:

Precision = TP / (TP + FP) (8)

Recall = TP / (TP + FN) (9)

f1-score = 2 * (Recall * Precision) / (Recall + Precision) (10)

Accuracy = (TP + TN) / (TP + TN + FP + FN) (11)

where TP is a positive sample correctly classified as the positive class, FP is a negative sample falsely classified as the positive class, TN is a negative sample correctly classified as the negative class, and FN is a positive sample incorrectly predicted as the negative class.
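These scores are what scikit-learn's classification_report prints; a sketch of the evaluation follows, including the AdaBoost baseline used for the comparison in Table 5 (its settings are assumed, since the paper does not report them):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, classification_report

# Evaluate the tuned Random Forest on the held-out test period.
best_rf = search.best_estimator_
rf_pred = best_rf.predict(X_test)
print(classification_report(y_test, rf_pred, target_names=["High Class", "Low Class"]))

# AdaBoost baseline on the same split (default settings assumed,
# matching the comparison reported in Table 5).
ada = AdaBoostClassifier(random_state=42).fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, ada.predict(X_test)))
```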
Table 5 reports the accuracy of the classifier models with different hyperparameter sets on the testing dataset. Table 6 compares the best Random Forest model with the AdaBoost model with respect to precision, recall, and f1-score.
Table 5 Accuracy of classifier models with different hyperparameter sets on the testing dataset

Random Forest Models                                             Accuracy
Default hyperparameters                                          81.46%
Selected features (feature selection), default hyperparameters  82.26%
Random Search's best hyperparameter set                          84.27%
AdaBoost model                                                   78.21%
Table 6 Precision, recall, and f1-score for each customer class of the best model

Class            Precision    Recall    F1-score
0 (High Class)   93.65%       79.45%    85.96%
1 (Low Class)    74.34%       91.70%    82.12%

5. Discussion and conclusion

From Table 5, we can see that the accuracy of the model with Random Search's optimal hyperparameter values outperforms the AdaBoost model. The model with default hyperparameter values and all features achieves 81.46%. With only the features chosen by feature selection, the model's accuracy improves to 82.26%, which is already a good model. The random forest model with the optimal hyperparameters tuned by Random Search improves accuracy by a further 2%.

For High class customers, our model performed very well, with high precision and a good f1-score, and it achieved high recall for Low class customers.

In this paper, we have provided a customer class prediction model using the random forest algorithm to correctly classify individual online retail customers, viewed from the perspective of customer lifetime value. We showed that our model performs consistently well using the selected features and the best parameters from Random Search. Our model can help online retailers decide which class of customers needs the most effort to maintain a retention strategy. As further study, based on the customer's class and an exploration of their preferences, products of interest can be recommended, which leads to increased sales, helps develop a better relationship with potential customers, and can incentivize Low class customers, improving the retention strategy.
machine learning techniques”, IGI Global, USA, 2011.
In this paper, we provide customer class prediction
model using random forest algorithm to correctly classify [11] A. Vanderveld, A. Pandey, A. Han, and R. Parekh,
individual online retail customer class, viewed from the “An engagement-based customer lifetime value system
perspective of customer lifetime value. We showed that for e-commerce,” Proceedings of the 22nd ACM
our model performed consistently well using s elected SIGKDD International Conference on Knowledge
features and best parameters from Random Search. Using Discovery and Data Mining, New York, 2016.
our model can help the online retailers to decide which
class of customer need to put much effort to maintain
