Predicting Customer Class Using Customer Lifetime Value With Random Forest Algorithm
Authorized licensed use limited to: Dayananda Sagar University. Downloaded on January 22,2025 at 04:13:51 UTC from IEEE Xplore. Restrictions apply.
to the model for testing features to predict target variable for the testing period. Afterwards, Random Search cross-validation is conducted to find the optimal hyperparameter values of Random Forest to achieve the best accuracy. Finally, the performance of our model is evaluated and compared with another ensemble method, the AdaBoost algorithm, on the test dataset.

We have organised the rest of this paper in the following way: Section 2 reviews the work related to CLV prediction. Section 3 describes the dataset, data pre-processing and the methods used in this paper: the feature selection method, the RF algorithm, and Random Search for tuning the hyperparameters of the RF algorithm. The performance measures and results of our model are presented in Section 4. Discussion of our model can be found in Section 5.

2. Related work

(Bernat, 2019) compared the predictive power of three different models, Pareto/NBD, Cox proportional hazard, and gradient tree boosting, to predict CLV for individual customers [1]. The models are trained to predict customer spend in the next year using historical transactions. In addition to RFM variables, they use other covariates. Among them, the Pareto/NBD extension model performed better than the others.

(Jasek P, 2018) discussed the predictive abilities of CLV models by comparing the performance of different predictive models, the Extended Pareto/NBD (EP/NBD) model, the Markov chain model and the Status Quo model, on six datasets [7]. In that paper, the performance of the models was evaluated for both long and short periods. The EP/NBD model outperformed the other models in a majority of evaluation metrics.

(Chamberlain, 2017) developed a CLV prediction system for a UK-based global e-commerce company that uses rich features from the past three years of data to predict the net spend of customers over the next year, modelled with an RF regression algorithm [2]. It addressed two problems (CLV and churn prediction), and the results were evaluated. They then used feature learning to improve their model by experimenting with a hybrid model that combines logistic regression with a Deep Neural Network.

(Nicolas Glady, 2008) defined a churner as someone whose CLV is decreasing [9]. That paper introduced a new loss function in which the loss incurred by the CLV decrease is used to assess the cost of misclassifying a customer. They compared the performance of five classifiers: logistic regression, multi-layer perceptron neural network, decision tree, cost-sensitive decision tree and AdaCost boosting. The AdaCost classifier and the cost-sensitive tree achieved the best results in their empirical application. The neural network and the decision tree gave the best results by AUROC.

As we want to generate accurate predictions of individual customer class, we chose the Random Forest algorithm because it provides better predictive performance than other supervised learning algorithms. Random Forest can give the best results even if the dataset has many features. We explored many datasets and finally chose the US superstore dataset, which has these characteristics: it contains 24 features and 25,000 transaction records.

3. Model Implementation

This section explains the steps by which our prediction model was developed, as illustrated in Figure 1.

Figure 1 Overview design of the proposed system development

First of all, the description and cleaning of the dataset are given. Next, the data are prepared in a format suitable for use in the models. Then, the target variable is discretized to identify a more informative customer class. After that, the application of the RF classifier algorithm, hyperparameter tuning and a brief overview of the development environment are described.
CLV = (Customer Value / Churn Rate) * Profit Margin    (1)
Customer Value = Average Order Value * Purchase Frequency    (2)
Churn Rate = 1 - Repeat Rate    (3)
Profit Margin = Total Revenue * Profit Rate (%)    (4)
Average Order Value = Total Revenue / No. of Transactions of individual customer    (5)
Purchase Frequency = Total No. of Transactions of all customers / No. of Customers    (6)
Repeat Rate = The proportion of customers whose transaction count is greater than one    (7)

importance to rank the most important variables from our feature set using the Random Forest feature importance method, which ranks the contribution of each feature to the prediction of the target variable '0' or '1'. Six features out of the total features are selected, and their importance values are listed in Table 3. We used these features to train our model.

Table 3 Feature importance of selected features

Selected Features    Importance
Revenue              0.203783
Shipping Cost        0.180218
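As an illustration, Equations (1) to (7) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study: the function name `compute_clv`, the `(customer_id, revenue)` transaction layout and the sample values are hypothetical, the profit rate is passed as a fraction, and "Total Revenue" in Equations (4) and (5) is assumed to mean the total revenue of the individual customer.

```python
from collections import defaultdict


def compute_clv(transactions, profit_rate):
    """Return {customer_id: CLV} following Equations (1) to (7).

    transactions: list of (customer_id, revenue) pairs (hypothetical layout).
    profit_rate: profit rate as a fraction, e.g. 0.10 for 10% (assumed input).
    """
    revenue = defaultdict(float)  # total revenue per customer
    count = defaultdict(int)      # number of transactions per customer
    for customer_id, amount in transactions:
        revenue[customer_id] += amount
        count[customer_id] += 1

    n_customers = len(count)
    # Eq. (6): transactions of all customers divided by number of customers
    purchase_frequency = sum(count.values()) / n_customers
    # Eq. (7): share of customers with more than one transaction
    repeat_rate = sum(1 for c in count.values() if c > 1) / n_customers
    # Eq. (3)
    churn_rate = 1.0 - repeat_rate
    if churn_rate == 0:
        raise ValueError("churn rate is zero, so Equation (1) is undefined")

    clv = {}
    for customer_id in count:
        # Eq. (5), assuming per-customer revenue
        average_order_value = revenue[customer_id] / count[customer_id]
        # Eq. (2)
        customer_value = average_order_value * purchase_frequency
        # Eq. (4), assuming per-customer revenue
        profit_margin = revenue[customer_id] * profit_rate
        # Eq. (1)
        clv[customer_id] = customer_value / churn_rate * profit_margin
    return clv


# Hypothetical sample data: four customers, six transactions
transactions = [
    ("A", 100.0), ("A", 50.0),
    ("B", 80.0),
    ("C", 20.0), ("C", 40.0),
    ("D", 10.0),
]
clv = compute_clv(transactions, profit_rate=0.10)
for customer_id, value in sorted(clv.items()):
    print(customer_id, round(value, 2))
```

Because the churn rate appears in the denominator of Equation (1), the sketch raises an error when every customer is a repeat customer. How that case was handled in the original study is not stated.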
• Build the forest by repeating the above steps "n" times to create "n" trees

The random forest prediction pseudocode:

• Take the test features and use the rules of each randomly created decision tree to predict the outcome, and store the predicted outcome (target)
• Calculate the votes for each predicted target
• Consider the highest-voted predicted target as the final prediction from the random forest algorithm

3.4. Hyperparameter tuning

For machine learning models, optimizing hyperparameters is a key step in obtaining accurate results. A model's parameters are values estimated during the training process that specify how to transform the input data into the desired output. In contrast, hyperparameters define the structure of the model and can affect model accuracy and computational efficiency. A model can have many hyperparameters, and finding the best combination of them is called hyperparameter tuning.

The hyperparameters of random forest are max_samples (the number of samples used to train each decision tree), max_features (the number of features to consider when looking for the best split), n_estimators (the number of trees in the forest), criterion (the function to measure the quality of a split), max_depth (the maximum depth of the tree), min_samples_leaf (the minimum number of samples required to be at a leaf node) and min_samples_split (the minimum number of samples required to split an internal node). We used Random Search to explore the best hyperparameter sets for our model because of its advantages: improved exploratory power, finding the optimal value for the critical hyperparameters, and much less time. Random Search is a technique in which random combinations of the hyperparameters are used to find the best hyperparameter set for the model. It yields comparatively better results than Grid Search tuning. The initialization and the optimal set of hyperparameters are described in Table 4. After the best hyperparameter values are found, the model is trained once again with the training set. Once the model is trained, the testing set is fed into the model to obtain the loyalty class of each customer.

Table 4 Initialized values and optimal hyperparameter values of Random Search of the model

Hyperparameter       Initialization       Best Parameter
criterion            'entropy', 'gini'    'gini'
max_depth            10-110               50
max_features         'auto', 'sqrt'       'sqrt'
min_samples_leaf     1, 2, 4              2
min_samples_split    5, 10, 20            20
n_estimators         200-2000             200
bootstrap            True, False          True

which provides a range of supervised and unsupervised learning algorithms, is utilized to carry out the necessary functions in our development.

4. Performance measure and results

Our model predicts whether each customer will be in the high or low class, in order to decide how much of an offer each customer class should receive in upcoming promotion and marketing campaigns. Effectively allocating marketing spend to each customer class can reduce cost considerably compared with offering the same to all customers. The performance of our model is evaluated based on precision, recall and accuracy. Initially we trained the model with default parameters, and then assessed the predictive accuracy of our model trained with the optimal hyperparameter set tuned with Random Search on the test dataset, using the classification report, a key tool for measuring the quality of predictions of classification algorithms, which reports the scores of precision, recall, f1-score and accuracy. Precision describes what proportion of predicted Positives is truly positive, Recall expresses what proportion of actual Positives is correctly classified, f1-score is the harmonic mean of precision and recall, and Accuracy means what proportion of all Positive and Negative samples were correctly classified. Their equations are as follows:

Precision = TP / (TP + FP)    (8)
Recall = TP / (TP + FN)    (9)
f1-score = 2 * (Recall * Precision) / (Recall + Precision)    (10)
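As an illustration, Equations (8) to (10) can be computed directly from the confusion counts. The following is a minimal sketch, not the evaluation code used in this study: the function name `classification_scores` and the sample labels are hypothetical, accuracy is taken as (TP + TN) divided by the number of samples, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Compute precision, recall, f1-score and accuracy for a binary task."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # Positive sample correctly classified as Positive
        elif predicted == positive:
            fp += 1  # Negative sample falsely classified as Positive
        elif actual == positive:
            fn += 1  # Positive sample incorrectly predicted as Negative
        else:
            tn += 1  # Negative sample correctly classified as Negative

    # Eq. (8); 0.0 on an empty denominator is a convention assumed here
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Eq. (9)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Eq. (10)
    if precision + recall:
        f1 = 2 * (recall * precision) / (recall + precision)
    else:
        f1 = 0.0
    # Proportion of all samples that were correctly classified
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = High class, 0 = Low class
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
scores = classification_scores(y_true, y_pred)
print(scores)
```

For these sample labels there are 2 true positives, 1 false positive, 2 false negatives and 3 true negatives, so precision is 2/3, recall is 1/2 and accuracy is 5/8.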
Negative sample is falsely classified as Positive class, TN is when a Negative sample is correctly classified as Negative class, and FN is when a Positive sample is incorrectly predicted as Negative class. Table 5 describes the accuracy of the classifier models with different hyperparameter sets on the testing dataset. In Table 6, we show the comparison of the best Random Forest model and the AdaBoost model with respect to precision, recall, and F1.

Table 5 Accuracy of classifier models with different hyperparameter sets on the testing dataset

Random Forest Models                                                      Accuracy
Default hyperparameters                                                   81.46%
Selected features using Feature Selection with default hyperparameters    82.26%
Random Search's best hyperparameter set                                   84.27%
AdaBoost Model                                                            78.21%

retention strategy. As further study, based on the customer's class and an exploration of their preferences, products of interest will be recommended, which leads to an increase in sales, helps to develop a better relationship with potential customers, and can incentivize Low class customers to improve our retention strategy.

6. References

[1] J.R. Bernat, A.J. Koning, and D. Fok, "Modelling customer lifetime value in a continuous, non-contractual time setting," Netherlands, 2019.
[2] B.P. Chamberlain, A. Cardoso, C.H. Bryan Liu, R. Pagliari, and M.P. Deisenroth, "Customer lifetime value prediction using embeddings," the 23rd ACM SIGKDD International Conference, Volume: 23, Canada, 2017.
[3] S. Chen, "Estimating customer lifetime value using machine learning techniques," London, 2018.