Machine Learning Framework For Customer Purchase Prediction
Article history: Received 25 November 2016; Accepted 17 April 2018; Available online 5 May 2018

Keywords: Analytics; Purchase prediction; Sales forecast; Non-contractual setting; Machine learning

Abstract

Predicting future customer behavior provides key information for efficiently directing resources at sales and marketing departments. Such information supports planning the inventory at the warehouse and point of sales, as well as strategic decisions during manufacturing processes. In this paper, we develop advanced analytics tools that predict future customer behavior in the non-contractual setting. We establish a dynamic and data-driven framework for predicting whether a customer is going to make a purchase at the company within a certain time frame in the near future. For that purpose, we propose a new set of customer-relevant features derived from the times and values of previous purchases. These customer features are updated every month, and state-of-the-art machine learning algorithms are applied for purchase prediction. In our studies, the gradient tree boosting method turns out to be the best performing method. Using a data set containing more than 10,000 customers and a total number of 200,000 purchases, we obtain an accuracy score of 89% and an AUC value of 0.95 for predicting next-month purchases on the test data set.

© 2018 Elsevier B.V. All rights reserved.
1. Introduction

Customer management requires that firms make a careful assessment of the costs and benefits of alternative expenditures and investments, and identify the optimal allocation of resources to marketing and sales actions over time. Decision makers will benefit from decision support models that relate costs and customer purchase behavior, and forecast the value of the customer portfolio (Berger et al., 2002). Thus, knowing who is likely to purchase within the next months is one of the key drivers for efficiently allocating resources at the sales and marketing departments (see e.g. Allenby, Leone, & Jen, 1999). This information is also needed when planning the inventory at the warehouse and/or point of sales, as well as for deciding quantities in the manufacturing processes. Thereby, the non-contractual distinction is of fundamental importance for developing models for customer-base analysis. One of the main challenges in the non-contractual setting is how to differentiate customers who have ended their relationship with the firm from those who are simply in the midst of a pause between transactions.

It is widely accepted by business wisdom and the research literature that it costs five to ten times more to acquire a new customer than to retain an existing customer (Bhattacharya, 1998; Daly, 2002). While the factor itself may vary substantially depending on the business context, retaining customers has received strong attention from both academia and practitioners (see Van den Poel & Lariviere, 2004 for an overview). Thereby, it has been well established that appropriate retention strategies have strong benefits over acquisition approaches (see Ganesh, Arnold, & Reynolds, 2000). However, it has to be noted that retention activities are not necessarily desirable in an unconditional way, since targeting profitable customers can make marketing spending more efficient (Kumar, Venkatesan, & Reinartz, 2008; Mulhern, 1999; Zeithaml, Rust, & Lemon, 2001), even more so if this profitability can be predicted (Reinartz & Kumar, 2003). Among practitioners, it is quite desirable to consider customers' future profitability and responsiveness, specifically in terms of purchase actions, to marketing when allocating resources (Rust, Kumar, & Venkatesan, 2011; Venkatesan & Kumar, 2004).

Firms are encouraged to develop models to predict which customers are more likely to defect (Keaveney & Parthasarathy, 2001;

∗ Corresponding author.
E-mail addresses: [email protected] (A. Martínez), [email protected] (C. Schmuck), [email protected] (S. Pereverzyev Jr.), [email protected] (C. Pirker), [email protected] (M. Haltmeier).
https://doi.org/10.1016/j.ejor.2018.04.034
A. Martínez et al. / European Journal of Operational Research 281 (2020) 588–596 589
Neslin, Gupta, Kamakura, Lu, & Mason, 2006). Once identified, these likely defectors should be targeted with appropriate incentives to convince them to stay (Hadden, Tiwari, Roy, & Ruta, 2007).

1.1. Customer purchase prediction

While purchase prediction has received attention for a long time in consumer research (see e.g., Herniter, 1971), the rise of customer analytics by marketing analysts has revived such issues in recent years (Winer, 2001). As outlined in Platzer and Reutterer (2016), one of the most challenging areas remains the prediction of customer purchases in the non-contractual setting: the current status of the customer is not directly observable at a given time, the available historical record is censored, and customer data tends to vary substantially. During the last years, large improvements in the information technology domain have resulted in the increased availability of customer transaction data (Fader & Hardie, 2009). Initial analyses of these transaction databases are usually descriptive, in the form of basic summary statistics such as the average number of orders or the average order size, and information on the distribution of behaviors across the customer portfolio. Further processing of the customer base may use multivariate statistical methods and data mining tools to identify characteristics of, for instance, heavy buyers, or to determine which groups of products tend to be purchased together (i.e., performing a market-basket analysis).

The next step in terms of data processing is to undertake customer-base analyses that are more predictive in nature (Fader & Hardie, 2009). In this work we develop a machine learning framework for forecasting future purchasing by the firm's customers from a given customer transaction database. For that purpose we first compute a large number of customer features that characterize the customer at a given month. We then apply machine learning algorithms, including logistic Lasso regression (Friedman, Hastie, & Tibshirani, 2010; Tibshirani, 1996), the extreme learning machine (Huang, Zhu, & Siew, 2006) and gradient tree boosting (Chen & Guestrin, 2016; Friedman, 2001), for predicting whether the customer makes a purchase in the upcoming month. Our approach avoids prohibitive customer inquiries that may be costly to acquire in the non-contractual setting, and nevertheless shows performance comparable to state-of-the-art approaches.

While our framework can equally be applied for predicting customer behavior in any future time period, we focus our presentation on predicting next-month purchases. Our research is motivated by its application in a company working in the B2B field that requires decisions on how to deploy its account management, sales and marketing activities. Those activities are done mainly on a monthly basis; consequently, predicting one and two months ahead is suitable to achieve the desired goals. Competitive marketing strategy focuses on how a business should deploy the marketing resources at its disposal to facilitate the achievement and maintenance of competitive positional advantages in the marketplace (Varadarajan & Yadav, 2002). The monthly binning is a natural time frame for most companies and businesses, and is especially important for business domains where the company must decide very fast and where some activities are done on a monthly basis. Among such business domains, we mention fast-moving consumer goods and fast fashion retailers, where the company has to decide rapidly in order to attract consumers; in particular, fast fashion requires introducing interpretations of the runway designs to the stores in a minimum of three to five weeks (Barnes & Lea-Greenwood, 2006). In any case, our framework can be generalized to predicting purchases of a customer within any future time period in a straightforward manner; compare also with Remark 2.3 below.

The application of machine learning or data mining techniques for predictive purposes on the customer base is often analyzed in the customer relationship management and expert systems domain, and customer churn prediction is the most popular objective in this field. The concept of churn and associated statistical implementations have been well studied in B2C business models (see, e.g., Burez & Van den Poel, 2007; Neslin et al., 2006; Verbeke, Dejaeger, Martens, Hur, & Baesens, 2012; Xia & Jin, 2008; Xie, Li, Ngai, & Ying, 2009), especially in a contractual setting. Main industries include retail markets, subscription management, financial services and electronic commerce (see e.g. Chen, Fan, & Sun, 2012 for an overview). This is in line with the general trend of a stronger focus of academia and intelligence approaches on B2C applications (Wiersema, 2013). However, churn prediction is also important in the B2B context, where it has been studied much less; see Jahromi, Stakhovych, and Ewing (2014). In particular, the development of business relationships remains central to B2B companies (Eriksson & Vaghult, 2000). The importance of retention for suppliers becomes even clearer in the B2B context, where customers make larger and more frequent purchases with far higher transactional values (Boles, Barksdale, & Johnson, 1997; Rauyruen & Miller, 2007).

Accurate demand forecasting is a fundamental aspect of supply chain management. In overall terms, methods for estimating the level of supply chain flexibility are a function of varying demand quantities and varying supply lead times (Das & Abdel-Malek, 2003). Most companies manage their inventory starting by forecasting the expected demand quantities by stock keeping unit (SKU). Very often those time series are intermittent (Willemain, Smart, Shockor, & DeSautels, 1994), with many time periods having no demand, and are very difficult to predict (Willemain, Smart, & Schwarz, 2004). This pattern is characteristic of demand for service parts inventories and capital goods. Manufacturers perceive the forecasting of intermittent data to be an important problem. In practice, the standard methods for forecasting intermittent demand are exponential smoothing, moving averages and Croston's method. When possible, by aggregating retail sales one might find strong trend and seasonal patterns; in those cases, traditional methods can be applied for predicting the demand (Alon, Qi, & Sadowski, 2001). The results of our research can support inventory management by enriching the expected demand per SKU with such kind of information.

1.2. Relations to previous work

In the last decade, various machine learning methods for predicting customer retention and profitability have been analyzed in academia, and some of them are often used by practitioners. In most cases those approaches are based on extracting a customer's latent characteristics from its past purchase behavior, with the mindset that observed behavior is the outcome of an underlying stochastic process (Fader & Hardie, 2009). This approach to customer purchase prediction can be called the characteristics approach. Previous studies have analyzed the use of random forest techniques in order to predict customer retention and profitability (Larivière & Van den Poel, 2005) in a financial services and B2C context. Three major predictor categories encompassing potential explanatory variables were considered: past customer behavior, observed customer heterogeneity and variables related to intermediaries. That research found evidence that past customer behavior is more important for generating repeat purchasing and a favorable profitability evolution, while the intermediary's role has a greater impact on the customers' defection proneness. Literature on effective B2B promotions suggests incorporating an enhanced, in-depth view of the complex decision-making setup such as buying center analysis
(Hellman, 2005). Decision making is also more complex with B2B customers, as a company's purchase decision is usually the consequence of a complex decision process and an alignment among stakeholders and business goals, in contrast to the decision process of an end consumer in the B2C domain. It is also important to highlight the very strong influence coming from the industry dynamics and other activities like product launches and campaigns.

Closely related to our work is Jahromi et al. (2014), where the authors develop a model for predicting whether a customer performs a purchase in some prescribed future time frame based on purchase information from the past. They propose customer characteristics such as the number of transactions observed in past time frames, the time of the last transaction, and the relative change in total spending of a customer. They found an adaptive boosting method (Freund, Schapire et al., 1996) to perform best on the tested data, with an AUC value of 0.92. In contrast, in our study we compute a richer set of customer characteristics than the one in Jahromi et al. (2014). These features are listed in Table 2.2 and described in detail in Section 2.3. For our framework, the best performing method (gradient tree boosting) shows an AUC value of 0.95. We point out that we obtain a higher AUC score even though we use a time frame of only one month within which purchases are predicted. This is much smaller than the six-month time frame used in Jahromi et al. (2014). A smaller time frame is beneficial in terms of actionability for a company; however, it also makes the prediction much more complicated. These results demonstrate that our method provides valuable and reliable information for supporting sales and marketing departments also in the short term.

1.3. Outline

The remainder of this article is organized as follows. In Section 2 we formally describe the considered purchase prediction task, see Problem 2.2. In that section we also describe the features characterizing the customers at a specific time. In Section 3 we describe how to solve Problem 2.2 and therefore perform purchase prediction using machine learning tools. In particular, we use the logistic Lasso, the extreme learning machine and gradient tree boosting for model selection. Our framework for purchase prediction is applied in Section 4 to transactional B2B data of 10,000 customers and a total number of 200,000 transactions. Gradient tree boosting turns out to be the best performing model, showing an accuracy score of 88.98% and an AUC value of 0.949. The paper ends with a short discussion presented in Section 5.

2. Formal problem definition and description of our customer features

In this section we establish a mathematical framework that formally describes the customer purchase prediction task. We give particular emphasis to the description of the features characterizing the customer at a certain time instance, which are used for the subsequent predictive analysis.

2.1. Problem description

Suppose that certain customers purchase products or services at a given company. We suppose that the company has a total number of K customers. Any customer is represented by its ID k ∈ K := {1, ..., K}. Here and below, := means equal by definition. The purchases of customer k are characterized by purchase times t_{k,i} and purchase values V_{k,i} for i = 1, ..., N_k, with N_k denoting the total number of purchases of customer k. The whole transaction data of purchases made between month A and month B can be arranged in a list as shown in Table 2.1. Here and below, months are identified with elements in the set ℤ of all integers. Without loss of generality we assume that the purchases of any customer are ordered chronologically, that is t_{k,i1} ≤ t_{k,i2} for all i1 ≤ i2.

Table 2.1
Original transactional data. For every transaction of customer k one stores the time t_{k,i} and value V_{k,i} of its ith purchase made between month A and month B.

Customer ID | Order | Purchase time | Purchase value
k           | i     | t_{k,i}       | V_{k,i}
1           | 1     | t_{1,1}       | V_{1,1}
...         | ...   | ...           | ...
1           | N_1   | t_{1,N_1}     | V_{1,N_1}
...         | ...   | ...           | ...
K           | 1     | t_{K,1}       | V_{K,1}
...         | ...   | ...           | ...
K           | N_K   | t_{K,N_K}     | V_{K,N_K}

With the above notions, the problem under consideration can be stated as follows:

Problem 2.1 (Prediction of customer purchases). Given purchase data P_k := {(t_{k,i}, V_{k,i}) | i = 1, ..., N_k} for every customer k ∈ K between month A and month B (as illustrated in Table 2.1), predict whether a given customer makes a transaction in the month following B.

We address Problem 2.1 using machine learning algorithms. For that purpose we introduce some further notation. We define the binary variable y_{k,τ} that characterizes the purchase of customer k in month τ ∈ [A, B] := {A, A+1, ..., B} by

    y_{k,τ} := 1 if customer k makes a purchase in month τ, and 0 otherwise.   (2.1)

Further, for pairs (k, τ) ∈ K × [A, B] we construct a feature vector x_{k,τ} that characterizes the state of customer k at time τ based on purchase information of customer k up to month τ. Notice that the values y_{k,τ} are only known for τ ≤ B, and that we aim at estimating y_{k,B+1} for the upcoming month B+1. Therefore, Problem 2.1 can be reformulated as follows:

Problem 2.2 (Reformulation as supervised learning problem). Estimate the value of y_{k,B+1} from the feature vector x_{k,B+1} representing the behavior of customer k until month B, and known input-output pairs (x_{k,τ}, y_{k,τ}) for all k ∈ K and certain τ ∈ [A, B].

The efficient solution of Problem 2.2 requires the computation of significant customer features x_{k,τ}[1], ..., x_{k,τ}[M]. In this work we propose a certain set of M = 274 characteristic features that are listed in Table 2.2. A detailed description of these features and their computation is given in Section 2.3.

Remark 2.3 (Predicting different time frames). To keep the presentation simple, in the formal problem description we considered the case of predicting purchases within the following month. In a straightforward manner, Problem 2.1 can be generalized to predicting purchases of a customer within any future time frame [A + M1, A + M2]. In the case M1 < M2 the prediction period consists of several months, while the case M1 = M2 corresponds to purchase prediction within a single month. Next-month purchase prediction as formalized in Problem 2.1 corresponds to the case where M1 and M2 are both equal to one. In the case of a general prediction period, one can again solve a supervised machine learning task similar to Problem 2.2, where the label values (2.1) are modified to reflect the desired time frame. Actually, we also present results for two-month purchase prediction corresponding to M1 = M2 = 2; see Section 5.
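The construction of the labels (2.1) and of the training pairs for Problem 2.2 can be sketched in a few lines. This is an illustrative Python sketch, not the authors' implementation (the study was carried out in R); the tuple layout of the transaction records and the toy month indices are assumptions.

```python
from collections import defaultdict

def make_labels(transactions, A, B):
    """Binary purchase labels y[(k, tau)] as in Eq. (2.1).

    transactions: iterable of (customer_id, month, value) with A <= month <= B.
    Returns a dict mapping (k, tau) -> 1 if customer k purchased in month tau,
    and 0 otherwise, for every observed customer and every tau in [A, B].
    """
    bought = defaultdict(set)
    for k, month, _value in transactions:
        bought[k].add(month)
    return {(k, tau): int(tau in months)
            for k, months in bought.items()
            for tau in range(A, B + 1)}

# Hypothetical toy data: two customers observed between months A = 1 and B = 3.
tx = [(1, 1, 500.0), (1, 3, 120.0), (2, 2, 80.0)]
y = make_labels(tx, A=1, B=3)
# e.g. y[(1, 1)] == 1, y[(1, 2)] == 0, y[(1, 3)] == 1
```

Pairing these labels with feature vectors x_{k,τ} computed from months before τ then yields the supervised data set of Problem 2.2.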
Table 2.2
Characteristic customer features derived from the transactional raw data.

Characteristics related to purchase time
  Number of total purchases: x[1]
  Mean time between purchases: x[2]
  Standard deviation of purchase frequency: x[3]
  Maximal time without purchase: x[4]
  Time since last purchase: x[5]
  Thresholds for classification: x[6], x[7], x[8]
  Frequency classification: x[9]
Characteristics related to purchase value
  Moving averages: x[10], x[11], x[12]
  Maximum values of purchase: x[13], x[14]
  Mean values of purchase: x[15], x[16], x[17]
  Median values of purchase: x[18], x[19], x[20]
  Time frame variations: x[21], x[22]
  Purchase trend: x[23]
Further customer information
  Country of customer: x[24]
Creation of additional variables
  Pairwise products: x[25], ..., x[214]
  Powers of two and three: x[215], ..., x[254]
  Logarithms: x[255], ..., x[274]

2.2. Data binning and smoothing

The customer features will be extracted from the original purchase information, as well as from smoothed versions obtained by moving averages and a polynomial fit. For that purpose we first define the binned purchase data as the sum of all purchases of a customer within a given month,

    v_{k,τ} := Σ_{i: t_{k,i} = τ} V_{k,i}   for k ∈ K and τ ∈ [A, B].

In particular, we have v_{k,τ} = 0 if there is no purchase from customer k in month τ. Otherwise v_{k,τ} is equal to the corresponding purchase value. We use the moving average of order six of the binned purchase values, defined by

    v̄_{k,τ} := (1/6) Σ_{τ'=0}^{5} v_{k,τ−τ'}   for k ∈ K and τ ∈ [A+5, B].

With the computed moving average of the binned purchases we want to have a representative monthly purchase value for each customer. Those values allow us to compare customers in any month within the year, independently of whether they purchased in that month. In the case of our particular data set, it is very likely that a customer buys products at least twice a year. Therefore, a moving average of order six seems to be the best moving window for providing a representative value. Finally, we consider a polynomial regression approximation of order seven,

    v̂_k(t) = θ_0 + θ_1 t + θ_2 t² + ... + θ_7 t⁷.

Here t represents a continuous time variable and v̂_k is constructed such that v̂_k(j) ≈ v̄_{k,j} for j = τ−1, τ−2, ..., τ−T. The coefficients θ_i are determined using the elastic-net method. Actually, we will use v̂_{k,+}(t) := max{1, v̂_k(t)}. One motivation for introducing such a continuous domain representation is the high variability of the customer purchases. It aims at providing a representative for purchases within the customer lifetime that is not affected by the high volatility observed, which is especially high in the B2B domain. With a sufficient number of features, the elastic-net algorithm extracts a representative curve, and the employed regularization and cross validation yield a reliable estimate allowing the extraction of customer characteristics. In our analysis we have found that order seven is sufficient for extracting the purchase trends properly; we have found no significant changes in the final results when using higher-order polynomials. Fig. 2.1 shows an example of a customer's binned purchase data v_{k,τ} together with the corresponding 6th order moving average v̄_{k,τ} and the polynomial fit v̂_{k,+}(t).

2.3. Customer features

We are now ready to formally define the characterizing features x_{k,τ}[1], ..., x_{k,τ}[274] listed in Table 2.2. The features of any customer dynamically depend on the time τ. For their computation we use subsets of the purchase data (original as well as smoothed) containing purchases made between months τ − T and τ − 1, where T is some fixed time period. Formally, we define this past purchase data for τ ∈ [A + T, B + 1] as

    P_{k,τ} := {(t_{k,i}, V_{k,i}) ∈ P_k | τ − T ≤ t_{k,i} ≤ τ − 1}.

We denote the number of the customer's purchases in the time frame [τ − T, τ − 1] by N_{k,τ} and the corresponding purchase data by

    t^τ_{k,i} := t_{k,M_{k,τ}+i}   for i = 1, ..., N_{k,τ},
    V^τ_{k,i} := V_{k,M_{k,τ}+i}   for i = 1, ..., N_{k,τ},

where M_{k,τ} is the total number of purchases made prior to month τ − T.

2.3.1. Characteristics related to the purchase time

We first describe the characteristics related to the times of purchases.

• Number of purchases.
  The first considered feature is the number of the customer's purchases in the time frame [τ − T, τ − 1]. We take
  x_{k,τ}[1] := N_{k,τ} = |P_{k,τ}|,
  the number of elements in P_{k,τ}.

• Mean time between purchases.
  Next we consider the weighted average of the number of time units between purchases in P_{k,τ} (i.e., purchases in the time frame [τ − T, τ − 1]),
  x_{k,τ}[2] := Σ_{i=2}^{N_{k,τ}} w_i Δt^τ_{k,i}.
  Here Δt^τ_{k,i} := t^τ_{k,i} − t^τ_{k,i−1} is the number of time units between the ith and (i−1)th purchase in P_{k,τ}. In this work we propose to choose the weights as w_i := (i−1)² / Σ_{i=2}^{N_{k,τ}} (i−1)².

• Standard deviation of times between purchases.
  Using the same weights w_i as above, we define the weighted standard deviation of the number of time units between purchases in P_{k,τ} as
  x_{k,τ}[3] := ( Σ_{i=2}^{N_{k,τ}} w_i (Δt^τ_{k,i} − x_{k,τ}[2])² )^{1/2}.

• Maximal time without purchase.
  Here we consider the maximum number of time units between purchases in P_{k,τ},
  x_{k,τ}[4] := max{Δt^τ_{k,i} | i = 2, ..., N_{k,τ}}.

• Time since last purchase.
  The next feature measures the number of months since the last purchase in the time frame [τ − T, τ − 1] has been performed,
  x_{k,τ}[5] := τ − 1 − t^τ_{k,N_{k,τ}} if N_{k,τ} ≠ 0, and T otherwise.
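The five time-related features above can be sketched as follows in Python (the paper's implementation is in R; the function below, and in particular its handling of customers with fewer than two purchases in the window, are illustrative assumptions not specified in the text):

```python
import math

def time_features(purchase_months, tau, T):
    """Compute x[1]..x[5] from one customer's purchase months in [tau-T, tau-1].

    purchase_months: chronologically sorted month indices of the purchases.
    Returns (n, mean_gap, std_gap, max_gap, since_last); the gap features
    default to 0.0 when fewer than two purchases fall into the window
    (an assumption, this edge case is not specified in the text).
    """
    window = [t for t in purchase_months if tau - T <= t <= tau - 1]
    n = len(window)                                            # x[1]
    if n == 0:
        return 0, 0.0, 0.0, 0.0, float(T)                      # x[5] = T
    gaps = [window[j] - window[j - 1] for j in range(1, n)]    # delta t_i, i = 2..n
    if gaps:
        # weights w_i = (i-1)^2 / sum_{i=2}^{n} (i-1)^2: later gaps count more
        denom = sum((j + 1) ** 2 for j in range(len(gaps)))
        weights = [(j + 1) ** 2 / denom for j in range(len(gaps))]
        mean_gap = sum(w * g for w, g in zip(weights, gaps))             # x[2]
        std_gap = math.sqrt(sum(w * (g - mean_gap) ** 2
                                for w, g in zip(weights, gaps)))         # x[3]
        max_gap = float(max(gaps))                                       # x[4]
    else:
        mean_gap = std_gap = max_gap = 0.0
    since_last = tau - 1 - window[-1]                                    # x[5]
    return n, mean_gap, std_gap, max_gap, since_last

# e.g. purchases in months 10, 12, 16 with tau = 18 and T = 12 give
# gaps (2, 4), weights (0.2, 0.8), weighted mean gap 3.6, and one month
# since the last purchase
```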
Fig. 2.1. An example of a customer's purchase data (t_{k,i}, V_{k,i}), visualized by asterisks. The corresponding 6th order moving averages v̄_{k,τ} are visualized by circles. The fitted purchase value function v̂_{k,+}(t) is visualized by the dashed curve. (Axes: month vs. monetary value; legend: purchase value, moving average of order 6, elastic net.)
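The binning-and-smoothing pipeline illustrated in Fig. 2.1 can be sketched as follows. This Python sketch is illustrative only: the paper determines the polynomial coefficients by elastic-net regression, whereas here a plain least-squares fit (numpy.polyfit) stands in for it.

```python
import numpy as np

def smooth_purchases(monthly_values):
    """Order-6 moving average and a degree-7 polynomial fit of binned purchases.

    monthly_values: 1-D array v[tau] with one summed purchase value per month.
    Returns (moving_avg, fitted); fitted is clipped from below at 1, mimicking
    v_hat_plus(t) = max(1, v_hat(t)). Needs at least 13 months of data so that
    the degree-7 fit has enough averaged points.
    """
    v = np.asarray(monthly_values, dtype=float)
    # moving average of order six: mean of the current and five preceding months
    moving_avg = np.convolve(v, np.ones(6) / 6.0, mode="valid")
    # degree-7 polynomial fit (least squares here; the paper uses elastic net)
    t = np.arange(len(moving_avg), dtype=float)
    coeffs = np.polyfit(t, moving_avg, deg=7)
    fitted = np.maximum(1.0, np.polyval(coeffs, t))
    return moving_avg, fitted
```

The clipping at 1 matters for the trend feature x[23] below, where the fitted values appear in a denominator.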
• Thresholds for classification.
  We consider certain thresholds for the number of time units between purchases,
  x_{k,τ}[6] := x_{k,τ}[2] + h_1 x_{k,τ}[3],
  x_{k,τ}[7] := x_{k,τ}[2] + h_2 x_{k,τ}[3],
  x_{k,τ}[8] := x_{k,τ}[2] + h_3 x_{k,τ}[3],
  where h_1, h_2, h_3 are some positive numbers. We propose to take h_1 = 2, h_2 = 4 and h_3 = 8.

• Frequency classification.
  The next characteristic is a categorical feature that classifies the customer k according to the purchase frequency. It is defined by
  x_{k,τ}[9] := normal if x_{k,τ}[5] ≤ x_{k,τ}[6]; attrition if x_{k,τ}[6] < x_{k,τ}[5] ≤ x_{k,τ}[7]; at-risk if x_{k,τ}[7] < x_{k,τ}[5] ≤ x_{k,τ}[8]; lost if x_{k,τ}[8] < x_{k,τ}[5].

• Maximum values of purchase.
  Here we take two different characteristics, defined as the maxima over the actual purchases and the polynomial fit, respectively,
  x_{k,τ}[13] := max{V^τ_{k,i} | i = 1, ..., N_{k,τ}},
  x_{k,τ}[14] := max{v̂_{k,+}(t) | t ∈ [τ − T, τ − 1]}.

• Mean values of purchase.
  Here we consider the mean values of the actual, the binned and the fitted purchase values, defined by
  x_{k,τ}[15] := (1/N_{k,τ}) Σ_{i=1}^{N_{k,τ}} V^τ_{k,i},
  x_{k,τ}[16] := (1/T) Σ_{t=τ−T}^{τ−1} v_{k,t},
  x_{k,τ}[17] := (1/T) Σ_{t=τ−T}^{τ−1} v̂_{k,+}(t).
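The threshold-based frequency classification above can be sketched as follows (an illustrative Python fragment; the function name and its argument interface are assumptions, only the proposed values h1 = 2, h2 = 4 and h3 = 8 come from the text):

```python
def frequency_class(mean_gap, std_gap, since_last, h=(2.0, 4.0, 8.0)):
    """Categorical feature x[9] computed from x[2], x[3] and x[5].

    The thresholds x[6], x[7], x[8] are mean_gap + h_j * std_gap with the
    proposed h1 = 2, h2 = 4 and h3 = 8.
    """
    t1, t2, t3 = (mean_gap + hj * std_gap for hj in h)
    if since_last <= t1:
        return "normal"
    if since_last <= t2:
        return "attrition"
    if since_last <= t3:
        return "at-risk"
    return "lost"

# e.g. a mean gap of 3 months with standard deviation 1 gives thresholds
# 5, 7 and 11; a customer silent for 8 months is then classified "at-risk"
```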
We also consider a categorical characteristic for time frame pairs (x, y) from the so-called training data set. Due to data imper-
variation that we define as follows: fections, not all of the training examples will be predicted exactly
by the classifier. In our case, the training data set takes the form
steady if xk,τ [21] < −μ ,
xk,τ [22] within − limits if |xk,τ [21]| ≤ μ , D { xk,τ , yk,τ |k ∈ K , A + T ≤ τ ≤ B} . (3.1)
alternating if xk,τ [21] > μ . Actually, all of our considered methods output a regression
Here μ is some positive value; we propose μ = 0.3. function mapping to the real numbers,
• Purchase trend. φ : RM → [0, 1] ⊆ R : x
→ φ (x ) . (3.2)
We characterize the purchase trend as a categorial variable de-
The output value φ (x) of the regression function can be inter-
pending on the relative change
preted as the probability that a feature vector x corresponds to
vˆ k,+ (τ − 1 ) − vˆ k,+ (τ − 6 ) a next month purchase. From the estimated purchase probabili-
dk,τ
vˆ k,+ (τ − 6 ) ties one constructs the classifier = λ by taking λ (x ) = 1 for
φ (x) > λ and zero otherwise. Here λ ∈ [0, 1] is a certain threshold
of the fitted purchase values. More precisely, we define the pur-
that is selected as tradeoff between sensitivity and specificity. In
chase trend by
⎧ this work we use a threshold of 0.5 for the final classification.
⎪decreasing−− if dk,τ ≤ −a3 For constructing the regression function in (3.2), we apply the
⎪
⎪
⎪
⎪decreasing− if − a3 < dk,τ ≤ −a2 following state-of-the art machine learning algorithms:
⎪
⎨decreasing if − a2 < dk,τ ≤ −a1 • Logistic Lasso regression;
xk,τ [23] stable if − a1 < dk,τ ≤ a1 • Extreme learning machine;
⎪
⎪ if a1 < dk,τ ≤ a2
⎪increasing
⎪ • Gradient tree boosting.
⎪
⎪ if a2 < dk,τ ≤ a3
⎩increasing+ These methods are briefly reviewed in the following subsec-
increasing++ dk,τ > a3 .
tions. Any of these methods will be used in combination with
Here a1 , a2 and a3 are some positive values; we propose to take 10-fold cross validation for estimating optimal values of the pa-
a1 = 0.15, a2 = 0.225, a3 = 0.3. rameters these methods depend on. In particular, applying cross
validation avoids overfitting on the training data set and therefore
2.3.3. Additional characteristics allows to generalize the trained models to predicting customer
Beside the characteristics described above we also use the cate- purchases where the next-month purchase is not known.
gorical characteristic xk, τ [24] denoting the country of the customer We decided on the above classification methods because they
k. In order to further increase the prediction accuracy we compute are totally different from one another, and further are known to
auxiliary variables from the variables xk, τ [m] excluding the four yield high accuracy with reasonable computational effort.
categorical characteristics.
The auxiliary variables are created by applying the following 3.2. Logistic Lasso regression
mathematical operations to the original variables:
For Lasso regression we use the logistic model which is one
• Pairwise products. of the most common models used in the context of classification
Here we form all products xk, τ [m] · xk, τ [m ] of the non- (Hastie, Tibshirani, & Friedman, 2009). We estimated the coeffi-
categorial features with m = m . This yields 19 + 18 + · · · + 1 = cients (β j ) in the logistic model by adding the 1 -penalty term
190 additional variables.
d
• Powers of two and three.
R (β ) β j ,
We further consider powers xk, τ [m]2 and xk, τ [m]3 of all non-
j=1
categorial features. This yields 20 + 20 = 40 additional vari-
ables. which is known as the Lasso (Tibshirani, 1996). The Lasso penalty
d
• Taking Logarithm.
j=1 β j results in variable selection and shrinkage. The purpose
Finally, we add the logarithms of the non-categorical features, log(xk,τ[m]). This yields 20 additional variables.

In summary, we have M = 24 + 190 + 40 + 20 = 274 variables xk,τ[m] characterizing customer k at time τ. Using these variables we will train a classifier that predicts the binary purchase variable y. Although the creation of the artificial variables does not, in principle, increase the information content of the data, it puts the data into a higher-dimensional space and significantly improves the results of the machine learning algorithms. For example, the powers contain interactions between the variables which would otherwise be difficult for the algorithms to find.

3. Application of machine learning algorithms

In this section we solve Problem 2.2 (the formally described purchase prediction problem) with various machine learning algorithms for binary classification.

3.1. Binary classification

A binary classification algorithm constructs a function C : R^M → {0, 1} in such a way that C(x) = y with high probability for

of this penalty is to retain a subset of the characteristics and to discard the rest. This subset selection produces a model that is interpretable and possibly has a lower prediction error than the full model.
For numerically computing the coefficients we use the algorithm for logistic Lasso regression provided in the package glmnet (see Friedman et al., 2010, Chapter 3).

3.3. Extreme learning machine

Another model that we consider is the single-hidden-layer feedforward neural network (SLFN). We use the extreme learning machine algorithm (Huang et al., 2006) for building the SLFN on our training data. The extreme learning machine algorithm has become a very popular research subject in the past years (Huang, 2015). Unlike other algorithms for building neural networks, the extreme learning machine randomly chooses the hidden nodes and analytically determines the output weights of the SLFN. The extreme learning machine provides good theoretical performance at a very fast learning speed.
For our results we use the implementation of this algorithm provided in the package elmNN (see Gosso & Martinez-de-Pison, 2012).
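The paper builds the SLFN with the elmNN R package. The core idea of the extreme learning machine, a randomly chosen and fixed hidden layer whose output weights are determined analytically by least squares, can be sketched in a few lines; this is a minimal illustration in Python on toy data, not the paper's implementation (function names and data are ours):

```python
import numpy as np

def elm_train(X, y, n_hidden=40, seed=0):
    """Extreme learning machine: random fixed hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))    # random input weights (never trained)
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = np.tanh(X @ W + b)                         # hidden-layer activation matrix
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights: argmin ||H beta - y||
    return W, b, beta

def elm_predict(X, W, b, beta):
    return (np.tanh(X @ W + b) @ beta >= 0.5).astype(int)

# Toy data: label is 1 iff the two features sum to more than 1
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
y = (X.sum(axis=1) > 1).astype(float)
W, b, beta = elm_train(X, y)
train_acc = (elm_predict(X, W, b, beta) == y).mean()
```

Because only the least-squares step involves fitting, training is essentially a single linear solve, which is what makes the method so fast.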
594 A. Martínez et al. / European Journal of Operational Research 281 (2020) 588–596
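The subset-selection behaviour of the logistic Lasso described in Section 3.2 (which the paper fits with the R package glmnet) can be reproduced with a small proximal-gradient sketch; the soft-thresholding step is exactly what drives coefficients of irrelevant features to zero. The data, penalty value, and function names below are illustrative assumptions:

```python
import numpy as np

def logistic_lasso(X, y, lam=0.1, lr=0.1, n_iter=3000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA)."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))     # predicted purchase probabilities
        w = w - lr * (X.T @ (p_hat - y)) / n       # gradient step on the logistic loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-thresholding (L1 prox)
    return w

# Toy data: 10 candidate features, but only the first two carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300) > 0).astype(float)
w = logistic_lasso(X, y)
selected = np.flatnonzero(np.abs(w) > 1e-8)        # retained subset of features
acc = ((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y.astype(bool)).mean()
```

The retained subset should contain the two informative features, illustrating why the Lasso model is interpretable: the surviving coefficients name the characteristics that matter.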
4. Results
We apply the developed framework for the prediction of customer purchases on transactional data provided by a large manufacturer located in central Europe. The data have been gathered from transactions of the B2B unit, which have been recorded from January 2009 until May 2015. We only consider transactions of customers whose first purchase was at least six months ago, due to the lack of sufficient information in the other cases.

The transactions belong to K = 10136 different customers from 125 different countries. As the time unit we consider a month, as it is not very common in the considered data to have a customer with more than one purchase per month. If a customer has more than one purchase in a month, then for the actual purchase values Vi we take the sum of the purchase values in the considered month. After this monthly aggregation, the data set contains in total 192,470 orders for all customers. We take January 2009 as month A = 1, such that May 2015 corresponds to month 77. The time period for computing the feature values is taken as T = 24.

Table 4.3
Confusion matrix for the gradient tree boosting method evaluated on the independent test data for the prediction of purchases in April 2015. The total prediction accuracy computes to 88.98% and the AUC equals 0.949.

Gradient tree boosting    Actual purchase: Yes (%)    Actual purchase: No (%)
Predicted: Yes            21.70                       6.37
Predicted: No             4.66                        67.28

All computations have been performed on a virtual machine on an ESX cluster with 12 cores and 60 gigabytes of RAM. The operating system is SUSE Linux Enterprise Server, and we have run the scripts using RStudio Server with R version 3.1.2 underneath. The computation times for training a single model are 6 minutes for the Lasso, about one minute for the extreme learning machine, and 2.5 minutes for gradient tree boosting.
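The headline figures can be checked directly from the cells of Table 4.3. Reading the rows as the model's predictions and the columns as the actual outcome (all cells are percentages of the test set), accuracy is the diagonal mass; precision and recall for the purchase class follow from the same cells and are derived here as an illustration (the paper reports only accuracy and AUC):

```python
# Cells of Table 4.3, in percent of the test set
pred_yes_actual_yes = 21.70   # correctly predicted purchases
pred_yes_actual_no  = 6.37    # predicted purchase, none occurred
pred_no_actual_yes  = 4.66    # missed purchases
pred_no_actual_no   = 67.28   # correctly predicted non-purchases

accuracy  = (pred_yes_actual_yes + pred_no_actual_no) / 100   # diagonal mass -> 0.8898, as reported
precision = pred_yes_actual_yes / (pred_yes_actual_yes + pred_yes_actual_no)
recall    = pred_yes_actual_yes / (pred_yes_actual_yes + pred_no_actual_yes)
```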
side, Agrawal, Nottebohm, and West (2010) noted that 1.2% of revenue represents a robust benchmark for travel cost in the capital goods industry; Berard (2014) found that 10% and more of the annual company budget can be attributed to travel & entertainment, where the lion's share is linked to sales. Saving up to 20% on average on these costs, or reinvesting them into customers who are ready to buy, represents a very significant improvement lever.
References

Allenby, G. M., Leone, R. P., & Jen, L. (1999). A dynamic model of purchase timing with application to direct marketing. Journal of the American Statistical Association, 94(446), 365–374.
Alon, I., Qi, M., & Sadowski, R. J. (2001). Forecasting aggregate retail sales: A comparison of artificial neural networks and traditional methods. Journal of Retailing and Consumer Services, 8(3), 147–156.
Barnes, L., & Lea-Greenwood, G. (2006). Fast fashioning the supply chain: Shaping the research agenda. Journal of Fashion Marketing and Management: An International Journal, 10(3), 259–271.
Berard, L. (2014). The travel and expense management guide for 2014: Trends for the future. Last retrieved 31.10.2017, http://www.aberdeenessentials.com/opspro-essentials/the-travel-and-expense-management-guide-for-2014-trends-for-the-future.
Berger, P. D., Bolton, R. N., Bowman, D., Briggs, E., Kumar, V., Parasuraman, A., & Terry, C. (2002). Marketing actions and the value of customer assets: A framework for customer asset management. Journal of Service Research, 5(1), 39–54.
Bhattacharya, C. B. (1998). When customers are members: Customer retention in paid membership contexts. Journal of the Academy of Marketing Science, 26(1), 31–44.
Boles, J. S., Barksdale, H. C., & Johnson, J. T. (1997). Business relationships: An examination of the effects of buyer-salesperson relationships on customer retention and willingness to refer and recommend. Journal of Business & Industrial Marketing, 12(3/4), 253–264.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Buckinx, W., & Van den Poel, D. (2005). Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. European Journal of Operational Research, 164(1), 252–268.
Burez, J., & Van den Poel, D. (2007). CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services. Expert Systems with Applications, 32(2), 277–288.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. arXiv:1603.02754.
Chen, Z., Fan, Z., & Sun, M. (2012). A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. European Journal of Operational Research, 223(2), 461–472.
Daly, J. L. (2002). Pricing for profitability: Activity-based pricing for competitive advantage. John Wiley & Sons.
Das, S. K., & Abdel-Malek, L. (2003). Modeling the flexibility of order quantities and lead-times in supply chains. International Journal of Production Economics, 85(2), 171–181.
Eriksson, K., & Vaghult, A. L. (2000). Customer retention, purchasing behavior and relationship substance in professional services. Industrial Marketing Management, 29(4), 363–372.
Fader, P. S., & Hardie, B. G. (2009). Probability models for customer-base analysis. Journal of Interactive Marketing, 23(1), 61–69.
Freund, Y., Schapire, R., & Abe, N. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780.
Freund, Y., Schapire, R. E., et al. (1996). Experiments with a new boosting algorithm. In Proceedings of the thirteenth international conference on machine learning (pp. 148–156). Bari, Italy.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
Ganesh, J., Arnold, M. J., & Reynolds, K. E. (2000). Understanding the customer base of service providers: An examination of the differences between switchers and stayers. Journal of Marketing, 64(3), 65–87.
Gosso, A., & Martinez-de-Pison, F. (2012). elmNN: Implementation of Extreme Learning Machine algorithm for single hidden layer feed forward neural networks. R package version 1.
Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2007). Computer assisted customer churn management: State-of-the-art and future trends. Computers & Operations Research, 34(10), 2902–2917.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Hellman, K. (2005). Strategy-driven B2B promotions. Journal of Business & Industrial Marketing, 20(1), 4–11.
Herniter, J. (1971). A probabilistic market model of purchase timing and brand selection. Management Science, 18(4-part-ii), 102–113.
Huang, G. (2015). What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle. Cognitive Computation, 7(3), 263–278.
Huang, G., Zhu, Q., & Siew, C. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70(1), 489–501.
Jahromi, A. T., Stakhovych, S., & Ewing, M. (2014). Managing B2B customer churn, retention and profitability. Industrial Marketing Management, 43(7), 1258–1268.
Keaveney, S. M., & Parthasarathy, M. (2001). Customer switching behavior in online services: An exploratory study of the role of selected attitudinal, behavioral, and demographic factors. Journal of the Academy of Marketing Science, 29(4), 374–390.
Kumar, V., Venkatesan, R., & Reinartz, W. (2008). Performance implications of adopting a customer-focused sales campaign. Journal of Marketing, 72(5), 50–68.
Larivière, B., & Van den Poel, D. (2005). Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Systems with Applications, 29(2), 472–484.
Miguéis, V. L., Camanho, A., & e Cunha, J. F. (2013). Customer attrition in retailing: An application of multivariate adaptive regression splines. Expert Systems with Applications, 40(16), 6225–6232.
Mulhern, F. J. (1999). Customer profitability analysis: Measurement, concentration, and research directions. Journal of Interactive Marketing, 13(1), 25–40.
Neslin, S. A., Gupta, S., Kamakura, W., Lu, J., & Mason, C. H. (2006). Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43(2), 204–211.
Platzer, M., & Reutterer, T. (2016). Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Science, (5), 799.
Van den Poel, D., & Lariviere, B. (2004). Customer attrition analysis for financial services using proportional hazard models. European Journal of Operational Research, 157(1), 196–217.
Rauyruen, P., & Miller, K. E. (2007). Relationship quality as a predictor of B2B customer loyalty. Journal of Business Research, 60(1), 21–31.
Reinartz, W. J., & Kumar, V. (2003). The impact of customer relationship characteristics on profitable lifetime duration. Journal of Marketing, 67(1), 77–99.
Rust, R. T., Kumar, V., & Venkatesan, R. (2011). Will the frog change into a prince? Predicting future customer profitability. International Journal of Research in Marketing, 28(4), 281–294.
Srivastava, R. K., Shervani, T. A., & Fahey, L. (1999). Marketing, business processes, and shareholder value: An organizationally embedded view of marketing activities and the discipline of marketing. The Journal of Marketing, 63, 168–179.
Stone, M., Woodcock, N., & Wilson, M. (1996). Managing the change from marketing planning to customer relationship management. Long Range Planning, 29(5), 675–683.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
Varadarajan, P. R., & Yadav, M. S. (2002). Marketing strategy and the internet: An organizing framework. Journal of the Academy of Marketing Science, 30(4), 296–312.
Venkatesan, R., & Kumar, V. (2004). A customer lifetime value framework for customer selection and resource allocation strategy. Journal of Marketing, 68(4), 106–125.
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., & Baesens, B. (2012). New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research, 218(1), 211–229.
Wiersema, F. (2013). The B2B agenda: The current state of B2B marketing and a look ahead. Industrial Marketing Management, 42(4), 470–488.
Willemain, T. R., Smart, C. N., & Schwarz, H. F. (2004). A new approach to forecasting intermittent demand for service parts inventories. International Journal of Forecasting, 20(3), 375–387.
Willemain, T. R., Smart, C. N., Shockor, J. H., & DeSautels, P. A. (1994). Forecasting intermittent demand in manufacturing: A comparative evaluation of Croston's method. International Journal of Forecasting, 10(4), 529–538.
Winer, R. S. (2001). A framework for customer relationship management. California Management Review, 43(4), 89–105.
Wübben, M., & Wangenheim, F. (2008). Instant customer base analysis: Managerial heuristics often "get it right". Journal of Marketing, 72(3), 82–93.
Xia, G., & Jin, W. (2008). Model of customer churn prediction on support vector machine. Systems Engineering-Theory & Practice, 28(1), 71–77.
Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445–5449.
Zeithaml, V. A., Rust, R. T., & Lemon, K. N. (2001). The customer pyramid: Creating and serving profitable customers. California Management Review, 43(4), 118–142.