CLV Report
Submitted by
SAMIKSHA.V.T 142220205075
SUBHALAKSHMI.V 142220205092
SUPRAJA. G 142220205095
SUWAASHA.M 142220205097
of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
SRM NAGAR,
KATTANKULATHUR,
CHENGALPATTU
MARCH 2024
ANNA UNIVERSITY::CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “ ” is the bonafide work of “Samiksha V.T,
Subhalakshmi.V, Supraja.G, Suwaasha.M” who carried out the project work
under my supervision.
First and foremost, we would like to extend our heartfelt respect to the
Management, Director Dr. B. Chidhambara Rajan, M.E., Ph.D., Principal
Dr. M. Murugan, M.E., Ph.D., and Vice Principal Dr. Visalakshi Selvaraj
M.E., Ph.D., who helped us in our endeavors.
Finally, the constant support from our lovable Parents and Friends is
untold and immeasurable.
ABSTRACT
The Customer Lifetime Value (CLV) project integrates the Random Forest algorithm in
machine learning to accurately predict and optimize the long-term value of customers,
utilizing historical data encompassing purchase history, customer demographics, and
interaction patterns. The inclusion of a Customer Segmentation module enhances the
project's capabilities by categorizing customers based on their behaviors and
characteristics. This segmentation provides valuable insights into distinct customer
groups, enabling personalized strategies to maximize CLV. The Random Forest
algorithm, known for its adept handling of complex relationships, ensures precise CLV
predictions and identifies influential factors affecting customer value. Additionally, the
project offers a scalable and interpretable solution, empowering businesses with data-
driven decision-making. The combination of CLV prediction and customer
segmentation enhances the strategic approach to targeted marketing, allowing for
tailored campaigns and optimized resource allocation.
KEYWORDS:
Customer Lifetime Value, CLV, machine learning, Random Forest algorithm, customer
segmentation, personalized strategies, historical data, accuracy, targeted marketing.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 OVERVIEW
1.2 TECHNOLOGY
1.2.1 MACHINE LEARNING
1.2.2 DATA COLLECTION AND MANAGEMENT
RF Random Forest
CS Customer Segmentation
K-Means K-Means Clustering
ML Machine Learning
DT Decision Tree
CHAPTER 1
INTRODUCTION
1. INTRODUCTION
The Random Forest algorithm was chosen because it handles complex, non-linear
relationships between features well. It supports accurate CLV predictions, and
its feature importance scores help identify the key factors influencing
customer value. This project goes beyond mere prediction, providing practical
insights for informed strategic decision-making.
This project aims to utilize the Random Forest algorithm in machine learning to
revolutionize how businesses manage customer relationships. Through a
meticulous analysis of historical data, including purchase history, customer
demographics, and interactions, the goal is to develop a robust Customer Lifetime
Value (CLV) model.
Precision: The Random Forest algorithm ensures accurate CLV predictions and
identifies key factors influencing customer value.
The Customer Lifetime Value (CLV) project can typically be divided into several
phases, including:
Model Development:
- Selecting and implementing the Random Forest algorithm for CLV prediction.
- Integrating a Customer Segmentation module to enhance the precision of
customer categorization.
K-Means:
- K-Means is applied in the Customer Segmentation module of the CLV project to
categorize customers based on their behavior and characteristics. It partitions
customers into k clusters to facilitate personalized strategies for different segments.
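The segmentation step can be illustrated with a minimal K-Means sketch in plain Python. The customer tuples (recency in days, purchase frequency, monetary value) are made-up values, and seeding the centroids with the first k points is an assumption chosen only to keep the example deterministic. Features on very different scales would normally be standardized first; that step is omitted here.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Assumption: the first k points seed the centroids so the run is
    # deterministic. Real projects usually use random or k-means++ seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical customers: (recency_days, frequency, monetary)
customers = [(5, 20, 900.0), (200, 2, 40.0), (8, 18, 850.0), (220, 1, 30.0)]
labels, centroids = kmeans(customers, k=2)
```

In this toy data the two recent, frequent, high-spend customers end up in one cluster and the two lapsed, low-spend customers in the other.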
Data collection and management are crucial aspects of the Customer Lifetime
Value (CLV) evaluation and segmentation project.
Data Collection:
- Gather data from various sources, including transaction records, customer
interactions, demographics, and any relevant touchpoints. This comprehensive
dataset provides a holistic view of customer behavior.
Previous Data:
- Collect a sufficient historical dataset to capture trends and patterns over time.
The historical data should span a period long enough to encompass various
customer behaviors and interactions.
Data Integrity:
- Ensure data quality by addressing issues such as missing values, outliers, and
inaccuracies. Clean and preprocess the data to create a reliable foundation for
accurate CLV predictions and effective customer segmentation.
Data Privacy and Compliance:
- Adhere to data privacy regulations and ensure compliance with relevant laws.
Implement anonymization and encryption measures to protect sensitive customer
information.
Customer Identifiers:
- Use unique customer identifiers to track individual customer journeys across
different touchpoints. This ensures accurate linkage of data and the creation of a
cohesive customer profile.
Feature Selection:
- Identify and select relevant features (variables) that contribute to CLV prediction
and customer segmentation. Consider factors such as purchase frequency, recency,
monetary value, and customer demographics.
Data Storage:
- Utilize secure and scalable data storage solutions, such as relational (SQL)
databases. Cloud-based storage can provide flexibility and accessibility for data
processing.
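As an illustration of the data integrity step listed above, the following sketch fills missing order amounts with the median and clips outliers using the common 1.5 x IQR rule. Both choices are assumptions made for this example rather than the project's documented method, and the sample amounts are invented.

```python
import statistics

def clean_amounts(amounts):
    """Fill missing values with the median, then clip outliers (1.5 * IQR rule)."""
    known = [a for a in amounts if a is not None]
    median = statistics.median(known)
    q1, _, q3 = statistics.quantiles(known, n=4)  # quartiles of the known values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    cleaned = []
    for a in amounts:
        value = median if a is None else a        # impute missing values
        cleaned.append(min(max(value, low), high))  # clip to [low, high]
    return cleaned

# Hypothetical order amounts with one missing value and one extreme outlier
raw = [20.0, 22.0, 25.0, None, 27.0, 30.0, 32.0, 35.0, 38.0, 5000.0]
cleaned = clean_amounts(raw)
```

Clipping keeps the customer in the dataset while limiting the influence of a single extreme order; dropping the row instead is an equally valid choice when the value is known to be an error.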
1.3 SYSTEM ANALYSIS
Data Collection and Integration: The first step in building a CLV system is to
collect and integrate relevant data from various sources. This may include
transactional data (purchase history, order frequency, order value), demographic data
(age, gender, location), behavioral data (website visits, click-through rates, time spent
on site), and any other relevant customer interactions. This data is then cleaned,
transformed, and integrated into a unified dataset for analysis.
Feature Extraction: Once the data is collected, feature engineering is performed to
extract meaningful features that can be used to predict CLV. This may involve
creating variables such as recency (time since last purchase), frequency (number of
purchases over a certain period), monetary value (average order value), customer
tenure (time since first purchase), and various other customer attributes.
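A minimal sketch of this feature extraction, assuming transactions arrive as (customer_id, purchase_date, amount) tuples. The function name, the field names and the sample data are hypothetical.

```python
from datetime import date

def rfm_features(transactions, as_of):
    """Compute recency, frequency, monetary value and tenure per customer."""
    per_customer = {}
    for customer_id, purchase_date, amount in transactions:
        per_customer.setdefault(customer_id, []).append((purchase_date, amount))
    features = {}
    for customer_id, rows in per_customer.items():
        dates = [d for d, _ in rows]
        amounts = [a for _, a in rows]
        features[customer_id] = {
            "recency_days": (as_of - max(dates)).days,   # time since last purchase
            "frequency": len(rows),                      # number of purchases
            "monetary": sum(amounts) / len(amounts),     # average order value
            "tenure_days": (as_of - min(dates)).days,    # time since first purchase
        }
    return features

# Hypothetical transaction log
transactions = [
    ("C1", date(2024, 1, 5), 120.0),
    ("C1", date(2024, 2, 20), 80.0),
    ("C2", date(2023, 11, 1), 40.0),
]
features = rfm_features(transactions, as_of=date(2024, 3, 1))
```

Here frequency counts all purchases in the log; restricting it to a fixed window, as the text suggests, would only require filtering the transactions by date first.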
Model Development:
Model Validation and Evaluation: The developed models are then validated and
evaluated using appropriate metrics such as Mean Absolute Error (MAE), Root Mean
Squared Error (RMSE), or Area Under the Curve (AUC) for classification tasks. This
ensures that the models are accurately capturing the underlying patterns in the data
and can generalize well to unseen data.
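The two regression metrics named above can be computed directly. This sketch uses invented actual and predicted CLV values purely to show the arithmetic.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical CLV values for three held-out customers
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

For these values the errors are 10, 10 and 30, so MAE is 50/3 (about 16.67) and RMSE is the square root of 1100/3 (about 19.15). RMSE is never smaller than MAE, and a large gap between the two points to a few large errors.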
Deployment and Integration: Once the models are validated, they are deployed into
the production environment and integrated into the company's existing systems. This
could involve developing APIs for real-time predictions or batch processing pipelines
for periodic updates of CLV estimates.
Business Insights and Decision Making: Finally, the CLV estimates generated by
the system are used to inform various business decisions such as customer acquisition
strategies, retention campaigns, pricing optimization, and resource allocation. By
understanding the lifetime value of different customer segments, the company can
allocate resources more effectively and maximize long-term profitability.
Privacy and Data Security Concerns: E-commerce CLV systems often collect
and analyze large amounts of customer data, including personal information and
purchase history. This raises concerns about privacy and data security, especially in
light of increasing regulatory scrutiny and consumer expectations regarding data
protection. Businesses must implement robust data security measures and comply
with relevant regulations such as GDPR and CCPA to protect customer privacy.
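One common protective measure is to replace raw customer identifiers with keyed hashes before analysis. The sketch below is an illustration under stated assumptions, not a compliance recipe: the function name and the demo key are hypothetical, and a real key would come from a secrets store. This technique is pseudonymization rather than full anonymization, because anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

def pseudonymize(customer_id, secret_key):
    """Return a stable keyed hash (HMAC-SHA256) of a customer identifier."""
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Assumption: a demo key for illustration only; never hard-code real keys.
demo_key = b"demo-key"
token_a = pseudonymize("CUST-001", demo_key)
token_b = pseudonymize("CUST-002", demo_key)
```

The same identifier always maps to the same token, so purchase histories can still be linked per customer without exposing the original identifier in the analytics dataset.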
The proposed system utilizes machine learning capabilities to develop and deploy a
Random Forest model for calculating Customer Lifetime Value (CLV). It begins by
collecting and preparing historical customer transaction data, ensuring it is
formatted appropriately for analysis. Through data exploration and feature
engineering, relevant features such as recency, frequency, and monetary value are
extracted, alongside additional metrics like customer tenure and RFM scores.
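RFM scores are often built by ranking customers on each dimension and binning the ranks. The sketch below uses rank-based bins from 1 to 5, which is an assumption: the report does not state the scoring scheme. The five customers are invented.

```python
def quantile_scores(values, bins=5, higher_is_better=True):
    """Score each value from 1 (worst) to `bins` (best) by rank position."""
    # Sort indices from worst to best; ties keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores

# Hypothetical per-customer features
recency_days = [10, 121, 45, 300, 5]
frequency = [2, 1, 4, 3, 9]
monetary = [100.0, 40.0, 60.0, 20.0, 250.0]

r = quantile_scores(recency_days, higher_is_better=False)  # recent is better
f = quantile_scores(frequency)
m = quantile_scores(monetary)
rfm = [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
```

The last customer, who bought most recently, most often and for the highest value, receives the top score "555", while the fourth customer scores lowest on recency and monetary value.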
Using the prepared dataset, a Random Forest regression model is trained and fine-
tuned using machine learning tools. This model is then evaluated for performance,
typically using metrics like Root Mean Squared Error (RMSE) or Mean Absolute
Error (MAE). With the trained model, predictions are made to estimate future
customer spend or revenue, thereby enabling the calculation of CLV estimates for
individual customers or segments.
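To show what the Random Forest regression step does internally, here is a small self-contained sketch: each tree is trained on a bootstrap sample, considers a random subset of features at each split, and the forest averages the tree predictions. It is a teaching sketch, not the project's implementation, which would more likely rely on a library. The training rows (recency in days, frequency, average order value), the targets (next-period spend), the tree depth and the sqrt(p) feature-subset size are all assumptions.

```python
import random
import statistics

def _sse(values):
    """Sum of squared errors around the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, depth, n_features, rng):
    """Grow one regression tree; a leaf is the mean target of its rows."""
    if depth == 0 or len(set(targets)) == 1:
        return statistics.fmean(targets)
    best = None  # (sse, feature, threshold)
    for feature in rng.sample(range(len(rows[0])), n_features):
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of a split are always non-empty.
        for threshold in sorted({row[feature] for row in rows})[:-1]:
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, feature, threshold)
    if best is None:  # the sampled features were constant; no split possible
        return statistics.fmean(targets)
    _, feature, threshold = best

    def side(keep_left):
        pairs = [(r, t) for r, t in zip(rows, targets)
                 if (r[feature] <= threshold) == keep_left]
        return build_tree([r for r, _ in pairs], [t for _, t in pairs],
                          depth - 1, n_features, rng)

    return (feature, threshold, side(True), side(False))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return node

def train_forest(rows, targets, n_trees=50, depth=4, seed=0):
    rng = random.Random(seed)
    n_features = max(1, round(len(rows[0]) ** 0.5))  # assumption: sqrt(p) per split
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in sample],
                                 [targets[i] for i in sample],
                                 depth, n_features, rng))
    return forest

def predict(forest, row):
    """Average the predictions of all trees."""
    return statistics.fmean([predict_tree(tree, row) for tree in forest])

# Hypothetical training data: (recency_days, frequency, avg_order_value) -> future spend
rows = [(5, 20, 90.0), (8, 18, 85.0), (12, 15, 70.0), (30, 8, 60.0),
        (60, 5, 50.0), (120, 3, 45.0), (200, 2, 40.0), (300, 1, 30.0)]
targets = [1800.0, 1500.0, 1000.0, 500.0, 250.0, 130.0, 80.0, 30.0]

forest = train_forest(rows, targets)
loyal_pred = predict(forest, (6, 19, 88.0))
lapsed_pred = predict(forest, (250, 1, 35.0))
```

Because every leaf stores a mean of training targets, predictions always stay within the range of spend values seen during training. That is worth remembering when a customer is expected to outspend anyone in the historical data.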
Following model development, the CLV prediction model is integrated into existing
systems and workflows using deployment capabilities. It can be deployed as a
service, API, or batch processing pipeline for generating real-time or periodic CLV
estimates. Continuous monitoring and optimization of the model's performance are
conducted to ensure accuracy and relevance over time. This involves refining the
model based on new data and insights, as well as optimizing hyperparameters and
feature selection using machine learning tools.
Ultimately, the CLV estimates generated by the model provide valuable insights for
strategic decision-making in areas such as customer acquisition, retention, and
marketing strategies. Analytics and visualization capabilities facilitate the
communication of these insights to stakeholders, empowering businesses to make
data-driven decisions and optimize their customer strategies effectively.
ADVANTAGES
Advanced Analytics Capabilities: The proposed system provides advanced analytics and machine
learning tools that enable sophisticated data analysis and model development.
Leveraging these capabilities allows for the creation of accurate and robust CLV
prediction models.
Scalability: The proposed system can scale to handle large volumes of customer
transaction data, making it suitable for businesses of all sizes. This scalability ensures
that the CLV prediction model can accommodate growing data requirements and
evolving business needs.
Actionable Insights: The CLV estimates generated by the model provide valuable
insights for strategic decision-making, such as customer acquisition, retention, and
marketing strategies. The system's analytics and visualization capabilities facilitate the
communication of these insights to stakeholders, empowering businesses to make
informed decisions.
SOFTWARE REQUIREMENTS
• Operating System: Most modern operating systems will suffice, such as
Windows 10/11, macOS, or Linux.
• Python Programming Language
• Visual Studio
HARDWARE REQUIREMENTS
Processor : Intel / Ryzen
RAM : 8 GB
Internet Connection
GPU (optional): For faster neural network training (e.g., NVIDIA GPU)
Server: If you plan to deploy a backend for your application, you might need a
server to host it.
CHAPTER 2
LITERATURE REVIEW
Modelling Customer Lifetime Value by Sunil Gupta et al.
This paper reviews several CLV models that are useful for market segmentation
and for allocating marketing resources to acquisition, retention and
cross-selling. The models fall into two categories. One category consists of
models that estimate the impact of marketing programs on customer acquisition,
retention and/or expansion (cross-selling). The other category deals with the
relationship between the various components of CLV.
The review covers the following model families:
- RFM models score customers on the recency, frequency and monetary value of
their purchases. Customers are then segmented by score and targeted with
different marketing strategies.
- Probability models assume that customer behaviour varies across the
population according to some probability distribution.
- Econometric models model customer acquisition, retention and expansion and
then combine them.
- Persistence models are used to study the impact of advertising, discounting
and product quality on customer equity, and to examine differences in CLV
resulting from different customer acquisition methods.
- Computer science models such as generalised additive models (GAM),
multivariate adaptive regression splines (MARS), classification and regression
trees (CART) and support vector machines (SVM) are used for their predictive
ability.
- Diffusion/growth models can be used to forecast the acquisition of future
customers.
The authors suggest several directions for future work: using data on
customers' attitudes and share of wallet, analysing a portfolio of customers
rather than a single customer's CLV, estimating costs per customer, and
understanding the limits of CLV and of theory-based models.
Customer Lifetime Value Measurement by Sharad Borle et al.
Sharad Borle et al. used a hierarchical Bayes
approach to model a customer's lifetime value. They modelled the purchase
timing, the purchase amount and the risk of defection from the firm for each
customer. The data came from a membership-based direct marketing company,
where the number of times each customer joined and terminated the membership
is known. The model was then compared with other models on a separate dataset:
an extended Pareto/NBD model, an RFM model, two models nested in their own
model, and a heuristic model that takes the average customer lifetime, the
average interpurchase time and the average dollar purchase amount and uses
them to predict the present value of future customer revenues at each purchase
occasion. Their results indicate that the hierarchical Bayes model performs
better than the other models in predicting CLV and in targeting valuable
customers.
Customer-Base Analysis with Discrete-Time Transaction Data by Fader, Hardie and Berger
Fader et al. proposed a model to predict future purchase patterns
of customers in discrete time, which means that transactions occur at fixed
intervals. The proposed model is the "beta-geometric/beta-binomial" (BG/BB)
model, which acts as a discrete-time analog of the Pareto/NBD model. Any
customer purchase history in discrete time can be represented as a binary
string, where 1 represents a purchase and 0 represents no purchase. Given this
string, the model estimates the probability that the customer is still alive
and the expected level of future purchasing. The authors applied the model to
a dataset of cruise-line transactions for 6094 customers over a period of five
years. They observed that the customers who took a cruise in 1997 have the
same probability of being alive in 1998. Customers who have taken a cruise in
each of the last four years have a higher probability than others. Similar
estimates were made for all categories of customers based on their recency and
frequency.
Customer Lifetime Value: Marketing Models and Applications by Paul D. Berger and Nada I. Nasr
Berger and Nasr have presented a series of
mathematical models of customer lifetime value and managerial applications of
these models. The mathematical models are presented in five cases, which
differ in their assumptions about the following: (1) the number of times sales
take place in a year; (2) spending to retain customers and the change in the
customer retention rate; (3) differences in revenues received per customer.
These models are useful for deciding how much a company should spend on
promotional campaigns and for comparing profitability across market segments.
Determining CLV can help assess the effect of a marketing strategy on
acquisition and retention rates, for example the effect of a price skimming
strategy on the acquisition rate. The models can also help decide how much to
spend on acquisition and how much to spend on retention of customers.
A Model to Determine Customer Lifetime Value in a Retail Banking Context by Haenlein, M. et al.
Haenlein et al. have proposed a model to find the CLV of retail banking
customers, based on a combination of a first-order Markov chain model and
CART (classification and regression tree) analysis. They used the
profitability drivers age, demographics or lifestyle, type and intensity of
product ownership, and activity level as independent variables, and carried
out age-dependent CART analyses to split customers of similar age into
sub-groups, using contribution margin as the target variable. These sub-groups
were then used as the states of the first-order Markov model, among which
customers are allowed to flow. The transition probabilities were estimated
from observed transition frequencies. The CLV of each customer was determined
as the discounted sum of state-dependent contribution margins, weighted by
their corresponding transition probabilities.
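The last step of this approach, a discounted sum of state margins weighted by transition probabilities, can be sketched as follows. The three states, the transition matrix, the margins, the discount rate and the three-period horizon are invented for illustration and are not taken from Haenlein et al., whose states come from the CART sub-groups.

```python
def markov_clv(start_state, transition, margins, discount_rate, periods):
    """Discounted expected contribution margin over a first-order Markov chain."""
    n = len(margins)
    # Probability distribution over states; the customer starts in start_state.
    dist = [1.0 if s == start_state else 0.0 for s in range(n)]
    clv = 0.0
    for t in range(periods):
        expected_margin = sum(p * m for p, m in zip(dist, margins))
        clv += expected_margin / (1 + discount_rate) ** t
        # Advance one period: new_dist[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return clv

# Hypothetical states: 0 = high activity, 1 = low activity, 2 = churned (absorbing)
transition = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.0, 0.0, 1.0],
]
margins = [100.0, 40.0, 0.0]  # contribution margin per period in each state

clv_high = markov_clv(0, transition, margins, discount_rate=0.1, periods=3)
clv_low = markov_clv(1, transition, margins, discount_rate=0.1, periods=3)
clv_churned = markov_clv(2, transition, margins, discount_rate=0.1, periods=3)
```

For a customer starting in the high-activity state, the expected margins are 100, 78 and 64.6 over the three periods. Discounted at 10 percent per period, they sum to roughly 224.30.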