Customer Segmentation
Abstract
This research paper examines the impact of customer segmentation on purchasing intentions among online
shoppers. Using the k-means method and the Online Shoppers Purchasing Intention Dataset, which
includes 12,330 sessions from a popular e-commerce website, the study analyzes customer behavior.
The findings highlight notable sales surges in August, varying preferences across countries, and the
significance of customer engagement and transactional activity. Through RFM and k-means
segmentations, exceptional customers with high engagement and value are identified, emphasizing the
importance of customer retention strategies. Moreover, opportunities to re-engage less active customers
are uncovered. The research provides valuable insights for businesses to optimize marketing efforts,
enhance customer satisfaction and loyalty, and maximize revenue potential in online shopping.
Keywords: customer segmentation; online shopping; data mining; machine learning; k-means
algorithm; RFM analysis
1 Introduction
Over time, the growing competition among businesses and the availability of extensive historical data
have resulted in the widespread adoption of data mining techniques [1]. These methods are utilized to
uncover valuable and strategic insights hidden within organizational data, which can then be presented
in a user-friendly manner for decision support. Data mining incorporates concepts from various
disciplines such as statistics, artificial intelligence, machine learning, and data systems. Its applications
are diverse, ranging from bioinformatics and weather forecasting to fraud detection, financial analysis,
and customer segmentation. With a multitude of competitors and entrepreneurs in the market,
businesses face significant competition in both acquiring new customers and retaining existing ones.
As a result, providing exceptional customer service has become crucial, regardless of a company's size
[2].
The main objective of this paper is to utilize data mining techniques in order to identify customer
segments within a commercial enterprise. Customer segmentation involves categorizing the customer
base of a business into distinct segments, with each segment consisting of customers who share similar
market characteristics [3]. These characteristics are determined by a variety of factors that directly or
indirectly influence the market or business, such as customer preferences, expectations, geographical
location, and purchasing behavior.
The importance of customer segmentation lies in its capacity to allow businesses to customize
marketing strategies that are appropriate for each customer segment [4]. It aids in decision-making in
complex situations, such as credit relationships with customers. Furthermore, it assists in identifying
product associations, managing supply and demand, and uncovering connections and interactions
among consumers, among products, or between customers and products that may otherwise be
overlooked by the business. Customer segmentation also facilitates the prediction of customer churn, the identification of
customers likely to face problems, and the generation of other market research inquiries that lead to
finding solutions.
Data mining has proven to be effective in uncovering subtle patterns or relationships that are buried
within integrated databases. The clustering approach used here falls under the category of unsupervised
learning. Data mining incorporates various clustering algorithms, such as the k-means algorithm, the
K-nearest algorithm, and self-organizing maps (SOM) [5]. These algorithms can discover
groups within the data without prior knowledge, iteratively comparing input patterns until a
satisfactory grouping is achieved for the specific subject matter or process. Each group comprises
data points that share close similarities within the group but exhibit significant differences from data
points in other groups. Clustering techniques find extensive applications in fields like pattern
recognition, image analysis, bioinformatics, and other related domains.
This study employed the k-means clustering algorithm to conduct customer segmentation. The
implementation of the k-means algorithm used the scikit-learn (Sklearn) library, and the clustering was
evaluated using a standardized silhouette score. The training dataset was derived from the retail
industry [6]. After several iterations, four stable clusters, or customer segments, were identified. The
classification of customers into these segments was based on two primary factors: the monthly number
of items purchased by a customer and the average number of customers per month. The resulting
customer categories were labeled as cluster 1, cluster 2, cluster 3, and cluster 4.
The paper is structured into several sections that provide a comprehensive analysis of the topic. In
Section 1, an introduction to the research paper is presented. Section 2 offers an in-depth examination
of existing studies and literature, covering key theories, concepts, and findings in the field, while also
identifying knowledge gaps. Section 3 outlines the methodology, research design, data collection
methods, and variables and measures used to ensure transparency and replicability. In Section 4, the
Experimental Setup and Results section, the specific experimental conditions and sample selection
process are detailed, along with the findings obtained from the research. Lastly, in Section 5, the
Conclusion summarizes the main findings and their implications, highlighting the study's contributions
and potential directions for future research.
2 Literature Review
Various studies have provided evidence supporting the influence of consumers' perceptions and
attitudes towards social media advertising on their behavior, including purchase intention [6]. These
perceptions have also been found to be positively associated with advertisement recall and awareness.
Social media platforms have emerged as significant sources of information for consumers throughout
their entire purchase journey, from information gathering to alternative comparison and post-purchase
feedback.
In a study by Luna-Nevarez and Torres [7], it was revealed that social media users with positive
attitudes towards social network advertising exhibit higher intentions to engage in electronic word-of-
mouth, visit the company's website, and make purchases through the website. Similarly, Chu et al. [8]
investigated the beliefs, attitudes, and behavioral responses of young social media users towards social
media advertising, finding that positive attitudes have a favorable impact on behavior and purchase
intention, particularly for luxury products.
Specific aspects of social media advertising and their effects on consumer behavior have also been
explored. For instance, De Keyzer et al. [9] emphasized that the perceived personalization of Facebook
advertisements has a more pronounced positive effect on click intention for participants with positive
attitudes towards Facebook. Wen et al. [10] discovered that strong ties endorsing hedonic products on
social networking sites (SNSs) contribute to stronger purchase intentions, while utilitarian products are
more influenced by high-expert endorsers on SNSs.
The significance of social media as an information source in purchasing decisions was highlighted by
Powers et al. [11], revealing that a significant percentage of consumers consider social media essential
when making purchasing decisions. Lee and Hong [12] demonstrated the positive influence of
informativeness on consumers' intention to purchase products presented in social media ads. Hamouda
[13] emphasized that informativeness, credibility, and entertainment act as antecedents to advertising
value, subsequently shaping attitudes towards social media advertising. Hamouda also argued that
consumers with positive attitudes towards social media advertising are more likely to exhibit favorable
behavioral responses.
Perceived relevance and personalization emerged as crucial factors influencing purchase intention, as
revealed by Alalwan [14]. Alalwan's study identifies and examines the main factors related to social
media advertising that can predict purchase intention, including performance expectancy, hedonic
motivation, interactivity, informativeness, and perceived relevance. Mishra [15] found that users'
creation of brand-related content on social media also impacts purchase intention. Jung et al. [16]
demonstrated that attitudes towards advertisements mediate the effects of social network advertising
characteristics on behavioral intention, with peer influence exerting the strongest effects. The perceived
value of advertisements, encompassing informativeness, entertainment, and credibility, positively
affects consumers' purchase intention.
Duffett [17] discovered that Facebook advertising has a positive attitudinal impact on Millennials'
intention to purchase and actual purchase behavior. Boateng and Okoe [6] posited that individuals with
positive attitudes towards social media advertising are more likely to express the intention to purchase
products advertised on social media, moderated by corporate reputation. Pan et al. [18] suggested that
individual differences in cognitive need, processing fluency, and expertise influence attitudes towards
social network advertising, subsequently impacting intentions to engage in electronic word-of-mouth
and purchase behavior. Wiese and Akareem [19] found a positive relationship between users' attitudes
towards Facebook and their behavior towards brands, including visiting advertised websites, becoming
fans, and making purchases.
The existing body of literature on social media advertising has provided ample evidence supporting the
influence of consumers' perceptions and attitudes on their behavior, including purchase intention [6].
However, a research gap can be identified regarding the application of customer segmentation using k-
means clustering in the context of social media advertising. While previous studies have explored
various aspects of consumer responses to social media advertising, there is limited research specifically
examining customer segmentation using clustering techniques. This research gap presents an
opportunity to delve deeper into understanding how different consumer segments can be identified and
targeted based on their attitudes, preferences, and behaviors towards social media advertising. By
employing the k-means clustering approach, this study aims to address this gap by segmenting
consumers into distinct groups, enabling a more granular understanding of their preferences and
behaviors. This segmentation approach will allow marketers to tailor their advertising strategies and
messages to specific customer segments, enhancing their targeting efforts. Furthermore, by identifying
the characteristics and profiles of each segment, this research can provide insights into the distinct
needs, motivations, and purchase intentions of different consumer groups, aiding in the design of more
effective social media advertising campaigns. By filling this research gap, this study contributes to the
field of social media advertising by offering a data-driven and systematic approach to customer
segmentation, providing valuable insights for marketers to optimize their strategies and improve
campaign effectiveness while highlighting the importance of tailored approaches for specific customer
segments.
3 Methodology
In this section, we outline the step-by-step approach, shown in Fig. 1, that we took in conducting our
research and collecting data for this study.
Fig. 1. Methodology
In the data preparation stage, several key steps were undertaken to ensure the dataset's quality and
consistency. First, the data was loaded, and a thorough cleaning process was initiated. This involved
identifying and removing any rows containing missing values (NaN) to ensure data integrity.
Additionally, efforts were made to address inconsistencies in the data descriptions, such as variations in
formatting. By standardizing the description format, the dataset became more streamlined and easier to
analyze.
Next, non-item entries were identified and removed from the dataset. These included descriptions that
solely represented fees or other non-relevant information. By eliminating these entries, the focus was
narrowed down to the core items of interest. Furthermore, data inconsistencies were addressed by
verifying if each stock code was consistently linked to one specific description. This helped to identify
any discrepancies or errors in the dataset and ensure the accuracy of the subsequent analysis.
Finally, outlier detection and treatment were performed. Outliers, which are extreme or unusual data
points, were identified and examined to determine their validity. Based on the nature of the data and the
specific analysis objectives, appropriate actions were taken, such as adjusting or removing outliers, to
prevent their undue influence on the final results. By addressing data outliers, the dataset's overall
quality and reliability were improved, setting the stage for robust and accurate analysis.
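The preparation steps described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the column names (`StockCode`, `Description`, `Quantity`), the list of non-item descriptions, and the 1.5 x IQR outlier rule are all hypothetical choices made for the example.

```python
import pandas as pd

# Hypothetical examples of descriptions that represent fees rather than items.
NON_ITEM_DESCRIPTIONS = {"POSTAGE", "BANK CHARGES", "MANUAL"}


def prepare_transactions(df):
    """Clean a transaction table; returns (clean_df, inconsistent_stock_codes)."""
    # 1. Remove rows that contain missing values (NaN).
    df = df.dropna().copy()

    # 2. Standardize the description format (trim whitespace, upper-case).
    df["Description"] = df["Description"].str.strip().str.upper()

    # 3. Remove non-item entries such as fees.
    df = df[~df["Description"].isin(NON_ITEM_DESCRIPTIONS)].copy()

    # 4. Verify that each stock code maps to exactly one description.
    descriptions_per_code = df.groupby("StockCode")["Description"].nunique()
    inconsistent = sorted(descriptions_per_code[descriptions_per_code > 1].index)

    # 5. Treat outliers. Assumption: drop quantities outside 1.5 x IQR.
    q1, q3 = df["Quantity"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["Quantity"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True), inconsistent


raw = pd.DataFrame(
    {
        "StockCode": ["A1", "A1", "B2", "B2", "POST", "C3", "D4", "E5", "E5"],
        "Description": [" white mug ", "WHITE MUG", "Red Bowl", "red bowl large",
                        "Postage", None, "Blue Plate", "Green cup", "green cup"],
        "Quantity": [2, 3, 4, 3, 1, 5, 1000, 2, 5],
    }
)
clean, inconsistent_codes = prepare_transactions(raw)
print(clean)
print("Stock codes with more than one description:", inconsistent_codes)
```

Returning the inconsistent stock codes, rather than silently rewriting them, keeps the verification step separate from any decision about how to resolve the discrepancies.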
3.3 Clustering
Clustering, a fundamental technique in data analytics, plays a vital role in uncovering inherent patterns
and structures within complex datasets. It involves the unsupervised grouping of data points based on
their intrinsic similarities, enabling researchers to gain valuable insights and make informed decisions.
By employing various clustering algorithms and methodologies, researchers can effectively segment
data into meaningful clusters, where objects within the same cluster exhibit high similarity while being
distinct from those in other clusters. These clusters provide a compact representation of the dataset,
facilitating data summarization and visualization. Furthermore, clustering techniques contribute to a
wide range of applications, including customer segmentation, anomaly detection, image recognition,
and social network analysis. As the field of data analytics continues to evolve, advancements in
clustering algorithms and their integration with other analytical techniques offer exciting opportunities
for improved knowledge discovery and decision-making.
K-means clustering is a useful technique for finding trends in large datasets and dividing data into
informative subgroups. The fundamental steps for employing k-means clustering in research are
choosing the number of clusters, initializing the method with random centroids, assigning each data
point to the closest centroid, updating the centroids, and repeating the procedure until convergence.
Domain expertise, data visualization, and statistical techniques should all be taken into consideration
when deciding how many clusters to use.
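The steps listed above can be illustrated with a short, dependency-free sketch. It is meant only to make the procedure concrete; the function names, the seed, and the toy data are assumptions for the example, and a library implementation would normally be preferred in practice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start with k random data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Convergence: stop when no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_labels, toy_centroids = k_means(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```

The number of clusters `k` is an input here; choosing it is a separate decision that draws on domain expertise, visualization, and statistical checks.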
4 Experimental Setup and Results
We created and deployed the following experimental setup to conduct our research and provide
insights:
4.1 Dataset
The Online Shoppers Purchasing Intention Dataset is a dataset that contains information about the
online browsing and purchasing behavior of visitors on an e-commerce website. This dataset is
available on the UCI Machine Learning Repository and includes 12,330 sessions, with each session
corresponding to a unique visitor to the website [21].
For customer segmentation, we employed two approaches: RFM and k-means. Through the RFM
approach, we divided all customers into ten different groups depending on their recency,
frequency, and monetary value. In contrast, the k-means approach resulted in four customer groups,
characterized by similarities in their recency, frequency, and monetary value. Unlike the RFM
approach, the k-means segmentation provides a more flexible categorization method that is not based
on predefined grading.
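As an illustration of how recency, frequency, and monetary value might be derived per customer, the sketch below computes the three metrics with pandas and then grades each one into quantile-based scores. The column names (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice`), the snapshot date, and the scoring scheme are assumptions for the example; the rules that map scores to the ten named segments are not given in this paper and are not reproduced here.

```python
import pandas as pd


def rfm_table(transactions, snapshot_date):
    """One row per customer with Recency (days), Frequency, and Monetary."""
    tx = transactions.assign(Amount=transactions["Quantity"] * transactions["UnitPrice"])
    return tx.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Amount", "sum"),
    )


def add_scores(rfm, bins=5):
    """Grade each metric from 1 (worst) to `bins` (best) using quantiles."""
    labels = list(range(1, bins + 1))
    out = rfm.copy()
    # Ranking first avoids duplicate bin edges when many values are tied.
    # A lower recency is better, so its labels are reversed.
    out["R"] = pd.qcut(out["Recency"].rank(method="first"), bins, labels=labels[::-1]).astype(int)
    out["F"] = pd.qcut(out["Frequency"].rank(method="first"), bins, labels=labels).astype(int)
    out["M"] = pd.qcut(out["Monetary"].rank(method="first"), bins, labels=labels).astype(int)
    return out


demo_tx = pd.DataFrame(
    {
        "CustomerID": ["C1", "C1", "C2"],
        "InvoiceNo": ["I1", "I2", "I3"],
        "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-09", "2011-06-01"]),
        "Quantity": [2, 1, 1],
        "UnitPrice": [5.0, 10.0, 3.0],
    }
)
demo_rfm = rfm_table(demo_tx, snapshot_date=pd.Timestamp("2011-12-10"))
print(add_scores(demo_rfm, bins=2))
```

With real data, `bins=5` gives the familiar 1 to 5 grading; the two-customer demo uses `bins=2` only because quantile grading needs at least as many customers as bins.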
RFM Analysis
According to the RFM analysis model, segments are prepared for each of the metrics below.
Table 1. Customer segmentation using RFM analysis
As shown in Fig. 3, the segments break down as follows:
24% of our customers are in the hibernating segment: they have purchased from us only a few times,
and their last purchase was a long time ago.
18% of our customers are loyal and buy from us regularly.
15% of our customers are champions, for whom we are the favourite market.
14% of our customers are in the at-risk segment, so we must attract them again.
11% of our customers have good potential: they have bought from us recently but have not bought
many times.
8% of our customers are about to sleep: their last purchase from us was a fairly long time ago.
4% of our customers need more attention to move them into the champions segment.
2% of our customers are new customers and must be retained.
2% of our customers look promising.
2% of our customers are ones we cannot lose: they purchased from us many times, but their last
purchase was a long time ago.
According to the data, 15% of customers fall into the "Champions" category. These customers are
responsible for generating a significant portion of the company's revenue, making it essential to focus
on their experience. To ensure their continued loyalty, several strategies can be implemented.
Offer exclusive products or discounts to the Champions segment to make them feel valued and
appreciated.
Treat Champions as brand ambassadors and incentivize them to bring in new customers by giving them
a share of the profits.
Seek feedback from Champions to gain insight into their preferences and improve their experience.
K-Means
Based on the k-means clustering analysis shown in Fig. 4, the customers can be grouped into four
distinct clusters based on their behaviour:
Cluster 0: "Punctual customers" - This cluster consists of customers who purchase items regularly and
punctually on the website.
Cluster 1: "Hibernating customers" - Customers in this cluster have the lowest purchase frequency,
haven't made a purchase recently, and spend the least amount of money.
Cluster 2: "Exceptional customers" - These are the high-value customers who make purchases
frequently, recently, and spend the most money. It is essential to retain such customers.
Cluster 3: "Recent customers" - This cluster comprises customers who have made a purchase relatively
recently, and it is crucial to keep them engaged.
The total distortion score obtained by using the recency, frequency, and monetary parameters is 4129.
Fig. 4. Customers Distribution of Clusters
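A clustering of this kind could be reproduced along the following lines with scikit-learn. The sketch is illustrative only: the synthetic recency, frequency, and monetary values, the standardization step, and the seed are assumptions, and `inertia_` (the within-cluster sum of squared distances) is used here as one common definition of distortion, which may not be the exact measure reported above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_rfm(rfm_values, n_clusters=4, seed=0):
    """Cluster RFM rows; returns (labels, distortion, silhouette)."""
    # Standardize so that no single metric dominates the distance.
    scaled = StandardScaler().fit_transform(rfm_values)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return labels, model.inertia_, silhouette_score(scaled, labels)


# Synthetic data: four made-up customer profiles (recency, frequency, monetary).
rng = np.random.default_rng(0)
profiles = np.array(
    [[10, 1, 50], [200, 1, 20], [5, 20, 900], [30, 5, 150]], dtype=float
)
demo_rfm_values = np.vstack(
    [p + rng.normal(scale=[2, 0.3, 5], size=(30, 3)) for p in profiles]
)

demo_labels, demo_distortion, demo_silhouette = cluster_rfm(demo_rfm_values)
print("cluster sizes:", np.bincount(demo_labels))
print("distortion:", demo_distortion)
print("silhouette:", demo_silhouette)
```

Running the same function for several values of `n_clusters` and comparing the distortion and silhouette values is one way to check whether four clusters is a reasonable choice for a given dataset.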
Based on these segmentations, we were able to identify a few exceptional customers who exhibit high
levels of engagement and value. These customers demonstrate a strong commitment to the brand,
frequently making purchases with high monetary value and maintaining a recent transactional history.
Recognizing their significance, it is crucial for businesses to implement strategies to retain and nurture
these valuable customers. One approach could involve offering them personalized discounts or
exclusive offers as a token of appreciation for their loyalty. Additionally, considering their active
involvement, these customers can be identified as potential brand ambassadors, leveraging their
positive experiences to generate word-of-mouth marketing and attract new customers.
Our analysis also revealed that approximately a quarter of the customer base is less active and tends
to spend less. This finding highlights the opportunity for targeted marketing efforts aimed at
re-engaging and increasing the value of these less active customers.
5 Conclusion
Based on our comprehensive analysis of the sales data, several key observations can be made. Firstly,
there is a notable sales upsurge in August, driven primarily by the UK market, which sees
marked growth in the number of new customers during this month. However, other countries,
including France, Germany, Spain, Australia, and the Netherlands, also exhibit substantial sales
activity, indicating a diverse customer base and market potential across multiple regions.
In terms of average cart size, certain countries stand out with larger average transactions, such as
Australia, Japan, Sweden, and the Netherlands. The analysis also reveals that the best-selling item
overall is the white hanging heart t-light holder, indicating its popularity among customers. However, it
is worth noting that the best-selling items can vary across countries, reflecting regional preferences and
market dynamics.
Furthermore, our analysis demonstrates that, on average, customers visit the website approximately ten
times before making a purchase. Interestingly, customers from Ireland appear to be more active,
suggesting a higher level of engagement and potentially greater customer loyalty. Additionally, we
found that the peak time for purchasing items tends to be around noon, indicating a specific moment of
high customer engagement and transactional activity.
In addition, our analysis revealed that approximately a quarter of the customer base is less active and
tends to spend less. While these customers may not contribute significantly to the overall revenue, it is
important not to overlook their potential. Targeted marketing efforts can be employed to re-engage and
reignite their interest in the brand. This may involve sending personalized emails with tailored
promotions or offering incentives to encourage their return. By focusing on this segment, businesses
have an opportunity to reach an underused customer pool and potentially convert its members into more
active and higher-spending customers.
Businesses can specifically target their marketing strategies to address the distinct demands and
behaviours of various client groups by utilising the insights gathered from both RFM and KMeans
segmentations. This targeted approach enables companies to allocate resources effectively, optimize
their marketing campaigns, and ultimately enhance customer satisfaction and loyalty. By understanding
the characteristics and preferences of exceptional customers and addressing the needs of less active
customers, businesses can create a more holistic customer experience and maximize their overall
revenue potential.
References
1. Blanchard, Tommy. Bhatnagar, Pranshu. Behera, Debasish. (2019). "Data Science for Marketing
Analytics: Achieve your marketing goals with the data analytics power of Python." Packt Publishing
Ltd.
2. Griva, A., Bardaki, C., Pramatari, K., Papakiriakopoulos, D. (2018). "Retail business analytics:
Customer visit segmentation using market basket data." Expert Systems with Applications, 100, 1-16.
3. Puwanenthiren Premkanth. (2012). "Market Segmentation and Its Impact on Customer Satisfaction
with Special Reference to Commercial Bank of Ceylon PLC." Global Journal of Management and
Business Research. Publisher: Global Journals Inc. (USA). Print ISSN: 0975-5853. Volume 12 Issue
1.
4. Sulekha Goyat. (2011). "The basis of market segmentation: a critical review of literature."
European Journal of Business and Management www.iiste.org. ISSN 2222-1905 (Paper) ISSN 2222-
2839 (Online). Vol 3, No.9, 2011.
5. Puwanenthiren Premkanth. (2012). "Market Segmentation and Its Impact on Customer Satisfaction
with Special Reference to Commercial Bank of Ceylon PLC." Global Journal of Management and
Business Research. Publisher: Global Journals Inc. (USA). Print ISSN: 0975-5853. Volume 12 Issue
1.
6. Boateng, H., Okoe, A.F., 2015. Consumers’ attitude towards social media advertising and their
behavioral response. Journal of Research in Interactive Marketing. 9(4), 299–312.
7. Luna-Nevarez, C., Torres, I.M., 2015. Consumer attitudes toward social network advertising. Journal
of Current Issues & Research in Advertising. 36(1), 1–19.
8. Chu, S.C., Kamal, S., Kim, Y., 2013. Understanding consumers’ responses toward social media
advertising and purchase intention toward luxury products. Journal of Global Fashion Marketing. 4(3),
158–174.
9. De Keyzer, F., Dens, N., Pelsmacker, D.P., 2015. Is this for me? How consumers respond to
personalized advertising on social network sites. Journal of Interactive Advertising. 15(2), 124–134.
10. Wen, C., Tan, B.C.Y., Chang, K.T., 2009. Advertising effectiveness on social network sites: An
investigation of tie strength, endorser expertise and product type.
11. Powers, T., Advicula, D., Austin, M.S., Graiko, S., Snyder, J., 2012. Digital and social media in
purchase decision process: A special report from the advertising research foundation. Journal of
Advertising Research. 52(4), 479–489.
12. Lee, J., Hong, I.B., 2016. Predicting positive user responses to social media advertising: The roles
of emotional appeal, informativeness, and creativity. International Journal of Information Management.
36, 360–373.
13. Hamouda, M., 2018. Understanding social media advertising effect on consumers’ responses: An
empirical investigation of tourism advertising on Facebook. Journal of Enterprise Information
Management. 31(3), 426–445.
14. Alalwan, A.A., 2018. Investigating the impact of social media advertising features on customer
purchase intention. International Journal of Information Management. 42, 65–77
15. Mishra, A.S., 2019. Antecedents of consumers’ engagement with brand-related content on social
media. Marketing Intelligence & Planning. 37(4), 386–400.
16. Jung, J., Shim, S.W., Jin, H.S., Khang, H., 2016. Factors affecting attitudes and behavioral
intention towards social networking advertising: A case of Facebook users in South Korea.
International Journal of Advertising. 35(2), 248–265.
17. Duffett, R.G., 2015. Facebook advertising’s influence on intention-to-purchase and purchase
amongst millennials. Internet Research. 25(4), 498–526.
18. Pan, Y., Torres, I.M., Zúniga, M.A., Fazli-Salehi, R., 2020. Social Network Advertising: The
Moderating Role of Processing Fluency, Need for Cognition, Expertise, and Gender. Journal of Internet
Commerce. 19(3), 298–323.
19. Wiese, M., Akareem, H.S., 2020. Determining perceptions, attitudes and behavior towards social
network site advertising in a three-country context. Journal of Marketing Management. 36 (5–6), 420–
455.
20. Blattberg, R. C., Kim, B. D., & Neslin, S. A. (2008). Database Marketing: Analyzing and
Managing Customers. Springer Science & Business Media.
21. https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset