A Cluster-Based Analysis For Targeting Potential Customers in A Real-World Marketing System
Authorized licensed use limited to: Vignan's Foundation for Science Technology & Research (Deemed to be University). Downloaded on April 12, 2023 at 05:55:18 UTC from IEEE Xplore. Restrictions apply.
…effectively. In an effort to increase their revenue, shopping centers frequently compete with one another for new customers. The potential of machine learning to support such ambitions is astounding. Shopping centers use the data about their customers to create ML models that target the appropriate customers. As a result, the complexes become more effective and their sales go up. This paper demonstrates how important customer segmentation is and how to segment customers with the help of different clustering techniques.

The optimal number of clusters can be determined in several ways; in this paper, we focus on the Elbow method and the average Silhouette method. Determining the optimal cluster size is not always easy. The average silhouette method calculates a silhouette value for each data point and uses its mean to determine the optimal cluster size [9]. Our clustering quality was evaluated using the average silhouette method: a large average silhouette width indicates successful clustering. The method computes the mean silhouette of the observations for varying k, and the optimal number of clusters k is the one that maximizes the average silhouette over the candidate values. The kmeans and silhouette functions from the R cluster package can calculate silhouette width. The Davies-Bouldin score is also used to compare the accuracies of our models.

This paper is divided into seven main sections; at the end of the paper, you will also find a list of references used during its preparation. The rest of the sections are organized as follows: Section II states the problem. Section III contains short summaries of related works and research. Section IV gives an overview of our dataset. Section V, the most important section, covers the methodology and experiments. Section VI includes discussion, comparison, and findings. Section VII gives a final summary.

II. PROBLEM STATEMENT

You own a shopping mall and have each customer's ID, age, gender, annual income, and spending score from membership cards. You assign a customer a Spending Score based on their behavior and purchases. You want to know who can easily be converted (target customers) so marketing can plan accordingly. By the end of this paper, you can answer the questions below:
• How to segment customers easily using machine learning algorithms?
• Who are your target customers?
• How does real-world marketing work?

…algorithms. The dataset includes 8950 customer transactions. The research question is how many clusters can differentiate customers by transactions or behaviors. The methods he used are K-Means, the Minibatch K-Means clustering algorithm, Hierarchical Clustering, and the Elbow Method [11].

Shreya T. et al. have discussed clustering approaches and techniques in segmentation. They explained CRM and the importance of customer segmentation in various industries. They applied K-Means, the Elbow Method, and Hierarchical Clustering; dendrograms were used to visualize clusters in the dataset. They observe that K-Means performs better than Hierarchical clustering: Hierarchical clustering can handle fewer data points, whereas K-Means performs better for a high number of observations [12].

Vaidisha Mehta et al. have explained different clustering-related algorithms (K-Means, Mean Shift, Hierarchical) that aid in segmenting customers based on their needs. These algorithms are examined and compared on their datasets using metrics like Silhouette and Davies-Bouldin; K-Means was found to have the best performance [14].

To segment customers and implement the various marketing strategies appropriately, Patel Monil et al. presented various clustering approaches. It has also been speculated that a hybrid clustering method might outperform a set of separate models. To determine which clustering algorithm to employ when, they discussed and compared several clustering algorithms (K-Means, Hierarchical, DBSCAN, Affinity Propagation) [16].

Robert Kwiatkowski applied K-Means, DBSCAN, and Affinity Propagation to the same dataset we are working on. The main aim of his project was performing a mall customer segmentation using machine learning algorithms. He discovered that K-Means and Affinity Propagation generate reasonable clusters of six. We follow his approach and additionally apply Hierarchical clustering; we also use Silhouette and Davies-Bouldin scores to compare accuracy [18].

IV. DATASET OVERVIEW

The sample dataset summarizes the behavior of 200 active mall customers over the last 3 months. The dataset is from Kaggle [19]. Features include (Fig. 1):
• CustomerID: Customer's unique ID.
• Gender – categorical-binary: Gender of the customer (Male & Female).
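The tabular shape described above can be illustrated with pandas. This is a minimal sketch, not the paper's code: the column names follow the Kaggle "Mall Customer Segmentation Data" listing [19] and are an assumption, and the rows here are made-up stand-ins for the real 200-customer file.

```python
import pandas as pd

# A few illustrative rows mirroring the dataset's schema (values are
# invented for this sketch; column names assumed from the Kaggle listing).
df = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 21, 20],
    "Annual Income (k$)": [15, 15, 16],
    "Spending Score (1-100)": [39, 81, 6],
})
print(df.dtypes)
```

In practice the real file would be loaded with `pd.read_csv` and should contain 200 such rows.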
• Spending Score – numerical: A score (out of 100) given to a customer by mall authorities based on money spent and behavior.

1) Distributions: Here, we will explore the numerical variable distributions. Data will be stratified by gender, the only categorical variable.
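The stratification by gender mentioned here can be sketched with a pandas group-by. The rows and column names below are illustrative assumptions, not the paper's data:

```python
import pandas as pd

# Hypothetical rows following the dataset's schema; the paper's
# table has 200 customers.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 35],
    "Annual Income (k$)": [15, 15, 16, 40],
    "Spending Score (1-100)": [39, 81, 6, 50],
})

# Summarize every numerical variable stratified by the single
# categorical variable.
by_gender = df.groupby("Gender")[
    ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
].describe()
print(by_gender["Age"][["mean", "std"]])
```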
Fig. 6. Median income of customers of all ages.

2) Correlations: This section examines the numerical correlations. Based on the correlations, the dataset variables don't have strong relationships. There is a weak (-0.33) negative association between Age and Spending Score, so older customers tend to spend less, as shown in Fig. 7. Customers' annual income has a negligible correlation with their spending score and with their age as well. There are some "clusters" that can be seen.

B. Clustering

This section describes and demonstrates K-Means, DBSCAN, Affinity Propagation, and Hierarchical clustering.

1) K-Means: K-Means is a popular partitional clustering algorithm; its simplicity makes it popular. The K-Means algorithm has 3 steps [20], also known as Lloyd's algorithm [21]:
1) Using seed points, split the samples into initial groups. The nearest samples to these seed points create the initial clusters.
2) Calculate sample distances to the group centroids and assign each sample to its nearest cluster.
3) Calculate updated cluster centroids.
Steps 2 and 3 are repeated until the algorithm converges. K-Means users must define three parameters:
• Initialization criteria: A smart initialization scheme is implemented: "k-means++" by Arthur and Vassilvitskii creates initial centroids far apart to improve results [20]. A random point generator is another option. There are ongoing efforts to create the most efficient K-Means seeding method; one of them is based on independent component analysis [22].
• Number of clusters: It's very challenging to choose the number of clusters. There are many heuristic/simplified methods; the Elbow method is one of the easiest and most popular.
• A distance metric (not required in the scikit-learn implementation): Distance between points can be calculated in several ways. The most popular is Euclidean, which scikit-learn uses; this variant is called spherical k-means. It finds only spherical-like groups and becomes inflated in multidimensional analyses (the "curse of dimensionality").

In Fig. 8, there is no clear "elbow"; 5 or 6 clusters seem fair. But the Silhouette metric together with the Elbow method helps us choose the optimal number of clusters. The silhouette score lies in [-1, 1]. A score of 1 means the cluster is dense and well separated from the other clusters. A value close to 0 denotes overlapping clusters, with samples extremely close to the decision boundary of the adjacent clusters, or that there is no real difference in the distance between clusters. Samples may have been put in the wrong clusters if the score is negative, in [-1, 0) [15] [12].
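The three numbered steps can be sketched directly in NumPy. This is an illustrative implementation, not the paper's code: it uses random seeding rather than k-means++, and plain Euclidean distance.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: the three steps listed above."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k seed points among the samples; the nearest samples
    # to these seeds form the initial clusters.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every sample to its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid; keep the old one if a cluster
        # ends up empty. Steps 2-3 repeat until the centroids stop moving.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

With k-means++ seeding (as scikit-learn uses by default) only Step 1 would change; the update loop is identical.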
Fig. 8 suggests that 5 (silhouette score 0.44) or 6 (silhouette score 0.452) clusters would be the optimal choices. As these silhouette scores are closer to 1, we can say that the clusters are well apart from one another. The K-Means algorithm generates the 5 (labeled 0-4) and 6 (labeled 0-5) clusters shown in Fig. 9 and Fig. 10, respectively.

…point costs time, and it poorly identifies clusters with varying densities. Machine learning researchers have proposed DBSCAN variations and extensions, for example a cloud-based DBSCAN algorithm that improves scalability [23] and a three-way clustering method based on an improved DBSCAN algorithm to overcome cluster-method problems [24].
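The DBSCAN behavior discussed in this section (density-based clusters, low-density samples flagged as noise) can be sketched with scikit-learn. The data and the eps/min_samples values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus one far-away point that should become noise.
blob_a = rng.normal(loc=(0, 0), scale=0.2, size=(50, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.2, size=(50, 2))
X = np.vstack([blob_a, blob_b, [[20.0, 20.0]]])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
# DBSCAN marks low-density samples with the special label -1 (noise).
print(sorted(set(labels.tolist())))  # → [-1, 0, 1]
```

If one cluster is much less dense than the others, no single eps works well for both, which is exactly the weakness observed on our dataset.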
…advantage. When the appropriate number of clusters is unknown, this unsupervised machine learning approach performs effectively [26]. To work with Affinity Propagation, the user needs to specify two parameters: Preference, a negative number that controls how many exemplars are used [27], and a Damping factor that prevents numerical oscillations when updating messages. As with the other algorithms, there are ongoing improvements: Ping Li's team proposed an algorithm that improves clustering results by calculating element preferences [28], and Wenlong Hang's team proposed transfer affinity propagation-based clustering, which outperforms current algorithms on small datasets [29].

Fig. 13. Clusters of customers using Affinity Propagation.

In Fig. 13, we can see that Affinity Propagation creates 6 relatively even-sized clusters (0-5), similar to those created by K-Means. Table I shows the observations for each of the clusters.

Fig. 14. Dendrogram to find the optimal number of clusters.

A proximity matrix (the distance between each pair of points) must be created in order to calculate the Euclidean distances before the data can actually begin to be clustered. Updates to the distance between each cluster are then made using the same matrix. By displaying the data as a dendrogram, we can figure out the optimal number of clusters. The 'ward' linkage method is used while creating the dendrogram; the Ward method aims to minimize the variance within each cluster [30] [13]. In Fig. 14, 'Customers' makes up the x-axis of the diagram, while the 'Euclidean distance' between clusters makes up the y-axis. Looking at the dendrogram and counting the number of long branches without horizontal splitting reveals the optimal number of clusters. We can infer from the figure that there are 5 good clusters.
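The dendrogram construction described above can be sketched with SciPy. The feature matrix here is a synthetic stand-in for the customer features, so the cluster count in the output reflects the toy data, not the paper's result:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
# Stand-in 2-D feature matrix: three well-separated groups of 20 points.
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ((0, 0), (4, 0), (2, 4))])

# Ward linkage: each merge is chosen to minimize the within-cluster variance.
Z = linkage(X, method="ward")

# dendrogram(Z) would draw the tree; cutting it into flat clusters instead:
labels = fcluster(Z, t=3, criterion="maxclust")
print(len(set(labels.tolist())))
```

Counting the long vertical branches of `dendrogram(Z)` before any horizontal split gives the same cluster count as the `maxclust` cut.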
TABLE I. Number of observations in each cluster for each model.

Cluster  KM-5  KM-6  DBSCAN   AP  Hierarchical
  -1       -     -      18     -       -
   0      79    39     112    22      83
   1      37    35       8    22      35
   2      39    45      34    44      39
   3      22    38      24    39      20
   4      23    22       4    34      23
   5       -    22       -    39       -

Fig. 16 compares the accuracy of our models using their Silhouette and Davies-Bouldin scores. A higher Silhouette score and a lower Davies-Bouldin score indicate a model with better-defined clusters [17] [14]. The Silhouette and Davies-Bouldin scores are best for K-Means, so K-Means has the potential to be the most effective in customer segmentation. For customer segmentation, K-Means outperforms the other clustering algorithms, and it also works well with larger datasets [14]. The Affinity Propagation algorithm produces nearly identical results to K-Means. The scores of the Hierarchical model show that it performed almost the same as K-Means and AP on our dataset; the downside of Hierarchical clustering is that it is very slow for large datasets [16]. The scores of DBSCAN are not good. DBSCAN failed to create reasonable clusters because it finds clusters based on the density of points: if one of our clusters is less dense than the others, DBSCAN will give less than ideal results because it won't recognize the less dense group as a cluster. This algorithm performed poorly on our dataset.

Fig. 16. Accuracy comparison of models using Silhouette and Davies-Bouldin scores.

K-Means (KM-5) groups the mall customers into 5 clusters (0-4), as seen in Fig. 9, and Fig. 17 shows the clusters plotted by the average age of the customers belonging to them.

Fig. 17. Representation of clusters by the age of customers, created using K-Means.

In cluster 0, the 79 customers have an average age of 43 and average income and spending. Shops and malls won't target them, but other data analysis techniques may boost their spending score.

In cluster 1, 37 customers earn high but spend low; their average age is a little less than cluster 0's. Perhaps these are the mall's dissatisfied customers. They're prime targets for the mall because they can spend money, so mall authorities will add new facilities to attract them and meet their needs.

Cluster 2 has 39 high-earning, high-spending customers who are under 35. Customers belonging to this cluster are the ideal profit sources; these may be loyal and target customers.

In cluster 3, 22 customers earn low but spend high, and they are young. They buy more products despite having a low income; maybe they're happy with the mall's services. The mall may not target them strongly, but it won't lose them.

Cluster 4 has 23 low-income, low-spending customers, which is reasonable because low-income people buy less. This cluster's customers are older and financially savvy; the mall won't target them.

VII. CONCLUSION

Customer segmentation is well-liked because it makes marketing and sales more effective: it gives you a better grasp of your customers' desires and needs. This has even greater financial implications, and efficient customer segmentation will help you raise customer lifetime value, meaning customers will spend more money and stay longer. You can make customers more loyal by getting to know them better so you can target them better. In this paper, four different clustering models were developed in order to investigate the various kinds of customers. Each model uncovered unique swaths of consumers who could be served by the enterprise in accordance with the nature of their requirements. We compared their Silhouette and Davies-Bouldin scores in order to find the best one. K-Means and Affinity Propagation are the two methods that produce the most accurate results and reasonable clusters, especially for our problem statement and dataset. Hierarchical clustering also performed very well, whereas DBSCAN performed very poorly and mainly identified the outliers within the data. Clustering allows us to gain a deeper comprehension of the factors at play, which in turn helps us make deliberate judgments. Once companies have identified their customers, they can release products and services that target them based on parameters such as income, age, and spending patterns. For better market segmentation in the future, more complicated patterns, such as product reviews, can also be taken into consideration.
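The Fig. 16 score comparison can be reproduced in outline with scikit-learn. The data below are synthetic stand-ins for the customer features, and the hyperparameters (eps, damping, linkage) are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering, AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Synthetic 200-sample stand-in for the mall-customer features.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=7)

models = {
    "K-Means": KMeans(n_clusters=5, n_init=10, random_state=7),
    "Hierarchical": AgglomerativeClustering(n_clusters=5, linkage="ward"),
    "Affinity Propagation": AffinityPropagation(damping=0.9, random_state=7),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    if len(set(labels)) > 1:  # both scores need at least 2 distinct labels
        sil = silhouette_score(X, labels)       # higher is better, in [-1, 1]
        db = davies_bouldin_score(X, labels)    # lower is better, >= 0
        print(f"{name}: silhouette={sil:.3f}, davies-bouldin={db:.3f}")
```

On data with one low-density group, the same loop makes DBSCAN's weakness visible: its labels contain many -1 noise points, which drags its scores down, mirroring the result reported above.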
REFERENCES

[1] Wikipedia, "One size fits all", 2016. [Online]. Available: https://en.wikipedia.org/wiki/One_size_fits_all.
[2] Retention Science, "Why One Size Fits All Marketing Doesn't Work?". [Online]. Available: https://www.retentionscience.com/blog/why-onesize-fits-all-marketing-doesnt-work/.
[3] David F. Zirkle, "Market Segmentation: One Size Does Not Fit All". [Online]. Available: https://barlowmccarthy.com/blog/marketsegmentation-one-size-does-not-fit-all/.
[4] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", KDD'96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, August 1996, pp. 226-231.
[5] Sanatan Mishra, "Unsupervised Learning and Data Clustering", May 2017. [Online]. Available: https://towardsdatascience.com/unsupervisedlearning-and-data-clustering-eeecb78b422a.
[6] Soroush Hashemifar, "KMeans vs. DBScan", April 2018. [Online]. Available: https://soroushhashemifar.medium.com/kmeans-vs-dbscand9d5f9dbee8b.
[7] Fisseha Berhane, "Data distributions where Kmeans clustering fails, can DBSCAN be a solution?". [Online]. Available: https://datascienceenthusiast.com/Python/DBSCAN_Kmeans.html.
[8] Alex Williams, "Why is clustering hard?", September 2015. [Online]. Available: http://alexhwilliams.info/itsneuronalblog/2015/09/11/clustering1/.
[9] Sukavanan Nanjundan, Shreeviknesh Sankaran, C.R. Arjun, and G. Paavai Anand, "Identifying the number of clusters for K-Means: A hypersphere density based approach", arXiv, December 2019. [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1912/1912.00643.pdf.
[10] Ekta Sharma, "K-Means vs. DBSCAN Clustering — For Beginners", May 2020. [Online]. Available: https://towardsdatascience.com/k-means-vs-dbscan-clustering-49f8e627de27.
[11] M. A. Ishantha, "Mall Customer Segmentation Using Clustering Algorithm", LNBTI machine learning conference, Colombo, 2021. [Online]. Available: https://www.researchgate.net/publication/349714847_MALL_CUSTOMER_SEGMENTATION_USING_CLUSTERING_ALGORITHM.
[12] Shreya T., Aditya B., and Poovammal E., "Approaches to Clustering in Customer Segmentation", International Journal of Engineering & Technology, vol. 7, no. 3.12, pp. 802-807, July 2018, doi: 10.14419/ijet.v7i3.12.16505.
[13] Samet Girgin, "Hierarchical Clustering Model in 5 Steps with Python", April 2019. [Online]. Available: https://medium.com/@sametgirgin/hierarchical-clustering-model-in-5-steps-with-python-6c45087d4318.
[14] Vaidisha Mehta, Ritvik Mehra, and Sourabh Singh Verma, "A Survey on Customer Segmentation using Machine Learning Algorithms to Find Prospective Clients", 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2021, doi: 10.1109/ICRITO51393.2021.9596118.
[15] Ajitesh Kumar, "KMeans Silhouette Score Explained With Python Example", September 2020. [Online]. Available: https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam.
[16] Patel Monil, Patel Darshan, Rana Jecky, Chauhan Vimarsh, and Prof. B. R. Bhatt, "Customer Segmentation using Machine Learning", IJRASET, vol. 8, issue VI, June 2020. [Online]. Available: https://www.ijraset.com/fileserve.php?FID=29880.
[17] Haitian Wei, "How to measure clustering performances when there are no ground truth?", January 2020. [Online]. Available: https://medium.com/@haataa/how-to-measure-clusteringperformances-when-there-are-no-ground-truth-db027e9a871c.
[18] Robert Kwiatkowski, "Mall Customers Segmentation", 2022. [Online]. Available: https://www.kaggle.com/code/datark1/customers-clusteringk-means-dbscan-and-ap/notebook.
[19] Vijay Choudhary, "Mall Customer Segmentation Data", Kaggle, 2018. [Online]. Available: https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python?select=Mall_Customers.csv.
[20] D. Arthur and S. Vassilvitskii, "k-means++: The Advantages of Careful Seeding", Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, USA, 2007, doi: 10.1145/1283383.1283494.
[21] Data Science Lab, "Clustering With K-Means in Python", 2013. [Online]. Available: https://datasciencelab.wordpress.com/tag/lloydsalgorithm/.
[22] Takashi Onoda, Miho Sakai, and Seiji Yamada, "Careful Seeding based on Independent Component Analysis for k-means Clustering", IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Toronto, Canada, 2010, doi: 10.1109/WIIAT.2010.102.
[23] W. Jing, C. Zhao, and C. Jiang, "An improvement method of DBSCAN algorithm on cloud computing", Procedia Computer Science, vol. 147, 2019, pp. 596-604, doi: 10.1016/j.procs.2019.01.208.
[24] H. Yu, L. Chen, and X. Wang, "A three-way Clustering Method Based on an Improved DBSCAN Algorithm", Physica A: Statistical Mechanics and its Applications, vol. 535, 2019, doi: 10.1016/j.physa.2019.122289.
[25] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points", Science, vol. 315, issue 5814, 2007, pp. 972-976, doi: 10.1126/science.1136800.
[26] Cory Maklin, "Affinity Propagation Algorithm Explained", 2019. [Online]. Available: https://towardsdatascience.com/unsupervised-machinelearning-affinity-propagation-algorithm-explained-d1fef85f22c8.
[27] Aneesha Bakharia, "Using Affinity Propagation to Find the Number of Clusters in a Dataset", 2016. [Online]. Available: https://aneesha.medium.com/using-affinity-propagation-to-find-thenumber-of-clusters-in-a-dataset-52f5dd3b0760.
[28] Ping Li, Haifeng Ji, Baoliang Wang, Zhiyao Huang, and Haiqing Li, "Adjustable preference affinity propagation clustering", Pattern Recognition Letters, vol. 85, pp. 72-78, 2017, doi: 10.1016/j.patrec.2016.11.017.
[29] Wenlong Hang, Fu-lai Chung, and Shitong Wang, "Transfer affinity propagation-based clustering", Information Sciences, vol. 348, 2016, pp. 337-356, doi: 10.1016/j.ins.2016.02.009.
[30] Bradley Boehmke and Brandon Greenwell, "Chapter 21 Hierarchical Clustering", in Hands-On Machine Learning with R, 2020. [Online]. Available: https://bradleyboehmke.github.io/HOML/hierarchical.html.