A Cluster-Based Analysis For Targeting Potential Customers in A Real-World Marketing System
Authorized licensed use limited to: Vignan's Foundation for Science Technology & Research (Deemed to be University). Downloaded on April 12, 2023 at 05:55:18 UTC from IEEE Xplore. Restrictions apply.
…effectively. In an effort to increase their revenue, shopping centers frequently compete with one another for new customers. The potential of machine learning to support such ambitions is astounding. Shopping centers use the data about their customers to create ML models that target the appropriate customers. As a result, the complexes become more effective and their sales go up. This paper demonstrates how important customer segmentation is and how to segment customers with the help of different clustering techniques.

The optimal number of clusters can be determined in several ways; in this paper, we focus on the Elbow method and the average Silhouette method. Determining the optimal cluster size is not always easy. The average silhouette method calculates a silhouette value for each data point and uses its mean to determine the optimal cluster size [9]. Our clustering quality was evaluated using the average silhouette method: a large average silhouette width indicates successful clustering. The method computes the mean silhouette of the observations for varying k, and the optimal number of clusters k is the one that maximizes the average silhouette over the candidate values. The kmeans and silhouette functions from the R cluster package can calculate silhouette width. The Davies-Bouldin score is also used to compare the accuracies of our models.

This paper is divided into seven main sections; at the end of the paper, you will also find a list of references used during its preparation. The rest of the sections are organized as follows: Section II states the problem. Section III contains short summaries of related works and research. Section IV gives an overview of our dataset. Section V, the most important section, covers the methodology and experiments. Section VI includes discussion, comparison, and findings. Section VII gives a final summary.

II. PROBLEM STATEMENT

You own a shopping mall and have each customer's ID, age, gender, annual income, and spending score from membership cards. You assign a customer a Spending Score based on their behavior and purchases. You want to know who can easily be converted (target customers) so marketing can plan accordingly. By the end of this paper, you can answer the questions below:
• How to segment customers easily using machine learning algorithms?
• Who are your target customers?
• How does real-world marketing work?

…algorithms. The dataset includes 8950 customer transactions. The research question is how many clusters can differentiate customers by transactions or behaviors. The methods he used are K-Means, the Minibatch K-Means clustering algorithm, Hierarchical Clustering, and the Elbow Method [11].

Shreya T. et al. have discussed clustering approaches and techniques in segmentation. They explained CRM and the importance of customer segmentation in various industries. They applied K-Means, the Elbow Method, and Hierarchical Clustering; dendrograms were used to visualize clusters in the dataset. They observe that K-Means performs better than Hierarchical clustering: Hierarchical clustering can handle fewer data points, whereas K-Means performs better for a high number of observations [12].

Vaidisha Mehta et al. have explained different clustering-related algorithms (K-Means, Mean Shift, Hierarchical) that aid in segmenting customers based on their needs. These algorithms are examined and compared on their datasets using metrics like Silhouette and Davies-Bouldin; K-Means was found to have the best performance [14].

To segment customers and implement the various marketing strategies appropriately, Patel Monil et al. presented various clustering approaches. It has also been speculated that a hybrid clustering method might outperform a set of separate models. To determine which clustering algorithm to employ when, they discussed and compared several clustering algorithms (K-Means, Hierarchical, DBSCAN, Affinity Propagation) [16].

Robert Kwiatkowski applied K-Means, DBSCAN, and Affinity Propagation to the same dataset we are working on. The main aim of his project was performing a mall customer segmentation using machine learning algorithms. He discovered that K-Means and Affinity Propagation generate reasonable clusters of six. We follow his approach and additionally apply Hierarchical clustering; we also use Silhouette and Davies-Bouldin scores to compare accuracy [18].

IV. DATASET OVERVIEW

The sample dataset summarizes the behavior of 200 active mall customers over the last 3 months. The dataset is from Kaggle [19]. Features include (Fig. 1):
• CustomerID: Customer's unique ID.
• Gender – categorical-binary: Gender of the customer (Male & Female).
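The tabular shape described above can be illustrated with pandas. This is a minimal sketch, not the paper's code: the column names follow the Kaggle "Mall Customer Segmentation Data" listing [19] and are an assumption, and the rows here are made-up stand-ins for the real 200-customer file.

```python
import pandas as pd

# A few illustrative rows mirroring the dataset's schema (values are
# invented for this sketch; column names assumed from the Kaggle listing).
df = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 21, 20],
    "Annual Income (k$)": [15, 15, 16],
    "Spending Score (1-100)": [39, 81, 6],
})
print(df.dtypes)
```

In practice the real file would be loaded with `pd.read_csv` and should contain 200 such rows.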
• Spending Score – numerical: A score (out of 100) given to a customer by mall authorities based on money spent and behavior.

1) Distributions: Here, we will explore the numerical variable distributions. Data will be stratified by gender, the only categorical variable.
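The stratification by gender mentioned here can be sketched with a pandas group-by. The rows and column names below are illustrative assumptions, not the paper's data:

```python
import pandas as pd

# Hypothetical rows following the dataset's schema; the paper's
# table has 200 customers.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 35],
    "Annual Income (k$)": [15, 15, 16, 40],
    "Spending Score (1-100)": [39, 81, 6, 50],
})

# Summarize every numerical variable stratified by the single
# categorical variable.
by_gender = df.groupby("Gender")[
    ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
].describe()
print(by_gender["Age"][["mean", "std"]])
```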
Fig. 6. Median income of customers of all ages.

2) Correlations: This section examines the numerical correlations. Based on the correlations, the dataset variables don't have strong relationships. There is a weak (-0.33) negative association between Age and Spending Score, so older customers tend to spend less, as shown in Fig. 7. Customers' annual income has a negligible correlation with their spending score and with their age as well. There are some "clusters" that can be seen.

B. Clustering

This section describes and demonstrates K-Means, DBSCAN, Affinity Propagation, and Hierarchical clustering.

1) K-Means: K-Means is a popular partitional clustering algorithm; its simplicity makes it popular. The K-Means algorithm has 3 steps [20], also known as Lloyd's algorithm [21]:
1) Using seed points, split the samples into initial groups. The nearest samples to these seed points create the initial clusters.
2) Calculate sample distances to the group centroids and assign each sample to its nearest cluster.
3) Calculate updated cluster centroids.
Steps 2 and 3 are repeated until the algorithm converges. K-Means users must define three parameters:
• Initialization criteria: A smart initialization scheme is implemented: "k-means++" by Arthur and Vassilvitskii creates initial centroids far apart to improve results [20]. A random point generator is another option. There are ongoing efforts to create the most efficient K-Means seeding method; one of them is based on independent component analysis [22].
• Number of clusters: It's very challenging to choose the number of clusters. There are many heuristic/simplified methods; the Elbow method is one of the easiest and most popular.
• A distance metric (not required in the scikit-learn implementation): Distance between points can be calculated in several ways. The most popular is Euclidean, which scikit-learn uses; this variant is called spherical k-means. It finds only spherical-like groups and becomes inflated in multidimensional analyses (the "curse of dimensionality").

In Fig. 8, there is no clear "elbow"; 5 or 6 clusters seem fair. But the Silhouette metric together with the Elbow method helps us choose the optimal number of clusters. The silhouette score lies in [-1, 1]. A score of 1 means the cluster is dense and well separated from the other clusters. A value close to 0 denotes overlapping clusters, with samples extremely close to the decision boundary of the adjacent clusters, or that there is no real difference in the distance between clusters. Samples may have been put in the wrong clusters if the score is negative, in [-1, 0) [15] [12].
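The three numbered steps can be sketched directly in NumPy. This is an illustrative implementation, not the paper's code: it uses random seeding rather than k-means++, and plain Euclidean distance.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: the three steps listed above."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k seed points among the samples; the nearest samples
    # to these seeds form the initial clusters.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every sample to its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid; keep the old one if a cluster
        # ends up empty. Steps 2-3 repeat until the centroids stop moving.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

With k-means++ seeding (as scikit-learn uses by default) only Step 1 would change; the update loop is identical.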
Fig. 8 suggests that 5 (silhouette score 0.44) or 6 (silhouette score 0.452) clusters would be the optimal choices. As these silhouette scores are closer to 1, we can say that the clusters are well apart from one another. The K-Means algorithm generates the 5 (labeled 0-4) and 6 (labeled 0-5) clusters shown in Fig. 9 and Fig. 10, respectively.

…point costs time, and it poorly identifies clusters with varying densities. Machine learning researchers have proposed DBSCAN variations and extensions, for example a cloud-based DBSCAN algorithm that improves scalability [23] and a three-way clustering method based on an improved DBSCAN algorithm to overcome cluster-method problems [24].
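The DBSCAN behavior discussed in this section (density-based clusters, low-density samples flagged as noise) can be sketched with scikit-learn. The data and the eps/min_samples values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus one far-away point that should become noise.
blob_a = rng.normal(loc=(0, 0), scale=0.2, size=(50, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.2, size=(50, 2))
X = np.vstack([blob_a, blob_b, [[20.0, 20.0]]])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
# DBSCAN marks low-density samples with the special label -1 (noise).
print(sorted(set(labels.tolist())))  # → [-1, 0, 1]
```

If one cluster is much less dense than the others, no single eps works well for both, which is exactly the weakness observed on our dataset.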
…advantage. When the appropriate number of clusters is unknown, this unsupervised machine learning approach performs effectively [26]. To work with Affinity Propagation, the user needs to specify two parameters: Preference, a negative number that controls how many exemplars are used [27], and a Damping factor that prevents numerical oscillations when updating messages. As with the other algorithms, there are ongoing improvements: Ping Li's team proposed an algorithm that improves clustering results by calculating element preferences [28], and Wenlong Hang's team proposed transfer affinity propagation-based clustering, which outperforms current algorithms on small datasets [29].

Fig. 13. Clusters of customers using Affinity Propagation.

In Fig. 13, we can see that Affinity Propagation creates 6 relatively even-sized clusters (0-5), similar to those created by K-Means. Table I shows the observations for each of the clusters.

Fig. 14. Dendrogram to find the optimal number of clusters.

A proximity matrix (the distance between each pair of points) must be created in order to calculate the Euclidean distances before the data can actually begin to be clustered. Updates to the distance between each cluster are then made using the same matrix. By displaying the data as a dendrogram, we can figure out the optimal number of clusters. The 'ward' linkage method is used while creating the dendrogram; the Ward method aims to minimize the variance within each cluster [30] [13]. In Fig. 14, 'Customers' makes up the x-axis of the diagram, while the 'Euclidean distance' between clusters makes up the y-axis. Looking at the dendrogram and counting the number of long branches without horizontal splitting reveals the optimal number of clusters. We can infer from the figure that there are 5 good clusters.
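The dendrogram construction described above can be sketched with SciPy. The feature matrix here is a synthetic stand-in for the customer features, so the cluster count in the output reflects the toy data, not the paper's result:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
# Stand-in 2-D feature matrix: three well-separated groups of 20 points.
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ((0, 0), (4, 0), (2, 4))])

# Ward linkage: each merge is chosen to minimize the within-cluster variance.
Z = linkage(X, method="ward")

# dendrogram(Z) would draw the tree; cutting it into flat clusters instead:
labels = fcluster(Z, t=3, criterion="maxclust")
print(len(set(labels.tolist())))
```

Counting the long vertical branches of `dendrogram(Z)` before any horizontal split gives the same cluster count as the `maxclust` cut.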
TABLE I. Number of observations in each cluster for each model.

Cluster  KM-5  KM-6  DBSCAN   AP  Hierarchical
  -1       -     -      18     -       -
   0      79    39     112    22      83
   1      37    35       8    22      35
   2      39    45      34    44      39
   3      22    38      24    39      20
   4      23    22       4    34      23
   5       -    22       -    39       -

Fig. 16 compares the accuracy of our models using their Silhouette and Davies-Bouldin scores. A higher Silhouette score and a lower Davies-Bouldin score indicate a model with better-defined clusters [17] [14]. The Silhouette and Davies-Bouldin scores are best for K-Means, so K-Means has the potential to be the most effective in customer segmentation. For customer segmentation, K-Means outperforms the other clustering algorithms, and it also works well with larger datasets [14]. The Affinity Propagation algorithm produces nearly identical results to K-Means. The scores of the Hierarchical model show that it performed almost the same as K-Means and AP on our dataset; the downside of Hierarchical clustering is that it is very slow for large datasets [16]. The scores of DBSCAN are not good. DBSCAN failed to create reasonable clusters because it finds clusters based on the density of points: if one of our clusters is less dense than the others, DBSCAN will give less than ideal results because it won't recognize the less dense group as a cluster. This algorithm performed poorly on our dataset.

Fig. 16. Accuracy comparison of models using Silhouette and Davies-Bouldin scores.

K-Means (KM-5) groups the mall customers into 5 clusters (0-4), as seen in Fig. 9, and Fig. 17 shows the clusters plotted by the average age of the customers belonging to them.

Fig. 17. Representation of clusters by the age of customers, created using K-Means.

In cluster 0, the 79 customers have an average age of 43 and average income and spending. Shops and malls won't target them, but other data analysis techniques may boost their spending score.

In cluster 1, 37 customers earn high but spend low; their average age is a little less than cluster 0's. Perhaps these are the mall's dissatisfied customers. They're prime targets for the mall because they can spend money, so mall authorities will add new facilities to attract them and meet their needs.

Cluster 2 has 39 high-earning, high-spending customers who are under 35. Customers belonging to this cluster are the ideal profit sources; these may be loyal and target customers.

In cluster 3, 22 customers earn low but spend high, and they are young. They buy more products despite having a low income; maybe they're happy with the mall's services. The mall may not target them strongly, but it won't lose them.

Cluster 4 has 23 low-income, low-spending customers, which is reasonable because low-income people buy less. This cluster's customers are older and financially savvy; the mall won't target them.

VII. CONCLUSION

Customer segmentation is well-liked because it makes marketing and sales more effective: it gives you a better grasp of your customers' desires and needs. This has even greater financial implications, and efficient customer segmentation will help you raise customer lifetime value, meaning customers will spend more money and stay longer. You can make customers more loyal by getting to know them better so you can target them better. In this paper, four different clustering models were developed in order to investigate the various kinds of customers. Each model uncovered unique swaths of consumers who could be served by the enterprise in accordance with the nature of their requirements. We compared their Silhouette and Davies-Bouldin scores in order to find the best one. K-Means and Affinity Propagation are the two methods that produce the most accurate results and reasonable clusters, especially for our problem statement and dataset. Hierarchical clustering also performed very well, whereas DBSCAN performed very poorly and mainly identified the outliers within the data. Clustering allows us to gain a deeper comprehension of the factors at play, which in turn helps us make deliberate judgments. Once companies have identified their customers, they can release products and services that target them based on parameters such as income, age, and spending patterns. For better market segmentation in the future, more complicated patterns, such as product reviews, can also be taken into consideration.
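The Fig. 16 score comparison can be reproduced in outline with scikit-learn. The data below are synthetic stand-ins for the customer features, and the hyperparameters (eps, damping, linkage) are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering, AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Synthetic 200-sample stand-in for the mall-customer features.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=7)

models = {
    "K-Means": KMeans(n_clusters=5, n_init=10, random_state=7),
    "Hierarchical": AgglomerativeClustering(n_clusters=5, linkage="ward"),
    "Affinity Propagation": AffinityPropagation(damping=0.9, random_state=7),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    if len(set(labels)) > 1:  # both scores need at least 2 distinct labels
        sil = silhouette_score(X, labels)       # higher is better, in [-1, 1]
        db = davies_bouldin_score(X, labels)    # lower is better, >= 0
        print(f"{name}: silhouette={sil:.3f}, davies-bouldin={db:.3f}")
```

On data with one low-density group, the same loop makes DBSCAN's weakness visible: its labels contain many -1 noise points, which drags its scores down, mirroring the result reported above.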
REFERENCES

[1] Wikipedia, "One size fits all", 2016. [Online]. Available: https://en.wikipedia.org/wiki/One_size_fits_all.
[2] Retention Science, "Why One Size Fits All Marketing Doesn't Work?". [Online]. Available: https://www.retentionscience.com/blog/why-onesize-fits-all-marketing-doesnt-work/.
[3] David F. Zirkle, "Market Segmentation: One Size Does Not Fit All". [Online]. Available: https://barlowmccarthy.com/blog/marketsegmentation-one-size-does-not-fit-all/.
[4] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", KDD'96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, August 1996, pp. 226-231.
[5] Sanatan Mishra, "Unsupervised Learning and Data Clustering", May 2017. [Online]. Available: https://towardsdatascience.com/unsupervisedlearning-and-data-clustering-eeecb78b422a.
[6] Soroush Hashemifar, "KMeans vs. DBScan", April 2018. [Online]. Available: https://soroushhashemifar.medium.com/kmeans-vs-dbscand9d5f9dbee8b.
[7] Fisseha Berhane, "Data distributions where Kmeans clustering fails, can DBSCAN be a solution?". [Online]. Available: https://datascienceenthusiast.com/Python/DBSCAN_Kmeans.html.
[8] Alex Williams, "Why is clustering hard?", September 2015. [Online]. Available: http://alexhwilliams.info/itsneuronalblog/2015/09/11/clustering1/.
[9] Sukavanan Nanjundan, Shreeviknesh Sankaran, C.R. Arjun, and G. Paavai Anand, "Identifying the number of clusters for K-Means: A hypersphere density based approach", arXiv, December 2019. [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1912/1912.00643.pdf.
[10] Ekta Sharma, "K-Means vs. DBSCAN Clustering — For Beginners", May 2020. [Online]. Available: https://towardsdatascience.com/k-means-vs-dbscan-clustering-49f8e627de27.
[11] M. A. Ishantha, "Mall Customer Segmentation Using Clustering Algorithm", LNBTI machine learning conference, Colombo, 2021. [Online]. Available: https://www.researchgate.net/publication/349714847_MALL_CUSTOMER_SEGMENTATION_USING_CLUSTERING_ALGORITHM.
[12] Shreya T., Aditya B., and Poovammal E., "Approaches to Clustering in Customer Segmentation", International Journal of Engineering & Technology, vol. 7, no. 3.12, pp. 802-807, July 2018, doi: 10.14419/ijet.v7i3.12.16505.
[13] Samet Girgin, "Hierarchical Clustering Model in 5 Steps with Python", April 2019. [Online]. Available: https://medium.com/@sametgirgin/hierarchical-clustering-model-in-5-steps-with-python-6c45087d4318.
[14] Vaidisha Mehta, Ritvik Mehra, and Sourabh Singh Verma, "A Survey on Customer Segmentation using Machine Learning Algorithms to Find Prospective Clients", 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2021, doi: 10.1109/ICRITO51393.2021.9596118.
[15] Ajitesh Kumar, "KMeans Silhouette Score Explained With Python Example", September 2020. [Online]. Available: https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam.
[16] Patel Monil, Patel Darshan, Rana Jecky, Chauhan Vimarsh, and Prof. B. R. Bhatt, "Customer Segmentation using Machine Learning", IJRASET, vol. 8, issue VI, June 2020. [Online]. Available: https://www.ijraset.com/fileserve.php?FID=29880.
[17] Haitian Wei, "How to measure clustering performances when there are no ground truth?", January 2020. [Online]. Available: https://medium.com/@haataa/how-to-measure-clusteringperformances-when-there-are-no-ground-truth-db027e9a871c.
[18] Robert Kwiatkowski, "Mall Customers Segmentation", 2022. [Online]. Available: https://www.kaggle.com/code/datark1/customers-clusteringk-means-dbscan-and-ap/notebook.
[19] Vijay Choudhary, "Mall Customer Segmentation Data", Kaggle, 2018. [Online]. Available: https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python?select=Mall_Customers.csv.
[20] D. Arthur and S. Vassilvitskii, "k-means++: The Advantages of Careful Seeding", Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, USA, 2007, doi: 10.1145/1283383.1283494.
[21] Data Science Lab, "Clustering With K-Means in Python", 2013. [Online]. Available: https://datasciencelab.wordpress.com/tag/lloydsalgorithm/.
[22] Takashi Onoda, Miho Sakai, and Seiji Yamada, "Careful Seeding based on Independent Component Analysis for k-means Clustering", IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Toronto, Canada, 2010, doi: 10.1109/WIIAT.2010.102.
[23] W. Jing, C. Zhao, and C. Jiang, "An improvement method of DBSCAN algorithm on cloud computing", Procedia Computer Science, vol. 147, 2019, pp. 596-604, doi: 10.1016/j.procs.2019.01.208.
[24] H. Yu, L. Chen, and X. Wang, "A three-way Clustering Method Based on an Improved DBSCAN Algorithm", Physica A: Statistical Mechanics and its Applications, vol. 535, 2019, doi: 10.1016/j.physa.2019.122289.
[25] B. J. Frey and D. Dueck, "Clustering by Passing Messages Between Data Points", Science, vol. 315, issue 5814, 2007, pp. 972-976, doi: 10.1126/science.1136800.
[26] Cory Maklin, "Affinity Propagation Algorithm Explained", 2019. [Online]. Available: https://towardsdatascience.com/unsupervised-machinelearning-affinity-propagation-algorithm-explained-d1fef85f22c8.
[27] Aneesha Bakharia, "Using Affinity Propagation to Find the Number of Clusters in a Dataset", 2016. [Online]. Available: https://aneesha.medium.com/using-affinity-propagation-to-find-thenumber-of-clusters-in-a-dataset-52f5dd3b0760.
[28] Ping Li, Haifeng Ji, Baoliang Wang, Zhiyao Huang, and Haiqing Li, "Adjustable preference affinity propagation clustering", Pattern Recognition Letters, vol. 85, pp. 72-78, 2017, doi: 10.1016/j.patrec.2016.11.017.
[29] Wenlong Hang, Fu-lai Chung, and Shitong Wang, "Transfer affinity propagation-based clustering", Information Sciences, vol. 348, 2016, pp. 337-356, doi: 10.1016/j.ins.2016.02.009.
[30] Bradley Boehmke and Brandon Greenwell, "Chapter 21 Hierarchical Clustering", in Hands-On Machine Learning with R, 2020. [Online]. Available: https://bradleyboehmke.github.io/HOML/hierarchical.html.