Clustering Categorical Data Using the K-Means Algorithm and the Attribute's Relative Frequency
Abstract—Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can contain either categorical or numeric data, and each type of data has its own specific clustering algorithm. In this context, two algorithms are commonly used: the k-means for clustering numeric datasets and the k-modes for categorical datasets. A problem frequently encountered in data mining applications is the clustering of categorical datasets, which are highly relevant in practice. One way to achieve the clustering of categorical values is to transform the categorical attributes into numeric measures and to apply the k-means algorithm directly instead of the k-modes. In this paper, it is proposed to experiment with an approach based on this idea by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated experimentally. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords—Clustering, k-means, categorical datasets, pattern recognition, unsupervised learning, knowledge discovery.

… description for each obtained cluster to extract the corresponding properties and knowledge.

The k-means is a well-known clustering algorithm proposed for numeric datasets (containing numeric values), which makes it not adapted for clustering categorical datasets. This is a strong restriction that limits the usefulness of the algorithm since, in many data mining applications, most of the considered datasets contain categorical values. To deal with categorical datasets, the k-means was extended into the k-modes algorithm, which is detailed in the next section. However, another interesting option is to convert the categorical data into numeric values and apply the k-means algorithm directly, and this is the option explored here.

This paper is organized as follows: in the second section, we present previous approaches towards clustering categorical data with their limits and provide a detailed description of the k-means algorithm adopted in this study. In the third section, our proposed approach is detailed. Experimental results and discussion are provided in the fourth section, and the last section is devoted to the conclusion and perspectives.
… patterns using the concept of links, i.e., the similarity between any two categorical patterns depends on the number of their common neighbors. The aim of this algorithm is thus to merge into one group the patterns that share a relatively large number of links.

The notion of relative frequency was used in [13] to define a new dissimilarity coefficient for the k-modes algorithm, in which the frequency of the categorical values in the current cluster is taken into account when calculating the dissimilarity between a data point and a cluster mode, since the simple matching distance is not a good measure and results in poor intra-cluster similarity.

Although the k-modes based algorithms have shown their efficiency in clustering large categorical datasets, they still have, like the k-means type algorithms, two major limitations: (1) they cannot capture the global structure effectively, i.e., the provided solutions are only locally optimal and a global solution is not easy to find [14]; (2) the accuracy of the obtained results is sensitive to the number and shape of the initial centroids. Besides, the modes are more difficult to move in iterative optimization processes because the attribute values of categorical data are not continuous. The mode retains only the most frequent element of the considered attribute, which means that if two modalities have close frequencies, only one is kept and the other is dismissed, resulting in information loss.

In this paper, two approaches are discussed and experimented for clustering categorical datasets with the k-means algorithm: in the first method, a binary data representation is used to convert the initial categorical dataset into numeric values; in the second method, the relative frequency of the modalities in the attributes is used to perform the transformation.

B. The k-Means Algorithm

The k-means algorithm is a widely used clustering technique in which an initial training set S_N = {x^(i), i = 1, …, N}, composed of N elements x^(i) ∈ R^d described by d attributes, is divided into K clusters C_1, C_2, …, C_K. The clustering process is based on measuring the distance between the initially randomly selected centroids z_j and the observations. The algorithm is described as follows:
1. Initialize the centroids z_1, …, z_K;
2. Repeat until there is no further change in the cost function:
   a. ∀ j = 1, …, K: C_j = {x^(i) : z_j is the closest centroid to x^(i)};
   b. ∀ j = 1, …, K: z_j = (1/|C_j|) Σ_{x ∈ C_j} x (cluster mean).

The k-means aims to minimize a criterion known as the within-cluster sum of squares, defined as follows:

J = Σ_{j=1}^{K} Σ_{x ∈ C_j} ||x − z_j||²

The resulting clusters are described by the means of the samples in each cluster, called "centroids", which may not be points from the dataset, although they live in the same space.

The k-means clustering algorithm is well known for its efficiency in clustering large datasets, yet few previous proposals aimed to use its original version for clustering categorical data. On the other hand, some attempts were made to cluster categorical datasets with hierarchical algorithms, but these do not constitute an interesting alternative because their quadratic time complexity hinders their usage.

The main motivation behind this approach is to benefit from the k-means algorithm in terms of complexity: it is well known for its low computational cost O(KNTd), which is linear in the number of clusters K, the number of observations N, the number of iterations T and the number of attributes d. In [15], the author proposed an approach using the k-means algorithm to cluster categorical data. The approach is based on converting each multi-category attribute into binary attributes, using 1 and 0 to indicate whether a category is present or absent, and treating the binary attributes as numeric data. However, once used in data mining applications, this approach has to handle a huge number of parameters, since the number of attributes grows with the number of modalities. This fact increases both the computational and space costs of the k-means algorithm. Besides, according to the algorithm's process, the computed cluster means representing the centroids are contained between 0 and 1 and do not indicate the real characteristics of the clusters.

III. PROPOSED APPROACH FOR CLUSTERING CATEGORICAL DATASETS

In this paper, it is proposed to experiment with a method to cluster qualitative data using the original version of the k-means algorithm. The considered dataset is assumed to be stored in a table, where each row (tuple) represents an observation described by the attributes arranged in columns. The objects encountered in many data mining applications are often described by categorical information systems.

Definition 1. Formally, a categorical information system is represented by the quadruple CIS = (U, A, V, f), where:
U is a non-empty set of objects (the universe);
A is a non-empty set of attributes;
V is a finite unordered set representing the union of all the attribute domains;
f is a mapping information function.

Although the initial version of the k-means is not adapted for categorical data, which represents its main limitation, in this paper we propose a new efficient approach to cluster categorical datasets based on the k-means. To make this possible, our proposed solution consists in transforming the initial dataset into numeric values by considering the relative frequency of the modalities in each attribute.

Definition 2. The relative frequency f_{k,j} is the number of occurrences of the k-th category C_{k,j} in attribute A_j divided by the number of observations N in the dataset, and is defined as follows:

f_{k,j} = |C_{k,j}| / N
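To make Definition 2 concrete, the following is a minimal sketch of how the relative frequency of one categorical attribute can be computed and substituted for its modalities. The toy column, the class name and all identifiers are illustrative; this is not the paper's original Java implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class RelativeFrequency {
    public static void main(String[] args) {
        // Illustrative categorical column, e.g. a "Work" attribute with
        // modalities S (student), E (employee), J (jobless).
        String[] work = {"S", "E", "J", "E"};
        int n = work.length;

        // Count the occurrences |C_{k,j}| of each modality k in the attribute.
        Map<String, Integer> counts = new HashMap<>();
        for (String v : work) {
            counts.merge(v, 1, Integer::sum);
        }

        // Relative frequency f_{k,j} = |C_{k,j}| / N (Definition 2).
        Map<String, Double> frequency = new HashMap<>();
        counts.forEach((modality, c) -> frequency.put(modality, c / (double) n));

        // Replace each categorical value by the relative frequency of its modality.
        double[] numeric = new double[n];
        for (int i = 0; i < n; i++) {
            numeric[i] = frequency.get(work[i]);
        }
        System.out.println(java.util.Arrays.toString(numeric)); // [0.25, 0.5, 0.25, 0.5]
    }
}
```

Note that, with this transformation, an attribute keeps a single numeric column regardless of its number of modalities.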
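The proposed procedure described just below combines this transformation with the plain k-means loop of subsection B above. For reference, here is a minimal, self-contained sketch of that loop on numeric vectors; the squared Euclidean distance, the first-K-points initialisation and the iteration cap are illustrative assumptions and are not claimed to match the paper's implementation.

```java
import java.util.Arrays;

public class SimpleKMeans {
    // One k-means run: returns the cluster index of each observation.
    static int[] cluster(double[][] x, int k, int maxIter) {
        int n = x.length, d = x[0].length;
        double[][] z = new double[k][];                  // centroids z_1..z_K
        for (int j = 0; j < k; j++) z[j] = x[j].clone(); // simple initialisation: first K points (requires k <= n)
        int[] assign = new int[n];

        for (int it = 0; it < maxIter; it++) {
            // Assignment step: each x(i) goes to the closest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = dist(x[i], z[0]);
                for (int j = 1; j < k; j++) {
                    double dj = dist(x[i], z[j]);
                    if (dj < bestDist) { bestDist = dj; best = j; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;                         // converged: no further change

            // Update step: z_j = mean of the points assigned to cluster j.
            double[][] sum = new double[k][d];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[assign[i]]++;
                for (int a = 0; a < d; a++) sum[assign[i]][a] += x[i][a];
            }
            for (int j = 0; j < k; j++)
                if (count[j] > 0)
                    for (int a = 0; a < d; a++) z[j][a] = sum[j][a] / count[j];
        }
        return assign;
    }

    // Squared Euclidean distance, the quantity summed in the within-cluster sum of squares.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        // Illustrative numeric data (e.g. a frequency-transformed dataset).
        double[][] x = {{0.75, 0.25}, {0.75, 0.50}, {0.25, 0.25}, {0.75, 0.50}};
        System.out.println(Arrays.toString(cluster(x, 2, 100)));
    }
}
```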
The different steps of the proposed approach are described as follows:

STEP 1: Transform the initial categorical dataset into a numeric one by replacing each categorical value with the relative frequency of its modality, f_{k,j} = |C_{k,j}| / N;
STEP 2: Randomly select K initial centroids (objects) from the dataset for the clusters, Z = {z_1, z_2, …, z_K};
WHILE the centroids (z_1, …, z_K) keep changing DO
  FOR each cluster C_j DO
    FOR each individual x_i ∈ U DO
      Compute d(x_i, z_j);
  Assign each x_i to the nearest centroid: C_j = {x_i : d(x_i, z_j) ≤ d(x_i, z_l), l = 1, …, K};
  Re-compute each new cluster centroid using the mean: z_j = (1/|C_j|) Σ_{x ∈ C_j} x;
END WHILE

The following example gives an idea of how to implement the two methods.

TABLE I
EXAMPLE OF THE INITIAL CONSIDERED CATEGORICAL DATASET

Obsi   Sex (M/F)   Work   Criminal Records
Obs1   M           S      Y
Obs2   M           E      Y
Obs3   F           J      N
Obs4   M           E      Y
*S: Student, E: Employee, J: Jobless; **Y: Yes, N: No

Table I provides an example of a categorical dataset containing four observations described by three categorical attributes. The first attribute (Sex) has two modalities (Male/Female), the second attribute (Work) has three modalities (Student, Employee, Jobless), and the third attribute (Criminal Records) has two modalities (Yes/No). When considering the first transformation method to obtain a binary dataset, the result is as follows.

TABLE II.A
BINARY TRANSFORMATION OF THE INITIAL DATASET

       Sex           Work                        Criminal Record
Obsi   Male  Female  Student  Employee  Jobless  Yes  No
Obs1   1     0       1        0         0        1    0
Obs2   1     0       0        1         0        1    0
Obs3   0     1       0        0         1        0    1
Obs4   1     0       0        1         0        1    0

When considering the second transformation method, the dataset obtained with the proposed frequency-based approach will be as follows in Table II.C.

IV. EXPERIMENTAL ANALYSIS RESULTS

In this section, the experimental environment and the initial dataset are described. The efficiency is evaluated using the accuracy. Besides, the contribution to scalability is also tested, first for different values of the number of clusters K and then with 50 runs for each of four values of K with different initial centroids.

The complexity of the k-means algorithm depends on the number of iterations T, of attributes (dimensions) d, of observations N and of clusters K. In the experiments, N and K are equal for the two methods; however, the experimental results show that T for the binary method is higher than for the frequency-based method. Besides, the resulting datasets to be experimented have different numbers of dimensions: for the binary transformation, this parameter is higher than for the frequency-based method. These facts show that our new proposed technique permits reducing the complexity of the k-means once executed. Some proposals were made to reduce the dimensionality [16], [17] and can be considered if the issue of reducing the dimensions of the resulting binary transformation is to be experimented.

A. Experimental Environment and Evaluation Criterion

The algorithm was coded in the JAVA language and run on an Intel Core i3-2.1 GHz machine with 4 GB of RAM under the Windows 7 operating system. To evaluate the efficiency of the k-means in clustering categorical datasets transformed using the relative frequency of the attributes, the accuracy is considered as an evaluation criterion; the higher this metric, the better the obtained clusters. The accuracy is defined as follows:

Definition 3. The accuracy AC of a clustering is an external evaluation criterion that permits comparing the effectiveness of two clusterings, as follows:

AC = (1/N) Σ_{i=1}^{K} |a_i|

where K is the number of predefined classes, |a_i| is the number of objects correctly assigned to class i, and N is the number of observations in the dataset.
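As a complement to Definition 3, the sketch below computes the accuracy AC of a clustering against predefined class labels. The paper does not spell out how the a_i counts are obtained; mapping each cluster to its majority class, as done here, is a common convention and should be read as an assumption, as are all identifiers.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusteringAccuracy {
    // AC = (1/N) * sum_i |a_i|, where |a_i| is here taken as the number of
    // objects of the majority class inside cluster i (objects considered correctly clustered).
    static double accuracy(int[] clusters, int[] labels, int k) {
        int n = labels.length;
        int correct = 0;
        for (int c = 0; c < k; c++) {
            Map<Integer, Integer> byClass = new HashMap<>();
            for (int i = 0; i < n; i++)
                if (clusters[i] == c) byClass.merge(labels[i], 1, Integer::sum);
            int a = 0;
            for (int count : byClass.values()) a = Math.max(a, count);
            correct += a;                                // |a_i| for this cluster
        }
        return correct / (double) n;
    }

    public static void main(String[] args) {
        // Illustrative result: 2 clusters, 6 labelled observations.
        int[] clusters = {0, 0, 0, 1, 1, 1};
        int[] labels   = {0, 0, 1, 1, 1, 1};
        System.out.println(accuracy(clusters, labels, 2)); // 0.8333...
    }
}
```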
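To connect the worked example above with the dimensionality argument made in this section, the following sketch applies both transformations to the dataset of Table I: the binary method of [15] yields seven 0/1 attributes, whereas the relative-frequency transformation keeps the three original attributes as numeric columns. The class name and output format are illustrative, and the code reproduces only the toy example, not the paper's experimental datasets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class TransformTableI {
    public static void main(String[] args) {
        // Categorical dataset of Table I: Sex, Work, Criminal Records.
        String[][] data = {
            {"M", "S", "Y"},
            {"M", "E", "Y"},
            {"F", "J", "N"},
            {"M", "E", "Y"}
        };
        int n = data.length, d = data[0].length;

        // Relative-frequency transformation (Definition 2):
        // one numeric column per attribute, whatever its number of modalities.
        double[][] freq = new double[n][d];
        for (int j = 0; j < d; j++) {
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < n; i++) counts.merge(data[i][j], 1, Integer::sum);
            for (int i = 0; i < n; i++) freq[i][j] = counts.get(data[i][j]) / (double) n;
        }

        // Binary transformation ([15]): one 0/1 column per modality of each attribute.
        List<String> columns = new ArrayList<>();       // "attributeIndex=modality" column names
        for (int j = 0; j < d; j++) {
            LinkedHashSet<String> modalities = new LinkedHashSet<>();
            for (int i = 0; i < n; i++) modalities.add(data[i][j]);
            for (String m : modalities) columns.add(j + "=" + m);
        }
        int[][] binary = new int[n][columns.size()];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++)
                binary[i][columns.indexOf(j + "=" + data[i][j])] = 1;

        // The frequency-based dataset keeps d = 3 dimensions; the binary one has 7.
        System.out.println("Frequency columns: " + d + ", binary columns: " + columns.size());
        for (double[] row : freq) System.out.println(java.util.Arrays.toString(row));
        for (int[] row : binary) System.out.println(java.util.Arrays.toString(row));
    }
}
```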
Fig. 1 Execution Time comparison using the two methods for the considered datasets over the number of clusters
Fig. 2 Number of iterations comparison using the two methods for the considered datasets over the number of clusters
The two previous figures represent the scalability of the k-means clustering algorithm when considering the two different initial datasets obtained according to our approach. The scalability is characterized by two parameters: the run time and the number of iterations required by the algorithm to converge. According to these two factors, the run time required by the relative-frequency transformation is lower than the run time required by the binary method. This fact highlights the convenience of the proposed approach and the value of our contribution. The difference in execution time is very significant and makes our proposed approach better adapted to data mining applications dealing with huge datasets.

In the previous experiments, the scalability of the algorithm and its performance in clustering a categorical dataset transformed into numeric values were evaluated for different values of K. To experiment further, it is proposed to test the scalability for four values of K with 50 runs each and to compute the average, minimum and maximum values of the run time and of the number of iterations. The obtained results are summarized in Tables III.A-C.

TABLE III.A
AVERAGE RUN TIME AND NUMBER OF ITERATIONS REQUIRED BY THE TWO APPROACHES FOR VARIOUS VALUES OF K

       Binary dataset          Dataset with relative frequency
K      Run time   Iterations   Run time   Iterations
3      860.6      16.6         369.13     7.16
4      1228.12    25.4         433.28     10.22
5      1792       28.68        511.08     11.64
6      2022.98    37.78        551.9      13.66
TABLE III (CONTINUED)

       Binary dataset          Dataset with relative frequency
K      Run time   Iterations   Run time   Iterations
5      2886       54           1108       30
6      4664       79           1170       29

The accuracy of the two methods was then computed for different values of the number of clusters K (2→20). The obtained results are shown in the following figure.
According to the previous results, it is obvious that the accuracy computed using the proposed approach is, in most cases, superior to the accuracy computed with Ralambondrainy's approach [15]. The proposed approach is not only effective in terms of run time and number of iterations, but its clustering efficiency is also enhanced. The evaluation of clustering efficiency can be considered more important than scalability, since it permits characterizing and identifying more pertinent profiles, which is the main aim of the clustering process.

As with the scalability experiments, it is proposed to compute the accuracy over 50 runs of the algorithm for the two datasets. The same values of K (3, 4, 5, 6) are considered. In Table IV, the average of the accuracy values computed in each case is provided.

TABLE IV
ACCURACY COMPUTED FOR 50 RUNS OF THE K-MEANS FOR THE TWO METHODS

K                    3      4      5      6
Ralambondrainy       0.5    0.57   0.51   0.65
Proposed approach    0.675  0.748  0.686  0.712

The provided results confirm again that the proposed approach is more effective in clustering categorical data when the relative frequency of the modalities in the attributes is used to transform the categorical data into numeric values. The obtained accuracies are higher for the proposed approach than for Ralambondrainy's technique.

V. CONCLUSION

Clustering categorical data is a heavy and complex task, and specific clustering algorithms should be designed for it. In this paper, the relative frequency of each modality in its attribute is used to transform the categorical measures into numeric values. The k-means algorithm is then applied to the resulting dataset. The experimental results show that our proposed approach improves three parameters: (i) the scalability, i.e., the run time and the number of iterations; (ii) the efficiency, evaluated using the accuracy; and (iii) the complexity, due to the reduction of the number of iterations and of the dimensions of the original dataset. These findings show the considerable contribution resulting from the use of the relative frequency, which is considered the most appropriate statistical parameter to convert categorical data into numeric values.
REFERENCES
[1] Jiawei Han, Jian Pei, Micheline Kamber, "Data Mining: Concepts and Techniques", Elsevier, 3rd edition, 2011, 744 p.
[2] Charu C. Aggarwal, "Data Mining: The Textbook", Springer, 2015, 734 p.
[3] Guojun Gan, Chaoqun Ma, Jianhong Wu, "Data Clustering: Theory, Algorithms, and Applications", ASA-SIAM Series on Statistics and Applied Probability, 2007.
[4] Zhexue Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values", Data Mining and Knowledge Discovery 2, pp. 283-304, 1998.
[5] Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang, "A dissimilarity measure for the k-modes clustering algorithm", Knowledge-Based Systems 26, Elsevier, pp. 120-127, 2012.
[6] Z. He, X. Xu, S. Deng, "Squeezer: an efficient algorithm for clustering categorical data", Journal of Computer Science and Technology 17 (5), pp. 611-624, 2002.
[7] Z. He, X. Xu, S. Deng, "Scalable algorithms for clustering large datasets with mixed type attributes", International Journal of Intelligent Systems 20 (10), pp. 1077-1089, 2005.
[8] Z. X. Huang, M. K. Ng, "A fuzzy k-modes algorithm for clustering categorical data", IEEE Transactions on Fuzzy Systems 7 (4), pp. 446-452, 1999.
[9] D. W. Kim, K. H. Lee, D. Lee, "Fuzzy clustering of categorical data using fuzzy centroids", Pattern Recognition Letters 25, pp. 1263-1271, 2004.
[10] M. K. Ng, M. J. Li, Z. X. Huang, Z. Y. He, "On the impact of dissimilarity measure in k-modes clustering algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (3), pp. 503-507, 2007.
[11] D. Gibson, J. Kleinberg, P. Raghavan, "Clustering categorical data: an approach based on dynamical systems", Proceedings of the 24th VLDB Conference, New York, 1998, pp. 311-322.
[12] S. Guha, R. Rastogi, K. Shim, "ROCK: a robust clustering algorithm for categorical attributes", Proceedings of the IEEE International Conference on Data Engineering, Sydney, Australia, 1999, pp. 512-521.
[13] M. K. Ng, M. J. Li, J. H. Huang, Z. He, "On the impact of dissimilarity measure in k-modes clustering algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (3), pp. 503-507, 2007.
[14] A. Chaturvedi, P. E. Green, J. D. Carroll, "K-modes clustering", Journal of Classification, Vol. 18, No. 1, pp. 35-55, 2001.
[15] H. Ralambondrainy, "A conceptual version of the k-means algorithm", Pattern Recognition Letters 16, pp. 1147-1157, 1995.
[16] Semeh Ben Salem, Sami Naouali, "Reducing the multidimensionality of OLAP cubes with Genetic Algorithms and Multiple Correspondence Analysis", International Conference on Advanced Wireless, Information, and Communication Technologies (AWICT 2015), Tunisia.
[17] Semeh Ben Salem, Sami Naouali, "Towards Reducing the Multidimensionality of OLAP Cubes using the Evolutionary Algorithms and Factor Analysis Method", International Journal of Data Mining and Knowledge Management Process (IJDKP), 2016.
[18] Semeh Ben Salem, Sami Naouali, "Pattern Recognition Approach in Multidimensional Databases: Application to the Global Terrorism Database", International Journal of Advanced Computer Science and Applications (IJACSA), 7 (8), 2016.