
unsupervised-learning

The document discusses the differences between supervised and unsupervised learning, emphasizing that unsupervised learning, particularly clustering, identifies intrinsic structures in data without predefined target attributes. It outlines various clustering techniques, their applications in real-life scenarios such as marketing and document organization, and highlights the advantages and disadvantages of unsupervised learning. Clustering is noted as a widely used data mining technique across multiple fields, with various algorithms and types available for implementation.

Uploaded by

Renee Winters

Unsupervised Learning

Supervised learning vs. unsupervised learning
 Supervised learning: discover patterns in the
data that relate data attributes with a target (class)
attribute.
 These patterns are then utilized to predict the
values of the target attribute in future data
instances.
 Unsupervised learning: The data have no
target attribute.
 We want to explore the data to find some intrinsic
structures in them.

Clustering
 Clustering is a technique for finding similarity groups
in data, called clusters.
 Clustering is often called an unsupervised learning
task as no class values denoting an a priori grouping
of the data instances are given, which is the case in
supervised learning.
 Due to historical reasons, clustering is often
considered synonymous with unsupervised learning.
 In fact, association rule mining is also
unsupervised.

An illustration
 The data set has three natural groups of data points,
i.e., 3 natural clusters.

What is clustering for?
 Let us see some real-life examples
 Example 1: group people of similar sizes
together to make “small”, “medium” and
“large” T-shirts.
 Tailor-made for each person: too expensive
 One-size-fits-all: does not fit all.
 Example 2: In marketing, segment
customers according to their similarities
 To do targeted marketing.
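
The T-shirt example above can be sketched in a few lines of code. This is only an illustration, assuming one chest measurement per person and using K-means (Lloyd's algorithm) on one-dimensional data. The function name `kmeans_1d`, the measurements, and the initial centers are all made up for the sketch.

```python
def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm on 1-D data; `centers` are the initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return centers, groups


# Hypothetical chest measurements in cm (made-up data).
chest_cm = [86, 88, 90, 98, 100, 102, 112, 114, 116]
centers, groups = kmeans_1d(chest_cm, centers=[85, 100, 115])
sizes = dict(zip(["small", "medium", "large"], groups))
print(sizes)
```

Each group then gets one garment size cut to its center, which sits between tailor-made and one-size-fits-all.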

What is clustering for? (cont.)
 Example 3: Given a collection of text documents,
we want to organize them according to their content
similarities,
 To produce a topic hierarchy

 In fact, clustering is one of the most widely used
data mining techniques.
 It has a long history and is used in almost every
field, e.g., medicine, psychology, botany,
sociology, biology, archeology, marketing,
insurance, libraries, etc.
 In recent years, due to the rapid increase of online
documents, text clustering has become important.

Why Unsupervised Learning?
 Unsupervised machine learning finds all kinds of
previously unknown patterns in data.
 Unsupervised methods help you find features
that can be useful for categorization.
 It can take place in real time, so all the input data
are analyzed and labeled in the presence of
learners.
 It is easier to get unlabeled data from a
computer than labeled data, which needs
manual intervention.

Aspects of clustering
 A clustering algorithm
 Partition clustering
 Hierarchical clustering
 A distance (similarity, or dissimilarity) function
 Clustering quality
 Inter-cluster distance ⇒ maximized
 Intra-cluster distance ⇒ minimized
 The quality of a clustering result depends on
the algorithm, the distance function, and the
application.
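
The two quality criteria above can be made concrete with a small sketch. It assumes Euclidean distance and uses plain averages over point pairs, which is only one of several possible definitions. The function names and the two toy groupings are illustrative.

```python
from itertools import combinations
from math import dist


def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(pairs) / len(pairs)


# The same four points grouped in two different ways (made-up data).
good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]   # neighbours grouped together
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]    # distant points grouped together

for name, clustering in [("good", good), ("bad", bad)]:
    print(name, mean_intra_distance(clustering), mean_inter_distance(clustering))
```

The better grouping has the smaller intra-cluster distance and the larger inter-cluster distance. With a different distance function the ranking of two clusterings can change, which is why the distance function is listed as a separate aspect.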

Clustering Types
 There are different types of clustering you
can utilize:
 Exclusive (partitioning): In this clustering method,
data are grouped in such a way that each data point
can belong to only one cluster.
 Example: K-means
 Agglomerative: In this clustering technique,
every data point starts as its own cluster. Iterative
unions of the two nearest clusters reduce the
number of clusters.
 Example: Hierarchical clustering
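
The agglomerative procedure described above can be sketched as follows. The slide does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between their closest members) and Euclidean distance. The function name and the sample points are illustrative.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Merge the two nearest clusters until `n_clusters` remain (single linkage)."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # union of the two nearest clusters
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, n_clusters=3)
print(result)
```

Stopping at a chosen number of clusters gives an exclusive partition. Recording every merge instead would give the full hierarchy.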
Clustering Types (Contd.)

 Probabilistic: This technique uses probability
distributions to create the clusters.
 Overlapping: In this technique, fuzzy sets are used
to cluster data. Each point may belong to two or
more clusters with separate degrees of
membership. Here, each data point is associated
with an appropriate membership value.
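
To illustrate what a membership value looks like, the sketch below uses the membership formula from fuzzy c-means. The slide only mentions fuzzy sets, so the choice of fuzzy c-means, the fuzzifier `m = 2`, and the fixed cluster centers are all assumptions made for the example.

```python
from math import dist


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster, given fixed centers."""
    distances = [dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


# Two hypothetical cluster centers and one point lying between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships((1.0, 0.0), centers)
print(u)
```

The point belongs to both clusters at once, with a higher degree of membership in the nearer one, and the degrees sum to 1.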

Algorithm

 Apriori algorithm
 K-means
 Agglomerative Clustering
 DBSCAN
 One-class SVM (the standard SVM is a supervised method)
 Density-based clustering

Applications
 Clustering automatically splits the dataset into groups
based on their similarities.
 Anomaly detection can discover unusual data points
in your dataset. It is useful for finding fraudulent
transactions.
 Association mining identifies sets of items that
often occur together in your dataset.
 Latent variable models are widely used for data
preprocessing, such as reducing the number of features
in a dataset or decomposing the dataset into
multiple components.
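
As a small illustration of association mining, the sketch below counts how often each pair of items occurs together and keeps the pairs above a support threshold. It is a brute-force count, not the Apriori algorithm (there is no candidate pruning). The function name, the baskets, and the threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions) is >= min_support."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(set(items)), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical shopping baskets (made-up data).
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

Only bread and milk appear together in at least half of the baskets, so that pair is the only one reported.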

Disadvantages
 You cannot get precise information regarding data
sorting or the output, because the data used in
unsupervised learning are unlabeled and not known in advance.
 Results are less accurate because the input
data are not known and not labeled by people in
advance, so the machine has to do this itself.
 The spectral classes do not always correspond to
informational classes.
 The user needs to spend time interpreting and labeling
the classes that result from the classification.
