Unsupervised Learning
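Supervised learning vs. unsupervised learning
Supervised learning: discover patterns in the
data that relate data attributes with a target (class)
attribute.
These patterns are then utilized to predict the
values of the target attribute in future data instances.
Unsupervised learning: the data have no target
attribute.
We want to explore the data to find some intrinsic
structures in them.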
Clustering
Clustering is a technique for finding similarity groups
in data, called clusters.
Clustering is often called an unsupervised learning
task because no class values denoting an a priori grouping
of the data instances are given, unlike in supervised
learning, where they are given.
For historical reasons, clustering is often
considered synonymous with unsupervised learning.
In fact, other tasks such as association rule mining are
also unsupervised.
An illustration
The data set has three natural groups of data points,
i.e., 3 natural clusters.
What is clustering for?
Let us see some real-life examples.
Example 1: group people of similar sizes
together to make “small”, “medium” and
“large” T-shirts.
Tailor-made for each person: too expensive.
One-size-fits-all: does not fit all.
Example 2: in marketing, segment
customers according to their similarities
to do targeted marketing.
What is clustering for? (cont…)
Example 3: given a collection of text documents,
we want to organize them according to their content
similarities to produce a topic hierarchy.
Why Unsupervised Learning?
Unsupervised machine learning can find all kinds of
previously unknown patterns in data.
Unsupervised methods help you to find features
which can be useful for categorization.
It takes place in real time, so all the input data
can be analyzed and labeled in the presence of
learners.
It is easier to get unlabeled data from a
computer than labeled data, which needs
manual intervention.
Aspects of clustering
A clustering algorithm
Partition clustering
Hierarchical clustering
A distance (similarity, or dissimilarity) function
Clustering quality
Inter-cluster distance: maximized
Intra-cluster distance: minimized
The quality of a clustering result depends on
the algorithm, the distance function, and the
application.
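The two quality criteria above can be made concrete with a small sketch. It assumes Euclidean distance and measures both quantities through cluster centroids, which is only one of several reasonable definitions; the helper names and the sample points are hypothetical and chosen for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def intra_cluster_distance(cluster):
    """Mean distance from each point to its own cluster centroid (smaller is tighter)."""
    c = centroid(cluster)
    return sum(euclidean(p, c) for p in cluster) / len(cluster)

def inter_cluster_distance(cluster_a, cluster_b):
    """Distance between the centroids of two clusters (larger is better separated)."""
    return euclidean(centroid(cluster_a), centroid(cluster_b))

# Hypothetical data: two compact, well-separated groups.
cluster_a = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
cluster_b = [(10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]

intra_a = intra_cluster_distance(cluster_a)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
print(f"intra-cluster distance of A: {intra_a:.3f}")
print(f"inter-cluster distance A-B: {inter_ab:.3f}")
```

A good clustering under these definitions has a small intra-cluster value relative to the inter-cluster value. Swapping in a different distance function changes both numbers, which is one reason the result depends on the distance function as well as the algorithm.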
Clustering Types
There are different types of clustering you
can utilize:
Exclusive (partitioning): data are grouped in such a
way that each data point belongs to exactly one
cluster.
Example: K-means
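As a rough illustration of exclusive clustering, here is a minimal K-means sketch in plain Python. It assumes Euclidean distance and alternates an assignment step with a centroid update step until the centroids stop changing. The function name `k_means`, its `init` parameter, and the sample points are hypothetical; library implementations differ in initialization and stopping rules.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centroids):
    """Give each point the index of its nearest centroid (exactly one label per point)."""
    return [
        min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
        for p in points
    ]

def k_means(points, k, init=None, max_iter=100, seed=0):
    """Return (labels, centroids). `init` optionally fixes the starting centroids."""
    if init is None:
        # Assumption: start from k distinct data points chosen at random.
        init = random.Random(seed).sample(points, k)
    centroids = list(init)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assign(points, centroids), centroids

# Hypothetical data with three visually separate groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.0),
]
# One starting centroid is taken from each group to keep the run deterministic.
labels, centroids = k_means(points, k=3, init=[points[0], points[3], points[6]])
print(labels)
```

Each point receives exactly one label, which is what makes the method exclusive. With random initialization K-means can settle in a poor local optimum, so it is commonly run several times from different starting centroids.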
Algorithms
Apriori algorithm
K-means
Agglomerative Clustering
DBSCAN
SVM (one-class variant; the standard SVM is a supervised method)
Density-based clustering
Applications
Clustering automatically splits the dataset into groups
based on their similarities.
Anomaly detection can discover unusual data points
in your dataset. It is useful for finding fraudulent
transactions.
Association mining identifies sets of items which
often occur together in your dataset.
Latent variable models are widely used for data
preprocessing, such as reducing the number of features
in a dataset or decomposing the dataset into
multiple components.
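To make the association mining item more concrete, the sketch below counts how often pairs of items occur together and keeps the pairs whose support (the fraction of transactions containing both items) meets a threshold. This is brute-force counting over pairs only, not the Apriori algorithm with its candidate pruning; the function name `frequent_pairs` and the shopping-basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs whose support is at least min_support."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so that each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

pairs = frequent_pairs(transactions, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

Raising `min_support` keeps only the strongest co-occurrences, while lowering it returns more pairs at the cost of more noise.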
Disadvantages
You cannot get precise information about how the data
are sorted or about the output, because the data used in
unsupervised learning are unlabeled and the classes are
not known in advance.
Results are typically less accurate because the input
data are not labeled by people in advance. This means
that the machine has to find the structure itself.
The spectral classes do not always correspond to
informational classes.
The user needs to spend time interpreting and labeling
the classes that result from the classification.