
Unsupervised Learning

吉建民

USTC
[email protected]

June 4, 2023

Used Materials

Disclaimer: These slides draw on S. Russell and P. Norvig's Artificial Intelligence – A Modern Approach slides, Prof. 徐林莉 (Xu Linli)'s course slides and other online course materials, as well as open-source code from GitHub and content from several online blogs.
Table of Contents

Unsupervised Learning
Clustering
Principal Component Analysis

Supervised learning has many successes

▶ Document classification
▶ Protein prediction
▶ Face recognition
▶ Speech recognition
▶ Vehicle steering etc.

However...

▶ Labeled data can be rare or expensive in many real applications

- Speech
- Medical data
- Protein
- ···

▶ Unlabeled data is much cheaper and abundant


Question: Can we use unlabeled data to help?

Unsupervised learning

Learning from unlabeled data (without supervision)

▶ What can we predict from unlabeled data?
▶ Groups or clusters in the data
▶ Density estimation
▶ Low-dimensional structure
▶ Principal Component Analysis (PCA) (linear)
▶ Manifold learning (non-linear)
Table of Contents

Unsupervised Learning
Clustering
Principal Component Analysis

Clustering

▶ Are there any “groups” in the data?
▶ What is each group?
▶ How many groups?
▶ How to identify them?
Clustering
▶ Group the data objects into subsets or “clusters”:
▶ High similarity within clusters
▶ Low similarity between clusters

▶ A common and important task with many applications in science, engineering, information science, and elsewhere:
▶ Group genes that perform the same function
▶ Group individuals that have similar political views
▶ Categorize documents with similar topics
▶ Identify similar objects from pictures
Clustering

▶ Input: training set of input points
Dtrain = {x1 , . . . , xn }
▶ Output: assignment of each point to a cluster
( C(1), . . . , C(n) ) where C(i) ∈ { 1, . . . , k }
K-means clustering

Create centers and assign points to centers so as to minimize the sum of squared distances
K-means objective

▶ Each cluster is represented by a centroid µ
▶ Encode each point by its cluster center, pay a cost for deviation
▶ Loss function based on reconstruction:

$$\mathrm{Loss}_{\text{kmeans}} = \sum_{j=1}^{n} \left\| \mu_{C(j)} - x_j \right\|^2$$
K-means algorithm

▶ Goal: $\min_{\mu}\min_{C}\sum_{j=1}^{n}\left\|\mu_{C(j)} - x_j\right\|^2$
▶ Strategy: alternating minimization
▶ Step 1: if we know the cluster centers µ, we can find the best assignment C
▶ Step 2: if we know the cluster assignments C, we can find the best cluster centers
K-means algorithm
Optimize the loss function Loss(µ, C):

$$\min_{\mu}\min_{C}\sum_{j=1}^{n}\left\|\mu_{C(j)} - x_j\right\|^2$$

(1) Fix µ, optimize C:

$$\min_{C(1),C(2),\dots,C(n)}\sum_{j=1}^{n}\left\|\mu_{C(j)} - x_j\right\|^2$$

Assign each point to the nearest cluster center.

(2) Fix C, optimize µ:

$$\min_{\mu_1,\mu_2,\dots,\mu_k}\sum_{j=1}^{n}\left\|\mu_{C(j)} - x_j\right\|^2$$

Solution: the average of the points in each cluster, which is exactly the second step (re-center).
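As an illustration of this alternating scheme, here is a minimal NumPy sketch of the two k-means steps (assignment and re-centering); the function and variable names are my own and not from the lecture.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: alternate between assignment and re-centering.

    X: (n, d) data matrix; k: number of clusters.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers as k randomly chosen data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 1 (fix mu, optimize C): assign each point to the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2 (fix C, optimize mu): move each center to the mean of its points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: re-centering no longer changes anything
        centers = new_centers
    loss = ((X - centers[labels]) ** 2).sum()  # sum of squared distances
    return centers, labels, loss
```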
K-Means

K-means clustering: Example

K-means clustering: Example

K-means clustering: Example

K-means clustering: Example
Repeat until convergence

Properties of K-means algorithm

▶ Guaranteed to converge in a finite number of iterations
▶ ... but only to a local minimum
▶ The objective is non-convex, so coordinate descent on it is not guaranteed to converge to the global minimum
▶ Running time per iteration: simple and efficient
▶ Assigning data points to the closest cluster center: O(KN)
▶ Updating each cluster center to the average of its assigned points: O(N)
▶ Different initializations lead to different results (see the sketch below)
▶ The k-means problem (finding the global optimum of the objective above) is NP-hard
▶ Not robust to noise and outliers
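To see the sensitivity to initialization in practice, one can run k-means several times with different random starts and compare the final loss values. A small sketch using scikit-learn (assuming it is installed; the dataset here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic 2-D blobs
X = np.vstack([rng.normal(loc=c, scale=0.8, size=(100, 2))
               for c in ([0, 0], [5, 0], [0, 5])])

# Run k-means with different random initializations; the final loss (inertia)
# can differ, i.e. the algorithm converges to different local minima.
for seed in range(5):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 2))
```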
K-means convergence

K-means getting stuck

K-means not able to properly cluster

Changing the features (distance function) can help

Table of Contents

Unsupervised Learning
Clustering
Principal Component Analysis

Principal Component Analysis

▶ What is dimensionality reduction?


▶ Why dimensionality reduction?
▶ Principal Component Analysis (PCA)
▶ Nonlinear PCA using Kernels

What is dimensionality reduction?

▶ Dimensionality reduction refers to mapping the original high-dimensional data onto a lower-dimensional space.
- The criterion for dimensionality reduction can differ depending on the problem setting.
▶ Unsupervised setting: minimize the information loss
(nearest reconstruction: sample points should lie close to the projection hyperplane)
▶ Supervised setting: maximize the class discrimination
(maximum separability: the projections of the sample points onto the hyperplane should be as spread out as possible)
▶ After centering the samples, the two criteria are equivalent

▶ Given a set of data points of d-dimensional variables {x1 , x2 , . . . , xn }
▶ Compute the linear transformation (projection)

P ∈ Rd×m : x ∈ Rd → y = P⊤ x ∈ Rm (m ≪ d)
What is dimensionality reduction?

P ∈ Rd×m : x ∈ Rd → y = P⊤ x ∈ Rm

High-dimensional data

Why dimensionality reduction?

▶ Most machine learning and data mining techniques may not be effective for high-dimensional data
▶ Curse of Dimensionality
▶ Query accuracy and efficiency degrade rapidly as the dimension increases.
▶ The intrinsic dimension may be small.
▶ For example, the number of genes responsible for a certain type of disease may be small.
Curse of Dimensionality
▶ When dimensionality increases, data becomes increasingly sparse in the space that it occupies
▶ Definitions of density and distance between points, which are critical for clustering and outlier detection, become less meaningful
▶ If N1 = 100 represents a dense sample for a single-input problem, then N10 = 100^10 is the sample size required for the same sampling density in dimension 10
▶ The ratio of the volume of a hypersphere with radius r and dimension d to that of a hypercube with sides of length 2r and dimension d converges to 0 as d goes to infinity: nearly all of the high-dimensional space is “far away” from the center
▶ Experiment (shown in the figure): randomly generate 500 points and compute the difference between the max and min distance between any pair of points
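The experiment described on this slide can be reproduced with a few lines of NumPy; this is a sketch under my own choice of distribution and dimensions, not the exact code behind the original figure.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Randomly generate 500 points and look at the spread of pairwise distances as the
# dimension grows: the relative gap (max - min) / min shrinks, i.e. distances concentrate.
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = pdist(X)                      # all pairwise Euclidean distances
    print(d, (dists.max() - dists.min()) / dists.min())
```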
High dimensional spaces are empty

Lost in space
Let’s consider a hypersphere of radius r inscribed in a hypercube with sides of length 2r, and take the ratio of the volume of the hypersphere to that of the hypercube. We observe the following trends.
▶ in 2 dimensions:

$$\frac{V(S_2(r))}{V(H_2(2r))} = \frac{\pi r^2}{4 r^2} \approx 78.5\%$$

▶ in 3 dimensions:

$$\frac{V(S_3(r))}{V(H_3(2r))} = \frac{\tfrac{4}{3}\pi r^3}{8 r^3} \approx 52.4\%$$

▶ when the dimensionality d increases, asymptotically

$$\lim_{d\to\infty} \frac{V(S_d(r))}{V(H_d(2r))} = \lim_{d\to\infty} \frac{\pi^{d/2}}{2^{d}\,\Gamma\!\left(\tfrac{d}{2}+1\right)} = 0$$
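As a quick numerical check of this limit (my own sketch, using the closed-form ball volume with the gamma function):

```python
import math

# Ratio of the volume of the d-ball of radius r to the cube of side 2r:
#   pi^(d/2) / (2^d * Gamma(d/2 + 1))   -- independent of r
def ball_to_cube_ratio(d: int) -> float:
    return math.pi ** (d / 2) / (2 ** d * math.gamma(d / 2 + 1))

for d in (2, 3, 10, 20, 50):
    print(d, ball_to_cube_ratio(d))
# d=2 -> 0.785..., d=3 -> 0.523..., and the ratio collapses toward 0 as d grows.
```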
Why dimensionality reduction?

▶ Visualization: projection of high-dimensional data onto 2D or 3D
▶ Data compression: efficient storage and retrieval
▶ Noise removal: positive effect on query accuracy
Application of feature reduction

▶ Face recognition
▶ Handwritten digit recognition
▶ Text mining
▶ Image retrieval
▶ Microarray data analysis
▶ Protein classification
▶ ···

What is Principal Component Analysis?

▶ Principal component analysis (PCA)
- Reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables
- Retains most of the sample’s information
- Useful for the compression and classification of data
▶ By information we mean the variation present in the sample, given by the correlations between the original variables.
▶ The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.
Principal components (PCs)

Given n points in a d-dimensional space, for large d, how does one project onto a low-dimensional space while preserving broad trends in the data and allowing it to be visualized?
Geometric picture of principal components

▶ Given n points in a d-dimensional space, for large d, how does one project onto a 1-dimensional space?
▶ Choose a line that fits the data so the points are spread out well along the line
Let us see it on a figure

PCA seeks to minimize the information loss after dimensionality reduction, which can be understood as keeping the projected data as spread out as possible. This spread can be measured by the variance (µ is the mean):

$$\mathrm{Var}(a) = \frac{1}{n}\sum_{i=1}^{n} (a_i - \mu)^2$$

After centering the data, i.e., µ = 0:

$$\mathrm{Var}(a) = \frac{1}{n}\sum_{i=1}^{n} a_i^2$$
Geometric picture of principal components
Center the data:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad x'_i = x_i - \bar{x}, \quad 1 \le i \le n.$$

For the centered data, i.e., $\bar{x}' = 0$, the following are equivalent: find a line that
▶ maximizes the variance of the projected data
▶ maximizes the sum of squares of the data samples’ projections onto that line
▶ minimizes the sum of squared distances to the line
Algebraic Interpretation — 1D

▶ Minimizing the sum of squared distances to the line is the same as maximizing the sum of squares of the projections onto that line, thanks to Pythagoras.

The projection length is: $x^\top \dfrac{w}{\|w\|}$
Algebraic Interpretation — 1D

The projection length is: $x^\top u = u^\top x$, subject to $u^\top u = 1$
Geometric picture of principal components

Geometric picture of principal components

▶ the 1st PC u1 is a minimum-distance fit to a line in X space
▶ the 2nd PC u2 is a minimum-distance fit to a line in the plane perpendicular to the 1st PC
PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.
Algebraic derivation of PCs

▶ Given a sample of n observations on a vector of d variables
{ x1 , x2 , . . . , xn } ⊂ Rd
▶ First project the data onto a one-dimensional space with a d-dimensional vector u1 (where $u_1^\top u_1 = 1$):
$\{ u_1^\top x_1, u_1^\top x_2, \dots, u_1^\top x_n \}$
▶ Find u1 to maximize the variance of the projected data:

$$\frac{1}{n}\sum_{i=1}^{n}\left(u_1^\top x_i - u_1^\top \bar{x}\right)^2 = u_1^\top S u_1$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top$
Algebraic derivation of PCs
▶ To solve $\max_{u_1} u_1^\top S u_1$ subject to $u_1^\top u_1 = 1$
▶ Let λ1 be a Lagrange multiplier:

$$L = u_1^\top S u_1 + \lambda_1\left(1 - u_1^\top u_1\right)$$
$$\frac{\partial L}{\partial u_1} = S u_1 - \lambda_1 u_1 = 0$$
$$S u_1 = \lambda_1 u_1$$

⇒ u1 is an eigenvector of S

$$u_1^\top S u_1 = \lambda_1$$

⇒ u1 is the eigenvector corresponding to the largest eigenvalue λ1
▶ That is, the optimal value of $\max_{u_1} u_1^\top S u_1$ subject to $u_1^\top u_1 = 1$ is the largest eigenvalue of S
▶ The eigenvalues of S can be computed by forming the characteristic polynomial $|S - \lambda I| = 0$ (I is the identity matrix); the eigenvalues are the roots of this equation
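For a concrete 2×2 example (my own numbers), the characteristic-polynomial route and a numerical eigendecomposition give the same answer:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # a small symmetric "covariance" matrix

# Characteristic polynomial |S - lambda I| = (2 - l)^2 - 1 = l^2 - 4l + 3 = 0
lams = np.roots([1.0, -4.0, 3.0])      # roots of l^2 - 4l + 3
print(sorted(lams))                    # [1.0, 3.0]

print(np.linalg.eigvalsh(S))           # [1.0, 3.0] -- same eigenvalues, computed numerically
```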
Algebraic derivation of PCs

▶ To find the second component u2, solve:

$$\max_{u_2} u_2^\top S u_2 \quad \text{subject to} \quad u_2^\top u_2 = 1 \;\text{and}\; u_1^\top u_2 = 0$$

- u2 is the eigenvector with the second largest eigenvalue λ2
- · · ·
Algebraic derivation of PCs
▶ Main steps for computing PCs
▶ Calculate the covariance matrix S:

$$S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top$$

or first center the data { x'_1 , x'_2 , . . . , x'_n } so that $\bar{x}' = 0$; let $X = [x'_1, x'_2, \dots, x'_n] \in \mathbb{R}^{d\times n}$; then $S = \frac{1}{n} X X^\top$
▶ Find the first m eigenvectors $\{u_i\}_{i=1}^{m}$
▶ Form the projection matrix

P = [u1 u2 · · · um ] ∈ Rd×m

▶ A new test point can be projected as:

xnew ∈ Rd → P⊤ xnew ∈ Rm
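These steps translate directly into a few lines of NumPy. The following is a minimal sketch (function and variable names are mine); note that here the rows of X are the samples, unlike the column convention on the slide.

```python
import numpy as np

def pca(X, m):
    """Minimal PCA sketch.

    X: (n, d) data matrix with n samples of dimension d.
    Returns the (d, m) projection matrix P, the (n, m) projected data, and the mean.
    """
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                         # center the data
    S = Xc.T @ Xc / len(X)                 # covariance matrix S = (1/n) X' X'^T
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in decreasing order
    P = eigvecs[:, order[:m]]              # first m eigenvectors as columns
    Y = Xc @ P                             # projected data: y = P^T x for each sample
    return P, Y, x_bar
```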
Algebraic derivation of PCs

$$y = P^\top x \in \mathbb{R}^m$$

▶ Getting the old data back?
- If P is a square matrix, we can recover x by

$$x = \left(P^\top\right)^{-1} y = P y = P P^\top x$$

Note: $u_i^\top u_i = 1$ and $u_i^\top u_j = 0$ for $i \ne j$, so $P^\top P = I_m$ (when m = d) and $(P^\top)^{-1} = P$
▶ Here P is not full (m ≪ d), but we can still approximately recover x by $\hat{x} = P y = P P^\top x$, losing some information
▶ Objective: lose the least amount of information
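Continuing the sketch above, the reconstruction and its error can be checked in a couple of lines (again with my own illustrative names and toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # toy data: 200 samples in 5 dimensions

P, Y, x_bar = pca(X, m=2)                  # pca() from the sketch above
X_hat = Y @ P.T + x_bar                    # approximate recovery: x_hat = P P^T (x - x_bar) + x_bar
err = np.sum((X - X_hat) ** 2)             # reconstruction error, minimized by PCA
print(err)
```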
Optimality property of PCA

Optimality property of PCA
Main theoretical result:
The matrix P consisting of the first m eigenvectors of the covariance matrix S solves the following minimization problem, where $X' = P P^\top X$ is the reconstruction:

$$\min_{P:\,P^\top P = I_m}\; \sum_{i=1}^{d}\sum_{j=1}^{n}\left(x_{ij} - x'_{ij}\right)^2$$

Notice that, for an m × n matrix A and an n × m matrix B,

$$\operatorname{trace}(AB) = \operatorname{trace}(BA) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ji}$$

so $\arg\min_P \sum_{i=1}^{d}\sum_{j=1}^{n}\left(x_{ij} - x'_{ij}\right)^2$ is equivalent to $\arg\max_P \sum_{i=1}^{d}\sum_{j=1}^{n} x_{ij} x'_{ij}$, since $\sum_{i=1}^{d}\sum_{j=1}^{n} x'^{\,2}_{ij} = \operatorname{trace}\!\left((P P^\top X)^\top P P^\top X\right) = \operatorname{trace}\!\left(X^\top P P^\top X\right)$.

PCA projection minimizes the reconstruction error among all linear projections of size m.
PCA for image compression

Nonlinear PCA using Kernels

Rewrite PCA in terms of dot products

▶ Assume the data has been centered, i.e., $\sum_i x_i = 0$
▶ The covariance matrix S can be written as $S = \frac{1}{n}\sum_i x_i x_i^\top$
▶ If u is an eigenvector of S corresponding to a nonzero eigenvalue λ:

$$S u = \frac{1}{n}\sum_i x_i x_i^\top u = \lambda u \;\Rightarrow\; u = \frac{1}{n\lambda}\sum_i \left(x_i^\top u\right) x_i$$

▶ Eigenvectors of S lie in the space spanned by all data points

Kernel methods:
▶ denote the (feature-space) representation of x as φ(x)
▶ define the kernel function k : X × X → R by k(xi , xj ) = φ(xi )⊤ φ(xj )
▶ define the kernel matrix K: Kij = k(xi , xj )
Nonlinear PCA using Kernels

$$S u = \frac{1}{n}\sum_i x_i x_i^\top u = \lambda u \;\Rightarrow\; u = \frac{1}{n\lambda}\sum_i \left(x_i^\top u\right) x_i$$

The covariance matrix can be written in matrix form.
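The resulting procedure, which works entirely with the kernel matrix K, can be sketched as follows. This is a generic kernel-PCA outline in NumPy with an RBF kernel chosen purely for illustration, not the specific derivation on the slide.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, m, gamma=1.0):
    """Minimal kernel PCA sketch: project onto the top m nonlinear components."""
    n = len(X)
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecompose the centered kernel matrix (symmetric)
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:m]
    lam, A = eigvals[order], eigvecs[:, order]
    # Projection of training point j onto component k is sqrt(lambda_k) * A[j, k]
    return A * np.sqrt(np.maximum(lam, 0))
```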
Nonlinear PCA

Comments on PCA

▶ Linear dimensionality reduction method
▶ Can be kernelized
▶ Many nonlinear dimensionality reduction methods (Isomap, graph Laplacian eigenmap, and locally linear embedding/LLE) can be described as kernel PCA with a special kernel
▶ Non-convex optimization problem
▶ But easy to solve…
Want to Learn More?
▶ Machine Learning: a Probabilistic Perspective, K. Murphy
▶ Pattern Classification, R. Duda, P. Hart, and D. Stork.
Standard pattern recognition textbook. Limited to
classification problems. Matlab code.
http://rii.ricoh.com/~stork/DHS.html
▶ Pattern recognition and machine learning. C. Bishop
▶ The Elements of statistical Learning: Data Mining, Inference,
and Prediction. T. Hastie, R. Tibshirani, J. Friedman,
Standard statistics textbook. Includes all the standard
machine learning methods for classification, regression,
clustering. R code. http://www-stat-class.stanford.
edu/~tibs/ElemStatLearn/
▶ Introduction to Data Mining, P.-N. Tan, M. Steinbach, V.
Kumar. AddisonWesley, 2006
▶ Principles of Data Mining, D. Hand, H. Mannila, and P.
Smyth. MIT Press, 2001
▶ 统计学习方法 (Statistical Learning Methods), 李航 (Li Hang)
Machine Learning in AI

Machine Learning History

Summary

▶ Supervised learning
▶ Learning Decision Trees
▶ K Nearest Neighbor Classifier
▶ Linear Predictions
▶ Support Vector Machines
▶ Unsupervised learning
▶ Clustering
▶ Principal Component Analysis
Homework

▶ Does the k-means algorithm always converge? If so, give a proof; if not, explain why.
