
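Principal component analysis (PCA) is a technique widely used in modern data analysis. It is often applied as a "black box" in fields ranging from neuroscience to computer graphics. Its strength is that it is a simple, non-parametric tool for extracting useful information from complex, confusing data sets.

The basic idea of PCA is to simplify structure by reducing the dimensionality of a data set, a process also known as dimensionality reduction. The reduced data set retains the main features of the original data, which makes it easier to analyze and visualize. PCA projects the data onto new coordinate axes defined by the principal directions of the data; these axes are called "principal components". Each principal component is a direction in the original feature space, and the components are ordered by how much of the data's variance they retain: the first retains the most, the second the next most, and so on. In this way the structure of the original data can be revealed with as few principal components as possible.

Before going into the technical details, Shlens uses a toy example to explain the goal of PCA intuitively. Such an example typically shows the data set before and after the PCA transformation and how PCA reveals the simplified structure of the data. In this way, even readers without a deep mathematical background can gain an intuitive feel for PCA.

Shlens then introduces the framework of linear algebra and explains, step by step, the mathematics behind PCA. In particular, he emphasizes the close relationship between PCA and singular value decomposition (SVD). SVD is a powerful mathematical tool that can decompose any matrix. For PCA, performing an SVD yields the best linear representation of the data, i.e. the principal components.

The tutorial also gives guidance on applying PCA in the real world and discusses its underlying assumptions. Understanding these assumptions is essential to using PCA correctly, because they determine whether PCA suits a particular data analysis task. To give a complete picture, the author explains the concepts informally but does not shy away from the mathematical details. Shlens hopes that an intuitive feel for PCA, combined with a thorough discussion, will help readers of all levels understand PCA, including when, how and why to apply it. The aim of the article is to educate; rigorous mathematical proofs are sometimes necessary, but they are relegated to the Appendix.

PCA is widely applied in machine learning and dimensionality reduction. It can be used for data preprocessing and as a feature extraction method that can improve the performance of machine learning models. In dimensionality reduction, PCA helps remove redundant information among features so that a model can focus on the most important ones.

Although PCA is a powerful tool, it has limitations. For example, PCA assumes that the dominant structure of the data can be captured by a linear transformation. If the true structure of the data is highly nonlinear, PCA may not be the best choice. PCA is also very sensitive to outliers, which can strongly affect the directions of the principal components and therefore the validity of the results.

This 2014 tutorial covers the full path from intuition to mathematical derivation. It gives beginners a stepping stone and gives experienced data scientists an opportunity to understand the inner workings of PCA. By studying PCA thoroughly, readers can better grasp its conditions of applicability and its potential limitations, and lay a solid theoretical foundation for entering the fields of machine learning and dimensionality reduction.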


A Tutorial on Principal Component Analysis

Jonathon Shlens∗

Google Research

Mountain View, CA 94043

(Dated: April 7, 2014; Version 3.02)

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This manuscript focuses on building a solid intuition for how and why principal component analysis works. This manuscript crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as well as the when, the how and the why of applying this technique.

I. INTRODUCTION

Principal component analysis (PCA) is a standard tool in modern data analysis - in diverse fields from neuroscience to computer graphics - because it is a simple, non-parametric method for extracting relevant information from confusing data sets. With minimal effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it.

The goal of this tutorial is to provide both an intuitive feel for PCA, and a thorough discussion of this topic. We will begin with a simple example and provide an intuitive explanation of the goal of PCA. We will continue by adding mathematical rigor to place it within the framework of linear algebra to provide an explicit solution. We will see how and why PCA is intimately related to the mathematical technique of singular value decomposition (SVD). This understanding will lead us to a prescription for how to apply PCA in the real world and an appreciation for the underlying assumptions. My hope is that a thorough understanding of PCA provides a foundation for approaching the fields of machine learning and dimensional reduction.

The discussion and explanations in this paper are informal in the spirit of a tutorial. The goal of this paper is to educate. Occasionally, rigorous mathematical proofs are necessary although relegated to the Appendix. Although not as vital to the tutorial, the proofs are presented for the adventurous reader who desires a more complete understanding of the math. My only assumption is that the reader has a working knowledge of linear algebra. My goal is to provide a thorough discussion by largely building on ideas from linear algebra and avoiding challenging topics in statistics and optimization theory (but see Discussion). Please feel free to contact me with any suggestions, corrections or comments.

∗ Electronic address: [email protected]

II. MOTIVATION: A TOY EXAMPLE

Here is the perspective: we are an experimenter. We are trying to understand some phenomenon by measuring various quantities (e.g. spectra, voltages, velocities, etc.) in our system. Unfortunately, we cannot figure out what is happening because the data appears clouded, unclear and even redundant. This is not a trivial problem, but rather a fundamental obstacle in empirical science. Examples abound from complex systems such as neuroscience, web indexing, meteorology and oceanography - the number of variables to measure can be unwieldy and at times even deceptive, because the underlying relationships can often be quite simple.

Take for example a simple toy problem from physics diagrammed in Figure 1. Pretend we are studying the motion of the physicist’s ideal spring. This system consists of a ball of mass m attached to a massless, frictionless spring. The ball is released a small distance away from equilibrium (i.e. the spring is stretched). Because the spring is ideal, it oscillates indefinitely along the x-axis about its equilibrium at a set frequency.

This is a standard problem in physics in which the motion along the x direction is solved by an explicit function of time. In other words, the underlying dynamics can be expressed as a function of a single variable x.
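For reference, the explicit solution alluded to here takes the standard textbook form below for an ideal spring. The spring constant k, the initial displacement x_0 and the choice of releasing the ball from rest at t = 0 are assumptions introduced only for illustration; the text itself names only the mass m. Here x is measured relative to the equilibrium position.

```latex
x(t) = x_0 \cos(\omega t), \qquad \omega = \sqrt{\frac{k}{m}}
```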

However, being ignorant experimenters we do not know any of this. We do not know which, let alone how many, axes and dimensions are important to measure. Thus, we decide to measure the ball’s position in a three-dimensional space (since we live in a three dimensional world). Specifically, we place three movie cameras around our system of interest. At 120 Hz each movie camera records an image indicating a two dimensional position of the ball (a projection). Unfortunately, because of our ignorance, we do not even know what are the real x, y and z axes, so we choose three camera positions $\vec{a}$, $\vec{b}$ and $\vec{c}$ at some arbitrary angles with respect to the system. The angles between our measurements might not even be 90°! Now, we record with the cameras for several minutes. The big question remains: how do we get from this data set to a simple equation of x?

FIG. 1 A toy example. The position of a ball attached to an oscillating spring is recorded using three cameras A, B and C. The position of the ball tracked by each camera is depicted in each panel below. [Figure omitted; panels: camera A, camera B, camera C]

arXiv:1404.1100v1 [cs.LG] 3 Apr 2014

We know a priori that if we were smart experimenters, we would have just measured the position along the x-axis with one camera. But this is not what happens in the real world. We often do not know which measurements best reflect the dynamics of our system in question. Furthermore, we sometimes record more dimensions than we actually need.

Also, we have to deal with that pesky, real-world problem of noise. In the toy example this means that we need to deal with air, imperfect cameras or even friction in a less-than-ideal spring. Noise contaminates our data set, only serving to obfuscate the dynamics further. This toy example is the challenge experimenters face every day. Keep this example in mind as we delve further into abstract concepts. Hopefully, by the end of this paper we will have a good understanding of how to systematically extract x using principal component analysis.
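As a preview of where the tutorial is heading, the toy experiment can be simulated and analyzed in a few lines. The sketch below is only an illustration under stated assumptions, not the paper's derivation: the oscillation frequency, the noise level, the camera angles and the helper `camera_axes` are all hypothetical, and PCA is carried out through an SVD of the mean-centered data, relying on the PCA/SVD relationship mentioned in the Introduction.

```python
import numpy as np

RATE_HZ = 120       # camera frame rate, from the text
DURATION_S = 600    # ten minutes of recording
FREQ_HZ = 0.5       # hypothetical oscillation frequency
NOISE_STD = 0.05    # hypothetical measurement noise level


def camera_axes(theta_deg, phi_deg):
    """Return two orthonormal image axes (rows) for a camera at assumed angles."""
    theta, phi = np.radians([theta_deg, phi_deg])
    u = np.array([np.cos(theta), np.sin(theta), 0.0])
    v = np.array([-np.sin(theta) * np.cos(phi),
                  np.cos(theta) * np.cos(phi),
                  np.sin(phi)])
    return np.vstack([u, v])  # shape (2, 3)


t = np.arange(RATE_HZ * DURATION_S) / RATE_HZ
x = np.cos(2 * np.pi * FREQ_HZ * t)                        # true 1-D dynamics
ball = np.vstack([x, np.zeros_like(x), np.zeros_like(x)])  # 3-D position, shape (3, n)

# Three cameras at arbitrary, non-aligned orientations (assumed values).
mixing = np.vstack([camera_axes(20, 30),
                    camera_axes(75, 60),
                    camera_axes(130, 45)])                 # shape (6, 3)

rng = np.random.default_rng(0)
data = mixing @ ball + rng.normal(scale=NOISE_STD, size=(6, t.size))

# PCA via SVD of the mean-centered data (one row per measurement type).
centered = data - data.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(centered, full_matrices=False)
variance_fraction = s**2 / np.sum(s**2)

# Time course along the first principal direction, compared with the true x(t).
recovered = U[:, 0] @ centered
correlation = abs(np.corrcoef(recovered, x)[0, 1])

print(variance_fraction.round(3))
print(round(correlation, 4))
```

With these settings one would expect a single direction to carry almost all of the variance and its time course to track x(t) closely (up to sign and scale), which is the sense in which PCA "extracts x" from six redundant, noisy measurements.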

III. FRAMEWORK: CHANGE OF BASIS

The goal of principal component analysis is to identify the most meaningful basis to re-express a data set. The hope is that this new basis will filter out the noise and reveal hidden structure. In the example of the spring, the explicit goal of PCA is to determine: “the dynamics are along the x-axis.” In other words, the goal of PCA is to determine that $\hat{x}$, i.e. the unit basis vector along the x-axis, is the important dimension. Determining this fact allows an experimenter to discern which dynamics are important, redundant or noise.

A. A Naive Basis

With a more precise definition of our goal, we need a more precise definition of our data as well. We treat every time sample (or experimental trial) as an individual sample in our data set. At each time sample we record a set of data consisting of multiple measurements (e.g. voltage, position, etc.). In our data set, at one point in time, camera A records a corresponding ball position $(x_A, y_A)$. One sample or trial can then be expressed as a 6 dimensional column vector

$$\vec{X} = \begin{bmatrix} x_A \\ y_A \\ x_B \\ y_B \\ x_C \\ y_C \end{bmatrix}$$

where each camera contributes a 2-dimensional projection of the ball’s position to the entire vector $\vec{X}$. If we record the ball’s position for 10 minutes at 120 Hz, then we have recorded 10 × 60 × 120 = 72000 of these vectors.
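The bookkeeping described here can be sketched in NumPy. The per-camera arrays `track_a`, `track_b` and `track_c` are hypothetical placeholders filled with random numbers; only the layout (one 6-dimensional column per time sample) and the sample count come from the text.

```python
import numpy as np

RATE_HZ = 120
MINUTES = 10
n_samples = MINUTES * 60 * RATE_HZ  # 10 x 60 x 120 time samples

# Placeholder tracks: each camera yields one (x, y) pair per time sample.
rng = np.random.default_rng(0)
track_a = rng.normal(size=(n_samples, 2))
track_b = rng.normal(size=(n_samples, 2))
track_c = rng.normal(size=(n_samples, 2))

# Stack the three cameras side by side, then transpose so that each column
# is one sample (x_A, y_A, x_B, y_B, x_C, y_C).
X = np.hstack([track_a, track_b, track_c]).T

first_sample = X[:, 0]  # the 6-dimensional vector for the first time sample
print(n_samples, X.shape)
```

Storing samples as columns keeps the code close to the column-vector notation used in the text; a rows-as-samples layout would work equally well as long as it is used consistently.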

With this concrete example, let us recast this problem in abstract terms. Each sample $\vec{X}$ is an m-dimensional vector, where m is the number of measurement types. Equivalently, every sample is a vector that lies in an m-dimensional vector space spanned by some orthonormal basis. From linear algebra we know that all measurement vectors form a linear combination of this set of unit length basis vectors. What is this orthonormal basis?

This question is usually a tacit assumption often overlooked. Pretend we gathered our toy example data above, but only looked at camera A. What is an orthonormal basis for $(x_A, y_A)$? A naive choice would be {(1,0),(0,1)}, but why select this basis over $\{(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}), (-\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})\}$ or any other arbitrary rotation? The reason is that the naive basis reflects the method by which we gathered the data. Pretend we record the position (2, 2). We did not record $2\sqrt{2}$ in the $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$ direction and 0 in the perpendicular direction. Rather, we recorded the position (2, 2) on our camera meaning 2 units up and 2 units to the left in our camera window. Thus our original basis reflects the method by which we measured our data.
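The (2, 2) example can be checked numerically. The sketch below writes each basis as a matrix whose rows are the basis vectors, so that multiplying by the matrix gives the coordinates of a point in that basis; the variable names are illustrative only.

```python
import numpy as np

s = np.sqrt(2) / 2

# Rows are basis vectors.
naive_basis = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
rotated_basis = np.array([[s, s],     # direction (sqrt(2)/2, sqrt(2)/2)
                          [-s, s]])   # the perpendicular direction

point = np.array([2.0, 2.0])          # position as recorded by the camera

coords_naive = naive_basis @ point      # coordinates in the naive basis
coords_rotated = rotated_basis @ point  # coordinates in the rotated basis

# Both bases are orthonormal: B @ B.T is the identity.
is_orthonormal = np.allclose(rotated_basis @ rotated_basis.T, np.eye(2))

print(coords_naive, coords_rotated, is_orthonormal)
```

In the rotated basis the same point has coordinates of about (2.83, 0), i.e. $2\sqrt{2}$ along the diagonal and 0 along the perpendicular. Both descriptions are valid; the naive one simply matches how the camera reported the measurement.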

How do we express this naive basis in linear algebra? In the two dimensional case, {(1,0),(0,1)} can be recast as individual row vectors. A matrix constructed out of these row vectors is the 2 × 2 identity matrix I. We can generalize this to the m-dimensional case by constructing an m × m identity matrix

$$B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$$

where each row is an orthonormal basis vector $b_i$ with m components. We can consider our naive basis as the effective starting point. All of our data has been recorded in this basis
