Principal Component Analysis
Ricardo Wendell
Aug 2013
2
Agenda
Feature Engineering
(Our motivation)
Introduction to Principal Component Analysis
(And some statistical concepts)
Agile Analytics and PCA
(Helping visualization…)
3
Feature
Engineering
4
Given a
classification
problem…
How do we choose
the right features?
5
Intuition
fails in high
dimensions
Building a classifier in two or three
dimensions is relatively easy…
It’s usually possible to find a
reasonable frontier between
examples of different
classes just by visual inspection.
6
Feature
engineering
Intuitively, one might
think that gathering
more features never
hurts, right?
At worst they provide
no new information
about the domain…
7
The curse of
dimensionality
Many algorithms that work fine in low
dimensions become intractable
when the input is high-dimensional.
(term coined by Bellman, 1961)
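A small sketch of why low-dimensional intuition misleads: it computes how much of a unit hypercube is covered by the ball inscribed in it. The function name and the chosen dimensions are illustrative assumptions, not part of the original slides.

```python
import math

def inscribed_ball_fraction(d):
    """Fraction of the unit hypercube's volume inside its inscribed ball (radius 0.5)."""
    # Volume of a d-dimensional ball of radius r: pi^(d/2) / Gamma(d/2 + 1) * r^d.
    # The unit hypercube has volume 1, so the ball's volume is already the fraction.
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

# Illustrative dimensions only; any positive integer works.
for d in (2, 3, 10, 20):
    print(f"d={d:2d}  fraction inside the ball = {inscribed_ball_fraction(d):.2e}")
```

The fraction shrinks quickly as the dimension grows: almost all of the volume ends up in the corners, far from the centre, which is one reason distance-based methods tend to degrade in high dimensions.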
8
How do we
solve it?
Feature Selection
Feature Extraction
9
Feature
extraction
“In most applications
examples are not spread
uniformly throughout the
example space, but are
concentrated on or near
a lower-dimensional
subspace.”
10
Introduction to
PCA
11
Objective of
PCA
To perform dimensionality
reduction while preserving
as much of the data's variance
in the high-dimensional
space as possible
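One common way to write this objective down, assuming the data has been centered and using notation that is not in the original slides: $\Sigma$ is the covariance matrix of the data and $\mathbf{w}_1$ is the first principal direction.

```latex
\mathbf{w}_1 \;=\; \arg\max_{\lVert \mathbf{w} \rVert = 1} \; \mathbf{w}^{\top} \Sigma \, \mathbf{w}
```

The maximizer is the eigenvector of $\Sigma$ with the largest eigenvalue, and that eigenvalue is the variance captured along $\mathbf{w}_1$. Later components solve the same problem restricted to directions orthogonal to the earlier ones.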
12
Principal
Component
Analysis
It takes your cloud of data
points, and rotates it such
that the maximum variability
is visible.
PCA is mainly concerned
with identifying correlations
in the data.
13
Measuring
Correlation
The degree and type of relationship
between two or more quantities
(variables) that vary together
over a period
Correlation ranges from -1 to +1.
Values close to +1 indicate a high
degree of positive correlation, and
values close to -1 indicate a high
degree of negative correlation.
Values close to zero indicate weak
correlation of either kind, and 0
indicates no linear correlation at all.
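A minimal sketch of the Pearson correlation coefficient, the usual measure behind the -1 to +1 scale. The function name and the sample values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements, chosen only to show the three regimes.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # grows with hours
falling = [10, 8, 6, 4, 2]   # shrinks as hours grow
u_shape = [4, 1, 0, 1, 4]    # depends on hours, but not linearly

print("rising :", pearson(hours, rising))
print("falling:", pearson(hours, falling))
print("u_shape:", pearson(hours, u_shape))
```

The last series is fully determined by `hours` yet has a coefficient of zero, a reminder that this measure only detects linear relationships.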
14
Measuring
Correlation
15
Beware: Correlation does not
imply causation
16
Correlation
matrix
It shows at a glance how
variables correlate with
each other
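As a sketch, a correlation matrix can be built by computing the pairwise coefficient for every pair of variables. The feature names and values below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(features):
    """Map each pair of feature names to their correlation coefficient."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}

# Hypothetical data set: one column per feature.
features = {
    "height_cm": [160, 165, 170, 175, 180],
    "weight_kg": [55, 62, 66, 74, 80],
    "commute_min": [30, 12, 45, 20, 25],
}

matrix = correlation_matrix(features)
names = list(features)
print(" " * 12 + "".join(f"{name:>12}" for name in names))
for row in names:
    print(f"{row:>12}" + "".join(f"{matrix[row][col]:12.2f}" for col in names))
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only the entries above or below the diagonal carry information.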
17
Eigenvalues
and
eigenvectors
18
Steps for PCA
1. Standardize the data
2. Calculate the covariance matrix
3. Find the eigenvalues and eigenvectors of the covariance matrix
4. Plot the eigenvectors / principal components over the scaled data
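The steps above can be sketched for the simplest case of two features, where the 2x2 eigenproblem has a closed-form solution. This is an illustration under that assumption, not the R demo from the talk: the data and helper names are hypothetical, plotting (step 4) is replaced by projecting the data onto the first component, and with more features a linear algebra library would normally perform the eigendecomposition.

```python
import math
import statistics

def standardize(values):
    """Step 1: rescale a feature to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)
    return [(v - mean) / spread for v in values]

def covariance(xs, ys):
    """Step 2: sample covariance (n - 1 denominator) of two sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

def eigen_symmetric_2x2(a, b, c):
    """Step 3: eigenpairs of [[a, b], [b, c]], largest eigenvalue first."""
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        first, second = ((1.0, 0.0), (0.0, 1.0)) if a >= c else ((0.0, 1.0), (1.0, 0.0))
        return [(max(a, c), first), (min(a, c), second)]
    centre = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    pairs = []
    for eigenvalue in (centre + radius, centre - radius):
        # (b, eigenvalue - a) solves (A - eigenvalue * I) v = 0 when b != 0.
        vx, vy = b, eigenvalue - a
        length = math.hypot(vx, vy)
        pairs.append((eigenvalue, (vx / length, vy / length)))
    return pairs

# Hypothetical, strongly correlated features.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_b = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

za, zb = standardize(feature_a), standardize(feature_b)
cov_aa, cov_ab, cov_bb = covariance(za, za), covariance(za, zb), covariance(zb, zb)
(eig1, pc1), (eig2, pc2) = eigen_symmetric_2x2(cov_aa, cov_ab, cov_bb)

# In place of a plot: coordinates of each point along the first component.
pc1_scores = [x * pc1[0] + y * pc1[1] for x, y in zip(za, zb)]

print("eigenvalues:", eig1, eig2)
print("first component :", pc1)
print("second component:", pc2)
```

Because the data was standardized, the covariance matrix here equals the correlation matrix, so the two eigenvalues sum to 2 and the first one shows how much of that total variance lies along a single direction.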
19
Demo
with R
Let’s check the products
of PCA…
20
Agile analytics
and PCA
21
Agile
Analytics
Machine learning and data
mining tools and techniques
+
Knowledge of the
domain at hand
+
Short feedback cycles
22
Agile
Analytics
We could use PCA as a tool to
quickly identify correlation
between features, helping
feature extraction and
selection.
Reducing dimensionality using
PCA or a similar technique
can help us achieve better and
quicker results.
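One possible way to decide how far to reduce dimensionality is to look at the share of total variance each component explains. The sketch below assumes the eigenvalues are already available; the values and the thresholds are hypothetical.

```python
def explained_variance_ratio(eigenvalues):
    """Share of the total variance carried by each component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a 4-feature covariance matrix.
eigenvalues = [2.9, 0.8, 0.2, 0.1]

print("shares:", [round(r, 3) for r in explained_variance_ratio(eigenvalues)])
print("components for 90% of the variance:", components_needed(eigenvalues, 0.90))
print("components for 95% of the variance:", components_needed(eigenvalues, 0.95))
```

The threshold is a judgment call that depends on the domain, which is where short feedback cycles help: try a cut-off, check the downstream results, and adjust.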
23	

QA & Next Steps