2. PCA
• PCA reduces the dimensionality of data, for example from 2 dimensions (x1 and x2) to 1 dimension (z1).
• It transforms the original variables into a new set of variables called principal components.
• These principal components are linear combinations of the original variables and are mutually orthogonal.
• The first principal component accounts for the largest possible share of the variance in the original data.
• The second principal component captures as much of the remaining variance as possible.
• A two-dimensional data set has at most two principal components. A minimal sketch of this 2-D-to-1-D reduction follows this list.
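Below is a minimal sketch of that reduction using scikit-learn (an assumption of this sketch; the notes themselves compute everything by hand). It runs on the six points of the worked example that follows. Note that scikit-learn uses the sample covariance with divisor n − 1, while the worked example divides by n; the principal direction is the same either way.

```python
import numpy as np
from sklearn.decomposition import PCA

# The six two-dimensional points from the worked example below.
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

pca = PCA(n_components=1)             # keep only the first principal component
z1 = pca.fit_transform(X)             # each 2-D point becomes one score z1
print(pca.components_)                # unit vector along the principal direction
print(pca.explained_variance_ratio_)  # share of total variance captured by z1
```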
PCA Algorithm:
The steps involved in the PCA algorithm are as follows:
• Step-01: Get the data.
• Step-02: Calculate the mean vector (µ).
• Step-03: Subtract the mean from each data point.
• Step-04: Calculate the covariance matrix.
• Step-05: Calculate the eigenvalues and eigenvectors of the covariance matrix, and keep the eigenvector with the largest eigenvalue as the principal component.
Example:
Consider the following two classes of two-dimensional data:
• Class 1: X = (2, 3, 4), Y = (1, 5, 3)
• Class 2: X = (5, 6, 7), Y = (6, 7, 8)
The six (X, Y) points from both classes are pooled, and the principal component is computed step by step below.
Step-01:
Get the data. The pooled points are:
• x1 = (2, 1)
• x2 = (3, 5)
• x3 = (4, 3)
• x4 = (5, 6)
• x5 = (6, 7)
• x6 = (7, 8)
Step-02:
Calculate the mean vector (µ).
µ = ((2 + 3 + 4 + 5 + 6 + 7) / 6, (1 + 5 + 3 + 6 + 7 + 8) / 6) = (4.5, 5)
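A one-line NumPy check of this step (the array name X is illustrative):

```python
import numpy as np

# The six points from Step-01; the mean vector is the per-coordinate average.
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
mu = X.mean(axis=0)
print(mu)  # [4.5 5. ]
```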
Step-03:
Subtract the mean vector (µ) from each data point.
• x1 – µ = (2 – 4.5, 1 – 5) = (-2.5, -4)
• x2 – µ = (3 – 4.5, 5 – 5) = (-1.5, 0)
• x3 – µ = (4 – 4.5, 3 – 5) = (-0.5, -2)
• x4 – µ = (5 – 4.5, 6 – 5) = (0.5, 1)
• x5 – µ = (6 – 4.5, 7 – 5) = (1.5, 2)
• x6 – µ = (7 – 4.5, 8 – 5) = (2.5, 3)
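The same subtraction in NumPy, as a quick check of the deviations above:

```python
import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
# Subtract the mean vector from every row; row i of D is x_i - mu.
D = X - X.mean(axis=0)
print(D)  # rows: (-2.5, -4), (-1.5, 0), (-0.5, -2), (0.5, 1), (1.5, 2), (2.5, 3)
```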
Step-04:
Calculate the covariance matrix. With each mi defined as the outer product (xi – µ)(xi – µ)^T of a deviation from Step-03, the covariance matrix is given by
Covariance matrix M = (m1 + m2 + m3 + m4 + m5 + m6) / 6
                    = | 2.92 3.67 |
                      | 3.67 5.67 |  (rounded to two decimals)
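A NumPy sketch of the same computation, averaging the six outer products with divisor n = 6 as in these notes:

```python
import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)
D = X - X.mean(axis=0)                       # deviations from Step-03
# Each m_i is the 2x2 outer product (x_i - mu)(x_i - mu)^T.
M = sum(np.outer(d, d) for d in D) / len(D)
print(M.round(2))  # [[2.92 3.67]
                   #  [3.67 5.67]]
```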
Step-05:
Calculate the eigenvalues and eigenvectors of the covariance matrix.
The characteristic equation |M – λI| = 0 gives
λ² – 8.59λ + 3.09 = 0
Solving this quadratic equation, we get λ = 8.22 and λ = 0.38. Thus, the two eigenvalues are λ1 = 8.22 and λ2 = 0.38.
Clearly, the second eigenvalue is very small compared to the first, so the second eigenvector can be left out.
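The same eigen-decomposition in NumPy, starting from the rounded covariance matrix (which is why the larger eigenvalue prints as 8.21 rather than the hand-rounded 8.22):

```python
import numpy as np

M = np.array([[2.92, 3.67], [3.67, 5.67]])  # covariance matrix from Step-04
vals, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
print(vals.round(2))  # [0.38 8.21]
```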
The eigenvector corresponding to the greatest eigenvalue is the principal component of the given data set. So, we find the eigenvector corresponding to eigenvalue λ1. Any eigenvector X of M satisfies
MX = λX
where:
• M = covariance matrix
• X = eigenvector
• λ = eigenvalue
Substituting λ1 = 8.22 and solving MX = λX, we get:
2.92X1 + 3.67X2 = 8.22X1
3.67X1 + 5.67X2 = 8.22X2
On simplification, we get:
5.3X1 = 3.67X2 ………(1)
3.67X1 = 2.55X2 ………(2)
From equation (2), taking X2 = 3.67 gives X1 = 2.55. So the eigenvector corresponding to λ1, and hence the principal component of the given data set, is X = (2.55, 3.67); any scalar multiple of it works equally well.
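A quick NumPy check that this hand-derived direction matches the eigenvector returned by np.linalg.eigh:

```python
import numpy as np

M = np.array([[2.92, 3.67], [3.67, 5.67]])
vals, vecs = np.linalg.eigh(M)
pc = vecs[:, -1]   # eigenvector of the largest eigenvalue; sign may be flipped
print(pc.round(3))
# Normalising the hand-derived (2.55, 3.67) gives the same direction,
# up to rounding and a possible sign flip:
v = np.array([2.55, 3.67])
print((v / np.linalg.norm(v)).round(3))
```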
ADVANTAGES OF PCA:
• Reduces overfitting.
• Improves visualization.
LIMITATIONS OF PCA:
• The independent variables become less interpretable, since each principal component is a mixture of the original variables.
• Some information is lost when the low-variance components are discarded.