0% found this document useful (0 votes)
81 views4 pages

Pca Implementation Notebook

This document demonstrates how to perform principal component analysis (PCA) on economic data with 6 variables and 16 observations. It loads and explores the data, applies PCA to reduce the data to 3 principal components, visualizes the first two components, and calculates that the first component explains 75.6% of the variance in the data.

Uploaded by

Walid Sassi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
81 views4 pages

Pca Implementation Notebook

This document demonstrates how to perform principal component analysis (PCA) on economic data with 6 variables and 16 observations. It loads and explores the data, applies PCA to reduce the data to 3 principal components, visualizes the first two components, and calculates that the first component explains 75.6% of the variance in the data.

Uploaded by

Walid Sassi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

13/09/2023, 21:06 principal-component-analysis

Import all the libraries :

In [22]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Loading Data :

In [23]:

df = pd.read_csv('/kaggle/input/principal-component-analysis/Longley (1).csv')

In [24]:

df

Out[24]:

GNP.deflator GNP Unemployed Armed.Forces Population Employed

0 83.0 234.289 235.6 159.0 107.608 60.323

1 88.5 259.426 232.5 145.6 108.632 61.122

2 88.2 258.054 368.2 161.6 109.773 60.171

3 89.5 284.599 335.1 165.0 110.929 61.187

4 96.2 328.975 209.9 309.9 112.075 63.221

5 98.1 346.999 193.2 359.4 113.270 63.639

6 99.0 365.385 187.0 354.7 115.094 64.989

7 100.0 363.112 357.8 335.0 116.219 63.761

8 101.2 397.469 290.4 304.8 117.388 66.019

9 104.6 419.180 282.2 285.7 118.734 67.857

10 108.4 442.769 293.6 279.8 120.445 68.169

11 110.8 444.546 468.1 263.7 121.950 66.513

12 112.6 482.704 381.3 255.2 123.366 68.655

13 114.2 502.601 393.1 251.4 125.368 69.564

14 115.7 518.173 480.6 257.2 127.852 69.331

15 116.9 554.894 400.7 282.7 130.081 70.551

https://htmtopdf.herokuapp.com/ipynbviewer/temp/4518ed9eb5f5a3af5f67858dbb1814e4/principal-component-analysis.html?t=1694619339010 1/4
13/09/2023, 21:06 principal-component-analysis

In [25]:

df.dtypes

Out[25]:

GNP.deflator float64
GNP float64
Unemployed float64
Armed.Forces float64
Population float64
Employed float64
dtype: object

In [26]:

X = df.drop('Employed', axis=1)
Y = df['Employed']

In [27]:

correlation = df.corr()
correlation

Out[27]:

GNP.deflator GNP Unemployed Armed.Forces Population Employed

GNP.deflator 1.000000 0.991589 0.620633 0.464744 0.979163 0.970899

GNP 0.991589 1.000000 0.604261 0.446437 0.991090 0.983552

Unemployed 0.620633 0.604261 1.000000 -0.177421 0.686552 0.502498

Armed.Forces 0.464744 0.446437 -0.177421 1.000000 0.364416 0.457307

Population 0.979163 0.991090 0.686552 0.364416 1.000000 0.960391

Employed 0.970899 0.983552 0.502498 0.457307 0.960391 1.000000

Apply PCA :

In [28]:

from sklearn.preprocessing import StandardScaler

In [29]:

# Scale data before applying PCA


scaling=StandardScaler()

https://htmtopdf.herokuapp.com/ipynbviewer/temp/4518ed9eb5f5a3af5f67858dbb1814e4/principal-component-analysis.html?t=1694619339010 2/4
13/09/2023, 21:06 principal-component-analysis

In [30]:

# Use fit and transform method


scaling.fit(df)
Scaled_data=scaling.transform(df)

In [31]:

from sklearn.decomposition import PCA

In [32]:

# Set the n_components=3


principal=PCA(n_components=3)
principal.fit(Scaled_data)
x=principal.transform(Scaled_data)

In [33]:

# Check the dimensions of data after PCA


print(x.shape)

(16, 3)

Check Components :

In [34]:

# Check the values of eigen vectors


# prodeced by principal components
principal.components_

Out[34]:

array([[-0.46695493, -0.46748987, -0.30646472, -0.21200613, -0.4656055


6,
-0.45579661],
[ 0.02628724, 0.02306569, -0.62227098, 0.77353962, -0.0762474
5,
0.08589854],
[-0.04906877, -0.16405382, 0.67228378, 0.58400807, -0.0917922
6,
-0.41136586]])

Plot the components (Visualization) :

https://htmtopdf.herokuapp.com/ipynbviewer/temp/4518ed9eb5f5a3af5f67858dbb1814e4/principal-component-analysis.html?t=1694619339010 3/4
13/09/2023, 21:06 principal-component-analysis

In [35]:

# plt.figure(figsize=(10,10))
plt.scatter(x[:,0],x[:,1],c=df['Employed'],cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')

Out[35]:

Text(0, 0.5, 'pc2')

Calculate variance ratio :

In [36]:

# check how much variance is explained by each principal component


print(principal.explained_variance_ratio_)

[0.75584735 0.19778211 0.0419845 ]

In [ ]:

https://htmtopdf.herokuapp.com/ipynbviewer/temp/4518ed9eb5f5a3af5f67858dbb1814e4/principal-component-analysis.html?t=1694619339010 4/4

You might also like