
1 Peer-Graded Assignment: Clustering

This document analyzes a banknote authentication dataset using clustering techniques. It normalizes the dataset, performs k-means clustering with k=2 to categorize data points as either authentic or fake banknotes based on their feature values. Scatter plots and centroids are used to visualize the two resulting clusters. The analysis demonstrates how k-means clustering can help classify new banknotes as real or fake.

Uploaded by NurujjamanKhan
Copyright © All Rights Reserved

clustering_assignment

July 13, 2020

1 Peer-graded Assignment: Clustering


In [ 3]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

banknote_data = pd.read_csv('banknote_authentication_dataset.csv')

# Drop rows with missing values (dropna returns a new DataFrame, so reassign it)
banknote_data = banknote_data.dropna()

# Calculate V1/V2 mean
value_one = [banknote_data['V1'].mean(), banknote_data['V2'].mean()]
print("Mean:", value_one)

# Calculate V1/V2 standard deviation
value_two = [banknote_data['V1'].std(), banknote_data['V2'].std()]
print("Standard Deviation:", value_two)

banknote_data.describe()

Mean: [0.43373525728862977, 1.9223531209912539]
Standard Deviation: [2.8427625862451658, 5.869046743580378]

Out[3]:
                V1           V2
count  1372.000000  1372.000000
mean      0.433735     1.922353
std       2.842763     5.869047
min      -7.042100   -13.773100
25%      -1.773000    -1.708200
50%       0.496180     2.319650
75%       2.821475     6.814625
max       6.824800    12.951600
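A note on the standard deviations above: pandas' `Series.std()` returns the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` defaults to the population version (`ddof=0`). With 1372 rows the gap is under 0.05%, but it is worth knowing when comparing numbers across libraries. The sketch below shows the difference on a small toy series; the values are illustrative and not taken from the banknote data.

```python
import numpy as np
import pandas as pd

# Toy column (illustrative values, not from the banknote dataset)
s = pd.Series([1.0, 2.0, 3.0, 4.0])

pandas_std = s.std()                              # pandas default: ddof=1 (sample std), about 1.291
numpy_std = np.std(s.to_numpy())                  # NumPy default: ddof=0 (population std), about 1.118
numpy_sample_std = np.std(s.to_numpy(), ddof=1)   # matches pandas once ddof=1 is passed explicitly

print(pandas_std, numpy_std, numpy_sample_std)
```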

1.0.1 Data Analysis
We used the seaborn.pairplot function to plot pairwise relationships in the provided banknote authentication dataset. The scatterplots of the joint relationships and the histograms of the univariate distributions are shown below:

In [ 4]: sns.pairplot(banknote_data)

Out[4]: <seaborn.axisgrid.PairGrid at 0x7f529070a400>

Looking at the scatterplots, the values appear to be slightly higher for fake banknotes.

1.0.2 Dataset Normalization


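Before clustering, we rescale each feature to the range [0, 1] with min-max normalization, so that V1 and V2 contribute comparably to the Euclidean distances that k-means uses, and then recompute the mean and standard deviation.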
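Min-max normalization maps each value x of a column to (x - min) / (max - min). The sketch below wraps that formula in a small helper; the function name `min_max_normalize`, the zero-range guard for constant columns, and the toy data are illustrative assumptions, not part of the original notebook (which applies the formula inline and would produce NaN for a constant column).

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rescale every column of df to [0, 1] via (x - min) / (max - min)."""
    col_min = df.min()
    col_range = df.max() - df.min()
    # Assumption: treat a constant column's range as 1 so it maps to 0 instead of NaN
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range

# Toy data using the same column names as the banknote dataset (values are made up)
toy = pd.DataFrame({"V1": [-2.0, 0.0, 6.0], "V2": [10.0, 5.0, 0.0]})
scaled = min_max_normalize(toy)
print(scaled)
```

Each column is scaled independently, so the column minimum becomes 0 and the maximum becomes 1 regardless of the original units.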

In [ 6]: # Normalize the dataset with min-max scaling: (x - min) / (max - min)
normalize_banknote_data = (banknote_data - banknote_data.min()) / (banknote_data.max() - banknote_data.min())

normalize_mean = [normalize_banknote_data['V1'].mean(), normalize_banknote_data['V2'].mean()]
normalize_std = [normalize_banknote_data['V1'].std(), normalize_banknote_data['V2'].std()]

print("Mean:", normalize_mean)
print("Standard Deviation:", normalize_std)

Mean: [0.5391136632764807, 0.5873013774145737]
Standard Deviation: [0.20500346769971411, 0.2196113237409729]
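Because min-max scaling is an affine map, the normalized statistics can be checked by hand from the describe() output: the mean transforms as (mean - min) / (max - min) and the standard deviation as std / (max - min), since shifting does not change spread. The short check below uses only the summary values printed earlier in this notebook and reproduces the normalized mean and standard deviation to about six decimal places.

```python
# Summary statistics copied from the describe() output of the raw dataset
stats = {
    "V1": {"mean": 0.43373525728862977, "std": 2.8427625862451658, "min": -7.0421, "max": 6.8248},
    "V2": {"mean": 1.9223531209912539, "std": 5.869046743580378, "min": -13.7731, "max": 12.9516},
}

normalized = {}
for col, s in stats.items():
    value_range = s["max"] - s["min"]
    # x' = (x - min) / range shifts the mean by min and divides both mean and std by range
    norm_mean = (s["mean"] - s["min"]) / value_range
    norm_std = s["std"] / value_range
    normalized[col] = (norm_mean, norm_std)
    print(col, round(norm_mean, 6), round(norm_std, 6))
```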

In [ 7]: plt.xlabel('V1')
plt.ylabel('V2')

# Normalized data points, drawn semi-transparent to show density
plt.scatter(normalize_banknote_data['V1'], normalize_banknote_data['V2'], alpha=0.25)

# Mark the mean of the normalized data
plt.scatter(normalize_mean[0], normalize_mean[1], label="Mean")

plt.title("OpenML Banknote Authentication Dataset")
plt.legend()
plt.show()

In [ 11]: # K-means algorithm
from sklearn.cluster import KMeans

# Fit k-means with two clusters on the normalized features only
features = normalize_banknote_data[['V1', 'V2']]
kmeans = KMeans(n_clusters=2).fit(features)

# Centres of our clusters
clusters = kmeans.cluster_centers_

# Classification of the elements
y_kmeans = kmeans.predict(features)

# Create a column with the labels
normalize_banknote_data['Class'] = y_kmeans

class_1 = normalize_banknote_data[normalize_banknote_data['Class'] == 0]
class_2 = normalize_banknote_data[normalize_banknote_data['Class'] == 1]

# Two clusters plotting
plt.xlabel('V1')
plt.ylabel('V2')

plt.scatter(class_1['V1'], class_1['V2'], label="Class 1", alpha=0.5)
plt.scatter(class_2['V1'], class_2['V2'], label="Class 2", alpha=0.5)

# Draw the centroids as large translucent markers
plt.scatter(clusters[:, 0], clusters[:, 1], c='blue', s=10000, alpha=0.2)

plt.title("Fake and Authentic Banknotes")
plt.legend()
plt.show()
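To make the two steps that KMeans repeats internally more concrete, here is a minimal from-scratch sketch of Lloyd's algorithm in NumPy: assign every point to its nearest centroid, move each centroid to the mean of its points, and stop when the centroids no longer move. It is an illustration only, not scikit-learn's implementation (which by default uses k-means++ initialization and further optimizations); the function names and the toy data are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Assignment step: index of the nearest centroid (squared Euclidean distance) for each point."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means: alternate assignment and update steps until convergence."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: each centroid moves to the mean of its points (kept in place if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels are consistent with the returned centroids
    return centroids, assign(X, centroids)

# Toy data: two well-separated groups of three points each (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = lloyd_kmeans(X, k=2)
print(centroids, labels)
```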

1.0.3 Discussion

We used the k-means algorithm to identify two clusters in the normalized dataset and to assign each element to one of them. We then stored the cluster labels in a new Class column and plotted the two clusters, together with their centroids, to visualize the fake and authentic banknote classes. As the task instructed, I ran k-means several times to check its stability and observed that the centroid positions changed little between runs.
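One way to make the stability check repeatable is to fit KMeans with several different `random_state` values and measure how much the centroids move between runs. The sketch below does this on synthetic two-blob data standing in for the normalized V1/V2 columns (the blob locations and spread are assumptions, not the banknote data). Centroids are sorted by their first coordinate before comparing, because k-means assigns the labels 0 and 1 arbitrarily from run to run.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the normalized V1/V2 data: two synthetic blobs inside [0, 1]
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.3, 0.7], scale=0.05, size=(100, 2)),
    rng.normal(loc=[0.7, 0.3], scale=0.05, size=(100, 2)),
])

centroid_runs = []
for seed in range(5):
    # One run per seed; n_init=1 so each run reflects a single initialization
    km = KMeans(n_clusters=2, n_init=1, random_state=seed).fit(X)
    # Sort centroids by their first coordinate so runs are comparable despite label swaps
    centres = km.cluster_centers_[np.argsort(km.cluster_centers_[:, 0])]
    centroid_runs.append(centres)

centroid_runs = np.array(centroid_runs)       # shape (runs, clusters, features)
spread = centroid_runs.std(axis=0).max()      # largest std of any centroid coordinate across runs
print("max centroid spread across runs:", spread)
```

A spread close to zero means every run found essentially the same centroids; a large spread would suggest the clustering depends on the initialization.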

1.0.4 Recommendation
The algorithm can help financial institutions or independent researchers assign a new banknote to one of the two clusters, making it easier to distinguish real from fake banknotes.
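Scikit-learn's `KMeans.predict` assigns each sample to its closest cluster centre, so classifying a new banknote amounts to rescaling its raw readings with the training set's min and max and then picking the nearest centroid. The sketch below does this by hand with NumPy. The training min/max values come from the describe() output above, but the centroids and the test readings are hypothetical placeholders, not the centroids found in this notebook, and `classify_new_note` is an illustrative name.

```python
import numpy as np

def classify_new_note(note, train_min, train_max, centroids):
    """Hypothetical helper: return the index of the nearest centroid for a raw (V1, V2) reading.

    The reading is rescaled with the *training* min/max so it lives in the same
    normalized space as the centroids.
    """
    scaled = (np.asarray(note, dtype=float) - train_min) / (train_max - train_min)
    dists = np.linalg.norm(centroids - scaled, axis=1)
    return int(dists.argmin())

# Min and max of V1 and V2 as printed by describe() for the raw dataset
train_min = np.array([-7.0421, -13.7731])
train_max = np.array([6.8248, 12.9516])

# Hypothetical centroids in normalized space (placeholders, NOT this notebook's results)
centroids = np.array([[0.3, 0.6], [0.7, 0.6]])

print(classify_new_note([3.0, 8.0], train_min, train_max, centroids))
```

k-means only returns the cluster indices 0 and 1; deciding which cluster corresponds to authentic notes still requires a few notes whose status is already known.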
