1 Peer-Graded Assignment: Clustering

This document analyzes a banknote authentication dataset using clustering techniques. It normalizes the dataset and performs k-means clustering with k=2 to categorize data points as either authentic or fake banknotes based on their feature values. Scatter plots and centroids are used to visualize the two resulting clusters. The analysis demonstrates how k-means clustering can help classify new banknotes as real or fake.

Uploaded by

NurujjamanKhan

clustering_assignment

July 13, 2020

1 Peer-graded Assignment: Clustering

In [3]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns
        from sklearn.cluster import KMeans

        banknote_data = pd.read_csv('banknote_authentication_dataset.csv')

        # Drop rows with missing values; dropna() returns a new frame, so reassign it
        banknote_data = banknote_data.dropna()

        # Calculate V1/V2 mean
        value_one = [banknote_data['V1'].mean(), banknote_data['V2'].mean()]
        print("Mean:", value_one)

        # Calculate V1/V2 standard deviation
        value_two = [banknote_data['V1'].std(), banknote_data['V2'].std()]
        print("Standard Deviation:", value_two)

        banknote_data.describe()

Mean: [0.43373525728862977, 1.9223531209912539]

Standard Deviation: [2.8427625862451658, 5.869046743580378]

Out[3]:                 V1           V2
        count  1372.000000  1372.000000
        mean      0.433735     1.922353
        std       2.842763     5.869047
        min      -7.042100   -13.773100
        25%      -1.773000    -1.708200
        50%       0.496180     2.319650
        75%       2.821475     6.814625
        max       6.824800    12.951600

1.0.1 Data Analysis
We used the seaborn.pairplot function to plot pairwise relationships in the provided banknote
authentication dataset. Scatterplots of the joint relationships and histograms of the
univariate distributions are shown below:

In [4]: sns.pairplot(banknote_data)

Out[4]: <seaborn.axisgrid.PairGrid at 0x7f529070a400>

Looking at the scatterplots, we can deduce that the values are slightly high for fake banknotes.

1.0.2 Dataset Normalization

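We normalize the dataset with min-max scaling so that V1 and V2 both lie in the range [0, 1],
then recompute the mean and standard deviation on the normalized values before separating fake
and authentic banknotes.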

In [6]: # Normalize the dataset (min-max scaling)
        normalize_banknote_data = (banknote_data - banknote_data.min()) / (banknote_data.max() - banknote_data.min())
        normalize_mean = [normalize_banknote_data['V1'].mean(), normalize_banknote_data['V2'].mean()]
        normalize_std = [normalize_banknote_data['V1'].std(), normalize_banknote_data['V2'].std()]

        print("Mean:", normalize_mean)
        print("Standard Deviation:", normalize_std)

Mean: [0.5391136632764807, 0.5873013774145737]

Standard Deviation: [0.20500346769971411, 0.2196113237409729]
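The min-max scaling used above can be checked by hand on a very small example. The sketch below is only an illustration: the toy frame and its values are made up and are not taken from the banknote dataset.

```python
import pandas as pd

# Toy frame standing in for the banknote data (values are invented for illustration).
toy = pd.DataFrame({"V1": [-2.0, 0.0, 2.0], "V2": [-10.0, 5.0, 10.0]})

# Min-max scaling: subtract each column's minimum and divide by its range,
# so every column ends up between 0 and 1.
scaled = (toy - toy.min()) / (toy.max() - toy.min())

print(scaled)
```

Each column's smallest value maps to 0 and its largest to 1, which keeps V1 and V2 on a comparable scale for the distance computations in K-means.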

In [7]: plt.xlabel('V1')
        plt.ylabel('V2')

        plt.scatter(normalize_banknote_data['V1'], normalize_banknote_data['V2'], alpha=0.25)
        plt.scatter(normalize_mean[0], normalize_mean[1], label="Mean")

        plt.title("OpenML Banknote Authentication Dataset")
        plt.legend()
        plt.show()

In [11]: # K-means algorithm
         from sklearn.cluster import KMeans

         for i in range(1):
             # Fit on the two feature columns only, so re-running the cell does not
             # feed the 'Class' label column added below back into the model
             features = normalize_banknote_data[['V1', 'V2']]
             kmeans = KMeans(n_clusters=2).fit(features)

             # Centres of our clusters
             clusters = kmeans.cluster_centers_

             # Classification of the elements
             y_kmeans = kmeans.predict(features)

             # Create a column with the labels
             normalize_banknote_data['Class'] = y_kmeans

             class_1 = normalize_banknote_data[normalize_banknote_data['Class'] == 0]
             class_2 = normalize_banknote_data[normalize_banknote_data['Class'] == 1]

             # Plot the two clusters
             plt.xlabel('V1')
             plt.ylabel('V2')

             plt.scatter(class_1['V1'], class_1['V2'], label="Class 1", alpha=0.5)
             plt.scatter(class_2['V1'], class_2['V2'], label="Class 2", alpha=0.5)

             # Large translucent markers show the cluster centres
             plt.scatter(clusters[:, 0], clusters[:, 1], c='blue', s=10000, alpha=0.2)

             plt.title("Fake and Authentic Banknotes")
             plt.legend()
             plt.show()

1.0.3 Discussion

We use the K-means algorithm to identify the two clusters and determine which class each
element belongs to after normalizing the dataset. Additionally, we added the cluster labels
as a column and plotted the result to visualize the fake and authentic banknote classes.
As instructed by the task, I ran K-means several times to check its stability and
noticed that the positions of the centroids did not change much.
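One way to make the stability check described above repeatable is to refit K-means with several seeds and compare the centroids. The sketch below is an assumption about how such a check could look, not the code that was run for this assignment. It uses synthetic, well-separated points in place of the normalized banknote data, and the names `points`, `centroid_runs` and `max_shift` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the normalized (V1, V2) data: two well-separated blobs.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0.3, 0.3], scale=0.05, size=(100, 2)),
    rng.normal(loc=[0.7, 0.7], scale=0.05, size=(100, 2)),
])

centroid_runs = []
for seed in range(5):
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(points)
    # Cluster labels can swap between runs, so sort the centroids by their
    # first coordinate before comparing them.
    order = np.argsort(km.cluster_centers_[:, 0])
    centroid_runs.append(km.cluster_centers_[order])

centroid_runs = np.array(centroid_runs)  # shape: (runs, clusters, features)

# Largest absolute difference between any run's centroids and the first run's.
max_shift = np.abs(centroid_runs - centroid_runs[0]).max()
print("Largest centroid shift across runs:", max_shift)
```

A small `max_shift` means the runs agree on where the cluster centres are. On the real banknote data the clusters overlap more than in this toy example, so the shift may be larger.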

1.0.4 Recommendation
The algorithm can help financial institutions or independent researchers categorize new input into
two classes, which will make it easier for them to differentiate between real and fake banknotes.
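Assigning a new banknote to one of the two clusters could look roughly like the sketch below. It assumes the new measurement is scaled with the minimum and range of the training data before calling `predict`. The training values, the new measurement, and the names `train`, `col_min`, `col_range` and `new_note` are all hypothetical and not taken from the assignment.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical training data in the original (unscaled) units; illustrative only.
train = pd.DataFrame({
    "V1": [-4.0, -3.5, -3.0, 3.0, 3.5, 4.0],
    "V2": [-8.0, -7.0, -6.0, 6.0, 7.0, 8.0],
})

# Keep the training minimum and range so new notes are scaled the same way.
col_min = train.min()
col_range = train.max() - train.min()
train_scaled = (train - col_min) / col_range

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train_scaled)

# A new, unseen measurement (hypothetical), scaled with the training statistics.
new_note = pd.DataFrame({"V1": [3.2], "V2": [6.5]})
new_scaled = (new_note - col_min) / col_range

new_cluster = kmeans.predict(new_scaled)[0]
print("New note falls in cluster", new_cluster)
```

K-means is unsupervised, so the cluster index alone does not say which cluster is "real" and which is "fake". That mapping has to come from inspecting the clusters or from a few notes whose status is already known.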
