Neighbor Consistency in Clustering
CLUSTERING
Contents:
Introduction
Abstract
Existing System
Proposed System
Modules
Modules description
Designs
Results (Screenshots)
Conclusion
Future Scope
Software & Hardware Requirements
References
Introduction
• With the continuous expansion of data availability in many areas of engineering and
science, identifying patterns in vast amounts of data and assigning objects to a
predefined class, a task called classification, have become critical. Classification is
therefore a fundamental problem, especially in pattern recognition and data mining.
• k-Nearest Neighbor (KNN) classifiers are basic classifiers that assign a query object
to the same category as its nearest examples.
• The efficiency of NN classification heavily depends on the type of distance measure,
especially in a large-scale and high-dimensional database. In some applications, the
data structure is so complicated that the corresponding distance measure is
computationally expensive.
• Traditional KNN adopts a fixed k for all query samples regardless of their geometric
location and local characteristics.
• To overcome the above limitations of classification algorithms, we propose a new
method called the Natural Neighborhood-Based Classification Algorithm (NNBCA),
built on the previously proposed Natural Neighbor (NaN) method.
Abstract
• The K-Nearest Neighbors algorithm can be used for classification: an object is classified by a
majority vote of its neighbors, and it is assigned to the class most common among its k nearest
neighbors. It can also be used for regression, in which case the predicted value is the average
of the values of its k nearest neighbors (a small numeric illustration of this averaging step
follows this list).
• Various kinds of K-Nearest Neighbor based classification methods are the bases
of many well established and high-performance pattern recognition techniques. However,
such methods are sensitive to the choice of the parameter k. Essentially, the challenge is to
detect the neighborhood of various data sets with no prior knowledge of the data characteristics.
• In this work we introduce a new supervised classification method, the natural neighbor
algorithm, and show that it provides a better classification result without choosing the
neighborhood parameter artificially. Unlike the original K-Nearest Neighbors based method,
which needs a prior k, the natural neighbor algorithm predicts a different k at different stages.
It is therefore able to learn more from flexible neighbor information in both the training and
testing stages and provide a better classification result.
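For the regression case mentioned in the first point above, the prediction step is simply an average of the neighbors' target values. A minimal numeric illustration in Python (the neighbor values are made up for this example):

    from statistics import mean

    # Hypothetical target values of a query's k = 3 nearest neighbors.
    neighbor_values = [4.2, 3.9, 4.5]

    # KNN regression: the prediction is the average of those values.
    prediction = mean(neighbor_values)
    print(prediction)  # 4.2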
Existing System
K-Nearest-Neighbors (KNN)
The KNN algorithm is a robust and versatile classifier that is often used as a benchmark for more
complex classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines
(SVM). Despite its simplicity, KNN can outperform more powerful classifiers and is used in a
variety of applications such as economic forecasting, data compression and genetics. For
example, KNN was leveraged in a 2006 study of functional genomics for the assignment of genes
based on their expression profiles.
KNN falls in the supervised learning family of algorithms. Informally, this means that we are
given a labeled dataset consisting of training observations (x, y) and would like to capture
the relationship between x and y. More formally, our goal is to learn a function h : X → Y
so that, given an unseen observation x, h(x) can confidently predict the corresponding output
y. The KNN classifier is also a non-parametric, instance-based learning algorithm.
Existing System Contd.
How does KNN work?
In the classification setting, the K-nearest neighbor algorithm essentially boils down to
forming a majority vote between the K most similar instances to a given “unseen”
observation. Similarity is defined according to a distance metric between two data points.
A popular choice is the Euclidean distance given by
d(x, x′) = √((x₁ − x′₁)² + (x₂ − x′₂)² + … + (xₙ − x′ₙ)²).
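To make the distance computation and the majority vote concrete, the following is a minimal sketch of a KNN classifier in plain Python; the function names and the tiny data set are illustrative assumptions, not the project's actual code.

    import math
    from statistics import mode

    def euclidean_distance(a, b):
        """Euclidean distance between two equal-length feature vectors."""
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def knn_classify(train_points, train_labels, query, k):
        """Classify `query` by a majority vote among its k nearest training points."""
        # Pair every training point with its distance to the query.
        distances = [(euclidean_distance(p, query), label)
                     for p, label in zip(train_points, train_labels)]
        # Keep the labels of the k closest points.
        nearest_labels = [label for _, label in sorted(distances)[:k]]
        # Majority vote; in Python 3.7, statistics.mode raises StatisticsError on ties.
        return mode(nearest_labels)

    # Tiny illustrative data set (hypothetical values).
    X = [(1.0, 1.1), (1.2, 0.9), (5.0, 5.2), (5.1, 4.8)]
    y = ["A", "A", "B", "B"]
    print(knn_classify(X, y, (1.1, 1.0), k=3))  # expected output: A

Note how the result depends directly on the hand-picked k; this sensitivity is exactly what the proposed system aims to remove.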
Proposed System
Various kinds of k-Nearest Neighbor (KNN) based classification methods are the bases of
many well established and high-performance pattern recognition techniques. However,
such methods are vulnerable to the choice of the parameter k. Essentially, the challenge is to
detect the neighborhood of various datasets without prior knowledge of the data characteristics.
This work introduces a new supervised classification algorithm, the Natural Neighborhood Based
Classification Algorithm (NNBCA).
Findings indicate that this new algorithm provides a good classification result without
artificially selecting the neighborhood parameter. Unlike the original KNN-based method,
which needs a prior k, NNBCA predicts a different k for different samples.
Therefore, NNBCA is able to learn more from flexible neighbor information in both the
training and testing stages. Thus, NNBCA provides a better classification result than other
methods.
Proposed System Contd.
Advantages of Proposed System
The NaN method can create a suitable neighborhood graph based on the local
characteristics of various data sets. This neighborhood graph can identify the
basic clusters in the data set, especially manifold clusters, as well as the noise points.
The method provides a numeric result named the NaN Eigenvalue (NaNE) to
replace the parameter k of the traditional KNN method, and the value of NaNE
is determined dynamically for each data set.
The number of natural neighbors of each point is flexible, ranging dynamically
from 0 to NaNE: points near the center of a cluster have more neighbors, while
the neighbor number of a noise point is 0 (a simplified sketch of this search is given below).
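As a rough illustration of how such a natural-neighbor search is commonly described (not the project's actual implementation), the sketch below grows the search range r until every point has a mutual neighbor or the number of neighbor-less points stops shrinking; the final r then stands in for NaNE, and the per-point counts correspond to the flexible neighbor numbers described above.

    import math

    def euclidean(a, b):
        """Euclidean distance between two equal-length feature vectors."""
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def natural_neighbor_search(points):
        """Simplified natural-neighbor (NaN) search.

        Grows the search range r until every point has a mutual neighbor, or the
        number of neighbor-less points stops shrinking (those are treated as noise).
        Returns the final r (standing in for NaNE) and the per-point counts.
        """
        n = len(points)
        # For every point, the indices of all other points sorted by distance.
        order = [sorted((j for j in range(n) if j != i),
                        key=lambda j: euclidean(points[i], points[j]))
                 for i in range(n)]
        prev_zero = n + 1
        for r in range(1, n):
            knn = [set(order[i][:r]) for i in range(n)]           # r nearest neighbors
            counts = [sum(1 for j in knn[i] if i in knn[j])       # mutual neighbors only
                      for i in range(n)]
            zero = sum(1 for c in counts if c == 0)
            if zero == 0 or zero == prev_zero:
                return r, counts
            prev_zero = zero
        return n - 1, counts

    # Tiny made-up data set: two compact clusters plus one isolated point.
    pts = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (9.0, 9.0)]
    nane, counts = natural_neighbor_search(pts)
    print("NaNE:", nane, "natural-neighbor counts:", counts)

On a clean data set this search terminates at a small r, and that r is the value the proposed system would use in place of a hand-picked k.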
Modules :-
Libraries
• Tkinter
• Matplotlib
• Statistics
Reading the training data
Reading the testing data
Finding k
Testing new data
Modules description:-
Tkinter library
Tkinter is the standard GUI library for Python. Python when combined
with Tkinter provides a fast and easy way to create GUI applications. Tkinter provides a
powerful object-oriented interface to the Tk GUI toolkit.
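As an illustration only (the project's actual GUI layout is not shown in this document), a minimal Tkinter window with a button that would trigger classification could look like the sketch below; the window title and the handler body are hypothetical.

    import tkinter as tk
    from tkinter import messagebox

    def on_classify():
        # Placeholder handler: the real application would read the data sets
        # and run the NNBCA classification here.
        messagebox.showinfo("Result", "Classification would run here.")

    root = tk.Tk()
    root.title("Neighbor Consistency in Clustering")
    tk.Label(root, text="Natural Neighborhood Based Classification").pack(padx=10, pady=5)
    tk.Button(root, text="Classify", command=on_classify).pack(padx=10, pady=10)
    root.mainloop()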
Matplotlib
Matplotlib is a widely used visualization library in Python for 2D plots of
arrays. It is a multi-platform data visualization library built on NumPy arrays.
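For example, a scatter plot of two hypothetical 2-D classes could be drawn as follows (the coordinates are made up for illustration):

    import matplotlib.pyplot as plt

    # Hypothetical 2-D training points from two classes.
    class_a = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0)]
    class_b = [(5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]

    plt.scatter(*zip(*class_a), label="Class A")
    plt.scatter(*zip(*class_b), label="Class B")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.title("Training data")
    plt.legend()
    plt.show()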
Statistics
The mode() function of the statistics module returns the most common value, i.e., the central tendency, of numeric or nominal data.
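A small usage example; note that in Python 3.7 statistics.mode raises StatisticsError when there is no single most common value:

    from statistics import mode

    labels = ["A", "A", "B"]   # e.g., labels of a query's nearest neighbors
    print(mode(labels))        # A  (the majority class)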
Modules description:-
Reading the training data
It reads the given, already available training data sets.
Finding k
It predicts the k value using the training data and the testing data (a sketch of this flow is given below).
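Putting the reading and prediction modules together, a hedged sketch of the overall flow might look like the following; the CSV file names, the column layout, and the fixed placeholder k are assumptions for illustration, since in the proposed system k would come from the NaNE value found by the natural-neighbor search.

    import csv
    import math
    from statistics import mode

    def read_dataset(path):
        """Read a CSV whose rows are feature values followed by a class label.
        (The file layout is an assumption for illustration.)"""
        points, labels = [], []
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if row:
                    points.append(tuple(float(v) for v in row[:-1]))
                    labels.append(row[-1])
        return points, labels

    def classify(train_X, train_y, query, k):
        """Majority vote among the k nearest training points (Euclidean distance)."""
        dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        nearest = sorted(zip(train_X, train_y), key=lambda pl: dist(pl[0], query))[:k]
        return mode([label for _, label in nearest])

    # Hypothetical file names; the project "reads the already available data sets".
    train_X, train_y = read_dataset("training.csv")
    test_X, test_y = read_dataset("testing.csv")

    # Placeholder only: in the proposed system this would be the NaNE value.
    k = 3
    predictions = [classify(train_X, train_y, q, k) for q in test_X]
    accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
    print(f"Test accuracy: {accuracy:.2%}")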
Conclusion:
The NaN method is a new take on the nearest neighbor concept, and it has been applied to
clustering, outlier detection, and classification. The improved algorithms obtained excellent
results by using the toolkits of the NaN method. In the proposed algorithm, we use the NaNG
(natural neighbor graph) toolkit to achieve improved classification accuracy; further work
should address reducing the complexity of the proposed algorithm and extending the NaN
method to further application areas.
Future Scope:
The outcomes of this research are based on results that involve only sample datasets.
Additional datasets should be considered for the evaluation of different classification
problems, because information growth in recent technology is extending to heights beyond
assumptions: the field keeps growing and data are dynamic by nature.
Hence, whenever the results of the old process become obsolete, classification of the
entire system currently has to be re-implemented from scratch. The scope of future work
can therefore deal with incremental learning, which stores the existing model and processes
newly incoming data more efficiently. More specifically, models with incremental learning
can be used in the categorization process.
Software & Hardware Requirements :-
Software :-
• Python 3.7
• Spyder IDE (or any Python IDE)
Hardware :-
• Processor (i3 or above)
• RAM (4 GB or above)
• Storage (250 GB or above)
References
1. Big Data Mining and Analytics, ISSN 2096-0654, vol. 1, no. 4, pp. 257–265,
December 2018. DOI: 10.26599/BDMA.2018.9020017.
2. Z. H. Zhou, N. V. Chawla, Y. C. Jin, and G. J. Williams, Big data opportunities and
challenges: Discussions from data analytics perspectives, IEEE Comput. Intell. Mag.,
vol. 9, no. 4, pp. 62–74, 2014.
3. Y. T. Zhai, Y. S. Ong, and I. W. Tsang, The emerging “big dimensionality”, IEEE
Comput. Intell. Mag., vol. 9, no. 3, pp. 14–26, 2014.
4. T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory,
vol. 13, no. 1, pp. 21–27, 1967.
5. H. Zhang, A. C. Berg, M. Maire, and J. Malik, SVM-KNN: Discriminative nearest
neighbor classification for visual category recognition, in Proc. 2006 IEEE Computer
Society Conf. Computer Vision and Pattern Recognition, New York, NY, USA, 2006, pp.
2126–2136.
6. D. Lunga and O. Ersoy, Spherical nearest neighbor classification: Application to
hyperspectral data, in Machine Learning and Data Mining in Pattern Recognition, P.
Perner, ed. Springer, 2011.
Thank you