
Earlier Prediction of Heart Disease Using Locality Sensitive Hashing

This document discusses using locality sensitive hashing (LSH) to improve heart disease prediction from medical data. LSH is a technique that hashes similar data points to the same "buckets", making it easier to find similar observations. The proposed system would apply LSH to a preprocessed heart disease dataset to build a classification model for predicting patient risk. This could provide more accurate early detection compared to other techniques like decision trees or neural networks, which have limitations like high computational costs, inability to handle certain data types, or poor performance with small datasets. The applications of LSH mentioned are near-duplicate detection, genome analysis, image search, and multimedia fingerprinting.

Uploaded by

vhdhanabal3339
Earlier Prediction of Heart Disease using Locality Sensitive Hashing

Data mining is a field of research whose principal goal is to discover interesting and useful
patterns in large volumes of data. Owing to changes in lifestyle and to hereditary factors,
health problems are increasing drastically, and heart disease in particular has become
widespread. About 17.3 million deaths per year are attributed to heart disease, which ranks
among the leading causes of death in the world. Heart disease can be predicted with data mining
classification techniques from attributes such as a person's age, blood pressure, cholesterol,
pulse and gender. The risk level of each person can be assessed using the Locality Sensitive
Hashing (LSH) technique. LSH refers to a family of functions (known as LSH families) that
hash data points into buckets so that data points near each other fall into the same bucket
with high probability, while data points far from each other are likely to fall into different buckets.
This makes it easier to identify observations with various degrees of similarity. This paper focuses
on predicting heart disease with the LSH technique and on the accuracy it achieves.
Key Words: Data mining, Heart Disease, Heart Disease Dataset, Classification, LSH.

Introduction

In day-to-day life, many factors affect the human heart. Heart problems are occurring
at a rapid pace, and new heart diseases are being identified. The heart is an essential organ
that pumps blood through the body, and in today's stressful world its health must be
conserved for healthy living. Heart failure is
also an outcome of heart disease, and breathlessness can occur when the heart becomes too weak
to circulate blood. Some heart conditions occur with no symptoms at all, especially in older
adults and individuals with diabetes. The term 'congenital heart disease' covers a range of
conditions, but the general symptoms include sweating, high levels of fatigue, fast heartbeat and
breathing, breathlessness and chest pain. However, these symptoms might not develop until a person
is older than 13 years. In such cases, diagnosis becomes an intricate task requiring
great experience and high skill.
Data mining is the task of extracting vital decision-making information from a
collection of past records for future analysis or prediction. Medical data mining makes it
possible to integrate classification techniques and to train them on
a dataset, which in turn reveals hidden patterns in medical data that can be
used to predict a patient's future state.
In this research work, supervised machine learning is used to make the
predictions. A comparative analysis is carried out of three data mining classification
algorithms, namely k-NN, SVM and LSH. The analysis covers model
building time, correctly classified instances, incorrectly classified instances and accuracy (%). The
StatLog dataset from the UCI Machine Learning Repository is used for making heart disease
predictions in this research work. The predictions are made using a Locality Sensitive Hashing
classification model that is built by training on the heart disease
dataset. This final model can be used for the prediction of any type of heart
disease.
Literature Survey
Data mining has played an important role in intelligent medical systems. It
plays a vital role in the healthcare industry, helping health systems use
data and analytics effectively to recognize inefficiencies and find best practices that reduce costs and improve care.
The main obstacles to implementing data mining techniques and analysis strategies
effectively are the adoption of technology and the complexity of healthcare. Moreover, the data
generated by healthcare activities are so complex and voluminous that manual analysis is
impractical. For identifying a disease and following it with effective treatment, data mining
techniques are important for patients and stakeholders alike.
The applications of machine learning techniques have been examined by various researchers
previously. However, most studies focus on the specific impact of those machine
learning techniques rather than on optimizing them. Hybrid methods have also been proposed
to enhance optimization.
Dangare et al. [1] used feature selection for predicting heart disease with a neural
network. Jabbar et al. [2] used feature selection methods such as symmetrical uncertainty,
information gain and a genetic algorithm; their method uses feature subset selection
and associative classification to compute a risk score for the disease. Krishnaiah [3] and Kumar [4]
proposed methods that use fuzzy logic along with KNN for diagnosing heart disease. The
performance of their algorithms was improved by discretization and filtering techniques. Syed et
al. [5] predicted heart disease using genetic neural networks. In [6], the authors proposed
prediction of heart disease using genetic neural networks. Experiments were done on the American
Heart Association data set, and their approach recorded an accuracy of 96.2%. Masethe et al. [7]
proposed a model using decision trees for heart disease prediction.
Palaniappan et al. [8] built a model known as the
Intelligent Heart Disease Prediction System (IHDPS) using several data mining techniques,
such as Decision Trees, Naïve Bayes and Neural Networks.
Shantakumar et al. [9] developed an intelligent and effective
heart attack prediction system using a Multi-Layer Perceptron with Back-Propagation.
The frequent patterns of heart disease are mined from the extracted data with the MAFIA
algorithm.
Yanwei et al. [10] built a classification method based on multi-parametric
features obtained by assessing HRV (Heart Rate Variability) from ECG; the data are
pre-processed, and a prediction model is built that classifies the heart disease of a patient.
The following are some data mining techniques and their drawbacks when applied to the
early prediction of diseases.
Decision Tree
• Some decision trees can only deal with binary-valued target classes. Others are able to
assign records to an arbitrary number of classes, but are error-prone when the number of training
examples per class gets small. This can happen rather quickly in a tree with many levels
and/or many branches per node.
• The process of growing a decision tree is computationally expensive. At each node, each
candidate splitting field must be examined before its best split can be found.
Association Rule
• Algorithms for discovering frequent sets are not directly suitable when the
underlying database grows intermittently.
• The discovered rules can be hard to understand.
Naïve Bayes
• The main disadvantage is that it cannot learn interactions between features.
• In a classification task, a large data set is needed to make reliable estimates of the
probability of each class.
• The Naïve Bayes classification algorithm can be used with a small data set, but precision and
recall will remain very low.
Support Vector Machines
• The problem needs to be formulated as a 2-class classification.
• The learned function (weights) is difficult to understand.
• Learning takes a long time (QP optimization).
Neural Network
• Neural networks are hard to update incrementally: if data are added later, incorporating
them into an existing network typically requires retraining.
• Handling time series data in neural networks is a very complicated topic.

PROPOSED SYSTEM
One of the major drawbacks of these works is that the main focus has been on the
application of classification techniques for heart disease prediction, rather than studying various
data cleaning and pruning techniques that prepare and make a dataset suitable for mining. It has
been observed that a properly cleaned and pruned dataset provides much better accuracy than an
unclean one with missing values. Selection of suitable techniques for data cleaning along with
proper classification algorithms will lead to the development of prediction systems that give
enhanced accuracy.
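
As one illustration of the data cleaning discussed above, missing values can be filled in with simple column statistics. The sketch below is only a minimal example under stated assumptions: it uses the column mean for numeric attributes and the column mode for nominal attributes, which this paper does not specify, and the function name `fill_missing` and the toy records are hypothetical.

```python
from statistics import mean, mode


def fill_missing(rows, numeric_cols, nominal_cols):
    """Return copies of rows with None replaced by the column mean or mode."""
    fill = {}
    for col in numeric_cols:
        # Assumption: numeric gaps are filled with the mean of the observed values.
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in nominal_cols:
        # Assumption: nominal gaps are filled with the most frequent observed value.
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    return [
        {col: (fill[col] if val is None and col in fill else val) for col, val in r.items()}
        for r in rows
    ]


# Made-up toy records for illustration; not taken from the heart disease dataset.
records = [
    {"age": 63, "cp": 1, "chol": 233},
    {"age": 41, "cp": 2, "chol": None},
    {"age": None, "cp": 2, "chol": 250},
    {"age": 52, "cp": None, "chol": 201},
]
cleaned = fill_missing(records, numeric_cols=["age", "chol"], nominal_cols=["cp"])
```

The input rows are left unchanged, so the cleaned and uncleaned versions can be compared when measuring the effect of cleaning on accuracy.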

In our proposed work, we therefore plan to implement the Locality Sensitive Hashing technique for
better classification. The problem LSH solves is that finding nearest neighbors is very
expensive, in both time and space, when operating in large feature spaces. It hashes input vectors
(e.g. bag-of-words vectors) in such a way that similar vectors are likely to have the same hash.
Because of this property, lookup of near neighbors becomes a very efficient operation. The most
important applications of LSH are usually in high-dimensional spaces. Other applications of
LSH are:

• Near-duplicate detection: LSH is commonly used to deduplicate large quantities of
documents, webpages, and other files.
• Genome-wide association study: Biologists often use LSH to identify similar gene
expressions in genome databases.
• Large-scale image search: Google used LSH along with PageRank to build their image
search technology VisualRank.
• Audio/video fingerprinting: In multimedia technologies, LSH is widely used as a
fingerprinting technique for A/V data.
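
The idea that similar vectors are likely to receive the same hash can be sketched with one common LSH family, sign random projection (random hyperplanes). This family is an assumption made for illustration; the paper does not state which family it uses. The helper names (`make_hyperplanes`, `signature`, `matching_bits`) and the example vectors are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian vectors, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is non-negative, else 0."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def matching_bits(sig_a, sig_b):
    """Count the positions in which two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b))


planes = make_hyperplanes(num_bits=64, dim=4, seed=42)
a = [1.0, 2.0, 3.0, 4.0]
b = [1.1, 2.0, 2.9, 4.2]    # close to a
c = [-4.0, 1.0, -3.0, 0.5]  # points in a very different direction

near = matching_bits(signature(a, planes), signature(b, planes))
far = matching_bits(signature(a, planes), signature(c, planes))
print(f"bits shared with the near vector: {near}, with the far vector: {far}")
```

Because each bit depends only on the sign of a dot product, vectors separated by a small angle agree on most bits, while vectors pointing in very different directions agree on far fewer. Using a short prefix of such a signature as a bucket key is what makes near-neighbor lookup cheap.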
Proposed architecture
Figure 1 shows the proposed architecture for the early prediction of heart
disease.

[Figure 1. Proposed Architecture. Boxes in the diagram: Input: Dataset -> Preprocessing -> Splitting -> Training Dataset / Testing Dataset -> Training (DT, KNN, etc.) -> LSH Classifier -> Normal / Abnormal]

To enhance the performance of the classifier, the following algorithm is proposed.

Input: Heart disease data set HD

Output: Classification of the data set into patients with heart disease and normal patients

Step 1: Input HD.

Step 2: Apply pre-processing techniques: fill in missing values.

Step 3: Apply the LSH classifier.

Step 4: Hash all n points from the data set S into each of the L hash tables.

Step 6: For a query q, iterate over the L hash functions g.

Step 7: Finally, classify the queried data as normal or abnormal.

Algorithm 1. Heart disease prediction using Locality Sensitive Hashing

The algorithm takes the heart disease dataset and classifies whether a person's heart is in
normal or abnormal condition. The proposed algorithm works in two phases. In the first phase,
preprocessing fills in the missing values and is followed by feature reduction; the
dataset is then given as input to the proposed method. In the second phase, LSH finds the
records similar to the given query.
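
The hashing and query steps of Algorithm 1 can be sketched as follows. This is not the authors' implementation. It assumes a sign-random-projection hash family, a majority vote over the training labels that share a bucket with the query, and features that have already been centred. The class name `LSHClassifier`, its parameters and the toy points are all hypothetical.

```python
import random
from collections import Counter, defaultdict


class LSHClassifier:
    """Toy LSH classifier: L hash tables, majority vote over colliding points."""

    def __init__(self, dim, num_tables=8, num_bits=4, seed=0):
        rng = random.Random(seed)
        # One hash function g per table; each g is built from num_bits random vectors.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.default_label = None

    @staticmethod
    def _key(planes, x):
        # Bucket key: the sign pattern of the dot products with the random vectors.
        return tuple(sum(w * v for w, v in zip(p, x)) >= 0 for p in planes)

    def fit(self, points, labels):
        # Step 4: hash all n training points into each of the L tables.
        for planes, table in zip(self.planes, self.tables):
            for x, y in zip(points, labels):
                table[self._key(planes, x)].append(y)
        # Fallback when a query collides with nothing: most common training label.
        self.default_label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, q):
        # Step 6: iterate over the L hash functions and collect colliding labels.
        votes = Counter()
        for planes, table in zip(self.planes, self.tables):
            votes.update(table.get(self._key(planes, q), []))
        # Step 7: majority vote (a point colliding in several tables votes several times).
        return votes.most_common(1)[0][0] if votes else self.default_label


# Made-up, already centred 2-D toy points; not taken from the heart disease dataset.
train_x = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -1.0), (1.0, 0.9), (0.8, 1.1), (1.1, 1.2)]
train_y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

clf = LSHClassifier(dim=2).fit(train_x, train_y)
print(clf.predict((0.9, 1.0)), clf.predict((-1.0, -1.0)))
```

Only the buckets that the query falls into are inspected, so prediction cost depends on bucket sizes rather than on the full training set. Raising `num_bits` makes buckets smaller and more selective, while raising `num_tables` lowers the chance of missing a true neighbor.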

Experimental Results

In this paper, we used a Locality Sensitive Hashing classifier for predicting heart disease.
The main goal is to compare our results with those of other classification models. For that,
we compared our results with k-Nearest Neighbor and Support Vector Machine. To
evaluate our proposed algorithm, the Heart Disease dataset is taken from the UCI repository. It
consists of 270 instances and 14 features, which are listed in Table 1.

No. | Attribute Name | Type | Description | Range
1 | Age | Numeric | Age in years | 29-65
2 | Sex | Nominal | Sex in number | Male = 0, Female = 1
3 | Cp | Nominal | Chest pain type | typical angina = 1, atypical angina = 2, non-anginal pain = 3, asymptomatic = 4
4 | trestbpd | Numeric | Resting blood pressure | 92-200
5 | serumCho | Numeric | Serum cholesterol in mg/dl | 126-564
6 | fbs | Nominal | Fasting blood sugar level | Yes = 1, No = 0
7 | restecg | Nominal | Resting electrocardiographic results | Normal = 0, having ST-T wave abnormality = 1, showing probable or definite left ventricular hypertrophy = 2
8 | thalach | Numeric | Maximum heart rate achieved | 82-185
9 | exang | Nominal | Exercise induced angina | Yes = 1, No = 0
10 | oldpeak | Numeric | ST depression induced by exercise | 71-202
11 | peakSlope | Numeric | The slope of the peak exercise ST segment | 1-3
12 | numVessels | Numeric | Number of major vessels (0-3) coloured by fluoroscopy | 0-3
13 | thal | Nominal | The defect type of the heart | 3 = normal; 6 = fixed defect; 7 = reversible defect
14 | Disease | Nominal | Identification of a heart attack | Yes = 2, No = 1
Table 1 Heart Disease Dataset – Attributes.

Table 2 shows the experimental results. Experiments were carried out to evaluate the usefulness and
the performance of the different classification algorithms for predicting heart disease.
Evaluation Criteria | K-NN | SVM | LSH
Model building time (in sec) | 0.25 | 0.9 | 0.6
Correctly classified instances | 243 | 247 | 262
Incorrectly classified instances | 27 | 23 | 8
Accuracy % | 90% | 91.48% | 97.03%

Table 2 Performance Classifier
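
The accuracy row of Table 2 follows from the counts of correctly classified instances divided by the 270 instances in the dataset. The short check below reproduces that arithmetic; the variable and function names are illustrative only.

```python
TOTAL_INSTANCES = 270  # dataset size reported in the paper

# Correctly classified instances, copied from Table 2.
correct = {"k-NN": 243, "SVM": 247, "LSH": 262}


def accuracy_percent(n_correct, n_total):
    """Accuracy as a percentage of correctly classified instances."""
    return 100.0 * n_correct / n_total


accuracies = {name: accuracy_percent(n, TOTAL_INSTANCES) for name, n in correct.items()}
for name, n in correct.items():
    print(f"{name}: {n} correct, {TOTAL_INSTANCES - n} incorrect, {accuracies[name]:.2f}%")
```

For LSH, 262/270 is about 97.037%, which rounds to 97.04%; the 97.03% in the table appears to be the same figure truncated rather than rounded.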

The following charts show the performance of LSH compared with the k-NN and SVM models
in terms of model building time, correctly classified instances, incorrectly
classified instances and accuracy (%).
Figure 2. a) Comparison chart of model building time b) Comparison chart of correctly
classified instances c) Incorrectly classified instances d) Accuracy %
From the above figure, the model building time of LSH (0.6 sec) is lower than that of SVM
(0.9 sec) but higher than that of k-NN (0.25 sec). Out of 270 instances, our
proposed method correctly classifies 262 instances, an accuracy of 97.03%, compared with
90% for k-NN and 91.48% for SVM.

Conclusion

In this paper, different classifiers are studied and experiments are conducted to find
the best classifier for predicting heart disease in patients. We proposed an approach to predict
heart disease using machine learning techniques. Three techniques, k-NN, SVM and LSH,
are compared. The results show that the proposed LSH method outperforms the other
two classifiers in accuracy. Unlike conventional computer hashes, which are designed to return exact matches in
O(1) time, an LSH algorithm uses dot products with random vectors to quickly find nearest
neighbors. LSH provides a probabilistic guarantee that it will return the correct answer. In
systems that have other sources of error (perhaps due to mislabeled data), one can reduce the LSH
error below the error due to other sources while significantly improving computational
performance. This makes LSH in particular, and randomized algorithms in general, important in
today's world of Internet-sized databases. This study can be extended with better
feature reduction and with optimization techniques.
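
The probabilistic guarantee mentioned above can be made concrete for one standard LSH family, sign random projection, which is assumed here for illustration and is not confirmed as the family used in the experiments. The symbols r (random vector), θ (angle between two vectors), p (per-bit collision probability), k (bits per bucket key) and L (number of hash tables) are notation introduced for this sketch.

```latex
% One hash bit from a random vector r, and its collision probability
h_{\mathbf{r}}(\mathbf{x}) = \operatorname{sign}(\mathbf{r} \cdot \mathbf{x}),
\qquad
p = \Pr\left[ h_{\mathbf{r}}(\mathbf{u}) = h_{\mathbf{r}}(\mathbf{v}) \right]
  = 1 - \frac{\theta(\mathbf{u}, \mathbf{v})}{\pi}

% Probability that u and v share a bucket in at least one of L tables,
% when each bucket key concatenates k independent bits
P_{\text{collide}} = 1 - \left( 1 - p^{k} \right)^{L}
```

For vectors separated by a small angle, p is close to 1, so the probability of missing a true neighbor shrinks quickly as L grows. This is the sense in which the LSH error can be pushed below the error from other sources.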

References

1. Dangare A. Data mining approach for prediction of heart disease using neural
network. IJCET 2012; 3: 30-40.

2. Jabbar MA, Deekshatulu BL, Priti C. Prediction of risk score for heart disease using
associative classification and hybrid feature selection. IEEE ISDA 2012; 628-634.

3. Krishnaiah V. Diagnosis of heart disease patients using fuzzy classification techniques.
ICCCT 2014; 1-7.

4. Kumar. Detection of heart disease using fuzzy logic. IJETT 2013; 4.

5. Syed R, Agarwal B. Genetic neural network based data mining in prediction of
heart disease using risk factors. IEEE Conference on ICT 2013.

6. Latha Parthiban and R. Subramanian, “Intelligent Heart Disease Prediction System using
CANFIS and Genetic Algorithm”, International Journal of Biological, Biomedical and
Medical Sciences, Vol. 3, No. 3, pp. 1-8, 2008.

7. Masethe HD, Masathe MA. Prediction of heart disease using classification
algorithm. Wcess 2014.

8. Sellappan Palaniappan and Rafiah Awang, “Intelligent Heart Disease Prediction System
using Data Mining Techniques”, International Journal of Computer Science and Network
Security, Vol. 8, No. 8, pp. 1-6, 2008.

9. Shantakumar B. Patil and Y.S. Kumaraswamy, “Intelligent and Effective Heart Attack
Prediction System using Data Mining and Artificial Neural Network”, European Journal
of Scientific Research, Vol. 31, No. 4, pp. 642-656, 2009.

10. X. Yanwei et al., “Combination Data Mining Models with New Medical Data to Predict
Outcome of Coronary Heart Disease”, Proceedings of International Conference on
Convergence Information Technology, pp. 868-872, 2007.
