jut2

The document presents a project on predicting cardiovascular disease using machine learning algorithms, highlighting the significance of early diagnosis to reduce mortality rates. It outlines the methodology involving data collection, preprocessing, and the use of classification algorithms like Naïve Bayes, Decision Tree, and Random Forest to assess the risk of heart disease. The research aims to identify the most effective algorithm for accurate predictions, contributing to better healthcare decisions.

Uploaded by

Tina Solanki
Copyright
© All Rights Reserved

Cardiovascular Disease Prediction

By
Antima Solanki (2001320130023)

Submitted to the Department of Information Technology


In partial fulfillment of the requirements
For the degree of

BACHELOR OF TECHNOLOGY

Project Coordinators
Dr. Ajay Kumar Sahu
Dr. Shivani Dubey

Greater Noida Institute of Technology, Greater Noida


Dr. A.P.J. Abdul Kalam Technical University, Lucknow

Introduction

According to the World Health Organization, 12 million deaths occur worldwide every year
due to heart disease. Heart disease is one of the biggest causes of morbidity and mortality
in the world's population, and the prediction of cardiovascular disease is regarded as one of
the most important subjects in data analysis. The burden of cardiovascular disease has been
increasing rapidly all over the world in the past few years. Much research has been
conducted in an attempt to pinpoint the most influential factors of heart disease and to
predict the overall risk accurately. Heart disease is even described as a silent killer,
because it can lead to death without obvious symptoms. Early diagnosis of heart disease
plays a vital role in making decisions on lifestyle changes in high-risk patients and, in
turn, reduces complications. Machine learning has proved effective in assisting decision
making and prediction from the large quantity of data produced by the health care
industry. This project aims to predict future heart disease by analyzing patient data and
classifying whether or not each patient has heart disease using machine learning algorithms.
Machine learning techniques can be a boon in this regard. Even though heart disease can
occur in different forms, there is a common set of core risk factors that influence whether
someone will ultimately be at risk of heart disease. By collecting data from various
sources, classifying it under suitable headings, and finally analysing it to extract the
desired information, this technique can be readily adapted to the prediction of heart
disease.

The main motivation of this research is to present a model for predicting the occurrence of
heart disease. Further, this research work aims to identify the best classification
algorithm for identifying the possibility of heart disease in a patient. This work is
justified by a comparative study and analysis in which three classification algorithms,
namely Naïve Bayes, Decision Tree, and Random Forest, are used at different levels of
evaluation. Although these are commonly used machine learning algorithms, heart disease
prediction is a vital task that demands the highest possible accuracy. Hence, the three
algorithms are evaluated at numerous levels and with several types of evaluation strategy.
This will help researchers and medical practitioners to establish a better understanding of
which algorithm is best suited to predicting heart disease.

Literature Review

The purpose of a literature survey is to gain a comprehensive understanding of the current
state of knowledge, identify gaps or inconsistencies in existing research, and provide a
foundation for the research project or study. The literature we reviewed is as follows:

[1] Purushottam et al. proposed an "Efficient Heart Disease Prediction System" using
hill-climbing and decision tree algorithms. They used the Cleveland dataset and preprocessed
the data before applying the classification algorithms. Missing values in the dataset were
filled using KEEL (Knowledge Extraction based on Evolutionary Learning), an open-source
data mining tool. The decision tree is built in top-down order: at each level, the
hill-climbing algorithm selects a node by applying a test. The parameter used is
confidence, with a minimum confidence value of 0.25. The accuracy of the system is about
86.7%.

[2] Santhana Krishnan J. et al. proposed "Prediction of Heart Disease Using Machine
Learning Algorithms", using decision tree and Naive Bayes algorithms to predict heart
disease. In the decision tree algorithm, the tree is built from conditions that each give a
True or False decision. Algorithms such as SVM and KNN, as described in that paper, produce
results based on vertical or horizontal split conditions that depend on the dependent
variables. A decision tree, by contrast, forms a tree-like structure with a root node,
branches, and leaves, based on the decision made at each node. The decision tree also helps
in understanding the importance of the attributes in the dataset. The authors also used the
Cleveland dataset, split into 70% training and 30% testing data. This algorithm gives 91%
accuracy. The second algorithm is Naive Bayes, which is used for classification. The
authors describe it as able to handle complicated, nonlinear, dependent data, and therefore
found it suitable for the heart disease dataset, which has the same characteristics. This
algorithm gives 87% accuracy.

[3] Sonam Nikhar et al. proposed "Prediction of Heart Disease Using Machine Learning
Algorithms". Their research gives a point-by-point explanation of the Naïve Bayes and
decision tree classifiers as used specifically in the prediction of heart disease. They
compared the performance of these predictive data mining techniques on the same dataset,
and the results showed that the Decision Tree has higher accuracy than the Bayesian
classifier.


Objectives and Opportunities

The main objectives of this project are as follows:

Phase 1: In this phase we will collect heart data, including data for healthy hearts,
and define its structure so that it is available for the design. In this phase we will
also apply the required algorithms.

Phase 2: In this phase we will publish the complete project on a website so that it is
available to users.

Methodology
EXISTING SYSTEM:

Heart disease is even described as a silent killer, because it can lead to death without
obvious symptoms. The nature of the disease is the cause of growing anxiety about the
disease and its consequences. Hence, continued efforts are being made to predict the
possibility of this deadly disease in advance, and various tools and techniques are
regularly being tried out to suit present-day health needs. Machine learning techniques
can be a boon in this regard. Even though heart disease can occur in different forms, there
is a common set of core risk factors that influence whether someone will ultimately be at
risk of heart disease. By collecting data from various sources, classifying it under
suitable headings, and finally analysing it to extract the desired information, we can
draw conclusions. This technique can be readily adapted to the prediction of heart disease.
As the well-known saying goes, "Prevention is better than cure": early prediction and
control of the disease can help to prevent it and to reduce the death rate due to heart
disease.

PROPOSED SYSTEM

The system starts with the collection of data and the selection of the important
attributes. The selected data is then preprocessed into the required format and divided
into two parts: training data and testing data. The algorithms are applied and the model is
trained using the training data. The accuracy of the system is obtained by testing it on
the testing data. The system is implemented using the following modules.

1.) Collection of Dataset

2.) Selection of attributes

3.) Data Pre-Processing

4.) Balancing of Data

5.) Disease Prediction
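
The division into training and testing data described above can be sketched as follows.
This is only a minimal illustration in plain Python: the function name train_test_split,
the 30% test fraction, the fixed seed, and the toy records (age and resting blood pressure)
are all assumptions made for the example, not the project's actual dataset or code.

```python
import random

def train_test_split(rows, labels, test_fraction=0.3, seed=42):
    """Shuffle the records and split them into training and testing parts."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical toy records: (age, resting blood pressure); label 1 = heart disease
rows = [(63, 145), (37, 130), (41, 130), (56, 120), (57, 120),
        (57, 140), (56, 140), (44, 120), (52, 172), (57, 150)]
labels = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(rows, labels)
print(len(X_train), "training records,", len(X_test), "testing records")
```

The model is fitted only on X_train and y_train; X_test and y_test are held back so that
the reported accuracy reflects records the model has not seen.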

Below is the correlation matrix.

Fig(1): Correlation matrix of the heart data
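
A correlation matrix of this kind can be computed as in the sketch below. It assumes the
Pearson correlation coefficient is meant, and the column names (age, chol, target) and
their values are made up for illustration; the helper names pearson and
correlation_matrix are likewise hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, reported as 0 here
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical attribute columns, not the project's real data
columns = {
    "age": [63, 37, 41, 56, 57],
    "chol": [233, 250, 204, 236, 354],
    "target": [1, 0, 0, 0, 1],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
```

Attributes whose correlation with the target column is close to zero are candidates for
removal during attribute selection.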

The algorithms we can use are as follows:

1) SUPPORT VECTOR MACHINE (SVM): Support Vector Machine, or SVM, is one of the most
popular supervised learning algorithms and is used for both classification and regression
problems, although in machine learning it is used primarily for classification. The goal of
the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes, so that new data points can easily be put in the correct
category in the future.

This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors
that help in creating the hyperplane. These extreme cases are called support vectors, and
hence the algorithm is termed Support Vector Machine.
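
To make the idea of a separating hyperplane concrete, the sketch below fits a linear SVM
by sub-gradient descent on the regularised hinge loss. It is an illustrative
simplification, not the implementation used in this project: the function names, the
learning rate, the regularisation strength, and the two standardised toy features are all
assumptions. Labels are coded as +1 (disease) and -1 (no disease).

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the boundary
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: only apply the regularisation shrink
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical standardised features (z-scores) for six patients
X = [(-2, -2), (-2, -1), (-1, -2), (2, 2), (2, 1), (1, 2)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("prediction for (3, 3):", predict(w, b, (3, 3)))
```

The points that end up closest to the boundary, here (-2, -1), (-1, -2), (2, 1) and
(1, 2), play the role of the support vectors described above.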

2) NAIVE BAYES ALGORITHM: The Naive Bayes algorithm is a supervised learning algorithm
that is based on Bayes' theorem and used for solving classification problems. It is mainly
used in text classification, which involves a high-dimensional training dataset.

The Naive Bayes classifier is one of the simplest and most effective classification
algorithms, and helps in building fast machine learning models that can make quick
predictions. It is a probabilistic classifier, which means it predicts on the basis of the
probability that an object belongs to a class. Some popular applications of the Naïve Bayes
algorithm are spam filtering, sentiment analysis, and classifying articles.
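
The probabilistic prediction can be illustrated with a small Gaussian Naive Bayes sketch.
The Gaussian form is our assumption for numeric attributes (the text above does not fix a
variant), and the function names and the toy records (age and maximum heart rate) are
hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant avoids division by zero for constant features
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior score for record x."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density, features treated as independent
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (age, maximum heart rate); label 1 = heart disease
X = [(40, 170), (45, 165), (50, 160), (60, 120), (65, 130), (70, 125)]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (42, 168)), predict_gaussian_nb(model, (66, 125)))
```

The "naive" part is the independence assumption: the score simply adds one log-density
term per feature, which is what keeps training and prediction fast.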

3) DECISION TREE ALGORITHM: Decision Tree is a supervised learning technique that can be
used for both classification and regression problems, but it is mostly preferred for
solving classification problems. It is a tree-structured classifier in which internal nodes
represent the features of a dataset, branches represent the decision rules, and each leaf
node represents an outcome.

In a Decision Tree there are two types of node: the Decision Node and the Leaf Node.
Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes
are the outputs of those decisions and do not contain any further branches. The decisions
or tests are performed on the basis of the features of the given dataset. The tree is a
graphical representation of all the possible solutions to a problem or decision under the
given conditions.

It is called a Decision Tree because, like a tree, it starts with a root node, which
expands into further branches and builds a tree-like structure. To build the tree, we use
the CART algorithm, which stands for Classification and Regression Tree algorithm.
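
The core step of CART for classification is choosing, at each decision node, the feature
and threshold that give the purest child nodes, measured with the Gini impurity. The
sketch below shows only that single step (not a full tree builder); the function names
and the toy records (age and sex) are assumptions made for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (feature index, threshold, weighted Gini) of the best binary split."""
    best = (None, None, float("inf"))
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send records to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical toy records: (age, sex); label 1 = heart disease
X = [(45, 0), (50, 1), (61, 0), (66, 1), (70, 0), (72, 1)]
y = [0, 0, 1, 1, 1, 1]

feature, threshold, score = best_split(X, y)
print("split on feature", feature, "at threshold", threshold, "with Gini", score)
```

A full tree is obtained by applying the same search recursively to the records in each
branch until the leaves are pure or a stopping rule is reached.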

4) RANDOM FOREST ALGORITHM: Random Forest is a supervised learning algorithm. It extends
the decision tree classifier by using bagging to improve performance. It combines tree
predictors such that each tree depends on a random vector that is sampled independently and
with the same distribution for all trees. Instead of splitting each node on the best of all
variables, a Random Forest splits it on the best of a subset of predictors chosen at random
at that node. The worst-case time complexity of learning with Random Forests is
O(M · d · n log n), where M is the number of trees grown, n is the number of instances, and
d is the data dimension.
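
The two sources of randomness, bagging and random predictor subsets, can be sketched as
follows. To keep the example short, each tree is reduced to a one-level tree (a stump), so
the predictor subset is drawn once per tree, which coincides with drawing it per node
only because each stump has a single node. All function names, parameter values, and toy
records are assumptions for illustration.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """One-level tree: best Gini split over the given candidate features."""
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (score, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        return (None, None, majority, majority)
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if x[f] <= t else right_label

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) for each tree
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Random subset of predictors considered for this tree's split
        features = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, x):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy records: (age, maximum heart rate); label 1 = heart disease
X = [(40, 170), (45, 165), (50, 160), (60, 120), (65, 130), (70, 125)]
y = [0, 0, 0, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, (35, 175)), predict_forest(forest, (75, 110)))
```

Because every tree sees a different bootstrap sample and a different predictor subset,
individual trees make different errors, and the majority vote averages them out.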

Designing of the system
Dataset collection means collecting data that contains patient details. The attribute
selection process selects the attributes that are useful for the prediction of heart
disease. After the available data resources have been identified, they are selected,
cleaned, and converted into the desired form. The classification techniques described above
are then applied to the preprocessed data to predict heart disease. Finally, the accuracy
measure is used to compare the different classifiers.
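
The accuracy comparison can be sketched as below. Accuracy is taken here as the fraction
of test records predicted correctly. The labels and predictions are made-up placeholders
used only to show the calculation, and the classifier names are deliberately generic: they
are not results of this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Placeholder test labels and predictions (illustrative only, not real results)
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "classifier_a": [1, 0, 1, 0, 0, 0, 1, 1],
    "classifier_b": [1, 0, 0, 1, 0, 1, 1, 0],
    "classifier_c": [1, 0, 1, 1, 0, 0, 1, 1],
}

scores = {name: accuracy(y_test, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("highest accuracy:", best)
```

The same test set must be used for every classifier; otherwise the accuracy values are
not comparable.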

Fig(2) Design of system

Hardware Requirements:

Processor : Any recent processor

RAM : Min 4 GB

Hard Disk : Min 100 GB

Software Requirements:

IDE : Google Colab

Language : Python

Algorithms will be used

