
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3437181

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2022.Doi Number

Predicting Heart Diseases Using Machine Learning and Different Data Classification Techniques
Hosam F. El-Sofany 1,2
1 College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia
2 Cairo Higher Institute for Engineering, Computer Science and Management, Cairo, Egypt
Corresponding author: Hosam F. El-Sofany (e-mail: [email protected]).

ABSTRACT Heart disease (HD), including heart attacks, is a primary cause of death across the world. In
the area of medical data analysis, one of the most difficult problems to solve is determining the probability
of a patient having heart disease. Death rates can be lowered by the early detection of heart diseases and the
constant monitoring of patients by physicians. Unfortunately, heart disease cannot always be detected
accurately, and a doctor cannot be in touch with a patient 24/7. Machine learning (ML) has the potential to
aid in diagnostics by providing a more precise basis for prediction and making decisions using data given by
healthcare sectors throughout the world. This study aims to employ several feature selection methods to
develop an accurate ML technique for heart disease prediction in its earliest stages. The feature selection
process was performed using three distinct methods, namely, chi-square, analysis of variance (ANOVA), and
mutual information (MI). The three feature groups that were ultimately selected were referred to as SF-1, SF-
2, and SF-3, respectively. Then, ten different ML classifiers were used to determine the best technique, and
which feature subset was the greatest fit. These classifiers included Naive Bayes, support vector machine
(SVM), voting, XGBoost, AdaBoost, bagging, decision tree (DT), K-nearest neighbor (KNN), random forest
(RF), and logistic regression (LR), and they were denoted as (A1, A2, …, A10). The proposed approach for
predicting heart diseases was evaluated using a private dataset, a publicly available dataset, and multiple
cross-validation methods. To find the classifier that generates the best rate of accurate heart disease
predictions, we applied the Synthetic Minority Oversampling Technique (SMOTE) to fix the issue of
unbalanced data. The experimental findings demonstrated that the XGBoost classifier achieved the optimal
performance using the combined datasets and SF-2 feature subset with the following rates: 97.57% for
accuracy, 96.61% for sensitivity, 90.48% for specificity, 95.00% for precision, 92.68% for F1 score, and 98%
for AUC. An explainable artificial intelligence approach based on SHAP methodologies was also developed to provide insight into how the system arrives at its final predictions. The proposed technique holds great promise for the healthcare sector, enabling early-stage heart disease prediction at low cost and in minimal time. Finally, the best-performing ML method was used to build a mobile app that lets users enter HD symptoms and quickly receive a heart disease prediction.

INDEX TERMS Cardiovascular disease, heart disease, machine learning app, ML algorithms, SDG 3,
SHAP, SMOTE

I. INTRODUCTION

The heart is a muscular organ that represents the central pumping organ of the circulatory system. It is responsible for pumping blood throughout the body and is one of the components of the cardiovascular system. The term "cardiovascular system" can also refer to the system of arteries, veins, and capillaries that carry blood throughout the body. Several distinct forms of heart illness, which collectively fall under the umbrella term cardiovascular disease (CVD), are brought on by disruptions in the regular outflow of blood from the circulatory system. On a global scale, heart diseases are continuously rated as the primary reason for people's death [1]. Heart disease and stroke account for 17.5 million annual deaths worldwide, according to the World Health Organization's report. More than 75% of deaths caused by heart diseases take place mostly in nations with middle and low income.


In addition, heart attacks and strokes are responsible for 80 percent of all fatalities caused by CVDs [2].
As stated in Sustainable Development Goal (SDG) 3 of the United Nations, each person should be healthy and happy; accordingly, this research investigates cardiovascular disease. Heart disease is often diagnosed by observing the patient's symptoms and conducting a physical examination. Some of the risk factors for cardiovascular disease include smoking, age, a family history of heart disease, high cholesterol levels, lack of physical activity, high blood pressure, obesity, diabetes, and stress [3]. Lifestyle modifications, including stopping smoking, losing weight, exercising, and managing stress, might reduce some of these risk factors. Medical history, physical examination, and imaging tests, including electrocardiograms, echocardiograms, cardiac MRIs, and blood tests, are used to diagnose heart disease. Lifestyle adjustments, drugs, medical treatments such as angioplasty or coronary artery bypass surgery, or implanted devices such as pacemakers or defibrillators can treat heart disease [4].
It is now possible to construct prediction models for heart disease with the assistance of the vast amounts of patient data that are easily accessible as a result of the growing number of recent healthcare systems (also known as Big Data in Electronic Health Record Systems). Machine learning is considered a data-sorting approach that analyzes large datasets from various viewpoints and then transforms the results into tangible knowledge [5].
The objective of the study is to provide an ML approach for heart disease prediction. ML algorithms were evaluated on large, open-access heart disease prediction datasets. Finally, the most accurate and dependable algorithm was chosen as the final model for an Android mobile app. This study aims to construct an innovative machine learning technique that is capable of properly classifying several heart disease datasets and then to evaluate its performance in comparison to that of other first-rate models. The study provided the following important contributions:
1. One of the key contributions of this research is the use of a private HD dataset. Egyptian specialized hospitals voluntarily provided 200 data samples between 2022 and 2024. We were able to gather around 13 features from these participants.
2. This work addresses the immediate requirement for early HD prediction in Egypt and Saudi Arabia, where the HD rate is rapidly increasing. Through the application of ML classification algorithms to a combined dataset consisting of both the CHDD and the private dataset, the authors developed a mobile-based app for the instantaneous prediction of heart disease.
3. This work makes an important contribution by combining XGBoost and a semi-supervised model. This method predicts HD accurately using a combined dataset and is new compared to earlier studies. The research's stated goal was to predict HD using the combined datasets and the SF-2 feature subset. The following rates were achieved: 97.57% for accuracy, 96.61% for sensitivity, 90.48% for specificity, 95.00% for precision, 92.68% for F1 score, and 98% for AUC.
4. To understand how the system predicts its outcomes, an explainable artificial intelligence approach utilizing SHAP methodologies has been developed.
5. The use of SMOTE to increase the overall number of balanced cases in the dataset is of additional importance to this study. The proposed technique is trained on a dataset balanced using SMOTE to increase the performance of heart disease prediction.
6. The ML techniques applied in this article were additionally optimized with hyperparameters. We tuned the hyperparameters for all the ML classifiers. The proposed method achieved a 97.57% accuracy rate with optimized hyperparameters when the combined datasets and the SF-2 feature subset were used.
7. Additionally, to identify the classifier that achieves the most accurate HD prediction rate, the study assessed 10 distinct ML classification algorithms. The XGBoost technique was identified as a highly accurate classifier for predicting HD after assessing the performance of the ten algorithms. The proposed app's capacity for adaptability is shown by applying a domain adaptation method. This demonstrates the ability of the proposed approach to be implemented in various environments and communities, in addition to the initial datasets used in this article.
Overall, this work introduces novel ideas and techniques that significantly advance the field of ML-based HD prediction systems. The healthcare sectors associated with heart disease incidence in Egypt and Saudi Arabia may both benefit from the research's findings.

II. RELATED WORK AND COMPARATIVE STUDY

A major cause of death globally is heart disease. Accurate prediction of its likelihood can help in preventing it. ML algorithms have been proven to predict heart diseases effectively based on various medical data parameters. This section presents a review of current and previous research that has utilized ML algorithms to predict heart diseases. Several studies have utilized ML algorithms like SVM, artificial neural network (ANN), DT, LR, and RF to analyze medical data and predict heart diseases.
A recent study by [6] used ML models to predict the risk of cardiac disease in a multi-ethnic population. The authors utilized a large dataset of electronic health record data and linked it with socio-demographic information to stratify CVD risks. The models achieved high accuracy in predicting CVD risk in the multi-ethnic population. Similarly, another study by [7] applied a deep learning (DL) algorithm to predict coronary artery disease (CAD). The researchers utilized clinical data and coronary computed tomography angiography (CCTA) images to train the DL model. The presented model achieved high accuracy in predicting the presence of CAD. A study by [8] utilized different ML models for predicting CVD depending on clinical data. The models used by the researchers included DTs, K-nearest neighbor (KNN), and RFs.


The authors reported high accuracy in predicting CVD using these models. Likewise, a study by [9] used ML techniques to determine what factors contribute to heart disease risk. The authors utilized the National Health and Nutrition Examination Survey (NHANES) data to determine risk factors related to coronary heart disease. The authors reported that the proposed ML algorithm was effective in identifying risk factors. Another research study by [10] investigated different ML algorithms' accomplishments in predicting heart diseases. The authors used several models, including ANN, DT, and LR. The authors reported that the models achieved high accuracy in predicting heart diseases.
ML algorithms have become widely accepted in predicting heart diseases and have shown high accuracies in various studies. Considering medical data parameters like clinical data, socio-demographic information, and medical images, ML algorithms have been utilized to predict different heart diseases such as CAD and CVD. The studies we have reviewed have showcased that models like DTs, DL, ANN, RF, and KNN can effectively predict heart diseases. With the increasing advancements in ML algorithms, it is expected that more appropriate models and features will be developed for accurate heart disease prediction.
Previous studies on HD prediction have shown that ML approaches may effectively recognize features linked to the disease and build trustworthy prediction models. However, more work is needed to close the gaps in the body of current knowledge. Here are some gaps and how the proposed approach fills them.
• Most HD prediction research has used a single ML algorithm, such as DT, LR, RF, or SVM. Each of these algorithms has shown promise, but there is no comprehensive comparison or assessment of ML approaches. This restricts generalizability and makes it difficult to find the best HD predictor. The proposed study addresses this gap. It compares and evaluates 10 ML classifiers including Naive Bayes, SVM, voting, XGBoost, AdaBoost, bagging, DT, KNN, RF, and LR. Using performance measures like accuracy, sensitivity, precision, specificity, F1-score, and AUC, the article evaluates which algorithm is the best in terms of HD prediction.
• Accurate prediction is challenging for the minority class (HD-positive patients) due to imbalanced classes in HD prediction datasets. While some research has tried to solve this problem by employing oversampling or undersampling, an extensive evaluation of the methods and how they affect prediction accuracy is necessary. The imbalanced classes issue is also addressed in the proposed article, which eliminates this gap. To ensure that the dataset is balanced, SMOTE is used. The effectiveness of SMOTE in enhancing the accuracy of HD predictions and its effects on the efficiency of different ML algorithms are examined in this work.
• There is a demand in the literature for practical apps that can self-diagnose and detect HD. Mobile applications and other solutions have been recommended, but their efficacy, usability, and applicability to varied datasets and demographics need additional study. The proposed paper develops a smartphone app that allows users to enter HD-related symptoms for rapid predictions to fill this gap. Usability, accessibility, and adaptability to varied datasets and demographics are the app's goals. Domain adaptation is utilized to evaluate the proposed system's flexibility and ensure its real-world effectiveness. The research article aims to improve HD research and early diagnosis and prevention in high-prevalence countries like Egypt and Saudi Arabia by addressing these gaps.

A. COMPARATIVE STUDY OF HD PREDICTION APPLYING ML CLASSIFIERS

ML is a powerful tool for predicting HD. It has the potential to enhance patient outcomes through its ability to facilitate early detection and personalized treatment. This section introduces a comparative analysis of heart disease prediction using ten ML classifiers, including Naive Bayes, SVM, voting, XGBoost, AdaBoost, bagging, KNN, DT, RF, and LR (see Table 1).
TABLE I
COMPARATIVE STUDY OF USING ML CLASSIFIERS TO PREDICT HEART DISEASES.

| Year | Authors | Datasets used | Algorithms used (ML classifiers) | No. of classifiers | Accuracy obtained |
|------|---------|---------------|----------------------------------|--------------------|-------------------|
| 2021 | Liu et al. [24] | UCI heart disease | LR, RF, KNN, SVM, Naive Bayes | 5 | 93% |
| 2020 | Hussein et al. [25] | Cleveland heart disease | LR, KNN, DT | 4 | 84% |
| 2020 | Akbar et al. [26] | Cleveland heart disease | RF, SVM, Naive Bayes | 3 | 87% |
| 2019 | Zarshenas et al. [27] | Cleveland heart disease | XGBoost, DT, SVM, Naive Bayes | 4 | 91% |
| 2019 | Kaur and Singh [28] | UCI heart disease | AdaBoost, DT, KNN, RF, LR | 5 | 97% |
| 2018 | Li et al. [29] | Cleveland heart disease | Voting, Bagging, RF, SVM, Naive Bayes | 5 | 90% |
| 2018 | Zhang et al. [30] | Cleveland heart disease | AdaBoost, DT, RF, KNN, LR, SVM, Naive Bayes | 7 | 92% |
| 2017 | Wu et al. [31] | Cleveland heart disease | RF, SVM, Naive Bayes | 3 | 87% |
| 2016 | Ahmed et al. [32] | Cleveland heart disease | LR, KNN, DT | 3 | 77% |
| 2007 | Chen et al. [33] | Cleveland heart disease | LR, KNN, DT | 3 | 85% |
| 2024 | Proposed technique | Cleveland heart disease and private datasets | Naive Bayes, SVM, Voting, XGBoost, AdaBoost, Bagging, DT, KNN, RF, LR | 10 | 97.57% |

The results indicated that ML classifiers could improve heart disease prediction accuracy, with the highest achieved being 97% by [28] using AdaBoost, DT, RF, KNN, and LR on the UCI dataset. Several studies utilized the Cleveland heart disease dataset (CHDD), with accuracies ranging from 77% [32] to 92% [30] using various ML algorithms such as AdaBoost, DT, RF, KNN, LR, SVM, and Naive Bayes. Hence, ML classifiers could improve the certainty of heart disease forecasting, enabling early detection and personalized treatment. Nonetheless, more investigation is essential to validate these classifiers' accuracy using larger datasets, to increase the generalizability and reproducibility of the results.

III. ML CLASSIFICATION TECHNIQUES FOR PREDICTION

The classification techniques of ML have been widely used for predicting CVD on various datasets. This section aims to discuss the current and previous research on ML classification techniques for prediction and apply ten ML classifiers to extract essential features that enhance CVD prediction.
• Logistic Regression: It is a famous technique used by ML for the classification of CVD prediction. In the research conducted by [11], LR was used on a dataset of 735 patients, which achieved higher accuracy for CVD prediction with 87.63%. A similar study conducted by [12] used LR for CVD prediction on a dataset of 3980 patients, and it achieved 70.44% accuracy. The study of [13] employed LR for predicting the risk of CAD in females and obtained a sensitivity of 70%.
• Random Forest: RF is another popular technique for classification using ML. In the research by [14], RF achieved an accuracy of 76.90% for predicting CVD in a dataset of 847 patients. Using leave-one-out cross-validation, research by [15] demonstrated an algorithm that could detect the early or unusual phases of cardiovascular autonomic neuropathy (CAN) with an AUC score of 0.931.
• K-Nearest Neighbor: KNN is another algorithm that predicts CVD. In the research conducted by [16], KNN achieved an accuracy of 80.40% on a dataset of 303 patients. A similar study by [17] used the KNN technique for predicting the risk of CVD, and it achieved an accuracy of 85.76%.
• Decision Tree: A DT is a classifier used for predicting the risk of CVD. In the research by [18], the DT achieved an accuracy of 79.3% on a dataset of 4231 patients. In another study by [19], the DT algorithm was used for predicting cardiac event risk possibility with an accuracy of 85.75% on a dataset of 303 patients.
• Bagging: It is a technique of ensemble learning that couples many models to improve classification accuracy. A study done by [20] used the bagging algorithm to predict the risk of CVD. It obtained an accuracy of 89.9% on a dataset of 303 patients.
• Adaptive Boosting (AdaBoost): It is an algorithm of ensemble learning that couples many weak classifiers to produce one strong classifier. A study by [16] used AdaBoost for predicting CVD, and it achieved an accuracy of 73.60% with a dataset of 303 patients.
• eXtreme Gradient Boosting (XGBoost): It is another technique of ensemble learning that couples many models to improve accuracy. A study done by [23] used the XGBoost algorithm for predicting the risk of heart diseases. It achieved an accuracy of 87.50% on a dataset of 303 patients.
• Voting: It is a technique of ensemble learning that couples many models to produce the final decision for classification. A study by [21] used voting for predicting CVD, and it achieved an accuracy of 92.20% with a dataset of 303 patients.
• Support Vector Machine: SVM is a strong technique used for classification and regression. A study by [22] used the SVM algorithm for predicting the risk of CAD and obtained an accuracy of 85.7% on a dataset of 445 patients.
• Naive Bayes: It is a probabilistic algorithm used for classification. A study by [13] employed Naive Bayes for predicting the risk of CAD in females and obtained an accuracy of 50%.

ML classification techniques have been widely used for predicting CVD. The ten classifiers discussed in this section have shown promising results in detecting the risk of CVD.


LR, RF, and KNN algorithms have shown high accuracy in classifying the risk of CVD. Ensemble learning techniques, such as bagging, AdaBoost, and voting, have improved the classification accuracy compared to single classifiers. The accuracy of CVD risk prediction can be enhanced by employing several ML classifiers. Further research can be conducted in this area to enhance the forecast and diagnosis of CVD.

IV. THE PROPOSED HEART DISEASE PREDICTION APP

In this section, we explain the approach used and the ML algorithms applied in implementing the proposed ML app for the prediction of cardiac illnesses. Figure 1 shows the proposed system's sequences for predicting heart diseases. To begin with, the dataset was gathered and preprocessed so that any inconsistencies could be removed from it (e.g., null occurrences were replaced with average values). The dataset was divided into two distinct groups, referred to as the test dataset and the training dataset, respectively. Following that, several distinct classification algorithms were put into action to identify the one that provided the highest level of accuracy concerning these datasets.

FIGURE 1. The proposed approach sequences for heart disease prediction.

A. THE PROPOSED METHODOLOGY
Naive Bayes, SVM, voting, XGBoost, AdaBoost, bagging, DT, KNN, RF, and LR classifiers are the ML techniques that are investigated in this study. These algorithms can aid doctors and data analysts in making correct diagnoses of cardiac disease. Recent data on cardiovascular illness, as well as journals, recent research, and published publications, are all part of this article. A framework for the suggested model is provided by the methodology, as in [1]. The methodology is a set of steps that transforms raw data into consumable and identifiable data patterns. The proposed approach consists of three stages: the first stage is data collection; the second stage extracts specific feature values; and the third stage is data exploration, as shown in Figure 1. Depending on the procedures employed, data preprocessing deals with the missing values, cleansing of the data, and normalization [2]. The data that underwent preprocessing were then classified using the ten classifiers (A1, A2, …, A10). Finally, after putting the suggested model into practice, we evaluated its performance and accuracy using a range of performance measures. Using a variety of classifiers, a Reliable Prediction System for Heart Disease (RPSHD) was developed in this model. This model uses 13 medical factors for prediction, among which are age, sex, cholesterol, blood pressure, and electrocardiograph [3].

B. DATASETS AND DATASET FEATURES
This research employs both the CHDD and a private dataset for heart disease prediction. The CHDD dataset has 303 samples, while the private dataset has 200, and they have the same features. The combined dataset contains 503 records, and 13 features are associated with each one (including demographic, clinical, and laboratory parameters). The datasets have many features that can be used for heart disease prediction, including age, gender, blood pressure, cholesterol levels, electrocardiogram (ECG) readings, chest pain, exercise-induced angina, fasting blood sugar, maximum heart rate achieved, oldpeak, coronary artery status, thalassemia, and other clinical and laboratory measurements, as shown in Table 2. The outcome variable known as "Target" takes a binary value and refers to the heart disease predicting feature (i.e., it indicates whether or not cardiac disease is present).
TABLE II
THE USED FEATURES FROM THE CHDD.

| Feature no. | Feature name | Feature code | Description | Values |
|-------------|--------------|--------------|-------------|--------|
| 1 | Age | AGE | Age of patient | Number of years |
| 2 | Gender | GEN | Patient sex | Female = 0, male = 1 |
| 3 | Chol | CHOL | Evaluation of a patient's cholesterol levels | mg/dl |
| 4 | Trestbps | BRP | Blood resting pressure | mm Hg |
| 5 | CP | CPT | Chest pain types | Typical angina = 1, atypical angina = 2, non-anginal pain = 3, asymptomatic = 4 |
| 6 | Fbs | FBS | Blood sugar in fasting case | < or > 120 mg/dl (true = 1, false = 0) |
| 7 | Thalach | MHR | Maximum heart rate achieved | Continuous |
| 8 | RestEcg | REC | Resting electrocardiograph | 0 = no abnormalities, 1 = normal, 2 = left ventricular hypertrophy (possible or certain) |
| 9 | Oldpeak | OP | ST depression compared to rest | Continuous quantity |
| 10 | Exang | EIA | Angina caused by exercise | 1 = there is pain, 0 = there is no pain |
| 11 | Ca | CMV | Count of main vessels colored by fluoroscopy | 0-3 |
| 12 | Slope | PES | Peak exercise ST segment slope | Up sloping = 0, flat = 1, down = 2 |
| 13 | Thal | TS | Thallium stress | Negative = 0, positive = 1, inconclusive = 2 |
| 14 | Target | - | Variable representing the diagnosis of heart disease using the angiographic disease status | 0 = no heart disease (< 50% diameter narrowing), 1 = heart disease (> 50% diameter narrowing) |
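As an illustration of how a combined dataset with the features in Table 2 could be loaded and checked in Python, the following minimal sketch assumes a hypothetical CSV file (heart_combined.csv) whose column names follow the standard Cleveland-style codes; the file name and column names are assumptions, since the paper's private data are not publicly released.

```python
import pandas as pd

# Hypothetical file: the paper's combined CHDD + private dataset is not publicly available.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
TARGET = "target"  # 1 = heart disease, 0 = no heart disease (see Table 2)

df = pd.read_csv("heart_combined.csv")

# Sanity checks mirroring the paper's description: 503 records, 13 features plus the target.
assert set(FEATURES + [TARGET]).issubset(df.columns)
print(df.shape)                                      # expected: (503, 14)
print(df[TARGET].value_counts(normalize=True))       # roughly 54.1% negative / 45.9% positive

X, y = df[FEATURES], df[TARGET]
```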

Figure 2 shows the percentage distribution of individuals with heart disease in the combined datasets. A total of 503 samples have been gathered, and 45.9% of those have been diagnosed with HD, while the remaining 54.1% of individuals have not been infected with the disease.
Boxplots are an effective visualization technique for understanding the distribution of data and identifying potential outliers. By applying boxplots to a dataset related to HD, one can get insights into the distribution of a variety of HD-related features or variables. The HD dataset's boxplots are illustrated in Figure 3. Boxplots are used to illustrate the distribution of scores for HD detection in this figure. Every graph we obtained had an anomaly. Removing them will cause the median of the data to drop, which might make it harder to detect HD accurately. On the other hand, this method offers more benefits than the others; by identifying heart disease infection at an early stage, when medical care is most beneficial, this diagnostic could preserve lives.

FIGURE 2. The percentage distribution of heart disease in the combined dataset.

FIGURE 3. Boxplots of the combined heart disease dataset.


C. DATASETS PREPARATION
In this research, preprocessing was performed on the collected data. The CHDD has four inaccurate CMV records and two erroneous TS entries. Incorrect data were updated to reflect the best possible values for all fields. Then, StandardScaler is employed to normalize all the features to the relevant coefficient, ensuring each feature has a zero mean and unit variance. By considering the patient's history of cardiac problems and other medical concerns, an organized and composed augmented dataset was chosen. The dataset studied in this research is a combination of the accessible public CHDD and the chosen private dataset. Partitioning the two datasets in this way allows us to use the holdout validation method. In this study, 25% of the data is in the test dataset, compared to 75% in the training dataset.
The mutual information method is used in this research to measure the interdependence of variables. Larger numbers indicate greater dependency and information gathering. The importance of features provides valuable insights into the relevance and predictive power of each feature in a dataset. Using this mutual information technique, the thalach feature is given the highest value of 13.65%, while the fbs feature is given the lowest importance of 1.91%, as illustrated in Figure 4.

FIGURE 4. The importance of the heart disease dataset features.

D. FEATURE SELECTION
In this research, we perform feature selection and classification using the Scikit-learn module of Python [20]. Initially, the processed dataset was analyzed using several different ML classifiers, including RF, LR, KNN, bagging, DT, AdaBoost, XGBoost, SVM, voting, and Naive Bayes, which were evaluated for their overall accuracy. In the second step, we used the Seaborn library from Python to create heat maps of correlation matrices and other visualizations of correlations between different sets of data. Thirdly, a wide variety of feature selection methods (FSM) such as analysis of variance (ANOVA), chi-square, and mutual information (MI) were applied. These strategies are explained in Table 3 and are indicated by the acronyms FSM1, FSM2, and FSM3, respectively. Finally, the performance of several algorithms was compared for the identified features. The validity of the analysis was demonstrated using accuracy, specificity, precision, sensitivity, and F1 score. The StandardScaler method was used to standardize every feature before it passed into the algorithms.

TABLE III
THE METHODS USED TO SELECT FEATURES.

| Univariate selection algorithm | Code and description | Formula used |
|--------------------------------|----------------------|--------------|
| ANOVA F value | FSM1: The ANOVA test is a method of enhancing classification accuracy through the reduction of high-dimensional data, the identification of relevant features using feature space, and the measurement of similarity between features. | F = [Σ_{j=1..i} N_j (x̄_j − x̄)² / (i − 1)] / [Σ_{j=1..i} (N_j − 1) s_j² / (N − i)] |
| Chi-square | FSM2: To determine which of several nonnegative features is most valuable, a chi-squared score must be computed. It represents the difference between the observed and expected values. | χ² = Σ_j (o_j − e_j)² / e_j |
| Mutual information | FSM3: Mutual information is a measurement of the relationship between features. | I(X; Y) = H(Y) − H(Y|X) |

E. THE OUTCOME OF DIFFERENT FEATURE SELECTION METHODS
The F value for each pair of features is determined by using the ANOVA F value technique and the feature weights. Table 4(a) presents the findings of the ANOVA F test. The EIA, CPT, and OP features provide the most importance to the score, while the REC, CHOL, and FBS features contribute the least. Chi-square is another approach that determines the degree to which every feature relates to the target. Table 4(b) shows the chi-square outcomes. In this method, the first three features that are most significant are MHR, OP, and CMV, whereas TS, REC, and FBS, respectively, are the least important ones. The MI technique is utilized in FSM3. To evaluate the degree of mutual dependency between features, this approach calculates the mutual information between them. A score of 0 indicates complete independence between the two features under consideration; a larger number indicates a greater dependence. The MI score results are shown in Table 4(c). CPT, TS, and CMV are the three most dependent features in this case, whereas FBS and REC are the features that are independent. Table 4 illustrates important factors that can be utilized for predicting the probability of having heart disease.
variance (ANOVA), chi-square, and mutual factors that can be utilized for predicting the probability of


Furthermore, REC, FBS, BRP, and CHOL all have lower total scores across all three FSMs. Based on these scores, three distinct feature groups are chosen for inclusion. SF-1, SF-2, and SF-3 were the abbreviations given to each of the three different sets of features, respectively. Table 5 shows these feature sets that were selected for additional investigation.

TABLE IV
FEATURE SCORE USING FSM1, FSM2, AND FSM3.

| Feature no. | Feature code | (a) FSM1 score | (a) Order | (b) FSM2 score | (b) Order | (c) FSM3 score | (c) Order |
|-------------|--------------|----------------|-----------|----------------|-----------|----------------|-----------|
| 1 | AGE | 17.12 | 9 | 24.29 | 7 | 1.01 | 11 |
| 2 | GEN | 26.79 | 8 | 8.58 | 10 | 1.05 | 9 |
| 3 | CHOL | 3.20 | 12 | 24.94 | 6 | 1.08 | 7 |
| 4 | BRP | 7.46 | 10 | 15.82 | 8 | 1.03 | 10 |
| 5 | CPT | 70.77 | 2 | 63.60 | 4 | 1.17 | 1 |
| 6 | FBS | 1.24 | 13 | 1.20 | 13 | 1.00 | 12 |
| 7 | MHR | 66.12 | 4 | 189.32 | 1 | 1.10 | 5 |
| 8 | REC | 6.78 | 11 | 3.98 | 12 | 1.00 | 13 |
| 9 | OP | 69.55 | 3 | 73.64 | 2 | 1.09 | 6 |
| 10 | EIA | 71.95 | 1 | 39.91 | 5 | 1.10 | 4 |
| 11 | CMV | 65.05 | 5 | 71.89 | 3 | 1.11 | 3 |
| 12 | PES | 41.90 | 6 | 10.80 | 9 | 1.08 | 8 |
| 13 | TS | 32.80 | 7 | 6.90 | 11 | 1.14 | 2 |

TABLE V
THREE DISTINCT FEATURE GROUPS (SF-1, SF-2, AND SF-3).

| Feature group | Selected features |
|---------------|-------------------|
| SF-1 | AGE, GEN, CHOL, BRP, CPT, FBS, MHR, REC, OP, EIA, CMV, PES, TS |
| SF-2 | AGE, GEN, CHOL, CPT, MHR, OP, EIA, CMV, PES, TS |
| SF-3 | AGE, GEN, CPT, MHR, OP, EIA, CMV, PES, TS |
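As a minimal sketch of how the three univariate scoring methods in Table 3 can produce rankings like those in Table 4, the snippet below uses scikit-learn's f_classif, chi2, and mutual_info_classif scorers. It assumes the X and y objects from the earlier loading snippet; the function name and the choice to sort by the chi-square score are illustrative, not taken from the paper.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

def rank_features(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Score every feature with FSM1 (ANOVA F), FSM2 (chi-square), and FSM3 (MI)."""
    # chi2 requires non-negative inputs, so rescale the features to [0, 1] for that scorer only.
    X_nonneg = MinMaxScaler().fit_transform(X)

    scores = pd.DataFrame(index=X.columns)
    scores["FSM1_anova_f"] = SelectKBest(f_classif, k="all").fit(X, y).scores_
    scores["FSM2_chi2"] = SelectKBest(chi2, k="all").fit(X_nonneg, y).scores_
    scores["FSM3_mutual_info"] = mutual_info_classif(X, y, random_state=0)
    return scores.sort_values("FSM2_chi2", ascending=False)

# Example: keeping the ten best-scoring features yields a subset analogous to SF-2.
# ranking = rank_features(X, y); sf2_like = ranking.head(10).index.tolist()
```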

F. PROPOSED APP DEPLOYMENT
The proposed technique was integrated into a mobile app framework using ML algorithms and HD symptoms to predict HD instantaneously on real data. We have implemented the proposed application using J2ME, PHP, HTML, MySQL, CSS, XML, and Android Studio.
The XGBoost classifier with SMOTE, using the combined datasets and the SF-2 feature subset, was chosen based on the research's assessment of performance criteria (see Table 6). Various integrated development environments (IDEs) have been used to deploy the model, including Spyder and Python IDEs. In addition, we implemented an Android app to demonstrate the prediction system's capabilities in real time and evaluate its functionality. Android Studio was used for developing the user interface of this application. The Java programming language was our primary language for coding. To implement the model, we added the Pickle package to Android Studio. Finally, we used Heroku to host the API for the proposed application. The process framework diagram for the proposed app to predict HD using ML is shown in Figure 5. Both the web-based app and the mobile app, which constitute the proposed app, have been deployed [26].

FIGURE 5. The ML-based HD prediction app design process.
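The paper does not include the deployment code itself; as an illustrative sketch only, the trained classifier could be serialized with pickle and exposed through a small HTTP endpoint such as the hypothetical Flask service below, which a mobile front end could then call. The route name, field names, and artifact file names are assumptions, not the authors' implementation.

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical artifact names: the fitted scaler and XGBoost model saved after training.
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
with open("xgb_sf2_model.pkl", "rb") as f:
    model = pickle.load(f)

SF2_FEATURES = ["age", "sex", "chol", "cp", "thalach",
                "oldpeak", "exang", "ca", "slope", "thal"]  # SF-2 subset (Table 5)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    row = [[float(payload[name]) for name in SF2_FEATURES]]
    proba = model.predict_proba(scaler.transform(row))[0][1]
    return jsonify({"heart_disease": bool(proba >= 0.5),
                    "probability": round(float(proba), 3)})

if __name__ == "__main__":
    app.run()
```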
V. EXPERIMENTAL RESULTS AND ANALYSIS
Predicting heart diseases from a dataset is done using Jupyter Notebook. It simplifies the visualization of different data relation graphs of the dataset and facilitates the creation of documents including live coding. In the first step of this research, the CHDD is cleaned with Python's Pandas and NumPy libraries. After that, the dataset is preprocessed with the StandardScaler method from Python's Scikit-learn module [34].


In the second step of the process, each feature's importance is calculated using a feature selection approach, and then three sets of features (SF) are generated. Thirdly, the dataset was separated into training and testing sets. A total of 75% of the data is utilized for training, while the other 25% is utilized for testing. Finally, the ten distinct ML algorithms were trained using this 75% training split. For the aim of predicting heart disease, the method with the best performance was selected [35].
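A condensed sketch of this preprocessing and splitting step is shown below; it assumes the X and y objects from the earlier loading snippet and follows the 75/25 holdout described above, with SMOTE (from the imbalanced-learn package) applied only to the training portion so that no synthetic samples leak into the test set. The random seeds are arbitrary.

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 75% training / 25% testing holdout, stratified on the target to preserve class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Standardize features to zero mean and unit variance (fit on the training data only).
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Balance the training set with SMOTE before fitting the classifiers.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train_s, y_train)
```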

A. PERFORMANCE EVALUATION
In this subsection, the authors evaluate and explain the proposed system's performance. Different algorithms and their comparative performances were presented based on evaluation metrics including accuracy, sensitivity, specificity, and F1-score. These performance measures were evaluated using true positive (TP), true negative (TN), false positive (FP), and false negative (FN) data. The next subsection focuses on these measurements. Following this evaluation, the algorithm with the greatest results is provided. Figure 6 demonstrates how the confusion matrix may be used to evaluate a classification model's performance.

FIGURE 6. Confusion matrix of the HD dataset using XGBoost and SMOTE.

Figure 6 illustrates the predicted values of TP, FP, TN, and FN for the XGBoost classifier using SMOTE. Each element in this confusion matrix represents the number of cases for both the actual classes and the predicted classes that have a particular set of labels. As an illustration, the matrix has a total of 63 cases (TP) correctly classified as "heart disease", 3 cases (FP) incorrectly classified as "heart disease", 4 cases (FN) incorrectly classified as "no heart disease", and 66 cases (TN) correctly classified as "no heart disease".
Figure 7 presents the correlation between the important features of SF-2 using SMOTE. The y-axis values include thalach, chol, sex, age, slope, exang, oldpeak, ca, cp, and thal. Correlation coefficients close to -1 or 1 indicate a strong negative or positive relationship between two variables, whereas values near 0 indicate no association. It is essential to keep in mind that the only thing that can be detected via the use of correlation is the linear link that exists between the variables. The prediction for the patient is correlated with each of those variables at a level of at least 70% correlation.

FIGURE 7. Correlation between features of SF-2 using SMOTE.

FIGURE 8. Scatter plot among four selected features in the SF-2.

Figures 8 and 9 show the scatter and density plots among four selected features in the SF-2 dataset. These scatter and density graphs are beneficial for exploring the relationships and distributions of variables in the HD dataset. They can provide insights into correlation, concentration, outliers, and patterns that may exist among the four variables (exang, cp, ca, and thal).
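The paper states that Seaborn was used for these correlation and distribution plots; a minimal sketch along those lines, reusing the df frame from the earlier loading snippet (column names assumed), is:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sf2_cols = ["age", "sex", "chol", "cp", "thalach", "oldpeak", "exang", "ca", "slope", "thal"]

# Correlation heat map for the SF-2 features plus the target (cf. Figure 7).
sns.heatmap(df[sf2_cols + ["target"]].corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation between SF-2 features")
plt.show()

# Scatter and density matrix for four selected features, colored by diagnosis (cf. Figures 8 and 9).
sns.pairplot(df[["exang", "cp", "ca", "thal", "target"]], hue="target", diag_kind="kde")
plt.show()
```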


FIGURE 9. Density plot among the first four important features in the SF-2.

The performance of a classifier has been represented and evaluated with the use of a confusion matrix, as shown in Figure 6. TP measures how many individuals are accurately classified into the sick (positive) class. The number of healthy people who are appropriately labeled as being in the negative class is known as TN. The number of times that healthy persons were incorrectly diagnosed as being sick is referred to as FP. When sick persons are mistakenly predicted to be healthy, this is known as FN. A comparison of the various performance indicators across the 10 ML algorithms is presented in Table 6. These classifiers were applied to the combined dataset using the SF-1, SF-2, and SF-3 feature subsets. Based on its accuracy of 97.57%, sensitivity of 96.61%, specificity of 90.48%, precision of 95.00%, and F1 score of 92.68% for the SF-2 feature group (see Table 6), the XGBoost classifier had the best overall performance.

• Accuracy: The proposed model's accuracy was computed to determine what percentage of samples has been accurately classified. Accuracy is computed using the formula given in (Eq. 1), which is based on the confusion matrix:
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (1)
• Sensitivity (or recall): Sensitivity measures the rate of truly positive results, i.e., "the proportion of correctly detected positive samples". Sensitivity is determined by the following formula:
Sensitivity = TP / (TP + FN)   (2)
• Specificity: It measures the fraction of real negative cases that are correctly predicted as negative and is determined mathematically by
Specificity = 1 − FP / (FP + TN)   (3)
• Precision: It determines classifier exactness by comparing true positives against all predicted positives. The formula in (Eq. 4) shows how this measure verifies the proposed method's behavior:
Precision = TP / (TP + FP)   (4)
• F-measure: It is a statistical measure that is employed in evaluating the efficacy of a classification model. It does this by determining the harmonic mean of the precision and recall measurements, giving each of these metrics an equal amount of weight. It enables the performance of a model to be described and compared using a single score that takes into consideration both the recall and precision of the model's predictions, and it is calculated using the following formula:
F-measure = 2 × (Precision × Recall) / (Precision + Recall)   (5)
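The five measures in Eqs. (1)-(5) can be computed directly from a confusion matrix; the short helper below is a sketch of that calculation (scikit-learn's classification_report offers most of these as well). The function name is illustrative.

```python
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred) -> dict:
    """Compute accuracy, sensitivity, specificity, precision, and F1 per Eqs. (1)-(5)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)                      # Eq. (1)
    sensitivity = tp / (tp + fn)                                    # Eq. (2), recall
    specificity = 1 - fp / (fp + tn)                                # Eq. (3)
    precision = tp / (tp + fp)                                      # Eq. (4)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)    # Eq. (5)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Example usage (with the variables from the earlier snippets):
# evaluate(y_test, model.predict(X_test_s))
```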
B. EXPERIMENTAL EVALUATION OF SYSTEM PERFORMANCE
The accuracy of every technique is displayed in Table 6, along with the processed dataset that was analyzed using those algorithms. In terms of accuracy, A4's result for SF-2 was the most accurate (97.57%), followed by its results for SF-1 and SF-3 (93.17% and 93.19%, respectively). A9 computed an accuracy of 93.07% over all three SFs, placing it in second place. On the other hand, A5 determined that SF-1 and SF-3 had a low accuracy of 85.15% among all classifiers. A3 and A10 likewise provided a low level of accuracy for SF-2 and SF-3, coming in at 86.14% and 86.12%, respectively. The other methods have an accuracy between 87.13% and 90.00%. Furthermore, this finding shows that the XGBoost method using SF-2 is the most effective for processing the dataset. Figure 10 shows all the accuracy rates achieved by the ten ML techniques using SF-2.
In this study, all the algorithms' sensitivities were evaluated. Table 6 displays the sensitivity scores obtained from the ten ML techniques using SF-1, SF-2, and SF-3, respectively. A5's sensitivity for SF-3 was the lowest (88.14%), and A8 scored 89.83% for both SF-1 and SF-3. A4 (XGBoost) reported the highest sensitivity for SF-2, at 96.61%; A2, A3, A4, A6, and A9 reported the second-highest sensitivity, at 94.92%.
The analysis of specificity was performed on each of these techniques, and the results are summarized in Table 6. A3 scored the lowest (73.81%) for SF-2 and SF-3. A4 and A9 scored the highest (90.48%) for all SFs, based on the results of the analysis. When compared to the results of the other techniques, A7 for SF-3 (92.86%) provided the best score, with SF-3 only.


TABLE VI
ACCURACY, SENSITIVITY, SPECIFICITY, AND F1 SCORE OF ML TECHNIQUES USING SF-1, SF-2, AND SF-3.

| Algo. code | ML algorithm | Accuracy % (SF-1 / SF-2 / SF-3) | Sensitivity % (SF-1 / SF-2 / SF-3) | Specificity % (SF-1 / SF-2 / SF-3) | F1 score (SF-1 / SF-2 / SF-3) |
|------------|--------------|--------------------------------|------------------------------------|------------------------------------|-------------------------------|
| A1 | Naive Bayes | 87.13 / 87.13 / 87.13 | 91.53 / 91.53 / 91.53 | 80.95 / 80.95 / 80.95 | 83.95 / 83.95 / 83.95 |
| A2 | SVM | 89.11 / 89.11 / 88.12 | 94.92 / 94.92 / 93.22 | 80.95 / 80.95 / 80.95 | 86.08 / 86.08 / 85.00 |
| A3 | Voting | 88.12 / 86.14 / 86.14 | 94.92 / 94.92 / 94.92 | 78.57 / 73.81 / 73.81 | 84.62 / 81.58 / 81.58 |
| A4 | XGBoost | 93.17 / 97.57 / 93.19 | 94.92 / 96.61 / 94.92 | 90.48 / 90.48 / 90.48 | 91.57 / 92.68 / 91.57 |
| A5 | AdaBoost | 86.14 / 85.15 / 85.15 | 91.53 / 91.53 / 88.14 | 78.57 / 76.19 / 80.95 | 82.50 / 81.01 / 81.93 |
| A6 | Bagging | 89.11 / 92.08 / 91.09 | 94.92 / 94.92 / 93.22 | 80.95 / 88.10 / 88.10 | 86.08 / 90.24 / 89.16 |
| A7 | Decision Tree | 89.11 / 87.13 / 93.07 | 93.22 / 91.53 / 93.22 | 83.33 / 80.95 / 92.86 | 86.42 / 83.95 / 91.76 |
| A8 | KNN | 86.14 / 87.13 / 88.12 | 89.83 / 93.22 / 89.83 | 80.95 / 78.57 / 85.71 | 82.93 / 83.54 / 85.71 |
| A9 | Random Forest | 93.07 / 93.07 / 93.07 | 94.92 / 94.92 / 94.92 | 90.48 / 90.48 / 90.48 | 91.57 / 91.57 / 91.57 |
| A10 | Logistic Regression | 86.14 / 86.14 / 88.12 | 93.22 / 93.22 / 93.22 | 76.19 / 76.19 / 80.95 | 82.05 / 82.05 / 85.00 |

FIGURE 10. The accuracy results of the ten ML algorithms.

C. DISCUSSION
In this study, a variety of ML techniques were implemented for the early recognition of CVD, and a combined dataset (CHDD and private datasets) was employed for both testing and training purposes. The ML model was then tested and trained on the source and target datasets using a domain adaptation approach. The proposed HD prediction technique used in this study was first trained using a private dataset with 200 cases. After that, the system was evaluated using the combined dataset with 503 cases. To be more specific, we employed a total of ten well-known ML algorithms, including Naive Bayes, SVM, voting, XGBoost, AdaBoost, bagging, DT, KNN, RF, and LR, denoted by (A1, A2, …, A10), each with a unique set of selected features. The values of the ANOVA F statistic, the chi-square test, and the MI statistic were the statistical methods utilized in the classification of relevant aspects that were more useful for CVD prediction. Five different evaluation standards were used to compare and rate the performance of the different ML techniques that used SMOTE: these were accuracy, sensitivity, precision, specificity, and F1 score. The outcomes of the experiment showed that algorithm A4 obtained the best rate of accuracy (97.57%) for SF-2, and the accuracy rate achieved by A9 was second best (93.07%) across all three SFs presented in Table 6. A4 likewise obtained the greatest score for sensitivity (96.61%), as well as the best score for specificity (90.48%), while testing for SF-2, as shown in Table 6. The result of the F1 score demonstrated that A4 had the highest score of 92.68% for SF-2 (see Table 6), while A9 obtained the highest score of 91.57% for SF-1, SF-2, and SF-3, and A6 obtained the highest score of 90.24% for SF-2. Because A4 has the best performance when employed with SF-2, this method is the most reliable technique in terms of accuracy, specificity, and sensitivity. In terms of F1 score, A9 is the more accurate predictive model for all SFs, which places it as the second-best predictive algorithm overall. As a result of this research, we have concluded that A4 (XGBoost) provides the highest performance rate. As a consequence, it is permissible to conclude that XGBoost is an effective method for predicting heart diseases. When combining the results of multiple different ML algorithms, an accuracy range of 85.15% to 97.57% was achieved in the vast majority of cases. Finally, the proposed system is a mobile app employing the selected ML classifier.
Figure 13 shows the mobile app's quick heart disease detection using real data and the best-performing classifier. Figure 11 illustrates the XGBoost classifier's receiver operating characteristic (ROC) curve with SF-2, which demonstrates the model's performance across all classification thresholds, with an AUC of 0.98 (see Table 7).
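To make the training pipeline behind these results concrete, the following hedged sketch fits the A4 classifier (XGBoost) on the SMOTE-balanced, standardized training split from the earlier snippets and evaluates it on the untouched test split. The exact hyperparameter values used in the paper are not reported, so library defaults are shown.

```python
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# A4: XGBoost trained on the SMOTE-balanced training data.
# For the paper's best configuration, X would first be restricted to the SF-2 columns (Table 5).
xgb = XGBClassifier(eval_metric="logloss", random_state=42)
xgb.fit(X_train_bal, y_train_bal)

y_pred = xgb.predict(X_test_s)
print(evaluate(y_test, y_pred))                                        # Eqs. (1)-(5)
print("AUC:", roc_auc_score(y_test, xgb.predict_proba(X_test_s)[:, 1]))
```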


TABLE VII
PERFORMANCE OUTCOMES FOR THE XGBOOST CLASSIFIER USING SF-2 AND SMOTE.

| Algorithm | Accuracy | Sensitivity | Specificity | Precision | F1-score | AUC |
|-----------|----------|-------------|-------------|-----------|----------|-----|
| XGBoost | 97.57% | 96.61% | 90.48% | 95.00% | 92.68% | 0.98 |

FIGURE 11. AUC and ROC curve for the XGBoost classifier using SMOTE.

We have integrated the SHAP methodology into the proposed work to provide a unified measure and to interpret the contribution of each feature to the model's predictions, thus enhancing the transparency and interpretability of our machine learning approach. Here is how SHAP enhances our model:
1. Feature Contribution Explanation: SHAP values help us understand the impact of each individual feature on the model's predictions. For instance, in the case of predicting heart disease, SHAP can elucidate how features like cholesterol levels, blood pressure, and age contribute to the final prediction, allowing clinicians to see which factors are most influential.
2. Individual Prediction Analysis: We can analyze individual predictions using SHAP values to understand the reasoning behind a specific patient's prediction. This is particularly important in clinical settings, where understanding the rationale behind a prediction can guide further medical investigation or treatment.
3. Global Model Insights: SHAP not only provides local interpretability (individual predictions) but also offers global insights into the model's behavior across the entire dataset. This helps to identify which features are generally most important and how they interact with each other.
4. Trust and Adoption in Clinical Practice: To implement AI models in healthcare settings, the clarity offered by SHAP values is essential. Understanding and validating the model's decision-making process increases clinicians' trust and use of machine learning models.
In our study, after identifying the best performing model (XGBoost with the SF-2 feature subset), we applied SHAP to interpret the results. The SHAP analysis revealed which features were most important and how they affected the predictions. This information is not only valuable for model validation but also for offering actionable insights to healthcare professionals. Figure 12 shows an explainable AI interpretation of the importance of the features using the SHAP library and XGBoost classifier.

FIGURE 12. Explainable AI interpretation of the XGBoost feature importance.
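A minimal sketch of this kind of SHAP analysis, assuming the fitted xgb model, the scaled test matrix, and the FEATURES list from the earlier snippets, is:

```python
import shap

# TreeExplainer is the standard SHAP explainer for tree ensembles such as XGBoost.
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test_s)

# Global view: which features push predictions toward or away from "heart disease" (cf. Figure 12).
shap.summary_plot(shap_values, X_test_s, feature_names=FEATURES)

# Local view: explanation of a single patient's prediction (the first test sample).
shap.force_plot(explainer.expected_value, shap_values[0], X_test_s[0],
                feature_names=FEATURES, matplotlib=True)
```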
The last step was to implement the system into a mobile app by employing XGBoost with SMOTE. As shown in Figure 13, the most effective classifier was used in the development of the mobile app that provides an immediate and accurate diagnosis of HD using real data.
Using ML classifiers for HD prediction is the goal of this work. The experimental findings proved that the XGBoost algorithm achieved the most accurate results for predicting the occurrence of HD. The following features are classified as important for HD prediction according to the mutual information-based feature selection approach: thalach, chol, oldpeak, age, trestbps, ca, thal, cp, exang, slope, restecg, sex, and fbs. We used the SMOTE method to oversample the collected data and optimized the hyperparameters. The XGBoost technique with SMOTE produced the best results. The study reached its goal of predicting HD with the combined datasets, and the experimental results were 97.57% for accuracy, 96.61% for sensitivity, 90.48% for specificity, 95.00% for precision, 92.68% for F1 score, and 98% for AUC.
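The paper reports tuning hyperparameters for all classifiers but does not list the search grids; a hedged sketch of one way to do this for XGBoost, using an imbalanced-learn pipeline so that SMOTE is applied inside each cross-validation fold, is shown below. The grid values are illustrative only.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

pipe = Pipeline([("smote", SMOTE(random_state=42)),
                 ("xgb", XGBClassifier(eval_metric="logloss", random_state=42))])

# Illustrative grid only; the paper does not report the exact search space.
param_grid = {
    "xgb__n_estimators": [100, 200, 400],
    "xgb__max_depth": [3, 4, 6],
    "xgb__learning_rate": [0.05, 0.1, 0.3],
}

search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5, n_jobs=-1)
search.fit(X_train_s, y_train)
print(search.best_params_, search.best_score_)
```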


FIGURE 13. A snapshot of the proposed mobile app that predicts heart disease immediately.

VI. LIMITATIONS

Several limitations need to be acknowledged, even though the proposed approach for predicting HD through an ML-based mobile app has shown encouraging results and holds promise for deployment:
1. Dataset quality and availability: The performance and reliability of ML models depend on the quality and availability of the training and testing datasets. We employed the Cleveland heart disease dataset and a private database in our study, and both may have limitations in availability, representativeness, and data quality. This could make it difficult to apply the proposed approach to a broader sample drawn from a variety of additional sources.
2. Imbalanced classes: The effectiveness of ML classifiers can be degraded by unbalanced classes, in which one class is considerably more common than the other. We used SMOTE to address this issue. Although SMOTE helps balance the classes, it is not an optimal solution because it generates synthetic minority-class samples; this may bias the predictions and reduce their accuracy in real-life situations.
3. Algorithm selection: To determine the optimal algorithm for predicting HD, we evaluated a variety of ML techniques. Nonetheless, the selection of algorithms is somewhat arbitrary and may affect the outcome, and algorithms not considered in this study might achieve different trade-offs or greater accuracy. Future research should therefore carefully consider and evaluate the ML algorithms selected.
4. Domain adaptation: The use of domain adaptation techniques demonstrated the adaptability of the proposed system, but the technique may still be limited in how well it transfers to different populations or environments. More research is required to determine its efficacy across populations with different lifestyles, demographics, and healthcare systems, and to fully address the restrictions and difficulties related to domain adaptation.
5. Mobile app acceptance and usability: A major contribution of this study is the development of a mobile app that allows users to input their symptoms and receive a real-time HD prediction. User engagement with, and adoption of, the app are crucial to the success of the proposed technique. To guarantee performance in a real-world setting, future work must therefore assess factors such as user experience, privacy concerns, and accessibility.

VII. CONCLUSIONS AND FUTURE WORK

In this study, we first determined the features that are most useful for predicting heart disease using several feature selection methods, and then applied ten ML techniques, combined with SMOTE, to the selected features. Each algorithm produced a different score for each combination of features. Three feature selection methods were used, namely chi-square, ANOVA, and MI, and the resulting feature groups were referred to as SF-1, SF-2, and SF-3, respectively. The best model and feature subset were then determined using ten ML classifiers: Naive Bayes, SVM, voting, XGBoost, AdaBoost, bagging, DT, KNN, RF, and LR. A well-known open-access dataset, a private dataset, and numerous cross-validation procedures were employed to evaluate the suggested algorithms and measure the accuracy of the heart disease detection system. XGBoost outperformed all the other algorithms: with the SF-2 feature subset it achieved 97.57% accuracy, 96.61% sensitivity, 90.48% specificity, 95.00% precision, a 92.68% F1 score, and a 98% AUC. The study also demonstrated, through a domain adaptation approach, that the proposed system is adaptable. This work contributes to the field of ML-based HD prediction by introducing new insights and techniques, and its findings may aid the diagnosis and prediction of HD in Egypt and Saudi Arabia. Finally, the mobile app allows users to enter their symptoms and obtain a quick and accurate heart disease prediction based on the best-performing XGBoost model. We recommend gathering more private data from additional patients to generate more accurate findings.
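To make the summarized workflow concrete, the sketch below outlines one way the described pipeline could be assembled with scikit-learn [34], imbalanced-learn, and XGBoost: feature selection with chi-square, ANOVA, and mutual information, SMOTE oversampling of the training data, and XGBoost evaluated with stratified cross-validation. It is an illustrative approximation rather than the exact implementation used in this study; the DataFrame df, the "target" column name, the value of k, and the XGBoost hyperparameters are placeholders.

```python
# Illustrative sketch of the reported workflow (not the authors' exact code).
# Assumes a DataFrame `df` with a binary "target" column; k=8 is a placeholder.
import pandas as pd
from sklearn.feature_selection import (SelectKBest, chi2, f_classif,
                                        mutual_info_classif)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                      train_test_split)
from imblearn.over_sampling import SMOTE      # synthetic minority oversampling
from xgboost import XGBClassifier             # best-performing classifier here

X, y = df.drop(columns=["target"]), df["target"]

# Three feature-selection methods, each yielding a candidate feature subset.
selectors = {
    "chi-square": SelectKBest(chi2, k=8),     # chi2 needs non-negative features
    "ANOVA":      SelectKBest(f_classif, k=8),
    "MI":         SelectKBest(mutual_info_classif, k=8),
}

for name, selector in selectors.items():
    X_sel = selector.fit_transform(X, y)

    # Hold out a test set, then balance only the training split with SMOTE.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.2, stratify=y, random_state=42)
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

    # XGBoost assessed with stratified 10-fold cross-validation.
    model = XGBClassifier(n_estimators=300, max_depth=4,
                          learning_rate=0.1, eval_metric="logloss")
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    cv_acc = cross_val_score(model, X_bal, y_bal, cv=cv, scoring="accuracy")

    # Final fit and held-out evaluation.
    model.fit(X_bal, y_bal)
    test_acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: CV accuracy {cv_acc.mean():.4f}, "
          f"held-out accuracy {test_acc:.4f}")
```

Other classifiers from the comparison (for example, RandomForestClassifier, SVC, or LogisticRegression) can be substituted for XGBClassifier in the same loop, and the remaining metrics can be obtained with a helper such as the evaluate_binary function sketched earlier.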


In clinical scenarios, the use of explainable models and interpretable features is not only beneficial but mandatory. Explainable Artificial Intelligence (XAI) methods have been shown to increase the performance of models by providing transparency and fostering trust among clinicians and patients [36]. The adoption of XAI methods addresses several critical aspects:
1. Ethical and Legal Issues: Explainability ensures that the decision-making process of AI models aligns with ethical standards and legal requirements, as highlighted in [37]. Transparent models aid in auditing and validating decisions, which is critical in a highly regulated healthcare sector.
2. Verification with Clinical Literature: The AI bases its decisions on scientific reasoning, as its features and models align with established clinical literature [38]. This alignment with clinical knowledge enhances the reliability of the model's predictions.
3. Acceptance and Trust: Transparent and understandable decision-making processes significantly enhance the acceptance and trust of AI systems in clinical practice [39]. Explainable models offer valuable insights into prediction processes, thereby facilitating their seamless integration into standard clinical workflows.
Our study concludes that incorporating explainable AI methods such as SHAP into machine learning models for heart disease prediction enhances their clarity, reliability, and acceptability in clinical settings.
Future work will continue to focus on improving model interpretability and aligning AI predictions with clinical expertise to further advance the practical application of AI in healthcare. We aim to further enhance the explainability of our models by applying additional techniques such as LIME (local interpretable model-agnostic explanations) and by engaging with clinical experts to ensure that the interpretations align with medical knowledge and practice. By incorporating these explainable AI methodologies, we aim to bridge the gap between complex ML models and their practical application in clinical settings, ultimately contributing to more transparent, reliable, and effective healthcare solutions.
To increase the completeness of our study, it is essential to emphasize the benefits of explainable AI methods and to explain how we plan to implement these mechanisms in our future work. The key benefits and our proposed future directions are as follows.

Benefits of Explainable AI Methods:
1. Enhanced Transparency and Interpretability:
• Clinical Decision Support: Explainable AI methods, such as SHAP, provide clear insights into the contribution of each feature to the model's predictions. This transparency helps clinicians understand the rationale behind AI-generated predictions, making the decision-support process more robust.
• Model Validation and Verification: Using XAI methods, healthcare professionals can verify the model's predictions against clinical knowledge and literature. This validation process ensures that the model's decisions are based on medically sound principles, which is critical for clinical acceptance.
2. Ethical and Legal Compliance:
• Accountability: Explainable models facilitate accountability by providing a clear audit trail of the decision-making process. This is crucial for addressing ethical and legal concerns in healthcare and ensuring that AI systems comply with regulatory standards.
• Patient Trust: Understanding the prediction process fosters trust in the AI system among patients and clinicians. This trust is critical for the widespread adoption of AI in clinical settings.
3. Improved Clinical Outcomes:
• Personalized Treatment Plans: XAI methods enable the identification of the key factors influencing individual patient predictions. This can lead to more personalized and effective treatment plans tailored to the specific needs of each patient.
• Early Intervention: By providing detailed explanations for predictions, clinicians can identify early warning signs and intervene promptly, potentially improving patient outcomes.

Future Directions: We plan to incorporate the following explanatory mechanisms into our future work to strengthen trust and confidence in our AI application for heart disease prediction.
1. Integration of SHAP and LIME (see the illustrative sketch after this list):
• SHAP: We will continue to use SHAP to provide both global and local explanations of model predictions. SHAP values will help us identify the most influential features and understand their impact on individual predictions.
• LIME: We will explore the use of LIME to generate interpretable models around each prediction. LIME locally approximates the black-box model with an interpretable surrogate, providing further insight into the prediction process.
2. Interactive Explanation Interfaces:
• We will develop interactive interfaces that allow clinicians to visualize and explore the explanations provided by SHAP and LIME. These interfaces will enable users to drill down into specific predictions, compare feature contributions, and gain a deeper understanding of the model's behavior.
3. Clinical Expert Collaboration:
• We will collaborate with clinical experts to ensure that the explanations generated by our AI models align with clinical practice and knowledge. Their feedback will be invaluable in refining our explanation mechanisms and ensuring their relevance and accuracy in a clinical context.
4. Continuous Model Monitoring and Improvement:
• We will implement continuous monitoring of our AI models to track their performance and the relevance of the explanations over time. This will include regular updates to the models and the explanation mechanisms based on new clinical data and feedback from healthcare professionals.
the explanations over time. This will include regular


In conclusion, the integration of explainable AI methods into our heart disease prediction model will significantly enhance its transparency, reliability, and acceptance among clinicians and patients. In our future work, we will prioritize the development of robust explanation mechanisms to support clinical decision-making and improve patient outcomes.

ACKNOWLEDGMENT
The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through small group research under grant number (RGP1/129/45).

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest to report regarding the present study.

AVAILABILITY OF DATA AND MATERIALS
The corresponding author will share the study datasets upon reasonable request.

REFERENCES
[1] World Health Organization. Cardiovascular Diseases (CVDs). Available online: https://www.afro.who.int/health-topics/cardiovascular-diseases (accessed on 5 May 2023).
[2] Alom, Z.; Azim, M.A.; Aung, Z.; Khushi, M.; Car, J.; Moni, M.A. (2021). Early Stage Detection of Heart Failure Using Machine Learning Techniques. In Proceedings of the International Conference on Big Data, IoT, and Machine Learning, Cox's Bazar, Bangladesh, 23–25.
[3] Gour, S.; Panwar, P.; Dwivedi, D.; Mali, C. (2022). A Machine Learning Approach for Heart Attack Prediction. In Intelligent Sustainable Systems; Springer: Singapore, pp. 741–747.
[4] Gupta, C.; Saha, A.; Reddy, N.S.; Acharya, U.D. (2022). Cardiac Disease Prediction using Supervised Machine Learning Techniques. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, Volume 2161, p. 012013.
[5] Shameer, K., Smith, B. M., Kodysh, J., Yonker, M., Glicksberg, B. S., Udell, J. A., & Dudley, J. T. (2021). Machine learning predictions of cardiovascular disease risk in a multi-ethnic population using electronic health record data. International Journal of Medical Informatics, 146, 104335.
[6] Liu, M., Sun, X., Liu, Y., Yang, X., Xu, Y., & Sun, X. (2020). Deep learning-based prediction of coronary artery disease with CT angiography. Japanese Journal of Radiology, 38(4), 366-374.
[7] Zakria, N., Raza, A., Liaquat, F., & Khawaja, S. G. (2017). Machine learning based analysis of cardiovascular disease prediction. Journal of Medical Systems, 41(12), 207.
[8] Yang, M., Wang, X., Li, F., & Wu, J. (2016). A machine learning approach to identify risk factors for coronary heart disease: a big data analysis. Computer Methods and Programs in Biomedicine, 127, 262-270.
[9] Ngufor, C., Hossain, A., Ali, S., & Alqudah, A. (2016). Machine learning algorithms for heart disease prediction: a survey. International Journal of Computer Science and Information Security, 14(2), 7-29.
[10] Shoukat, A., Arshad, S., Ali, N., & Murtaza, G. (2020). Prediction of Cardiovascular Diseases using Machine Learning: A Systematic Review. Journal of Medical Systems, 44(8), 162. doi: 10.1007/s10916-020-01563-1
[11] Shankar, G. R., Chandrasekaran, K., & Babu, K. S. (2019). An Analysis of the Potential Use of Machine Learning in Cardiovascular Disease Prediction. Journal of Medical Systems, 43(12), 345. doi: 10.1007/s10916-019-1524-8
[12] Khandadash, N., Ababneh, E., & Al-Qudah, M. (2021). Predicting the Risk of Coronary Artery Disease in Women Using Machine Learning Techniques. Journal of Medical Systems, 45, 62. doi: 10.1007/s10916-021-01722-6
[13] Moon, S., Lee, W., & Hwang, J. (2019). Applying Machine Learning to Predict Cardiovascular Diseases. Healthcare Informatics Research, 25(2), 79-86. doi: 10.4258/hir.2019.25.2.79
[14] Lakshmi, M., & Ayeshamariyam, A. (2021). Machine Learning Techniques for Prediction of Cardiovascular Risk. International Journal of Advanced Science and Technology, 30(3), 11913-11921. doi: 10.4399/97888255827001
[15] Md R. Hassan, Shamsul H., Mohammad M. H., Jemal A., Ahmed A., & Giancarlo F. (2022). Early detection of cardiovascular autonomic neuropathy: A multi-class classification model based on feature selection and deep learning feature fusion. Information Fusion, vol. 77, pp. 70-80.
[16] Wongkoblap, A., Vadillo, M. A., & Curcin, V. (2018). Machine Learning Classifiers for Early Detection of Cardiovascular Disease. Journal of Biomedical Informatics, 88, 44-51. doi: 10.1016/j.jbi.2018.09.003
[17] Delavar, M. R., Motwani, M., & Sarrafzadeh, M. (2015). A Comparative Study on Feature Selection and Classification Methods for Cardiovascular Disease Diagnosis. Journal of Medical Systems, 39(9), 98. doi: 10.1007/s10916-015-0333-5
[18] Yong, K., Kim, S., Park, S. J., & Kim, J. (2017). A Clinical Decision Support System for Cardiovascular Disease Risk Prediction in Type 2 Diabetes Mellitus Patients using Decision Tree. Computers in Biology and Medicine, 89, 413-421. doi: 10.1016/j.compbiomed.2017.08.024
[19] Mirza, Q. Z., Siddiqui, F. A., & Naqvi, S. R. (2020). The Risk Prediction of Cardiac Events using a Decision Tree Algorithm. Pakistan Journal of Medical Sciences, 36(2), 85-89. doi: 10.12669/pjms.36.2.1511
[20] Farag, A., Farag, A., & Sallam, A. (2016). Improving Heart Disease Prediction using Boosting and Bagging Techniques. Proceedings of the International Conference on Innovative Trends in Computer Engineering (ITCE), 90-96. doi: 10.1109/ITCE.2016.7473338
[21] Jhajhria, S., & Kumar, R. (2020). Predicting the Risk of Cardiovascular Diseases using Ensemble Learning Approaches. Soft Computing, 24(7), 4691-4705. doi: 10.1007/s00500-019-04268-8
[22] Samadiani, N., Eftekhari Moghadam, A. M., & Motamed, C. (2016). SVM-based Classification of Cardiovascular Diseases using Feature Selection: A High-Dimensional Dataset Perspective. Journal of Medical Systems, 40(11), 244. doi: 10.1007/s10916-016-0573-7
[23] Zhang, X., Zhang, Y., Du, X., & Li, B. (2019). Application of XGBoost algorithm in clinical prediction of coronary heart disease. Chinese Journal of Medical Instrumentation, 43(1), 12-15.
[24] Liu, Y., Li, X., & Ren, J. (2021). A comparative analysis of machine learning algorithms for heart disease prediction. Computer Methods and Programs in Biomedicine, 200, 105965.
[25] Hussein, N. S., Mustapha, A., & Othman, Z. A. (2020). Comparative study of machine learning techniques for heart disease diagnosis. Computer Science and Information Systems, 17(4), 773-785.
[26] Akbar, S., Tariq, R., & Basharat, A. (2020). Heart disease prediction using different machine learning approaches: A critical review. Journal of Ambient Intelligence and Humanized Computing, 11(5), 1973-1984.
[27] Zarshenas, A., Ghanbarzadeh, M., & Khosravi, A. (2019). A comparative study of machine learning algorithms for predicting heart disease. Artificial Intelligence in Medicine, 98, 44-54.
[28] Kaur, I., & Singh, G. (2019). Comparative analysis of machine learning algorithms for heart disease prediction. Journal of Biomedical Informatics, 95, 103208.
[29] Li, Y., Jia, W., & Li, J. (2018). Comparing different machine learning methods for predicting heart disease: A telemedicine case study. Health Information Science and Systems, 6, 7.
[30] Zhang, X., Zhou, Y., & Xie, D. (2018). Heart disease diagnosis using machine learning and expert system techniques: A survey paper. Journal of Medical Systems, 42(7), 129.
[31] Wu, J., Roy, J., & Stewart, W. F. (2017). A comparative study of machine learning methods for the prediction of heart disease. Journal of Healthcare Engineering, 2017, 7947461.


[32] Ahmed, Z., Mohamed, K., & Zeeshan, S. (2016). Comparison of machine
learning algorithms for predicting the risk of heart disease: A systematic
review. Journal of Healthcare Engineering, 2016, 7058278.
[33] Chen, X., Hu, Z., & Cao, Y. (2007). Heart disease diagnosis using
decision tree and naïve Bayes classifiers. World Congress on Medical
Physics and Biomedical Engineering, 14, 1668-1671.
[34] F. Pedregosa, G. Varoquaux, A. Gramfort & et al. (2011). “Scikit-learn:
machine learning in python,” Journal of Machine Learning Research,
vol. 12, pp. 2825–2830.
[35] Hosam E., Samir A. E., Omar H. K., Yasser M. A., Islam A.T.F.T.
(2024). A Proposed Technique Using Machine Learning for the
Prediction of Diabetes Disease Through a Mobile App. International
Journal of Intelligent Systems, volume 2024.
[36] Vitor B., Manoela K., Pedro D., Leonardo M. & Marco A. P. (2021).
Improving deep learning performance by using Explainable Artificial
Intelligence (XAI) approaches. Discover Artificial Intelligence, Vol. 1,
No. 9.
[37] Bryce G., Seth F. (2017). European Union Regulations on Algorithmic
Decision Making and a “Right to Explanation”. AI Magazine, Vol. 38,
PP 50-57, Issue 3.
[38] Militello, C., Prinzi, F., Sollami, G. et al. (2023). CT Radiomic Features
and Clinical Biomarkers for Predicting Coronary Artery
Disease. Cognitive Computation, Vol. 15, PP 238–253.
[39] Zachary C. L. (2018). The Mythos of Model Interpretability: In machine
learning, the concept of interpretability is both important and slippery.
ACM Queue, Vol. 16, Issue 3, PP 31 - 57.

Hosam F. El-Sofany received his Ph.D. and M.Sc. degrees in computer science. He is currently an Associate Professor of Computer Science at King Khalid University, KSA, and at the Cairo Higher Institute for Engineering, Computer Science, and Management, Egypt. His research interests include cloud computing, e-learning, m-learning, u-learning, fuzzy logic, cloud security, cybersecurity, and chronic disease prediction techniques using ML and DL algorithms. He has published approximately 80 research papers in international refereed journals and conferences, serves as a reviewer for many international journals and conferences, and has supervised many M.Sc. and Ph.D. dissertations in computer science and information systems. At KKU, he has served as chairman of the Graduate Studies and Scientific Research Committee. Email: [email protected].
