Proposed System

The document outlines a machine learning framework designed to predict and detect early-stage strokes using various classifiers such as Random Forest and XGBoost, along with evaluation metrics like precision and recall. It emphasizes the importance of avoiding data leakage during model training, particularly in healthcare, to ensure accurate predictions. The model is built on a reliable dataset from Kaggle, which includes vital health measures and addresses challenges like data imbalance through techniques like SMOTE.

The proposed system is a machine learning based framework that aims to predict and detect
stroke at an early stage. We built the model using several machine learning classifiers,
including Random Forest, XGBoost and LightGBM, together with SMOTE for oversampling, and
evaluated it with precision, accuracy and recall. Data leakage is a major concern for
prediction models, so we split the data into separate subsets first and applied preprocessing
only afterwards. The system estimates the early risk of stroke from vital measures such as
hypertension, glucose levels, smoking habits and type of work, which are taken from a reliable
dataset on Kaggle.

The proposed system uses machine learning to build a stroke prediction system that estimates
the likelihood of a stroke occurring in a person. It addresses the problem of early
detection, since detecting a stroke early greatly helps to reduce the severity of its effects
on the person. The system is composed of several integrated components and models that work
together to improve the accuracy of prediction. Machine learning can handle massive amounts
of data, which is why it is being introduced into the health industry, where data is large in
volume and sensitive in nature. The system includes several machine learning classifiers and
a wide range of evaluation metrics. The classifiers are Random Forest, XGBoost and LightGBM,
with SMOTE used as an oversampling technique rather than a classifier, and the metrics are
precision, recall and F1 score, which are used to analyse the performance of the model.
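
As a rough illustration of the metrics named above, the following sketch computes precision,
recall and F1 score for the stroke class from true and predicted labels. The label lists and
the function name `precision_recall_f1` are hypothetical and are not taken from the project.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (stroke) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only: 1 = stroke, 0 = no stroke
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Recall is usually the metric to watch most closely in this setting, because a false negative
corresponds to a patient at risk who is missed.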

Data leakage is a major concern in machine learning, especially in healthcare applications.
Leakage occurs when information from outside the training data, such as statistics computed
on the test set, reaches the model during training. It makes the model appear to perform
better in evaluation than it actually does, inflating metrics such as accuracy. Nowhere is
this more dangerous than in healthcare, where inaccurate predictions may have a direct impact
on patient safety. A model affected by data leakage might not work when deployed in practice,
falsely reassuring doctors or overlooking the patients who are most at risk. We therefore took
care to split the data first and to carry out the preprocessing steps afterwards, so that data
leakage is avoided.
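
The split-then-preprocess order described above can be sketched as follows. This is a minimal
example, assuming a single numeric feature and standard scaling as the preprocessing step.
The glucose values and the helper names (`split_rows`, `fit_scaler`, `apply_scaler`) are
hypothetical and are not the project's actual code.

```python
import random
import statistics


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them BEFORE any preprocessing is fitted."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_scaler(train_values):
    """Learn the mean and standard deviation from the training values only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero spread
    return mean, std


def apply_scaler(values, mean, std):
    """Scale values with statistics that were learned on the training split."""
    return [(v - mean) / std for v in values]


# Hypothetical average glucose readings, for illustration only
glucose = [85.0, 92.5, 110.3, 228.7, 76.1, 143.9, 101.2, 95.4, 187.0, 88.8]

train, test = split_rows(glucose)
mean, std = fit_scaler(train)                  # the test split is never seen here
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)    # reuses the training statistics
```

Fitting the scaler on the full dataset before splitting would let test-set statistics
influence training, which is the kind of leakage the paragraph above warns about.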

We collected the data from Kaggle to obtain a reliable dataset for the model. The dataset
contains 12 columns and 15300 rows, and includes important features such as age, gender,
hypertension, heart disease status, marital status, smoking status, type of work, average
glucose level and body mass index (BMI). An early challenge observed in this dataset was
class imbalance, i.e., stroke cases were fewer than non-stroke cases. We applied SMOTE
(Synthetic Minority Over-sampling Technique) to the training dataset, which not only helped
the models detect patterns in minority cases more accurately but also improved performance
metrics such as recall, without simply duplicating existing data.
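
To show how SMOTE creates new minority samples instead of copying existing ones, here is a
simplified sketch of its core idea: each synthetic point is interpolated between a minority
sample and one of its nearest minority neighbours. The function `smote_sketch` and the
patient values are hypothetical. This is not the implementation used in the project, and it
omits details that library implementations handle.

```python
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical TRAINING split only: (age, average glucose level) per patient
train_majority = [(34, 88.0), (45, 95.2), (29, 79.5),
                  (51, 101.3), (38, 90.1), (62, 110.4)]   # non-stroke
train_minority = [(71, 210.5), (68, 195.0), (75, 228.1)]  # stroke

n_needed = len(train_majority) - len(train_minority)
new_points = smote_sketch(train_minority, n_needed, k=2)
balanced_minority = train_minority + new_points
```

The test split is left untouched, so evaluation still reflects the real class distribution.
In practice a library implementation, such as the `SMOTE` class in imbalanced-learn, would
normally be used rather than a hand-written version.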
