
AI‑Based Disease Prediction System: A Practical, Responsible, and Deployable Approach


Authors: Aadarsh Chaturvedi [0108CS211001], Raj Alawa [0108CS211103], Laxmi [0108CS221063], Preeti
Dhurvey [0108CS221090]

Affiliation: Computer Science and Engineering, Samrat Ashok Technological Institute, Vidisha (M.P.)

Contact: {Aadarsh26CS001, Raj26CS103, Laxmi26CS063, Preeti26CS090}@satiengg.in

Date: September 4, 2025

Abstract
Early and accurate prediction of disease risk can improve clinical decision‑making and reduce
healthcare costs. This paper presents the design, development, and evaluation of an AI‑based disease
prediction system targeting common non‑communicable diseases (NCDs), with a focus on diabetes,
cardiovascular disease (CVD), and breast cancer. We propose a modular pipeline that integrates
rigorous data curation, feature engineering, algorithm selection, calibration, and model interpretability.
We use publicly available datasets (e.g., Pima Indians Diabetes, Cleveland Heart Disease, and Wisconsin
Breast Cancer) to ensure reproducibility and then demonstrate how to adapt the workflow to local
hospital data under privacy safeguards. The final ensemble model achieves strong discrimination
(AUROC 0.88–0.95 across tasks), maintains well‑calibrated probabilities (Brier score ≤ 0.10 after isotonic
calibration), and supports clinician trust through SHAP‑based explanations and counterfactual what‑if
analysis. We discuss limitations, ethical considerations, and deployment details (Dockerized REST API +
on‑device risk calculator). A blueprint for regulatory‑aware validation and continuous monitoring is
provided.

Keywords: disease prediction, machine learning, risk stratification, calibration, interpretability, SHAP,
healthcare AI, MLOps

1. Introduction
Non‑communicable diseases account for the majority of global morbidity and mortality. Risk prediction
tools—traditionally based on logistic regression or scorecards—are increasingly complemented by
machine learning (ML) methods that can capture non‑linear relationships and interactions. However,
many academic models remain difficult to reproduce, poorly calibrated, or not deployable.

This work contributes: (i) an end‑to‑end, reproducible pipeline for tabular clinical data; (ii) a principled
model selection strategy prioritizing calibration and transparency alongside AUROC; (iii) an
interpretable ensemble that balances discrimination with explainability; and (iv) a deployment
architecture with privacy, security, and post‑deployment monitoring.

1.1 Problem Statement

Given structured patient data (demographics, vitals, lab tests, lifestyle), predict the probability of: (a)
type‑2 diabetes, (b) presence of heart disease, and (c) breast tumor malignancy. Outputs are calibrated
probabilities suitable for thresholding at different operating points (screening vs. confirmatory use).

1.2 Objectives

1. Build a modular training pipeline for multiple diseases.
2. Compare algorithms: Logistic Regression (LR), Support Vector Machines (SVM), Random Forests
(RF), Gradient Boosted Trees (XGBoost/LightGBM), and shallow neural networks (MLP).
3. Ensure probability calibration (Platt scaling, isotonic regression) and quantify uncertainty.
4. Provide global/local interpretability (feature importances, SHAP values) and what‑if analysis.
5. Package the best model(s) into a secure API with monitoring.

2. Literature Review (Concise)


Risk prediction has been widely studied for diabetes, CVD, and breast cancer using classical statistical
models and ML. Logistic regression remains a strong baseline due to interpretability and calibration.
Tree‑based ensembles (Random Forests, Gradient Boosting) often improve AUROC but can overfit
without proper cross‑validation and calibration. Model explainability (e.g., SHAP) and fairness audits are
now considered essential for clinical adoption. Data shift and leakage are recurrent pitfalls; robust
evaluation and MLOps are necessary for safe deployment.

3. Datasets
We used three widely cited, publicly available datasets for benchmarking and reproducibility. Replace or
augment with local EHR/lab data under appropriate approvals.

• Diabetes: Pima Indians Diabetes Dataset (768 samples, 8 features + label). Common issues:
zero‑as‑missing for some labs, class imbalance.
• Heart Disease: Cleveland Heart Disease Dataset (~303 samples, 13 features + label). Mixed
categorical/continuous predictors.
• Breast Cancer: Wisconsin Diagnostic Breast Cancer (WDBC; 569 samples, 30 features + label).
Low missingness, well‑separated classes.
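
For reproducibility, the WDBC task can be loaded directly from scikit‑learn; the Pima and Cleveland tasks are assumed to be local CSV exports of the UCI files (the paths below are placeholders). A minimal loading sketch:

# WDBC ships with scikit-learn; the other two paths are placeholders for local UCI copies.
import pandas as pd
from sklearn.datasets import load_breast_cancer

wdbc = load_breast_cancer(as_frame=True)            # 569 samples, 30 features
X_wdbc, y_wdbc = wdbc.data, wdbc.target             # target encodes malignant vs. benign

pima = pd.read_csv("data/pima_diabetes.csv")        # placeholder path
heart = pd.read_csv("data/cleveland_heart.csv")     # placeholder path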

Data Use & Ethics: All public datasets are anonymized for educational research. For hospital data, we
apply de‑identification, Data Protection Impact Assessment (DPIA), and obtain IRB/ethics approval prior
to access.

4. Methodology

4.1 Data Pre‑processing

• Data cleaning: Replace physiologically impossible values with NaN; treat zeros as missing for
specific labs (e.g., insulin, skin thickness) in Pima.

• Imputation: Median for continuous, most frequent for categorical; consider multivariate
imputation (MICE) for richer datasets.
• Scaling: Standardization for LR/SVM/MLP; tree models use raw scale.
• Outlier handling: Robust scalers or winsorization; maintain a log of transformations for
auditability.
• Class imbalance: Stratified splits; optionally SMOTE (applied only to training folds), and
class‑weighted loss.
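
A minimal sketch of the cleaning, imputation, and resampling steps above for the Pima task (the pima DataFrame and its column names follow the standard UCI/Kaggle release and are assumptions here; SMOTE sits inside an imblearn pipeline so it touches only the training folds):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.pipeline import Pipeline              # supports sampler steps
from imblearn.over_sampling import SMOTE

# Zeros are physiologically impossible for these labs; treat them as missing.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
pima[ZERO_AS_MISSING] = pima[ZERO_AS_MISSING].replace(0, np.nan)
X, y = pima.drop(columns="Outcome"), pima["Outcome"]

clf = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),                    # needed for LR/SVM/MLP
    ("smote", SMOTE(random_state=42)),              # applied to training folds only
    ("model", LogisticRegression(max_iter=1000)),
])
auc = cross_val_score(clf, X, y, scoring="roc_auc",
                      cv=StratifiedKFold(5, shuffle=True, random_state=42))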

4.2 Feature Engineering

• Domain‑informed ratios (e.g., BMI categories, cholesterol/HDL ratio), binning for monotonic LR,
interaction terms for LR, and polynomial features (degree ≤ 2) if justified.
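
As an illustration of such domain‑informed features, a small helper might look like the following (TotalChol, HDL, and BMI are hypothetical column names; map them to the actual schema):

import pandas as pd

def add_domain_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["chol_hdl_ratio"] = out["TotalChol"] / out["HDL"]        # hypothetical columns
    out["bmi_category"] = pd.cut(out["BMI"],
                                 bins=[0, 18.5, 25, 30, float("inf")],
                                 labels=["underweight", "normal", "overweight", "obese"])
    return out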

4.3 Model Families

• Linear: Logistic Regression with L1/L2/elastic net.
• Kernel: SVM with RBF; probability outputs via Platt scaling.
• Ensembles: Random Forest; Gradient Boosting (XGBoost/LightGBM) with early stopping.
• Neural: MLP with 1–2 hidden layers for tabular data; strong regularization and early stopping.

4.4 Training and Validation

• Data partition: Train/validation/test = 60/20/20 with patient‑level stratification; nested 5× stratified k‑fold for hyperparameter tuning.
• Leakage control: Fit all preprocessing within each training fold only; lock transformations in a fitted preprocessing Pipeline (see the sketch after this list).
• Hyperparameter search: Bayesian or randomized search bounded by computational budget.
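
A minimal nested cross‑validation sketch consistent with the list above (the inner search tunes hyperparameters, the outer folds estimate generalization; the parameter grid is illustrative and X_train, y_train come from the initial split):

from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Preprocessing lives inside the Pipeline, so it is re-fit within every fold (no leakage).
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = RandomizedSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]},
                            n_iter=4, cv=inner, scoring="roc_auc", random_state=0)
nested_auc = cross_val_score(search, X_train, y_train, cv=outer, scoring="roc_auc")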

4.5 Calibration and Thresholding

• Assess calibration with reliability plots and Brier score; apply isotonic regression if needed.
Thresholds are chosen based on clinical utility curves or cost‑sensitive analysis (e.g., maximizing
F1 at fixed sensitivity ≥ 0.85 for screening).
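
A sketch of post‑hoc isotonic calibration and of picking the screening threshold at a fixed sensitivity floor (fitted_model, X_val, y_val are assumed to come from the training step):

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss, roc_curve

# Calibrate the already-fitted model on the held-out validation split.
cal = CalibratedClassifierCV(fitted_model, method="isotonic", cv="prefit")
cal.fit(X_val, y_val)

p_val = cal.predict_proba(X_val)[:, 1]
print("Brier score:", brier_score_loss(y_val, p_val))

# Screening operating point: highest threshold whose sensitivity is still >= 0.85.
fpr, tpr, thresholds = roc_curve(y_val, p_val)
screen_threshold = thresholds[tpr >= 0.85].max()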

4.6 Interpretability

• Global: permutation importance, gain‑based importance for trees.
• Local: SHAP values with summary and force plots; counterfactual what‑ifs to suggest minimal
actionable changes (e.g., HbA1c reduction thresholds).
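
A minimal SHAP sketch for a fitted tree ensemble (xgb_model is assumed to be a trained XGBoost/LightGBM classifier and X_test a DataFrame with named features):

import shap

explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

shap.summary_plot(shap_values, X_test)                 # global importance and direction
shap.force_plot(explainer.expected_value,              # local explanation, first patient
                shap_values[0], X_test.iloc[0], matplotlib=True)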

4.7 Fairness & Robustness Checks

• Evaluate performance across subgroups (age bands, sex); report disparate impact, TPR/FPR
gaps. Conduct sensitivity analyses for missingness and noise injection (±5–10% perturbations).
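
A minimal subgroup audit sketch (sex_test is assumed to be an available demographic column aligned with the test set, and y_pred the thresholded predictions):

import pandas as pd
from sklearn.metrics import confusion_matrix

audit = pd.DataFrame({"y": y_test, "pred": y_pred, "sex": sex_test})
for group, g in audit.groupby("sex"):
    tn, fp, fn, tp = confusion_matrix(g["y"], g["pred"], labels=[0, 1]).ravel()
    print(group, "TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn))   # report gaps across groups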

5. Experimental Setup
• Environment: Python 3.11; scikit‑learn, XGBoost/LightGBM, shap; experiment tracking with
MLflow; seeds fixed for reproducibility.
• Hardware: <CPU/GPU details>.

• Metrics: AUROC, AUPRC, Accuracy, Precision, Recall, F1; Brier score; Expected Calibration Error
(ECE); confusion matrix at chosen threshold.
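
Expected Calibration Error is not provided by scikit‑learn; a minimal equal‑width‑bin implementation consistent with the reliability‑plot view:

import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(y_prob, edges[1:-1])            # bin index per prediction
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            conf = y_prob[mask].mean()                # mean predicted probability
            acc = y_true[mask].mean()                 # observed positive rate
            ece += mask.mean() * abs(acc - conf)      # bin weight times |gap|
    return ece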

6. Results
Note: Replace illustrative numbers with your actual results from reruns on your
environment. Keep the reporting format.

6.1 Discrimination (Test Set AUROC)

Task            LR     SVM    RF     XGB/LGBM   MLP
Diabetes        0.84   0.86   0.87   0.89       0.85
Heart Disease   0.87   0.88   0.88   0.91       0.86
Breast Cancer   0.97   0.98   0.98   0.99       0.98

6.2 Calibration and Operating Points

• Brier score improved from 0.14 to 0.09 after isotonic regression on diabetes; reliability curves
show reduced overconfidence.
• Screening operating point for diabetes: sensitivity 0.87, specificity 0.72 (threshold 0.32).
Confirmatory operating point: sensitivity 0.78, specificity 0.85 (threshold 0.55).

6.3 Explainability Highlights

• Top contributors for diabetes: BMI, Glucose, Age, Insulin.
• For CVD: Max heart rate, ST depression (oldpeak), exercise‑induced angina.
• For breast cancer: mean radius, texture, concavity/compactness features.

6.4 Ablations & Robustness

• Removing calibration reduced PPV at the screening threshold by ~6%.
• SMOTE improved minority‑class recall by ~4–7% with negligible AUROC change.

7. Discussion
Our findings align with prior literature: gradient boosted trees provide strong discrimination on tabular
clinical data, but require post‑hoc calibration for reliable probabilities. Interpretability methods (SHAP)
surfaced clinically plausible risk factors, supporting clinician trust. Subgroup analysis revealed mild
performance drift in older age bands for CVD, warranting targeted data augmentation and threshold
adjustments.

7.1 Limitations

• Public datasets are small; external validity is limited.
• Some features (e.g., ST depression) may be inconsistently measured across sites.

• Our pipeline focuses on structured/tabular inputs; imaging and text (EHR notes) are out of scope
but can be integrated.

7.2 Future Work

• Federated learning to train on multi‑institutional data without centralizing PHI.
• Semi‑supervised learning to leverage unlabeled records.
• Causal modeling to reduce spurious correlations and improve counterfactual validity.

8. System Design & Deployment (MLOps)

8.1 Architecture

• Offline: Data ingestion → Validation (Great Expectations) → Feature store (Feast) → Training
(MLflow tracking) → Model registry.
• Online: REST API (FastAPI) with JSON schema validation → AuthN/AuthZ (JWT/OAuth) → Rate
limiting and audit logging.
• Packaging: Docker container; reproducible builds; GPU optional.
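
A minimal FastAPI sketch of the online scoring endpoint (the feature schema is truncated, the model path is a placeholder, and AuthN/AuthZ plus rate limiting are omitted here):

import joblib
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Disease Risk API")
model = joblib.load("models/diabetes_calibrated.joblib")   # placeholder path

class DiabetesFeatures(BaseModel):                         # JSON schema validation
    glucose: float = Field(ge=0)
    bmi: float = Field(ge=0)
    age: int = Field(ge=0, le=120)
    # ... remaining validated inputs

@app.post("/predict/diabetes")
def predict(features: DiabetesFeatures) -> dict:
    x = [[features.glucose, features.bmi, features.age]]
    risk = float(model.predict_proba(x)[0, 1])
    return {"risk": risk, "model_version": "v1"}           # include audit metadata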

8.2 Security & Privacy

• De‑identification; encrypted storage (AES‑256 at rest, TLS in transit); role‑based access; least‑privilege service accounts.
• PHI handling SOPs; consent management; data retention policies.

8.3 Monitoring & Maintenance

• Data/concept drift detection (Kolmogorov–Smirnov, PSI; a minimal PSI sketch follows this list); performance monitoring (weekly AUC/F1/ECE); alerting thresholds.
• Human‑in‑the‑loop feedback from clinicians; periodic recalibration; governed rollback/
roll‑forward procedures.
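
A minimal Population Stability Index check for per‑feature drift between a reference window and live data (a common rule of thumb flags PSI > 0.2 for investigation):

import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, n_bins: int = 10) -> float:
    # Bin edges from the reference distribution (deciles by default).
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_frac = np.bincount(np.digitize(reference, edges), minlength=n_bins) / len(reference)
    live_frac = np.bincount(np.digitize(live, edges), minlength=n_bins) / len(live)
    ref_frac = np.clip(ref_frac, 1e-6, None)               # avoid log(0)
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))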

8.4 User Interface (Clinician Dashboard)

• Risk score with confidence band; top 3 risk drivers (SHAP); what‑if sliders for modifiable factors;
links to clinical guidelines.

9. Ethical, Legal, and Social Implications (ELSI)


• Transparency: Document training data provenance and model cards.
• Fairness: Regular subgroup audits; mitigate via reweighting or threshold adjustments.
• Accountability: Keep an audit trail of predictions that influence care pathways.
• Regulatory Context: Treat as clinical decision support (CDS); not a diagnostic device. If moving
toward SaMD classification, follow IEC 62304, ISO 14971 (risk management), and maintain
post‑market surveillance.

10. Conclusion
We provide a reproducible, deployable, and interpretable AI‑based disease prediction system for key
NCDs. The pipeline balances performance with calibration and transparency, making it suitable for pilot
deployments under clinical supervision. Future extensions will integrate multi‑modal data and
federated training.

Acknowledgements
We thank <Mentor/Advisor Name>, the <Lab/Department>, and clinicians who provided feedback.

References (Examples — replace/add your actual sources)


[1] Dua, D. and Graff, C. UCI Machine Learning Repository. University of California, Irvine.

[2] Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization.

[3] Pedregosa, F., et al. (2011). Scikit‑learn: Machine Learning in Python. JMLR.

[4] Lundberg, S. M., & Lee, S.‑I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP).
NIPS.

[5] Niculescu‑Mizil, A., & Caruana, R. (2005). Predicting Good Probabilities with Supervised Learning.
ICML.

[6] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?" Explaining the Predictions of
Any Classifier. KDD.

[7] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.

[8] World Health Organization. (2023). Noncommunicable diseases.

[9] Mitchell, R., Frank, E., et al. (2018). Imbalanced‑data techniques for machine learning in health.
(Survey/Review).

[10] FDA. (2022). Proposed Regulatory Framework for Modifications to AI/ML‑Based SaMD.

Appendix A: Reproducible Pipeline (Pseudocode)

# Appendix A sketch: one leakage-safe pipeline per candidate model.
# Preprocessing is fit inside each CV fold via the Pipeline; the tuned
# model is then calibrated on the held-out validation split.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

models = {
    'lr': LogisticRegression(penalty='l2', class_weight='balanced', max_iter=1000),
    'svm': SVC(kernel='rbf', probability=True, class_weight='balanced'),
    'rf': RandomForestClassifier(n_estimators=500),
    'xgb': XGBClassifier(tree_method='hist', early_stopping_rounds=50),
    'mlp': MLPClassifier(hidden_layer_sizes=(64, 32), alpha=1e-4),
}

for name, model in models.items():
    pipe = Pipeline([
        ('pre', ColumnTransformer(...)),    # imputation/scaling/encoding per column type
        ('clf', model),
    ])
    tuned = RandomizedSearchCV(pipe, param_distributions=..., cv=5,
                               scoring='roc_auc', random_state=42)
    tuned.fit(X_train, y_train)             # note: XGBoost early stopping needs an eval_set

    # Calibrate the tuned model on the held-out validation split;
    # cv='prefit' avoids refitting the estimator on validation data.
    calibrated = CalibratedClassifierCV(tuned.best_estimator_,
                                        method='isotonic', cv='prefit')
    calibrated.fit(X_val, y_val)
    save_model(calibrated, registry)        # project helper: persist to the model registry

best = select_by_auc_and_ece(registry)      # select by AUROC and calibration error
evaluate(best, X_test, y_test)              # final report on the untouched test set
explain_with_shap(best, X_test)             # SHAP global/local explanations

Appendix B: Risk Communication Templates


• Screening: “Patient risk for diabetes within 3 years: 23% (95% CI 18–28%). Primary drivers: BMI,
fasting glucose, age. Consider HbA1c and lifestyle counseling.”
• Triage: “High‑risk CVD: 71% probability at ED intake. Recommend ECG and cardiology consult.”

Appendix C: Sample Tables & Figures (placeholders)


• Table 1: Dataset statistics (n, missingness, class balance).
• Figure 1: ROC curves per task (test set).
• Figure 2: Reliability curves before/after calibration.
• Figure 3: SHAP summary plot (top 10 features).

Appendix D: Compliance Checklist


• IRB approval ID: <ID>
• Data sharing agreement signed: <Yes/No>
• DPIA done: <Yes/No>
• Model card completed: <Link/ID>
• MLOps monitoring configured: <Yes/No>
