Personalized Medicine Recommendation System
by
Akshit Modi
Imon Banerjee
Background:
The diversity of patients' clinical conditions, procedures, allergies, and lifestyle factors makes the task of prescribing appropriate medications a significant challenge.
Objective:
This project aimed to develop a personalized medication
recommendation system using the CatBoost algorithm to predict appropriate medications from structured patient data.
Methods:
Structured synthetic patient records from the Synthea dataset were used, with features
including age, gender, ethnicity, marital status, city, weight, height, smoking habits,
conditions, procedures, and allergies. Multi-label categorical
variables were encoded using techniques such as multi-label binarization and label
encoding, and a CatBoost multi-label classifier was trained on the resulting features. A web
application was also built using HTML/CSS for the frontend and Flask for the backend.
Results:
The model was evaluated using Micro F1 score, Hamming loss, and exact subset accuracy. The system was successfully
integrated into a user-friendly interface that accepted patient inputs and returned
medication predictions, which were partially or fully aligned with ground truth
medication data.
Conclusion:
This framework demonstrates the effectiveness of using CatBoost for personalized medication
recommendation, yielding a high-performance model. The web interface enhances usability and enables potential integration into
clinical decision support systems, contributing to the broader goal of individualized healthcare
delivery.
BACKGROUND
Prescribing the right medication for each patient remains a
significant challenge. Traditional prescribing practices often neglect critical nuances such as
prior procedures, allergies, smoking habits, and concurrent conditions—factors that directly
influence drug efficacy and safety. According to the American Medical Association (AMA),
adverse drug events account for nearly 5% of hospital admissions and contribute to increased
healthcare costs, patient morbidity, and even mortality (AMA Citation #1). These realities
underscore the pressing need for intelligent systems that can support safer and more effective
prescribing.
Advancements in machine learning (ML), especially tree-based models like CatBoost, offer
a practical way to capture complex interactions between patient variables. Unlike deep learning models that require extensive
unstructured text processing, CatBoost excels in scenarios with rich structured data, such as
patient demographics, procedure codes, condition codes, medication history, and vital statistics.
[5]
In this project, we utilize structured synthetic patient data from the Synthea™ dataset to
simulate clinical variability. A CatBoost multi-label classifier was trained using features such
as age, gender, race, allergies, height, weight, conditions, procedures, encounters, and
geographical data. This model was then integrated into a Flask-based web application, enabling
real-time medication prediction for new patient inputs. To ensure scalability, the application also
logs user inputs and predictions for future model retraining.
The impact of this work lies in its ability to improve clinical decision-making through transparent,
interpretable predictions that reduce prescribing errors and optimize therapeutic outcomes. It
supports the broader goal of advancing clinical decision support systems (CDSS) in biomedical informatics.
METHODS
This project utilized the Synthea synthetic health record dataset, a publicly available simulation
of realistic yet fully de-identified patient records. Synthea was selected for its ability to generate clinically plausible patient histories without privacy constraints.
Approximately 10,000 synthetic patient records were extracted for this study. Each record
contained structured features such as age, gender, height, weight, marital status, race, and
ethnicity, as well as multi-label categorical data including condition codes, procedure codes,
medication codes, allergy codes, and encounter types. This dataset provided a rich foundation for multi-label medication prediction.
Data preprocessing involved several key steps to standardize and clean the dataset. For
numerical features (e.g., age, height, weight), missing values were imputed using median
values. For categorical fields like gender, marital status, or race, missing values were filled with
a default category such as “Unknown.” Multi-label fields (e.g., condition codes) were converted
from stringified lists to actual Python lists using safe evaluation, followed by multi-hot encoding.
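A minimal sketch of these preprocessing steps, assuming pandas and scikit-learn; the column names and sample values below are illustrative, not the project's exact schema:

```python
import ast
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative records; column names are assumptions, not the real field names.
df = pd.DataFrame({
    "age": [45, None, 67],
    "gender": ["F", None, "M"],
    "condition_codes": ["['59621000', '44054006']", "['59621000']", "[]"],
})

# Numeric features: impute missing values with the median.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical features: fill missing values with a default "Unknown" category.
df["gender"] = df["gender"].fillna("Unknown")

# Multi-label fields: safe evaluation of stringified lists, then multi-hot encoding.
df["condition_codes"] = df["condition_codes"].apply(ast.literal_eval)
mlb = MultiLabelBinarizer()
condition_matrix = mlb.fit_transform(df["condition_codes"])
# condition_matrix has one binary column per distinct condition code.
```

`ast.literal_eval` is used rather than `eval` so that only Python literals (lists, strings, numbers) can be parsed, which is what "safe evaluation" refers to here.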
Given the high dimensionality resulting from multi-hot encoding across multiple features,
Truncated Singular Value Decomposition (SVD) was used to reduce the feature space while preserving as much
variance as possible, making the model more memory-efficient and improving computational
performance.[9]
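The dimensionality-reduction step can be sketched with scikit-learn's TruncatedSVD; the matrix sizes and component count below are illustrative stand-ins, since the section does not state the project's actual dimensions:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Stand-in for the sparse multi-hot feature matrix: 1000 patients x 500 binary columns.
X = (rng.random((1000, 500)) < 0.05).astype(float)

# Project onto 50 components, keeping as much variance as possible.
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X)

retained = svd.explained_variance_ratio_.sum()  # fraction of variance retained
```

TruncatedSVD, unlike PCA, works directly on sparse matrices without centering, which is why it suits large multi-hot feature spaces.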
The target variable—Medication Codes—was also treated as a multi-label output using multi-
label binarization. The model was trained using CatBoostClassifier, a gradient boosting
algorithm that performs well with categorical data and handles class imbalance efficiently.
(Figure 1 provides a high-level overview of the complete medication recommendation pipeline.
Starting from data extraction and preprocessing, the flow progresses through dimensionality
reduction, model training using CatBoost, and culminates in web-based deployment for real-time
prediction.)
During model development, hyperparameter tuning was conducted to optimize metrics such as F1 score and Hamming loss.
Once trained, the CatBoost model and preprocessing artifacts (encoders, SVD transformers)
were saved using Joblib for deployment. A Flask-based web application was developed for
real-time usage.
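Artifact persistence with Joblib might look like the following sketch; the file name is illustrative, and a small encoder stands in for the full set of saved artifacts:

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import MultiLabelBinarizer

# Fit a small stand-in encoder; in the real pipeline this step would persist
# the label binarizers, the SVD transformer, and the trained CatBoost model.
mlb = MultiLabelBinarizer()
mlb.fit([["861467", "387544"], ["861467"]])

artifact_dir = tempfile.mkdtemp()
path = os.path.join(artifact_dir, "mlb.joblib")
joblib.dump(mlb, path)        # saved once at training time

restored = joblib.load(path)  # loaded once at Flask app startup
```

Loading the artifacts once at startup, rather than per request, keeps per-prediction latency low in the web application.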
(Figure 2 illustrates the real-time medication prediction workflow from the user
interface to result storage. After a user submits the form, the Flask server receives
the input, which is then preprocessed in alignment with the training pipeline. The
model then generates predictions. These predictions are returned to the frontend and also saved to a CSV
file for future retraining.)
User inputs submitted through the web form are sent to the Flask backend. The backend performs preprocessing consistent with the training pipeline,
applies the CatBoost model, and returns predicted medication codes to the user interface.
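This request flow could be sketched as a minimal Flask endpoint; the route name, field names, and the stub predictor below are hypothetical, whereas the real application wires in the saved CatBoost model and transformers:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_medication_codes(patient):
    # Stub standing in for: preprocess -> multi-hot encode -> SVD ->
    # CatBoost predict -> inverse label binarization to medication codes.
    return ["861467"] if patient.get("condition_codes") else []

@app.route("/predict", methods=["POST"])
def predict():
    patient = request.get_json()
    codes = predict_medication_codes(patient)
    # The deployed app also appends inputs and predictions to a CSV
    # for future retraining.
    return jsonify({"predicted_medication_codes": codes})
```

A client would POST JSON such as {"condition_codes": ["59621000"]} and receive the predicted code list back as JSON.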
Model evaluation was carried out using standard multi-label metrics: Micro F1 Score,
Hamming Loss, and Subset Accuracy (Exact Match). These metrics were computed on the
validation dataset to quantify the model’s predictive performance. Additionally, sample outputs
were manually reviewed to verify clinical plausibility, offering a qualitative validation of the system's outputs.
In summary, the system combines structured EHR data, interpretable machine learning, and a web-based interface into an end-to-end medication recommendation pipeline.
RESULTS
Training Performance
The training phase was monitored over 600 iterations. Both training and validation
loss curves showed a consistent downward trend, indicating effective learning without significant overfitting.
Evaluation Metrics
Final evaluation on the held-out validation set yielded the following performance:
The Micro F1 Score of 0.851 indicates the model effectively balances precision and recall across all labels.
A Hamming Loss of roughly 1% reflects very few incorrect label assignments relative to the total number of candidate labels.
Subset Accuracy of 62.3% shows the model predicted the entire correct set of
medications for over 60% of patients—a significant achievement for a problem with large
label spaces.
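These three metrics can be reproduced with scikit-learn on any multi-hot prediction matrix; the toy arrays below are illustrative and are not the project's results:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Toy multi-hot matrices: 2 patients x 3 candidate medications.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

micro_f1 = f1_score(y_true, y_pred, average="micro")  # pooled TP/FP/FN over all cells
h_loss = hamming_loss(y_true, y_pred)                 # fraction of wrong cells: 1 of 6
subset_acc = accuracy_score(y_true, y_pred)           # rows matched exactly: 1 of 2
```

For multi-label targets, scikit-learn's `accuracy_score` is exactly the subset (exact match) accuracy reported above, which is why it is the strictest of the three metrics.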
To ensure robust and interoperable evaluation, both input and output were encoded using
SNOMED CT codes. This structure enables the model to be integrated directly into electronic
health record (EHR) systems and evaluated at the clinical code level.
(Table: for sample patients, an Input Summary of structured features is compared against the True Medication Codes and Predicted Medication Codes; a sample row begins Procedure: [80146002]….)
In cases where exact matches were not observed, the model still produced therapeutically
valid alternatives. For instance, substituting “387544” for “861467” in a cardiovascular scenario
still aligns with clinical treatment guidelines, indicating that the model recognizes pharmacologic equivalence.
(Figure 4: Patient information entry form for collecting geographic and demographic data)
Patient data entry is handled through the web application. The interface allows users to input structured patient data, such as age, gender,
condition codes, procedure codes, and allergy codes, through an intuitive HTML form.
Upon submission, preprocessing steps are executed, including SNOMED CT code encoding and SVD-based dimensionality reduction.
After generating code-based predictions, the system uses the RxNorm API to convert those
codes into human-readable medication names. This enhances interpretability for clinicians
and end users by bridging machine-readable predictions with standard drug terminology.
The application also:
Logs user inputs and predictions into a CSV file for future model retraining.
Enables a full pipeline from data ingestion → model inference → clinical output.
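Resolving an RxNorm code to a display name uses the RxNav REST service. The sketch below builds the endpoint URL and parses a response of the documented shape; the sample payload and its medication name are made up for illustration, and no network call is made:

```python
def rxnorm_properties_url(rxcui: str) -> str:
    # RxNav endpoint returning basic properties for an RxNorm concept
    # (the live call is not exercised here).
    return f"https://rxnav.nlm.nih.gov/REST/rxcui/{rxcui}/properties.json"

def parse_medication_name(payload: dict) -> str:
    # Extract the display name from a properties.json-style response,
    # falling back to a placeholder when the concept is not found.
    return payload.get("properties", {}).get("name", "Unknown medication")

# Illustrative payload shaped like the service's response (name is invented).
sample = {"properties": {"rxcui": "861467", "name": "Example Drug 500 MG Tablet"}}
name = parse_medication_name(sample)
```

In the live application, the payload would come from an HTTP GET against `rxnorm_properties_url(code)` for each predicted code.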
This seamless integration of machine learning inference with clinical terminology services
(RxNorm) makes the system practical for real-world deployment in electronic health record (EHR) environments.
DISCUSSION
The results obtained from this project strongly validate the initial objective of developing an accurate, deployable personalized medication recommendation system.
The model achieved a Micro F1 Score of 0.851, Hamming Loss of 0.0105, and Exact Match
Accuracy of 62.3% on the held-out validation set, demonstrating its effectiveness in predicting appropriate medication sets for individual patients.
The success of the system can be attributed to several key design choices:
The use of the Synthea synthetic dataset provided a rich yet de-identified source of
data that allowed extensive model training without ethical concerns related to patient privacy.
The feature pipeline, combining multi-label binarization
and SVD dimensionality reduction, allowed efficient learning across a large number of sparse categorical inputs.
Integration of the RxNorm API post-prediction enabled conversion of coded outputs into real medication names for end
users.
Moreover, qualitative inspection of individual model outputs revealed that even when exact
ground truth matches were not achieved, the generated medication recommendations were
often clinically appropriate, indicating the model’s understanding of therapeutic context and pharmacologic relationships.
Limitations
While the system shows strong predictive capabilities, there are notable limitations that must
be acknowledged:
Incomplete Predictions: In some cases, the model fails to return any medication
predictions for a valid patient input. This could be due to data sparsity or underrepresented label combinations.
Coded Input Requirement: The system expects inputs in SNOMED CT-coded
format, which may not always be available in real-world clinical settings without prior
mapping or preprocessing.
Interpretability: While RxNorm resolves codes into medication names, deeper insights
(e.g., why a drug was predicted) remain opaque unless paired with explainability
techniques.
These limitations, while not detrimental to the core system, underscore the need for continuous refinement.
Future Directions
Several enhancements could further increase the utility and reliability of the system:
Expand Input Variables: Incorporate lab results, allergies, vitals, and genetic markers.
Real Clinical Validation: Collaborate with healthcare institutions under IRB approval.
Model Explainability: Use tools like SHAP or attention maps to explain decisions.
Label Coverage Optimization: Improve medication code recall for rare or multi-combination therapies.
CONCLUSION
In conclusion, this project demonstrates the feasibility and utility of integrating structured
patient data with machine learning to build a personalized medication recommendation
system. The pipeline leverages SNOMED CT-coded inputs and resolves RxNorm-coded
outputs into interpretable medication names, creating a seamless bridge between clinical
coding standards and human-readable results.
The system achieved strong performance across standard metrics, including a Micro F1 Score
of 0.851, low Hamming Loss, and practical exact match accuracy, validating its effectiveness for multi-label medication prediction.
By combining robust machine learning with real-world standards like RxNorm and SNOMED
CT, the system moves beyond experimental modeling to offer a viable clinical decision support
tool. Its deployment via a Flask web application further enhances accessibility, enabling real-time use.
This work represents a significant step forward in the application of biomedical informatics
methodologies to personalized medicine, and lays the groundwork for future research and clinical integration.
6. Steindel SJ. SNOMED CT: Standardizing clinical phrases. MLO Med Lab Obs.
2010;42(3):26-28.
8. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare:
Review, opportunities and challenges. Brief Bioinform. 2017;19(6):1236-1246.
doi:10.1093/bib/bbx044
9. Ribeiro MT, Singh S, Guestrin C. "Why Should I Trust You?": Explaining the
predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining. ACM;
2016:1135-1144. doi:10.1145/2939672.2939778
11. Shortliffe EH, Sepúlveda MJ. Clinical decision support in the era of artificial
intelligence. JAMA. 2018;320(21):2199-2200. doi:10.1001/jama.2018.17163