Building AI Models in Healthcare
Analytics Model in Healthcare/Pharmaceutical Industry
Pankaj Baid
Compliance in healthcare is crucial given the potential impact on patient safety and organizational viability (1/2)
Understanding the Regulatory Landscape
▪ Comprehensive mapping of applicable regulations and standards, such as HIPAA for patient data privacy, FDA guidelines for drug approval, and GxP for quality control.
▪ Analyze the implications of non-compliance, including financial penalties, legal actions, and damage to reputation.
▪ Example: Use of HIPAA violation reports to identify common areas of non-compliance.
Data Collection and Integration
▪ Identifying and integrating diverse data sources, including audit findings, adverse event reports, and regulatory updates.
▪ Challenges: Data quality issues, disparate data formats, and data security concerns.
▪ Example: Integrating electronic health record (EHR) audit logs as a data source to monitor unauthorized access attempts.
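As a minimal sketch of the EHR audit-log example, the snippet below counts denied access attempts per user and flags users at or above a threshold. The record fields (`user`, `patient_id`, `outcome`), the `"DENIED"` value, and the threshold of 3 are assumptions for illustration, not a real EHR schema.

```python
from collections import Counter

# Hypothetical audit-log records; field names and values are illustrative only.
AUDIT_LOG = [
    {"user": "nurse_a", "patient_id": "P1", "outcome": "ALLOWED"},
    {"user": "clerk_b", "patient_id": "P2", "outcome": "DENIED"},
    {"user": "clerk_b", "patient_id": "P3", "outcome": "DENIED"},
    {"user": "clerk_b", "patient_id": "P4", "outcome": "DENIED"},
    {"user": "dr_c", "patient_id": "P1", "outcome": "DENIED"},
]


def flag_users(log, max_denied=3):
    """Return users whose denied-access count reaches max_denied (assumed threshold)."""
    denied = Counter(rec["user"] for rec in log if rec["outcome"] == "DENIED")
    return sorted(user for user, n in denied.items() if n >= max_denied)


flagged = flag_users(AUDIT_LOG)
print(flagged)
```

The per-user count could then feed the compliance model as one input alongside audit findings and adverse event reports.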
Feature Selection and Engineering
▪ Selection of relevant features that influence compliance outcomes, such as patterns of prescribing, reporting timelines, and training adherence.
▪ Example: Analyzing drug prescription patterns to identify deviations potentially indicative of non-compliant behaviors.
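One possible way to turn prescribing patterns into a feature is a z-score of each prescriber's rate against the peer group. The sketch below assumes a hypothetical per-prescriber rate for a monitored drug and an arbitrary review cut-off of 1.5 standard deviations; neither comes from the slides.

```python
from statistics import mean, pstdev

# Hypothetical share of prescriptions for a monitored drug, per prescriber.
PRESCRIBING_RATE = {
    "dr_a": 0.10, "dr_b": 0.12, "dr_c": 0.11, "dr_d": 0.09, "dr_e": 0.45,
}


def deviation_scores(rates):
    """Z-score of each prescriber's rate relative to the peer group."""
    mu = mean(rates.values())
    sigma = pstdev(rates.values())
    if sigma == 0:
        # All prescribers identical: no deviation to report.
        return {name: 0.0 for name in rates}
    return {name: (rate - mu) / sigma for name, rate in rates.items()}


scores = deviation_scores(PRESCRIBING_RATE)
# Assumed review cut-off: more than 1.5 standard deviations above peers.
outliers = sorted(name for name, z in scores.items() if z > 1.5)
print(outliers)
```

A high score is only a prompt for review; specialty or case mix may explain the deviation.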
Model Development
▪ Employing statistical models and machine learning algorithms to predict risk areas and compliance trends.
▪ Example: Using anomaly detection algorithms to identify outliers in clinical trial data reporting, suggesting possible areas of non-compliance.
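The slides do not name a specific anomaly detection algorithm. As one simple option, the sketch below uses the modified z-score (median and median absolute deviation) on hypothetical per-site reporting delays, with 3.5 as a commonly used cut-off. Site names and values are invented for illustration.

```python
from statistics import median

# Hypothetical reporting delays (days from event to report), one value per trial site.
REPORTING_DELAY_DAYS = {
    "site_01": 2, "site_02": 3, "site_03": 2,
    "site_04": 4, "site_05": 3, "site_06": 21,
}


def modified_z_scores(values):
    """Modified z-score based on the median and median absolute deviation (MAD)."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # No spread around the median: nothing can be scored as an outlier.
        return {name: 0.0 for name in values}
    return {name: 0.6745 * (v - med) / mad for name, v in values.items()}


z = modified_z_scores(REPORTING_DELAY_DAYS)
anomalies = sorted(name for name, score in z.items() if abs(score) > 3.5)
print(anomalies)
```

The median-based score is less distorted by the outlier itself than a mean-based z-score, which matters when only a handful of sites are compared.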
Insights & Visualization
▪ Generating actionable insights through data visualization tools and dashboards to highlight compliance risk areas and trends.
▪ Example: Creating a dashboard to visualize trends in adverse event reporting over time, helping to pinpoint systemic issues or areas needing closer scrutiny.
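Before any dashboard can plot a trend, the reports have to be aggregated over time. The sketch below shows that step only, assuming a hypothetical list of adverse event report dates, and prints a text bar per month in place of a real chart.

```python
from collections import Counter
from datetime import date

# Hypothetical adverse event report dates.
REPORT_DATES = [
    date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 3),
    date(2024, 3, 1), date(2024, 3, 9), date(2024, 3, 28),
]


def monthly_counts(dates):
    """Count reports per calendar month, keyed as 'YYYY-MM' in chronological order."""
    counts = Counter(d.strftime("%Y-%m") for d in dates)
    return dict(sorted(counts.items()))


trend = monthly_counts(REPORT_DATES)
for month, n in trend.items():
    # Text stand-in for a dashboard bar chart.
    print(f"{month} {'#' * n} ({n})")
```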
Continuous Monitoring & Improvement
▪ Establishing a process for ongoing data collection, model updating, and refinement to adapt to changing regulations and business practices.
▪ Example: Regularly updating the model with new drug approval data to reflect current regulatory focus areas and compliance standards.
Conclusion
Developing an analytics model for insights on compliance risks and trends in the healthcare/pharmaceutical industry is a critical step toward ensuring patient safety and regulatory adherence. Through strategic data collection, thoughtful model development, and continuous refinement, organizations can identify potential compliance issues before they escalate, ensuring that they remain at the forefront of quality care and industry standards.
References:
• Industry guidelines from FDA, EMA, and other regulatory bodies.
• HIPAA, GDPR, and other data protection regulations relevant to patient privacy and data security.
• Academic and industry research on compliance and risk management in healthcare and …
SOURCE: Primary Research
Abbreviations: GxP = Good Practice, where the "x" stands for the field, e.g. GMP (Manufacturing), GCP (Clinical), GLP (Laboratory), GDP (Distribution), and GDocP (Documentation)
Healthcare companies can benefit from organizing their automation efforts around these significant elements
▪ Leveraging advanced analytical / machine learning models for automated, personalized decisions across the customer life cycle
▪ Building and deploying advanced analytics and machine learning models at scale
▪ Augmenting advanced analytical models with capabilities [1] to reduce costs, streamline customer journeys, and enhance the overall experience
▪ Building an enterprise-wide digital-marketing engine to translate insights generated in the decision-making layer into a set of coordinated messages delivered through the company’s engagement layer
1 Next-generation technologies such as Natural Language Processing (NLP), facial recognition, blockchain, robotic process automation, and behavioural analytics
2 AA – Advanced Analytics, ML – Machine Learning
Types of Data Analytics in Healthcare
Data Analytics Use Cases in Healthcare
01 Efficient Bed Utilisation
Efficiently managing available beds remains a challenge for hospitals. Data scientists come to the rescue by predicting future bed demand and identifying when occupied beds will become available.
02 Disease Control
Healthcare analytics assists in discovering new therapies, innovative drugs, and technologies by identifying strengths and weaknesses in trials and processes.
Reduced Inpatient …
Healthcare Analytical Use Case: Predicting Hospital Readmission for Diabetes Patients
Problem Statement: Hospital readmission among diabetes patients presents significant challenges for healthcare systems,
affecting patient outcomes and healthcare costs. The goal is to develop an analytics model that predicts the likelihood of
readmission for diabetes patients, enabling targeted interventions to reduce readmission rates.
Descriptive Analytics
▪ Initial data exploration involved analyzing patient demographics, treatment protocols, hospital stay details, readmission rates, and post-discharge outcomes. For example, we observed that patients with longer initial hospital stays and those not enrolled in a post-discharge follow-up program had higher readmission rates.
Model Development & Trials
▪ We experimented with several predictive models, including Logistic Regression and Random Forest, to predict readmissions. Logistic Regression provided a solid baseline with interpretable results, while Random Forest offered improved accuracy but at the cost of interpretability.
Feature Selection and Model Insights
▪ Feature importance analysis revealed that the length of the initial hospital stay, HbA1c levels at discharge, number of hospital visits in the past year, and engagement in a post-discharge follow-up program were critical predictors of readmission. The Logistic Regression coefficients for these features were directly interpretable and indicated a significant impact on readmission likelihood.
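To show why Logistic Regression coefficients are easy to explain, the sketch below scores one patient and converts each coefficient into an odds ratio. The intercept and coefficients are made-up placeholders, not the fitted values from this use case, and the feature names are hypothetical.

```python
import math

# Made-up values for illustration only; NOT the fitted model from this use case.
INTERCEPT = -3.0
COEFFICIENTS = {
    "length_of_stay_days": 0.10,
    "hba1c_at_discharge": 0.30,
    "visits_past_year": 0.20,
    "in_followup_program": -0.80,
}


def readmission_probability(patient):
    """Logistic model: sigmoid of the intercept plus the weighted feature sum."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratios():
    """exp(coefficient): multiplicative change in odds per one-unit feature increase."""
    return {name: math.exp(b) for name, b in COEFFICIENTS.items()}


patient = {
    "length_of_stay_days": 5,
    "hba1c_at_discharge": 9.0,
    "visits_past_year": 2,
    "in_followup_program": 0,
}
enrolled = dict(patient, in_followup_program=1)  # same patient, enrolled in follow-up

print(round(readmission_probability(patient), 3))
print(round(readmission_probability(enrolled), 3))
print({name: round(ratio, 2) for name, ratio in odds_ratios().items()})
```

An odds ratio below 1 (here the follow-up program) means the feature lowers the odds of readmission under these assumed coefficients; above 1 means it raises them.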
Results
▪ The Logistic Regression model achieved an accuracy of 78% and a precision of 75%, with an AUC (Area Under the ROC Curve) of 0.82, indicating good discrimination between readmitted and non-readmitted patients across classification thresholds. The model's explainability was particularly high for the identified key features, with HbA1c levels and engagement in follow-up programs showing the strongest predictive power for readmission.
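For readers unfamiliar with the reported metrics, the sketch below computes accuracy, precision, and AUC from scratch on a six-row toy example. The labels, scores, and the 0.5 decision threshold are invented and do not reproduce the figures above.

```python
def accuracy_precision(y_true, y_pred):
    """Accuracy = share of correct predictions; precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = readmitted, 0 = not readmitted; scores are predicted probabilities.
Y_TRUE = [1, 0, 1, 0, 1, 0]
SCORES = [0.9, 0.4, 0.6, 0.7, 0.3, 0.2]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # assumed 0.5 threshold

acc, prec = accuracy_precision(Y_TRUE, Y_PRED)
print(round(acc, 3), round(prec, 3), round(auc(Y_TRUE, SCORES), 3))
```

Accuracy and precision depend on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is why the three are usually reported together.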
Actionable Insights
▪ Targeted Interventions: Patients with high HbA1c levels at discharge and those with longer hospital stays should be prioritized for post-discharge follow-up programs.
▪ Program Development: Developing personalized patient education and follow-up programs focusing on managing HbA1c levels and recognizing warning signs of potential complications.
▪ Resource Allocation: Allocating resources more efficiently by identifying patients at higher risk of readmission for targeted interventions, potentially reducing overall readmission rates and associated healthcare costs.
Appendix
GMP, GLP and GCP
Good Pharmacovigilance Practice (GPvP) - Standards for the activities relating to the detection, assessment,
understanding, and prevention of adverse effects or any other medicine-related problem to promote the safe and
effective use of medicinal products, particularly through providing timely information about the safety of
medicinal products to patients, healthcare professionals, and the public
Banking Analytical Use Case: Detecting Financial Crime through Transaction Monitoring
Problem Statement: In the banking industry, detecting and preventing financial crimes such as money laundering and fraud are
critical challenges. The objective of this analytical model is to identify suspicious transactions that could indicate financial crime,
thereby enabling the bank to take timely preventive actions.
Descriptive Analytics
▪ The initial phase involved analyzing historical transaction data, including transaction amounts, frequency, account types, geographic locations, and previously flagged transactions for suspicious activity. Patterns such as high-frequency transactions in short intervals, transactions just below reporting thresholds, and transactions involving high-risk countries were identified as potential indicators of financial crime.
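One of the patterns above, repeated transactions just below a reporting threshold, can be sketched as a simple count per account. The threshold of 10,000, the 10% "just below" margin, and the minimum of three hits are assumptions; real values depend on jurisdiction and bank policy.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 10_000  # assumed; actual thresholds depend on jurisdiction
NEAR_MARGIN = 0.10            # assumed: "just below" = within 10% under the threshold
MIN_HITS = 3                  # assumed number of near-threshold transactions to flag

# Hypothetical transactions as (account, amount) pairs.
TRANSACTIONS = [
    ("acct_1", 9_500), ("acct_1", 9_800), ("acct_1", 9_900),
    ("acct_2", 12_000), ("acct_2", 300),
    ("acct_3", 9_200),
]


def near_threshold_counts(transactions):
    """Count, per account, transactions in [threshold * (1 - margin), threshold)."""
    lower = REPORTING_THRESHOLD * (1 - NEAR_MARGIN)
    counts = defaultdict(int)
    for account, amount in transactions:
        if lower <= amount < REPORTING_THRESHOLD:
            counts[account] += 1
    return dict(counts)


counts = near_threshold_counts(TRANSACTIONS)
suspicious = sorted(acct for acct, n in counts.items() if n >= MIN_HITS)
print(counts, suspicious)
```

In practice the count would be taken over a rolling time window; the window is omitted here to keep the sketch short.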
Model Development & Trials
▪ To detect suspicious transactions, we evaluated various machine learning models, including Logistic Regression and Gradient Boosting Machines (GBM). Logistic Regression offered an interpretable model framework, making it easier to understand feature influences. In contrast, GBM provided higher accuracy and better performance in handling non-linear relationships and interactions between features.
Feature Selection and Model Insights
▪ Feature importance analysis from both models highlighted that transaction amount, frequency of transactions above a certain threshold within a short period, and transactions involving recently created accounts were key predictors of suspicious activity. Logistic Regression provided clear insights into how each feature influenced the likelihood of a transaction being flagged as suspicious, with high-value transactions and those to/from high-risk countries having the most substantial weights.
Results
▪ The Gradient Boosting Machine model achieved superior performance, with an accuracy of 85% and a precision of 88%. However, for the sake of regulatory compliance and explainability to non-technical stakeholders, the Logistic Regression model, with an accuracy of 80% and precision of 83%, was chosen. Its explainability regarding feature importance was crucial for actionable insights and for justifying decisions to regulatory bodies.
Actionable Insights
▪ Enhanced Monitoring: Transactions just below reporting thresholds and those involving newly created accounts should be subject to enhanced monitoring.
▪ Risk Scoring: Implementing a dynamic risk scoring system based on key features like transaction amount, account age, and geographic location to prioritize investigations.
▪ Policy Updates: Revising transaction monitoring policies to include findings from the model, such as adjusting thresholds for certain types of transactions and enhancing scrutiny of transactions involving high-risk countries.
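The risk scoring idea above could look roughly like the additive score below, built from transaction amount, account age, and geographic location. The weights (40/30/30), the 30-day "new account" cut-off, and the country codes are placeholders, not values from the model.

```python
HIGH_RISK_COUNTRIES = {"XX", "YY"}  # placeholder codes, not a real list


def risk_score(amount, account_age_days, country):
    """Additive score in [0, 100] built from three assumed components."""
    score = min(amount / 10_000, 1.0) * 40                 # larger amounts score higher, capped at 40
    score += 30 if account_age_days < 30 else 0            # newly created account
    score += 30 if country in HIGH_RISK_COUNTRIES else 0   # high-risk geography
    return round(score, 1)


# Hypothetical alerts awaiting investigation.
alerts = [
    {"id": "t1", "amount": 2_000, "account_age_days": 400, "country": "AA"},
    {"id": "t2", "amount": 9_900, "account_age_days": 10, "country": "XX"},
    {"id": "t3", "amount": 15_000, "account_age_days": 5, "country": "AA"},
]

# Highest score first, so investigators start with the riskiest alert.
queue = sorted(
    alerts,
    key=lambda a: risk_score(a["amount"], a["account_age_days"], a["country"]),
    reverse=True,
)
print([(a["id"], risk_score(a["amount"], a["account_age_days"], a["country"])) for a in queue])
```

A fixed-weight score like this stays explainable to regulators; the weights could later be replaced by coefficients from the chosen Logistic Regression model.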
Pharmaceutical Quality Assurance - In the fast-paced and highly regulated world of
pharmaceuticals, ensuring the highest standards of quality is paramount
SOURCE: https://www.scilife.io/blog/pharma-qa-steps
Data sets: Data types in Electronic Health Records
BMI = body mass index; CBC = Complete Blood Count; DTaP = Diphtheria, Tetanus, & acellular Pertussis; HbA1c = Hemoglobin A1c; HepB = Hepatitis B;
HRA = health risk assessment; IPV = Inactivated poliovirus; PHQ = patient health questionnaire; PRO = Patient Reported Outcome