Data Mining in Healthcare
Nguyen Dao Thuy Tien - 9517
Phung Nguyen Anh Khoa - 9588
AGENDA
• Data mining in general
• Data mining in Healthcare
• Classification in data mining
• Naïve Bayes algorithm
• Demo
INTRODUCTION TO DATA MINING
• Data mining is the process of identifying hidden patterns and
establishing relationships in large databases in order to solve practical problems.
• Both business and science benefit from data mining.
MAIN TASKS OF DATA MINING
• Anomaly detection (change/deviation detection)
• Dependency modelling
• Clustering
• Classification
• Regression
• Summarization
DATA MINING TECHNIQUES
• Association Rule
• Classification
• Clustering
• Prediction
• Sequential patterns
• Decision Tree
• Neural Networks
DATA MINING PROCESS
DATA MINING IN HEALTHCARE
• Healthcare covers the detailed processes of the diagnosis, treatment
and prevention of disease, injury and other physical and mental
impairments in humans.
• The number of electronic health records (EHRs) has increased
quickly.
• Data mining can help the healthcare sector.
• Benefits of applying data mining in healthcare:
• Improving healthcare quality
• Supporting health insurance
• Helping healthcare administration
IMPROVING HEALTHCARE QUALITY
• Reducing many errors and issues that lead to unnecessary deaths.
• Finding patterns and anomalies more effectively.
• Discovering the unreported side effects of common drugs.
• Identifying disease by considering lifestyle patterns.
SUPPORTING HEALTH INSURANCE
• Detecting fraudulent and abusive behavior.
• Attracting new consumers as well as keeping current ones.
• Helping insurance companies understand health insurance plans
better.
HELPING HEALTHCARE ADMINISTRATION
• Making a profit.
• Managing resources.
• Minimizing costs.
• Maintaining good customer service.
PROBLEMS OF DATA MINING IN HEALTHCARE
• Healthcare data mining can be limited by the accessibility of data.
• Data in healthcare has quality problems.
• Legal and social issues, such as data ownership and privacy issues
related to healthcare data.
• Advanced knowledge is required to gain optimal results.
TRENDS IN THE FUTURE
• More specific, reasonably priced applications will be developed for the
healthcare industry.
• Electronic medical records will replace traditional paper-based
records.
• The healthcare system will be real-time and deployed in the cloud.
Data Mining
Classification
• K-nearest neighbor (kNN)
• Decision trees
• Naïve Bayes
• Logistic regression
• Neural networks (NN)
• Support vector machines (SVM)
• Linear regression (LR)
Naïve Bayes
ADVANTAGES DISADVANTAGES
• Fast • Strong assumption
• Scalable • Scarcity
• Requires big data
• Variety
• Independent predictors
• Intelligent
CHARACTERISTICS
• Formatted
• Adequation
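The "strong assumption" of Naïve Bayes is that predictors are independent given the class, so the score of a class is its prior multiplied by one probability per feature. A minimal Python sketch of this idea follows. It is not the implementation used in the project. The toy records, the feature meanings (smoker, BMI band) and the function names `train_naive_bayes` and `predict` are hypothetical, and Laplace smoothing (`alpha`) is an added assumption that keeps rare or unseen values from producing a zero probability.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for Laplace smoothing.
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Independence assumption: each feature contributes its own
            # smoothed conditional probability, added in log space.
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (smoker, BMI band) -> risk level.
rows = [
    ("yes", "high"), ("yes", "normal"), ("no", "normal"),
    ("no", "high"), ("yes", "high"), ("no", "normal"),
]
labels = ["high", "high", "low", "low", "high", "low"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("yes", "high")))
print(predict(model, ("no", "normal")))
```

Working in log space avoids numeric underflow when many features are multiplied together. Training is a single counting pass over the data, which is one reason the method is fast and scalable.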
REAL WORLD DATA PREPARATION
• Discretization
• Cleaning
• Integration
• Transformation
• Reduction
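Two of the steps above, cleaning and transformation, can be sketched in a few lines of Python. This is an illustrative sketch only. The records, the field names `age` and `systolic_bp`, and the choice of min-max scaling as the transformation are assumptions, not details taken from the project.

```python
def clean(records):
    """Cleaning: drop records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_scale(values):
    """Transformation: rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical patient records; the second one has a missing age.
records = [
    {"age": 40, "systolic_bp": 120},
    {"age": None, "systolic_bp": 135},
    {"age": 60, "systolic_bp": 160},
    {"age": 50, "systolic_bp": 140},
]

cleaned = clean(records)
scaled_bp = min_max_scale([r["systolic_bp"] for r in cleaned])
print(len(cleaned), scaled_bp)
```

Dropping incomplete records is the simplest cleaning strategy. Filling in missing values is an alternative when data is scarce.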
APPLIED IN PROJECT
Implementation
LIBRARIES
• Naivebayes
• Dplyr
• Ggplot
• Psych
ALGORITHMS
• Naïve Bayes
• Equal-width Binning
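Equal-width binning discretizes a numeric attribute by splitting its range into intervals of the same width. The sketch below is written in Python for illustration and does not use the R libraries listed above. The function name `equal_width_bins`, the sample ages and the choice of three bins are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each number to a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would land in bin n_bins, so clamp it
        # into the last valid bin.
        bins.append(min(index, n_bins - 1))
    return bins


# Hypothetical ages split into 3 bins of width 20: [20, 40), [40, 60), [60, 80].
ages = [20, 25, 38, 41, 59, 80]
age_bins = equal_width_bins(ages, 3)
print(age_bins)
```

The resulting bin indices can be treated as categorical values, which is the form a count-based Naïve Bayes classifier expects.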
DATA PROCESSING
• Defining
• Shrinking
• Calculating
SKETCHING
APPLY NAÏVE BAYES
THANK YOU FOR LISTENING!