
Data Mining in Medicine:
Selected Techniques and Applications

Author
Adrian Giurca
[email protected]
Copyright, 2002 © webAI Group, www.datamining.ro
Overview
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
The nature of Medical Data
The rapid, global growth of data requires standards in terminology, vocabularies and formats to support data sharing, standards for interfaces between different sources of data and for the integration of heterogeneous data (including images), and standards in the design of electronic patient records.
The nature of Medical Data
Many environments still lack such
standards, which hinders the use of data
analysis tools on large global databases,
limiting their applications to datasets
collected for specific diagnostic,
screening, prognostic, monitoring, therapy
support or other patient management
purposes.
The nature of Medical Data
Patient records collected for diagnosis and
prognosis typically encompass values of
anamnestic, clinical and laboratory
parameters, as well as results of particular
investigations, specific to the given task.
The nature of Medical Data
Such datasets are characterized by:
 incompleteness (missing parameter values),
 incorrectness (systematic or random noise in the data),
 sparseness (few and/or non-representative patient records available),
 inexactness (inappropriate selection of parameters for the given task).
The nature of Medical Data
Datasets collected in monitoring (either acute monitoring of a particular patient in an intensive care unit, or discrete monitoring over long periods of time in the case of patients with chronic diseases) have additional characteristics: they involve measurements of a set of parameters at different times, requiring the temporal component to be taken into account in data analysis.
Selected Medical Data Mining
Techniques
Current trends in medical decision making
show awareness of the need to introduce
formal reasoning, as well as intelligent
data analysis techniques in the extraction
of knowledge, regularities, trends and
representative cases from patient data
stored in medical records.
Selected Medical Data Mining
Techniques
Formal techniques include:
 decision theory
 symbolic reasoning technology
 methods at their intersection, such as
probabilistic belief networks
Selected Medical Data Mining
Techniques
Intelligent data analysis techniques include:
 machine learning
 clustering
 data visualization
 interpretation of time-ordered data (derivation and revision of temporal trends and other forms of temporal data abstraction)
Selected Medical Data Mining
Techniques
Machine learning methods can be classified into three major groups:
 inductive learning of symbolic rules (such as induction of rules, decision trees and logic programs)
 statistical or pattern-recognition methods (such as k-nearest neighbors or instance-based learning, discriminant analysis and Bayesian classifiers)
 artificial neural networks (such as networks with backpropagation learning, Kohonen's self-organizing map and Hopfield's associative memory)
Selected Medical Data Mining
Techniques
Machine learning methods have been applied to
a variety of medical domains in order to improve
medical decision making.
These include diagnostic and prognostic
problems in: oncology, liver pathology,
neuropsychology, gynaecology.
Improved medical diagnosis and prognosis may
be achieved through automatic analysis of
patient data stored in medical records i.e. by
learning from past experiences.
Selected Medical Data Mining
Techniques
Given patient records with corresponding diagnoses, machine learning methods are able to diagnose new cases. More specifically, suppose E is a set of examples with known classifications.
An example is described by the values of a fixed collection of features (attributes) A_i, i = 1, ..., N_at.
Each attribute can either have a finite set of values (discrete) or take real numbers as values (continuous).
An individual example e_j, j = 1, ..., N_ex, is an n-tuple of values v_ik of the attributes A_i. Each example is assigned one of the N_cl possible values of the class variable C (classifications): c_i, i = 1, ..., N_cl.
Selected Medical Data Mining
Techniques
For example, in the domain of early diagnosis of rheumatic diseases, the patient record comprises 16 anamnestic attributes. Some of these are continuous (age, duration of morning stiffness) and some are discrete (e.g. joint pain, which can be arthrotic, arthritic, or not present at all). There are eight possible diagnoses, including:
– degenerative spine diseases
– inflammatory spine diseases
– other inflammatory diseases
– extraarticular rheumatism
– crystal-induced synovitis
– non-specific rheumatic manifestations
– non-rheumatic diseases
Selected Medical Data Mining
Techniques
To classify (diagnose) new cases, machine learning methods can take different approaches; a minimal sketch of each follows this list.
– They can construct explicit symbolic rules that generalize the training cases (rule induction and decision tree induction). The induced rules or decision trees can then be used to classify new cases.
– They can store (some of) the training cases for reference (instance-based learning). New cases can then be classified by comparing them to the reference cases.
– They can compute, for a given case to be classified, the conditional probability of each class according to the Bayes formula and assign the most probable class to the case.
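A minimal sketch of the three approaches using scikit-learn; the toy dataset, feature encoding and diagnostic labels below are invented for illustration only, not taken from a real patient record system:

```python
# One stand-in per approach: symbolic rules (decision tree), instance-based
# learning (k-nearest neighbors), and Bayesian classification.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Each row: (age, morning_stiffness_minutes, joint_pain_code); invented.
X = [[25, 10, 0], [60, 90, 2], [45, 30, 1], [70, 120, 2], [30, 5, 0]]
y = ["non-rheumatic", "inflammatory", "degenerative", "inflammatory",
     "non-rheumatic"]

# 1. Explicit symbolic rules: induce a decision tree and print its rules.
tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["age", "stiffness", "joint_pain"]))

# 2. Instance-based learning: store the training cases, classify by proximity.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# 3. Bayesian classification: assign the most probable class.
nb = GaussianNB().fit(X, y)

new_case = [[55, 60, 2]]
for model in (tree, knn, nb):
    print(type(model).__name__, "->", model.predict(new_case)[0])
```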
How does data mining work?
 While large-scale information technology has been evolving separate transaction
and analytical systems, data mining provides the link between the two. Data mining
software analyzes relationships and patterns in stored transaction data based on
open-ended user queries. Several types of analytical software are available:
statistical, machine learning, and neural networks. Generally, any of four types of
relationships are sought:
 Classes: Stored data is used to locate data in predetermined groups. For example, a
restaurant chain could mine customer purchase data to determine when customers visit
and what they typically order. This information could be used to increase traffic by
having daily specials.
 Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or consumer
affinities.
 Associations: Data can be mined to identify associations. The beer-diaper example is a classic example of associative mining (a small sketch of the measures involved follows this list).
 Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack being
purchased based on a consumer's purchase of sleeping bags and hiking shoes.
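To make the Associations item concrete, the two standard association-mining measures, support and confidence, can be computed directly; the transactions below are invented:

```python
# Support and confidence for the rule
# {sleeping bag, hiking shoes} -> {backpack} over invented transactions.
transactions = [
    {"sleeping bag", "hiking shoes", "backpack"},
    {"sleeping bag", "hiking shoes"},
    {"sleeping bag", "hiking shoes", "backpack"},
    {"tent", "backpack"},
]

antecedent = {"sleeping bag", "hiking shoes"}
consequent = {"backpack"}

n = len(transactions)
support_a = sum(antecedent <= t for t in transactions) / n
support_both = sum(antecedent | consequent <= t for t in transactions) / n
confidence = support_both / support_a

print(f"support = {support_both:.2f}")      # 0.50
print(f"confidence = {confidence:.2f}")     # 0.67: 2/3 of such buyers also buy a backpack
```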
Five major elements:
 Extract, transform, and load transaction data onto the data warehouse system.
 Store and manage the data in a multidimensional database system.
 Provide data access to business analysts and information technology professionals.
 Analyze the data by application software.
 Present the data in a useful format, such as a graph or table.
Different levels of analysis
 Artificial neural networks: Non-linear predictive models that learn through training and resemble
biological neural networks in structure.
 Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation,
and natural selection in a design based on the concepts of natural evolution.
 Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi-square tests to create multi-way splits (a sketch of this contrast follows the list). CART typically requires less data preparation than CHAID.
 Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k >= 1). Sometimes called the k-nearest neighbor technique.
 Rule induction: The extraction of useful if-then rules from data based on statistical significance.
 Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
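A brief sketch of the CART behavior named above, using scikit-learn's DecisionTreeClassifier (which implements a CART-style algorithm); the data is invented, and CHAID has no scikit-learn implementation, so only the 2-way-split side is shown:

```python
# CART-style 2-way splits on invented data: every internal node tests a
# single threshold, so the printed tree is a cascade of binary splits.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[18], [22], [35], [41], [52], [63]]      # e.g. customer age
y = ["no", "no", "yes", "yes", "yes", "no"]   # responded to the offer?

cart = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(export_text(cart, feature_names=["age"]))
# Each "age <= t" test is one 2-way split; CHAID would instead use
# chi-square tests to create multi-way splits at each node.
```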
What technological
infrastructure is required?
 Today, data mining applications are available on systems of all sizes, for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. There are two critical technological drivers:
 Size of the database: the more data being processed and maintained, the more powerful the
system required.
 Query complexity: the more complex the queries and the greater the number of queries
being processed, the more powerful the system required.
 Relational database storage and management technology is adequate for many
data mining applications less than 50 gigabytes. However, this infrastructure
needs to be significantly enhanced to support larger applications. Some vendors
have added extensive indexing capabilities to improve query performance.
Others use new hardware architectures such as Massively Parallel Processors
(MPP) to achieve order-of-magnitude improvements in query time. For example,
MPP systems from NCR link hundreds of high-speed Pentium processors to
achieve performance levels exceeding those of the largest supercomputers.
Software Design: Algorithms
Decision Tree (I)
The Decision Tree exploration engine helps solve the task of classifying cases into multiple categories. Decision Tree is the fastest algorithm when dealing with large numbers of attributes. The Decision Tree report provides an easily interpreted decision tree diagram and a predicted versus real table.
Problems to Solve:
– Classification of cases into multiple categories
Target Attributes:
– Categorical or Boolean (Yes/No) attribute
Output Format:
– Classification statistics
– Predicted versus Real table (confusion matrix; sketched after this slide)
– Decision Tree diagram
Optimal Number of Records:
– Minimum of 100 records
– Maximum of 5,000,000 records
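A hedged sketch of these outputs using scikit-learn stand-ins (the engine itself is not publicly available); a bundled medical dataset plays the role of the cases:

```python
# Classification statistics and the predicted versus real table (confusion
# matrix) for a decision tree, approximated with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = tree.predict(X_test)

print(classification_report(y_test, y_pred))  # classification statistics
print(confusion_matrix(y_test, y_pred))       # predicted versus real table
```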
Decision Tree (II)
Preprocessing Suggested: Summary Statistics - to deselect attributes that contain too many values to provide any useful insight to the exploration engine.

Underlying Algorithms: Information Gain splitting criteria, Shannon information theory and statistical significance tests.

The Data Used: Decision Tree works on data of any type. The DT algorithm is well-poised for analyzing very large databases because it does not require loading all the data into machine main memory simultaneously. The software takes full advantage of this feature by implementing incremental DT learning with the help of the OLE DB for Data Mining mechanism. The DT algorithm's calculation time scales very well (grows only linearly) with an increasing number of data columns. At the same time, it grows more than linearly with the growing number of data records - as N*log(N), where N is the number of records.

Problems to Solve: The Decision Tree algorithm helps solve the task of classifying cases into multiple categories. In many cases, this is the fastest, as well as the most easily interpreted, machine learning algorithm. The DT algorithm provides intuitive rules for solving a great variety of classification tasks, ranging from predicting buyers/non-buyers in database marketing, to automatically diagnosing patients in medicine, to determining customer attrition causes in banking and insurance.

Target Attribute: The target attribute of a Decision Tree exploration must be of a Boolean (yes/no) or categorical data type.

When to Use This Algorithm: The Decision Tree exploration engine is used for tasks such as classifying records or predicting outcomes. You should use decision trees when your goal is to assign your records to a few broad categories. Decision Trees provide easily understood rules that can help you identify the best fields for further exploration.

The Output: The Decision Tree report starts off by giving measures resulting from the decision tree. These measures are the number of non-terminal nodes, the number of leaves, and the depth of the constructed tree. Next, the report provides classification statistics on the decision tree. After these measures, the predicted versus real table is shown.
Cluster Analysis
The Cluster engine is used for the automated detection of clusters of records that lie close to each other, in a certain sense, in the space of all variables. Such clusters may represent different situations or target groups, which one might find beneficial to study separately. The Cluster engine places records corresponding to different clusters in separate datasets for further analysis. Cluster analysis proves to be useful for applications ranging from database marketing to quality control.
The use of all attributes makes the Cluster algorithm very useful for beginning data mining - it is an undirected method, and does not require the selection of a target attribute.
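A minimal sketch of this undirected use, with k-means as a stand-in clustering algorithm; the measurements below are invented:

```python
# Clustering is undirected: only the attributes are used (no target), and
# each record is assigned to a cluster for separate follow-up analysis.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

records = [[63, 140, 1.2], [25, 118, 0.7], [61, 145, 1.3],
           [30, 121, 0.8], [58, 150, 1.1]]   # invented patient measurements

X = StandardScaler().fit_transform(records)  # put attributes on one scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Place records from different clusters in separate datasets, as the
# Cluster engine does.
clusters = {}
for record, label in zip(records, labels):
    clusters.setdefault(label, []).append(record)
print(clusters)
```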
Fuzzy Logic Classification
The algorithm is used for assigning cases to different classes. As output, this exploration engine not only produces a prediction of which class the case belongs to, but also provides the symbolic classification rule generalized automatically from the training examples. The classifier engine furnishes simpler and more reliable results than systems based on a pure decision-tree approach. The prediction accuracy obtained for the testing cases is comparable to the accuracy obtained for the training cases. And again, the statistical significance of the generalized rule is determined rigorously by the classifier engine. Note that the classifier engine can utilize either the SKAT, MLR, or neural network prediction method as its driving mechanism.
Linear Regression
The Stepwise Linear Regression algorithm is, to our knowledge, the only system capable of including categorical variables, in addition to numerical and logical variables, in the regression analysis.
MLR discovers linear relations in data, automatically selecting only those independent variables which influence the target variable most. It also pinpoints redundant, mutually correlated independent variables, and includes only their minimal subset in the results.
Linear Regression is based on a very quick and robust calculation algorithm. As with all other exploration engines, a rigorous determination of the significance of the obtained results is performed for each model considered. MLR is the fastest exploration engine and thus can be used as a complementary preprocessing module for the SKAT exploration engine.
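Whatever one makes of the exclusivity claim, the usual way to include a categorical variable in a linear regression is to encode it as indicator columns; a minimal sketch with invented variable names and data:

```python
# One-hot encode a categorical variable, then fit an ordinary linear
# regression over the expanded columns.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"age": [40, 55, 30, 62, 48],
                   "smoker": ["yes", "no", "no", "yes", "no"]})
y = [150, 130, 115, 160, 128]   # invented target, e.g. systolic pressure

X = pd.get_dummies(df, columns=["smoker"])  # categorical -> indicator columns
model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_)))    # one coefficient per column
```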
Symbolic Knowledge Acquisition
Technology (SKAT)
Data mining is one of the most promising modern information technologies. The corporate world has learned to derive new value from data by utilizing various intelligent tools and algorithms designed for the automated discovery of non-trivial, useful, and previously unknown knowledge in raw data.
Which factors influence the future variation of the price of some security shares?
What characteristics of a potential customer of some service make him/her the most probable buyer?
These and numerous other business questions can be successfully addressed by data mining.
The majority of available data mining tools are based on a few well-established technologies for data analysis. Different knowledge discovery methods are best suited for different applications. Among the useful knowledge discovery tasks one can name dependency detection, numerical prediction, explicit relation modeling, and classification rules.
Despite the usefulness of traditional data mining methods in various situations, we choose to concentrate here first on the problems that plague these methods. Then we discuss the solutions to these problems, which become available with the advent of SKAT - a next-generation data mining technology. We outline the reasons, foundations, and commercial implementations of this emerging approach.
Symbolic Knowledge Acquisition
Technology (SKAT)
 Among the various tasks a data mining system is asked to perform, two questions are encountered most frequently:
– Which database fields influence the selected target field?
– Precisely how does the target field depend on the other fields in the database?
 While there are many successful methods designed to answer the first question, it is far more difficult to answer the second. Why is this? Simply put, an observation that, across a number of cases with close values of all parameters except some parameter X, the target parameter Y varies considerably, implies that Y depends on X. For multi-dimensional dependencies the issue becomes less straightforward, but the basic idea for solving the problem is the same. At the same time, the task of automated determination of an explicit form of the dependence between several variables is significantly more difficult. The solution to this problem cannot be based on similarly simple-minded considerations.
Symbolic Knowledge
Acquisition Technology (SKAT)
Traditional methods for finding the precise form of a sought
relation implement the search for an expression representing the
dependence among possible expressions from some fixed class.
This idea is exploited in many existing data mining applications.
For example, one of the most straightforward and popular
methods of search for simple numerical dependencies - linear
regression - selects a solution out of a set of linear formulae
approximating the sought dependence. Systems from another
popular class of data mining algorithms - decision trees - search
for classification rules represented as trees involving simple
equalities and inequalities in the nodes connected by Boolean
AND and OR operations.
Symbolic Knowledge
Acquisition Technology (SKAT)
However, beyond the limits of the narrow classes of dependencies that can be found by these systems, there is an endless sea of dependencies which cannot even be represented in the language used by these systems. For example, assume you are using a decision tree system to analyze data embodying the following simple rule: "the most frequent buyers of Post cereal are homemakers of age smaller than the inverse square of their family income multiplied by a certain constant". A traditional system has no means to discover such a rule. Only if one explicitly furnishes the system with the parameter "inverse square of the family income" can the stated rule be found by traditional systems (the sketch below illustrates this). In other words, one has to guess an approximate form of the solution first - and then the machine does the rest of the job efficiently. While guessing a general form of the solution prior to automated modeling might be a challenging brain twister, it certainly does not make the life of a corporate data analyst much easier.
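A sketch of this point under invented data and an invented constant C: a shallow tree on the raw fields cannot express age < C / income^2, but one split on the explicitly furnished derived parameter suffices:

```python
from sklearn.tree import DecisionTreeClassifier
import random

random.seed(0)
C = 200_000   # the "certain constant" in the rule above; value invented
ages = [random.uniform(18, 80) for _ in range(400)]
incomes = [random.uniform(20, 120) for _ in range(400)]  # family income, $1000s
labels = ["buyer" if a < C / inc ** 2 else "non-buyer"
          for a, inc in zip(ages, incomes)]

# Raw fields only: the curved boundary age = C / income^2 cannot be
# expressed by axis-parallel splits, so a shallow tree misclassifies.
raw = [[a, inc] for a, inc in zip(ages, incomes)]
tree_raw = DecisionTreeClassifier(max_depth=2).fit(raw, labels)

# Explicitly furnishing the derived parameter makes one split sufficient.
aided = [[a - C / inc ** 2] for a, inc in zip(ages, incomes)]
tree_aided = DecisionTreeClassifier(max_depth=1).fit(aided, labels)

print("raw fields, depth 2:     ", tree_raw.score(raw, labels))
print("derived feature, depth 1:", tree_aided.score(aided, labels))  # 1.0
```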
Symbolic Knowledge
Acquisition Technology (SKAT)
Case Study: Bayesian Classification
Bayesian Classification: Why?

 Probabilistic learning: Calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
 Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
 Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities
 Standard: Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
Bayesian Theorem: Basics
Let X be a data sample whose class label is unknown.
Let H be a hypothesis that X belongs to class C.
For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X.
P(H): prior probability of hypothesis H (i.e. the initial probability before we observe any data; reflects the background knowledge).
P(X): probability that the sample data is observed.
P(X|H): probability of observing the sample X, given that the hypothesis holds.
Bayesian Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows the Bayes theorem:

    P(H|X) = P(X|H) P(H) / P(X)

Informally, this can be written as:

    posterior = likelihood x prior / evidence

The MAP (maximum a posteriori) hypothesis is the one maximizing the posterior:

    h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h)

Practical difficulty: this requires initial knowledge of many probabilities, at significant computational cost.
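A short numeric check of the informal form, with invented probabilities:

```python
# posterior = likelihood x prior / evidence, with invented numbers.
p_h = 0.01          # P(H): prior probability of the hypothesis
p_x_given_h = 0.9   # P(X|H): likelihood of the observed sample under H
p_x = 0.05          # P(X): evidence, probability of observing the sample

p_h_given_x = p_x_given_h * p_h / p_x   # Bayes theorem
print(p_h_given_x)                      # 0.18
```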
Naïve Bayesian Classifier
Each data sample X is represented as a vector (x1, x2, ..., xn).
There are m classes: C1, C2, ..., Cm.
Given an unknown data sample X, the classifier predicts that X belongs to class Ci iff

    P(Ci|X) > P(Cj|X) for all 1 <= j <= m, j != i

By the Bayes theorem, P(Ci|X) = P(X|Ci) P(Ci) / P(X).
Naïve Bayes Classifier
A simplifying assumption: attributes are conditionally independent given the class:

    P(X|Ci) = prod_{k=1}^{n} P(xk|Ci)

The probability of observing, say, two attribute values x1 and x2 together, given that the class is C, is the product of the probabilities of each value taken separately, given the same class:

    P([x1, x2] | C) = P(x1 | C) * P(x2 | C)

No dependence relation between attributes is assumed. This greatly reduces the computation cost: only the class distribution needs to be counted.
Once the probability P(X|Ci) is known, assign X to the class with maximum P(X|Ci) * P(Ci).
Training dataset

Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income  student credit_rating  buys_computer
<=30    high    no      fair           no
<=30    high    no      excellent      no
31…40   high    no      fair           yes
>40     medium  no      fair           yes
>40     low     yes     fair           yes
>40     low     yes     excellent      no
31…40   low     yes     excellent      yes
<=30    medium  no      fair           no
<=30    low     yes     fair           yes
>40     medium  yes     fair           yes
<=30    medium  yes     excellent      yes
31…40   medium  no      excellent      yes
31…40   high    yes     fair           yes
>40     medium  no      excellent      no
Naïve Bayesian Classifier:
Example
Compute P(X|Ci) for each class:
P(age="<=30" | buys_computer="yes") = 2/9 = 0.222
P(age="<=30" | buys_computer="no") = 3/5 = 0.6
P(income="medium" | buys_computer="yes") = 4/9 = 0.444
P(income="medium" | buys_computer="no") = 2/5 = 0.4
P(student="yes" | buys_computer="yes") = 6/9 = 0.667
P(student="yes" | buys_computer="no") = 1/5 = 0.2
P(credit_rating="fair" | buys_computer="yes") = 6/9 = 0.667
P(credit_rating="fair" | buys_computer="no") = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci):
P(X | buys_computer="yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer="no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci) * P(Ci):
P(X | buys_computer="yes") * P(buys_computer="yes") = 0.044 x 9/14 = 0.028
P(X | buys_computer="no") * P(buys_computer="no") = 0.019 x 5/14 = 0.007

X belongs to class "buys_computer = yes".
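The computation above can be reproduced in a few lines of plain Python over the 14-record training table shown earlier:

```python
# Naive Bayes by direct counting over the buys_computer training table.
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")   # the sample X to classify

for c in ("yes", "no"):
    rows = [r for r in data if r[4] == c]
    prior = len(rows) / len(data)            # P(Ci)
    likelihood = 1.0
    for k, value in enumerate(x):            # P(X|Ci) = prod P(xk|Ci)
        likelihood *= sum(r[k] == value for r in rows) / len(rows)
    print(c, round(likelihood * prior, 3))   # yes: 0.028, no: 0.007
```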


Naïve Bayesian Classifier:
Comments
 Advantages:
– Easy to implement
– Good results obtained in most of the cases
 Disadvantages:
– Assumption: class conditional independence, therefore loss of accuracy
– Practically, dependencies exist among variables
– E.g., in hospital patient data: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by the Naïve Bayesian Classifier.
 How to deal with these dependencies?
– Bayesian Belief Networks
Naive Bayesian Classifier:
Example II
 Given a training set, we can compute the probabilities of each attribute value for the two classes P and N:

Outlook      P    N        Humidity     P    N
sunny        2/9  3/5      high         3/9  4/5
overcast     4/9  0        normal       6/9  1/5
rain         3/9  2/5

Temperature  P    N        Windy        P    N
hot          2/9  2/5      true         3/9  3/5
mild         4/9  2/5      false        6/9  2/5
cool         3/9  1/5
Bayesian Networks
 A Bayesian belief network allows a subset of the variables to be conditionally independent
 A graphical model of causal relationships:
– Represents dependencies among the variables
– Gives a specification of the joint probability distribution
 Nodes: random variables
 Links: dependencies
 Example: if X and Y are the parents of Z, and Y is the parent of P, then there is no direct dependency between Z and P. The graph has no loops or cycles.
Bayesian Belief Network: An
Example
The network relates the variables FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea; FH and S are the parents of LC.

The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of values of its parents:

      (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC    0.8       0.5        0.7        0.1
~LC   0.2       0.5        0.3        0.9

Bayesian Belief Networks: the joint probability factorizes over the network as

    P(z1, ..., zn) = prod_{i=1}^{n} P(zi | Parents(Zi))
