Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Classification and Prediction
Chapter 8
Introduction
Two forms of data analysis can be used to extract models that describe important classes or predict future data trends:
• Classification
• Prediction
Classification models predict categorical class labels, whereas prediction models predict continuous-valued functions.
• For example, a classification model can categorize bank loan applications as either safe or risky.
• A prediction model can estimate how much, in dollars, potential customers will spend on computer equipment given their income and occupation.
What is classification?
The following are examples where the data analysis task is classification:
• A bank loan officer wants to analyze the data to learn which customers (loan applicants) are risky and which are safe.
• A marketing manager at a company wants to determine whether a customer with a given profile will buy a new computer.
In both examples, a model or classifier is constructed to predict categorical labels: risky or safe for the loan application data, and yes or no for the marketing data.
What is prediction?
The following is an example where the data analysis task is prediction:
• Suppose the marketing manager needs to predict how much a given customer will spend during a sale at the company.
• Here the goal is to predict a numeric value, so the task is an example of numeric prediction. A model, or predictor, is constructed that predicts a continuous-valued function or ordered value.
• Regression analysis is the statistical methodology most often used for numeric prediction.
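As a rough illustration of numeric prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict spending from income. The data values and the names `fit_line` and `predict` are assumptions made up for this example; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Return the continuous value predicted for input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: income (thousands of dollars) and
# amount spent during a sale (dollars).
incomes = [20, 30, 40, 50]
spend = [100, 150, 200, 250]

model = fit_line(incomes, spend)
print(predict(model, 60))  # predicted spend for an income of 60
```

Unlike a classifier, the predictor returns a number on a continuous scale rather than one of a fixed set of labels.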
How Does Classification Work?
Using the bank loan application example discussed above, let us look at how classification works. The data classification process has two steps:
• Building the classifier or model
• Using the classifier for classification
Building the Classifier or Model
• This step is the learning step, or learning phase.
• In this step, a classification algorithm builds the classifier.
• The classifier is built from a training set made up of database tuples and their associated class labels.
• Each tuple in the training set is assumed to belong to a predefined category or class. The tuples can also be referred to as samples, objects, or data points.
Using the Classifier for Classification
• In this step, the classifier is used for classification.
• Test data is used to estimate the accuracy of the classification rules.
• If the accuracy is considered acceptable, the classification rules can be applied to new data tuples.
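The two steps can be sketched as follows. This is a minimal illustration, not the method from the slides: the "classifier" is a deliberately simple rule table that maps each value of one attribute to its majority class, and the loan tuples, the `income` attribute, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict


def train(tuples, attribute):
    """Learning step: map each value of `attribute` to its majority class."""
    counts = defaultdict(Counter)
    for row, label in tuples:
        counts[row[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(rules, row, attribute, default="risky"):
    """Classification step: look up the rule for this tuple's value."""
    return rules.get(row[attribute], default)


def accuracy(rules, tuples, attribute):
    """Fraction of labeled test tuples that the rules classify correctly."""
    correct = sum(classify(rules, row, attribute) == label
                  for row, label in tuples)
    return correct / len(tuples)


# Hypothetical labeled loan applications (tuple, class label).
training_set = [
    ({"income": "low"}, "risky"),
    ({"income": "low"}, "risky"),
    ({"income": "medium"}, "safe"),
    ({"income": "high"}, "safe"),
    ({"income": "high"}, "safe"),
]
test_set = [
    ({"income": "low"}, "risky"),
    ({"income": "high"}, "safe"),
    ({"income": "medium"}, "risky"),
    ({"income": "high"}, "safe"),
]

rules = train(training_set, "income")          # step 1: build the classifier
print(accuracy(rules, test_set, "income"))     # step 2: estimate accuracy
```

The test set is kept separate from the training set so that the accuracy estimate reflects how the rules behave on tuples the classifier has not seen.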
Classification by Decision Tree Induction
Decision tree induction is the learning of decision trees from class-labeled training tuples.
A decision tree is a flowchart-like tree structure in which:
• each internal (non-leaf) node denotes a test on an attribute,
• each branch represents an outcome of the test,
• each leaf (terminal) node holds a class label, and
• the root node is the topmost node.
Example
RID   age     income   student   credit-rating   Class
1     youth   high     no        fair            ?
1. Test on age: youth
2. Test on student: no
3. Reach leaf node
Class: no (the customer is unlikely to buy a computer)
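The walk through the tree can be sketched in code. The tree below is an assumption: only the path age = youth, student = no, class = no is confirmed by the example above, and the remaining branches are filled in for illustration because the original figure is not available. The nested-dictionary layout and the name `classify` are likewise hypothetical.

```python
# Internal nodes are dicts holding an attribute test and its branches;
# leaf nodes are plain class-label strings.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "middle_aged": "yes",          # assumed branch
        "senior": {                    # assumed branch
            "attribute": "credit_rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}


def classify(node, row):
    """Follow attribute tests from the root until a leaf label is reached."""
    while isinstance(node, dict):
        value = row[node["attribute"]]
        node = node["branches"][value]
    return node


row = {"age": "youth", "income": "high",
       "student": "no", "credit_rating": "fair"}
print(classify(tree, row))  # prints "no"
```

Note that the `income` attribute is never tested on this path: a tuple is classified using only the attributes that appear between the root and the leaf it reaches.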
Algorithm for Constructing Decision Trees
A decision tree is constructed with a greedy algorithm, in a top-down, recursive, divide-and-conquer manner.
1. At the start, all the training tuples are at the root.
2. Tuples are partitioned recursively based on selected attributes.
3. If all samples at a given node belong to the same class, label the node with that class.
4. If there are no remaining attributes for further partitioning, use majority voting to label the leaf.
5. If there are no samples left, create a leaf labeled with a class (typically the majority class of the parent node) and terminate.
6. Otherwise, go to step 2.
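The recursive procedure can be sketched as below. The slides do not say how the splitting attribute is selected, so this sketch assumes information gain (as in ID3); it also assumes that an empty partition is labeled with the majority class of its parent. The training tuples and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning on `attr`."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attributes, values, parent_majority=None):
    if not rows:                      # no samples left
        return parent_majority
    if len(set(labels)) == 1:         # all samples in the same class
        return labels[0]
    if not attributes:                # no attributes left: majority vote
        return majority(labels)
    # Greedy choice: the attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "branches": {}}
    for value in values[best]:
        part = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in part], [l for _, l in part],
            remaining, values, majority(labels))
    return node


# Hypothetical training tuples and their class labels (buys a computer?).
rows = [
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "yes"},
    {"age": "middle_aged", "student": "no"},
    {"age": "middle_aged", "student": "no"},
    {"age": "middle_aged", "student": "yes"},
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
values = {"age": ["youth", "middle_aged", "senior"], "student": ["no", "yes"]}

tree = build_tree(rows, labels, ["age", "student"], values)
print(tree)
```

The algorithm is greedy because each split is chosen once, using only the tuples at the current node, and is never revisited.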
K-Means Algorithm
1. Take the mean value of each cluster (initially, chosen starting values).
2. Assign each number to the cluster whose mean is nearest.
3. Repeat steps 1 and 2 until the means stop changing.
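A minimal sketch of these steps for one-dimensional data follows. The data points, the starting means, and the name `k_means_1d` are assumptions chosen for illustration; the iteration cap `max_iter` is an added safeguard that is not part of the steps above.

```python
def k_means_1d(points, means, max_iter=100):
    """Cluster numbers around the given starting means."""
    for _ in range(max_iter):
        # Step 2: put each number in the cluster with the nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Step 1 (next round): take the mean of each cluster.
        # An empty cluster keeps its previous mean.
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        # Step 3: stop once the means no longer change.
        if new_means == means:
            break
        means = new_means
    return means, clusters


points = [1, 2, 3, 10, 11, 12]
means, clusters = k_means_1d(points, [1.0, 2.0])
print(means)     # final cluster means
print(clusters)  # numbers grouped by cluster
```

The result can depend on the starting means, so in practice the algorithm is often run several times with different starting values.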

Research trends in data warehousing and data mining

