
Introduction

Data Mining IOE - Chapter 1 Notes

1. Introduction (2 Hrs)

Pukar Karki
Assistant Professor
[email protected]
Contents
1. Data Mining Origin
2. Data Mining & Data Warehousing basics
Why Data Mining?

“Necessity is the mother of invention.” – attributed to Plato
 The Explosive Growth of Data: from terabytes to petabytes
 Data collection and data availability
 Automated data collection tools, database systems, Web, computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”

Evolution of Sciences
 Before 1600, empirical science
 1600-1950s, theoretical science
 Each discipline has grown a theoretical component. Theoretical models often motivate experiments
and generalize our understanding.
 1950s-1990s, computational science
 Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical,
theoretical, and computational ecology, or physics, or linguistics.)
 Computational Science traditionally meant simulation. It grew out of our inability to find closed-form
solutions for complex mathematical models.
 1990-now, data science
 The flood of data from new scientific instruments and simulations
 The ability to economically store and manage petabytes of data online
 The Internet and computing Grid that makes all these archives universally accessible
 Scientific information management, acquisition, organization, query, and visualization tasks scale almost
linearly with data volumes. Data mining is a major new challenge.
Evolution of Database Technology
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems
What is Data Mining?

 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown, and
potentially useful) patterns or knowledge from huge amounts of data
 Data mining: a misnomer?
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge extraction,
data/pattern analysis, data archeology, data dredging, information harvesting,
business intelligence, etc.
 Watch out: is everything “data mining”? No. The following are not data mining:
 Simple search and query processing
 (Deductive) expert systems
Knowledge Discovery (KDD) Process
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)
5. Data mining (an essential process where intelligent methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)
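The seven KDD steps can be sketched as a chain of small functions. This is a minimal illustration only: the store records, the function names, and the 0.5 support threshold are all hypothetical, and counting item pairs stands in for whatever mining method a real system would apply.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"id": 1, "items": "milk,bread"}, {"id": 2, "items": None}]
store_b = [{"id": 3, "items": "milk,bread,eggs"}, {"id": 4, "items": "bread,milk"}]

def clean(records):
    """Step 1: drop records with a missing item list."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for src in sources for r in src]

def select(records):
    """Step 3: keep only the field relevant to the analysis task."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Step 4: turn comma-separated strings into sets of items."""
    return [frozenset(s.split(",")) for s in item_strings]

def mine(baskets):
    """Step 5: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Step 6: keep pairs whose support meets the (assumed) threshold."""
    return {p: c / n_baskets for p, c in pair_counts.items() if c / n_baskets >= min_support}

def present(patterns):
    """Step 7: format the surviving patterns for a reader."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))
report = present(patterns)
print(report)
```

Only the pair that appears in at least half of the cleaned baskets survives the evaluation step, which is the point of separating mining from pattern evaluation.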
Data Mining in Business Intelligence
Layers, from top to bottom (the potential to support business decisions increases toward the top):
 Decision Making (End User)
 Data Presentation: Visualization Techniques (Business Analyst)
 Data Mining: Information Discovery (Data Analyst)
 Data Exploration: Statistical Summary, Querying, and Reporting
 Data Preprocessing/Integration, Data Warehouses (DBA)
 Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
KDD Process: A Typical View from ML and Statistics

Input Data → Data Pre-Processing → Data Mining → Post-Processing
 Data pre-processing: data integration, normalization, feature selection, dimension reduction
 Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis
 Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization

 This is a view from typical machine learning and statistics communities
Multi-Dimensional View of Data Mining
 Data to be mined
 Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse,
transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media,
graphs & social and information networks
 Knowledge to be mined (or: Data mining functions)
 Characterization, discrimination, association, classification, clustering, trend/deviation, outlier
analysis, etc.
 Descriptive vs. predictive data mining
 Multiple/integrated functions and mining at multiple levels
 Techniques utilized
 Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition,
visualization, high-performance, etc.
 Applications adapted
 Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text
mining, Web mining, etc.
Data Mining: On What Kinds of Data?
 Database-oriented data sets and applications
 Relational database, data warehouse, transactional database
 Advanced data sets and advanced applications
 Data streams and sensor data
 Time-series data, temporal data, sequence data (incl. bio-sequences)
 Structure data, graphs, social networks and multi-linked data
 Object-relational databases
 Heterogeneous databases and legacy databases
 Spatial data and spatiotemporal data
 Multimedia database
 Text databases
 The World-Wide Web
Data Mining Function: (1) Generalization
 Information integration and data warehouse construction
 Data cleaning, transformation, integration, and multidimensional data
model
 Data cube technology
 Scalable methods for computing (i.e., materializing) multidimensional
aggregates
 OLAP (online analytical processing)
 Multidimensional concept description: Characterization and discrimination
 Generalize, summarize, and contrast data characteristics, e.g., dry vs.
wet regions
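To make "computing multidimensional aggregates" concrete, the sketch below materializes every cuboid of a tiny two-dimensional cube in plain Python. The rainfall rows, the dimension names, and the `build_cube` helper are hypothetical; a real OLAP engine would use far more scalable methods.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, season, rainfall in mm).
facts = [
    ("north", "summer", 120),
    ("north", "winter", 30),
    ("south", "summer", 200),
    ("south", "winter", 80),
]
DIMENSIONS = ("region", "season")

def build_cube(rows):
    """Sum the measure for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[-1]  # the measure is the last field
            cube[tuple(DIMENSIONS[d] for d in dims)] = dict(totals)
    return cube

cube = build_cube(facts)
by_region = cube[("region",)]   # roll-up over season
grand_total = cube[()][()]      # apex cuboid: all dimensions aggregated away
print(by_region, grand_total)
```

Comparing the per-region totals is a small instance of discrimination: the "south" aggregate is the wetter of the two regions in this made-up data.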

Data Mining Function: (2) Association and Correlation Analysis
 Frequent patterns (or frequent itemsets)
 What items are frequently purchased together in your Walmart?
 Association, correlation vs. causality
 A typical association rule
 Diaper → Beer [0.5%, 75%] (support, confidence)
 Are strongly associated items also strongly correlated?
 How to mine such patterns and rules efficiently in large datasets?
 How to use such patterns for classification, clustering, and other
applications?
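Support and confidence for a rule such as Diaper → Beer can be computed directly from a list of transactions. The sketch below uses eight made-up baskets, so its numbers differ from the [0.5%, 75%] on the slide. Lift is included as one common correlation measure; it is an addition for illustration and is not named in the notes.

```python
# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"beer", "chips"},
    {"diaper", "beer", "chips"},
    {"milk", "bread"},
    {"diaper", "milk"},
    {"bread", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"diaper", "beer"}, transactions)
rule_confidence = confidence({"diaper"}, {"beer"}, transactions)
# Lift above 1 suggests positive correlation; exactly 1 suggests independence.
lift = rule_confidence / support({"beer"}, transactions)
print(rule_support, rule_confidence, lift)
```

A rule can have high confidence simply because the consequent is common, which is why a correlation measure is worth checking alongside support and confidence.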

Data Mining Function: (3) Classification
 Classification and label prediction
 Construct models (functions) based on some training examples
 Describe and distinguish classes or concepts for future prediction
 E.g., classify countries based on (climate), or classify cars based on (gas mileage)
 Predict some unknown class labels
 Typical methods
 Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
 Typical applications:
 Credit card fraud detection, direct marketing, classifying stars, diseases, web-pages, …
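As a minimal sketch of "construct a model from training examples, then predict unknown labels", the code below learns a one-level decision tree (a decision stump) that classifies cars by gas mileage. The training values, the labels "low" and "high", and the function names are hypothetical.

```python
# Hypothetical training examples: (miles per gallon, class label).
training = [(12, "low"), (15, "low"), (18, "low"), (27, "high"), (31, "high"), (35, "high")]

def train_stump(examples):
    """Pick the midpoint threshold with the fewest training errors."""
    values = sorted(v for v, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # An example is misclassified when the rule and its label disagree.
        return sum((v > threshold) != (label == "high") for v, label in examples)

    return min(candidates, key=errors)

def predict(threshold, mpg):
    """Label a new car using the learned threshold."""
    return "high" if mpg > threshold else "low"

threshold = train_stump(training)
labels = [predict(threshold, mpg) for mpg in (14, 30)]
print(threshold, labels)
```

The learned threshold is the model; prediction for an unseen car only needs that single number, not the training data.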

Data Mining Function: (4) Cluster Analysis
 Unsupervised learning (i.e., Class label is unknown)
 Group data to form new categories (i.e., clusters), e.g., cluster houses
to find distribution patterns
 Principle: Maximizing intra-class similarity & minimizing interclass
similarity
 Many methods and applications
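One of the many clustering methods is k-means, chosen here only for illustration since the notes do not name a specific algorithm. The sketch groups hypothetical house coordinates without using any class labels; the starting centroids are an assumption.

```python
import math

# Hypothetical house locations as (x, y) coordinates.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def mean(points):
    """Component-wise average of a list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Assumption: the first and last houses serve as starting centroids.
centroids, clusters = kmeans(houses, [houses[0], houses[-1]])
print(centroids, clusters)
```

Houses inside each resulting group are close to one another and far from the other group, which is the similarity principle stated above.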

Data Mining Function: (5) Outlier Analysis
 Outlier analysis
 Outlier: A data object that does not comply with the general behavior of
the data
 Noise or exception? ― One person’s garbage could be another person’s
treasure
 Methods: by-product of clustering or regression analysis, …
 Useful in fraud detection, rare events analysis
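A simple way to see outlier detection in action is a robust statistical rule. Note that this is a stand-in and not one of the clustering or regression methods listed above: it flags values that lie far from the median, measured in units of the median absolute deviation (MAD). The amounts and the cutoff of 5 are hypothetical.

```python
from statistics import median

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 980.0, 43.5]

def mad_outliers(values, cutoff=5.0):
    """Flag values whose distance from the median exceeds cutoff times the MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [v for v in values if abs(v - center) > cutoff * mad]

outliers = mad_outliers(amounts)
print(outliers)
```

Whether the flagged amount is noise or a fraud case worth investigating is a separate question that the rule itself cannot answer.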

Data Mining: Confluence of Multiple Disciplines
Why Confluence of Multiple Disciplines?
 Tremendous amount of data
 Algorithms must be highly scalable to handle data volumes such as terabytes
 High-dimensionality of data
 Microarray data may have tens of thousands of dimensions
 High complexity of data
 Data streams and sensor data
 Time-series data, temporal data, sequence data
 Structure data, graphs, social networks and multi-linked data
 Heterogeneous databases and legacy databases
 Spatial, spatiotemporal, multimedia, text and Web data
 Software programs, scientific simulations
 New and sophisticated applications
Applications of Data Mining

Where there are data, there are data mining applications

 Web page analysis: from web page classification, clustering to PageRank & HITS
algorithms
 Collaborative analysis & recommender systems
 Basket data analysis to targeted marketing
 Biological and medical data analysis: classification, cluster analysis (microarray data
analysis), biological sequence analysis, biological network analysis
 From major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis
Manager, Oracle Data Mining Tools) to invisible data mining

Major Issues in Data Mining (1)
 Mining Methodology
 Mining various and new kinds of knowledge
 Mining knowledge in multi-dimensional space
 Data mining: An interdisciplinary effort
 Boosting the power of discovery in a networked environment
 Handling noise, uncertainty, and incompleteness of data
 Pattern evaluation and pattern- or constraint-guided mining
 User Interaction
 Interactive mining
 Incorporation of background knowledge
 Presentation and visualization of data mining results
Major Issues in Data Mining (2)
 Efficiency and Scalability
 Efficiency and scalability of data mining algorithms
 Parallel, distributed, stream, and incremental mining methods
 Diversity of data types
 Handling complex types of data
 Mining dynamic, networked, and global data repositories
 Data mining and society
 Social impacts of data mining
 Privacy-preserving data mining
 Invisible data mining
What is a Data Warehouse?
 Defined in many different ways, but not rigorously.
 A decision support database that is maintained separately from the organization’s
operational database
 Supports information processing by providing a solid platform of consolidated,
historical data for analysis.
 “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making process.”—W. H.
Inmon
 Data warehousing:
 The process of constructing and using data warehouses
Data Warehouse—Subject-Oriented

 Organized around major subjects, such as customer, product, sales

 Focusing on the modeling and analysis of data for decision makers,
not on daily operations or transaction processing
 Provides a simple and concise view around particular subject issues by
excluding data that are not useful in the decision support process

Data Warehouse—Integrated
 Constructed by integrating multiple, heterogeneous data sources
 relational databases, flat files, on-line transaction records

 Data cleaning and data integration techniques are applied.

 Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources

 E.g., Hotel price: currency, tax, breakfast covered, etc.
 When data is moved to the warehouse, it is converted.

Data Warehouse—Time Variant
 The time horizon for the data warehouse is significantly longer than
that of operational systems
 Operational database: current value data
 Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
 Every key structure in the data warehouse
 Contains an element of time, explicitly or implicitly
 But the key of operational data may or may not contain “time
element”

Data Warehouse—Nonvolatile
 A physically separate store of data transformed from the operational
environment
 Operational update of data does not occur in the data warehouse
environment
 Does not require transaction processing, recovery, and
concurrency control mechanisms
 Requires only two operations in data accessing:
 initial loading of data and access of data

DBMS vs. Data Warehouse
 The major task of online operational database systems is to perform
online transaction and query processing.
 These systems are called online transaction processing (OLTP)
systems.
 They cover most of the day-to-day operations of an organization such
as purchasing, inventory, manufacturing, banking, payroll, registration,
and accounting.
 Data warehouse systems, on the other hand, serve users or
knowledge workers in the role of data analysis and decision making.
 Such systems can organize and present data in various formats in
order to accommodate the diverse needs of different users.
 These systems are known as online analytical processing (OLAP)
systems.
OLTP vs. OLAP

Data Warehousing: A Multitiered Architecture
✔ The bottom tier is a warehouse database server that is almost always a relational database system.
✔ Back-end tools and utilities are used to feed data into the bottom tier from operational databases or other external sources.
✔ The middle tier is an OLAP server that is typically implemented using either
- a relational OLAP (ROLAP) model (i.e., an extended relational DBMS that maps operations on multidimensional data to standard relational operations); or
- a multidimensional OLAP (MOLAP) model (i.e., a special-purpose server that directly implements multidimensional data and operations).
✔ The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
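The ROLAP idea of mapping multidimensional operations onto standard relational operations can be sketched with Python's built-in sqlite3 module. The `sales` table, its columns, and the helper names `roll_up` and `slice_by` are hypothetical; a roll-up becomes a GROUP BY and a slice becomes a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table for illustration.
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("tv", "Q1", 100), ("tv", "Q2", 150), ("phone", "Q1", 200), ("phone", "Q2", 250)],
)

def roll_up(dimension):
    """Aggregate the amount measure along one dimension with GROUP BY."""
    if dimension not in ("product", "quarter"):
        raise ValueError(f"unknown dimension: {dimension}")
    query = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(query).fetchall()

def slice_by(quarter):
    """Fix one dimension value with WHERE and return the remaining cells."""
    query = "SELECT product, amount FROM sales WHERE quarter = ? ORDER BY product"
    return conn.execute(query, (quarter,)).fetchall()

by_product = roll_up("product")
q1_slice = slice_by("Q1")
print(by_product, q1_slice)
```

A MOLAP server would instead keep the cells in a dedicated multidimensional structure, so no translation to SQL would be needed.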
Metadata Repository
 Metadata are the data that define warehouse objects. The repository stores:
 Description of the structure of the data warehouse
 schema, view, dimensions, hierarchies, derived data definition, data mart locations and
contents
 Operational meta-data
 data lineage (history of migrated data and transformation path), currency of data (active,
archived, or purged), monitoring information (warehouse usage statistics, error reports, audit
trails)
 The algorithms used for summarization
 The mapping from operational environment to the data warehouse
 Data related to system performance
 warehouse schema, view and derived data definitions
 Business data
 business terms and definitions, ownership of data, charging policies
Three Data Warehouse Models
 Enterprise warehouse
 collects all of the information about subjects spanning the entire organization
 Data Mart
 a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
 Independent vs. dependent (directly from warehouse) data mart
 Virtual warehouse
 A set of views over operational databases
 Only some of the possible summary views may be materialized
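A virtual warehouse can be pictured as SQL views defined over operational tables. The sketch below uses Python's built-in sqlite3 module with a hypothetical `orders` table and `customer_totals` view; the view is not materialized, so it always reflects the current operational data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("ann", 30), ("bob", 20), ("ann", 50)])

# The view stores no data of its own; it is evaluated against the operational table.
conn.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)
before = conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall()

# A new operational row is visible through the view immediately.
conn.execute("INSERT INTO orders VALUES ('bob', 40)")
after = conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall()
print(before, after)
```

The trade-off is that every query against the view does its work on the operational database, which is why materializing some summary views can be worthwhile.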
Data Mart
 The implementation cycle of a data mart is more likely to be measured in weeks rather than months or years.
 However, it may involve complex integration in the long run if its design and planning were not enterprise-wide.

 Depending on the source of data, data marts can be categorized as independent or dependent.
– Independent data marts are sourced from data captured from one or more operational systems or external information providers, or from data generated locally within a particular department or geographic area.
– Dependent data marts are sourced directly from enterprise data warehouses.

Extraction, Transformation, and Loading (ETL)
 Data extraction
 get data from multiple, heterogeneous, and external sources

 Data cleaning
 detect errors in the data and rectify them when possible

 Data transformation
 convert data from legacy or host format to warehouse format

 Load
 sort, summarize, consolidate, compute views, check integrity, and build indices and partitions

 Refresh
 propagate the updates from the data sources to the warehouse
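The ETL steps above can be sketched end to end with Python's built-in sqlite3 module. Everything specific here is an assumption made for illustration: the two source extracts, the fixed EUR to USD rate, and the `hotel_price` table. The currency conversion echoes the hotel-price example given under data integration.

```python
import sqlite3

# Hypothetical extracts from two sources with different conventions.
source_eu = [{"hotel": "Alpha", "price": "100", "currency": "EUR"},
             {"hotel": "Beta", "price": "n/a", "currency": "EUR"}]
source_us = [{"hotel": "Gamma", "price": "120", "currency": "USD"}]
EUR_TO_USD = 1.10  # assumed fixed rate, for illustration only

def extract():
    """Data extraction: gather rows from the heterogeneous sources."""
    return source_eu + source_us

def clean(rows):
    """Data cleaning: drop rows whose price is not numeric."""
    return [r for r in rows if r["price"].replace(".", "", 1).isdigit()]

def transform(rows):
    """Data transformation: convert every price to USD so one unit is used."""
    out = []
    for r in rows:
        price = float(r["price"])
        if r["currency"] == "EUR":
            price *= EUR_TO_USD
        out.append((r["hotel"], round(price, 2)))
    return out

def load(conn, rows):
    """Load: write the rows and build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS hotel_price (hotel TEXT PRIMARY KEY, price_usd REAL)")
    # INSERT OR REPLACE lets a later refresh overwrite changed rows.
    conn.executemany("INSERT OR REPLACE INTO hotel_price VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_price ON hotel_price (price_usd)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
warehouse = conn.execute("SELECT hotel, price_usd FROM hotel_price ORDER BY hotel").fetchall()
print(warehouse)
```

Running `load` again with updated source rows would act as a simple refresh, since rows with the same hotel key are replaced rather than duplicated.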

Need for Data Warehousing
 Ensure consistency
- Data warehouses are programmed to apply a uniform format to all collected data,
which makes it easier for corporate decision-makers to analyze and share data
insights with their colleagues around the globe.
- Standardizing data from different sources also reduces the risk of error in
interpretation and improves overall accuracy.
 Make better business decisions
- Successful business leaders develop data-driven strategies and rarely make
decisions without consulting the facts.
- Data warehousing improves the speed and efficiency of accessing different
data sets and makes it easier for corporate decision-makers to derive insights
that will guide the business and marketing strategies that set them apart from
their competitors.
 Improve the bottom line
- Data warehouse platforms allow business leaders to quickly access their
organization's historical activities and evaluate initiatives that have been
successful or unsuccessful in the past.
- This allows executives to see where they can adjust their strategy to decrease
costs, maximize efficiency and increase sales to improve their bottom line.
Review Questions
1) Explain how a data mining system can be integrated with a database/data
warehouse system. Explain the data mining process with a diagram.
2) Explain data warehouse architecture.
3) How is a data warehouse different from an RDBMS? Also list the similarities.
4) What are a data warehouse and a data mart?
5) Differentiate between OLAP and OLTP.
6) “The world is data rich and information poor.” Justify in your own words.
