Lecture 10 - Data Mining in Practice
https://www.sv-europe.com/crisp-dm-methodology/
Business Understanding
Data Preparation
• Select data
• Rationale for inclusion/exclusion of data
• Clean data
• Replace missing values, normalize values, apply corrections, etc.
• Construct data
• Deriving new values
• Integrate data
• Merge data (if necessary)
• Format data
• Preparing data to be read by data mining techniques
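The preparation tasks above can be sketched on a toy dataset. This is a minimal illustration, not a prescribed pipeline: the records, the field names (`income`, `debt`, `region`) and the choices of mean imputation and min-max scaling are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
applicants = [
    {"id": 1, "income": 4000.0, "debt": 1000.0, "note": "walk-in"},
    {"id": 2, "income": None, "debt": 500.0, "note": "online"},
    {"id": 3, "income": 6000.0, "debt": 3000.0, "note": "online"},
]
# Hypothetical second source to integrate, keyed by applicant id.
regions = {1: "north", 2: "south", 3: "north"}


def prepare(records, region_by_id):
    # Select data: keep only the fields judged relevant (drops "note").
    rows = [{k: r[k] for k in ("id", "income", "debt")} for r in records]

    # Clean data: replace missing income with the mean of the known values.
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = mean(known)
    for r in rows:
        if r["income"] is None:
            r["income"] = fill

    # Construct data: derive a new value (debt-to-income ratio).
    for r in rows:
        r["debt_ratio"] = r["debt"] / r["income"]

    # Clean data (normalization): min-max scale income to [0, 1].
    lo = min(r["income"] for r in rows)
    hi = max(r["income"] for r in rows)
    for r in rows:
        r["income_scaled"] = (r["income"] - lo) / (hi - lo) if hi > lo else 0.0

    # Integrate data: merge in the region from the second source.
    for r in rows:
        r["region"] = region_by_id.get(r["id"], "unknown")

    # Format data: fixed-order tuples that a mining algorithm can read.
    return [(r["income_scaled"], r["debt_ratio"], r["region"]) for r in rows]


prepared = prepare(applicants, regions)
```

Each comment maps one step of the code to one of the tasks listed above; in a real project the rationale for every selection and cleaning choice would be documented.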
Modeling
• Data mining
• Select modeling technique (algorithms)
• Generate test design: for example, in classification the dataset is divided into a training set and a test set
• Build model
• Assess model
• Revise parameters
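The modeling steps can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions: the one-feature dataset is made up, k-nearest neighbours stands in for whichever technique is selected, and the helper names (`train_test_split`, `knn_predict`, `accuracy`) are hypothetical, not taken from a library.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == label for x, label in test)
    return hits / len(test)


# Hypothetical dataset: (credit score, decision).
data = [(s, "approved" if s >= 50 else "rejected") for s in range(0, 100, 5)]

# Generate test design: hold out part of the data for testing.
train, test = train_test_split(data)

# Build and assess one model per candidate parameter value (odd k avoids ties).
scores = {k: accuracy(train, test, k) for k in (1, 3, 5)}

# Revise parameters: keep the setting that scored best on the test set.
best_k = max(scores, key=scores.get)
```

In practice the parameter would be tuned on a separate validation set or by cross-validation, so that the test set still gives an unbiased estimate of performance.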
Evaluation
• Evaluate results
• Check how the model performed
• Must align with business objectives
• Approved models
• Models are reviewed by experts before they are endorsed
• Must find support
• Determine next steps
• Deployment (indicating successful deployment of project)
• Or review business objectives (go through another round of data mining
or start a completely new project with different business objectives)
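The evaluation decision can be sketched as a check of a model metric against a business requirement. This is a minimal illustration: the labels, the choice of precision as the metric, and the `min_precision` threshold are assumptions made for the example, not part of the methodology.

```python
def confusion_counts(actual, predicted, positive="approved"):
    """Return (tp, fp, fn, tn) counts for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def next_step(actual, predicted, min_precision):
    """Decide the next step by comparing precision with a business threshold."""
    tp, fp, _, _ = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return "deploy" if precision >= min_precision else "review business objectives"


# Hypothetical true decisions and model predictions.
actual = ["approved", "approved", "rejected", "rejected", "approved"]
predicted = ["approved", "rejected", "rejected", "approved", "approved"]

# Precision is 2/3 here, below the assumed requirement of 0.9.
decision = next_step(actual, predicted, min_precision=0.9)
```

Which metric matters depends on the business objective: a lender may care more about wrongly approved loans (precision) than about wrongly rejected ones (recall).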
Deployment
Building a Loan Approval Model
• Approved or rejected
• Poor performance
• Perform like a human?
Data mining applications
1. Text mining
2. Web mining
Text Mining
[Figure: text mining shown alongside data mining and its related fields: statistics, artificial intelligence, recognition, machine learning, mathematical modeling, databases]
• Association analysis
• Each document is a “transaction”, and its list of keywords is the “list of items”.
• A collection of documents will form the “transaction database”.
• Example of association rules: {data, mining} → {clustering, Naïve,
Bayes}
• A problem with this kind of keyword association discovery is that many association patterns may be discovered. Some associations may therefore be shallow in meaning and indicate only co-occurrences.
• Frequently occurring keywords may serve the purpose of phrase extraction, e.g. {human} → {computer, interaction}
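The document-as-transaction view can be made concrete by computing support and confidence for keyword rules. This is a minimal sketch: the five keyword sets are invented for illustration, and a real system would use a frequent-itemset algorithm such as Apriori rather than checking hand-picked rules.

```python
# Hypothetical keyword sets, one per document ("transaction").
documents = [
    {"data", "mining", "clustering"},
    {"data", "mining", "clustering", "bayes"},
    {"data", "mining", "statistics"},
    {"human", "computer", "interaction"},
    {"human", "computer", "interaction", "design"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every keyword in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Rule {data, mining} -> {clustering}
rule_support = support({"data", "mining", "clustering"}, documents)
rule_confidence = confidence({"data", "mining"}, {"clustering"}, documents)

# Rule {human} -> {computer, interaction}: keywords that always co-occur,
# which is the kind of pattern that suggests a phrase.
phrase_confidence = confidence({"human"}, {"computer", "interaction"}, documents)
```

High confidence alone does not make a rule meaningful: it may reflect nothing more than co-occurrence, which is the caveat raised above.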
Text Mining - Classification
• http://books.google.com/ngrams
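The Ngram Viewer linked above charts how often word n-grams appear in a book corpus, and n-gram counts are also a common feature representation for text classification. A minimal sketch of extracting and counting word n-grams follows; the sample sentence and the helper name `word_ngrams` are assumptions for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams in text as a list of lowercase tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical sample text.
sample = "data mining finds patterns and text mining finds patterns in text"

# Count bigrams (n = 2); the counts could serve as classifier features.
bigram_counts = Counter(word_ngrams(sample, 2))
```

Here `bigram_counts.most_common()` would list the most frequent bigrams first; whitespace splitting is a simplification, and real text needs proper tokenization.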
Web Mining