Data Mining_Lecture1

The document outlines a Data Mining course taught by Ertan Karakurt, focusing on practical applications and case studies in the field. It covers various topics including data collection, analysis techniques, clustering, classification, and prediction methods, alongside assessments and recommended readings. The course aims to bridge the gap between theory and real-world applications in data mining.

Uploaded by

eray.ckr.25

Data Mining

Lecture 1
Instructor Info
• Name: Ertan Karakurt
• Contact: [email protected]

• 10+ years of experience in Data Mining and
Intelligent Applications Development
– General Purpose Data Mart Development for
Financial Modeling
– Behavioral Clustering of Retail Customers in Banking
Sector
– Propensity Modeling for Cross Selling
– Attrition/Retention Modeling
– Modeling Algorithms Library Development for
...
Defense
Instructor Info
• Ertan Karakurt
• founder of İzmir-based Akıllı Sistemler
– fuzzy/exact searching/matching engine for
Databases:
• search space analyzing, learning
• algorithm space analyzing, learning
• parallelization architecture
Course Objective
• stimulate university and industry cooperation
• create an opportunity to work with real life
applications and problems in Data Mining
– case studies on data dictionaries
– case studies on physically built data mining models
• adjusting/utilizing the balance point between
theory and application in Data Mining
Course Syllabus
• Course topics:
• Introduction (Week1-Week2)
– What is Data Mining?
– Data Collection and Data Management Fundamentals
– The Essentials of Learning
– The Emerging Needs for Different Data Analysis Perspectives
• Data Management and Data Collection Techniques for
Data Mining Applications (Week3-Week4)
– Data Warehouses: Gathering Raw Data from Relational
Databases and transforming into Information.
– Information Extraction and Data Processing Techniques
– Data Marts: The need for building highly specialized data
storages for data mining applications
Course Syllabus
• Case Study 1: Hands-on work with the properties of
The Retail Banking Data Mart
• Data Analysis Techniques (Week 5)
– Statistical Background
– Trends/ Outliers/Normalizations
– Principal Component Analysis
– Discretization Techniques
• Case Study 2: Hands-on work with the properties of
the discretization infrastructure of The Retail
Banking Data Mart
• Lecture Talk: In-class Discussion (OPTIONAL)
Course Syllabus
• Clustering Techniques (Week 6)
– K-Means Clustering
– Condorcet Clustering
– Other Clustering Techniques
• Case Study 3: Hands-on work with the properties of
the clustering infrastructure for The
Retail Banking
• Lecture Talk: In-class Discussion (OPTIONAL)
Course Syllabus
• Classification Techniques (Week 7-Week 9)
– Inductive Learning
– Decision Tree Learning
– Association Rules
– Regression
– Probabilistic Reasoning
– Bayesian Learning
• Case Study 4: Hands-on work with the properties of
the classification infrastructure of the
Propensity Score Card System for The Retail
Banking
Course Syllabus
• Prediction Techniques (Week 10- Week 11)
– Neural Networks
– Radial Basis Networks
– Reinforcement Learning
• Case Study 5: Hands-on work with the properties of the prediction
infrastructure of the Propensity Score Card System for The Retail Banking
• Other Classification and Prediction Techniques (Week 12- Week 13)
– Text Mining and Web Mining
– Explanation Based Learning
– Rule Based Learning
– Genetic Algorithms
– Recurrent Networks
• Case Study 6: Hands-on work with the properties of the Genetic
Algorithms infrastructure for Neural Network Topology Estimation
(OPTIONAL)
Course Syllabus
• Assessment:
– One midterm examination (40%)
– One final examination (60%)
Course Syllabus
• Text Book:
– Jiawei Han and Micheline Kamber, Data Mining: Concepts and
Techniques, 2nd ed., Morgan Kaufmann, 2006.
• Supplementary Books:
– T. Hastie, R. Tibshirani, and J. Friedman, The Elements of
Statistical Learning: Data Mining, Inference, and Prediction,
Springer-Verlag, 2001.
– P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data
Mining, Addison-Wesley, 2006. ISBN: 0-321-32136-7
– Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
– C. M. Bishop, Pattern Recognition and Machine Learning,
Springer, 2007.
– R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification,
2nd ed., Wiley-Interscience, 2001.
Week1- What Is Data Mining?

"Drowning in Data yet Starving for Knowledge"


???

"Computers have promised us a fountain
of wisdom but delivered a flood of data"
William J. Frawley, Gregory Piatetsky-Shapiro, and
Christopher J. Matheus
Week1-What Is Data Mining?

• Data flood
– Information society produces vast amounts of data

• Data are generated by:
– Bank, telecom, other business transactions ...
– Scientific data: astronomy, biology, etc
– Web, text, image, and e-commerce
Week1-What Is Data Mining?

– AT&T handles billions of calls per day
» As of 2003, according to the Winter Corp. survey,
AT&T has a 26 TB decision-support database.
– Web
» 1998: 26 million pages
» 2003: Google searches 4+ billion pages, many
hundreds of TB
» 2005: Google searches 8+ billion pages
» 2008: 1+ trillion (1,000,000,000,000) pages.
Week1-What Is Data Mining?

– UC Berkeley 2003 estimate:
» 5 exabytes (5 million terabytes) of new data was
created in 2002.
» Twice as much information was created in 2002 as
in 1999 (growth rate: about 30% a year)
• Other growth rate estimates are even higher
• Very little data will ever be looked at by a human
• Tools are needed to make sense and use of data
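The growth figures above can be checked with a little arithmetic. The sketch below assumes steady compound growth between 1999 and 2002 and decimal units (1 exabyte = 1,000,000 terabytes); the variable names are illustrative only.

```python
# Worked check of the UC Berkeley figures, assuming steady compound growth.
years = 2002 - 1999          # span over which the volume doubled
growth_factor = 2.0          # "twice as much information" in 2002 as in 1999

# Solve (1 + r) ** years == growth_factor for the annual rate r.
annual_rate = growth_factor ** (1 / years) - 1
print(f"Implied annual growth: {annual_rate:.1%}")  # prints 26.0%

# Unit check: decimal prefixes, 1 exabyte = 10**6 terabytes.
TB_PER_EB = 10**6
new_data_tb = 5 * TB_PER_EB
print(f"5 EB = {new_data_tb:,} TB")  # prints 5,000,000 TB
```

Doubling over three years implies roughly 26% a year, which the slide rounds to "about 30%".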
Week1-What Is Data Mining?
• Data:
– raw (Operation)
– atomic

• Information:
– processed (Analytic)
– re-organized Data
– grouped

• Knowledge:
– patterns, models, findings ‘behind’ Information

• Wisdom:
– perfect orchestration of Knowledge

“Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?”
T. S. Eliot
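The step from Data (raw, atomic) to Information (processed, grouped) in the hierarchy above can be illustrated with a minimal sketch. The transaction records, field layout, and the `summarize` helper are hypothetical and are not taken from the course material.

```python
from collections import defaultdict

# Hypothetical raw operational records: (customer_id, channel, amount).
transactions = [
    ("c1", "ATM", 50.0),
    ("c1", "branch", 200.0),
    ("c2", "online", 30.0),
    ("c1", "ATM", 25.0),
    ("c2", "online", 70.0),
]

def summarize(records):
    """Group atomic records per customer into counts and totals."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for customer_id, _channel, amount in records:
        totals[customer_id]["count"] += 1
        totals[customer_id]["total"] += amount
    return dict(totals)

# "Information": the same data, re-organized and grouped.
information = summarize(transactions)
print(information["c1"])  # {'count': 3, 'total': 275.0}
```

Knowledge, in the slide's sense, would be a pattern found behind such summaries, for example that one group of customers consistently prefers a given channel.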
Week1-What Is Data Mining?

• Hypothesis:
current databases contain a lot of potentially
important knowledge that can be used for wise
decision-making
• Mission of DM:
find it !!!
Week1-What Is Data Mining?

• Data Mining (Alternative Name: Knowledge
Discovery in Databases, KDD) definitions:
– mining knowledge from data
– process of extracting interesting (non-trivial, implicit,
previously unknown and potentially useful) knowledge
or patterns from data in large databases.
– discover knowledge that characterizes general
properties of data
– discover patterns on the previous and current data in
order to make predictions on future data
Week1-What Is Not Data Mining?

"Torturing data until it confesses ... and if you torture it
enough, it will confess to anything"
Jeff Jonas, IBM
"An Unethical Econometric practice of massaging and
manipulating the data to obtain the desired results"
W.S. Brown “Introducing Econometrics”
"A buzz word for what used to be known as DBMS
reports"
An Anonymous Data Mining Skeptic
Week1-What Is Data Mining?
• Data Mining: an interdisciplinary field
– Databases
– Statistics
– High Performance Computing
– Machine Learning
– Visualization
– Mathematics
Week1-What Is Data Mining?
• Data Mining: an interdisciplinary field
– Large Data sets in Data Mining
– Efficiency of Algorithms is important
– Scalability of Algorithms is important
– Real World Data
– Lots of Missing Values
– Pre-existing data - not synthetic
– Data not static - prone to updates
– Domain Knowledge in the form of integrity
constraints available.
– Exploratory data analysis
Week1-Data Mining Application
Examples
• Credit Assessment
• Stock Market Prediction
• Fault Diagnosis in Production Systems
• Medical Discovery
• Fraud Detection
• Hazard Forecasting
• Buying Trends Analysis
• Organizational Restructuring
• Target Mailing
• ---
Week1-Data Mining Application
Examples

• Can I develop a general characterization/profile
of different investor types? (characterization)
• What characteristics distinguish between Online
and Broker investors? (classification)
• Can I develop a model which will predict the
average trades/month for a new investor?
(regression)
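The regression question above can be sketched with ordinary least squares on a single input. Everything here is an assumption for illustration: the feature (account balance in thousands), the made-up training pairs, and the helper names; a real model would use many more investors and attributes.

```python
# Hypothetical training data: (account_balance_in_thousands, trades_per_month).
investors = [(10, 2), (20, 3), (30, 5), (40, 6), (50, 9)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(investors)

def predict_trades(balance_in_thousands):
    """Predicted average trades/month for a new investor."""
    return slope * balance_in_thousands + intercept

print(round(predict_trades(35), 2))  # 5.85 for this toy data
```

The classification question on the same slide differs only in the target: instead of a number such as trades/month, the model predicts a category such as Online or Broker.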
Week1-Data Mining Application
Examples

• the natural question is to predict the
Diagnosis from the symptoms (Medical
Diagnosis Prediction)
Week1-Data Mining Application
Examples
• Assessing Credit Risk
– Situation: Person applies for a loan
– Task: Should a bank approve the loan?
• Need to predict the credit risk of the
person: people with bad credit are not likely
to repay.
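As a rough illustration of the credit-risk task above, the sketch below learns a decision stump (a one-level decision tree) from past applicants. The single feature (debt-to-income ratio), the toy history, and the function names are all hypothetical; this is not a description of how any bank actually scores loans.

```python
# Hypothetical past applicants: (debt_to_income_ratio, repaid_loan).
history = [
    (0.10, True), (0.15, True), (0.22, True), (0.30, True),
    (0.35, False), (0.45, False), (0.50, True), (0.60, False),
]

def fit_stump(samples):
    """Pick the threshold t minimizing training errors for the rule
    'predict repay if ratio <= t'."""
    best_threshold, best_errors = None, len(samples) + 1
    for candidate, _ in sorted(samples):
        errors = sum((ratio <= candidate) != repaid for ratio, repaid in samples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold, best_errors

threshold, training_errors = fit_stump(history)

def approve(ratio):
    """Approve the loan when the learned rule predicts repayment."""
    return ratio <= threshold

print(threshold, training_errors)  # 0.3 1
print(approve(0.25), approve(0.40))  # True False
```

The one remaining training error (the applicant at 0.50 who did repay) shows why a single attribute is rarely enough; the classification techniques listed in the syllabus combine many attributes.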
Week1-Data Mining Application
Examples
• A person buys a book (product) at amazon.com.
• Task: Recommend other books (products) this
person is likely to buy
• Amazon does clustering based on books bought:
– customers who bought “Advances in Knowledge
Discovery and Data Mining”, also bought “Data
Mining: Practical Machine Learning Tools and
Techniques with Java Implementations”
• Recommendation program is quite successful
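The "customers who bought X also bought Y" idea above can be sketched with simple co-purchase counting. Note that this is a deliberately simpler stand-in for the clustering the slide mentions, and it is not Amazon's actual method; the baskets and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per customer.
baskets = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_c"},
    {"book_b", "book_d"},
]

def co_purchase_counts(orders):
    """Count how often each ordered pair of items shares a basket."""
    counts = Counter()
    for order in orders:
        for first, second in combinations(sorted(order), 2):
            counts[(first, second)] += 1
            counts[(second, first)] += 1
    return counts

def also_bought(item, counts, top_n=2):
    """Items most often bought together with `item` (ties broken by name)."""
    related = [(other, n) for (base, other), n in counts.items() if base == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]

counts = co_purchase_counts(baskets)
recommendations = also_bought("book_a", counts)
print(recommendations)  # ['book_b', 'book_c']
```

A clustering approach would instead group customers with similar baskets first and recommend within each group.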
Week 1-End
• read
– Course Text Book Chapter 1
