Introduction to Data Mining
Index
• Introduction
• What is Data Mining?
• Domains that benefit from Data Mining
• Data Mining Techniques
• Data Mining Tools
Introduction
• There is a huge amount of data available in the information
industry.
• This data may contain a lot of unimportant information.
• We have to analyze this data and extract useful information from it.
What is Data Mining?
• “Data Mining” is defined as the process of extracting
information from huge sets of data.
• In other words, we can say that “Data Mining” is “Mining Knowledge
from Data”.
• Data mining is not a standalone activity; it is part of a larger process that involves steps such as:
• Data Cleaning
• Data Transformation
• Data Mining
• Pattern Evaluation
• Data Presentation
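The five steps above can be sketched as a small pipeline. This is only an illustrative sketch: the records, the function names, and the spending threshold are all made-up assumptions, not part of any particular tool.

```python
# Hypothetical raw records: (customer, product, amount as text).
raw = [
    ("alice", "cream", "2.50"),
    ("bob", "strawberries", "n/a"),  # unusable amount
    ("alice", "strawberries", "4.00"),
    ("carol", "cream", "3.00"),
]

def clean(records):
    """Data cleaning: drop records whose amount is not a number."""
    kept = []
    for customer, product, amount in records:
        try:
            float(amount)
        except ValueError:
            continue
        kept.append((customer, product, amount))
    return kept

def transform(records):
    """Data transformation: convert the amount from text to float."""
    return [(c, p, float(a)) for c, p, a in records]

def mine(records):
    """Data mining: compute total spending per product."""
    totals = {}
    for _, product, amount in records:
        totals[product] = totals.get(product, 0.0) + amount
    return totals

def evaluate(totals, threshold=5.0):
    """Pattern evaluation: keep products whose total meets the threshold."""
    return {p: t for p, t in totals.items() if t >= threshold}

def present(patterns):
    """Data presentation: format the results as readable lines."""
    return [f"{p}: {t:.2f}" for p, t in sorted(patterns.items())]

report = present(evaluate(mine(transform(clean(raw)))))
print(report)
```

Each function maps to one step of the list, so a step can be swapped out without touching the others.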
• Data mining principles have been around for many years, but with the advent
of big data they have become even more prevalent.
Domains that benefit from Data Mining
• Data mining is highly useful in the following domains:
• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection
Market Analysis and Management
• Customer Profiling
Data mining helps determine what kind of people buy what kind of products.
• Identifying Customer Requirements
Data mining helps in identifying the best products for different customers. It
uses prediction to find the factors that may attract new customers.
• Cross Market Analysis
Data mining finds associations and correlations between product sales.
• Target Marketing
Data mining helps to find clusters of model customers who share the same
characteristics such as interests, spending habits, income, etc.
• Determining Customer Purchasing Patterns
Data mining helps in determining customers' purchasing patterns.
• Providing Summary Information
Data mining provides various multidimensional summary reports.
Corporate Analysis & Risk Management
• Finance Planning and Asset Evaluation
It involves cash flow analysis and prediction, and contingent claim analysis to
evaluate assets.
• Resource Planning
It involves summarizing and comparing the resources and spending.
• Competition
It involves monitoring competitors and market directions.
Fraud Detection
• Data mining is also used in credit card services and other
fields to detect fraud.
• It analyzes patterns that deviate from expected norms.
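One simple way to flag values that deviate from expected norms is a z-score check. The sketch below is an assumption chosen for illustration (the amounts, the function name, and the threshold of 2.0 are all hypothetical); real fraud detection systems typically use far richer models.

```python
from statistics import mean, stdev

def flag_outliers(amounts, z_threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [a for a in amounts if abs(a - mu) / sigma > z_threshold]

# Hypothetical card transactions: seven typical amounts and one unusual one.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
print(flag_outliers(amounts))
```

A single extreme value inflates the standard deviation, so on small samples a lower threshold or a median-based measure may be more robust.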
Data Mining Techniques
• Several core techniques are used in data mining; each describes a
type of mining operation:
• Association
• Classification
• Clustering
• Prediction
• Sequential patterns
• Decision trees
Association
• Association means finding a simple correlation between two or more items,
often of the same type, to identify patterns.
“For example, when tracking people's buying habits, you might
identify that a customer always buys cream when they buy
strawberries, and therefore suggest that the next time they buy
strawberries they might also want to buy cream.”
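The strawberries-and-cream example can be quantified with two standard association measures, support and confidence. The baskets below are invented for illustration, and the function names are hypothetical helpers rather than a library API.

```python
# Hypothetical shopping baskets, one set of items per purchase.
baskets = [
    {"strawberries", "cream"},
    {"strawberries", "cream", "sugar"},
    {"strawberries", "cream"},
    {"cream", "coffee"},
    {"bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs
    return support(antecedent | consequent, baskets) / base

print(support({"strawberries", "cream"}, baskets))
print(confidence({"strawberries"}, {"cream"}, baskets))
```

Here every basket with strawberries also contains cream, so the rule "strawberries implies cream" has a confidence of 1.0, while the reverse rule is weaker because cream is also bought without strawberries.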
Classification
• You can use classification to build up an idea of the type of
objects by describing multiple attributes to identify a
particular class.
For example, you can classify customers by age and social group.
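As a minimal sketch of classifying customers by age and social group, the code below learns the majority label for each combination of attributes. The training data, the age bands, and the "budget"/"premium" labels are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

def age_band(age):
    """Bucket an age into a coarse band (band limits are arbitrary)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"

# Hypothetical labelled customers: (age, social group, class label).
training = [
    (22, "student", "budget"),
    (25, "student", "budget"),
    (27, "professional", "premium"),
    (45, "professional", "premium"),
    (50, "professional", "premium"),
    (48, "professional", "budget"),
    (67, "retired", "budget"),
]

def train(examples):
    """Learn the most common label for each (age band, group) pair."""
    votes = defaultdict(Counter)
    for age, group, label in examples:
        votes[(age_band(age), group)][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

def classify(model, age, group, default="unknown"):
    """Look up the learned label, falling back to a default for unseen pairs."""
    return model.get((age_band(age), group), default)

model = train(training)
print(classify(model, 24, "student"))
print(classify(model, 52, "professional"))
```

This lookup-table approach only works for attribute combinations seen in training; it is meant to show the idea, not to replace a real classifier.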
Clustering
• By examining one or more attributes or classes, you can
group individual pieces of data together to form a structured
opinion.
• At a simple level, clustering uses one or more attributes as
the basis for identifying a cluster of correlating results.
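A minimal sketch of clustering on a single attribute is shown below, using a basic k-means loop. The spending values and the starting centroids are hypothetical, and k-means is just one of many clustering methods.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Basic k-means on one attribute, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spending per customer.
spending = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spending, centroids=[10, 100])
print(centroids, clusters)
```

The result separates low spenders from high spenders without any labels being supplied, which is the key difference from classification.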
Prediction
• Prediction is a wide topic and runs from predicting the failure
of components or machinery, to identifying fraud, and even
the prediction of company profits.
• Used in combination with the other data mining techniques,
prediction involves analyzing trends, classification, pattern
matching, and relations.
• By analyzing past events or instances, you can make a
prediction about a future event.
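As one small illustration of predicting from past events, the sketch below fits a straight-line trend to past profits with ordinary least squares and extrapolates one step ahead. The quarterly figures and names are invented, and a linear trend is only an assumption about how the data behaves.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical company profits for four past quarters.
quarters = [1, 2, 3, 4]
profits = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(quarters, profits)
forecast = slope * 5 + intercept  # predicted profit for quarter 5
print(forecast)
```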
Sequential patterns
• Often used over longer-term data, sequential patterns are a
useful method for identifying trends, or regular occurrences
of similar events.
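A very simple form of sequential pattern mining counts how often one event is immediately followed by another across many histories. The purchase histories and the minimum count below are hypothetical, and real algorithms also handle longer patterns and gaps between events.

```python
from collections import Counter

def frequent_pairs(sequences, min_count=2):
    """Find ordered pairs (a, b) where b directly follows a in at least min_count sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each consecutive pair at most once per sequence.
        counts.update(set(zip(seq, seq[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical purchase histories, each in time order.
histories = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]
print(frequent_pairs(histories))
```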
Decision trees
• Related to most of the other techniques (primarily
classification and prediction), the decision tree can be used
either as a part of the selection criteria, or to support the use
and selection of specific data within the overall structure.
• Within the decision tree, you start with a simple question
that has two (or sometimes more) answers. Each answer
leads to a further question to help classify or identify the
data so that it can be categorized, or so that a prediction can
be made based on each answer.
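The question-and-answer structure described above can be sketched as a small hand-built tree. The questions, thresholds, and offer labels are hypothetical; in practice a tree is usually learned from data rather than written by hand.

```python
# Each inner node asks a yes/no question; each leaf carries a label.
tree = {
    "question": lambda c: c["age"] < 30,
    "yes": {"label": "student offer"},
    "no": {
        "question": lambda c: c["income"] >= 50000,
        "yes": {"label": "premium offer"},
        "no": {"label": "standard offer"},
    },
}

def decide(node, record):
    """Follow the answers from the root down to a leaf and return its label."""
    while "label" not in node:
        node = node["yes"] if node["question"](record) else node["no"]
    return node["label"]

print(decide(tree, {"age": 25, "income": 20000}))
print(decide(tree, {"age": 45, "income": 80000}))
```

Each answer leads either to a further question or to a final category, mirroring the description on the slide.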
Data Mining Tools
Open Source Data Mining Tools
RapidMiner (formerly known as YALE)
• Written in the Java programming language, this tool offers advanced
analytics through template-based frameworks.
• In addition to data mining, RapidMiner also provides functionality
such as data preprocessing and visualization, predictive analytics and
statistical modeling, evaluation, and deployment.
WEKA
• The original non-Java version of WEKA was developed primarily for
analyzing data from the agricultural domain.
• With the Java-based version, the tool is very sophisticated and is used in
many different applications, including visualization and algorithms for
data analysis and predictive modeling.
R Programming
• R is a free programming language and software environment
for statistical computing and graphics.
• The R language is widely used among data miners for developing
statistical software and for data analysis.
Commercial Data Mining Tools
SQL Server Data Tools
• It is used to develop data analysis and Business Intelligence solutions
utilizing Microsoft SQL Server Analysis Services, Reporting Services, and
Integration Services.
• It is based on the Microsoft Visual Studio development environment, but
customized with SQL Server services-specific extensions and project
types, including tools, controls, and projects for reports, ETL dataflows,
OLAP cubes, and data mining structures.
IBM Cognos Business Intelligence
• IBM Cognos is a web-based business intelligence suite that integrates
with the company's data mining application, SPSS, for easy
visualization of the data mining process. Self-service features are
available offline and through the mobile app.
Dundas BI
• Dundas BI, from Dundas Data Visualization, is a browser-based
business intelligence and data visualization platform that includes
integrated dashboards, reporting tools, and data analytics.
• It gives end users the ability to create interactive, customizable
dashboards, build their own reports, run ad-hoc queries, and analyze
and drill down into their data and performance metrics.
Thank You for listening

