
KDD

The document discusses knowledge discovery from data (KDD) which involves extracting useful patterns from large datasets through cleaning, integration, selection, transformation, mining, evaluation and representation of data. KDD aims to find useful knowledge while data mining focuses on patterns. KDD is an iterative process requiring domain expertise at various steps.

Uploaded by

rajmishra183373
Copyright
© All Rights Reserved

KNOWLEDGE DISCOVERY FROM DATA (KDD)

Data mining is also referred to as knowledge mining from data, knowledge extraction, data/pattern
analysis, data archaeology, and data dredging. It is often treated as a synonym for Knowledge
Discovery in Databases (KDD): the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data stored in databases.
The purpose of data mining is to extract useful information from large datasets and use it to make
predictions or support better decisions. Today, data mining is used almost everywhere that large
amounts of data are stored and processed.
Examples include the banking sector, market basket analysis, and network intrusion detection.
KDD Process
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful,
previously unknown, and potentially valuable information from large datasets. The process is
iterative: the steps below usually have to be repeated several times before accurate knowledge can
be extracted from the data.
The following steps are included in the KDD process:
1. Data Cleaning
Data cleaning is the removal of noisy and irrelevant data from the collection. It includes:
Handling missing values.
Smoothing noisy data, where noise is random error or variance in a measured variable.
Detecting data discrepancies and correcting them with data transformation tools.
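
As a rough illustration of the first two cleaning tasks, the sketch below fills missing values with the median and smooths noise by replacing each value with the mean of its bin. The price readings, the choice of the median, and the bin size are assumptions made for the example, not part of the notes.

```python
from statistics import mean, median

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-size bins, and replace
    every value with the mean of its bin. The result is in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical price readings; None marks a missing value.
prices = [4, 8, None, 15, 21, 21, 24, None, 28, 34]
filled = fill_missing(prices)
smoothed = smooth_by_bin_means(filled, bin_size=5)
print(filled)
print(smoothed)
```

Other fill strategies (mean, most frequent value, or a model-based estimate) fit the same structure; only `fill_missing` would change.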
2. Data Integration
Data integration is the combining of heterogeneous data from multiple sources into a common
source (a data warehouse). It is typically carried out with data migration tools, data
synchronization tools, and the ETL (Extract, Transform, Load) process.
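
A minimal ETL sketch, assuming two hypothetical in-memory sources (`crm` and `billing`) with different field names and a plain dictionary standing in for the data warehouse. Real integration tools work against databases and files, but the three stages are the same.

```python
def extract(source):
    """Extract: the 'sources' here are just in-memory lists of dicts."""
    return list(source)

def transform_crm(row):
    """Transform a CRM row into the common schema."""
    return {"customer_id": int(row["id"]),
            "name": row["full_name"].strip().title()}

def transform_billing(row):
    """Transform a billing row into the common schema (cents to currency units)."""
    return {"customer_id": row["cust"],
            "total_spent": round(row["amount_cents"] / 100, 2)}

def load(warehouse, rows):
    """Load: merge rows into the warehouse, keyed by customer_id."""
    for row in rows:
        warehouse.setdefault(row["customer_id"], {}).update(row)

# Hypothetical source data with inconsistent schemas.
crm = [{"id": "1", "full_name": " ada lovelace "},
       {"id": "2", "full_name": "alan turing"}]
billing = [{"cust": 1, "amount_cents": 12550},
           {"cust": 2, "amount_cents": 990}]

warehouse = {}
load(warehouse, [transform_crm(r) for r in extract(crm)])
load(warehouse, [transform_billing(r) for r in extract(billing)])
print(warehouse)
```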
3. Data Selection
Data selection is the process of deciding which data is relevant to the analysis and retrieving it
from the data collection. Techniques such as neural networks, decision trees, Naive Bayes,
clustering, and regression can be used for this.
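
In its simplest form, selection is a row filter combined with a choice of attributes. The sketch below shows only that simple case, not the model-based methods listed above; the `transactions` records and the "north region" criterion are made up for illustration.

```python
def select(records, attributes, predicate):
    """Keep only the records that satisfy the predicate,
    projected onto the chosen attributes."""
    return [{a: r[a] for a in attributes} for r in records if predicate(r)]

# Hypothetical transaction records.
transactions = [
    {"id": 1, "region": "north", "amount": 120.0, "clerk": "A"},
    {"id": 2, "region": "south", "amount": 35.5, "clerk": "B"},
    {"id": 3, "region": "north", "amount": 410.0, "clerk": "C"},
]

# Assume the analysis only concerns amounts in the north region.
relevant = select(transactions, ["id", "amount"],
                  lambda r: r["region"] == "north")
print(relevant)
```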
4. Data Transformation
Data transformation is the process of converting data into the form required by the mining
procedure. It is a two-step process:
Data mapping: assigning elements from the source to the destination to capture the transformations.
Code generation: creating the actual transformation program.
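
The two steps can be sketched as a declarative mapping plus a function built from it. This is only an analogy for code generation (real tools typically emit SQL or scripts); the field names, the `MAPPING` table, and the min-max scaling at the end are assumptions for the example.

```python
def min_max(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]

# Data mapping: source field -> (destination field, per-value conversion).
MAPPING = {
    "Age": ("age", int),
    "Income($)": ("income", float),
}

def build_transformer(mapping):
    """Stand-in for code generation: turn the mapping into a callable."""
    def transform(row):
        return {dest: convert(row[src])
                for src, (dest, convert) in mapping.items()}
    return transform

# Hypothetical source rows with string-typed fields.
source_rows = [
    {"Age": "25", "Income($)": "30000"},
    {"Age": "40", "Income($)": "90000"},
    {"Age": "55", "Income($)": "60000"},
]

transform = build_transformer(MAPPING)
rows = [transform(r) for r in source_rows]
scaled_income = min_max([r["income"] for r in rows])
print(rows)
print(scaled_income)
```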
5. Data Mining
Data mining is the application of techniques to extract potentially useful patterns. It
transforms task-relevant data into patterns and decides the purpose of the model using
classification or characterization.
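
One concrete example of a mining technique is counting frequent item pairs, as used in market basket analysis. The sketch below is a brute-force version over hypothetical baskets, not an optimized algorithm such as Apriori, and the 0.5 support threshold is an arbitrary choice.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets containing
    both items) is at least min_support."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

patterns = frequent_pairs(baskets, min_support=0.5)
print(patterns)
```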
6. Pattern Evaluation
Pattern evaluation is the identification of the truly interesting patterns representing knowledge,
based on given interestingness measures. It computes an interestingness score for each pattern and
uses summarization and visualization to make the data understandable to the user.
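
As an example of interestingness scores, the sketch below computes support, confidence, and lift for a single association rule and keeps the rule only if it clears two thresholds. These three measures are a common choice but are an assumption here, as are the baskets and the threshold values.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_c = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = with_both / n
    confidence = with_both / with_a if with_a else 0.0
    lift = confidence / (with_c / n) if with_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}

def is_interesting(metrics, min_support, min_confidence):
    """Keep a rule only if it clears both thresholds."""
    return (metrics["support"] >= min_support
            and metrics["confidence"] >= min_confidence)

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

metrics = rule_metrics(baskets, "bread", "milk")
keep = is_interesting(metrics, min_support=0.5, min_confidence=0.7)
print(metrics, keep)
```

A lift below 1, as in this data, suggests the two items co-occur slightly less often than independence would predict, so a rule can pass the support and confidence thresholds and still be of limited interest.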
7. Knowledge Representation
This involves presenting the results in a way that is meaningful and can be used to make decisions.
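
The iterative nature of the KDD process can be sketched as a loop that repeats mining and evaluation with adjusted parameters until some pattern survives. The function names, the starting threshold, and the step size are all hypothetical; in practice an analyst might just as well go back to cleaning or selection between rounds.

```python
from collections import Counter

def mine(baskets, min_support):
    """Return items whose support is at least min_support."""
    counts = Counter(item for basket in baskets for item in set(basket))
    n = len(baskets)
    return {item: c / n for item, c in counts.items() if c / n >= min_support}

def kdd_loop(baskets, min_support=0.9, step=0.2, max_rounds=5):
    """Re-run mining with a relaxed threshold until a pattern is found."""
    for round_no in range(1, max_rounds + 1):
        patterns = mine(baskets, min_support)
        if patterns:
            return round_no, min_support, patterns
        min_support = round(min_support - step, 2)
    return max_rounds, min_support, {}

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

rounds, threshold, found = kdd_loop(baskets)
print(rounds, threshold, found)
```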
Advantages of KDD
1. Improves decision-making: KDD provides valuable insights and knowledge that can help
organizations make better decisions.
2. Increased efficiency: KDD automates repetitive and time-consuming tasks and makes the
data ready for analysis, which saves time and money.
3. Better customer service: KDD helps organizations gain a better understanding of their
customers’ needs and preferences, which can help them provide better customer service.
4. Fraud detection: KDD can be used to detect fraudulent activities by identifying patterns and
anomalies in the data that may indicate fraud.
5. Predictive modeling: KDD can be used to build predictive models that can forecast future
trends and patterns.
Disadvantages of KDD
1. Privacy concerns: KDD can raise privacy concerns as it involves collecting and analyzing
large amounts of data, which can include sensitive information about individuals.

2. Complexity: KDD can be a complex process that requires specialized skills and knowledge
to implement and interpret the results.

3. Unintended consequences: KDD can lead to unintended consequences, such as bias or
discrimination, if the data or models are not properly understood or used.

4. Data quality: The KDD process depends heavily on the quality of the data. If the data is not
accurate or consistent, the results can be misleading.

5. High cost: KDD can be an expensive process, requiring significant investments in hardware,
software, and personnel.

6. Overfitting: The KDD process can lead to overfitting, a common problem in machine
learning in which a model learns the detail and noise in the training data to the extent that
its performance on new, unseen data suffers.
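
Overfitting can be seen in a small, hand-built example. The sketch below compares a nearest-neighbour model, which memorizes every training point including two deliberately mislabeled ones, with a single-threshold rule. The data, the two models, and the true rule (label 1 when x is at least 6) are all assumptions chosen to make the effect visible.

```python
def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_nearest_neighbour(train):
    """Memorizes every training point, including the noisy ones."""
    def predict(x):
        _, nearest_y = min(train, key=lambda point: abs(point[0] - x))
        return nearest_y
    return predict

def fit_threshold(train):
    """Picks the single cut-off that classifies most training points correctly."""
    def rule(t):
        return lambda x: int(x >= t)
    best_t = max((x for x, _ in train),
                 key=lambda t: accuracy(rule(t), train))
    return rule(best_t)

# True rule: label is 1 when x >= 6. Labels at x=3 and x=8 are flipped (noise).
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
# Unseen data, labeled with the true rule and no noise.
test = [(1.4, 0), (2.6, 0), (3.3, 0), (4.6, 0), (5.4, 0),
        (6.4, 1), (7.4, 1), (8.3, 1), (9.4, 1), (10.4, 1)]

nn = fit_nearest_neighbour(train)
simple = fit_threshold(train)
nn_train, nn_test = accuracy(nn, train), accuracy(nn, test)
simple_train, simple_test = accuracy(simple, train), accuracy(simple, test)
print(nn_train, nn_test, simple_train, simple_test)
```

The memorizing model is perfect on the training data but worse on the unseen data, while the simpler rule gives up some training accuracy and generalizes better.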
Difference between KDD and Data Mining

| Parameter | KDD | Data Mining |
|---|---|---|
| Definition | KDD refers to a process of identifying valid, novel, potentially useful, and ultimately understandable patterns and relationships in data. | Data Mining refers to a process of extracting useful and valuable information or patterns from large data sets. |
| Objective | To find useful knowledge from data. | To extract useful information from data. |
| Techniques Used | Data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge representation and visualization. | Association rules, classification, clustering, regression, decision trees, neural networks, and dimensionality reduction. |
| Output | Structured information, such as rules and models, that can be used to make decisions or predictions. | Patterns, associations, or insights that can be used to improve decision-making or understanding. |
| Focus | Focus is on the discovery of useful knowledge, rather than simply finding patterns in data. | Data mining focus is on the discovery of patterns or relationships in data. |
| Role of domain expertise | Domain expertise is important in KDD, as it helps in defining the goals of the process, choosing appropriate data, and interpreting the results. | Domain expertise is less critical in data mining, as the algorithms are designed to identify patterns without relying on prior knowledge. |
