DM Introduction
By
R. Siva Narayana
RGUKT Nuzvid
Data Mining
• Data mining is the process of discovering
interesting patterns and knowledge from large
amounts of data.
– The data sources can include databases, data
warehouses, the Web, other information
repositories, or data that are streamed into the
system dynamically.
Data Mining Evolution
KDD
What kinds of data can be mined?
• Database data
• Data warehouses
• Transactional data
• Other kinds of data
– Time-related or sequence data (e.g., stock-exchange data)
– Data streams (e.g., video surveillance and sensor data)
– Spatial data (e.g., maps)
– Hypertext and multimedia
Getting to Know Your Data
• Real-world data are typically noisy, enormous in volume, and
may originate from heterogeneous sources.
• Knowledge about your data is useful for data preprocessing.
– What are the types of attributes?
– What kind of values does each attribute have?
– Which attributes are discrete and which are continuous valued?
– What do the data look like? How are the values distributed?
– How can we visualize the data to get a better sense of it?
– Can we spot any outliers?
– Can we measure the similarity of some data objects with respect
to others?
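Two of the questions above (how the values are distributed, and whether any outliers stand out) can be explored with a few lines of Python. This is a minimal sketch, not a method taken from the slides: the `ages` values are made up for illustration, and the 1.5 × IQR fence is a common rule of thumb assumed here.

```python
import statistics

# Hypothetical observations for a single numeric attribute.
ages = [21, 22, 22, 23, 24, 25, 25, 26, 27, 95]

# Quartiles summarize how the values are distributed.
q1, median, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Assumed rule of thumb: values beyond 1.5 * IQR from the quartiles
# are flagged as suspected outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print("min:", min(ages), "Q1:", q1, "median:", median, "Q3:", q3, "max:", max(ages))
print("suspected outliers:", outliers)
```

A flagged value is only a candidate for closer inspection; whether it is noise or a genuine observation depends on the application.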
Data Objects and Attribute Types
• Datasets are made up of data objects.
• Data objects are typically described by attributes.
• An attribute is a data field representing a characteristic or feature
of a data object.
• Observed values for a given attribute are known as
observations.
• A set of attributes used to describe a given object is called
an attribute vector.
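The terms above map naturally onto a small data structure. The sketch below is illustrative only: the `Customer` class, its fields, and the sample values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, astuple

@dataclass
class Customer:
    # Each field is one attribute: a characteristic of the data object.
    customer_id: int
    name: str
    age: int
    income: float

# A dataset is a collection of data objects (made-up values).
dataset = [
    Customer(1, "Asha", 34, 52000.0),
    Customer(2, "Ravi", 45, 61000.0),
]

# The attribute vector of an object holds its value for every attribute.
attribute_vectors = [astuple(c) for c in dataset]

# The observations for one attribute are its values across all objects.
age_observations = [c.age for c in dataset]

print(attribute_vectors)
print(age_observations)
```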
• Standard Deviation:
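The formula for this bullet did not survive in the slides. As a sketch under a stated assumption, the code below uses the usual population definition: the square root of the mean squared deviation from the mean. The function name `population_std` and the `salaries` values are illustrative, not taken from the source.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

# Made-up observations for a numeric attribute (in thousands).
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

sigma = population_std(salaries)
print(round(sigma, 2))

# Cross-check against the standard library's population version.
assert math.isclose(sigma, statistics.pstdev(salaries))
```

A low standard deviation means the observations cluster near the mean; a high one means they are widely spread. If the data are a sample rather than the whole population, `statistics.stdev` (which divides by N - 1) is the usual alternative.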
Graphic Displays
• Graphs give a visual description of the
data, which helps in data preprocessing by
exposing noise and outliers
– Histograms
– Quantile plots
– Quantile-quantile (q-q) plots
– Scatter plots
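As one illustration of the displays listed above, the points of a quantile plot can be computed without any plotting library. The sketch assumes the common convention of pairing the i-th smallest of N values with the fraction f = (i - 0.5) / N; the function name and the sample values are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value with the fraction f = (i - 0.5) / N."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

# Made-up observations for one attribute.
points = quantile_plot_points([40, 10, 30, 20])

for f, x in points:
    print(f"f = {f:.3f}  value = {x}")
```

Plotting `value` against `f` gives the quantile plot. Pairing the corresponding quantiles of two attributes instead gives a quantile-quantile plot.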
Histograms
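A histogram partitions the range of an attribute into bins and counts the observations in each. The following minimal sketch uses equal-width bins; the `prices` values and the bin width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per equal-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up observations for one attribute.
prices = [1, 5, 8, 12, 14, 15, 21, 25, 25, 28]
bins = histogram_counts(prices, 10)

# Text rendering: one row per non-empty bin, one '#' per observation.
for start, count in bins.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

Bins with no observations are omitted from the result. An isolated bar far from the rest is a hint that the values in it may be noise or outliers.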