Data Preparation Part 1
Data Preparation
• Definition
• Purpose
• Benefits
• Steps
• Challenges
Definition
• identify and fix data issues that otherwise might not be detected
Once the data is cleaned, data profiling tools will return various statistics to describe
the data set. This could include the mean, minimum/maximum value, frequency,
recurring patterns, dependencies or data quality risks.
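The kind of statistics a profiling tool returns can be sketched with the Python standard library alone. This is a minimal illustration, not a real profiling tool: the sample records, the field names, and the helper functions `profile_numeric` and `profile_categorical` are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"customer": "Ann", "age": 34, "country": "US"},
    {"customer": "Bob", "age": 45, "country": "US"},
    {"customer": "Cid", "age": None, "country": "DE"},
    {"customer": "Dee", "age": 29, "country": "us"},
]


def profile_numeric(rows, field):
    """Return basic statistics for a numeric field, skipping missing values."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def profile_categorical(rows, field):
    """Return how often each distinct value occurs in a field."""
    return Counter(r[field] for r in rows)


age_profile = profile_numeric(records, "age")
country_profile = profile_categorical(records, "country")

print(age_profile)
print(country_profile)
```

In this sample the frequency count lists "US" and "us" as separate values, which is the sort of data quality risk a profile is meant to surface.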
Data Discovery and Profiling (contd.)
• Benefits
leads to higher-quality, more credible data;
helps with more accurate predictive analytics and decision-making;
makes better sense of the relationships between different data sets and sources;
keeps company information centralized and organized;
eliminates errors, such as missing values or outliers, that add costs to data-driven projects;
highlights areas within a system that experience the most data quality issues, such as data corruption or user input errors; and
produces insights surrounding risks, opportunities and trends.
Data Cleansing
• It is the process of fixing incorrect, incomplete, duplicate or
otherwise erroneous data in a data set.
• It involves identifying data errors and then changing, updating
or removing data to correct them.
• The types of issues commonly fixed as part of data cleansing
projects include:
Typos and invalid or missing data.
Inconsistent data.
Duplicate data.
Irrelevant data.
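The four issue types above can each be handled in a single pass over the data. The sketch below is one possible approach under stated assumptions: the sample rows, the `COUNTRY_ALIASES` mapping, the `RELEVANT_FIELDS` tuple, the rule that a row without a plausible email is dropped, and the choice of email as the duplicate key are all hypothetical and would differ from project to project.

```python
# Hypothetical raw rows; field names and values are assumptions for illustration.
raw = [
    {"name": " Alice ", "email": "alice@example.com", "country": "usa", "notes": "x"},
    {"name": "Bob", "email": "", "country": "USA", "notes": "y"},
    {"name": "alice", "email": "alice@example.com", "country": "U.S.A.", "notes": "z"},
]

# Assumed mapping of inconsistent spellings to one canonical code.
COUNTRY_ALIASES = {"usa": "US", "u.s.a.": "US", "us": "US"}
# Assumed set of fields the analysis actually needs.
RELEVANT_FIELDS = ("name", "email", "country")


def clean(rows):
    """Apply one simple rule per issue type and return the cleaned rows."""
    cleaned = []
    seen_emails = set()
    for row in rows:
        # Irrelevant data: keep only the fields of interest, trimmed of whitespace.
        rec = {f: (row.get(f) or "").strip() for f in RELEVANT_FIELDS}

        # Invalid or missing data: drop rows without a plausible email address.
        if "@" not in rec["email"]:
            continue

        # Inconsistent data: normalize name casing and country spelling.
        rec["name"] = rec["name"].title()
        rec["country"] = COUNTRY_ALIASES.get(rec["country"].lower(), rec["country"])

        # Duplicate data: keep the first row seen for each email address.
        key = rec["email"].lower()
        if key in seen_emails:
            continue
        seen_emails.add(key)
        cleaned.append(rec)
    return cleaned


cleaned_rows = clean(raw)
print(cleaned_rows)
```

Dropping a row is only one way to handle missing values; correcting or filling them in is equally valid, depending on the data set.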
Data Cleansing (contd.)
• Steps
Inspection and profiling
Cleaning
Verification
Reporting
• Characteristics
Accuracy
Completeness
Consistency
Integrity
Uniformity
Validity
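The steps and characteristics listed above can be tied together by measuring a data set before and after cleaning. The following sketch covers only two of the characteristics, completeness and validity, and makes several assumptions: the sample rows, the `signup` field, the `YYYY-MM-DD` target format, and the helper names are hypothetical, and the cleaning step simply drops bad rows rather than repairing them.

```python
import re

# Assumed target format for dates: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def completeness(rows, field):
    """Share of rows in which the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected pattern."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def report(rows):
    """Reporting step: summarize the measured quality of the rows."""
    return {
        "rows": len(rows),
        "date_completeness": completeness(rows, "signup"),
        "date_validity": validity(rows, "signup", DATE_PATTERN),
    }


# Hypothetical input rows for the inspection and profiling step.
before = [
    {"id": 1, "signup": "2024-01-05"},
    {"id": 2, "signup": "05/01/2024"},
    {"id": 3, "signup": ""},
    {"id": 4, "signup": "2024-02-10"},
]

# Cleaning step (simplified): keep only rows whose date matches the format.
after = [r for r in before if DATE_PATTERN.match(r["signup"] or "")]

# Verification and reporting: compare the metrics before and after cleaning.
report_before = report(before)
report_after = report(after)
print(report_before)
print(report_after)
```

Comparing the two reports is a simple form of verification: the metrics should improve after cleaning, and the row count shows how much data the cleaning rule cost.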
Data Cleansing (contd.)
• Benefits
Improved decision-making
More effective marketing and sales
Better operational performance
Increased use of data
Reduced data costs