
Data Preparation Part1

Uploaded by kannansneha288
Copyright © All Rights Reserved
Topic

Data Preparation
Data Preparation
• Definition
• Purpose
• Benefits
• Steps
• Challenges
Definition

• Data preparation is the process of gathering, combining, structuring and organizing data so it can be used in business intelligence (BI), analytics and data visualization applications.


Purpose

• To ensure that the raw data being prepared for processing and analysis is accurate and consistent, so that the results of BI and analytics applications will be valid.
• To find relevant data, so that analytics applications deliver meaningful information and actionable insights for business decision-making.
Benefits
• Ensure the data used in analytics applications produces reliable results
• Identify and fix data issues that otherwise might not be detected
• Enable more informed decision-making by business executives and operational workers
• Reduce data management and analytics costs
• Avoid duplication of effort in preparing data for use in multiple applications
• Get a higher ROI from BI and analytics initiatives


Steps
• Data discovery and profiling
• Data cleansing
• Data structuring
• Data transformation and enrichment
• Data validation and publishing
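The sketch below shows the five steps chained as a pipeline. Each stage takes a list of records and returns a list of records. The stage logic is deliberately tiny and hypothetical, and so are the function names and the `REGIONS` lookup table. Real projects would use a data preparation tool or library at each stage.

```python
from typing import Callable

Record = dict[str, str]

# Hypothetical lookup table used by the enrichment step.
REGIONS = {"IN": "Asia", "DE": "Europe"}


def discover_and_profile(rows: list[Record]) -> list[Record]:
    # Profiling inspects without modifying: report row count and columns seen.
    columns = sorted({key for row in rows for key in row})
    print(f"profile: {len(rows)} rows, columns={columns}")
    return rows


def cleanse(rows: list[Record]) -> list[Record]:
    # Trim stray whitespace, then drop exact duplicates (first copy wins).
    seen: set[tuple[tuple[str, str], ...]] = set()
    cleaned = []
    for row in rows:
        fixed = {key: value.strip() for key, value in row.items()}
        signature = tuple(sorted(fixed.items()))
        if signature not in seen:
            seen.add(signature)
            cleaned.append(fixed)
    return cleaned


def structure(rows: list[Record]) -> list[Record]:
    # Give every record the same columns, filling absent fields with "".
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col, "") for col in columns} for row in rows]


def transform_and_enrich(rows: list[Record]) -> list[Record]:
    # Standardize country codes and add a derived "region" field.
    enriched = []
    for row in rows:
        country = row.get("country", "").upper()
        region = REGIONS.get(country, "Unknown")
        enriched.append({**row, "country": country, "region": region})
    return enriched


def validate_and_publish(rows: list[Record]) -> list[Record]:
    # Only records passing validation (here: a non-empty id) are published.
    return [row for row in rows if row.get("id")]


PIPELINE: list[Callable[[list[Record]], list[Record]]] = [
    discover_and_profile,
    cleanse,
    structure,
    transform_and_enrich,
    validate_and_publish,
]


def prepare(rows: list[Record]) -> list[Record]:
    # Run the stages in order, feeding each one's output to the next.
    for stage in PIPELINE:
        rows = stage(rows)
    return rows
```

Keeping every stage to the same signature makes it easy to reorder, replace or test stages one at a time.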
Data Discovery and Profiling

• Data profiling is the process of examining, analyzing, reviewing and summarizing data sets to gain insight into the quality of the data.
• Data quality is a measure of the condition of data based on factors such as its accuracy, completeness, consistency, timeliness and accessibility.
• Profiling also involves reviewing source data to understand its structure, content and interrelationships.
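Of the quality factors listed above, completeness is the easiest to measure mechanically. Accuracy, by contrast, usually needs trusted reference data. The sketch below scores completeness as the fraction of rows holding a non-missing value in each column. The function names are hypothetical, and so is the rule that blank strings count as missing.

```python
def is_missing(value: object) -> bool:
    # Assumption: None and blank/whitespace-only strings both count as missing.
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Fraction of rows (0.0 to 1.0) holding a non-missing value in each column."""
    if not rows:
        return {col: 0.0 for col in columns}
    return {
        col: sum(not is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }
```

A profiling report would typically flag any column whose score falls below an agreed threshold.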
Data Discovery and Profiling …(Contd)
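• Types
  - Structure discovery: checks that data is consistent and correctly formatted (for example, data types and value patterns)
  - Content discovery: examines individual values to find errors and quality problems
  - Relationship discovery: identifies how data sets and sources are connected (for example, key relationships between tables)
• Steps
  - One or more data sources and the associated metadata are gathered for analysis.
  - The data is then cleaned to unify its structure, eliminate duplicates, identify interrelationships and find anomalies.
  - Once the data is cleaned, data profiling tools return various statistics that describe the data set. These can include the mean, minimum/maximum values, frequencies, recurring patterns, dependencies or data quality risks.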
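Here is a minimal sketch of the three discovery types on raw string data. It is standard-library only, and the function names are hypothetical.

- Structure discovery is approximated by inferring the narrowest type that every value parses as.
- Content discovery is approximated by summary statistics, frequencies and value patterns. It uses the common profiling convention of mapping letters to "A" and digits to "9".
- Relationship discovery is approximated by finding foreign-key values with no matching parent key.

```python
import re
import statistics
from collections import Counter


def value_pattern(value: str) -> str:
    # Letters become "A" and digits become "9", so "AB-12" and "CD-34" share "AA-99".
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def infer_type(values: list[str]) -> str:
    # Structure discovery: the narrowest type every non-empty value parses as.
    def parses_as(cast) -> bool:
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and parses_as(int):
        return "integer"
    if values and parses_as(float):
        return "float"
    return "string"


def profile_column(values: list[str]) -> dict:
    # Content discovery: summary statistics for one column of raw strings.
    present = [v for v in values if v.strip()]
    kind = infer_type(present)
    profile = {
        "type": kind,
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(value_pattern(v) for v in present).most_common(3),
    }
    if kind != "string":  # numeric columns also get mean/min/max
        numbers = [float(v) for v in present]
        profile.update(mean=statistics.mean(numbers), min=min(numbers), max=max(numbers))
    return profile


def orphan_keys(child: list[dict], parent: list[dict], fk: str, pk: str) -> set:
    # Relationship discovery: foreign-key values in `child` with no match in `parent`.
    parent_keys = {row[pk] for row in parent}
    return {row[fk] for row in child if row[fk] not in parent_keys}
```

For example, `profile_column(["10", "20", "", "30", "20"])` reports an integer column with one missing value and a mean of 20.0. `orphan_keys` would expose an order that points at a customer id absent from the customer table.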
Data Discovery and Profiling …(Contd)
• Benefits
  - leads to higher-quality, more credible data;
  - helps with more accurate predictive analytics and decision-making;
  - helps make better sense of the relationships between different data sets and sources;
  - keeps company information centralized and organized;
  - helps identify and eliminate errors, such as missing values or outliers, that add costs to data-driven projects;
  - highlights areas within a system that experience the most data quality issues, such as data corruption or user input errors; and
  - produces insights into risks, opportunities and trends.
Data Cleansing
• Data cleansing is the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set.
• It involves identifying data errors and then changing, updating or removing data to correct them.
• The types of issues commonly fixed as part of data cleansing projects include:
  - Typos and invalid or missing data
  - Inconsistent data
  - Duplicate data
  - Irrelevant data
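The sketch below handles each of the four issue types in one pass over a list of records. The column list, the `CITY_FIXES` spelling table and the plausible age range are all hypothetical choices made for illustration. Real rules come from the data's own domain.

```python
# Hypothetical lookup of known misspellings/variants -> canonical city name.
CITY_FIXES = {"bangalore": "Bengaluru", "banglore": "Bengaluru", "bengaluru": "Bengaluru"}
# Hypothetical set of columns the analysis actually needs.
RELEVANT_COLUMNS = ("id", "name", "city", "age")


def clean_records(rows: list[dict]) -> list[dict]:
    cleaned: list[dict] = []
    seen_ids: set[str] = set()
    for row in rows:
        # Irrelevant data: keep only the needed columns (extra fields are dropped).
        record: dict = {}
        for col in RELEVANT_COLUMNS:
            value = row.get(col)
            record[col] = "" if value is None else str(value).strip()
        # Missing data: a record with no id cannot be matched or de-duplicated.
        if not record["id"]:
            continue
        # Duplicate data: keep the first record seen for each id.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        # Inconsistent data: collapse repeated spaces and use title case for names.
        record["name"] = " ".join(record["name"].split()).title()
        # Typos: map known misspellings to one canonical spelling.
        record["city"] = CITY_FIXES.get(record["city"].lower(), record["city"])
        # Invalid data: age must be a whole number in a plausible range, else None.
        age = record["age"]
        record["age"] = int(age) if age.isdecimal() and 0 < int(age) < 120 else None
        cleaned.append(record)
    return cleaned
```

Invalid ages are set to `None` rather than guessed. This keeps the fix visible to later steps instead of silently inventing a value.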
Data Cleansing…(contd)
• Steps
  - Inspection and profiling
  - Cleaning
  - Verification
  - Reporting
• Characteristics of quality data
  - Accuracy
  - Completeness
  - Consistency
  - Integrity
  - Uniformity
  - Validity
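The four steps can be sketched as follows: inspect, clean, re-run the same checks to verify, then report before and after counts. The record fields (`id`, `email`, `signup`), the regular expressions and the assumption that legacy dates arrive as DD/MM/YYYY are all hypothetical. Only four of the characteristics get mechanical rules here. Accuracy and consistency usually need reference data or cross-field rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules, one per quality characteristic being checked.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")  # validity
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")            # uniformity
DMY_DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")      # assumed legacy DD/MM/YYYY input


@dataclass
class CleansingReport:
    rows_in: int = 0
    rows_out: int = 0
    issues_found: dict[str, int] = field(default_factory=dict)
    issues_remaining: dict[str, int] = field(default_factory=dict)


def run_checks(rows: list[dict]) -> dict[str, int]:
    # Count rule violations per characteristic; nothing is modified here.
    counts = {"completeness": 0, "validity": 0, "uniformity": 0, "integrity": 0}
    seen_ids = set()
    for row in rows:
        email = row.get("email") or ""
        if not email:
            counts["completeness"] += 1
        elif not EMAIL.fullmatch(email):
            counts["validity"] += 1
        if not ISO_DATE.fullmatch(row.get("signup") or ""):
            counts["uniformity"] += 1
        if row.get("id") in seen_ids:  # duplicate key breaks entity integrity
            counts["integrity"] += 1
        seen_ids.add(row.get("id"))
    return counts


def clean(rows: list[dict]) -> list[dict]:
    # Fix what can be fixed mechanically; other issues are left for the report.
    cleaned, seen_ids = [], set()
    for row in rows:
        if row.get("id") in seen_ids:
            continue
        seen_ids.add(row.get("id"))
        fixed = dict(row)
        fixed["email"] = (row.get("email") or "").strip().lower()
        match = DMY_DATE.fullmatch(row.get("signup") or "")
        if match:
            day, month, year = match.groups()
            fixed["signup"] = f"{year}-{month}-{day}"
        cleaned.append(fixed)
    return cleaned


def cleanse_with_report(rows: list[dict]) -> tuple[list[dict], CleansingReport]:
    report = CleansingReport(rows_in=len(rows))
    report.issues_found = run_checks(rows)          # 1. inspection and profiling
    cleaned = clean(rows)                           # 2. cleaning
    report.issues_remaining = run_checks(cleaned)   # 3. verification: same checks again
    report.rows_out = len(cleaned)
    return cleaned, report                          # 4. reporting
```

Verification reuses the inspection checks on purpose. Any count left in `issues_remaining` (for example, a malformed email that cannot be repaired automatically) shows up in the report for manual follow-up instead of being dropped silently.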
Data Cleansing…(contd)
• Benefits
  - Improved decision-making
  - More effective marketing and sales
  - Better operational performance
  - Increased use of data
  - Reduced data costs
