DSA2

Data cleaning is the process of correcting or removing inaccurate, corrupted, or incomplete data to ensure reliable outcomes. It involves steps like importing data, creating backups, and using Excel functions to clean and standardize data formats. Data preprocessing enhances data quality by eliminating errors, handling missing values, and removing duplicates, but it may not be suitable for large datasets and often requires manual execution.

Uploaded by

davenguting20


What is data cleaning?

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted,
duplicate, or incomplete data within a dataset. When combining multiple data sources, there are
many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and
algorithms are unreliable, even though they may look correct. There is no one absolute way to
prescribe the exact steps in the data cleaning process because the processes will vary from
dataset to dataset. But it is crucial to establish a template for your data cleaning process so you
know you are doing it the right way every time.

Advantages and benefits of data cleaning

Having clean data will ultimately increase overall productivity and allow for the highest quality
information in your decision-making. Benefits include:

• Removal of errors when multiple sources of data are at play.
• Fewer errors make for happier clients and less-frustrated employees.
• Ability to map the different functions and what your data is intended to do.
• Monitoring errors and better reporting to see where errors are coming from, making it
  easier to fix incorrect or corrupt data for future applications.
• Using tools for data cleaning will make for more efficient business practices and quicker
  decision-making.

The basic steps for cleaning data are as follows:

1. Import the data from an external data source.
2. Create a backup copy of the original data in a separate workbook.
3. Ensure that the data is in a tabular format of rows and columns with: similar data in
   each column, all columns and rows visible, and no blank rows within the range. For
   best results, use an Excel table.
4. Do tasks that don't require column manipulation first, such as spell-checking or
   using the Find and Replace dialog box.
5. Next, do tasks that do require column manipulation. The general steps for
   manipulating a column are:
   a. Insert a new column (B) next to the original column (A) that needs cleaning.
   b. Add a formula that will transform the data at the top of the new column (B).
   c. Fill down the formula in the new column (B). In an Excel table, a calculated
      column is automatically created with values filled down.
   d. Select the new column (B), copy it, and then paste it back over itself as values,
      so that the cells hold the results rather than the formulas.
   e. Remove the original column (A), which converts the new column from B to A.

To periodically clean the same data source, consider recording a macro or writing code to
automate the entire process. Add-ins written by third-party vendors are another option if you
don't have the time or resources to automate the process on your own.
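
As a rough illustration of what "writing code to automate the entire process" could look like, the sketch below mirrors the column steps above in Python rather than in Excel. The helper name `clean_column`, the sample rows, and the choice of transformation are all hypothetical; it is a minimal sketch, not a prescribed method.

```python
import copy


def clean_column(rows, column, transform):
    """Return a cleaned copy of rows; the input list is left untouched.

    Leaving the input alone plays the role of the backup copy (step 2).
    Applying transform to every value plays the role of the helper
    column with a filled-down formula (steps 5a to 5e).
    """
    cleaned = copy.deepcopy(rows)
    for row in cleaned:
        row[column] = transform(row[column])
    return cleaned


# Hypothetical sample data: names with stray spaces and mixed case.
raw = [{"name": "  alice "}, {"name": "BOB"}]

# Trim surrounding spaces, then capitalize each word.
cleaned = clean_column(raw, "name", lambda s: s.strip().title())
print(cleaned)
```

Because the function returns a new list, the same transformation can be rerun on each fresh import without touching the original data.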

Excel data cleaning is an important skill for business and data analysts. In the current era of
data analytics, data is expected to meet high standards of accuracy and quality. A major part of
Excel data cleaning involves eliminating blank spaces and incorrect or outdated information.

Data Preprocessing
Data preprocessing is the stage of data analysis that cleans and transforms raw data into a
form that can be analyzed reliably. Before analyzing the data, we need to make sure that it is
clean and usable. Data preprocessing improves the quality, consistency, and compatibility of
the data.

Data Preprocessing helps in many ways:

It helps in eliminating errors.
It helps in handling the missing values.
It helps in removing duplicates.
It helps in standardizing formats.

Steps in Data Preprocessing

1. Collection of the Data
In this step, we need to collect the raw data. We can collect this data from various sources such
as spreadsheets, online repositories, etc.
2. Cleaning of the Data
In this step, we need to clean the data before using it. We have to identify and address data
quality issues. Excel provides features such as Find and Replace, Text to Columns,
and conditional formatting to help clean the data.
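
For readers who want to see the same two ideas outside Excel, here is a minimal Python sketch of a find-and-replace pass and a text-to-columns split. The function names and sample values are hypothetical. Note that `str.replace` is case-sensitive and matches exact substrings, which may differ from the options selected in Excel's dialog box.

```python
def find_and_replace(values, old, new):
    """Replace every occurrence of old with new in each text value."""
    return [v.replace(old, new) for v in values]


def text_to_columns(values, delimiter=","):
    """Split each text value on delimiter and trim spaces around the parts."""
    return [[part.strip() for part in v.split(delimiter)] for v in values]


# Hypothetical sample data.
cities = find_and_replace(["N.Y.", "Boston", "N.Y."], "N.Y.", "New York")
names = text_to_columns(["Smith, John", "Doe, Jane"])

print(cities)
print(names)
```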
3. Handling Missing Values
In this step, we need to handle the missing values. A missing value can cause major problems
when the data is transformed or analyzed. We can identify and handle missing values using
functions such as:
• IF
• ISBLANK (tests for an empty cell) or ISNA (tests for the #N/A error value)

We can select all the rows that have missing values. We can also replace the missing values
with appropriate substitutes.
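
The logic of a formula such as `=IF(ISBLANK(A2), 0, A2)` can be sketched in Python as follows. This is an illustration under assumptions: the helper names are hypothetical, and the sketch treats `None`, empty strings, and whitespace-only strings as missing, which is broader than what ISBLANK tests for.

```python
def is_missing(value):
    """Treat None and empty or whitespace-only text as missing (an assumption)."""
    return value is None or str(value).strip() == ""


def fill_missing(values, substitute):
    """Replace each missing value with substitute and keep the rest as-is."""
    return [substitute if is_missing(v) else v for v in values]


def rows_with_missing(rows):
    """Return the positions of rows that contain at least one missing value."""
    return [i for i, row in enumerate(rows)
            if any(is_missing(v) for v in row.values())]


# Hypothetical sample data.
filled = fill_missing([10, None, "", 7], 0)
flagged = rows_with_missing([
    {"id": 1, "city": "Cebu"},
    {"id": 2, "city": ""},
    {"id": None, "city": "Davao"},
])

print(filled)
print(flagged)
```

What counts as an appropriate substitute (zero, an average, a label such as "Unknown") depends on the dataset, so the substitute is left as a parameter.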
4. Removing Duplicates
In this step, we need to remove the duplicates from the data. Duplicates can skew the results
of an analysis. Excel offers a simple way to remove them. First, select the data range and go to
the Data tab. Then click the Remove Duplicates button and choose the columns to check for
duplicates. Excel will remove the duplicate rows, keeping the first occurrence of each.
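
The same idea, keeping the first occurrence of each row as judged by a chosen set of columns, can be sketched in Python. The function name, column names, and sample rows below are hypothetical.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: the third row repeats the first customer.
customers = [
    {"email": "ana@example.com", "city": "Manila"},
    {"email": "ben@example.com", "city": "Cebu"},
    {"email": "ana@example.com", "city": "Quezon City"},
]

deduped = remove_duplicates(customers, ["email"])
print(deduped)
```

Choosing the key columns matters: checking only `email` drops the third row, while checking both `email` and `city` would keep all three.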
5. Standardizing Formats
In this step, we need to standardize the formats. Inconsistent data formats make analysis
harder, for example when the same name appears with different capitalization. To standardize
formats, we can use Excel features such as cell formatting, text functions (e.g., PROPER,
UPPER, LOWER), and data validation rules.
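
As a rough Python counterpart to PROPER, UPPER, and LOWER, the sketch below uses the string methods `title()`, `upper()`, and `lower()`. The function names and the rule assigned to each kind of field are assumptions made for illustration, and `title()` is only an approximate match for PROPER.

```python
def standardize_name(text):
    """Collapse repeated spaces and capitalize each word (similar to PROPER)."""
    return " ".join(text.split()).title()


def standardize_code(text):
    """Trim surrounding spaces and convert to upper case (similar to UPPER)."""
    return text.strip().upper()


def standardize_email(text):
    """Trim surrounding spaces and convert to lower case (similar to LOWER)."""
    return text.strip().lower()


# Hypothetical sample values.
name = standardize_name("  jOHN   smith ")
code = standardize_code(" ab-12 ")
email = standardize_email(" John.Smith@Example.COM ")

print(name, code, email)
```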

6. Filtering and Sorting

In this step, we need to filter and sort the data. Excel's filtering and sorting capabilities help
explore and organize large datasets. The Filter command displays only the rows that meet the
criteria you specify. The Sort command arranges the data in ascending or descending order.
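
Filtering by a criterion and sorting in descending order can be sketched in Python with a list comprehension and the built-in `sorted`. The sample orders and the threshold of 100 are hypothetical.

```python
# Hypothetical sample data.
orders = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": 40},
    {"id": 3, "amount": 900},
    {"id": 4, "amount": 100},
]

# Filter: keep only the rows that meet the criterion.
large_orders = [o for o in orders if o["amount"] >= 100]

# Sort: arrange the filtered rows by amount, largest first.
by_amount_desc = sorted(large_orders, key=lambda o: o["amount"], reverse=True)

print([o["id"] for o in by_amount_desc])
```

Neither step changes `orders` itself, which matches how a filter in Excel hides rows without deleting them.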

Advantages of Data Preprocessing

There are several advantages of data preprocessing in Excel:
Excel provides a user-friendly interface, so data preprocessing and other data analysis tasks
are easy to carry out.
Excel offers a wide range of functions and features that cover different data preprocessing
needs.
Excel is widely available, which is why it is commonly used for data preprocessing.
Excel integrates well with other Microsoft Office applications, facilitating seamless data transfer
and collaboration.

Disadvantages of Data Preprocessing

Along with the advantages, there are some disadvantages of data preprocessing in Excel:
Excel may not be suitable for handling large datasets. A worksheet is limited to 1,048,576 rows
by 16,384 columns, and performance can degrade well before that limit.
Excel’s analytical capabilities are robust but may not match those offered by specialized
statistical or data analysis software.
Data preprocessing tasks in Excel often require manual execution.
