UNIVERSAL AI UNIVERSITY
MBA (BDI5-1 & BDI5-2)
EXCEL FOR MANAGERS
PROJECT
Faculty: Sujatha Ayyangar
ATTEMPT ALL QUESTIONS
Q1 Carries 5 Marks
Q2 Carries 5 Marks
Q3 & Q4 Carry 5 Marks
Data Set Details
I have uploaded 12 datasets for each division (one dataset per group).
Each dataset was downloaded from Kaggle. To learn more about your dataset, type the
dataset name into Google Search, open the matching Kaggle page from the results,
and read the dataset description.
Instructions for Data Cleaning Exercise
Note: If the data provided to you is already clean, your first task is to intentionally introduce
inconsistencies or errors into the dataset. This will help you better understand the types of issues
that can arise in real-world data and how to effectively clean them.
Introduce Errors into Clean Data
1. Duplicate Entries:
o Add duplicate rows for some records.
2. Inconsistent Data:
o Change formatting for dates, numbers, or text entries (e.g., switch date formats,
alter capitalization).
3. Missing Values:
o Delete or blank out some data in various columns to create missing values.
4. Incorrect Data:
o Enter incorrect values in certain fields (e.g., mix text into numeric fields).
5. Mixed Data Types:
o Introduce different data types within the same column (e.g., mix numbers and
text).
6. Add Extra Spaces:
o Insert unnecessary spaces within cells (leading, trailing, or in between words).
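For groups who want to see the idea outside Excel, the steps above can be sketched in Python. This is only a minimal sketch, assuming the data is held as a list of dictionaries; the function name introduce_errors and the sample columns are hypothetical and not part of the assignment. It shows three of the six error types (extra spaces, a missing value, a duplicate row).

```python
import copy

def introduce_errors(rows):
    """Return a dirtied copy of rows (a list of dicts); the input is not modified."""
    dirty = copy.deepcopy(rows)
    if not dirty:
        return dirty
    first = dirty[0]
    columns = list(first)
    # Extra spaces: pad the first column of the first row.
    first[columns[0]] = "  " + str(first[columns[0]]) + " "
    # Missing value: blank out the last column of the last row.
    dirty[-1][columns[-1]] = ""
    # Duplicate entry: repeat the first row at the end.
    dirty.append(dict(first))
    return dirty

# Hypothetical sample data
clean = [{"name": "Asha", "city": "Pune"}, {"name": "Ravi", "city": "Mumbai"}]
dirty = introduce_errors(clean)
```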
QUESTIONS
Q1. DATA CLEANING
Handling Missing Data
How would you identify missing values in the dataset?
How do you impute missing values using the mean, median, or mode?
What is the impact of dropping rows or columns with missing data?
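The logic behind mean, median, and mode imputation can be sketched in Python as follows. This is an illustration only, assuming that a blank cell is either empty text or None; the helper names is_missing and impute are hypothetical. In Excel the same fill values come from the AVERAGE, MEDIAN, and MODE functions, and COUNTBLANK counts the blanks.

```python
from statistics import mean, median, mode

def is_missing(value):
    """Treat None and empty or whitespace-only text as missing."""
    return value is None or str(value).strip() == ""

def impute(values, method="mean"):
    """Replace missing entries with the mean, median, or mode of the present ones."""
    present = [float(v) for v in values if not is_missing(v)]
    fill = {"mean": mean, "median": median, "mode": mode}[method](present)
    return [fill if is_missing(v) else float(v) for v in values]

# Hypothetical column with two blanks
column = [10, "", 20, None, 60]
missing_count = sum(is_missing(v) for v in column)
filled = impute(column, "mean")
```

Dropping the two blank rows instead would leave only three observations, which is the trade-off the last question asks about.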
Duplicated Data
How do you identify and remove duplicate rows in the dataset?
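Duplicate removal keeps the first occurrence of each row and discards later exact copies. A minimal Python sketch of that logic, assuming each row is a list of simple values (the function name remove_duplicates is hypothetical); in Excel the Remove Duplicates command on the Data tab does the equivalent job.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row and drop later exact copies."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [["Asha", 101], ["Ravi", 102], ["Asha", 101]]
unique_rows = remove_duplicates(rows)
```

Note that this only catches exact matches, so rows that differ by case or extra spaces need to be standardized first.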
Data Standardization
How do you standardize the case of text data (e.g., converting all text to lowercase or
uppercase)?
How do you handle inconsistent data formats (e.g., date formats)?
How do you standardize categorical variables (e.g., combining similar categories)?
Handling Inconsistent Data
How do you identify and correct inconsistent data entries (e.g., misspelled categories)?
How would you handle data inconsistencies between related columns?
Examples of Inconsistent Data
1. Inconsistent Date Formats
Example: 01/12/2024, 2024-12-01, December 1, 2024, 12/01/24
Issue: Different date formats can cause issues when sorting or filtering data by date.
2. Inconsistent Categorical Data
Example: New York, NY, new york, N.Y.
Issue: These variations all refer to the same city but are treated as different categories in the
dataset.
3. Inconsistent Use of Case
Example: John Doe, john doe, JOHN DOE
Issue: Names or other text data recorded with varying capitalization can lead to duplicates or
misclassification.
4. Inconsistent Units of Measure
Example: 5kg, 5000g, 5 kilograms
Issue: Different units of measure (kilograms vs. grams) may lead to errors when aggregating
or comparing data.
5. Inconsistent Abbreviations
Example: USA, U.S.A., United States, US
Issue: Different abbreviations or full forms of the same entity can lead to multiple entries for
what should be a single category.
6. Inconsistent Phone Number Formats
Example: +1-234-567-8901, 12345678901, (234) 567-8901
Issue: Varying formats of phone numbers can complicate validation and analysis.
7. Inconsistent Currency Formats
Example: $1000, 1000 USD, 1,000.00
Issue: Different formats for currency values can lead to incorrect calculations or comparisons.
8. Inconsistent Data Entry
Example: Red, red, Redd, Reed
Issue: Spelling mistakes or variations in the data entry process can lead to inaccurate or
duplicate records.
9. Inconsistent Boolean Values
Example: True, False, Yes, No, 1, 0
Issue: Mixing different representations of boolean values can cause logic errors in analysis.
10. Inconsistent Address Formats
Example: 123 Main St., 123 Main Street, 123 main st
Issue: Inconsistent address formats make it difficult to group or match records.
11. Inconsistent Numeric Formatting
Example: 1,000, 1000, 1.000
Issue: Different ways of representing numbers, especially with commas and periods, can
cause issues in calculations.
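Several of the examples above can be brought to a single standard form with small rules. The Python sketch below is an illustration only, not a required method. All function names are hypothetical, and it makes assumptions that must be checked against the real dataset: slash dates with a four-digit year are day-first and those with a two-digit year are month-first, ten-digit phone numbers are missing the country code 1, and amounts use a comma for thousands and a period for decimals (so the ambiguous 1.000 would be read as one).

```python
import re
from datetime import datetime
from difflib import get_close_matches

# Assumption: dd/mm/yyyy for 4-digit years, mm/dd/yy for 2-digit years.
DATE_FORMATS = ["%Y-%m-%d", "%B %d, %Y", "%d/%m/%Y", "%m/%d/%y"]
GRAMS_PER_UNIT = {"kg": 1000, "kilogram": 1000, "kilograms": 1000,
                  "g": 1, "gram": 1, "grams": 1}

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def to_kilograms(text):
    """Convert text such as '5kg' or '5000g' to a number of kilograms."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2).lower() not in GRAMS_PER_UNIT:
        return None
    return float(match.group(1)) * GRAMS_PER_UNIT[match.group(2).lower()] / 1000

def to_bool(value):
    """Map True/Yes/1 to True and False/No/0 to False; anything else to None."""
    key = str(value).strip().lower()
    if key in {"true", "yes", "1"}:
        return True
    if key in {"false", "no", "0"}:
        return False
    return None

def normalize_phone(text, default_country="1"):
    """Keep digits only; assume 10-digit numbers lack the country code."""
    digits = "".join(ch for ch in text if ch.isdigit())
    if len(digits) == 10:
        digits = default_country + digits
    return "+" + digits

def parse_amount(text):
    """Strip currency symbols, codes, and thousands commas; return a float."""
    cleaned = re.sub(r"[^\d.]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None

def correct_category(value, allowed, cutoff=0.8):
    """Return the closest allowed category, or None if nothing is close enough."""
    lookup = {name.lower(): name for name in allowed}
    matches = get_close_matches(value.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

dates = [to_iso_date(d) for d in ["01/12/2024", "2024-12-01", "December 1, 2024", "12/01/24"]]
weights = [to_kilograms(w) for w in ["5kg", "5000g", "5 kilograms"]]
colours = [correct_category(c, ["Red", "Green", "Blue"]) for c in ["Red", "red", "Redd", "Reed"]]
```

Under these assumptions every date in example 1 becomes 2024-12-01 and every weight in example 4 becomes 5.0 kg. Values that do not fit any rule come back as None so they can be reviewed by hand rather than silently changed.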
Formatting Data
How do you clean and format date and time data?
What are the steps to ensure numerical data is correctly formatted?
Text Cleaning
How do you remove unwanted characters, spaces, or symbols from text data?
How do you handle leading and trailing spaces in text fields?
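The usual space cleanup removes leading and trailing spaces and collapses repeated spaces between words into one, which is what the Excel TRIM function does. A minimal Python sketch of the same rule (the function name clean_text is hypothetical):

```python
def clean_text(text):
    """Remove leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(text.split())

cleaned = clean_text("   John    Doe  ")
```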
Data Type Conversion
How do you convert data types for specific columns (e.g., converting text to date)?
Why is it important to ensure that data types are consistent?
Validation Rules
How do you apply validation rules to ensure data integrity?
What methods can be used to check for data entry errors (e.g., ranges, allowed values)?
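Range checks and allowed-value checks can be expressed as one rule per column. The Python sketch below illustrates the idea only; the column names age and status, the limits, and the names EXAMPLE_RULES and validate_row are all hypothetical. In Excel the same checks are set up through Data Validation.

```python
# Hypothetical rules: one check per column, returning True when the value is valid.
EXAMPLE_RULES = {
    "age": lambda v: isinstance(v, (int, float)) and 0 <= v <= 120,  # range check
    "status": lambda v: v in {"Active", "Inactive"},                 # allowed values
}

def validate_row(row, rules):
    """Return the names of the columns whose values break their rule."""
    errors = []
    for column, check in rules.items():
        if not check(row.get(column)):
            errors.append(column)
    return errors

good = validate_row({"age": 34, "status": "Active"}, EXAMPLE_RULES)
bad = validate_row({"age": 150, "status": "active"}, EXAMPLE_RULES)
```

Here the second row fails both rules: the age is out of range and the status does not match an allowed value exactly.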
Conditional Formatting
Apply custom Conditional Formatting rules to highlight the necessary data with colours.
Q2. PIVOT TABLE & DATA VISUALISATION
Summarizing Data
How do you summarize data using Pivot Tables (e.g., sum, average, count)?
How can you change the summary function in a Pivot Table?
How do you display the data as a percentage of the total instead of the actual values?
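What a Pivot Table computes when it sums by category and then shows each value as a percentage of the grand total can be sketched in Python as follows. This is an illustration under assumed data; the column names region and sales and the function names are hypothetical.

```python
from collections import defaultdict

def pivot_sum(rows, category, value):
    """Sum the value column for each distinct entry in the category column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category]] += row[value]
    return dict(totals)

def percent_of_total(totals):
    """Express each category total as a percentage of the grand total."""
    grand = sum(totals.values())
    if grand == 0:
        return {key: 0.0 for key in totals}
    return {key: 100 * amount / grand for key, amount in totals.items()}

# Hypothetical sample data
sales = [
    {"region": "North", "sales": 100},
    {"region": "South", "sales": 300},
    {"region": "North", "sales": 100},
]
totals = pivot_sum(sales, "region", "sales")
shares = percent_of_total(totals)
```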
Grouping Data
How can you group data by specific categories in a Pivot Table?
How do you ungroup data that has been grouped in a Pivot Table?
Sorting and Filtering
How do you sort data within a Pivot Table?
How can you apply filters to a Pivot Table to display specific data?
How do you add a slicer to filter data in a Pivot Table?
Pivot Table Calculations
How do you add a calculated field to a Pivot Table?
What are calculated items in a Pivot Table, and how do you create them?
How can you use Pivot Table calculations to find the difference between values?
PIVOT CHART
Create any three Pivot Charts
Q3. STATISTICAL OPERATIONS
Note:
If your dataset has too little data for statistical operations, you are
allowed to download another dataset and use it for the statistical operations only.
Apply basic statistics functions to the appropriate
column (MAX, MIN, COUNT, COUNTA, COUNTBLANK, SUM, AVERAGE).
Apply Descriptive Statistics to the appropriate column.
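To check the Excel results by hand, the same figures can be computed in Python. This is a minimal sketch, assuming blanks are stored as empty text or None; the function name describe and the sample column are hypothetical. As in Excel, count covers numeric cells only, counta covers every non-blank cell, and countblank covers the blanks.

```python
from statistics import mean, median, stdev

def describe(values):
    """Basic and descriptive statistics for one column."""
    blanks = [v for v in values if v is None or v == ""]
    nums = [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "count": len(nums),                    # numeric cells only
        "counta": len(values) - len(blanks),   # all non-blank cells
        "countblank": len(blanks),
        "sum": sum(nums),
        "min": min(nums),
        "max": max(nums),
        "average": mean(nums),
        "median": median(nums),
        "stdev": stdev(nums),                  # sample standard deviation
    }

# Hypothetical column with two blanks and one text entry
stats = describe([2, 4, 4, 4, 5, 5, 7, 9, "", None, "n/a"])
```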
Q4. VLOOKUP/HLOOKUP
Dropdown List:
Create a dropdown list in a cell that allows users to select a value from the specified
column.
VLOOKUP:
Use VLOOKUP to find and display corresponding data from another column based
on the selected value.
HLOOKUP:
Use HLOOKUP to find and display additional corresponding data from a specified
row in a horizontally arranged table.
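The two lookups differ only in direction: VLOOKUP searches down the first column and returns a value from the same row, while HLOOKUP searches across the first row and returns a value from the same column. A minimal Python sketch of the exact-match case, assuming the table is a list of rows; the function names and sample tables are hypothetical, and a missing key returns None here where Excel would show #N/A.

```python
def vlookup(key, table, col_index):
    """Exact-match lookup down the first column; col_index is 1-based."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return None

def hlookup(key, table, row_index):
    """Exact-match lookup across the first row; row_index is 1-based."""
    header = table[0]
    if key in header:
        return table[row_index - 1][header.index(key)]
    return None

# Hypothetical vertical table: ID, name, city
vertical = [[101, "Asha", "Pune"], [102, "Ravi", "Mumbai"]]
# Hypothetical horizontal table: months in the first row, sales in the second
horizontal = [["Jan", "Feb", "Mar"], [120, 95, 140]]

city = vlookup(102, vertical, 3)
feb_sales = hlookup("Feb", horizontal, 2)
```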
************************** ALL THE BEST **************************