Univ BA LAB RECORD - Merged

Business analytics lab essentials

Department of

_________________________Laboratory Record

Name of the Student : _____________________________

Register Number : _____________________________

Department : _____________________________

Semester : _____________________________
BONAFIDE CERTIFICATE

University Reg.No. ____________________________ This is to certify that this is a bonafide record

work done by Mr./Miss._______________________________________________ studying B.E/BTech.,

_____________________________________________Department in the _________________________

Laboratory for _________________________________Semester.

_______________________ ____________________

Staff-in-Charge Head of the Department

Submitted for the University Practical examination held on __________________ at Velammal Institute

of Technology

______________________ ______________________

Internal Examiner External Examiner


CCW331 BUSINESS ANALYTICS L T P C
2 0 2 3
COURSE OBJECTIVES:

To understand the Analytics Life Cycle.


To comprehend the process of acquiring Business Intelligence
To understand various types of analytics for Business Forecasting
To model the supply chain management for Analytics.
To apply analytics for different functions of a business
LIST OF EXPERIMENTS:
• Use MS-Excel and Power-BI to perform the following experiments using a Business data set, and
make presentations.
• Students may be encouraged to bring their own real-time socially relevant data set.
I Cycle – MS Excel
1. Explore the features of MS-Excel.
2. (i) Get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND)
(ii) Perform data import/export operations for different file formats.
3. Perform statistical operations - Mean, Median, Mode and Standard deviation, Variance, Skewness,
Kurtosis
4. Perform Z-test, T-test & ANOVA
5. Perform data pre-processing operations i) Handling Missing data ii) Normalization
6. Perform dimensionality reduction operation using PCA, KPCA & SVD
7. Perform bivariate and multivariate analysis on the dataset.
8. Apply and explore various plotting functions on the data set.

II Cycle – Power BI Desktop


9. Explore the features of Power BI Desktop
10. Prepare & Load data
11. Develop the data model
12. Perform DAX calculations
13. Design a report
14. Create a dashboard and perform data analysis

COURSE OUTCOMES:
CO1: Explain the real-world business problems and model with analytical solutions.
CO2: Identify the business processes for extracting Business Intelligence
CO3: Apply predictive analytics for business forecasting
CO4: Apply analytics for supply chain and logistics management.
CO5: Use analytics for marketing and sales.
TOTAL: 30 PERIODS

EXP. NO.   DATE   NAME OF THE EXPERIMENT   PAGE NO.   SIGN
1.   Explore the features of MS-Excel.
2.   (i) Get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND)
     (ii) Perform data import/export operations for different file formats
3.   Perform statistical operations - Mean, Median, Mode and Standard deviation, Variance, Skewness, Kurtosis.
4.   Perform Z-test Formula, P-Value in Excel, Anova using Excel
5.   Perform data pre-processing operations
     i) Handling Missing data
     ii) Normalization
6.   Perform dimensionality reduction operation using PCA, KPCA & SVD.
7.   To perform bivariate and multivariate analysis on the data set.
8.   Apply and explore various plotting functions on the data set.
9.   Features of Power BI
10.  Prepare and Load Data
11.  Develop the Data Model
12.  Perform the DAX Calculations
13.  Design a report
14.  Create Dashboard and perform data analysis


EXP. NO:1 Explore the features of MS-Excel.
DATE:

Features of MS Excel
MS Excel is used to process data in tabular form and to apply mathematical functions that analyze it. This is what the Excel window looks like (version 2007):

Excel is a tool for organizing data and performing calculations on it. It can examine data, compute statistics, create pivot tables, and present data as a chart or graph.
An Excel worksheet is made up of rows and columns. The intersection of a row and a column is a cell, so each cell is an individual unit of data. Each cell has an address made up of its column letter and row number (for example, B3). No two cells in a worksheet share the same address. The main menus and their basic functions are described below.
Home and Insert
The Home & Insert menu of MS Excel is similar to MS Word. Users can change the formatting of the
content from home & include pie charts, tables, and other files related to data from the insert menu.
Font size, font color, font styles, alignment, background color, formatting options and styles, insertion,
deletion, and editing in the cells options are also available.
One can insert images and figures, header, and footer, charts, and sparklines and even attach graphs,
equations, and symbols.
Formulas
Formulas and Data are the features that set MS Excel apart. Users can apply a formula to data to analyze it quickly. To do so, the user selects the cells to include, and each cell counts as one unit of data.
So if the user selects 10 cells and applies an average formula to them, the result is the average of the values in those 10 cells.
To apply a formula, the user selects the cell that will hold the result, types ‘=’ in the formula bar followed by the name of the function to apply (for example, =AVERAGE), and then selects the data as one continuous range.
Data
From the Data menu, the user can perform functions without changing the original data. Users can filter,
add external data from the web & sort data without changing it. For example, the user can sort the data
in alphabetical order.

Right from basic functions like addition & subtraction, the user can perform complex statistical
functions like correlation & t-test. Moreover, users can convert them into Pie charts or graphs within
moments. This makes data analysis easy.
Page Layout
Users can apply themes, orientation, and check the page setup through the page layout option.
Review
Proofreading like spell check can be performed for an excel sheet in the review section and a user can
even add comments or remarks in this part.
View
Different views and layouts in which the user wants the spreadsheet to be displayed can be selected
here. Options to zoom in and out, full screen, and pane arrangement are available under this section.

There are several features that are available in Excel to make our task more manageable. Some
of the main features are:

1. AutoFormat: It allows the Excel users to use predefined table formatting options.
2. AutoSum: AutoSum feature helps us to calculate the sum of a row or column automatically by
inserting an addition formula for a range of cells.
3. List AutoFill: It automatically extends cell formatting when a new item is added to the end of a list.
4. AutoFill: This feature lets us quickly fill cells with repetitive or sequential data, such as chronological dates, numbers, or repeated text. AutoFill can also be used to copy formulas. We can also alter text and numbers with this feature.
5. AutoShapes: The AutoShapes toolbar lets us draw geometrical shapes, arrows, flowchart items, stars, and more. With these shapes, we can draw our own diagrams.
6. Wizard: It guides us by displaying helpful tips and techniques based on what we are doing. The Drag and Drop feature helps us reposition data and text by simply dragging it with the mouse.
7. Charts: This feature helps us present data in graphical form using Pie, Bar, Line charts, and more.
8. PivotTable: It rearranges and summarizes data in seconds and lets us perform data analysis and generate reports such as periodic financial statements and statistical summaries. We can also analyze complex data relationships graphically.
9. Shortcut Menus: Shortcut menus give quick access to commands that would otherwise take several steps to reach.

EXP. NO:2. (i) Get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND)
DATE:

Aim:
To get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in Excel.

Procedures:

Suppose we are given the following data:

We wish to find out the total sales for the first six months. The formula to be used is:

Step-1: Select a Sample Excel data sheet

Step-2: Perform following operations


a. Sum:
Adding Two Manual Entries

• Select cell A1 and type (=)
• Type 5+5
• Hit enter

Adding Two Cells

• Select a cell and type (=)
• Select a cell
• Type (+)
• Select another cell
• Hit enter

Adding Several Cells

Formula: =SUM(number1, [number2], [number3], ...)
1. Select cell B1 and type =SUM

2. Double click the SUM command

3. Mark the range A1:A5

4. Hit enter

We get the result below:

Adding Using Absolute Reference

1. Select a cell and type (=)


2. Select the cell you want to lock, add two dollar signs ($) before the column and row
3. Type (+)

4. Fill a range

Step by step:

1. Select cell C1 and type (=)
2. Select B1
3. Type a dollar sign before the column and the row: $B$1
4. Type (+)
5. Select A1
6. Hit enter
7. Fill the range C1:C10

b. MAX Function
The MAX function is a premade function in Excel, which finds the highest number in a range.
It is typed =MAX
The function ignores cells with text. It will only work for cells with numbers.

How to use the =MAX function:


1. Select a cell
2. Type =MAX
3. Double click the MAX command
4. Select a range
5. Hit enter

c. MIN Function

The MIN function is a premade function in Excel, which finds the lowest number in a range.
It is typed =MIN
How to use the =MIN function:
1. Select a cell

2. Type =MIN

3. Double click the MIN command

4. Select a range

5. Hit enter

d. AVERAGE Function

The AVERAGE function is a premade function in Excel, which calculates the average (arithmetic
mean).
It is typed =AVERAGE
It adds the range and divides it by the number of observations.
Note: The AVERAGE function ignores cells with text.
1. Select a cell

2. Type =AVERAGE

3. Double click the AVERAGE command

4. Select a range

5. Hit enter

6. Next, Fill

e. SQRT Function

The syntax is SQRT(number), where number is the number, or a reference to the cell containing the number, for which you want to find the square root.

To calculate the square root of a number in A2, use: =SQRT(A2)

f. ROUND Function

The ROUND Formula in Excel accepts the following parameters and arguments:

Number – The number which has to be rounded.

Num_Digits – The total number of digits to round the number to.

Formula            Result    Description
=ROUND(A2,2)       106.86    The number in A2 is rounded to 2 decimal places.
=ROUND(A2,1)       106.9     The number in A2 is rounded to 1 decimal place.
=ROUND(A2,0)       107       The number in A2 is rounded to the nearest integer.
=ROUND(A2,-1)      110       The number in A2 is rounded to the nearest multiple of 10.
=ROUND(A2,-2)      100       The number in A2 is rounded to the nearest multiple of 100.
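
To cross-check the worksheet results outside Excel, the operations in this experiment can be sketched in Python. This is only an illustration: the `sales` values are hypothetical, and `excel_round` is an assumed helper name, not an Excel or Python built-in. It rounds halves away from zero, as Excel's ROUND does, whereas Python's built-in `round` rounds halves to the nearest even digit.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical monthly sales figures standing in for the worksheet range A1:A6.
sales = [120.5, 98.0, 143.25, 110.0, 87.75, 131.5]

total = sum(sales)              # =SUM(A1:A6)
highest = max(sales)            # =MAX(A1:A6)
lowest = min(sales)             # =MIN(A1:A6)
average = total / len(sales)    # =AVERAGE(A1:A6)
root = math.sqrt(144)           # =SQRT(144)


def excel_round(number, num_digits):
    """Round half away from zero, mimicking Excel's ROUND (illustrative helper)."""
    # The quantum is 0.01 for num_digits=2, 1 for 0, and 10 for -1.
    quantum = Decimal(1).scaleb(-num_digits)
    result = Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(result)


print(total, highest, lowest, average, root)
print(excel_round(106.86, 1), excel_round(106.86, -1))
```

A negative `num_digits` rounds to the left of the decimal point, which is how the last two rows of the table above arrive at 110 and 100.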

Conclusion:
Thus, input was obtained from the user and the numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) were executed successfully.

EXP. NO:2. (ii) Perform data import/export operations for different file formats.
DATE:

Aim:
To perform data import/export operations for different file formats in Excel.
Procedure:

Excel can import and export many different file types aside from the standard .xlsx format. If your data is shared with other programs, like a database, you may need to save data as a different file type or bring in files of a different file type.
Export Data

When you have data that needs to be transferred to another system, export it from Excel in a format that
can be interpreted by other programs, such as a text or CSV file.
1. Click the File tab.

2. At the left, click Export.

3. Click the Change File Type.

4. Under Other File Types, select a file type.

• Text (Tab delimited): The cell data will be separated by a tab.

• CSV (Comma delimited): The cell data will be separated by a comma.

• Formatted Text (space delimited): The cell data will be separated by a space.

• Save as Another File Type: Select a different file type when the Save As dialog
box appears.

The file type you select will depend on what type of file is required by the program that will consume
the exported data.
5. Click Save As.

6. Specify where you want to save the file.

7. Click Save.

A dialog box appears stating that some of the workbook features may be lost.
8. Click Yes.

Import Data

Excel can import data from external data sources including other files, databases, or web pages.
1. Click the Data tab on the Ribbon.

2. Click the Get Data button.

Some data sources may require special security access, and the connection process can
often be very complex. Enlist the help of your organization’s technical support staff for
assistance.
3. Select From File.

4. Select From Text/CSV.

If you have data to import from Access, the web, or another source, select one of those
options in the Get External Data group instead.
5. Select the file you want to import.

6. Click Import.

If, while importing external data, a security notice appears saying that it is connecting
to an external source that may not be safe, click OK.

7. Verify the preview looks correct.

Because we've specified the data is separated by commas, the delimiter is already set.
If you need to change it, it can be done from this menu.
8. Click Load.
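
The export and import steps above move data through delimited text files. As a rough sketch of what those files contain, the following Python example writes a few rows as comma-delimited and tab-delimited text and reads them back. The rows, file names, and the helper names `export_rows` and `import_rows` are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for worksheet data.
rows = [
    ["Month", "Sales"],
    ["Jan", "120"],
    ["Feb", "98"],
]


def export_rows(path, data, delimiter=","):
    """Write rows to a delimited text file (comma for CSV, tab for tab-delimited)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(data)


def import_rows(path, delimiter=","):
    """Read a delimited text file back into a list of rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "sales.csv")
    tab_path = os.path.join(folder, "sales.txt")
    export_rows(csv_path, rows)                   # like CSV (Comma delimited)
    export_rows(tab_path, rows, delimiter="\t")   # like Text (Tab delimited)
    from_csv = import_rows(csv_path)
    from_tab = import_rows(tab_path, delimiter="\t")

print(from_csv)
```

Every value comes back as text, which is why a preview step is useful for checking that columns were detected and split correctly.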

Conclusion:
Thus, the data import/export operations for different file formats in Excel were executed successfully.

EXP. NO:3 Perform statistical operations - Mean, Median, Mode and Standard deviation, Variance, Skewness, Kurtosis
DATE:

Aim:

To perform descriptive statistics using MS Excel and to understand descriptive statistics in detail along with examples.

Procedures:

Descriptive statistics consist of measures of central tendency, variability, skewness and kurtosis.
Measures of central tendency are used to find the single value that best describes the entire distribution. There are three main measures of central tendency: Mean, Median and Mode.

Mean Average value


Median Middle value
Mode Most frequent value

Measures of variability refer to the spread or dispersion of scores. There are four main measures of variability: Range, Interquartile range, Standard deviation and Variance. Skewness and kurtosis describe the shape of the distribution.

Range Difference between max and min in a distribution
Standard Deviation Typical distance of scores in a distribution from their mean
Variance Square of the standard deviation
Skewness Degree of asymmetry of a distribution around its mean
Kurtosis Flatness or peakedness of the curve

Examples:
Suppose you are asked to provide a figure that best describes the annual salary offered to students in ABC College. You would answer this question with a measure of central tendency and a measure of variability.

Steps to calculate Descriptive Statistics using MS Excel

1. If you haven't already installed the Analysis ToolPak, click the Microsoft Office button, then click Excel Options, select Add-Ins, click Go, check the Analysis ToolPak box, and click OK.

2. Select the Data tab, click the Data Analysis option, select Descriptive Statistics from the list, and click OK. [Data tab >> Data Analysis >> Descriptive Statistics]

3. In the Input Range, select the data, and then select the Output Range where you want the output to be stored. If you don’t specify an output range, the output is placed in a new worksheet.

4. Check the Summary Statistics and Confidence Level for Mean options. By default the confidence level is 95%. You can change the level to suit the standard used in your study.

5. When you click Ok, you will see the result in the selected output range.

Interpretation:
The average value is 5.533. The middle value is 6 and the most frequent value is 8. Negative skewness indicates left-skewed data. Negative kurtosis indicates a distribution flatter than the normal distribution. The 95% confidence level indicates you can be 95% confident that the true population mean lies between

5.275 (5.533 – 0.258) and 5.791 (5.533 + 0.258).

Using Excel Functions:


You can accomplish the same task using Excel functions such as AVERAGE, MEDIAN, MODE, SUM, STDEV, VAR, SKEW, KURT, MAX, MIN, CONFIDENCE
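
The same descriptive measures can be sketched in Python for comparison. The `scores` list is hypothetical, since the lab's data set is not reproduced here. The skewness and kurtosis lines follow the sample formulas that Excel documents for SKEW and KURT, so treat them as an illustration of those formulas rather than a replacement for the ToolPak output.

```python
import statistics

# Hypothetical scores; the lab's own data set is not reproduced here.
scores = [2, 4, 4, 5, 6, 8, 8, 8, 9]

n = len(scores)
mean = statistics.mean(scores)            # =AVERAGE(range)
median = statistics.median(scores)        # =MEDIAN(range)
mode = statistics.mode(scores)            # =MODE(range)
stdev = statistics.stdev(scores)          # =STDEV(range), sample standard deviation
variance = statistics.variance(scores)    # =VAR(range), sample variance
data_range = max(scores) - min(scores)    # =MAX(range)-MIN(range)

# Standardized deviations shared by the skewness and kurtosis formulas.
z = [(x - mean) / stdev for x in scores]

# Sample skewness (formula documented for Excel's SKEW).
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Sample excess kurtosis (formula documented for Excel's KURT).
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(mean, median, mode, variance, skewness, kurtosis)
```

For this hypothetical data both skewness and kurtosis come out negative, the same pattern discussed in the interpretation above.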

How to Install Analysis ToolPak


Step 1: Click on the Excel Options located at the bottom of the windows menu
Step 2: Select Add–Ins option and then Click on the Go button
Step 3: In the dialog box appeared on the screen Select Analysis ToolPak and then press OK button.
After completing the above steps , you will see Data Analysis in Data menu.

The formula for the Population Mean is:

Population Mean = Sum of All the Items / Number of Items

When a sample is used to represent the population, the sample mean is computed the same way over the sampled items (the divisor n – 1 applies to the sample variance, not to the mean):

Sample Mean = Sum of All the Items in Sample / Number of Items in Sample

Let’s take an example to understand the calculation of the Population Mean formula in a better manner.

Example #1
Let’s say you have a data set with 10 data points, and we want to calculate Population Mean for
that.

Data set: {14,61,83,92,2,8,48,25,71,12}

Solution:

Population Mean = Sum of All the Items / Number of Items
• Population Mean = (14+61+83+92+2+8+48+25+71+12) / 10

• Population Mean = 416 / 10

• Population Mean = 41.6
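
The arithmetic of Example #1 can be checked with a short Python sketch. The variable names are illustrative and the data set is the one listed in the example.

```python
# Data set from Example #1.
data = [14, 61, 83, 92, 2, 8, 48, 25, 71, 12]

total = sum(data)                      # sum of all the items
population_mean = total / len(data)    # divide by the number of items

print(total, population_mean)
```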

Example #2

Let’s say you want to invest in IBM and are keen to look at its past performance and returns. You
want to go back 20 years and calculate monthly returns, but that will become very hectic. So you
have decided to take a sample of the last 10 months and calculate the return and mean. You
believe that the sample you have taken correctly represents the population.

Solution:

So if you see here, in the last 10 months, IBM’s return has fluctuated very much.
Sample Mean is calculated using the formula given below
Sample Mean = Sum of All the Items in Sample / Number of Items in Sample

• Sample Mean = (3.74% + 1.07% + 4.34% + (-23.66)% + 7.66% + (-7.36)% + 18.25% + 2.76% + 1.48% + 0.00%) / 10

• Sample Mean = 8.28% / 10

• Sample Mean = 0.83% (approx)

Overall, the average return in the last 10 months is only about 0.83%.

Conclusion:
Thus, the statistical operations (Mean, Median, Mode, Standard Deviation, Variance, Skewness, Kurtosis) were executed successfully using the Analysis ToolPak in Excel.

EXP. NO:4 To Perform Z-Test, T-test & ANOVA
DATE:

Aim:
To perform Z-test, T-test & ANOVA in Excel.
Procedures:

Perform Z-test formula in Excel

Z TEST Formula in Excel

In Excel, the function for the Z-test is Z.TEST (ZTEST in older versions). Its syntax is Z.TEST(array, x, [sigma]), where array is the data range, x is the hypothesized population mean, and sigma is the optional population standard deviation. The function returns the one-tailed p-value of the test. Another way of doing a Z-test is the Data Analysis option on the Data tab, which needs the two variable ranges and the variance of each range. With that tool, we reject the null hypothesis if z < -z Critical or z > z Critical (two-tail).

Z TEST Formula has the below arguments:

• Array: The range of data values against which x is tested.
• X: The hypothesized population mean to test.
• Sigma: This is an optional argument that represents the population standard deviation. If it is omitted, Excel uses the sample standard deviation.
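
To make the role of the three arguments concrete, here is a minimal Python sketch of the calculation that Excel documents for Z.TEST: one minus the standard normal CDF of (mean − x) / (sigma / √n). The function name `z_test` and the sample values are assumptions made for illustration, and the two-tailed line uses the usual doubling of the smaller tail.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test(array, x, sigma=None):
    """One-tailed p-value following the documented behaviour of Excel's Z.TEST."""
    if sigma is None:
        sigma = stdev(array)    # fall back to the sample standard deviation
    z = (mean(array) - x) / (sigma / math.sqrt(len(array)))
    return 1 - NormalDist().cdf(z)


# Hypothetical values; the worksheet data from the example is not reproduced here.
values = [3, 6, 7, 8, 6, 5, 4, 2, 1, 9]

one_tailed = z_test(values, 5)                      # hypothesized mean of 5
two_tailed = 2 * min(one_tailed, 1 - one_tailed)    # two-tailed p-value

print(one_tailed, two_tailed)
```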

Working of Z TEST Function in Excel with Examples


Let’s understand the working of the Z TEST Function in Excel with some examples.

Example #1 We have given the below set of values:

To calculate the one-tailed probability value of a Z Test for the above data, let’s assume the
hypothesized population mean is 5. Now we will use the Z TEST formula as shown below:

The result is given below:

The formula below calculates the two-tailed P-value of a Z TEST for the given hypothesized population mean, which is 5.

The result is given below:

Two Sample Z Test:


While using the two sample Z Test, we test a null hypothesis which states that the means of the two populations are equal,

i.e. H0: µ1 – µ2 = 0
H1: µ1 – µ2 ≠ 0
where H1 is the alternative hypothesis that the means of the two populations are not equal.
Let’s take an example to understand the usage of two sample Z tests.
Example #2
Let’s take the example of students’ marks in two different subjects.

Now we need to calculate the variance of both subjects, so we will use the below formula for this:

The above formula applies for Variance 1 (Subject 1) like below:

The result is given below:

The same formula applies for Variance 2 (Subject 2), as shown below:

The result is given below:

• Now, click Data Analysis at the far right of the DATA tab, as shown in the screenshot below:

• It will open a dialog box with Data Analysis options.


• Click on z-Test: Two-Sample for Means and click on OK, as shown below.

• It will open a dialog box for Z-test, as shown below.

• Now in the Variable 1 range box, select subject 1 range from A25:A35

• Similarly, in the Variable 2 range box, select subject 2 range from B25:B35

• Under the Variable 1 variance box, enter cell B38 variance value.
• Under the Variable 2 variance box, enter cell B39 variance value.

• In Output Range, Select the cell where you want to see the result. Here we have passed cell E24
and then clicked on OK.

The result is shown below:

Explanation

• We can reject the null hypothesis if z < -z Critical two-tail or z stat > z Critical two-tail.
• Here 1.279 > -1.9599 and 1.279 < 1.9599; hence we can’t reject the null hypothesis.
• Thus, the means of both populations don’t differ significantly.
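
The decision rule above can be sketched in Python. The marks are hypothetical (the worksheet values are not reproduced here), `two_sample_z` is an assumed helper name, and the variances are taken as sample variances purely for illustration. The z statistic is the difference of the means divided by the square root of var1/n1 + var2/n2.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_z(sample1, sample2, var1, var2, alpha=0.05):
    """Return (z, two-tail critical value, two-tailed p-value)."""
    standard_error = math.sqrt(var1 / len(sample1) + var2 / len(sample2))
    z = (mean(sample1) - mean(sample2)) / standard_error
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_critical, p_two_tail


# Hypothetical marks of ten students in two subjects.
subject1 = [75, 80, 68, 90, 85, 72, 78, 88, 69, 81]
subject2 = [70, 78, 66, 85, 80, 74, 71, 83, 65, 79]

z, z_critical, p_two_tail = two_sample_z(
    subject1, subject2, variance(subject1), variance(subject2)
)
reject_null = abs(z) > z_critical    # same rule as z < -z Critical or z > z Critical

print(z, z_critical, p_two_tail, reject_null)
```

Comparing |z| with the critical value and comparing the two-tailed p-value with alpha always lead to the same decision.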

P-Value in Excel

What is P-Value in Excel?


The p-value in Excel is a statistical measure that helps check whether an observed difference or relationship between two data groups reflects a real effect or could have arisen by chance alone. It plays a vital role in analyzing real-world issues in areas like medicine, economics, and human study.

Statisticians and researchers commonly use the p-value when they want to analyze two data groups. They start by considering a null hypothesis, which assumes there is no relationship (or no difference) between the two data groups. This serves as their initial assumption for the experiment. Then they conduct a statistical test, compute its p-value, and interpret the result. If the p-value is less than the significance level (α) of 0.05 (5%), the result is statistically significant, and we reject the null hypothesis in favor of the alternate hypothesis. Conversely, if the p-value is greater than 0.05 (5%), the data do not provide enough evidence against the null hypothesis, so we fail to reject it.

Statisticians can use Excel to quickly and easily calculate the p-value. Although Microsoft does not
have any specific or direct formula for p-value in Excel, we can use functions like T.TEST and T.DIST
for the calculation. Moreover, there is another method: Analysis ToolPak, that basically simplifies the
T.TEST function for us.

How to Calculate P-Value in Excel?


In this section, we will see how to calculate P-Value in Excel using examples. Here are the three
different ways or functions that we will use:

1. T.TEST Function
2. Analysis ToolPak
3. T.DIST Function

1. T.TEST Function
Purpose: We can use the T.TEST Function to calculate the p-value in Excel by directly adding the
data ranges to the function.
Syntax:
=T.TEST(range1, range2, tails, type)

In this syntax:

• range1: Cell range of the first data set

• range2: Cell range of the second data set

• tails: Specifies if the test is one-tailed (1) or two-tailed (2).

• type: It determines the type of t-test. [1 for a paired test (for example, a before and after comparison), 2 for a two-sample test assuming equal variances, and 3 for a two-sample test assuming unequal variances.]
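
The effect of the type argument can be illustrated with a small Python sketch that computes the t statistic and degrees of freedom for each of the three codes (1 = paired, 2 = two-sample equal variance, 3 = two-sample unequal variance). The helper name `t_statistic` and the scores are hypothetical, and the sketch stops at the statistic: T.TEST itself goes one step further and converts it into a p-value.

```python
import math
from statistics import mean, stdev, variance


def t_statistic(sample1, sample2, test_type):
    """Return (t, degrees of freedom) for T.TEST type codes 1, 2 and 3."""
    n1, n2 = len(sample1), len(sample2)
    if test_type == 1:
        # Paired: a one-sample test on the pairwise differences.
        diffs = [a - b for a, b in zip(sample1, sample2)]
        return mean(diffs) / (stdev(diffs) / math.sqrt(n1)), n1 - 1
    v1, v2 = variance(sample1), variance(sample2)
    diff = mean(sample1) - mean(sample2)
    if test_type == 2:
        # Equal variances: pool the two sample variances.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        return diff / math.sqrt(pooled * (1 / n1 + 1 / n2)), n1 + n2 - 2
    if test_type == 3:
        # Unequal variances: Welch's statistic with approximate degrees of freedom.
        se2 = v1 / n1 + v2 / n2
        dof = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
        return diff / math.sqrt(se2), dof
    raise ValueError("test_type must be 1, 2 or 3")


# Hypothetical scores for two classes.
class_b = [62, 70, 58, 75, 66, 71]
class_c = [65, 72, 60, 74, 69, 73]

for code in (1, 2, 3):
    print(code, t_statistic(class_b, class_c, code))
```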

Example:
Let us compare the scores of students from Class B and Class C to check if Class C students have higher
scores than Class B students. First, we need to assume the null and alternate hypotheses for this test.

Null Hypothesis (H0): There is no difference in the scores of both divisions.

Alternate Hypothesis (H1): Class C students have higher scores.

We will calculate the p-value to determine whether we should reject or fail to reject the null hypothesis.

Consider the following data:

Solution:
Step 1: Select cell B9 and write the below formula:

=T.TEST(A2:A7,B2:B7,1,2)

Step 2: Press “Enter,” and Excel will calculate the p-value as 0.38692 in cell B9.

Result:

P-Value: 0.38692 (39% approx)
Significance level (α): 0.05 (5%)
Our Analysis: p-value > α
Our Conclusion: • Fail to reject the Null Hypothesis
• No evidence that Class C Score > Class B Score

Based on the analysis, the p-value obtained (0.38692, approximately 39%) is higher than the significance level (α) of 0.05 (5%). Therefore, we fail to reject the null hypothesis (H0), which assumes no difference in scores between the students of both classes. The data do not provide sufficient evidence that Class C students have higher test scores than Class B students.

2. Analysis ToolPak
Purpose: The Analysis ToolPak is an Excel Add-in feature that makes it easier to perform t-tests. By
simply inputting the necessary information in a new window, ToolPak automatically calculates and
shows the p-value.

Here,

• Variable 1 Range: Range for the first data column

• Variable 2 Range: Range for the second data column

• Labels: Allows Excel to display the column headings in the output

• Alpha: The significance level against which the p-value is compared (0.05)

• Output Range: Cell where Excel should display the output.

Example:

Let us compare the scores of students from Class A and Class B to investigate whether Class B students have higher scores on their exams than students from Class A.

Null Hypothesis (H0): There is no difference in the average scores between both divisions.
Alternate Hypothesis (H1): Class B students have higher average scores.
We want to calculate the p-value to check if our assumption in the null hypothesis is true or false.
Consider the following data:

Solution:
Step 1: Click the “Data Analysis” option from the “Data” Tab.

Step 2: A “Data Analysis” dialogue box will open, from which you have to select “t-Test: Two-
Sample Assuming Equal Variances”. Then click “OK”.

Step 3: Add the necessary information in the dialogue box as shown below:

• Variable 1 Range: A1 to A7
• Variable 2 Range: B1 to B7
• Labels: Tick the checkbox
• Output Range: E1.

After entering the above data, click on “OK”.

Excel will provide a detailed result for the p-value in cells E1 to G14, as shown below:

Result:

One-tailed P-Value: 0.48
Two-tailed P-Value: 0.96
Significance level (α): 0.05
Our Analysis: p-values > α
Our Conclusion: • Fail to reject the Null Hypothesis
• No evidence that Class B Score > Class A Score

The tool gives us two p-values: one for a one-tailed t-test and the other for a two-tailed t-test. Both p-values, 0.48 (approximately) and 0.96 (approximately), are greater than the significance level (α) of 0.05 (5%).

As the p-values are higher than 0.05, we fail to reject the null hypothesis, which assumes no difference between the average scores of students from both classes. The data do not provide sufficient evidence that Class B students scored more than Class A students.

3. T.DIST Function
Purpose: We can use Excel’s T.DIST functions to calculate the p-value from the test statistic value and the degrees of freedom.

Syntax:
=T.DIST.RT(x, degrees_freedom)

In this syntax:

• RT: The right-tailed variant, used for a one-tailed (one-sided) test. Another variant, T.DIST.2T, returns the two-tailed p-value.
• x: It is the test statistic value, i.e., the value we want to study
• degrees_freedom: Degrees of freedom for the T-distribution

Example:
The school headmaster believes that students cannot score above 50 without attending tuition classes,
so they plan to start a special class for all students. To investigate this belief, a teacher chooses a random
group of 6 students and records their scores in a specific subject. The average score of this group is
70.5, with a standard deviation of 19.06.
We need to calculate the p-value using the t-dist function to see if the headmaster’s assumption (students
that do not attend tuition classes cannot score more than 50) is true or not.
Null Hypothesis (H0): The mean score of students who do not attend tuition is at most 50 (the headmaster’s assumption).
Alternate Hypothesis (H1): The mean score of students who do not attend tuition is greater than 50.
Consider the following data:

Solution:
Step 1: Calculate the test statistic value (x) using the following formula,
Test statistic = (x̄ − μ) / (s / √n)
We can write the above formula in the Excel format as follows:

=(B9-B10)/(B11/SQRT(6))

Enter the above formula in cell B12 and press “Enter”. As a result, Excel will show the x value as 2.634
(approx).

Step 2: Calculate degrees of freedom in cell B13 as follows:
Degrees of freedom = No. of observations (n) – 1
= 6 – 1
= 5

Step 3: Now, we will find the p-value in Excel using the T.DIST Function.

• Enter the below formula in the cell B14

=T.DIST.RT(B12,B13)

• Press “Enter,” and Excel will display the p-value as 0.023161286 in cell B14.

Result:

P-Value: 0.02316 (2.3% approx)
Significance level (α): 0.05 (5%)
Our Analysis: p-value < α
Our Conclusion: • Reject the headmaster’s assumption that the mean score without tuition is at most 50
• Students can score above 50 even without tuition

The T.DIST.RT function returns the p-value as approximately 0.023.

The p-value of 0.023 (2.3%) is less than the significance value (α) of 0.05 (5%). A sample mean of 70.5 would be very unlikely if the true mean score without tuition were 50 or less, so we reject that assumption. The data support the conclusion that students can score more than 50 without attending tuition.
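
As a rough cross-check of the numbers in this example, the following Python sketch recomputes the test statistic from the summary figures given above (mean 70.5, standard deviation 19.06, n = 6, hypothesized mean 50) and approximates the right-tail probability by numerically integrating the t-distribution density. The helper names `t_pdf` and `t_dist_rt` are assumptions for illustration. Excel's T.DIST.RT computes the same quantity with its own internal method.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t-distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_dist_rt(x, df, steps=2000):
    """Right-tail probability P(T > x) for x >= 0, via Simpson's rule on [0, x]."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    h = x / steps                       # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3           # half the density lies to the right of 0


sample_mean, mu, s, n = 70.5, 50, 19.06, 6
t_stat = (sample_mean - mu) / (s / math.sqrt(n))   # test statistic (x)
dof = n - 1                                        # degrees of freedom
p_value = t_dist_rt(t_stat, dof)

print(round(t_stat, 3), dof, round(p_value, 4))
```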

Anova using Excel

This example teaches you how to perform a single factor ANOVA (analysis of variance) in Excel. A
single factor or one-way ANOVA is used to test the null hypothesis that the means of several
populations are all equal.

Below you can find the salaries of people who have a degree in economics, medicine or history.
H0: μ1 = μ2 = μ3
H1: at least one of the means is different.

To perform a single factor ANOVA, execute the following steps.


1. On the Data tab, in the Analysis group, click Data Analysis.

Note: if you can't find the Data Analysis button, load the Analysis ToolPak add-in first.
2. Select Anova: Single Factor and click OK.

3. Click in the Input Range box and select the range A2:C10.
4. Click in the Output Range box and select cell E1.

5. Click OK.
Result:

Conclusion:
If F > F crit, we reject the null hypothesis. This is the case here, since 15.196 > 3.443. Therefore, we reject the null hypothesis. The means of the three populations are not all equal; at least one of the means is different. However, the ANOVA does not tell you where the difference lies. You need a t-Test to test each pair of means.
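
To show where the F value in the ANOVA table comes from, here is a minimal Python sketch of the single-factor calculation: the between-groups mean square divided by the within-groups mean square. The salary figures and the function name `one_way_anova` are hypothetical, and the sketch does not compute F crit, which Excel reports in its output table.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical salaries (in thousands) for three degree groups.
economics = [42, 53, 49, 53, 43, 44, 45, 52, 54]
medicine = [69, 54, 58, 64, 64, 55, 56]
history = [35, 40, 53, 42, 50, 39, 55, 39, 40]

f_stat, df_between, df_within = one_way_anova(economics, medicine, history)
print(f_stat, df_between, df_within)
```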

EXP. NO:5 Perform data pre-processing operations i) Handling Missing data ii) Normalization
DATE:

Aim:
To perform data pre-processing operations (handling missing data and normalization) in Excel.
Procedures:
Data preprocessing is an early stage of data analysis. It cleans and transforms raw data into a form that can be analyzed reliably. Before analyzing the data, we need to make sure that it is clean and usable. Data preprocessing helps to improve the quality, consistency, and compatibility of the data.
Data Preprocessing helps in many ways:
• It helps in eliminating errors.
• It helps in handling the missing values.
• It helps in removing duplicates.
• It helps in standardizing formats.

Steps in Data Preprocessing


There are several steps that are followed in doing data preprocessing:
Collection of the Data
In this step, we need to collect the raw data. We can collect this data from various sources such as
spreadsheets, online repositories, etc.
Cleaning of the Data
In this step, we need to clean the data before using it. We have to identify and address data quality
issues. Excel provides functions like Find and Replace, Text to Columns, and conditional
formatting to clean the data.
Handling Missing Values
In this step, we need to handle the missing values. If a value is missing, it can create a major problem
in transforming the data. We can identify and handle missing values using some functions:
IF

ISNA or ISBLANK

We can select all the rows that have missing values. We can also replace the missing values with
appropriate substitutes.
Removing Duplicates
In this step, we need to remove the duplicates from the data. Duplicates can skew the
analysis results. Excel offers a simple way to remove duplicates. First, we need to select the data
range and go to the Data tab. Then click on the Remove Duplicates button. Then we can choose the
columns to check for duplicates. Excel will remove duplicate rows, keeping only unique values.
Standardizing Formats
In this step, we need to standardize the formats. Inconsistent data formats can create some challenges
for us during analysis. That’s why Excel allows you to standardize formats. We can use the features of
Excel like cell formatting, text functions (e.g., PROPER, UPPER, LOWER), and data validation
rules.
Filtering and Sorting
In this step, we need to filter and sort the data. Excel's filtering and sorting capabilities help explore
and organize large datasets. The Filter function allows you to display specific subsets of data based
on criteria. Sorting data in ascending or descending order can be done using the Sort function.
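
The steps above can also be sketched outside Excel. The following Python example is a minimal illustration using only the standard library. The function name `preprocess`, the field names (`name`, `course`, `score`) and the choice of 0 as the substitute for a missing score are all assumptions made for the sketch.

```python
def preprocess(rows):
    """Clean a list of dict records roughly the way the Excel steps do."""
    cleaned = []
    seen = set()
    for row in rows:
        # Like TRIM: strip the ends and keep single spaces between words
        name = " ".join(row["name"].split())
        # Like UPPER: standardize the course column to capital letters
        course = " ".join(row["course"].split()).upper()
        # Like IF(ISBLANK(...)): substitute a placeholder for a missing score
        score = row["score"] if row["score"] is not None else 0

        key = (name, course, score)
        if key in seen:          # like Remove Duplicates
            continue
        seen.add(key)
        cleaned.append({"name": name, "course": course, "score": score})

    # Like Sort, smallest to largest on the score column
    return sorted(cleaned, key=lambda r: r["score"])


# Hypothetical raw records
raw = [
    {"name": "  Asha   Rao ", "course": "python", "score": 80},
    {"name": "Asha Rao", "course": "PYTHON", "score": 80},
    {"name": "Ben", "course": "excel", "score": None},
]
for record in preprocess(raw):
    print(record)
```

Cleaning is done before the duplicate check on purpose: the first two records only become identical after the spacing and capitalization are standardized.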
Example of Data Preprocessing
Suppose we have data of our Ninjas in an Excel spreadsheet, and we want to do data preprocessing;
for this, we need to follow the above steps:
Step 1: Data Collection
We have gathered the information about our Ninjas in an Excel spreadsheet.

Step 2: Data Cleaning
Now, we need to clean the data. Suppose we are using TRIM to remove irregular text spacing and
keep single spaces between words.

This will produce the output:

Now, we can apply the same operation to every row to correct the spacing, or we can fill the
formula down the column that holds it.

Step 3: Handling Missing Values


Now, we have to check whether there are some missing values or not. For this, we can
use ISBLANK.

This will give the following output after filling the formula down to the last row.

So, we have to fill all those rows that are empty or use some proper substitutes. After filling them, we
can move to our next step.
Step 4: Removing duplicates
Now, we have to remove all the duplicates that are present in a spreadsheet. So, we have to select all
the data and go to the Data tab and find the Remove Duplicates button.

Now, click on the OK button. Then it will give you

If you have duplicate values, it will give a message “duplicate values found and removed.”
Step 5: Standardizing formats

Now, we have to standardize the formats so that they do not cause problems during analysis. Suppose
we want to make sure Courses are in capital letters only. For this, we can use UPPER.

This will give the following after filling the formula down to the last row.

Step 6: Filtering and sorting


Now, we have to do filtering and sorting. We can do this after selecting the particular column we want
to sort.

Let's sort this from smallest to largest.

We now have the preprocessed data in Excel. The advantages and
disadvantages of data preprocessing in Excel are listed below.
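
The aim of this experiment also mentions normalization. The following is a minimal Python sketch of two common choices, min-max scaling and z-score standardization; it is an illustration under stated assumptions, not a step taken from the worksheet. The function names and the sample values are hypothetical. The z-score version uses the sample standard deviation, which corresponds to Excel's STANDARDIZE combined with AVERAGE and STDEV.S.

```python
from statistics import mean, stdev


def min_max(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]


# Hypothetical scores
scores = [10, 20, 30]
print(min_max(scores))
print(z_scores(scores))
```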
Advantages of Data Preprocessing
• Excel provides a user-friendly interface so that we can easily do data preprocessing and other
data analysis tasks.
• Excel offers a wide range of functions and features that help with different data preprocessing
needs.
• Excel is widely available, which is why it is commonly used for data preprocessing.
• Excel integrates well with other Microsoft Office applications, facilitating seamless data
transfer and collaboration.
Disadvantages of Data Preprocessing
• Excel may not be suitable for handling large datasets.
• Excel's analytical capabilities are robust but may not match those offered by specialized
statistical or data analysis software.
• Data preprocessing tasks in Excel often require manual execution.

Conclusion:
Thus, the data pre-processing operations for handling data and normalization were performed in Excel and the output was obtained
successfully.

EXP. NO:6 To perform dimensionality reduction operation using PCA, KPCA,SVD
DATE:

Aim:
To perform dimensionality reduction operation using PCA, KPCA, SVD.
Procedure:
The school system of a major city wanted to determine the characteristics of a great teacher, and
so they asked 120 students to rate the importance of each of the following 9 criteria using a Likert scale
of 1 to 10 with 10 representing that a particular characteristic is extremely important and 1 representing
that the characteristic is not important.

1. Setting high expectations for the students


2. Entertaining
3. Able to communicate effectively
4. Having expertise in their subject
5. Able to motivate
6. Caring
7. Charismatic
8. Having a passion for teaching
9. Friendly and easy-going
Figure 1 shows the scores from the first 10 students in the sample and Figure 2 shows some descriptive
statistics about the entire 120 person sample.

Figure 1 – Teacher evaluation scores

Figure 2 – Descriptive statistics for teacher evaluations

The sample covariance matrix S is shown in Figure 3 and can be calculated directly as

=MMULT(TRANSPOSE(B4:J123-B126:J126),B4:J123-B126:J126)/(COUNT(B4:B123)-1)

Here B4:J123 is the range containing all the evaluation scores and B126:J126 is the range containing
the means for each criterion. Alternatively, we can simply use the Real Statistics formula
COV(B4:J123) to produce the same result.

Figure 3 – Covariance Matrix

In practice, we usually prefer to standardize the sample scores. This will make the weights of the nine
criteria equal. This is equivalent to using the correlation matrix. Let R = [rij] where rij is the correlation
between xi and xj, i.e.

rij = cov(xi, xj) / (si sj)

where si and sj are the sample standard deviations of xi and xj.

The sample correlation matrix R is shown in Figure 4 and can be calculated directly as

=MMULT(TRANSPOSE((B4:J123-B126:J126)/B127:J127),(B4:J123-
B126:J126)/B127:J127)/(COUNT(B4:B123)-1)

Here B127:J127 is the range containing the standard deviations for each criterion. Alternatively, we can
simply use the Real Statistics function CORR(B4:J123) to produce the same result.
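
The two MMULT formulas above can be mirrored in a few lines of Python. This is a minimal sketch using only the standard library; the function names and the three-row data set are hypothetical and are much smaller than the 120 × 9 range B4:J123.

```python
from statistics import mean


def covariance_matrix(data):
    """Sample covariance matrix of the columns of data (rows are samples)."""
    n = len(data)
    k = len(data[0])
    means = [mean(row[j] for row in data) for j in range(k)]
    # Subtract the column means, like B4:J123 - B126:J126
    centered = [[row[j] - means[j] for j in range(k)] for row in data]
    # Transpose(centered) x centered, divided by n - 1
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def correlation_matrix(data):
    """Sample correlation matrix: covariance scaled by the standard deviations."""
    cov = covariance_matrix(data)
    k = len(cov)
    sd = [cov[i][i] ** 0.5 for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


# Hypothetical scores: 3 samples, 2 criteria
scores = [[1, 2], [2, 4], [3, 6]]
print(covariance_matrix(scores))
print(correlation_matrix(scores))
```

As in Figure 4, the main diagonal of the correlation matrix consists of ones.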

Figure 4 – Correlation Matrix

Note that all the values on the main diagonal are 1, as we would expect since the variances have been
standardized. We next calculate the eigenvalues for the correlation matrix using the Real Statistics
eigVECTSym(M4:U12) formula, as described in Linear Algebra Background. The result appears in
range M18:U27 of Figure 5.

Figure 5 – Eigenvalues and eigenvectors of the correlation matrix

The first row in Figure 5 contains the eigenvalues for the correlation matrix in Figure 4. Below each
eigenvalue is a corresponding unit eigenvector. E.g. the largest eigenvalue is λ1 = 2.880437.
Corresponding to this eigenvalue is the 9 × 1 column eigenvector B1 whose elements are 0.108673,
-0.41156, etc.

As we described above, coefficients of the eigenvectors serve as the regression coefficients of the 9
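principal components. For example, the first principal component can be expressed by

PC1 = 0.108673·x′1 − 0.41156·x′2 + …

where x′j is the standardized value of xj and the remaining coefficients are the other elements of B1 in Figure 5.

This can also be expressed as

PC1 = B1ᵀX′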

Thus for any set of scores (for the xj) you can calculate each of the corresponding principal components.
Keep in mind that you need to standardize the values of the xj first since this is how the correlation
matrix was obtained. For the first sample (row 4 of Figure 1), we can calculate the nine principal
components using the matrix equation Y = BᵀX′ as shown in Figure 6.

Figure 6 – Calculation of PC1 for first sample

Here B (range AI61:AQ69) is the set of eigenvectors from Figure 5, X (range AS61:AS69) is simply
the transpose of row 4 from Figure 1, X′ (range AU61:AU69) standardizes the scores in X (e.g. cell
AU61 contains the formula =STANDARDIZE(AS61, B126, B127), referring to Figure 2) and Y (range
AW61:AW69) is calculated by the formula

=MMULT(TRANSPOSE(AI61:AQ69),AU61:AU69)

Thus the principal component values corresponding to the first sample are 0.782502 (PC1), -1.9758
(PC2), etc.

As observed previously, the total variance for the nine random variables is 9 (since the variance was
standardized to 1 in the correlation matrix), which is, as expected, equal to the sum of the nine
eigenvalues listed in Figure 5. In fact, in Figure 7 we list the eigenvalues in decreasing order and show
the percentage of the total variance accounted for by that eigenvalue.

Figure 7 – Variance accounted for by each eigenvalue

The values in column M are simply the eigenvalues listed in the first row of Figure 5, with cell M41
containing the formula =SUM(M32:M40) and producing the value 9 as expected. Each cell in column
N contains the percentage of the variance accounted for by the corresponding eigenvalue. E.g. cell N32
contains the formula =M32/M41, and so we see that 32% of the total variance is accounted for by the
largest eigenvalue. Column O simply contains the cumulative weights, and so we see that the first four
eigenvalues account for 72.3% of the variance.
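
The calculation in columns N and O of Figure 7 can be sketched in Python as follows. Only the largest eigenvalue (2.880437) and the total (9) are taken from the text; the function name and the four-value eigenvalue list are hypothetical stand-ins, since the remaining eigenvalues appear only in the figure.

```python
def explained_variance(eigenvalues):
    """Return (shares, cumulative shares) with eigenvalues in decreasing order."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    shares = [ev / total for ev in ordered]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


# Share of the largest eigenvalue, as in cell N32 (=M32/M41)
largest_share = 2.880437 / 9
print(f"{largest_share:.1%}")

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to 4)
shares, cumulative = explained_variance([1.0, 2.0, 0.6, 0.4])
print(shares)
print(cumulative)
```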
Using Excel’s charting capability, we can plot the values in column N of Figure 7 to obtain a graphical
representation, called a scree plot.

Figure 8 – Scree Plot

We decide to retain the first four eigenvalues, which explain 72.3% of the variance. In section Basic
Concepts of Factor Analysis we will explain in more detail how to determine how many eigenvalues to
retain. The portion of Figure 5 that refers to these eigenvalues is shown in Figure 9. Since all but the
Expect value for PC1 are negative, we first decide to negate all the values. This is not a problem since
the negative of a unit eigenvector is also a unit eigenvector.

Figure 9 – Principal component coefficients (Reduced Model)

Those values that are sufficiently large, i.e. the values that show a high correlation between the principal
components and the (standardized) original variables, are highlighted. We use a threshold of ±0.4 for
this purpose.

This is done by highlighting the range R32:U40 and selecting Home > Styles|Conditional Formatting
and then choosing Highlight Cell Rules > Greater Than and inserting the value .4 and then selecting
Home > Styles|Conditional Formatting and then choosing Highlight Cell Rules > Less Than and
inserting the value -.4.

Note that Entertainment, Communications, Charisma and Passion are highly correlated with PC1,
Motivation and Caring are highly correlated with PC3 and Expertise is highly correlated with PC4.
Also, Expectation is highly positively correlated with PC2 while Friendly is negatively correlated with
PC2.

Ideally, we would like to see that each variable is highly correlated with only one principal component.
As we can see from Figure 9, this is the case in our example. Usually, this is not the case, however, and
we will show what to do about this in the Basic Concepts of Factor Analysis when we discuss rotation
in Factor Analysis.

In our analysis, we retain 4 of the 9 principal components. As noted previously, each of the principal
components can be calculated by

Y = BᵀX′

where Y is a k × 1 vector of principal components, B is a k × k matrix (whose columns are the unit
eigenvectors) and X′ is a k × 1 vector of the standardized scores for the original variables.

If we retain only m principal components, then Y = BᵀX′ where Y is an m × 1 vector, B is a k × m matrix
(consisting of the m unit eigenvectors corresponding to the m largest eigenvalues) and X′ is the k × 1
vector of standardized scores as before. The interesting thing is that if Y is known we can calculate
estimates for standardized values for X using X′ ≈ BY. With all k eigenvectors the relationship is exact,
X′ = BBᵀX′ = B(BᵀX′) = BY, since B is then an orthogonal matrix and so BBᵀ = I; with only m columns,
BY is an approximation. From X′ it is then easy to calculate X.
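
The projection Y = BᵀX′ and the estimate of X′ from BY can be illustrated with a two-variable example, where the unit eigenvectors of any 2 × 2 correlation matrix are known in closed form: (1/√2, 1/√2) and (1/√2, −1/√2). The following Python sketch is a minimal illustration; the standardized scores, means and standard deviations are hypothetical.

```python
import math


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


s = 1 / math.sqrt(2)
B_full = [[s, s],
          [s, -s]]           # columns are the two unit eigenvectors
x_std = [1.2, 0.8]           # hypothetical standardized scores X'

# All components kept: Y = B^T X' and X' = B Y is recovered exactly
y_full = matvec(transpose(B_full), x_std)
x_back = matvec(B_full, y_full)

# Only the first component kept: B is k x m = 2 x 1, and B Y is an estimate
B_reduced = [[s], [s]]
y_reduced = matvec(transpose(B_reduced), x_std)
x_estimate = matvec(B_reduced, y_reduced)

# Undo the standardization: x = x' * standard deviation + mean (hypothetical values)
means = [7.0, 6.0]
sds = [2.0, 1.5]
x_original_estimate = [xe * sd + m for xe, sd, m in zip(x_estimate, sds, means)]

print(x_back)
print(x_estimate)
print(x_original_estimate)
```

The last line corresponds to the step that multiplies by the standard deviations and adds the means in Figure 10.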

Figure 10 – Estimate of original scores using reduced model

In Figure 10 we show how this is done using the four principal components that we calculated from the
first sample in Figure 6. B (range AN74:AQ82) is the reduced set of coefficients (Figure 9), Y (range
AS74:AS77) are the principal components as calculated in Figure 6, X′ are the estimated standardized
values for the first sample (range AU74:AU82) using the formula

=MMULT(AN74:AQ82,AS74:AS77)

and finally, X are the estimated scores in the first sample (range AW74:AW82) using the formula

=AU74:AU82*TRANSPOSE(B127:J127)+TRANSPOSE(B126:J126)

Conclusion:
Thus, the dimensionality reduction operation using PCA, KPCA, SVD was performed in Excel and the output was obtained
successfully.

EXP. NO:7 To Perform Bivariate and multivariate Analysis on the dataset
DATE:

Aim:
To perform bivariate and multivariate analysis in Excel.
Procedure:
The term bivariate analysis refers to the analysis of two variables. You can remember this
because the prefix “bi” means “two.”

The purpose of bivariate analysis is to understand the relationship between two variables.

There are three common ways to perform bivariate analysis:

1. Scatterplots

2. Correlation Coefficients

3. Simple Linear Regression

The following example shows how to perform each of these types of bivariate analysis in Excel using
the following dataset that contains information about two variables: (1) Hours spent studying
and (2) Exam score received by 20 different students:

1. Scatterplots

To create a scatterplot of hours vs. score, we can highlight cells A2:B21, then click the Insert tab
along the top ribbon, then click Insert Scatter Chart within the Charts group:

We can also modify the y-axis limits to gain a better view of the data points.

To do so, double click the y-axis. In the Format Axis panel that appears on the right side of the
screen, click Axis Options and then change the Minimum and Maximum bounds to 60 and 100,
respectively.

The y-axis will automatically update:

The x-axis shows the hours studied and the y-axis shows the exam score received.

From the plot we can see that there is a positive relationship between the two variables. As hours
studied increases, exam score tends to increase as well.

2. Correlation Coefficients

A Pearson Correlation Coefficient is a way to quantify the linear relationship between two variables.

We can use the following formula in Excel to calculate the correlation coefficient between hours
studied and exam score:

=CORREL(A2:A21, B2:B21)

The correlation coefficient turns out to be 0.891.

This value is close to 1, which indicates a strong positive correlation between hours studied and exam
score received.
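
The value returned by CORREL can be reproduced from the definition of the Pearson coefficient. The following Python sketch is a minimal illustration; the function name `pearson` and the five data points are hypothetical and are not the 20 students from the worksheet.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [70, 75, 79, 86, 88]
print(round(pearson(hours, scores), 3))
```

A result close to 1 indicates a strong positive linear relationship, a result close to -1 a strong negative one, and a result near 0 little linear relationship.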

3. Simple Linear Regression

Simple linear regression is a statistical method we can use to quantify the relationship between two
variables.

To fit a simple linear regression model in Excel, click the Data tab along the top ribbon, then click
the Data Analysis option in the Analyze group. In the new panel that appears, click Regression and
then click OK.

Note: If you don’t see the Data Analysis option, you need to first load the Excel Analysis ToolPak.

In the panel that appears, enter the following information and then click OK:

Once you click OK, the results of the regression model will appear:

The fitted regression equation turns out to be:

Exam Score = 69.0734 + 3.8471*(hours studied)

This tells us that each additional hour studied is associated with an average increase of 3.8471 in
exam score.

We can also use the regression equation to estimate the score that a student will receive based on their
total hours studied.

For example, a student who studies for 3 hours is estimated to receive a score of 80.6147:

• Exam Score = 69.0734 + 3.8471*(hours studied)
• Exam Score = 69.0734 + 3.8471*(3)
• Exam Score = 80.6147
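
The fitted equation can be checked with a short Python sketch. The intercept 69.0734 and slope 3.8471 are the values reported above; evaluating the equation at 3 hours gives 69.0734 + 3.8471 × 3 = 80.6147. The function names and the small data set passed to `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    return intercept + slope * x


# Coefficients from the regression output in this experiment
INTERCEPT, SLOPE = 69.0734, 3.8471
print(round(predict(INTERCEPT, SLOPE, 3), 4))

# Fitting a line to hypothetical points that lie exactly on y = 1 + 2x
print(fit_line([1, 2, 3], [3, 5, 7]))
```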

Conclusion:
Thus, bivariate and multivariate analysis was performed in Excel and the output was obtained successfully.

EXP. NO:8 Apply and explore various plotting functions on the data set
DATE:

Aim:
To apply and explore various plotting functions on the data set in Excel.
Procedure:
Visualizing Data with Charts

In Excel, charts are used to make a graphical representation of any set of data. A chart is a visual
representation of the data, in which the data is represented by symbols such as bars in a Bar Chart or
lines in a Line Chart. Excel provides you with many chart types and you can choose one that suits
your data or you can use the Excel Recommended Charts option to view charts customized to your
data and select one of those.

• Creating Combination Charts

Suppose you have the target and actual profits for the fiscal year 2015-2016 that you obtained from
different regions.

We will create a Clustered Column Chart for these results.

As you can see, it is difficult to compare the targets and actuals quickly in
this chart. It does not convey your results with much impact.

Use Vertical Columns for the target values and a Line with Markers for the actual values.

• Click the DESIGN tab under the CHART TOOLS tab on the Ribbon.
• Click Change Chart Type in the Type group. The Change Chart Type dialog box appears.

• Click Combo.
• Change the Chart Type for the series Actual to Line with Markers. The preview appears under
Custom Combination.
• Click OK.

Your Customized Combination Chart will be displayed.

As you observe in the chart, the Target values are in Columns and the Actual values are marked along
the line. The data visualization has become better as it also shows you the trend of your results.

However, this type of representation does not work well when the data ranges of your two data values
vary significantly.

Creating a Combo Chart with Secondary Axis

Suppose you have the data on the number of units of your product that was shipped and the actual
profits for the fiscal year 2015-2016 that you obtained from different regions.

If you use the same combination chart as before, you will get the following −

In the chart, the data of No. of Units is not visible as the data ranges are varying significantly.

In such cases, you can create a combination chart with secondary axis, so that the primary axis
displays one range and the secondary axis displays the other.

• Click the INSERT tab.


• Click Combo in Charts group.
• Click Create Custom Combo Chart from the drop-down list.

The Insert Chart dialog box appears with Combo highlighted.

For Chart Type, choose −

• Line with Markers for the Series No. of Units


• Clustered Column for the Series Actual Profits
• Check the Box Secondary Axis to the right of the Series No. of Units and click OK.

A preview of your chart appears under Custom Combination.

Your Combo chart appears with Secondary Axis.

You can observe the values for Actual Profits on the primary axis and the values for No. of Units on
the secondary axis.

A significant observation in the above chart is for Quarter 3, where the No. of Units sold is higher but the
Actual Profits made are lower. This could probably be attributed to the promotion costs that were
incurred to increase sales. The situation is improved in Quarter 4, with a slight decrease in sales and a
significant rise in the Actual Profits made.

Discriminating Series and Category Axis

Suppose you want to project the Actual Profits made in Years 2013-2016.

Create a clustered column for this data.

As you observe, the data visualization is not effective as the years are not displayed. You can
overcome this by changing year to category.

Remove the header year in the data range.

Now, year is considered as a category and not a series. Your chart looks as follows −


• Click Chart Styles


• Select a Style and Color that suits your data.

You can use Trendline to graphically display trends in data. You can extend a Trendline in a chart
beyond the actual data to predict future values.

• Data Labels

Excel 2013 and later versions provide you with various options to display Data Labels. You can
choose one Data Label, format it as you like, and then use Clone Current Label to copy the formatting
to the rest of the Data Labels in the chart.

The Data Labels in a chart can have effects, varying shapes and sizes.

It is also possible to display the content of a cell as part of the Data Label with Insert Data Label
Field.

Quick Layout

You can use Quick Layout to change the overall layout of the chart quickly by choosing one of the
predefined layout options.

• Click the chart.


• Click the DESIGN tab under CHART TOOLS.
• Click Quick Layout.

Different possible layouts will be displayed. As you move the pointer over the layout options, the chart layout
changes to that particular option.

Select the layout you like. The chart will be displayed with the chosen layout.

Using Pictures in Column Charts

You can create more emphasis on your data presentation by using a picture in place of columns.

• Click on a Column on the Column Chart.


• In the Format Data Series, click on Fill.
• Select Picture.
• Under Insert picture from, provide the filename or optionally clipboard if you had copied an
image earlier.

The picture you have chosen will appear in place of columns in the chart.


Band Chart

You might have to present customer survey results of a product from different regions. Band Chart is
suitable for this purpose. A Band Chart is a Line Chart with an added shaded area to display the upper
and lower boundaries of groups of data.

Suppose your customer survey results from the east and west regions, month wise are −

Here, in the data < 50% is Low, 50% - 80% is Medium and > 80% is High.
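
The band boundaries can be expressed as a small Python sketch. The thresholds (below 50% is Low, 50% to 80% is Medium, above 80% is High) come from the text; the function name, the dictionary of band heights and the survey percentages are hypothetical. The heights 50, 30 and 20 are the amounts that stack up to the boundaries at 50%, 80% and 100%.

```python
def band_label(percent):
    """Classify a survey result into the band it falls in."""
    if percent < 50:
        return "Low"
    if percent <= 80:
        return "Medium"
    return "High"


# Heights of the stacked series that draw the bands (they sum to 100)
BAND_HEIGHTS = {"Low": 50, "Medium": 30, "High": 20}

# Hypothetical monthly survey results for one region
east = {"Apr": 45, "May": 62, "Jun": 85}
for month, value in east.items():
    print(month, value, band_label(value))
```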

With Band Chart, you can display your survey results as follows −

Create a Line Chart from your data.

Change the chart type to −

• East and West Series to Line with Markers.


• Low, Medium and High Series to Stacked Column.

Your chart looks as follows.

• Click on one of the columns.
• Change gap width to 0% in Format Data Series.

You will get Bands instead of columns.

To make the chart more presentable −

• Add Chart Title.


• Adjust Vertical Axis range.
• Change the colors of the bands to Green-Yellow-Red.
• Add Labels to bands.

The final result is the Band Chart with the defined boundaries and the survey results represented
across the bands. One can quickly and clearly make out from the chart that while the survey results for
the region West are satisfactory, those for the region East have a decline in the last quarter and need
attention.

Thermometer Chart

When you have to represent a target value and an actual value, you can easily create a Thermometer
Chart in Excel that emphatically shows these values.

With Thermometer chart, you can display your data as follows −

Arrange your data as shown below −

• Select the data.
• Create a Clustered Column chart.

As you observe, the right side Column is Target.

• Click on a Column in the chart.


• Click on Switch Row/Column on the Ribbon.

• Right click on the Target Column.
• Click on Format Data Series.
• Click on Secondary Axis.

As you observe the Primary Axis and Secondary Axis have different ranges.

• Right click the Primary Axis.


• In the Format Axis options, under Bounds, type 0 for Minimum and 1 for Maximum.
• Repeat the same for Secondary Axis.

Both Primary Axis and Secondary Axis will be set to 0% - 100%. The Target Column hides the
Actual Column.

• Right click the visible column (Target)


• In the Format Data Series, select
o No fill for FILL
o Solid line for BORDER
o Blue for Color

• In Chart Elements, unselect


o Axis → Primary Horizontal
o Axis → Secondary Vertical
o Gridlines
o Chart Title
• In the chart, right click on Primary Vertical Axis
• In Format Axis options, click on TICK MARKS
• For Major type, select Inside

• Right click on the Chart Area.
• In the Format Chart Area options, select
o No fill for FILL
o No line for BORDER

Resize the chart area, to get the shape of a thermometer.

You now have your thermometer chart, showing the actual value against the target value. You can
make this thermometer chart more impressive with some formatting.

• Insert a rectangle shape superimposing the blue rectangular part in the chart.
• In Format Shape options, select −
o Gradient fill for FILL
o Linear for Type
o 180° for Angle
• Set the Gradient stops at 0%, 50% and 100%.
• For the Gradient stops at 0% and 100%, choose the color black.
• For the Gradient stop at 50%, choose the color white.

• Insert an oval shape at the bottom.
• Format shape with same options.

The result is the Thermometer Chart that we started with.

Gantt Chart

A Gantt chart is a chart in which a series of horizontal lines shows the amount of work done in certain
periods of time in relation to the amount of work planned for those periods.

In Excel, you can create a Gantt chart by customizing a Stacked Bar chart type so that it depicts tasks,
task duration, and hierarchy. An Excel Gantt chart typically uses days as the unit of time along the
horizontal axis.

Consider the following data where the column −

• Task represents the Tasks in the project


• Start represents number of days from the Start Date of the project
• Duration represents the duration of the Task

Note that the Start of any Task is the Start of the previous Task plus the Duration of the previous Task.
This is the case when the Tasks are in hierarchy, i.e. each Task begins when the previous one ends.
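
The Start column can be derived from the Duration column as sketched below in Python. This assumes strictly sequential tasks, with the first task starting on day 0; the function name and the durations are hypothetical.

```python
def gantt_starts(durations, first_start=0):
    """Start day of each task when every task follows the previous one."""
    starts = []
    current = first_start
    for duration in durations:
        starts.append(current)
        current += duration   # the next task starts when this one ends
    return starts


# Hypothetical task durations in days
durations = [5, 3, 4]
print(gantt_starts(durations))
```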

• Select the data.


• Create Stacked Bar Chart.

• Right-click on Start Series.
• In Format Data Series options, select No fill.

• Right-click on Categories Axis.


• In Format Axis options, select Categories in reverse order.

• In Chart Elements, deselect
o Legend
o Gridlines
• Format the Horizontal Axis to
o Adjust the range
o Major Tick Marks at 5 day intervals
o Minor Tick Marks at 1 day intervals
• Format Data Series to make it look impressive
• Give a Chart Title

Waterfall Chart

Waterfall Chart is one of the most popular visualization tools used in small and large businesses.
Waterfall charts are ideal for showing how you have arrived at a net value such as net income, by
breaking down the cumulative effect of positive and negative contributions.

Excel 2016 provides Waterfall Chart type. If you are using earlier versions of Excel, you can still
create a Waterfall Chart using Stacked Column Chart.

The columns are color coded so that you can quickly tell positive from negative numbers. The initial
and the final value columns start on the horizontal axis, while the intermediate values are floating
columns. Because of this look, Waterfall Charts are also called Bridge Charts.

Consider the following data.

• Prepare the data for Waterfall Chart
• Ensure the column Net Cash Flow is to the left of the Months Column (This is because you
will not include this column while creating the chart)
• Add 2 columns – Increase and Decrease for positive and negative cash flows respectively
• Add a column Start - the first column in the chart with the start value in the Net Cash Flow
• Add a column End - the last column in the chart with the end value in the Net Cash Flow
• Add a column Float – that supports the intermediate columns
• Compute the values for these columns as follows

• In the Float column, insert a row at the beginning and at the end. Place an arbitrary value of
50000. This is just to leave some space to the left and right of the chart
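
The helper columns can be computed in several ways. The following Python sketch shows one common scheme and is not necessarily the set of formulas used in the worksheet: each intermediate column floats on the lower of the running totals before and after the cash flow. It assumes the running total never goes below zero. The function name, the dictionary keys and the sample cash flows are hypothetical.

```python
def waterfall_columns(start_value, flows):
    """Build Float/Start/Increase/Decrease/End values for a waterfall chart."""
    rows = [{"label": "Start", "float": 0, "start": start_value,
             "increase": 0, "decrease": 0, "end": 0}]
    running = start_value
    for i, flow in enumerate(flows, start=1):
        after = running + flow
        rows.append({
            "label": f"Period {i}",
            "float": min(running, after),   # invisible base of the floating column
            "start": 0,
            "increase": max(flow, 0),       # positive cash flow
            "decrease": max(-flow, 0),      # negative cash flow, stored as a positive height
            "end": 0,
        })
        running = after
    rows.append({"label": "End", "float": 0, "start": 0,
                 "increase": 0, "decrease": 0, "end": running})
    return rows


# Hypothetical opening balance and net cash flows
for row in waterfall_columns(100, [50, -30]):
    print(row)
```

Stacking Float, Increase and Decrease for each period, with Float given no fill, produces the floating columns.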

The data will be as follows.

• Select the cells C2:H18 (Exclude Net Cash Flow column)
• Create Stacked Column Chart

• Right click on the Float Series.


• Click Format Data Series.
• In Format Data Series options, select No fill.

• Right click on Negative Series.
• Select Fill Color as Red.

• Right click on Positive Series.


• Select Fill Color as Green.

• Right click on Start Series.
• Select Fill Color as Grey.
• Right click on End Series.
• Select Fill Color as Grey.
• Delete the Legend.

• Right click on any Series


• In Format Data Series options, select Gap Width as 10% under Series Options

Give the Chart Title. The Waterfall Chart will be displayed.

Sparklines

Sparklines are tiny charts placed in single cells, each representing a row of data in your selection.
They provide a quick way to see trends.

You can add Sparklines with Quick Analysis tool.

• Select the data for which you want to add Sparklines.


• Keep an empty column to the right side of the data for the Sparklines.

Quick Analysis button appears at the bottom right of your selected data.

• Click on the Quick Analysis button. The Quick Analysis Toolbar appears with various
options.

Click SPARKLINES. The chart options displayed are based on the data and may vary.

Click Line. A Line Chart for each row is displayed in the column to the right of the data.
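
The idea of one tiny chart per row can be imitated in plain text. The following Python sketch maps each value in a row to one of eight block characters; it is only a text analogue of an Excel Sparkline, and the function name and the sample rows are hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Return one block character per value, scaled between the row's min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)   # flat row: all bars at the lowest level
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)


# Hypothetical rows of monthly figures
rows = {"East": [12, 15, 14, 20, 26], "West": [30, 28, 25, 27, 22]}
for region, values in rows.items():
    print(region, sparkline(values))
```

Each row is scaled independently, so the text sparklines show the trend within a row rather than a comparison between rows.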

PivotCharts

PivotCharts are used to graphically summarize data and explore complicated data.

A PivotChart shows Data Series, Categories, and Chart Axes the same way a standard chart does.
Additionally, it also gives you interactive filtering controls right on the chart so that you can quickly
analyze a subset of your data.

PivotCharts are useful when you have data in a huge PivotTable, or a lot of complex worksheet data
that includes text and numbers. A PivotChart can help you make sense of this data.

You can create a PivotChart from

• A PivotTable.
• A Data Table as a standalone without PivotTable.

• PivotChart from PivotTable

To create a PivotChart follow the steps given below −

• Click the PivotTable.


• Click ANALYZE under PIVOTTABLE TOOLS on the Ribbon.
• Click on PivotChart. The Insert Chart dialog box appears.

Select Clustered Column from the option Column.

Click OK. The PivotChart is displayed.

The PivotChart has three filters – Region, Salesperson and Month.

• Click the Region Filter Control option. The Search Box appears with the list of all Regions.
Check boxes appear next to Regions.
• Select East and South options.

The filtered data appears on both the PivotChart and the PivotTable.

PivotChart without a PivotTable

You can create a standalone PivotChart without creating a PivotTable.

• Click the Data Table.


• Click the Insert tab.
• Click PivotChart in Charts group. The Create PivotChart window appears.
• Select the Table/Range.
• Select the Location where you want the PivotChart to be placed.

You can choose a cell in the existing worksheet itself, or in a new worksheet. Click OK.

An empty PivotChart and an empty PivotTable appear along with the PivotChart Field List to build
the PivotChart.

• Choose the Fields to be added to the PivotChart
• Arrange the Fields by dragging them into FILTERS, LEGEND (SERIES), AXIS
(CATEGORIES) and VALUES
• Use the Filter Controls on the PivotChart to select the Data to be placed on the PivotChart

Excel will automatically create a coupled PivotTable.
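
The summary that a PivotChart plots is a filtered, grouped total. The following Python sketch shows that aggregation using only the standard library. The field names Region, Salesperson and Month follow the filters mentioned above; the Amount field, the function name and the records are hypothetical.

```python
from collections import defaultdict


def pivot_sum(records, row_field, value_field, filters=None):
    """Total value_field per row_field, keeping only records that pass the filters."""
    totals = defaultdict(float)
    for rec in records:
        # filters maps a field name to the set of values that remain checked
        if filters and any(rec[field] not in allowed
                           for field, allowed in filters.items()):
            continue
        totals[rec[row_field]] += rec[value_field]
    return dict(totals)


# Hypothetical sales records
sales = [
    {"Region": "East", "Salesperson": "A", "Month": "Jan", "Amount": 100},
    {"Region": "West", "Salesperson": "B", "Month": "Jan", "Amount": 200},
    {"Region": "South", "Salesperson": "A", "Month": "Feb", "Amount": 50},
    {"Region": "East", "Salesperson": "C", "Month": "Feb", "Amount": 70},
]

# Equivalent of checking East and South in the Region filter control
print(pivot_sum(sales, "Salesperson", "Amount", {"Region": {"East", "South"}}))
```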

Conclusion:
Thus, various plotting functions were applied to the data set in Excel and the output was obtained successfully.

EXP. NO:9 Explore the features of power BI Desktop
DATE:

Features of Power BI:


Microsoft Power BI is a collection of tools used to import, aggregate and present data in the form of
immersive and easy-to-digest reports and visuals.
What is Microsoft Power BI?
Microsoft Power BI is a business intelligence (BI) platform that provides nontechnical business users
with tools for aggregating, analyzing, visualizing and sharing data. Power BI's user interface is fairly
intuitive for users familiar with Excel, and its deep integration with other Microsoft products makes it
a versatile self-service tool that requires little upfront training.
Power BI is used to create reports and dashboards. Its main characteristics are:
• It connects to several data sources such as SQL databases, Excel, Oracle and Azure.
• It makes data easy for users to access.
• It offers a range of attractive visualizations.
• It provides flexible tiles.
• It can work with large datasets.

Common uses of Power BI


Microsoft Power BI is used to find insights within an organization's data. Power BI can help connect
disparate data sets, transform and clean the data into a data model and create charts or graphs to provide
visuals of the data. All of this can be shared with other Power BI users within the organization.

The data models created from Power BI can be used in several ways for organizations, including the
following:
• telling stories through charts and data visualizations;
• examining "what if" scenarios within the data; and
• creating reports that can answer questions in real time and help with forecasting to make sure
departments meet business metrics.

Power BI can also provide executive dashboards for administrators or managers, giving
management more insight into how departments are doing.

Key features of Power BI
Microsoft has added a number of data analytics features to Power BI since its inception, and continues
to do so. Some of the most important features are the following:

• Artificial intelligence. Users can access image recognition and text analytics in Power BI,
create machine learning models using automated ML capabilities and integrate with Azure
Machine Learning.
• Hybrid deployment support. This feature provides built-in connectors that allow Power BI
tools to connect with a number of different data sources from Microsoft, Salesforce and other
vendors.
• Quick Insights. This feature allows users to create subsets of data and automatically apply
analytics to that information.
• Common data model support. Power BI's support for the common data model allows the
use of a standardized and extensible collection of data schemas (entities, attributes and
relationships).
• Cortana integration. This feature, which is especially popular on mobile devices, allows
users to verbally query data using natural language and access results using Cortana,
Microsoft's digital assistant.
• Customization. This feature allows developers to change the appearance of default
visualization and reporting tools and import new tools into the platform.
• APIs for integration. This feature provides developers with sample code and application
program interfaces (APIs) for embedding the Power BI dashboard in other software products.
• Self-service data prep. Using Power Query, business analysts can ingest, transform, integrate
and enrich big data into the Power BI web service. Ingested data can be shared across
multiple Power BI models, reports and dashboards.
• Modeling view. This allows users to divide complex data models by subject area into
separate diagrams, multiselect objects and set common properties, view and modify properties
in the properties pane, and set display folders for simpler consumption of complex data
models.

Conclusion:
Thus, the features of Power BI Desktop were explored and studied.

EXP. NO:10 Prepare and Load Data
DATE:

Aim:
To prepare and load data using the Microsoft Power BI tool.

Procedure:

1. Go to Get Data and choose the data source according to where the data is stored and the format of the data.
2. Then open the source and load your data into Power BI.

3. Select the columns of your data that are to be visualized, and then click the Load button to load your data.
4. Power BI now processes the data, importing it from Excel.

5. The data is successfully loaded into Power BI.
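
The same prepare-and-load idea (choose a source, keep only the needed columns, then load) can be sketched in Python. This is an illustrative analogy, not what Power BI runs internally. It assumes the Excel sheet has been exported as CSV text, and the column names, sample rows and `load_columns` helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of the Excel sheet; column names are illustrative.
RAW = """OrderID,Customer,ShipCity,Amount
1,Anna,Chennai,250.50
2,Bala,Madurai,99.00
3,Chitra,Chennai,
"""


def load_columns(text, columns):
    """Read CSV text, keep only the requested columns and skip incomplete rows."""
    reader = csv.DictReader(io.StringIO(text))
    loaded = []
    for row in reader:
        selected = {name: row[name] for name in columns}
        if all(value != "" for value in selected.values()):
            loaded.append(selected)
    return loaded


orders = load_columns(RAW, ["OrderID", "ShipCity", "Amount"])
for order in orders:
    # Convert text to a number, similar to changing a column's data type.
    order["Amount"] = float(order["Amount"])

print(orders)
```

The third row is dropped because its Amount is empty, which mirrors cleaning the data before it is loaded.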

Conclusion:
Thus, the data was prepared and loaded successfully using the Microsoft Power BI tool.

EXP. NO:11 Develop the Data Model
DATE:
Aim:
To develop the data model using the Microsoft Power BI tool.

Procedure:
1. Go to the Modeling tab on the ribbon.
2. Then go to Manage Relationships to verify whether the relationships between the tables are as per the requirement.

3. Close the Manage Relationships dialog and open Model view to visualize the relationships between the tables.

4. These relationships are not as per our requirement, so delete every relationship by selecting all of them in Manage Relationships.

5. After deleting every relationship, create the relationships that are needed for our business model.

6. Create the relationships as per your requirement, like this.


7. The relationships between the tables can now be visualized by opening Model view.

8. The model is now developed as per our requirements.
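
Before creating a one-to-many relationship, it helps to confirm that the key is unique on the "one" side and that every key on the "many" side has a match. The Python sketch below illustrates that check under stated assumptions: the `CUSTOMERS` and `ORDERS` tables, the `CustomerID` key and the `check_relationship` helper are hypothetical and are not taken from the lab data set.

```python
def check_relationship(one_side, many_side, key):
    """Validate a one-to-many relationship between two lists of dict rows."""
    one_keys = [row[key] for row in one_side]
    # The key must not repeat on the "one" side of the relationship.
    key_is_unique = len(one_keys) == len(set(one_keys))
    # Keys used on the "many" side that have no matching row on the "one" side.
    orphan_keys = sorted({row[key] for row in many_side} - set(one_keys))
    return {"key_is_unique": key_is_unique, "orphan_keys": orphan_keys}


# Hypothetical tables: one customer can have many orders.
CUSTOMERS = [
    {"CustomerID": 1, "FirstName": "Anna"},
    {"CustomerID": 2, "FirstName": "Bala"},
]
ORDERS = [
    {"OrderID": 10, "CustomerID": 1},
    {"OrderID": 11, "CustomerID": 1},
    {"OrderID": 12, "CustomerID": 3},
]

report = check_relationship(CUSTOMERS, ORDERS, "CustomerID")
print(report)  # {'key_is_unique': True, 'orphan_keys': [3]}
```

Here order 12 refers to a customer that does not exist, so that row would have no match across the relationship.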

Conclusion:
Thus, the data model was developed successfully using Microsoft Power BI.

EXP. NO:12 Perform the DAX Calculations
DATE:
Aim:
To perform DAX calculations using Microsoft Power BI.
Procedure:

1. Create a new column to represent the data in a new way as per the requirement.
2. In the formula bar, type the DAX expression to be applied; its result is displayed as the new column.

3. Create a new measure to represent the data in a new way as per the requirement.

4. In the formula bar, type the DAX expression to be applied; its result is viewed as the new measure.
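
The difference between a calculated column and a measure can be illustrated with a Python analogy (this is not DAX syntax). A calculated column is evaluated once per row and stored with the row, while a measure is an aggregate evaluated on demand over whichever rows the current filter keeps. The `ORDERS` rows, the `LineTotal` column and the `total_sales` function are hypothetical names chosen for this sketch.

```python
# Hypothetical order rows standing in for a table in the data model.
ORDERS = [
    {"OrderID": 1, "Quantity": 2, "UnitPrice": 50.0, "Status": "Shipped"},
    {"OrderID": 2, "Quantity": 1, "UnitPrice": 30.0, "Status": "New"},
    {"OrderID": 3, "Quantity": 4, "UnitPrice": 10.0, "Status": "Shipped"},
]

# Calculated column: computed row by row and stored alongside the row.
for row in ORDERS:
    row["LineTotal"] = row["Quantity"] * row["UnitPrice"]


def total_sales(rows, status=None):
    """Measure: summed on demand over the rows that pass the current filter."""
    return sum(
        row["LineTotal"]
        for row in rows
        if status is None or row["Status"] == status
    )


print(total_sales(ORDERS))             # 170.0
print(total_sales(ORDERS, "Shipped"))  # 140.0
```

The same measure returns different results depending on the filter, whereas the stored `LineTotal` values do not change.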

Conclusion:
Thus, the DAX calculations were performed successfully using Microsoft Power BI.

EXP. NO:13 Design a report
DATE:
Aim:
To design a report in Microsoft Power BI.
Procedure:

1. Create a card using Order, Employee First-Name and Customer First-Name.

2. To add a header for the report, add a text box, which is available on the Home ribbon.

3. To create a slicer, select Slicer from Build visual and add Employee First-Name.
4. Create one more slicer to represent the Customer First-Name.

5. To create a map, select Map from Build visual and add Ship City as the location and Count of Order ID as the bubble size.

6. To create a line chart, select Line chart from Build visual and add Count of Order ID on the y-axis and Order Date on the x-axis.

7. Create a table to display the required columns from the data.

8. In the table, display the Order ID, Employee First-Name and Customer First-Name.

9. To create a donut chart, select Donut chart from Build visual.

10. In the donut chart, display Count of Order ID and Status ID.
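
The map, line chart and donut chart above all plot a count of Order ID grouped by a different field. The Python sketch below shows that aggregation under stated assumptions: the `ORDERS` rows and the `count_orders_by` helper are hypothetical, and the field names simply follow the ones used in the steps.

```python
from collections import Counter

# Hypothetical order rows; the field names follow the report steps.
ORDERS = [
    {"OrderID": 1, "ShipCity": "Chennai", "OrderDate": "2023-01-05", "StatusID": 1},
    {"OrderID": 2, "ShipCity": "Madurai", "OrderDate": "2023-01-05", "StatusID": 2},
    {"OrderID": 3, "ShipCity": "Chennai", "OrderDate": "2023-01-07", "StatusID": 1},
]


def count_orders_by(rows, field):
    """Count the orders for each distinct value of the given field."""
    return dict(Counter(row[field] for row in rows))


map_bubbles = count_orders_by(ORDERS, "ShipCity")                   # bubble size per city
line_points = sorted(count_orders_by(ORDERS, "OrderDate").items())  # points in date order
donut_slices = count_orders_by(ORDERS, "StatusID")                  # slice size per status

print(map_bubbles)
print(line_points)
print(donut_slices)
```

Only the grouping field changes between the three visuals; the counted value stays the same.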

Conclusion:
Thus, the report was designed successfully using Microsoft Power BI.

EXP. NO:14 Create Dashboard and perform data analysis
DATE:

Aim:
To create a dashboard and perform data analysis in the Microsoft Power BI tool.
Procedure:

1. Here is the sample dashboard.

2. To perform data analysis, select any data from the dashboard.
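
One way to think about step 2 is that selecting a value narrows the rows behind the other visuals. The Python sketch below illustrates that idea only; it is an assumption about the intended analysis rather than a description of Power BI internals, and the `ORDERS` rows and the `cross_filter` helper are hypothetical.

```python
# Hypothetical rows behind the dashboard visuals.
ORDERS = [
    {"OrderID": 1, "ShipCity": "Chennai", "Customer": "Anna"},
    {"OrderID": 2, "ShipCity": "Madurai", "Customer": "Bala"},
    {"OrderID": 3, "ShipCity": "Chennai", "Customer": "Chitra"},
]


def cross_filter(rows, field, selected):
    """Recompute summary values using only the rows that match the selection."""
    kept = [row for row in rows if row[field] == selected]
    return {
        "order_count": len(kept),
        "customers": sorted({row["Customer"] for row in kept}),
    }


summary = cross_filter(ORDERS, "ShipCity", "Chennai")
print(summary)  # {'order_count': 2, 'customers': ['Anna', 'Chitra']}
```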

Conclusion:
Thus, a dashboard was created and customized based on our requirements, and data analysis was performed on it.
