Univ BA LAB RECORD - Merged
_________________________Laboratory Record
Department : _____________________________
Semester : _____________________________
BONAFIDE CERTIFICATE
_______________________ ____________________
Submitted for the University Practical examination held on __________________ at Velammal Institute
of Technology
______________________ ______________________
COURSE OUTCOMES:
CO1: Explain the real-world business problems and model with analytical solutions.
CO2: Identify the business processes for extracting Business Intelligence
CO3: Apply predictive analytics for business fore-casting
CO4: Apply analytics for supply chain and logistics management.
CO5: Use analytics for marketing and sales.
TOTAL: 30 PERIODS
EXP. NO.  DATE  NAME OF THE EXPERIMENT  PAGE NO.  SIGN
1. Explore the features of MS-Excel.
2. (i) Get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND)
   (ii) Perform data import/export operations for different file formats
3. Perform statistical operations - Mean, Median, Mode and Standard deviation, Variance, Skewness, Kurtosis.
4. Perform Z-test Formula
   P-Value in Excel
   Anova using Excel
5. Perform data pre-processing operations
   i) Handling Missing data
   ii) Normalization
6. Perform dimensionality reduction operation using PCA, KPCA & SVD.
7. To Perform Bivariate and multivariate analysis on the data set.
8. Apply and Explore various plotting functions on the data set.
9. Features of Power BI
10. Prepare and Load Data
11. Develop the Data Model
12. Perform the DAX Calculations
13. Design a report
Features of MS Excel
MS Excel is used for processing the data that is in tabular form and then performing mathematical
functions on it to analyze it. This is what the Excel window looks like (version 2007):
Excel is a tool for coordinating and performing calculations on data. It can examine data, compute
statistics, create pivot tables, and express data as a chart or graph. MS Excel performs the following
basic functions:
In MS Excel, data is arranged in rows and columns. The intersection of a row and a column is a cell, and each cell is an individual unit of data. Each cell has a cell address made up of the letter of its column followed by the number of its row (for example, B3). No two cells in a worksheet share the same address.
Home and Insert
The Home & Insert menu of MS Excel is similar to MS Word. Users can change the formatting of the
content from home & include pie charts, tables, and other files related to data from the insert menu.
Font size, font color, font styles, alignment, background color, formatting options and styles, insertion,
deletion, and editing in the cells options are also available.
One can insert images and figures, header, and footer, charts, and sparklines and even attach graphs,
equations, and symbols.
Formulas
Formulas and Data are the features that set MS Excel apart. Users can apply a formula to data to analyze it quickly. To do so, the user selects the relevant cells; each cell counts as one unit of data.
So if the user selects 10 cells and applies an average formula to them, the result is the average of the values in those 10 cells.
To apply a formula to any data, the user needs to select the data without leaving any gaps. Then, in the formula bar, the user types ‘=’ followed by the name of the function to apply.
Data
From the Data menu, the user can perform functions without changing the original data. Users can filter,
add external data from the web & sort data without changing it. For example, the user can sort the data
in alphabetical order.
Right from basic functions like addition & subtraction, the user can perform complex statistical
functions like correlation & t-test. Moreover, users can convert them into Pie charts or graphs within
moments. This makes data analysis easy.
Page Layout
Users can apply themes, orientation, and check the page setup through the page layout option.
Review
Proofreading like spell check can be performed for an excel sheet in the review section and a user can
even add comments or remarks in this part.
View
Different views and layouts in which the user wants the spreadsheet to be displayed can be selected
here. Options to zoom in and out, full screen, and pane arrangement are available under this section.
There are several features that are available in Excel to make our task more manageable. Some
of the main features are:
1. AutoFormat: It allows the Excel users to use predefined table formatting options.
2. AutoSum: AutoSum feature helps us to calculate the sum of a row or column automatically by
inserting an addition formula for a range of cells.
3. List AutoFill: It automatically extends cell formatting when a new item is added to the end of a list.
4. AutoFill: This feature allows us to quickly fill cells with repetitive or sequential data, such as chronological dates or numbers and repeated text. AutoFill can also be used to copy functions. We can also alter text and numbers with this feature.
5. AutoShapes: The AutoShapes toolbar allows us to draw geometrical shapes, arrows, flowchart items, stars, and more. With these shapes, we can draw our own graphs.
6. Wizard: It guides us to work effectively by displaying helpful tips and techniques based on what we are doing. The Drag and Drop feature helps us to reposition data and text by simply dragging it with the mouse.
7. Charts: This feature helps us to present data in graphical form using Pie, Bar, Line charts, and more.
8. PivotTable: It flips and sums data in seconds and allows us to perform data analysis and generate documents such as periodic financial statements and statistical reports. We can also analyze complex data relationships graphically.
9. Shortcut Menus: Shortcut menus let users complete, through quick commands, tasks that would otherwise need a lengthy process.
EXP. NO:2. (i) Get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND)
DATE:
Aim:
To get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in Excel.
Procedures:
We wish to find out the total sales for the first six months. The formula to be used is:
• Select A1 and type (=)
• Type 5+5
• Hit enter
4. Hit enter
4. Fill a range
Step by step:
1. Select C1 and type (=)
2. Select B1
3. Type a dollar sign before the column and the row: $B$1
4. Type (+)
5. Select A1
6. Hit enter
7. Fill the range C1:C10
b. MAX Function
The MAX function is a premade function in Excel, which finds the highest number in a range.
It is typed =MAX
The function ignores cells with text. It will only work for cells with numbers.
c. MIN Function
The MIN function is a premade function in Excel, which finds the lowest number in a range.
It is typed =MIN
How to use the =MIN function:
1. Select a cell
2. Type =MIN
4. Select a range
5. Hit enter
d. AVERAGE Function
The AVERAGE function is a premade function in Excel, which calculates the average (arithmetic
mean).
It is typed =AVERAGE
It adds the range and divides it by the number of observations.
Note: The AVERAGE function ignores cells with text.
1. Select a cell
2. Type =AVERAGE
4. Select a range
5. Hit enter
6. Next, Fill
f. ROUND Function
The ROUND Formula in Excel accepts the following parameters and arguments:
=ROUND(A2,1)    106.9   The number in A2 is rounded to 1 decimal place.
=ROUND(A2,-1)   110     The number in A2 is rounded to the nearest multiple of 10.
=ROUND(A2,-2)   100     The number in A2 is rounded to the nearest multiple of 100.
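For comparison, the numerical functions used in this experiment can be sketched in Python. This is only an illustration: the input list and the value 106.87 standing in for cell A2 are assumptions, and `excel_round` is a hypothetical helper written to mimic Excel's ROUND, which rounds halves away from zero (Python's built-in `round` rounds halves to the nearest even digit).

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def excel_round(value, digits):
    """Round half away from zero, in the spirit of Excel's ROUND(value, digits)."""
    # digits=1 -> quantum 0.1, digits=-1 -> quantum 1E+1, digits=-2 -> quantum 1E+2
    quantum = Decimal(1).scaleb(-digits)
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


values = [12, 45, 7, 30, 106.87]  # hypothetical user input

print("MAX    :", max(values))
print("MIN    :", min(values))
print("AVERAGE:", sum(values) / len(values))
print("SUM    :", sum(values))
print("SQRT   :", math.sqrt(values[1]))

a2 = 106.87  # assumed content of cell A2
print(excel_round(a2, 1))   # 106.9
print(excel_round(a2, -1))  # 110.0
print(excel_round(a2, -2))  # 100.0
```

A negative `digits` argument rounds to the left of the decimal point, which is why -1 gives the nearest multiple of 10 and -2 the nearest multiple of 100.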
Conclusion:
Thus, input was obtained from the user and the numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) were executed successfully in Excel.
EXP. NO:2. (ii) Perform data import/export operations for different file formats.
DATE:
Aim:
To Perform data import/export operations for different file formats in Excel
Procedure:
Excel can import and export many different file types aside from the standard .xlsx format. If your data is shared between other programs, like a database, you may need to save data as a different file type or bring in files of a different file type.
Export Data
When you have data that needs to be transferred to another system, export it from Excel in a format that
can be interpreted by other programs, such as a text or CSV file.
1. Click the File tab.
• Formatted Text (space delimited): The cell data will be separated by a space.
• Save as Another File Type: Select a different file type when the Save As dialog
box appears.
The file type you select will depend on what type of file is required by the program that will consume
the exported data.
5. Click Save As.
6. Specify where you want to save the file.
7. Click Save.
A dialog box appears stating that some of the workbook features may be lost.
8. Click Yes.
Import Data
Excel can import data from external data sources including other files, databases, or web pages.
1. Click the Data tab on the Ribbon.
Some data sources may require special security access, and the connection process can
often be very complex. Enlist the help of your organization’s technical support staff for
assistance.
3. Select From File.
If you have data to import from Access, the web, or another source, select one of those
options in the Get External Data group instead.
5. Select the file you want to import.
6. Click Import.
If, while importing external data, a security notice appears saying that it is connecting
to an external source that may not be safe, click OK.
Because we've specified the data is separated by commas, the delimiter is already set.
If you need to change it, it can be done from this menu.
8. Click Load.
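The export and import steps above can also be sketched outside Excel. The following is a minimal illustration using Python's standard `csv` module; the file name `sales.csv` and the sample rows are assumptions made only for this example.

```python
import csv
import os
import tempfile

# Hypothetical sheet contents: a header row followed by data rows
rows = [["Month", "Sales"], ["Jan", 120], ["Feb", 95]]

path = os.path.join(tempfile.mkdtemp(), "sales.csv")

# Export: write the rows as comma-delimited text
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Import: read the file back, stating the delimiter explicitly
with open(path, newline="", encoding="utf-8") as f:
    imported = list(csv.reader(f, delimiter=","))

print(imported)
```

Note that every value comes back as text after a CSV round trip, so numeric columns need to be converted again after import.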
Conclusion:
Thus, data import/export operations for different file formats were performed in Excel and the output was obtained successfully.
EXP. NO:3 Perform statistical operations - Mean, Median, Mode and Standard deviation, Variance, Skewness, Kurtosis
DATE:
Aim:
To perform statistical operations (Mean, Median, Mode, Standard deviation, Variance, Skewness, Kurtosis) using the Analysis ToolPak in Excel.
Procedures:
Descriptive statistics consist of measures of central tendency, variability, skewness and kurtosis.
Measures of central tendency are used to find the single value that best describes the entire distribution. There are three main measures of central tendency: Mean, Median and Mode.
Measures of Variability refers to the spread or dispersion of scores. There are four main
measures of variability: Range, Inter quartile range, Standard deviation and Variance.
Examples:
Suppose you are asked to provide a figure that best describes the annual salary offered to
students in ABC College you would answer this question with a measure of central tendency
and variability.
1. If you haven't already installed the Analysis ToolPak, click the Microsoft Office button, then click Excel Options, select Add-Ins, click Go, check the Analysis ToolPak box, and click OK.
2. Select the Data tab, click the Data Analysis option, select Descriptive Statistics from the list, and click OK. [Data tab >> Data Analysis >> Descriptive Statistics]
3. In the Input Range, select the data, and then select the Output Range where you want the output to be stored. If you don’t specify the output range, the output is placed in a new worksheet.
4. Check the Summary Statistics and Confidence Level for Mean options. By default the confidence level is 95%. You can change the level as per the hypothesis standard of the study.
5. When you click OK, you will see the result in the selected output range.
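The summary statistics produced by the ToolPak can be cross-checked with a short Python sketch. The data list below is hypothetical, and `skew` and `kurt` are helper names chosen for this example; they use the bias-corrected sample formulas documented for Excel's SKEW and KURT functions.

```python
import statistics


def skew(data):
    """Sample skewness (bias-corrected, same formula as Excel's SKEW)."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def kurt(data):
    """Sample excess kurtosis (bias-corrected, same formula as Excel's KURT)."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


scores = [2, 4, 5, 6, 6, 8, 8, 8, 3, 7]  # hypothetical sample

print("Mean    :", statistics.mean(scores))
print("Median  :", statistics.median(scores))
print("Mode    :", statistics.mode(scores))
print("Std dev :", statistics.stdev(scores))
print("Variance:", statistics.variance(scores))
print("Skewness:", skew(scores))
print("Kurtosis:", kurt(scores))
```

A negative skewness indicates a longer left tail, and a negative kurtosis indicates a flatter distribution than the normal curve.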
Interpretation:
The average value is 5.533. The middle value is 6 and the most frequent value is 8. Negative skewness
indicates a left skewed data. Negative kurtosis indicates a flat distribution. The 95% confidence level
indicates you can be 95% sure that the true percentage of the population lies between
In case you want to use the sample mean as representative of the population mean:
Sample Mean = Sum of All the Items in Sample / Number of Items in Sample
Let’s take an example to understand the calculation of the Population Mean formula in a better manner.
Example #1
Let’s say you have a data set with 10 data points, and we want to calculate Population Mean for
that.
Data set: {14,61,83,92,2,8,48,25,71,12}
Solution:
• Population Mean = (14+61+83+92+2+8+48+25+71+12) / 10
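The same calculation can be checked with a few lines of Python, using the data set given in this example. The variable names are chosen only for this sketch.

```python
data = [14, 61, 83, 92, 2, 8, 48, 25, 71, 12]

# Population mean = sum of all items / number of items
population_mean = sum(data) / len(data)
print(population_mean)  # 41.6
```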
Example #2
Let’s say you want to invest in IBM and are keen to look at its past performance and returns. You
want to go back 20 years and calculate monthly returns, but that will become very hectic. So you
have decided to take a sample of the last 10 months and calculate the return and mean. You
believe that the sample you have taken correctly represents the population.
Solution:
So if you see here, in the last 10 months, IBM’s return has fluctuated very much.
Sample Mean is calculated using the formula given below
Sample Mean = Sum of All the Items in Sample / Number of Items in Sample
• Sample Mean = (3.74% + 1.07% + 4.34% + (-23.66)% + 7.66% + (-7.36)% + 18.25% + 2.76% + 1.48% + 0.00%) / 10 = 0.828%
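As a quick check of the arithmetic, the ten monthly returns listed above can be averaged in Python. The sample mean divides the sum by the number of items; the divisor n - 1 belongs to the sample variance, not to the mean. Variable names here are illustrative.

```python
# Monthly returns in percent, as listed in the example
returns = [3.74, 1.07, 4.34, -23.66, 7.66, -7.36, 18.25, 2.76, 1.48, 0.00]

sample_mean = sum(returns) / len(returns)
print(round(sample_mean, 3))  # about 0.828 (percent)
```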
Conclusion:
Thus, the statistical operations (Mean, Median, Mode, Standard Deviation, Variance, Skewness, Kurtosis) were performed using the Analysis ToolPak in Excel and the output was obtained successfully.
EXP. NO:4 To Perform Z-Test, T-test & ANOVA
DATE:
Aim:
To Perform Z-test, T-test & ANOVA in Excel by following Steps.
Procedures:
• Array: The set of values against which the hypothesized sample mean is to be tested.
• X: The hypothesized sample mean that is to be tested.
• Sigma: This is an optional argument that represents the population standard deviation. If it’s
not given or unknown, use the sample standard deviation.
To calculate the one-tailed probability value of a Z Test for the above data, let’s assume the
hypothesized population mean is 5. Now we will use the Z TEST formula as shown below:
The formula below calculates the two-tailed P-value of a Z TEST for the given hypothesized
population, which is 5.
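The one-tailed and two-tailed calculations can be sketched in Python as follows. This is an illustration under stated assumptions: the data values are hypothetical, and `z_test` is a helper written for this example that follows the documented behaviour of Excel's Z.TEST (sample standard deviation is used when sigma is omitted). The two-tailed value uses the usual worksheet pattern 2 * MIN(Z.TEST, 1 - Z.TEST).

```python
import math
import statistics


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test(array, x, sigma=None):
    """One-tailed p-value, analogous to Z.TEST(array, x, [sigma])."""
    n = len(array)
    if sigma is None:
        sigma = statistics.stdev(array)  # fall back to the sample standard deviation
    z = (statistics.mean(array) - x) / (sigma / math.sqrt(n))
    return 1.0 - norm_cdf(z)


values = [3, 6, 7, 8, 6, 5, 4, 2, 1, 9]  # hypothetical data
hypothesized_mean = 5

one_tailed = z_test(values, hypothesized_mean)
two_tailed = 2 * min(one_tailed, 1 - one_tailed)
print(one_tailed, two_tailed)
```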
i.e. H0: µ1 – µ2 = 0
H1: µ1 – µ2 ≠ 0
Here H0 is the null hypothesis that the means of the two populations are equal, and H1 is the alternative hypothesis that the means of the two populations are not equal.
Let’s take an example to understand the usage of two sample Z tests.
Example #2
Let’s take the example of student’s marks in two different subjects.
Now we need to calculate the variance of both subjects, so we will use the below formula for this:
The above same formula applies for Variance 2 (Subject 2) like below:
• Now, click the Data Analysis option at the extreme upper right corner under the DATA tab, as shown in the screenshot below:
• Now in the Variable 1 range box, select subject 1 range from A25:A35
• Similarly, in the Variable 2 range box, select subject 2 range from B25:B35
• Under the Variable 1 variance box, enter cell B38 variance value.
• Under the Variable 2 variance box, enter cell B39 variance value.
• In Output Range, Select the cell where you want to see the result. Here we have passed cell E24
and then clicked on OK.
Explanation
• We can reject the null hypothesis if z < -z Critical two-tail or z stat > z Critical two-tail.
• Here 1.279 > -1.9599 and 1.279 < 1.9599; hence we can’t reject the null hypothesis.
• Thus, the means of both populations don’t differ significantly.
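The decision rule above can be sketched in Python. The function name and the two mark lists are assumptions for illustration only; the variances are passed in as known values, as in the Excel dialog. The last lines re-apply the rule to the z value 1.279 reported in this example.

```python
import math
import statistics


def two_sample_z(sample1, sample2, var1, var2, hypothesized_diff=0.0):
    """z statistic for the difference of two means with known variances."""
    diff = statistics.mean(sample1) - statistics.mean(sample2) - hypothesized_diff
    return diff / math.sqrt(var1 / len(sample1) + var2 / len(sample2))


# Critical value for a two-tailed test at alpha = 0.05
z_critical = statistics.NormalDist().inv_cdf(1 - 0.05 / 2)

subject1 = [62, 70, 75, 58, 66, 71]  # hypothetical marks
subject2 = [60, 65, 72, 55, 69, 63]
z = two_sample_z(subject1, subject2,
                 statistics.variance(subject1), statistics.variance(subject2))
print("z =", z, "reject H0:", abs(z) > z_critical)

# The z value reported in the worked example
z_reported = 1.279
reject_reported = abs(z_reported) > z_critical
print("reported z:", z_reported, "reject H0:", reject_reported)
```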
P-value in Excel
Statisticians and researchers commonly use the p-value when they want to compare two data groups. They start by considering a null hypothesis, which assumes there is no relationship (or no difference) between the two data groups. This serves as their initial assumption for the experiment. Then they conduct a statistical test, compute its p-value, and interpret the result. If the p-value is less than the significance level (α) of 0.05 (5%), the observed result would be unlikely under the null hypothesis, so the null hypothesis is rejected in favour of the alternate hypothesis. Conversely, if the p-value is greater than 0.05 (5%), there is not enough evidence to reject the null hypothesis.
Statisticians can use Excel to quickly and easily calculate the p-value. Although Microsoft does not
have any specific or direct formula for p-value in Excel, we can use functions like T.TEST and T.DIST
for the calculation. Moreover, there is another method: Analysis ToolPak, that basically simplifies the
T.TEST function for us.
1. T.TEST Function
2. Analysis ToolPak
3. T.DIST Function
1. T.TEST Function
Purpose: We can use the T.TEST Function to calculate the p-value in Excel by directly adding the
data ranges to the function.
Syntax:
=T.TEST(range1, range2, tails, type)
In this syntax:
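• range1, range2: The two data ranges to compare.
• tails: 1 for a one-tailed test, 2 for a two-tailed test.
• type: It determines the type of t-test. [1 for a paired test (before and after comparison), 2 for a two-sample test assuming equal variances, and 3 for a two-sample test assuming unequal variances.]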
Example:
Let us compare the scores of students from Class B and Class C to check if Class C students have higher
scores than Class B students. First, we need to assume the null and alternate hypotheses for this test.
Null Hypothesis (H0): There is no difference in the scores of both divisions.
We will calculate the p-value to determine if we should accept or reject the null hypothesis.
Solution:
Step 1: Select cell B9 and write the below formula:
=T.TEST(A2:A7,B2:B7,1,2)
Step 2: Press “Enter,” and Excel will calculate the p-value as 0.38692 in cell B9.
Result:
Based on the analysis, the p-value obtained (0.38692, approximately 39%) is higher than the significance level (α) of 0.05 (5%). Therefore, we fail to reject the null hypothesis (H0), which assumes no difference in scores between the students of both classes. The data do not provide sufficient evidence that Class C students have higher test scores than Class B students.
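For reference, the test statistic behind a two-sample t-test with equal variances (type 2) can be sketched in Python. The class scores below are hypothetical, since the worksheet data is not reproduced here, and `pooled_t_statistic` is a name chosen for this example. The p-value is then read from the t distribution with the returned degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """t statistic and degrees of freedom, two samples, equal variances assumed."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


class_b = [65, 70, 58, 81, 74, 69]  # hypothetical scores
class_c = [68, 72, 61, 79, 77, 70]  # hypothetical scores

t_stat, dof = pooled_t_statistic(class_b, class_c)
print("t =", t_stat, "degrees of freedom =", dof)
```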
2. Analysis ToolPak
Purpose: The Analysis ToolPak is an Excel Add-in feature that makes it easier to perform t-tests. By
simply inputting the necessary information in a new window, ToolPak automatically calculates and
shows the p-value.
Example:
Let us compare the scores of students from Class A and Class B to investigate whether Class A students
have higher scores on their exams than students from Class B.
Null Hypothesis (H0): There is no difference in the average scores between both divisions.
Alternate Hypothesis (H1): Class B students have higher average scores.
We want to calculate the p-value to check if our assumption in the null hypothesis is true or false.
Consider the following data:
Solution:
Step 1: Click the “Data Analysis” option from the “Data” Tab.
Step 2: A “Data Analysis” dialogue box will open, from which you have to select “t-Test: Two-
Sample Assuming Equal Variances”. Then click “OK”.
• Variable 1 Range: A1 to A7
• Variable 2 Range: B1 to B7
• Labels: Tick the checkbox
• Output Range: E1.
Excel will provide a detailed result for the p-value in cells E1 to G14, as shown below:
Result:
The tool gives us two p-values: one for a one-tailed t-test and the other for a two-tailed t-test. Both p-values, 0.48 (approximately) and 0.96 (approximately), are greater than the significance level (α) of 0.05 (5%).
As the p-value is higher than 0.05, we fail to reject the null hypothesis, which assumes no difference between the average scores of students from both classes. The data do not provide sufficient evidence that one class scored higher than the other.
3. T.DIST Function
Purpose: We can use Excel’s T.DIST Function to calculate the p-value by simply adding the test
statistic value and degree of freedom.
Syntax:
= T.DIST.RT(x, degrees_freedom)
In this syntax:
• RT: It stands for right-tailed and calculates a one-tailed (one-sided) probability. There is another variation, T.DIST.2T, for a two-tailed probability.
• x: It is the test statistic value, i.e., the value we want to study
• degrees_freedom: Degrees of freedom for the T-distribution
Example:
The school headmaster believes that students cannot score above 50 without attending tuition classes,
so they plan to start a special class for all students. To investigate this belief, a teacher chooses a random
group of 6 students and records their scores in a specific subject. The average score of this group is
70.5, with a standard deviation of 19.06.
We need to calculate the p-value using the t-dist function to see if the headmaster’s assumption (students
that do not attend tuition classes cannot score more than 50) is true or not.
Null Hypothesis H(0): Students who do not attend tuition score 50 or lower on average (the headmaster's belief).
Alternate Hypothesis H(1): Students score higher than 50 on average even without attending tuition.
Consider the following data:
Solution:
Step 1: Calculate the test statistic value (x) using the following formula,
𝑇𝑒𝑠𝑡 𝑠𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐 = (𝑥̅ − 𝜇) / (𝑠/√𝑛)
We can write the above formula in the Excel format as follows:
Enter the above formula in cell B12 and press “Enter”. As a result, Excel will show the x value as 2.634
(approx).
Step 2: Calculate degrees of freedom in cell B13 as follows:
𝑫𝒆𝒈𝒓𝒆𝒆 𝒐𝒇 𝒇𝒓𝒆𝒆𝒅𝒐𝒎 = 𝑁𝑜. 𝑜𝑓 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛𝑠 (𝑛) – 1
=6– 1
=𝟓
Step 3: Now, we will find the p-value in Excel using the T.DIST Function.
=T.DIST.RT(B12,B13)
• Press “Enter,” and Excel will display the p-value as 0.023161286 in cell B14.
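The three steps can be reproduced approximately in Python with only the standard library. This is a sketch, not Excel's own algorithm: `t_dist_rt` is a hypothetical helper that integrates the t density numerically with Simpson's rule, and the inputs are the rounded summary values quoted in the example, so the last digits differ slightly from the worksheet result.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_dist_rt(x, df, steps=2000):
    """Right-tail probability P(T > x) for x >= 0, via Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


sample_mean, mu, sd, n = 70.5, 50, 19.06, 6  # summary values from the example

t_stat = (sample_mean - mu) / (sd / math.sqrt(n))  # Step 1: test statistic
dof = n - 1                                        # Step 2: degrees of freedom
p_value = t_dist_rt(t_stat, dof)                   # Step 3: right-tailed p-value

print(round(t_stat, 3), dof, round(p_value, 3))
```

Because the p-value is below 0.05, the right-tailed test rejects the hypothesis that the mean score is 50 or lower.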
Result:
We can see that the p-value of 0.023 (2.3%) is less than the significance value (α) of 0.05 (5%).
Therefore, we reject the null hypothesis that students cannot score above 50 without attending tuition. The sample supports the conclusion that students can score more than 50 without attending tuition.
Anova using Excel
This example teaches you how to perform a single factor ANOVA (analysis of variance) in Excel. A
single factor or one-way ANOVA is used to test the null hypothesis that the means of several
populations are all equal.
Below you can find the salaries of people who have a degree in economics, medicine or history.
𝐻0: 𝜇1 = 𝜇2 = 𝜇3
H1: at least one of the means is different.
Note: If the Data Analysis button is not visible, load the Analysis ToolPak add-in first.
2. Select Anova: Single Factor and click OK.
3. Click in the Input Range box and select the range A2:C10.
4. Click in the Output Range box and select cell E1.
5. Click OK.
Result:
Conclusion:
If F > F crit, we reject the null hypothesis. This is the case here, since 15.196 > 3.443. Therefore, we reject the null hypothesis. The means of the three populations are not all equal; at least one of the means is different. However, the ANOVA does not tell you where the difference lies. You need a t-Test to test each pair of means.
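The F statistic reported by the ToolPak can be reproduced with a short Python sketch. The salary figures below are hypothetical, since the worksheet data is not reproduced here, and `one_way_anova_f` is a name chosen for this example. The resulting F is compared with F crit in the same way as above.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a single factor ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation between group means and variation within each group
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


economics = [42, 53, 49, 53, 43, 44]  # hypothetical salaries
medicine = [69, 54, 58, 64, 64, 55]
history = [35, 40, 53, 42, 50, 39]

f_stat, df_b, df_w = one_way_anova_f([economics, medicine, history])
print("F =", f_stat, "df =", (df_b, df_w))
```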
EXP. NO:5 To perform Data Pre-processing operations in Handling data and normalization
DATE:
Aim:
To perform Data Pre-processing operations in Handling data and normalization in Excel.
Procedures:
Data preprocessing is a stage of data analysis used to clean and transform raw data into useful information that can be processed by computers. Before analyzing the data, we need to make sure that it is clean and usable. Data preprocessing helps to improve the quality, consistency, and compatibility of the data.
Data Preprocessing helps in many ways:
• It helps in eliminating errors.
• It helps in handling the missing values.
• It helps in removing duplicates.
• It helps in standardizing formats.
ISNA or ISBLANK
We can choose all those rows which are having missing values. We can also replace them with
appropriate substitutes.
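Handling missing data and normalization, the two operations named in the title of this experiment, can be sketched in Python as follows. The score list is hypothetical, with None standing for a blank cell. Replacing blanks with the column mean and scaling with min-max normalization are assumptions made for this illustration; other substitutes and scalings are equally valid.

```python
# Hypothetical column of scores; None marks a missing (blank) cell
scores = [72.0, None, 85.0, 90.0, None, 64.0]

# Handling missing data: replace each blank with the mean of the present values
present = [s for s in scores if s is not None]
fill_value = sum(present) / len(present)
filled = [fill_value if s is None else s for s in scores]

# Min-max normalization: rescale the values to the range 0..1
lowest, highest = min(filled), max(filled)
normalized = [(s - lowest) / (highest - lowest) for s in filled]

print(filled)
print(normalized)
```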
Removing Duplicates
In this step, we need to remove the duplicates from the data. Duplicates can lead us to skewed
analysis results. Excel offers a simple way to remove duplicates. First, we need to select the data
range and go to the Data tab. Then click on the Remove Duplicates button. Then we can choose the
columns to check for duplicates. Excel will remove duplicate rows, keeping only unique values.
Standardizing Formats
In this step, we need to standardize the formats. Inconsistent data formats can create some challenges
for us during analysis. That’s why Excel allows you to standardize formats. We can use the features of
Excel like cell formatting, text functions (e.g., PROPER, UPPER, LOWER), and data validation
rules.
Filtering and Sorting
In this step, we need to filter and sort the data. Excel's filtering and sorting capabilities help explore
and organize large datasets. The Filter function allows you to display specific subsets of data based
on criteria. Sorting data in ascending or descending order can be done using the Sort function.
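The cleaning steps described above (trimming spaces, removing duplicates, standardizing case, and sorting) can be sketched in Python. The records are hypothetical, and `trim` is a helper name chosen for this example to mimic the effect of Excel's TRIM.

```python
# Hypothetical records: (name, course)
records = [
    ("  Asha   Rao ", "python"),
    ("Asha Rao", "python"),
    ("Vikram  Shah", "excel"),
]


def trim(text):
    """Drop leading/trailing spaces and collapse inner runs to single spaces."""
    return " ".join(text.split())


# Clean spacing and standardize the course column to upper case (like UPPER)
cleaned = [(trim(name), course.upper()) for name, course in records]

# Remove duplicate rows while keeping the first occurrence of each
unique = list(dict.fromkeys(cleaned))

# Sort in ascending order by name
ordered = sorted(unique)
print(ordered)
```

Duplicates are removed only after trimming, because rows that differ only in spacing would otherwise be treated as distinct.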
Example of Data Preprocessing
Suppose we have data of our Ninjas in an Excel spreadsheet, and we want to do data preprocessing;
for this, we need to follow the above steps:
Step 1: Data Collection
We have gathered the information about our Ninjas in an Excel spreadsheet.
Step 2: Data Cleaning
Now, we need to clean the data. Suppose we are using TRIM to remove irregular text spacing and
keep single spaces between words.
Now, we can do the same operation for every row to correct the spacing, or we can drag the fill handle down the column.
This will give the following output after dragging down to the last row.
So, we have to fill all those rows that are empty or use some proper substitutes. After filling them, we
can move to our next step.
Step 4: Removing duplicates
Now, we have to remove all the duplicates that are present in a spreadsheet. So, we have to select all
the data and go to the Data tab and find the Remove Duplicates button.
Now, click on the OK button. Then it will give you
If you have duplicate values, it will give a message “duplicate values found and removed.”
Step 5: Standardizing formats
Now, we have to standardize the formats so that they do not create problems during analysis. Suppose we want to make sure Courses are in capital letters only. For this, we can use UPPER.
Let's sort this from smallest to largest.
Now we have preprocessed data finally in Excel. Now, let us understand the advantages and
disadvantages of data preprocessing in Excel.
Advantages of Data Preprocessing
• Excel provides a user-friendly interface so that we can easily do data preprocessing and other
data analysis tasks.
• Excel offers a wide range of functions and features that helps in different data preprocessing
needs.
• Excel is widely available, that’s why it is commonly used for data preprocessing.
• Excel integrates well with other Microsoft Office applications, facilitating seamless data
transfer and collaboration.
Disadvantages of Data Preprocessing
• Excel may not be suitable for handling large datasets.
• Excel's analytical capabilities are robust but may not match those offered by specialized
statistical or data analysis software.
• Data preprocessing tasks in Excel often require manual execution.
Conclusion:
Thus, the data pre-processing operations (handling missing data and normalization) were performed in Excel and the output was obtained successfully.
EXP. NO:6 To perform dimensionality reduction operation using PCA, KPCA, SVD
DATE:
Aim:
To perform dimensionality reduction operation using PCA, KPCA, SVD
Procedure:
The school system of a major city wanted to determine the characteristics of a great teacher, and so they asked 120 students to rate the importance of each of the following 9 criteria using a Likert scale of 1 to 10, with 10 representing that a particular characteristic is extremely important and 1 representing that the characteristic is not important.
The sample covariance matrix S is shown in Figure 3 and can be calculated directly as
Here B4:J123 is the range containing all the evaluation scores and B126:J126 is the range containing
the means for each criterion. Alternatively, we can simply use the Real Statistics formula
COV(B4:J123) to produce the same result.
Figure 3 – Covariance Matrix
In practice, we usually prefer to standardize the sample scores. This will make the weights of the nine
criteria equal. This is equivalent to using the correlation matrix. Let R = [rij] where rij is the correlation
between xi and xj, i.e.
The sample correlation matrix R is shown in Figure 4 and can be calculated directly as
=MMULT(TRANSPOSE((B4:J123-B126:J126)/B127:J127),(B4:J123-B126:J126)/B127:J127)/(COUNT(B4:B123)-1)
Here B127:J127 is the range containing the standard deviations for each criterion. Alternatively, we can
simply use the Real Statistics function CORR(B4:J123) to produce the same result.
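The matrix formula above standardizes each column and then multiplies the standardized data by its own transpose, dividing by n - 1. A minimal pure-Python sketch of the same calculation is shown below; the small ratings table is hypothetical and `correlation_matrix` is a name chosen for this example.

```python
import statistics


def correlation_matrix(rows):
    """Sample correlation matrix of the columns of a list of rows."""
    columns = list(zip(*rows))
    n = len(rows)
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    # Standardize every column: subtract its mean, divide by its standard deviation
    z = [[(x - m) / s for x in c] for c, m, s in zip(columns, means, sds)]
    # Entry (i, j) is the sum of products of standardized scores divided by n - 1
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


ratings = [  # hypothetical scores for 3 criteria from 5 students
    [7, 5, 9],
    [8, 6, 4],
    [6, 5, 7],
    [9, 8, 3],
    [5, 4, 8],
]

for row in correlation_matrix(ratings):
    print([round(v, 3) for v in row])
```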
Note that all the values on the main diagonal are 1, as we would expect since the variances have been
standardized. We next calculate the eigenvalues for the correlation matrix using the Real Statistics
eigVECTSym(M4:U12) formula, as described in Linear Algebra Background. The result appears in
range M18:U27 of Figure 5.
Figure 5 – Eigenvalues and eigenvectors of the correlation matrix
The first row in Figure 5 contains the eigenvalues for the correlation matrix in Figure 4. Below each
eigenvalue is a corresponding unit eigenvector. E.g. the largest eigenvalue is λ1 = 2.880437.
Corresponding to this eigenvalue is the 9 × 1 column eigenvector B1 whose elements are 0.108673, -0.41156, etc.
As we described above, coefficients of the eigenvectors serve as the regression coefficients of the 9
principal components. For example, the first principal component can be expressed by
Thus for any set of scores (for the xj) you can calculate each of the corresponding principal components.
Keep in mind that you need to standardize the values of the xj first since this is how the correlation
matrix was obtained. For the first sample (row 4 of Figure 1), we can calculate the nine principal
components using the matrix equation Y = B^T X′ as shown in Figure 6.
Here B (range AI61:AQ69) is the set of eigenvectors from Figure 5, X (range AS61:AS69) is simply
the transpose of row 4 from Figure 1, X′ (range AU61:AU69) standardizes the scores in X (e.g. cell
AU61 contains the formula =STANDARDIZE(AS61, B126, B127), referring to Figure 2) and Y (range
AW61:AW69) is calculated by the formula
=MMULT(TRANSPOSE(AI61:AQ69),AU61:AU69)
Thus the principal component values corresponding to the first sample are 0.782502 (PC1), -1.9758
(PC2), etc.
As observed previously, the total variance for the nine random variables is 9 (since the variance was
standardized to 1 in the correlation matrix), which is, as expected, equal to the sum of the nine
eigenvalues listed in Figure 5. In fact, in Figure 7 we list the eigenvalues in decreasing order and show
the percentage of the total variance accounted for by that eigenvalue.
The values in column M are simply the eigenvalues listed in the first row of Figure 5, with cell M41
containing the formula =SUM(M32:M40) and producing the value 9 as expected. Each cell in column
N contains the percentage of the variance accounted for by the corresponding eigenvalue. E.g. cell N32
contains the formula =M32/M41, and so we see that 32% of the total variance is accounted for by the
largest eigenvalue. Column O simply contains the cumulative weights, and so we see that the first four
eigenvalues account for 72.3% of the variance.
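The percentage and cumulative columns can be sketched in Python. The function name and the three-value eigenvalue list are hypothetical and serve only as an illustration; the final lines check the share of the largest eigenvalue quoted in the text (2.880437 out of a total variance of 9).

```python
def explained_variance(eigenvalues):
    """Proportion and cumulative proportion of variance per eigenvalue."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    proportions = [ev / total for ev in ordered]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Hypothetical eigenvalues of a 3 x 3 correlation matrix (they sum to 3)
proportions, cumulative = explained_variance([1.5, 0.9, 0.6])
print(proportions, cumulative)

# Share of the largest eigenvalue reported in the text
largest_share = 2.880437 / 9
print(round(largest_share, 2))  # 0.32
```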
Using Excel’s charting capability, we can plot the values in column N of Figure 7 to obtain a graphical
representation, called a scree plot.
Figure 8 – Scree Plot
We decide to retain the first four eigenvalues, which explain 72.3% of the variance. In section Basic
Concepts of Factor Analysis we will explain in more detail how to determine how many eigenvalues to
retain. The portion of Figure 5 that refers to these eigenvalues is shown in Figure 9. Since all but the
Expect value for PC1 are negative, we first decide to negate all the values. This is not a problem since
the negative of a unit eigenvector is also a unit eigenvector.
Those values that are sufficiently large, i.e. the values that show a high correlation between the principal
components and the (standardized) original variables, are highlighted. We use a threshold of ±0.4 for
this purpose.
This is done by highlighting the range R32:U40 and selecting Home > Styles|Conditional Formatting
and then choosing Highlight Cell Rules > Greater Than and inserting the value .4 and then selecting
Home > Styles|Conditional Formatting and then choosing Highlight Cell Rules > Less Than and
inserting the value -.4.
Note that Entertainment, Communications, Charisma and Passion are highly correlated with PC1,
Motivation and Caring are highly correlated with PC3 and Expertise is highly correlated with PC4.
Also, Expectation is highly positively correlated with PC2 while Friendly is negatively correlated with
PC2.
Ideally, we would like to see that each variable is highly correlated with only one principal component.
As we can see from Figure 9, this is the case in our example. Usually, this is not the case, however, and
we will show what to do about this in the Basic Concepts of Factor Analysis when we discuss rotation
in Factor Analysis.
In our analysis, we retain 4 of the 9 principal components. As noted previously, the principal
components can be calculated from the unit eigenvectors and the standardized scores.
If we retain only m principal components, then Y = B^T X′, where Y is an m × 1 vector, B is a k × m matrix
(consisting of the m unit eigenvectors corresponding to the m largest eigenvalues) and X′ is the k × 1
vector of standardized scores as before. The interesting thing is that if Y is known we can calculate
estimates for the standardized values X′ using the fact that X′ ≈ BB^T X′ = B(B^T X′) = BY. (The columns
of B are orthonormal, so B^T B = I; BB^T = I holds exactly only when all k components are retained,
which is why BY is an estimate when m < k.) From X′ it is then easy to calculate X.
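The reconstruction of X′ from Y can be checked numerically. The sketch below assumes a hypothetical 3-variable case with made-up orthonormal eigenvectors; `transpose`, `matvec`, `keep_first` and `reconstruct` are illustrative helper names, not Excel functions.

```python
import math


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def keep_first(B, m):
    """Keep only the first m columns (eigenvectors) of B."""
    return [row[:m] for row in B]


def reconstruct(B, x_std):
    """Compute Y = B^T X' and the estimate B Y of X'."""
    y = matvec(transpose(B), x_std)
    x_hat = matvec(B, y)
    return y, x_hat


# Hypothetical orthonormal eigenvectors (columns) and one standardized sample.
s = 1 / math.sqrt(2)
B_full = [[s, s, 0.0],
          [s, -s, 0.0],
          [0.0, 0.0, 1.0]]
x_std = [1.0, 0.5, -0.2]

_, x_all = reconstruct(B_full, x_std)                  # all components kept
_, x_one = reconstruct(keep_first(B_full, 1), x_std)   # only the first component kept
print(x_all)
print(x_one)
```

With every component kept the sample is recovered up to rounding; with fewer components the result is only an approximation, which is the situation in Figure 10.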
In Figure 10 we show how this is done using the four principal components that we calculated from the
first sample in Figure 6. B (range AN74:AQ82) is the reduced set of coefficients (Figure 9), Y (range
AS74:AS77) are the principal components as calculated in Figure 6, X′ are the estimated standardized
values for the first sample (range AU74:AU82) using the formula
=MMULT(AN74:AQ82,AS74:AS77)
and finally, X are the estimated scores in the first sample (range AW74:AW82) using the formula
=AU74:AU82*TRANSPOSE(B127:J127)+TRANSPOSE(B126:J126)
Conclusion:
Thus, the dimensionality reduction operation using PCA, KPCA and SVD was performed in Excel and
executed successfully.
EXP. NO:7 To Perform Bivariate and multivariate Analysis on the dataset
DATE:
Aim:
To perform bivariate and multivariate analysis in Excel.
Procedure:
The term bivariate analysis refers to the analysis of two variables. You can remember this
because the prefix “bi” means “two.”
The purpose of bivariate analysis is to understand the relationship between two variables.
Common ways to perform bivariate analysis include:
1. Scatterplots
2. Correlation Coefficients
3. Simple Linear Regression
The following example shows how to perform each of these types of bivariate analysis in Excel using
the following dataset that contains information about two variables: (1) Hours spent studying
and (2) Exam score received by 20 different students:
1. Scatterplots
To create a scatterplot of hours vs. score, we can highlight cells A2:B21, then click the Insert tab
along the top ribbon, then click Insert Scatter Chart within the Charts group:
We can also modify the y-axis limits to gain a better view of the data points.
To do so, double click the y-axis. In the Format Axis panel that appears on the right side of the
screen, click Axis Options and then change the Minimum and Maximum bounds to 60 and 100,
respectively.
The x-axis shows the hours studied and the y-axis shows the exam score received.
From the plot we can see that there is a positive relationship between the two variables. As hours
studied increases, exam score tends to increase as well.
2. Correlation Coefficients
A Pearson Correlation Coefficient is a way to quantify the linear relationship between two variables.
We can use the following formula in Excel to calculate the correlation coefficient between hours
studied and exam score:
=CORREL(A2:A21, B2:B21)
This value is close to 1, which indicates a strong positive correlation between hours studied and exam
score received.
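For reference, the quantity that =CORREL returns can be computed by hand. The following is a minimal Python sketch of the Pearson formula; the `hours` and `scores` lists are hypothetical stand-ins for columns A and B, not the 20-student dataset used above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [70, 74, 79, 83, 90]

r = pearson(hours, scores)
print(round(r, 4))
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.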
3. Simple Linear Regression
Simple linear regression is a statistical method we can use to quantify the relationship between two
variables.
To fit a simple linear regression model in Excel, click the Data tab along the top ribbon, then click
the Data Analysis option in the Analyze group. In the new panel that appears, click Regression and
then click OK.
Note: If you don’t see the Data Analysis option, you need to first load the Excel Analysis ToolPak.
In the panel that appears, enter the following information and then click OK:
Once you click OK, the results of the regression model will appear:
The fitted regression equation turns out to be:
This tells us that each additional hour studied is associated with an average increase of 3.8471 in
exam score.
We can also use the regression equation to estimate the score that a student will receive based on their
total hours studied.
For example, a student who studies for 3 hours is estimated to receive a score of 81.6147:
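The Regression tool fits a line of the form score = intercept + slope × hours by least squares. As a rough illustration, the sketch below fits and uses such a line in Python; the sample data are hypothetical, so the coefficients it prints will not match the values reported above, and `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate y for a given x from the fitted line."""
    return intercept + slope * x


# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [70, 74, 79, 83, 90]

intercept, slope = fit_line(hours, scores)
print(intercept, slope)
print(predict(intercept, slope, 3))
```

The slope is read as the average change in score per additional hour studied, and `predict` mirrors plugging a number of hours into the fitted equation.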
Conclusion:
Thus, the bivariate and multivariate analysis was performed in Excel and executed successfully.
EXP. NO:8 Apply and explore various plotting functions on the data set
DATE:
Aim:
To apply and explore various plotting functions on the data set in Excel.
Procedure:
Visualizing Data with Charts
In Excel, charts are used to make a graphical representation of any set of data. A chart is a visual
representation of the data, in which the data is represented by symbols such as bars in a Bar Chart or
lines in a Line Chart. Excel provides you with many chart types and you can choose one that suits
your data or you can use the Excel Recommended Charts option to view charts customized to your
data and select one of those.
Suppose you have the target and actual profits for the fiscal year 2015-2016 that you obtained from
different regions.
As you can observe, it is difficult to compare the target and actual values quickly in
this chart, so it does not convey the real picture of your results.
Use Vertical Columns for the target values and a Line with Markers for the actual values.
• Click the DESIGN tab under the CHART TOOLS tab on the Ribbon.
• Click Change Chart Type in the Type group. The Change Chart Type dialog box appears.
• Click Combo.
• Change the Chart Type for the series Actual to Line with Markers. The preview appears under
Custom Combination.
• Click OK.
Your Customized Combination Chart will be displayed.
As you observe in the chart, the Target values are in Columns and the Actual values are marked along
the line. The data visualization has become better as it also shows you the trend of your results.
However, this type of representation does not work well when the data ranges of your two data values
vary significantly.
Creating a Combo Chart with Secondary Axis
Suppose you have the data on the number of units of your product that was shipped and the actual
profits for the fiscal year 2015-2016 that you obtained from different regions.
If you use the same combination chart as before, you will get the following −
In the chart, the data of No. of Units is not visible as the data ranges are varying significantly.
In such cases, you can create a combination chart with secondary axis, so that the primary axis
displays one range and the secondary axis displays the other.
The Insert Chart dialog box appears with Combo highlighted.
Your Combo chart appears with Secondary Axis.
You can observe the values for Actual Profits on the primary axis and the values for No. of Units on
the secondary axis.
A significant observation in the above chart is for Quarter 3, where the No. of Units sold is higher but the
Actual Profits made are lower. This could probably be attributed to the promotion costs that were
incurred to increase sales. The situation improves in Quarter 4, with a slight decrease in sales and a
significant rise in the Actual Profits made.
Suppose you want to project the Actual Profits made in Years 2013-2016.
As you observe, the data visualization is not effective as the years are not displayed. You can
overcome this by changing year to category.
Now, year is considered as a category and not a series. Your chart looks as follows −
You can use Trendline to graphically display trends in data. You can extend a Trendline in a chart
beyond the actual data to predict future values.
Data Labels
Excel 2013 and later versions provide you with various options to display Data Labels. You can
choose one Data Label, format it as you like, and then use Clone Current Label to copy the formatting
to the rest of the Data Labels in the chart.
The Data Labels in a chart can have effects, varying shapes and sizes.
It is also possible to display the content of a cell as part of the Data Label with Insert Data Label
Field.
Quick Layout
You can use Quick Layout to change the overall layout of the chart quickly by choosing one of the
predefined layout options.
Different possible layouts will be displayed. As you move on the layout options, the chart layout
changes to that particular option.
Select the layout you like. The chart will be displayed with the chosen layout.
Using Pictures in Column Charts
You can create more emphasis on your data presentation by using a picture in place of columns.
The picture you have chosen will appear in place of columns in the chart.
Band Chart
You might have to present customer survey results of a product from different regions. Band Chart is
suitable for this purpose. A Band Chart is a Line Chart with an added shaded area to display the upper
and lower boundaries of groups of data.
Suppose your customer survey results from the east and west regions, month wise are −
Here, in the data < 50% is Low, 50% - 80% is Medium and > 80% is High.
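The three bands can be expressed as a simple rule. The sketch below is one possible reading of the boundaries, assuming that exactly 50% and exactly 80% both fall in Medium; `band` is an illustrative name and the sample percentages are hypothetical.

```python
def band(score_pct):
    """Classify a survey percentage into Low / Medium / High."""
    if score_pct < 50:
        return "Low"
    if score_pct <= 80:
        return "Medium"
    return "High"


# Hypothetical monthly survey results for one region, in percent.
east = [62, 48, 85, 80, 50]
print([band(p) for p in east])
```

In the Band Chart these boundaries become the shaded areas, and the survey results are drawn as lines across them.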
With Band Chart, you can display your survey results as follows −
Change the chart type to −
• Click on one of the columns.
• Change gap width to 0% in Format Data Series.
To make the chart more presentable −
The final result is the Band Chart with the defined boundaries and the survey results represented
across the bands. One can quickly and clearly make out from the chart that while the survey results for
the region West are satisfactory, those for the region East have a decline in the last quarter and need
attention.
Thermometer Chart
When you have to represent a target value and an actual value, you can easily create a Thermometer
Chart in Excel that emphatically shows these values.
• Select the data.
• Create a Clustered Column chart.
• Right click on the Target Column.
• Click on Format Data Series.
• Click on Secondary Axis.
As you observe the Primary Axis and Secondary Axis have different ranges.
Both Primary Axis and Secondary Axis will be set to 0% - 100%. The Target Column hides the
Actual Column.
• Right click on the Chart Area.
• In the Format Chart Area options, select
o No fill for FILL
o No line for BORDER
You now have your thermometer chart, showing the actual value against the target value. You can
make this thermometer chart more impressive with some formatting.
• Insert a rectangle shape superimposing the blue rectangular part in the chart.
• In Format Shape options, select −
o Gradient fill for FILL
o Linear for Type
o 180° for Angle
• Set the Gradient stops at 0%, 50% and 100%.
• For the Gradient stops at 0% and 100%, choose the color black.
• For the Gradient stop at 50%, choose the color white.
• Insert an oval shape at the bottom.
• Format shape with same options.
Gantt Chart
A Gantt chart is a chart in which a series of horizontal lines shows the amount of work done in certain
periods of time in relation to the amount of work planned for those periods.
In Excel, you can create a Gantt chart by customizing a Stacked Bar chart type so that it depicts tasks,
task duration, and hierarchy. An Excel Gantt chart typically uses days as the unit of time along the
horizontal axis.
Note that the Start of any Task is the Start of the previous Task plus the Duration of that previous
Task. This is the case when the Tasks are in hierarchy.
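The Start rule above can be sketched in a few lines of Python. The task names, durations and project start date below are hypothetical, and `schedule` is an illustrative helper name.

```python
from datetime import date, timedelta


def schedule(tasks, project_start):
    """Give each task a start date equal to the previous task's start plus its duration."""
    rows = []
    start = project_start
    for name, duration_days in tasks:
        rows.append((name, start, duration_days))
        start = start + timedelta(days=duration_days)
    return rows


# Hypothetical tasks as (name, duration in days).
tasks = [("Design", 3), ("Build", 5), ("Test", 2)]
for name, start, duration in schedule(tasks, date(2024, 1, 1)):
    print(name, start, duration)
```

The Start and Duration values computed this way are the two series plotted in the Stacked Bar chart.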
• Right-click on Start Series.
• In Format Data Series options, select No fill.
• In Chart Elements, deselect
o Legend
o Gridlines
• Format the Horizontal Axis to
o Adjust the range
o Major Tick Marks at 5 day intervals
o Minor Tick Marks at 1 day intervals
• Format Data Series to make it look impressive
• Give a Chart Title
Waterfall Chart
Waterfall Chart is one of the most popular visualization tools used in small and large businesses.
Waterfall charts are ideal for showing how you have arrived at a net value such as net income, by
breaking down the cumulative effect of positive and negative contributions.
Excel 2016 provides Waterfall Chart type. If you are using earlier versions of Excel, you can still
create a Waterfall Chart using Stacked Column Chart.
The columns are color coded so that you can quickly tell positive from negative numbers. The initial
and the final value columns start on the horizontal axis, while the intermediate values are floating
columns. Because of this look, Waterfall Charts are also called Bridge Charts.
• Prepare the data for Waterfall Chart
• Ensure the column Net Cash Flow is to the left of the Months Column (This is because you
will not include this column while creating the chart)
• Add 2 columns – Increase and Decrease for positive and negative cash flows respectively
• Add a column Start - the first column in the chart with the start value in the Net Cash Flow
• Add a column End - the last column in the chart with the end value in the Net Cash Flow
• Add a column Float – that supports the intermediate columns
• Compute the values for these columns as follows
• In the Float column, insert a row at the beginning and at the end. Place an arbitrary value,
50000, in each. This is just to have some space to the left and right of the chart
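The exact cell formulas for the Increase, Decrease and Float columns are not reproduced in the text above. One common way to derive them, given here only as an assumption, is sketched in Python below; it presumes the running total never drops below zero, and `waterfall_columns` and the sample cash flows are hypothetical.

```python
def waterfall_columns(start_value, net_flows):
    """Split net cash flows into Float / Increase / Decrease values for a stacked column chart."""
    rows = []
    running = start_value
    for flow in net_flows:
        increase = max(flow, 0)
        decrease = max(-flow, 0)
        # Invisible base segment: the lower edge of the floating column.
        base = running - decrease
        rows.append({"float": base, "increase": increase, "decrease": decrease})
        running += flow
    return rows, running


# Hypothetical start value and monthly net cash flows.
rows, end_value = waterfall_columns(100, [50, -30, 20])
for row in rows:
    print(row)
print("End:", end_value)
```

Stacking Float (later made invisible), Increase and Decrease for each month makes the intermediate columns appear to float between the Start and End columns.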
• Select the cells C2:H18 (Exclude Net Cash Flow column)
• Create Stacked Column Chart
• Right click on Negative Series.
• Select Fill Color as Red.
• Right click on Start Series.
• Select Fill Color as Grey.
• Right click on End Series.
• Select Fill Color as Grey.
• Delete the Legend.
Give the Chart Title. The Waterfall Chart will be displayed.
Sparklines
Sparklines are tiny charts placed in single cells, each representing a row of data in your selection.
They provide a quick way to see trends.
The Quick Analysis button appears at the bottom right of your selected data.
• Click on the Quick Analysis button. The Quick Analysis Toolbar appears with various
options.
Click SPARKLINES. The chart options displayed are based on the data and may vary.
Click Line. A Line Chart for each row is displayed in the column to the right of the data.
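To illustrate the idea of one tiny chart per row, the sketch below draws text sparklines in Python using Unicode block characters. It is only an analogy to Excel's feature; the data rows are hypothetical and `sparkline` is an illustrative name.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value in a row to one of eight block heights."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BLOCKS[0] * len(values)   # flat row: nothing to scale
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[round((v - lo) * scale)] for v in values)


# Hypothetical rows: one series of monthly values per product.
data = {
    "Product A": [10, 12, 15, 14, 18, 21],
    "Product B": [30, 28, 25, 26, 22, 20],
}
for name, values in data.items():
    print(name, sparkline(values))
```

As with Excel Sparklines, each row is scaled independently, so the shapes show the trend within a row rather than a comparison of magnitudes across rows.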
PivotCharts
Pivot Charts are used to graphically summarize data and explore complicated data.
A PivotChart shows Data Series, Categories, and Chart Axes the same way a standard chart does.
Additionally, it also gives you interactive filtering controls right on the chart so that you can quickly
analyze a subset of your data.
PivotCharts are useful when you have data in a huge PivotTable, or a lot of complex worksheet data
that includes text and numbers. A PivotChart can help you make sense of this data.
You can create a PivotChart from:
• A PivotTable.
• A Data Table as a standalone without PivotTable.
Select Clustered Column from the option Column.
The PivotChart has three filters – Region, Salesperson and Month.
• Click the Region Filter Control option. The Search Box appears with the list of all Regions.
Check boxes appear next to Regions.
• Select East and South options.
The filtered data appears on both the PivotChart and the PivotTable.
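What the Region filter does to the summarized values can be sketched in Python. This is an analogy only: the sales rows are hypothetical, and `pivot_sum` is an illustrative helper, not an Excel function.

```python
from collections import defaultdict


def pivot_sum(rows, category, value, filters=None):
    """Sum `value` per `category`, keeping only rows that pass every filter."""
    filters = filters or {}
    totals = defaultdict(float)
    for row in rows:
        if all(row[field] in allowed for field, allowed in filters.items()):
            totals[row[category]] += row[value]
    return dict(totals)


# Hypothetical sales records.
sales = [
    {"Region": "East", "Salesperson": "A", "Sales": 100},
    {"Region": "South", "Salesperson": "B", "Sales": 50},
    {"Region": "West", "Salesperson": "A", "Sales": 70},
    {"Region": "East", "Salesperson": "B", "Sales": 30},
]

print(pivot_sum(sales, "Salesperson", "Sales"))
print(pivot_sum(sales, "Salesperson", "Sales", {"Region": {"East", "South"}}))
```

Selecting East and South in the filter corresponds to the second call: only rows from those regions contribute to the totals that are charted.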
PivotChart without a PivotTable
You can choose a cell in the existing worksheet itself, or in a new worksheet. Click OK.
An empty PivotChart and an empty PivotTable appear along with the PivotChart Field List to build
the PivotChart.
• Choose the Fields to be added to the PivotChart
• Arrange the Fields by dragging them into FILTERS, LEGEND (SERIES), AXIS
(CATEGORIES) and VALUES
• Use the Filter Controls on the PivotChart to select the Data to be placed on the PivotChart
Conclusion:
Thus, the various plotting functions were applied on the data set in Excel and executed successfully.
EXP. NO:9 Explore the features of Power BI Desktop
DATE:
The data models created from Power BI can be used in several ways for organizations, including the
following:
• telling stories through charts and data visualizations;
• examining "what if" scenarios within the data; and
• creating reports that can answer questions in real time and help with forecasting to make sure
departments meet business metrics.
Power BI can also provide executive dashboards for administrators or managers, giving
management more insight into how departments are doing.
Key features of Power BI
Microsoft has added a number of data analytics features to Power BI since its inception, and continues
to do so. Some of the most important features are the following:
• Artificial intelligence. Users can access image recognition and text analytics in Power BI,
create machine learning models using automated ML capabilities and integrate with Azure
Machine Learning.
• Hybrid deployment support. This feature provides built-in connectors that allow Power BI
tools to connect with a number of different data sources from Microsoft, Salesforce and other
vendors.
• Quick Insights. This feature allows users to create subsets of data and automatically apply
analytics to that information.
• Common data model support. Power BI's support for the common data model allows the
use of a standardized and extensible collection of data schemas (entities, attributes and
relationships).
• Cortana integration. This feature, which is especially popular on mobile devices, allows
users to verbally query data using natural language and access results using Cortana,
Microsoft's digital assistant.
• Customization. This feature allows developers to change the appearance of default
visualization and reporting tools and import new tools into the platform.
• APIs for integration. This feature provides developers with sample code and application
program interfaces (APIs) for embedding the Power BI dashboard in other software products.
• Self-service data prep. Using Power Query, business analysts can ingest, transform, integrate
and enrich big data into the Power BI web service. Ingested data can be shared across
multiple Power BI models, reports and dashboards.
• Modeling view. This allows users to divide complex data models by subject area into
separate diagrams, multiselect objects and set common properties, view and modify properties
in the properties pane, and set display folders for simpler consumption of complex data
models.
Conclusion:
Thus, the features of Power BI were explored and studied.
EXP. NO:10 Prepare and Load Data
DATE:
Aim:
To prepare and load data using the Microsoft Power BI tool.
Procedure:
1. Go to Get Data and choose the data source according to where the data comes from and the
format of the data.
2. Then open the source and load your data into Power BI.
3. Select the columns that are to be visualized and then click on the Load button
to load your data.
4. Power BI then processes the data to bring it from Excel into Power BI.
Conclusion:
Thus, the data was prepared and loaded using the Microsoft Power BI tool successfully.
EXP. NO:11 Develop the Data Model
DATE:
Aim:
To develop the data model using the Microsoft Power BI tool.
Procedure:
1. Go to the Modeling option in the Ribbon bar.
2. Then go to Manage Relationships to verify whether the relationships between the tables are as
per the requirement.
3. Close the Manage Relationships dialog and open Model view to visualize the relationships
between the tables.
4. If these are not as per our requirement, delete every relationship by
selecting all the relationships in Manage Relationships.
5. After deleting every relationship, create the relationships that are needed for our business
model.
Conclusion:
Thus, the data model was developed using Microsoft Power BI and executed successfully.
EXP. NO:12 Perform the DAX Calculations
DATE:
Aim:
To perform DAX calculations using Microsoft Power BI.
Procedure:
1. Create a new column to represent the data in a new way as per the requirement.
2. In the formula box, type the expression to be applied and displayed as the new column.
3. Create a new measure to represent the data in a new way as per the requirement.
4. In the formula box, type the expression to be applied and viewed as the new measure.
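The difference between a new column and a new measure can be illustrated with a plain-Python analogy. This is not DAX: the `sales` rows, the field names and both helper functions are hypothetical. A calculated column is evaluated once per row and stored with the table, while a measure is an aggregate that is recalculated for whatever rows the current filters leave visible.

```python
# Hypothetical table of order lines.
sales = [
    {"Region": "East", "Quantity": 2, "Unit Price": 10.0},
    {"Region": "West", "Quantity": 1, "Unit Price": 25.0},
    {"Region": "East", "Quantity": 3, "Unit Price": 5.0},
]


def add_line_total(rows):
    """Calculated-column analogy: one new value per row."""
    return [{**row, "Line Total": row["Quantity"] * row["Unit Price"]} for row in rows]


def total_sales(rows):
    """Measure analogy: one aggregate over the rows currently in view."""
    return sum(row["Line Total"] for row in rows)


with_totals = add_line_total(sales)
east_only = [row for row in with_totals if row["Region"] == "East"]

print(total_sales(with_totals))   # measure with no filter applied
print(total_sales(east_only))     # same measure under a Region = East filter
```

In Power BI the corresponding measure might be written along the lines of Total Sales = SUM(Sales[Line Total]), where the table and column names are again assumptions for this example.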
Conclusion:
Thus, the DAX calculations were performed using Microsoft Power BI and executed successfully.
EXP. NO:13 Design a report
DATE:
Aim:
To design a report in Microsoft Power BI.
Procedure:
2. To add a header for the report, add a text box from the Home ribbon.
3. To create a slicer, select Slicer from the Build visual options and add the employee first name.
4. Create one more slicer to represent the customer’s first name.
5. To create a map, select Map from the Build visual options and add ship city as the location and
count of order id as the bubble size.
6. To create a line chart, select Line chart from the Build visual options and add count of order id on
the y-axis and order date on the x-axis.
7. Create a table to display the required columns from the data.
8. In the table, display the Order ID, Employee First-Name and Customer First-Name.
9. To create a donut chart, select Donut chart from the Build visual options.
10. In the donut chart, display Count of Order Id and Status Id.
Conclusion:
Thus, the report was designed using Microsoft Power BI and executed successfully.
EXP. NO:14 Create Dashboard and perform data analysis
DATE:
Aim:
To create a dashboard and perform data analysis using the Microsoft Power BI tool.
Procedure:
Conclusion:
This is one of the ways to customize a Dashboard based on our requirements.