
Bafbana Module 5


MODULE 5

Creating Distribution of Data

TABLE OF CONTENTS

MODULE OUTLINE

Overview
Module Duration
Learning Objectives
Input Information
Learning Activities
Assessment/Evaluation
Assignment
Learning Resources

MODULE PROPER

Frequency Distributions for Categorical Data
Relative Frequency and Percent Frequency Distributions
Frequency Distributions for Quantitative Data
Histograms
Cumulative Distributions

San Mateo Municipal College Module 5/BAFBANA1/Page 2


College of Business and Accountancy Prepared by Arlene F. Musones, MBA
Creating Distribution of Data
MODULE 5 OUTLINE

OVERVIEW

Distributions help summarize many characteristics of a data set by describing how often certain values for a variable appear in a data
set. Distributions can be created for both categorical and quantitative data, and they assist the analyst in gauging variation.

MODULE DURATION

I. October 25 to November 6, 2021 - Synchronous Meeting and Asynchronous Learning

II. For asynchronous learning inquiries, you may reach me through the Messenger group or by personal message, Monday through
Thursday, from 5:00 PM to 8:00 PM.
LEARNING OBJECTIVES

After completing this module, you are expected to:


I. determine frequency distributions for categorical data;
II. compute relative frequency and percent frequency distributions;
III. compute frequency distributions for quantitative data; and
IV. display histograms.

INPUT INFORMATION

Creating Distribution from Data

LEARNING ACTIVITIES

I. Collaborative discussion during synchronous meeting.


II. Asynchronous Learning

ASSESSMENT/EVALUATION
I. Synchronous Test with time limit.
A long test link will be provided through our group chat. This is a synchronous test with a time limit.

II. Asynchronous Learning


Individual Activity: See Assignment below.
Group Activity: See Assignment below.

ASSIGNMENT
Individual Activity: Create a frequency distribution and a percent frequency distribution for TV Shows Programs.

Group Activity: Create a histogram for CEO Time and describe the shape of the distribution.

LEARNING RESOURCES

Book/E-book:
1. Fundamentals of Business Intelligence by Wilfried Grossmann and Stefanie Rinderle-Ma. © 2015 Springer-Verlag Berlin Heidelberg.
2. Introduction to Business Analytics by Jonathan P. Pinder. © 2017 Elsevier Inc.

BUSINESS ANALYTICS – Creating Distribution Table
MODULE 5 PROPER

Frequency Distributions for Categorical Data

It is often useful to create a frequency distribution for a data set. A frequency distribution is a summary of data that shows the
number (frequency) of observations in each of several non-overlapping classes, typically referred to as bins, when dealing
with distributions.

Consider the data in Table 2.3, taken from a sample of 50 soft drink purchases. Each purchase is for one of five
popular soft drinks, which define the five bins: Coca-Cola, Diet Coke, Dr. Pepper, Pepsi, and Sprite. To develop a frequency
distribution for these data, we count the number of times each soft drink appears in Table 2.3. Coca-Cola appears 19 times,
Diet Coke appears 8 times, Dr. Pepper appears 5 times, Pepsi appears 13 times, and Sprite appears 5 times. These
counts are summarized in the frequency distribution in Table 2.4. This frequency distribution provides a summary of how the
50 soft drink purchases are distributed across the five soft drinks. This summary offers more insight than the original data
shown in Table 2.3. The frequency distribution shows that Coca-Cola is the leader, Pepsi is second, Diet Coke is third, and
Sprite and Dr. Pepper are tied for fourth. The frequency distribution thus summarizes information about the popularity of the
five soft drinks.

We can use Excel to calculate the frequency of categorical observations occurring in a data set using the COUNTIF function.
Figure 2.9 shows the sample of 50 soft drink purchases in an Excel spreadsheet. Column D contains the five different soft
drink categories as the bins. In cell E2, we enter the formula =COUNTIF($A$2:$B$26, D2), where A2:B26 is the range
for the sample data, and D2 is the bin (Coca-Cola) that we are trying to match. The COUNTIF function in Excel counts the
number of times a certain value appears in the indicated range. In this case we want to count the number of times Coca-Cola
appears in the sample data. The result is a value of 19 in cell E2, indicating that Coca-Cola appears 19 times in the sample
data. We can copy the formula from cell E2 to cells E3 to E6 to get frequency counts for Diet Coke, Pepsi, Dr. Pepper, and
Sprite. By using the absolute reference $A$2:$B$26 in our formula, Excel always searches the same sample data for the
values we want when we copy the formula.
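
For readers working outside Excel, the same tally can be sketched in Python. This is only an illustrative sketch: the purchase list is rebuilt from the counts reported above (Table 2.3 itself is not reproduced here), so the order of the purchases is an assumption, and the variable names are hypothetical.

```python
from collections import Counter

# Counts reported in the text for the 50 soft drink purchases (Table 2.4).
# The purchase list is rebuilt from these counts; its order is an
# assumption and does not reproduce the order of Table 2.3.
reported = {"Coca-Cola": 19, "Diet Coke": 8, "Dr. Pepper": 5,
            "Pepsi": 13, "Sprite": 5}
purchases = [drink for drink, n in reported.items() for _ in range(n)]

# Counter tallies each distinct value, much like one COUNTIF per bin.
frequency = Counter(purchases)

# Print the bins from most to least frequent.
for drink, count in frequency.most_common():
    print(f"{drink:<10} {count:>3}")
```

Like the absolute reference in the Excel formula, the single Counter pass always reads the same sample, so every bin is counted against identical data.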

Relative Frequency and Percent Frequency Distributions

A frequency distribution shows the number (frequency) of items in each of several non-overlapping bins. However,
we are often interested in the proportion, or percentage, of items in each bin. The relative frequency of a bin equals the fraction
or proportion of items belonging to that bin. For a data set with n observations, the relative frequency of each bin
can be determined as follows:

Relative frequency of a bin = Frequency of the bin / n

The percent frequency of a bin is its relative frequency multiplied by 100.

A relative frequency distribution is a tabular summary of data showing the relative frequency for each bin. A percent
frequency distribution summarizes the percent frequency of the data for each bin. Table 2.5 shows a relative frequency
distribution and a percent frequency distribution for the soft drink data. Using the frequencies in Table 2.4, the relative
frequency for Coca-Cola is 19/50 = 0.38, the relative frequency for Diet Coke is 8/50 = 0.16, and so on. From the percent
frequency distribution, we see that 38 percent of the purchases were Coca-Cola, 16 percent of the purchases were Diet Coke,
and so on. We can also note that 38 percent + 26 percent + 16 percent = 80 percent of the purchases were the top three soft drinks.
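
The arithmetic above can be checked with a short Python sketch. It assumes only the five frequencies reported in the text; the variable names are illustrative.

```python
# Frequencies reported in the text for the soft drink data.
frequency = {"Coca-Cola": 19, "Diet Coke": 8, "Dr. Pepper": 5,
             "Pepsi": 13, "Sprite": 5}
n = sum(frequency.values())  # total number of observations

# Relative frequency = frequency of the bin / n.
relative = {drink: count / n for drink, count in frequency.items()}

# Percent frequency = relative frequency x 100.
percent = {drink: round(100 * count / n, 1) for drink, count in frequency.items()}

# Combined share of the three most popular soft drinks.
top_three = sum(sorted(percent.values(), reverse=True)[:3])

for drink in frequency:
    print(f"{drink:<10} {relative[drink]:.2f} {percent[drink]:>5.1f}%")
print("Top three share:", top_three)
```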

A percent frequency distribution can be used to provide estimates of the relative likelihoods of different values of a
random variable. So, by constructing a percent frequency distribution from observations of a random variable, we can estimate
the probability distribution that characterizes its variability. For example, the volume of soft drinks sold by
a concession stand at an upcoming concert may not be known with certainty. However, if the data used to construct Table
2.5 are representative of the concession stand’s customer population, then the concession stand manager can use this
information to determine the appropriate volume of each type of soft drink.

Frequency Distributions for Quantitative Data

We can also create frequency distributions for quantitative data, but we must be more careful in defining the non-
overlapping bins to be used in the frequency distribution. For example, consider the quantitative data in Table 2.6. These
data show the time in days required to complete year-end audits for a sample of 20 clients of Sanderson and Clifford, a
small public accounting firm. The three steps necessary to define the classes for a frequency distribution with quantitative
data are:
1. Determine the number of non-overlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.

Let us demonstrate these steps by developing a frequency distribution for the audit time data in Table 2.6.

Number of bins

Bins are formed by specifying the ranges used to group the data. As a general guideline, we recommend using between 5
and 20 bins. For a small number of data items, as few as five or six bins may be used to summarize the data. For a larger
number of data items, more bins are usually required. The goal is to use enough bins to show the variation in the data, but
not so many classes that some contain only a few data items. Because the number of data items in Table 2.6 is relatively
small (n= 20), we chose to develop a frequency distribution with five bins.

Width of the bins

Second, choose a width for the bins. As a general guideline, we recommend that the width be the same for each bin. Thus
the choices of the number of bins and the width of bins are not independent decisions. A larger number of bins means
a smaller bin width and vice versa. To determine an approximate bin width, we begin by identifying the largest and smallest
data values. Then, with the desired number of bins specified, we can use the following expression to determine the
approximate bin width:

Approximate bin width = (Largest data value - Smallest data value) / Number of bins     (2.1)

The approximate bin width given by equation (2.1) can be rounded to a more convenient value based on the preference of
the person developing the frequency distribution. For example, an approximate bin width of 9.28 might be rounded to 10
simply because 10 is a more convenient bin width to use in presenting a frequency distribution.

For the data involving the year-end audit times, the largest data value is 33, and the smallest data value is 12. Because we
decided to summarize the data with five classes, using equation (2.1) provides an approximate bin width of (33 - 12)/5 = 4.2.
We therefore decided to round up and use a bin width of five days in the frequency distribution.
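
The width calculation for the audit time data can be sketched in Python as follows. The function name is hypothetical, and rounding up with math.ceil is just one way to reach a convenient width; the text leaves the rounding to the analyst's preference.

```python
import math

def approximate_bin_width(largest, smallest, number_of_bins):
    """Range of the data divided by the desired number of bins."""
    return (largest - smallest) / number_of_bins

# Values stated in the text: largest 33, smallest 12, five bins.
approx = approximate_bin_width(33, 12, 5)

# Round up to the next whole number of days for a more convenient width.
bin_width = math.ceil(approx)

print(f"approximate width: {approx:.1f}, chosen width: {bin_width}")
```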

In practice, the number of bins and the appropriate class width are determined by trial and error. Once a possible number of
bins is chosen, equation (2.1) is used to find the approximate class width. The process can be repeated for a different number
of bins. Ultimately, the analyst uses judgment to determine the combination of the number of bins and
bin width that provides the best frequency distribution for summarizing the data.

For the audit time data in Table 2.6, after deciding to use five bins, each with a width of five days, the next task is to specify
the bin limits for each of the classes.

Bin limits

Bin limits must be chosen so that each data item belongs to one and only one class. The lower bin limit identifies the smallest
possible data value assigned to the bin. The upper bin limit identifies the largest possible data value assigned to the class. In
developing frequency distributions for qualitative data, we did not need to specify bin limits because
each data item naturally fell into a separate bin. But with quantitative data, such as the audit times in Table 2.6, bin limits are
necessary to determine where each data value belongs. Using the audit time data in Table 2.6, we selected 10 days as the
lower bin limit and 14 days as the upper bin limit for the first class. This bin is denoted 10–14 in Table 2.7. The smallest data
value, 12, is included in the 10–14 bin. We then selected 15 days as the lower bin limit and 19 days as the upper bin limit of
the next class. We continued defining the lower and upper bin limits to obtain a total of five classes: 10–14, 15–19, 20–24,
25–29, and 30–34. The largest data value, 33, is included in the 30–34 bin. The difference between the upper bin limits of
adjacent bins is the bin width. Using the first two upper bin limits of 14 and 19, we see that the bin width is 19 - 14 = 5.
With the number of bins, bin width, and bin limits determined, a frequency distribution can be obtained by counting the number
of data values belonging to each bin. For example, the data in Table 2.6 show that four values—12, 14, 14, and 13—belong
to the 10–14 bin. Thus, the frequency for the 10–14 bin is 4. Continuing this counting process for the 15–19,
20–24, 25–29, and 30–34 bins provides the frequency distribution in Table 2.7. Using this frequency distribution, we can
observe that:

● The most frequently occurring audit times are in the bin of 15–19 days. Eight of the 20 audit times are in this bin.
● Only one audit required 30 or more days.

Other conclusions are possible, depending on the interests of the person viewing the frequency distribution. The value of a
frequency distribution is that it provides insights about the data that are not easily obtained by viewing the data in their original
unorganized form.
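
The three steps (number of bins, bin width, bin limits) can be put together in a short Python sketch. Table 2.6 is not reproduced in this module, so the 20 audit times below are a hypothetical sample chosen only to agree with the smallest value, largest value, and bin counts stated in the text; the variable names are likewise illustrative.

```python
# Hypothetical sample of 20 audit times in days. It is NOT a copy of
# Table 2.6; it was chosen to match the minimum (12), the maximum (33),
# and the bin counts given in the text.
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

first_lower_limit, bin_width, number_of_bins = 10, 5, 5

# Build the (lower, upper) limits: 10-14, 15-19, 20-24, 25-29, 30-34.
bins = [(first_lower_limit + i * bin_width,
         first_lower_limit + (i + 1) * bin_width - 1)
        for i in range(number_of_bins)]

# Each value falls into exactly one bin because the limits do not overlap.
frequency = {f"{lo}-{hi}": sum(lo <= t <= hi for t in audit_times)
             for lo, hi in bins}

n = len(audit_times)
for label, count in frequency.items():
    print(f"{label:<6} {count:>2}  {count / n:.2f}  {100 * count / n:.0f}%")
```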

Table 2.7 also shows the relative frequency distribution and percent frequency distribution for the audit time data. Note that
0.40 of the audits, or 40 percent, required from 15 to 19 days. Only 0.05 of the audits, or 5 percent, required 30 or more
days. Again, additional interpretations and insights can be obtained by using Table 2.7.

Frequency distributions for quantitative data can also be created using Excel. Figure 2.10 shows the data from Table 2.6
entered into an Excel Worksheet. The sample of 20 audit times is contained in cells A2:D6. The upper limits of the defined
bins are in cells A10:A14. We can use the FREQUENCY function in Excel to count the number of observations in each bin:

Step 1. Select cells B10:B14
Step 2. Enter the formula =FREQUENCY(A2:D6, A10:A14). The range A2:D6
defines the data set, and the range A10:A14 defines the bins.
Step 3. Press CTRL+SHIFT+ENTER

Excel will then fill in the values for the number of observations in each bin in cells B10 through B14 because these were the
cells selected in Step 1 above (see Figure 2.10).
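
A rough Python counterpart to counting by upper bin limits is sketched below. The function name is hypothetical, and the audit times are again a hypothetical sample consistent with the counts in the text rather than a copy of Table 2.6. Like Excel's FREQUENCY, the sketch returns one extra count for values above the largest upper limit.

```python
from bisect import bisect_left

def frequency_by_upper_limits(data, upper_limits):
    """Count values per bin, where each bin is defined by its upper limit.

    upper_limits must be sorted in ascending order. The final element of
    the result counts values greater than the largest upper limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for value in data:
        # Index of the first upper limit that is >= value.
        counts[bisect_left(upper_limits, value)] += 1
    return counts

# Hypothetical sample consistent with the counts in the text.
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
upper_limits = [14, 19, 24, 29, 34]

counts = frequency_by_upper_limits(audit_times, upper_limits)
print(counts)
```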

Histograms

A common graphical presentation of quantitative data is a histogram. This graphical summary can be prepared for data
previously summarized in either a frequency, a relative frequency, or a percent frequency distribution. A histogram is
constructed by placing the variable of interest on the horizontal axis and the selected frequency measure (absolute
frequency, relative frequency, or percent frequency) on the vertical axis. The frequency measure of each class is shown by
drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the
corresponding frequency measure.

Figure 2.11 is a histogram for the audit time data. Note that the class with the greatest frequency is shown by the rectangle
appearing above the class of 15–19 days. The height of the rectangle shows that the frequency of this class is 8. A histogram
for the relative or percent frequency distribution of these data would look the same as the histogram in Figure 2.11 with the
exception that the vertical axis would be labeled with relative or percent frequency values.
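
As a quick way to see the shape without a charting tool, the frequencies can be drawn as a text histogram in Python. This sketch uses the bin frequencies stated or implied in the text (4, 8, 5, 2, 1); the function name and the layout are illustrative choices, and the bars run horizontally rather than vertically as in Figure 2.11.

```python
# Bin labels and frequencies for the audit time data, as stated or
# implied in the text.
audit_bins = [("10-14", 4), ("15-19", 8), ("20-24", 5),
              ("25-29", 2), ("30-34", 1)]

def text_histogram(rows, symbol="#"):
    """Return one line per bin; bar length equals the bin's frequency."""
    return [f"{label:>5} | {symbol * count} ({count})" for label, count in rows]

lines = text_histogram(audit_bins)
print("\n".join(lines))
```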

Histograms can be created in Excel using the Data Analysis ToolPak. We will use the sample of 20 year-end audit times and
the bins defined in Table 2.7 to create a histogram using the Data Analysis ToolPak. As before, we begin with an Excel
Worksheet where the sample of 20 audit times is contained in cells A2:D6, and the upper limits of the bins defined in Table
2.7 are in cells A10:A14 (see Figure 2.10).

Step 1. Click the DATA tab in the Ribbon
Step 2. Click Data Analysis in the Analysis group
Step 3. When the Data Analysis dialog box opens, choose Histogram from the list of
Analysis Tools, and click OK

In the Input Range: box, enter A2:D6
In the Bin Range: box, enter A10:A14

The histogram created by Excel for these data is shown in Figure 2.13. We have modified the bin ranges in column
A by typing the values shown in Figure 2.13 into cells A2:A6 so that the chart created by Excel shows both the lower and
upper limits for each bin. We have also removed the gaps between the columns in the histogram in Excel to match the
traditional format of histograms. To remove the gaps between the columns in the Histogram created by Excel, follow these
steps:

One of the most important uses of a histogram is to provide information about the shape, or form, of a distribution. Skewness,
or the lack of symmetry, is an important characteristic of the shape of a distribution. Figure 2.14 contains four histograms
constructed from relative frequency distributions that exhibit different patterns of skewness. Panel A shows the histogram for
a set of data moderately skewed to the left. A histogram is said to be skewed to the left if its tail extends farther to the left than
to the right. This histogram is typical for exam scores, with no scores above 100 percent, most of the scores above 70 percent,
and only a few really low scores. Panel B shows the histogram for a set of data moderately skewed to the right. A histogram
is said to be skewed to the right if its tail extends farther to the right than to the left. An example of this type of histogram would
be for data such as housing prices; a few expensive houses create the skewness in the right tail. Panel C shows a symmetric
histogram. In a symmetric histogram, the left tail mirrors the shape of the right tail. Histograms for data found in applications
are never perfectly symmetric, but the histogram for many applications may be roughly symmetric. Data for SAT scores, the
heights and weights of people, and so on lead to histograms that are roughly symmetric. Panel D shows a histogram highly
skewed to the right. This histogram was constructed from data on the amount of customer purchases over one day at a
women’s apparel store. Data from applications in business and economics often lead to histograms that are skewed to the
right. For instance, data on housing prices, salaries, purchase amounts, and so on often result in histograms skewed to the
right.
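
The direction of skewness can also be checked numerically. The sketch below uses the moment coefficient of skewness, a standard measure that this module does not itself introduce: a positive value suggests a longer right tail, a negative value a longer left tail, and a value near zero rough symmetry. The three data sets are hypothetical and chosen only for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    m2 and m3 are the second and third central moments of the data.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets for illustration only.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]    # a few large values
left_skewed = [100 - x for x in right_skewed]     # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed),
                   ("symmetric", symmetric)]:
    print(f"{name:<10} {skewness(data):+.2f}")
```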

Cumulative Distributions

A variation of the frequency distribution that provides another tabular summary of quantitative data is the cumulative frequency
distribution, which uses the number of classes, class widths, and class limits developed for the frequency distribution.
However, rather than showing the frequency of each class, the cumulative frequency distribution shows the number of data
items with values less than or equal to the upper class limit of each class. The first two columns
of Table 2.8 provide the cumulative frequency distribution for the audit time data. To understand how the cumulative
frequencies are determined, consider the class with the description “Less than or equal to 24.” The cumulative frequency for
this class is simply the sum of the frequencies for all classes with data values less than or equal to 24. For the frequency
distribution in Table 2.7, the sum of the frequencies for classes 10–14, 15–19, and 20–24 indicates that 4 + 8 + 5 = 17 data
values are less than or equal to 24. Hence, the cumulative frequency for this class is 17. In addition, the cumulative frequency
distribution in Table 2.8 shows that four audits were completed in 14 days or less and that 19 audits
were completed in 29 days or less.

As a final point, a cumulative relative frequency distribution shows the proportion of data items, and a cumulative percent
frequency distribution shows the percentage of data items with values less than or equal to the upper limit of each class. The
cumulative relative frequency distribution can be computed either by summing the relative frequencies in the relative frequency
distribution or by dividing the cumulative frequencies by the total number of items. Using the latter approach, we found the
cumulative relative frequencies in column 3 of Table 2.8 by dividing the cumulative frequencies in column 2 by the total number
of items (n = 20). The cumulative percent frequencies were again computed by multiplying the cumulative relative frequencies by 100.
The cumulative relative and percent frequency distributions show that 0.85 of the audits, or 85 percent, were completed in 24
days or less, 0.95 of the audits, or 95 percent, were completed in 29 days or less, and so on.
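
The cumulative calculations can be sketched in Python as a running sum. The frequencies are those stated or implied in the text for the audit time data; the variable names are illustrative.

```python
from itertools import accumulate

# Upper bin limits and frequencies for the audit time data.
upper_limits = [14, 19, 24, 29, 34]
frequency = [4, 8, 5, 2, 1]
n = sum(frequency)  # 20 audits

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequency))

# Divide by n for cumulative relative frequency; multiply by 100 for percent.
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for limit, c, rel, pct in zip(upper_limits, cumulative,
                              cumulative_relative, cumulative_percent):
    print(f"Less than or equal to {limit}: {c:>2}  {rel:.2f}  {pct:.0f}%")
```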

Individual Activity:

Group Activity:
