Statistical Analysis with Excel: Part 1
1. Introduction
Autofilling cells:
The first important fundamental is autofill, Excel’s capability for repeating a calculation
throughout a worksheet. Insert a formula into a cell, and you can drag that formula into
adjoining cells.
Figure 1.1 is a worksheet of expenditures for R&D in science and engineering at colleges and
universities for the years shown. The data, taken from a U.S. National Science Foundation
report, are in millions of dollars. Column H holds the total for each field, and Row 11 holds the
total for each year. (More about column I in a moment.)
I started with Column H blank and with row 11 blank. How did I get the totals into column H
and row 11?
If I want to create a formula to calculate the first row total (for Physical Sciences), one way
(among several) is to enter
= D2 + E2 + F2 + G2
into cell H2. (A formula always begins with “=”.) Press Enter and the total appears in H2.
Now, to put that formula into cells H3 through H10, the trick is to position the cursor on the
lower-right corner of H2 until a “+” appears, hold down the left mouse button, and drag the
mouse through the cells. That “+” is called the cell’s fill handle.
When you finish dragging, release the mouse button and the row totals appear. This saves
huge amounts of time, because you don’t have to reenter the formula eight times.
Same thing with the column totals. One way to create the formula that sums up the numbers
in the first column (1990) is to enter
=D2 + D3 + D4 + D5 + D6 + D7 + D8 + D9 + D10
into cell D11. Position the cursor on D11’s fill handle, drag through row 11 and release in
column H, and you autofill the totals into E11 through H11.
Dragging isn’t the only way to do it. Another way is to select the array of cells you want to
autofill (including the one that contains the formula), and click
Home | Fill
Where’s Fill? On the Home tab, in the Editing area, you see a down arrow.
That’s Fill. Clicking Fill opens the Fill pop-up menu (see Figure 1.2). Select Down and you
accomplish the same thing as dragging and dropping.
Still another way is to select Series from the Fill pop-up menu. Doing this opens the Series
dialog box (see Figure 1.3). In this dialog box, select the AutoFill radio button, click OK, and
you’re all set. This does take one more step, but the Series dialog box is a bit more compatible
with earlier versions of Excel.
I bring this up because statistical analysis often involves repeating a formula from cell to cell.
The formulas are usually more complex than the ones in this section, and you might have to
repeat them many times, so it pays to know how to autofill.
A quick way to autofill is to click in the first cell in the series, move the cursor to that cell’s
lower-right corner until the autofill handle appears, and double-click. This works on both PC
and Mac.
Referencing cells
The second important fundamental is the way Excel references worksheet cells. Consider
again the worksheet in Figure 1.1. Each autofilled formula is slightly different from the original.
This, remember, is the formula in cell H2:
= D2 + E2 + F2 + G2
After autofill, the formula in cell H3 is
= D3 + E3 + F3 + G3
This is perfectly appropriate. You want the total in each row, so Excel adjusts the formula
accordingly as it automatically inserts it into each cell. This is called relative referencing: the
reference (the cell label) gets adjusted relative to where it is in the worksheet. Here, the
formula directs Excel to total up the numbers in the cells in the four columns immediately to
the left.
Now for another possibility. Suppose you want to know each row total’s proportion of the
grand total (the number in H11). That should be straightforward, right? Create a formula for
I2, and then autofill cells I3 through I10. Similar to the earlier example, start by entering this
formula into I2:
=H2/H11
Press Enter and the proportion appears in I2. Position the cursor on the fill handle, drag
through column I, release in I10, and . . . Figure 1.4 shows the unhappy result: the extremely
ugly #DIV/0! in I3 through I10.
The story is this: Unless you tell it not to, Excel uses relative referencing when you autofill. So
the formula inserted into I3 is not
=H3/H11
Instead, it’s
=H3/H12
Why does H11 become H12? Relative referencing assumes that the formula means divide the
number in the cell to the left by whatever number is nine rows below it in the same column.
Because H12 has nothing in it, the formula is telling Excel to divide by zero, which is a no-no.
The idea is to tell Excel to divide all the numbers by the number in H11, not by whatever
number is nine cells south of here. To do this, you work with absolute referencing. You show
absolute referencing by adding $ signs to the cell ID. The correct formula for I2 is
= H2/$H$11
This tells Excel not to adjust the column and not to adjust the row when you autofill. Figure
1.5 shows the worksheet with the proportions, and you can see the correct formula in the
formula bar (an area above the worksheet and below the Ribbon).
To convert a relative reference into absolute reference format, select the cell address (or
addresses) you want to convert, and press the F4 key. F4 is a toggle that goes between relative
reference (H11, for example), absolute reference for both the row and column in the address
($H$11), absolute reference for the row-part only (H$11), and absolute reference for the
column-part only ($H11).
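To make the four reference styles concrete, here is a minimal Python sketch of what happens to a reference when a formula is filled down one row. The helper name fill_down is hypothetical, and the sketch models only row adjustment for a single cell reference; it is an illustration of the rule described above, not Excel's actual implementation.

```python
import re

def fill_down(ref: str, rows: int) -> str:
    """Hypothetical helper: shift a cell reference as if its formula were
    filled down by `rows` rows. A "$" in front of the row number pins it."""
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a cell reference: {ref}")
    col_lock, col, row_lock, row = m.groups()
    new_row = int(row) if row_lock else int(row) + rows
    return f"{col_lock}{col}{row_lock}{new_row}"

print(fill_down("H11", 1))    # H12   (relative: the row moves)
print(fill_down("$H$11", 1))  # $H$11 (absolute: nothing moves)
print(fill_down("H$11", 1))   # H$11  (row pinned)
print(fill_down("$H11", 1))   # $H12  (only the column is pinned)
```

Filling down changes only rows, so pinning the row (H$11 or $H$11) is what keeps the divisor on the grand total.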
Many of Excel’s statistical features are built into its worksheet functions. You access the
worksheet functions by using the Excel Insert Function button, labeled with the symbol fx.
Clicking this button opens the Insert Function dialog box, which presents a list of Excel’s
functions and a capability for searching for Excel functions. Although Excel provides easier
ways to access the worksheet functions, this latest version preserves this button and offers
additional ways to open the Insert Function dialog box. I discuss all of this in more detail in a
moment.
Figure 2-1 shows the location of the Insert Function button and the Formula Bar. They’re on
the right of the Name box. All three are just below the Ribbon. Inside the Ribbon, in the
Formulas tab, is the Function Library. The Formula Bar is like a clone of a cell you select:
Information entered into the Formula Bar goes into the selected cell, and information entered
in the selected cell appears in the Formula Bar.
Figure 2-1: The Function Library, the Name box, the Formula Bar, and the Insert Function
button.
Figure 2-1 shows Excel with the Formulas tab open. This shows you another location for the
Insert Function button. Labeled fx, it’s in the extreme left of the Ribbon, in the Function Library
area. As I mention earlier in this section, when you click the Insert Function button, you open
the Insert Function dialog box. (See Figure 2-2.)
Figure 2-2: The Insert Function dialog box.
This dialog box enables you to search for a function that fits your needs, or to scroll through a
list of Excel functions.
So in addition to clicking the Insert Function button next to the Formula bar, you can open the
Insert Function dialog box by selecting the Insert Function button in the Function Library area
of the Formulas tab.
Worksheet functions
As I point out in the preceding section, the Function Library area of the Formulas tab shows
all the categories of worksheet functions. In general, using a worksheet function takes these
steps:
1. Type your data into a data array and select a cell for the result.
2. Select the appropriate formula category and choose your function from its pop-up menu.
3. In the Function Arguments dialog box, type the appropriate values for the function’s
arguments.
To give you an example, I explore a function that typifies how Excel’s worksheet functions
work. This function, SUM, adds up the numbers in cells you specify and returns the sum in still
another cell that you specify.
Although adding numbers together is an integral part of statistical number crunching, SUM is
not in the Statistical category. It is, however, a typical worksheet function and it shows a
familiar operation.
1. Enter your numbers into an array of cells and select a cell for the result.
In this example, I’ve entered 45, 33, 18, 37, 32, 46, and 39 into cells C2 through C8,
respectively, and selected C9 to hold the sum.
2. Select the appropriate formula category and choose your function from its pop-up menu.
This opens the Function Arguments dialog box.
I selected Formulas | Math & Trig and scrolled down to find and choose SUM.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
Excel guesses that you want to sum the numbers in cells C2 through C8 and identifies that
array in the Number1 box. Excel doesn’t keep you in suspense: The Function Arguments dialog
box shows the result of applying the function. In this example, the sum of the numbers in the
array is 250. (See Figure 2-3.)
Note a couple of points. First, as Figure 2-3 shows, the Formula Bar holds
=SUM(C2:C8)
Figure 2-3: Using SUM.
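As a quick cross-check outside Excel, the same total can be reproduced in a few lines of Python. This is only a sketch; the list name values is an assumption standing in for cells C2 through C8.

```python
# The seven numbers the text places in C2 through C8.
values = [45, 33, 18, 37, 32, 46, 39]

# Counterpart of =SUM(C2:C8)
total = sum(values)
print(total)  # 250
```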
After you get familiar with a worksheet function and its arguments, you can bypass the menu
and type the function directly into the cell or into the formula bar, beginning with =. When
you do, Excel opens a helpful menu as you type the formula. (See Figure 2-4.) The menu shows
possible formulas beginning with the letter(s) you type, and you can select one by
double-clicking it.
Another noteworthy point is the set of boxes in the Function Arguments dialog box in Figure
2-3. In the figure, you see just two boxes, Number1 and Number2. The data array appears in
Number1. So what’s Number2 for? The Number2 box allows you to include an additional
argument in the sum. And it doesn’t end there. Click in the Number2 box and the Number3
box appears. Click in the Number3 box, and the Number4 box appears . . . and on and on. The
limit is 255 boxes, with each box corresponding to an argument.
A value can be another array of cells anywhere in the worksheet, a number, an arithmetic
expression that evaluates to a number, a cell ID, or a name that you have attached to a range
of cells. (Regarding that last one: Read the upcoming section “What’s in a name? An array of
possibilities.”) As you type in values, the SUM dialog box shows the updated sum. Clicking OK
puts the updated sum into the selected cell.
You won’t find this multi-argument capability on every worksheet function. Some are designed
to work with just one argument. For the ones that do work with multiple arguments, however,
you can incorporate data that reside all over the worksheet. Figure 2-5 shows a worksheet
with a Function Arguments dialog box that includes data from two arrays of cells, two
arithmetic expressions, and one cell. Notice the format of the function in the Formula Bar (a
comma separates successive arguments).
If you select a cell in the same column as your data and just below the last data cell, Excel
correctly guesses the data array that you want to work on. Excel doesn’t always guess what
you want to do, however. Sometimes when Excel does guess, its guess is incorrect. When
either of those things happens, it’s up to you to enter the appropriate values into the Function
Arguments dialog box.
Statistical functions
In the preceding example, I show you a function that’s not in the category of statistical
functions. In this section, I show you how to create a shortcut to Excel’s statistical functions.
You can get to Excel’s statistical functions by selecting Formulas | More Functions | Statistical
and then choosing from the resulting pop-up menu. (See Figure 2-6.)
Although Excel has buried the statistical functions several layers deep, you can use a handy
technique to make them as accessible as any of the other categories: You add them to the
Quick Access Toolbar in the upper-left corner. (Every Office application has one.)
To do this, select Formulas | More Functions and right-click on Statistical. On the pop-up menu,
pick the first option, Add to Quick Access Toolbar. (See Figure 2-7.) Doing this adds a button to
the Quick Access Toolbar.
Clicking the new button’s down arrow opens the pop-up menu of statistical functions. (See
Figure 2-8.)
From now on, when I deal with a statistical function, I assume that you’ve created this
shortcut, so you can quickly open the menu of statistical functions. The next section provides
an example.
Figure 2-7: Adding the Statistical functions to the Quick Access Toolbar
Array functions
Most of Excel’s built-in functions are formulas that calculate a single value (like a sum) and put
that value into a worksheet cell. Excel has another type of function. It’s called an array function
because it calculates multiple values and puts those values into an array of cells, rather than
into a single cell.
FREQUENCY is a good example of an array function (and it’s an Excel statistical function, too).
Its job is to summarize a group of scores by showing how the scores fall into a set of intervals
that you specify.
For example, given these scores: 77, 45, 44, 61, 52, 53, 68, 55, and these intervals: 50, 60, 70,
80,
FREQUENCY shows how many are less than or equal to 50 (2 in this example), how many are
greater than 50 and less than or equal to 60 (that would be 3), and so on. The number of
scores in each interval is called a frequency.
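The counting rule just described (each interval includes its upper bound) can be sketched in Python. The function name frequency is a hypothetical stand-in, not Excel's code; it assumes the intervals are sorted in increasing order and returns one extra count for scores above the highest interval.

```python
from bisect import bisect_left

def frequency(scores, bins):
    """Count scores per interval; `bins` must be sorted ascending.
    Returns len(bins) + 1 counts; the last one holds scores above the top bin."""
    counts = [0] * (len(bins) + 1)
    for s in scores:
        # bisect_left finds the first bin whose upper bound is >= s
        counts[bisect_left(bins, s)] += 1
    return counts

scores = [77, 45, 44, 61, 52, 53, 68, 55]
bins = [50, 60, 70, 80]
counts = frequency(scores, bins)
print(counts)  # [2, 3, 2, 1, 0]
```

The result has five entries for four intervals, which mirrors the extra cell that FREQUENCY needs in the worksheet.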
I’ve put Frequency as the label at the top of column D, so I select D2 through D10 for the
resulting frequencies. Why the extra cell?
FREQUENCY returns a vertical array that has one more cell than the intervals array. The extra
cell counts the scores that are greater than the highest interval.
4. From the Statistical Functions menu, select FREQUENCY to open the Function Arguments
dialog box.
I used the shortcut I installed on the Quick Access Toolbar to open this menu and select
FREQUENCY.
5. In the Function Arguments dialog box, enter the appropriate values for the arguments.
I begin with the Data_array box. In this box, I entered the cells that hold the scores. In this
example, that’s B2:B16.
I’m assuming you know Excel well enough to know how to do this in several ways.
Next, I identify the intervals array. FREQUENCY refers to intervals as “bins,” and holds the
intervals in the Bins_array box.
For this example, C2:C9 goes into the Bins_array box. After I identify both arrays, the Function
Arguments dialog box shows the frequencies inside a pair of curly brackets.
6. Press Ctrl+Shift+Enter to close the Function Arguments dialog box and put the values in
the selected array. For the Mac, it’s Command+Enter.
This is very important. Because the dialog box has an OK button, the tendency is to click OK,
thinking that puts the results into the worksheet.
That doesn’t get the job done when you work with an array function, however. Always use
the keystroke combination Ctrl+Shift+Enter (Command+Enter on the Mac) to close the
Function Arguments dialog box for an array function.
After you close the Function Arguments dialog box, the frequencies go into the
appropriate cells, as Figure 2-10 shows.
Figure 2-10: The finished frequencies.
The formula in the Formula Bar is
{= FREQUENCY(B2:B16,C2:C9)}
The curly brackets are Excel’s way of telling you that this is an array function.
In order to use these tools, you first have to load them into Excel.
Doing this opens the Excel Options dialog box. Then follow these steps:
2. Near the bottom of the list, you see a drop-down list labeled Manage.
From this list, select Excel Add-Ins.
3. Click Go.
This opens the Add-Ins dialog box. (See Figure 2-11.)
4. Select the check box next to Analysis ToolPak and then click OK.
When Excel finishes loading the ToolPak, you’ll find a Data Analysis button in the Analysis area
of the Data tab. In general, the steps for using a data analysis tool are:
2. Click Data | Data Analysis to open the Data Analysis dialog box.
3. In the Data Analysis dialog box, select the data analysis tool you want to work with.
4. Click OK (or just double-click the selection) to open the dialog box for the selected tool.
Here’s an example to get you accustomed to using these tools. In this example, I go through
the Descriptive Statistics tool. This tool calculates a number of statistics that summarize a set
of scores.
2. Click Data | Data Analysis to open the Data Analysis dialog box.
3. Click Descriptive Statistics and click OK (or just double-click Descriptive Statistics) to open
the Descriptive Statistics dialog box.
4. Identify the data array. In the Input Range box, enter the cells that hold the data. For this
example, that’s B1 through B9. The easiest way to do this is to move the cursor to the top
cell (B1), press the Shift key, and click in the bottom cell (B9). That puts the absolute
reference format $B$1:$B$9 into Input Range.
5. Select the Columns radio button to indicate that the data are organized by columns.
6. Select the Labels in First Row check box, because the Input Range includes the column
heading.
7. Select the New Worksheet Ply radio button, if it isn’t already selected. This tells Excel to
create a new tabbed sheet within the current workbook, and to send the results to the
newly created sheet.
8. Click the Summary Statistics check box and leave the others unchecked. Click OK.
The new tabbed sheet (ply) opens, displaying statistics that summarize the data. Figure 2-13
shows the new ply, after you widen Column A.
3 Describing Data
Visual presentation helps in another way: It’s valuable for presenting ideas to groups and
making them understand your point of view.
First of all, Excel uses the word “chart” instead of “graph.” Most chart formats have a
horizontal axis and a vertical axis.
By convention, the horizontal axis is also called the x-axis and the vertical axis is also called the
y-axis. Also, by convention, what goes on the horizontal axis is called the independent variable
and what goes on the vertical axis is called the dependent variable.
Inserting a chart
When you create a chart, you insert it into a spreadsheet. This immediately clues you that the
chart creation tools are in the Charts area of the Insert tab. (See Figure 3-1.)
The Insert Chart dialog box opens. This dialog box presents Excel’s best guesses for the kind
of chart that captures your data. Choose one, and Excel creates a chart in your worksheet.
Click on the chart, and Excel adds a Design tab and a Format tab to the Ribbon. These tabs
allow you to make all kinds of changes to your chart.
One more important concept about Excel graphics. In Excel, a chart is dynamic. This means
that after you create a chart, changing its worksheet data results in an immediate change in
the chart.
Example
Figure 3-2: Table data entered into a worksheet
Selecting Insert | Charts | Recommended Charts opens the Insert Chart dialog box in Figure 3-4.
I scrolled down the recommended charts in the left column and selected Excel’s fifth
recommendation. This type of chart is called Clustered Column.
4. Modify the chart.
Figure 3-4 shows the resulting chart, as well as the Design tab and the Format tab. These tabs
combine to form Chart Tools. As you can see, I have to do some modifying. Why? Excel has
guessed wrong about how I wanted to design the chart. It looks okay, but it will look better if
I relocate the legend (the part below the x-axis that shows what all the colors mean).
Excel’s AVERAGE worksheet function calculates the mean of a set of numbers. Figure 3-5
shows the data and Function Arguments dialog box for AVERAGE.
Figure 3-5: Working with AVERAGE.
1. In your worksheet, enter your numbers into an array of cells and select the cell where you
want AVERAGE to place the result.
For this example, I entered 56, 78, 45, 49, 55, and 62 into cells B2 through B7, and I selected
B8 for the result.
2. From the Statistical Functions menu, choose AVERAGE to open the AVERAGE Function
Arguments dialog box.
3. In the Function Arguments dialog box, enter the values for the arguments.
If the array of number-containing cells isn’t already in the Number1 box, enter it into that box.
The mean (57.5 for this example) appears in this dialog box.
4. Click OK to close the Function Arguments dialog box.
This puts the mean into the cell selected in the worksheet. In this example, that’s B8.
As you can see in Figure 3-5, the formula in the Formula bar is
=AVERAGE(B2:B7)
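For readers who want to confirm the arithmetic, here is a short Python sketch using the standard library's statistics.mean. The list name scores is an assumption standing in for cells B2 through B7.

```python
from statistics import mean

# The six numbers the text places in B2 through B7.
scores = [56, 78, 45, 49, 55, 62]

# Counterpart of =AVERAGE(B2:B7): the sum (345) divided by the count (6)
average = mean(scores)
print(average)  # 57.5
```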
AVERAGEA does the same thing as AVERAGE, but with one important difference. When
AVERAGE calculates a mean, it ignores cells that contain text and it ignores cells that contain
the expressions TRUE or FALSE. AVERAGEA takes text and expressions into consideration when
it calculates a mean. As far as AVERAGEA is concerned, if a cell has text or FALSE, it has a value
of 0.
If a cell holds the word TRUE, it has a value of 1. AVERAGEA includes these values in the mean.
1. Type the numbers into the worksheet and select a cell for the result.
For this example, I entered the numbers 56, 78, 45, 49, 55, 62 in cells B2 through B7 and
selected B9. This leaves B8 blank. I did this because I’m going to put different values into B8
and show you the effect on AVERAGEA.
2. From the Statistical Functions menu, select AVERAGEA to open the AVERAGEA Function
Arguments dialog box.
3. In the Function Arguments dialog box, enter the values for the arguments.
This time I entered B2:B8 into the Number1 box. The mean (57.5) appears in this dialog box.
AVERAGEA ignores blank cells, just as AVERAGE does.
4. Click OK to close the Function Arguments dialog box, and the answer appears in the selected
cell.
Now for some experimentation. In B8, if I type xxx, the mean in B9 changes from 57.5 to
49.28571. Next, typing TRUE into B8 changes the mean in B9 to 49.42857. Finally, after typing
FALSE into B8, the mean changes to 49.28571.
Why the changes? AVERAGEA evaluates a text string like xxx as zero. Thus, the average in this
case is based on seven numbers (not six), one of which is zero. AVERAGEA evaluates the value
TRUE as 1. So the average with TRUE in B8 is based on seven numbers, one of which is 1.00.
AVERAGEA evaluates FALSE as zero, and calculates the same average as when B8 holds xxx.
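The experiment above can be mimicked with a small Python sketch. The function averagea below is a hypothetical imitation of the rules the text describes (text and FALSE count as 0, TRUE counts as 1, blanks are skipped), with None standing in for a blank cell; it is not Excel's implementation.

```python
def averagea(cells):
    """Hypothetical imitation of AVERAGEA's rules for a list of cell values."""
    values = []
    for cell in cells:
        if cell is None:              # blank cell: ignored
            continue
        if isinstance(cell, bool):    # check bool first: TRUE -> 1, FALSE -> 0
            values.append(1 if cell else 0)
        elif isinstance(cell, str):   # any text -> 0
            values.append(0)
        else:
            values.append(cell)
    return sum(values) / len(values)

b2_b7 = [56, 78, 45, 49, 55, 62]
print(averagea(b2_b7 + [None]))              # 57.5
print(round(averagea(b2_b7 + ["xxx"]), 5))   # 49.28571
print(round(averagea(b2_b7 + [True]), 5))    # 49.42857
print(round(averagea(b2_b7 + [False]), 5))   # 49.28571
```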
The worksheet function MEDIAN (you guessed it) calculates the median of a group of numbers.
Here are the steps:
1. Type your data into a worksheet and select a cell for the result.
I used 45, 49, 55, 56, 62, 78 for this example, in cells B2 through B7, with cell B8 selected for
the median. I arranged the numbers in increasing order, but you don’t have to do that to use
MEDIAN.
2. From the Statistical Functions menu, select MEDIAN to open the MEDIAN Function
Arguments dialog box.
3. In the Function Arguments dialog box, enter the values for the arguments.
The Function Arguments dialog box opens with the data array in the Number1 box. The
median appears in that dialog box. (It’s 55.5 for this example.) Figure 3-6 shows the dialog box
along with the array of cells and the selected cell.
4. Click OK to close the dialog box and the answer appears in the selected cell.
Figure 3-6: The MEDIAN Function Arguments dialog box along with the array of cells and the selected cell.
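As a cross-check on the MEDIAN example, here is a brief Python sketch with the standard library's statistics.median. With an even count of numbers, the median is the average of the two middle values (55 and 56 here). The list name data is an assumption standing in for B2 through B7, and it is deliberately left unsorted to show that order does not matter.

```python
from statistics import median

# Same six numbers as the example, in no particular order.
data = [62, 45, 78, 55, 49, 56]

middle = median(data)  # averages the two middle values, 55 and 56
print(middle)          # 55.5
```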
2. From the Statistical Functions menu, select MODE.SNGL to open the MODE.SNGL Function
Arguments dialog box. (See Figure 3-7.)
3. In the Function Arguments dialog box, type the values for the arguments.
The Function Arguments dialog box opens. I entered B2:B10 in the Number1 box and the mode
(75 for this example) appears in the dialog box.
4. Click OK to close the dialog box and the answer appears in the selected cell.
Figure 3-7: The MODE.SNGL Function Arguments dialog box along with the array of cells and the selected cell.
For a set of numbers that has more than one mode (that is, if it’s multimodal), use Excel’s
MODE.MULT function. This is an array function: It returns (potentially) an array of answers,
not just one.
You select an array of cells for the results, and when you finish with the dialog box you press
Ctrl+Shift+Enter to populate the array.
1. Type your data into a worksheet and select a vertical array of cells for the results.
I typed 57, 23, 77, 75, 57, 75, 91, 57, and 75 into cells B2:B10. I selected B11:B14 for the results.
Notice that this set of numbers has two modes, 57 and 75.
2. From the Statistical Functions menu, select MODE.MULT to open the MODE.MULT Function
Arguments dialog box. (See Figure 3-8.)
3. In the Function Arguments dialog box, type the values for the arguments.
IMPORTANT: Do not click OK.
4. Because this is an array function, press Ctrl+Shift+Enter to put MODE.MULT’s answers into
the selected array. Nothing in the dialog box even remotely hints that you have to do this.
Figure 3-9 shows what happens after you press Ctrl+Shift+Enter. Because I allocated four cells
for the results and only two modes were in the set of numbers, error messages show up in the
remaining two cells.
Figure 3-8: The MODE.MULT Function Arguments dialog box along with the array of data cells and the array of
cells for the results.
Figure 3-9: The results of MODE.MULT. Note the curly brackets around the formula in the Formula bar. This
indicates an array formula.
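The two modes in this data set can be confirmed with a short Python sketch. It uses statistics.multimode (available in Python 3.8 and later), which returns every value tied for the highest count; the list name data is an assumption standing in for B2:B10.

```python
from collections import Counter
from statistics import multimode

# The nine numbers the text places in B2:B10.
data = [57, 23, 77, 75, 57, 75, 91, 57, 75]

modes = multimode(data)   # all values tied for the highest count
print(modes)              # [57, 75]
print(Counter(data)[57], Counter(data)[75])  # 3 3
```

Unlike the four-cell result range in the worksheet, the Python list has exactly as many entries as there are modes, so there are no leftover error values.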
3.3 DEVIATION
VAR.P and VARPA
Excel’s two worksheet functions, VAR.P and VARPA, calculate the population variance.
Start with VAR.P. Figure 3-10 shows the Function Arguments dialog box for VAR.P along with
data.
1. Put your data into a worksheet and select a cell to display the result. Figure 3-10 shows that
for this example, I’ve put the numbers 50, 47, 52, 46, and 45 into cells B2 through B6 and
selected B8 for the result.
2. From the Statistical Functions menu, select VAR.P to open the VAR.P Function Arguments
dialog box.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
I entered B2:B7 in the Number1 field, rather than B2:B6. I did this to show you how VAR.P
handles blank cells. The population variance, 6.8, appears in the Function Arguments dialog
box.
4. Click OK to close the dialog box and put the result in the selected cell.
Had I defined Score as the name of B2:B7, the formula in the formula bar would be
=VAR.P(Score)
When VAR.P calculates the variance in a range of cells, it only sees numbers. If text, blanks
(like B7), or logical values are in some of the cells, VAR.P ignores them.
VARPA, on the other hand, does not. VARPA takes text and logical values into consideration
and includes them in its variance calculation.
How? If a cell contains text, VARPA sees that cell as containing a value of zero. If a cell contains
the logical value FALSE, that’s also zero as far as VARPA is concerned. In VARPA’s view of the
world, the logical value TRUE is one. Those zeros and ones get added into the mix and affect
the mean and the variance.
The worksheet functions VAR.S and VARA calculate the sample variance.
Figure 3-11 shows the Function Arguments dialog box for VAR.S with 50, 47, 52, 46, and 45
entered into cells B2 through B6. Cell B7 is part of the cell range, but I left it empty.
The relationship between VAR.S and VARA is the same as the relationship between VAR.P
and VARPA: VAR.S ignores cells that contain logical values (TRUE and FALSE) and text. VARA
includes those cells. Once again, TRUE evaluates to 1.0 and FALSE evaluates to 0. Text in a
cell causes VARA to see that cell’s value as 0.
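The difference between the population and sample variance, and the effect of counting text as zero, can be sketched in Python with the standard library's statistics module. The helper coerce is a hypothetical imitation of the VARPA/VARA rules described above, and the "xxx" cell is an invented illustration, not data from the figures.

```python
from statistics import pvariance, variance

scores = [50, 47, 52, 46, 45]   # B2:B6 in the text; the blank B7 is simply left out

var_p = pvariance(scores)       # counterpart of VAR.P: sum of squares (34) / N
var_s = variance(scores)        # counterpart of VAR.S: sum of squares (34) / (N - 1)
print(var_p, var_s)             # 6.8 8.5

def coerce(cells):
    """Hypothetical VARPA/VARA-style reading: text and FALSE -> 0, TRUE -> 1,
    blanks (None) dropped."""
    out = []
    for c in cells:
        if c is None:
            continue
        if isinstance(c, bool):
            out.append(1 if c else 0)
        elif isinstance(c, str):
            out.append(0)
        else:
            out.append(c)
    return out

# A text cell counted as 0 drags the mean down and inflates the variance.
var_pa = pvariance(coerce(scores + ["xxx"]))
print(var_pa > var_p)           # True
```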
The Excel worksheet functions STDEV.P and STDEVPA calculate the population standard
deviation. Follow these steps:
1. Type your data into an array and select a cell for the result.
2. In the Statistical Functions menu, select STDEV.P to open the STDEV.P Function Arguments
dialog box.
3. In the Function Arguments dialog box, type the appropriate values for the arguments.
After you enter the data array, the dialog box shows the value of the population standard
deviation for the numbers in the data array.
4. Click OK to close the dialog box and put the result into the selected cell.
Figure 3-12: The Function Arguments dialog box for STDEV.P, along with the data.
Like VARPA, STDEVPA uses any logical values and text values it finds when it calculates the
population standard deviation. TRUE evaluates to 1.0 and FALSE evaluates to 0. Text in a cell
gives that cell a value of 0.
The Excel worksheet functions STDEV.S and STDEVA calculate the sample standard deviation.
To work with STDEV.S, follow these steps:
1. Type your data into an array and select a cell for the result.
2. In the Statistical Functions menu, select STDEV.S to open the STDEV.S Function Arguments
dialog box.
3. In the Function Arguments dialog box, type the appropriate values for the arguments.
With the data array entered, the dialog box shows the value of the sample standard
deviation for the numbers in the data array.
4. Click OK to close the dialog box and put the result into the selected cell.
STDEVA uses text and logical values in its calculations. Cells with text have values of 0, and
cells whose values are FALSE also evaluate to 0. Cells that evaluate to TRUE have values of
1.0.
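Each standard deviation is the square root of the corresponding variance, which a short Python sketch can show. The text does not list the data behind the standard deviation figures, so the scores below are an assumption borrowed from the variance example.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

# Assumption: the five scores from the variance example, reused for illustration.
scores = [50, 47, 52, 46, 45]

sd_p = pstdev(scores)   # counterpart of STDEV.P: square root of the population variance
sd_s = stdev(scores)    # counterpart of STDEV.S: square root of the sample variance

print(isclose(sd_p, sqrt(6.8)), isclose(sd_s, sqrt(8.5)))  # True True
print(sd_s > sd_p)      # True: dividing by N - 1 gives the larger value
```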
3.4 SHAPE
Skewness indicates how symmetrically the scores are distributed. Kurtosis shows you whether
or not your scores are distributed with a peak in the neighborhood of the mean.
To use SKEW:
1. Type your numbers into a worksheet and select a cell for the result.
For this example, I’ve entered scores into the first ten rows of Columns B, C, D, and E. (See
Figure 3-14.) I selected cell H2 for the result.
2. From the Statistical Functions menu, select SKEW to open the Function Arguments dialog
box for SKEW.
3. In the Function Arguments dialog box, type the appropriate values for the arguments.
In the Number1 box, enter the array of cells that holds the data. For this example, the array is
B1:E10. With the data array entered, the Function Arguments dialog box shows the skewness,
which for this example is negative.
The Function Arguments dialog box for SKEW.P (the skewness of a population) looks the same.
As I mention earlier, population skewness incorporates N rather than N-1.
KURT
Figure 3-15 shows the scores from the preceding example, a selected cell, and the Function
Arguments dialog box for KURT.
Figure 3-15: Using KURT to calculate kurtosis.
To use KURT:
1. Enter your numbers into a worksheet and select a cell for the result.
For this example, I entered scores into the first ten rows of Columns B, C, D, and E. I selected
cell H2 for the result.
2. From the Statistical Functions menu, select KURT to open the Function Arguments dialog
box for KURT.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
In the Number1 box, I entered the array of cells that holds the data.
Here, the array is B1:E10. With the data array entered, the Function Arguments dialog box
shows the kurtosis, which for this example is negative.
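The scores behind Figures 3-14 and 3-15 are not listed in the text, so the following Python sketch uses a small invented data set instead. The three functions follow the formulas given in Microsoft's documentation for SKEW, SKEW.P, and KURT; treat them as an illustration of those formulas rather than a reproduction of the figures.

```python
from statistics import mean, pstdev, stdev

def skew(xs):
    """Sample skewness (the SKEW formula): uses the sample standard deviation."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def skew_p(xs):
    """Population skewness (the SKEW.P formula): uses N and the population SD."""
    n, m, s = len(xs), mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / n

def kurt(xs):
    """Sample excess kurtosis (the KURT formula); negative means flatter than normal."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * sum(((x - m) / s) ** 4 for x in xs) - b

symmetric = [1, 2, 3, 4, 5]        # invented data: perfectly symmetric
long_left_tail = [1, 8, 9, 9, 10]  # invented data: tail stretches toward low scores

print(abs(skew(symmetric)) < 1e-12)   # True: no skew
print(skew(long_left_tail) < 0)       # True: negative skew
print(round(kurt(symmetric), 6))      # -1.2
```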
4. Probability
NORM.DIST
Excel’s NORM.DIST worksheet function enables you to find normal distribution areas without
relying on tables. NORM.DIST finds a cumulative area. You supply a score, a mean, and a
standard deviation for a normal distribution, and NORM.DIST returns the proportion of area
to the left of the score (also called cumulative proportion or cumulative probability).
For example, Figure 4-1 shows that in the IQ distribution, .8413 of the area is to the left of 116.
In Figure 4-2, I use NORM.DIST to find this proportion. Here are the steps:
1. Select a cell for NORM.DIST’s answer. For this example, I selected C2.
2. From the Statistical Functions menu, select NORM.DIST to open the Function Arguments
dialog box for NORM.DIST.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
In the X box, I entered the score for which I want to find the cumulative area. In this example,
that’s 116.
In the Mean box, I entered the mean of the distribution, and in the Standard_dev box, I entered
the standard deviation. Here, the mean is 100 and the standard deviation is 16.
In the Cumulative box, I entered TRUE. This tells NORM.DIST to find the cumulative area. The
dialog box shows the result.
Figure 4-2 shows that the cumulative area is .841344746 (in the dialog box). If you enter FALSE
in the Cumulative box, NORM.DIST returns the height of the normal distribution at 116.
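The same area can be checked in Python with statistics.NormalDist (Python 3.8 and later). This sketch assumes a mean of 100 and a standard deviation of 16, which is the combination that puts 116 one standard deviation above the mean and yields the .8413 area quoted for Figure 4-1.

```python
from statistics import NormalDist

# Assumption: IQ distribution with mean 100 and standard deviation 16.
iq = NormalDist(mu=100, sigma=16)

area = iq.cdf(116)     # counterpart of Cumulative = TRUE: area to the left of 116
height = iq.pdf(116)   # counterpart of Cumulative = FALSE: height of the curve at 116

print(round(area, 4))  # 0.8413
print(height > 0)      # True
```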
NORM.INV
NORM.INV is the flip side of NORM.DIST. You supply a cumulative probability, a mean, and a
standard deviation, and NORM.INV returns the score that cuts off the cumulative probability.
For example, if you supply .5000 along with a mean and a standard deviation, NORM.INV
returns the mean.
NORM.S.DIST
36
NORM.S.DIST is like its counterpart NORM.DIST, except that it’s designed for a normal
distribution whose mean is 0 and whose standard deviation is 1.00 (that is, a standard normal
distribution).
You supply a z-score and it returns the area to the left of the z-score, that is, the probability
that a z-score is less than or equal to the one you supplied. You also supply either TRUE or FALSE
for an argument called Cumulative: TRUE if you’re looking for the cumulative probability, FALSE if
you’re trying to find f(x).
Figure 4-4 shows the Function Arguments dialog box with 1 as the z-score, and TRUE in the
Cumulative box. The dialog box presents .841344746, the probability that a z-score is less than
or equal to 1.00 in a standard normal distribution. Clicking OK puts that result into a selected
cell.
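For comparison, here is a minimal Python sketch of the same lookup. NormalDist() with no arguments defaults to mean 0 and standard deviation 1, so it plays the role of NORM.S.DIST here; the variable names are illustrative only.

```python
from statistics import NormalDist

# NormalDist() defaults to mean 0 and standard deviation 1.
standard_normal = NormalDist()

# Counterpart of NORM.S.DIST(1, TRUE): probability that z <= 1.
prob_z_le_1 = standard_normal.cdf(1.0)

# Counterpart of NORM.S.DIST(1, FALSE): f(x), the height of the curve at z = 1.
density_at_1 = standard_normal.pdf(1.0)

print(round(prob_z_le_1, 9))  # 0.841344746
print(density_at_1)
```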
NORM.S.INV
NORM.S.INV is the flip side of NORM.S.DIST. You supply a cumulative probability and
NORM.S.INV returns the z-score that cuts off the cumulative probability. For example, if you
supply .5000, NORM.S.INV returns 0, the mean of the standard normal distribution.
Figure 4-5 shows the Function Arguments dialog box for NORM.S.INV, with .75 as the
cumulative probability. The dialog box shows the answer, .67448975, the z-score at the 75th
percentile of the standard normal distribution.
Figure 4-5: Working with NORM.S.INV.
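The 75th-percentile lookup can be reproduced in Python as a sanity check. This sketch treats inv_cdf on a default NormalDist as a counterpart of NORM.S.INV; it is not how Excel computes the value.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Counterpart of NORM.S.INV(0.75): z-score at the 75th percentile.
z_75 = standard_normal.inv_cdf(0.75)

# Counterpart of NORM.S.INV(0.5): the mean of the standard normal distribution.
z_50 = standard_normal.inv_cdf(0.5)

print(round(z_75, 8))  # 0.67448975
print(z_50)
```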
STANDARDIZE
Excel’s STANDARDIZE worksheet function calculates z-scores. Figure 4-6 shows a set of exam
scores along with their mean and standard deviation.
I used AVERAGE and STDEVP to calculate the statistics. The Function Arguments dialog box for
STANDARDIZE is also in the figure.
Figure 4-6: Exam scores and the Function Arguments dialog box for STANDARDIZE.
2. From the Statistical Functions menu, select STANDARDIZE to open the Function Arguments
dialog box for STANDARDIZE.
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
First, I entered the cell that holds the first exam score into the X box. In this example, that’s
D2. In the Mean box, I entered the cell that holds the mean (C23 for this example). It has to
be in absolute reference format, so the entry is $C$23. You can type it that way, or you can
select C23 and then highlight the Mean box and press the F4 key.
In the Standard_dev box, I entered the cell that holds the standard deviation. The appropriate
cell in this example is C24. This also has to be in absolute reference format, so the entry is
$C$24.
4. Click OK to close the Function Arguments dialog box and put the z-score for the first exam
score into the selected cell.
To finish up, I position the cursor on the selected cell’s autofill handle, hold the left mouse
button down, and drag the cursor to autofill the remaining z-scores.
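The calculation behind STANDARDIZE is z = (score - mean) / standard deviation. The Python sketch below illustrates it with a made-up list of scores (the actual exam scores in Figure 4-6 are not reproduced here). It uses pstdev, the population standard deviation, to mirror the use of STDEVP, and the list comprehension stands in for autofilling the formula down the column.

```python
from statistics import fmean, pstdev

# Hypothetical exam scores, for illustration only (not the data in Figure 4-6).
scores = [82, 75, 91, 68, 88]

mean = fmean(scores)   # counterpart of AVERAGE
sd = pstdev(scores)    # counterpart of STDEVP (population standard deviation)

# Counterpart of STANDARDIZE(x, mean, sd) applied to every score.
z_scores = [(x - mean) / sd for x in scores]

for x, z in zip(scores, z_scores):
    print(x, round(z, 3))
```

Because the mean and standard deviation come from the same scores, the resulting z-scores have a mean of 0 and a population standard deviation of 1.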
Figure 4-7: The autofilled array of z-scores.
These are Excel’s worksheet functions for the binomial distribution. Use BINOM.DIST to
calculate the probability of getting four 3’s in ten tosses of a fair die:
2. From the Statistical Functions menu, select BINOM.DIST to open its Function Arguments
dialog box (see Figure 4-8).
Figure 4-8: The BINOM.DIST Function Arguments dialog box.
3. In the Function Arguments dialog box, type the appropriate values for the arguments.
In the Number_s box, I entered the number of successes. For this example, the number of
successes is 4. In the Trials box, I entered the number of trials. The number of trials is 10. In
the Probability_s box, I entered the probability of a success. I entered 1/6, the probability of
a 3 on a toss of a fair die.
In the Cumulative box, one possibility is FALSE for the probability of exactly the number of
successes entered in the Number_s box. The other is TRUE for the probability of getting that
number of successes or fewer. I entered FALSE.
With values entered for all the arguments, the answer appears in the dialog box.
To give you a better idea of what the binomial distribution looks like, I use BINOM.DIST (with
FALSE entered in the Cumulative box) to find pr(0) through pr(10), and then I use Excel’s
graphics capabilities (see Chapter 3) to graph the results. Figure 4-9 shows the data and the
graph.
Figure 4-9: The binomial distribution for x successes in ten tosses of a die, with p = 1/6
Incidentally, if you type TRUE in the Cumulative box, the result is .984 (and some more decimal
places), which is pr(0) + pr(1) + pr(2) + pr(3) + pr(4).
Figure 4-9 is helpful if you want to find the probability of getting between four and six
successes in ten trials. Find pr(4), pr(5), and pr(6) and add the probabilities.
A much easier way, especially if you don’t have a chart like Figure 4-9 handy or if you don’t
want to apply BINOM.DIST three times, is to use BINOM.DIST.RANGE. Figure 4-10 shows the dialog
box for this function, supplied with values for the arguments. After entering all the arguments,
the answer (0.069460321) appears in the dialog box.
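The range probability is just the sum pr(4) + pr(5) + pr(6). A short Python sketch of that sum follows; binom_range is a hypothetical helper standing in for BINOM.DIST.RANGE, with both endpoints included.

```python
from math import comb

def binom_range(trials, p, low, high):
    """Probability of between `low` and `high` successes, inclusive."""
    return sum(
        comb(trials, k) * p**k * (1 - p)**(trials - k)
        for k in range(low, high + 1)
    )

# Between four and six successes in ten tosses, with p = 1/6.
four_to_six = binom_range(10, 1 / 6, 4, 6)

print(round(four_to_six, 9))  # 0.069460321
```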
BINOM.INV
This function is tailor-made for binomial-based hypothesis testing. Give BINOM.INV the
number of trials, the probability of a success, and a criterion cumulative probability.
BINOM.INV returns the smallest value of x (the number of successes) for which the cumulative
probability is greater than or equal to the criterion.
Here are the steps for the hypothesis testing example I just showed you:
2. From the Statistical Functions menu, select BINOM.INV and click OK to open its Function
Arguments dialog box (see Figure 4-11).
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
In the Trials box, I entered 10, the number of trials. In the Probability_s box, I entered the
probability of a success. In this example it’s 1/6, the value of π according to H0.
In the Alpha box, I entered the cumulative probability to exceed. I entered .95, because I want
to find the critical value that cuts off the upper 5 percent of the binomial distribution.
With values entered for the arguments, the critical value, 4, appears in the dialog box.
4. Click OK to put the answer into the selected cell. As it happens, the critical value is the
number of successes in the sample. The decision is to reject H0.
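The search that BINOM.INV is described as performing can be sketched in Python as follows. The function binom_inv is a hypothetical stand-in written for this illustration: it accumulates binomial probabilities from x = 0 upward and stops at the first x whose cumulative probability reaches the criterion.

```python
from math import comb

def binom_inv(trials, p, alpha):
    """Smallest x whose cumulative binomial probability is >= alpha."""
    cumulative = 0.0
    for x in range(trials + 1):
        cumulative += comb(trials, x) * p**x * (1 - p)**(trials - x)
        if cumulative >= alpha:
            return x
    return trials  # guards against rounding when alpha is very close to 1

# Ten trials, p = 1/6 under H0, criterion cumulative probability .95.
critical_value = binom_inv(10, 1 / 6, 0.95)

print(critical_value)  # 4
```

The cumulative probability is about .930 at x = 3 and about .985 at x = 4, so 4 is the first value to reach .95.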
Poisson
POISSON.DIST
Here are the steps for using Excel’s POISSON.DIST for the preceding example:
2. From the Statistical Functions menu, select POISSON.DIST to open its Function Arguments
dialog box (see Figure 4-12).
3. In the Function Arguments dialog box, enter the appropriate values for the arguments.
In the X box, I entered the number of events for which I’m determining the probability. I’m
looking for pr(1), so I entered 1.
In the Mean box, I entered the mean of the process. That’s N π, which for this example is 1.
In the Cumulative box, it’s either TRUE for the cumulative probability or FALSE for just the
probability of the number of events. I entered FALSE.
With the entries for X, Mean, and Cumulative, the answer appears in the dialog box. The
answer for this example is .367879441.
In the example, I show you the probability for two defective joints in 1,000 and the probability
for three. To follow through with the calculations, I’d type 2 into the X box to calculate pr(2),
and 3 to find pr(3).
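The Poisson probabilities for 1, 2, and 3 events can be checked with a few lines of Python. The helper poisson_pmf is a hypothetical name for this sketch; it implements the standard Poisson formula, pr(x) = e^(-mean) * mean^x / x!, which corresponds to POISSON.DIST with FALSE in the Cumulative box.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """Probability of exactly x events when the process mean is `mean`."""
    return exp(-mean) * mean**x / factorial(x)

# Mean of the process is 1, as in the example.
pr_1 = poisson_pmf(1, 1)
pr_2 = poisson_pmf(2, 1)
pr_3 = poisson_pmf(3, 1)

print(round(pr_1, 9))  # 0.367879441
print(pr_2)
print(pr_3)
```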
As I mention earlier, in the 21st century, it’s pretty easy to calculate the binomial probabilities
directly. Figure 4-13 shows you the Poisson and the binomial probabilities for the numbers in
Column B and the conditions of the example. I graphed the probabilities so you can see how
close the two really are. I selected cell D3 so the formula box shows you how I used
BINOM.DIST to calculate the binomial probabilities.
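A rough Python sketch of the same comparison is shown below. It assumes N = 1,000 trials and π = .001 (inferred from N π = 1 and the 1,000 joints in the example), and it uses 0 through 5 as the numbers of events, which is an assumption since the contents of Column B are not listed here.

```python
from math import comb, exp, factorial

N = 1000       # assumed number of trials (joints)
PI = 0.001     # assumed probability of a defect, so that N * PI = 1
MEAN = N * PI

def poisson_pmf(x, mean):
    return exp(-mean) * mean**x / factorial(x)

def binom_pmf(x, trials, p):
    return comb(trials, x) * p**x * (1 - p)**(trials - x)

# Event counts 0 through 5 are an illustrative choice, not the book's Column B.
rows = [(x, poisson_pmf(x, MEAN), binom_pmf(x, N, PI)) for x in range(6)]
largest_gap = max(abs(poisson - binomial) for _, poisson, binomial in rows)

for x, poisson, binomial in rows:
    print(x, round(poisson, 6), round(binomial, 6))
print("largest difference:", largest_gap)
```

Under these assumptions, the two sets of probabilities differ by less than .001 at every value of x.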
Although the Poisson’s usefulness as an approximation is outdated, it has taken on a life of its
own. Phenomena as widely disparate as reaction time data in psychology experiments,
degeneration of radioactive substances, and scores in professional hockey games seem to fit
Poisson distributions. This is why business analysts and scientific researchers like to base
models on this distribution.