Using IBM SPSS Statistics 3e
Front Matter
• Copyright
• Acknowledgements
• Preface
• Acknowledgments
• About the Author
Copyright
FOR INFORMATION:
E-mail: [email protected]
1 Oliver’s Yard, 55 City Road, United Kingdom
Mathura Road, India
3 Church Street, Singapore 049483
All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic
or mechanical, including photocopying, recording, or by any information storage and retrieval system, without
permission in writing from the publisher.
All trademarks depicted within this book, including trademarks appearing as part of a screenshot, figure, or
other image are included solely for the purpose of illustration and are the property of their respective holders.
The use of the trademarks in no way indicates any relationship with, or endorsement by, the holders of said
trademarks. SPSS is a registered trademark of International Business Machines Corporation.
ISBN 978-1-5443-1889-9
Acquisitions Editor: Leah Fargotstein
Editorial Assistant: Elizabeth Wells
Production Editor: Andrew Olson
Content Development Editor: Chelsea Neve
Proofreader: Barbara Coster
Indexer: Maria Sosnowski
Cover Designer: Glenn Vogel
Marketing Manager: Shari Countryman
Acknowledgements
I dedicate this textbook to my three children, Sally, James (1965–1996), and Wendy. Their
encouragement and support for their father in his educational pursuits was (and is) far above the call
of duty.
—James O. Aldrich
Preface
This third edition was written using Versions 24 and 25 of IBM SPSS Statistics.1 Although Version 25 is the most
recent version available, the information in this book is almost always compatible with earlier releases.
The reader should also note that the student version of the software package lacks some features of the full
release. However, these differences are rarely encountered and will have little effect on learning SPSS and its
application to statistical analysis.
At the behest of users and reviewers, all of the datasets used in this third edition are made available on
a new companion website hosted by SAGE (study.sagepub.com/aldrich3e). One of the most useful features
of this third edition is the naming and categorization of all datasets used in the book. This naming makes
the datasets readily available to the professor and student on the companion website.
These datasets and the website are discussed in greater detail in the next two sections of this preface.
This third edition has also been reorganized into four sections: I. SPSS Commands and Assignment of Levels
of Measurement, II. Descriptive Statistics and Graphing, III. Basic Inferential Statistics, and IV. Relational
Statistics—Predicting, Describing, and Exploring Multi-Variable Relationships. The reader is referred to the
preceding Detailed Contents for specifics regarding this reorganization.
In response to users and reviewers, a new chapter on inferential statistics was added. This new Chapter 12
is not intended to teach all the “ins and outs” of inferential statistics but to supplement the SPSS software
package. Topics such as sampling, statistical significance, and hypothesis testing are addressed to give
the SPSS user a foundational understanding of the results of SPSS’s statistical procedures. More specific
information on the contents of this new chapter can be found in the Detailed Contents. Related to this, there is
also a new Appendix B that provides examples of the use of the normal curve and z-table to solve probability-
type problems. This is intended to make it easier for the IBM SPSS Statistics software user to visualize the
significance level and rejection area of the mathematical normal curve. This new appendix is an effort to
encourage the student/statistician to examine and understand the real meaning and importance of the Sig.
(2-tailed) column shown in much of SPSS’s output.
The Review Exercises section of each chapter now has five problems. As in the second edition, the first
three exercises have their solutions provided in Appendix C. The two additional problems (numbers
4 and 5) provide a wider range of applications for the particular statistical procedure. The answers to
these two additional problems are only made available for the professor through the companion website
(study.sagepub.com/aldrich3e) so that they may serve as lab work, homework, and/or test questions.
As in the first and second editions, this third edition can be used in conjunction with an instructor or as a
self-instructional guide. It retains the well-received bulleted points, which inform the SPSS user, in exacting
terms, what has to be done to accomplish certain statistical operations. The numerous screenshots are
complemented with a generous supply of callouts that are used to direct the reader’s attention to specific
control points. It is thought that as the student progresses through the chapters, many of the detailed bullet
points will become unnecessary. One reason for the detailed instructions was to make it possible to pick up
the book and turn to any statistical procedure (e.g., Logistic Regression) and conduct the analysis with little
prior knowledge of the IBM SPSS Statistics software package. In this capacity, the book performs well as a
“how to” manual.
A website hosted by SAGE has been added to the third edition. This website features two major sections, one
for the professor and another for the student.
Both the student- and instructor-facing sites include step-by-step SPSS tutorial videos created by the author
that provide screencast demonstrations of major concepts from each chapter. Videos can be shown in class
or watched at home for study and practice.
Visit study.sagepub.com/aldrich3e
The professor-facing section provides direct access to all datasets used in the book. Some of these datasets
are only made available to the professor and not the student. This is because this book considers data
structuring and entry an important part of learning how to use the SPSS Statistics software package.
The book goes into great detail on how to structure and then enter data for analysis. The restricted access
to selected datasets allows the professor to decide which datasets should be made available to the student.
Given this restricted access, the professor is able to decide how much time the student spends structuring
and entering data. All datasets are described in the next section of this preface.
The professor-facing side of the website also provides the answers, with explanations, for exercises 4 and
5 found at the end of each chapter. These two exercises and answers make excellent source material to
test student chapter competence. These two additional problems could also be assigned as lab or homework
material. This section also contains a mix of 10 true/false and multiple-choice quiz-type questions. Also
included for each of these 10 questions are answers, cognitive domain according to Bloom’s taxonomy,
answer location, and difficulty level.
The student-facing section contains many of the datasets, but not all. As mentioned above, selected datasets
appear only on the professor-facing section of the website. These hidden datasets are intended to provide
the student with the challenge of directly structuring and entering data. An example of such a dataset can
be found in Appendix A. The major portion of Chapter 5 is devoted to step-by-step instructions on how to
structure and enter the data found in Appendix A. Other datasets are found throughout the book and are
intended to add to the student’s data entry learning experience. The student-facing section also contains 10
true/false and multiple-choice practice quiz questions with answers.
All datasets used in this book, except the SPSS sample files, are presented in the text in table format. These
data tables are provided for those users not having direct access to the companion website. A complete
list of all datasets used in this third edition, organized by chapter, can be found in Appendix D.
As in any book concerned with data analysis, many datasets and a large amount of data are required. There
are three main categories of datasets used in the third edition: (1) SPSS sample files, (2) datasets used as
chapter examples, and (3) datasets used in Review Exercises.
Many datasets from the SPSS sample files are used throughout the book. These datasets contain data
manufactured by SPSS for instructional purposes and are installed as part of the SPSS software package.
Instructions on how to open these important files are given in Section 3.6.
These IBM datasets can also be directly downloaded from the companion website. These SPSS sample files
are used in the chapters as examples and also in many of the Review Exercises. These require that you
download them and then follow bullet points to perform the analysis. Datasets with only a name and “.sav”
are identified as SPSS’s sample files.
The data used in chapters explaining statistical procedures can be SPSS sample files or datasets created by
the student. Many of the earlier chapters give very specific instructions on how to input the variable and data
information that is provided in tables. Doing this is intended to teach the student how to structure variables and
data in a manner “understood” by the SPSS software. The datasets used in chapter examples are available
on the companion website (study.sagepub.com/aldrich3e), some with access for the professor only. This is
done to give the professor the option of making them directly available to the student—as some are lengthy
and can be time-consuming to enter. The naming of these example datasets follows the same general pattern
of hospital_expl_chap12.sav. Note the “expl” and “chap12” in the naming of these datasets. Datasets with
“expl” in their name are used in chapter examples.
The datasets used in the Review Exercises at the end of each chapter are also a combination of SPSS
sample files and student-entered data. Many of these can be downloaded directly from the companion
website by the student. These student-created datasets have the following naming convention:
prob_24.3_church_attend.sav. Note that these datasets always begin with “prob”; datasets beginning with
“prob” are used in the Review Exercises.
In some cases, actual data were used, such as the dataset listed in Appendix A (Tables A.1 and
A.2), called class_survey_1_expl.sav. However, in most instances, especially in the Review Exercises, the data
were manufactured for the purpose of demonstrating a particular statistical technique. The results of the
demonstrated analysis should be considered as a demonstration of a statistical process—not as research
facts. We encourage readers to use their own data to duplicate some of the techniques illustrated in this book.
Book’s Uniqueness
A novel approach taken in this book is the inclusion of parametric and nonparametric statistical tests in
the same chapters. Other books describe parametric and nonparametric tests in separate chapters, which
tends to add unnecessary confusion. Placing the nonparametric and parametric tests together in the same
chapter makes it easier to learn how and when to use each test.
The book is unique in that it encourages the reader to interact with IBM SPSS on the computer as he or she
works through the examples and Review Exercises in each chapter. Every effort has been made to ensure
that the book is “user-friendly” as the reader is guided through the interactive learning process. Bulleted
phrases provide straightforward step-by-step instructions that are followed by the reader to successfully
complete the statistical procedures.
This third edition of Using IBM® SPSS® Statistics: An Interactive Hands-On Approach continues to be a
useful resource for readers who have some background in statistics. However, it will also provide a wealth of
basic information to those individuals who know little or nothing about statistics. What this means is that this
book is for those who want SPSS to do the actual statistical and analytical work for them. They want to know
how to organize and code the data and then enter them into SPSS in a way that allows SPSS to make sense
of them. Once this is accomplished, they want to know how to ask SPSS to analyze the data and produce a
report with tables and charts in a manner understood by the user. In short, they want the IBM SPSS Statistics
software package to accomplish the tedious work needed to successfully accomplish statistical analysis!
All chapters include bullet points, screenshots, and callouts showing the reader exactly how and where to
enter SPSS commands. Chapters and appendices are briefly described next.
The material covered in Chapters 1 through 8 provides basic but essential information regarding navigating
in SPSS, getting data in and out of SPSS, and determining the appropriate level of measurement required
for a specific statistical procedure. Chapters 5 and 6 describe additional methods for entering data, entering
variable information, computing new variables, recoding variables, and data transformation. In Chapter 5, you
will enter variable descriptions and raw data from an important dataset (class_survey_1_expl.sav) found in
Appendix A. This dataset will be used in many of the subsequent chapters. Chapter 7 provides directions
for printing files, the output from statistical analysis, and graphs. Chapter 8 describes and explains the Help
Menu available in SPSS and how to find information on various statistical tests and procedures.
Chapter 9 describes and explains basic descriptive statistics. Chapters 10 and 11 provide hands-on
experience in creating and editing professional-quality graphs for data at all levels of measurement.
Chapters 12 through 21 provide hands-on experience in employing the various statistical procedures and
tests available in SPSS, including both parametric and nonparametric tests.
In the final section, you will find chapters on correlation, regression, and factor analysis.
Appendices
Appendix A contains an essential dataset that is entered by the student in Chapter 5. It is named and
saved—as class_survey_1_expl.sav—and then used throughout the book. This dataset is then
modified—saved as class_survey_2_expl.sav—and used in other chapters. Appendix B provides
the reader with examples of normal curve probability problems. The appendix includes figures of various
normal curves, a z-table, and three exercises demonstrating the use of the z-table and its application to the
normal curve. Appendix C gives the answers and detailed explanations for the first three Review Exercises
that are provided at the end of each chapter. Appendix D presents a comprehensive list, by chapter, of all
datasets used in the third edition.
As the reader will note in the first lesson in Chapter 1, a simple format is used to assist the reader in
responding to requests. The reader will be moving the mouse around the computer screen and clicking and
dragging items. They will also use the mouse to hover over various items in order to learn what these items
do and how to make them respond by clicking on them. Things the reader should click on or select are in
boldface. Other important terms in the book are in italics. Still other items are sometimes enclosed in quotes.
The reader will often be requested to enter information and data while working through the examples and
exercises in this book. To help in this procedure, we often present figures that show SPSS windows and then
show exactly, using step-by-step bulleted points, where to enter this information or data from the keyboard.
And, at times, we use callouts in combination with screenshots to clearly show control points and where to
click or unclick specific items.
New to the third edition is the development of a companion website (study.sagepub.com/aldrich3e) that
makes it possible for the student to directly download many of the lengthier datasets presented in the Review
Exercises. Some of the datasets used in the chapter examples can be downloaded from the book’s website
but only with the permission of the professor.
In Summary
The IBM SPSS Statistics program is an outstanding, powerful, and often intuitive statistical package. A
primary reason for writing this book was to make the benefits of the SPSS program available, not only to the
novice, but also to the more experienced user of statistics. We feel this third edition is appropriate for lower-
division and upper-division undergraduate courses in statistics and research methods. As the book expanded
over the two prior editions, it has proven to be useful for students at the master’s and doctoral levels as well.
The book has been shown to be helpful for students and professionals seeking an introduction to the more
complex statistical methods and how they are handled by SPSS. Students have also found value in the text
when doing the analysis required for researching and writing thesis and dissertation projects.
SPSS tutorial videos on key topics from the book are available online at study.sagepub.com/aldrich3e
Notes
1 IBM and SPSS are registered trademarks of International Business Machines Corporation.
Acknowledgments
I first thank my students, who for many years followed my often hastily written instructions on how to get
SPSS to do what it was supposed to do. Continued thanks and appreciation go to James Cunningham, who
initiated and coauthored the first edition. I also thank Hilda M. Rodriguez for her careful and tireless review of
all the SPSS steps and screenshots presented in all three editions of this book.
I wish to thank the professionals at SAGE Publications for their valuable contributions to the publication of
this book. They were always there, from the initial drafts, throughout production, and finally to marketing.
My first experience with SAGE was excellent, but it has only gotten better over the years while publishing
two books and the third edition of this one. I realize more and more that publication requires a team of
dedicated individuals, and I am proud to work with such a team envisioned by SAGE’s founder, Sara McCune.
My first contact with SAGE was with, now retired, Vicki Knight, who saw the merit in the first edition and
made this writing project possible. The new acquisitions editor, Leah Fargotstein, moved seamlessly into that
position and always had words of encouragement as, together, we structured a much revised third edition.
Editorial Assistant for Research and Statistics Elizabeth Wells and Production Editor Andrew Olson always
kept me on track during the editing and production process. Content Development Editor Chelsea Neve was
always responsive to my many questions and concerns. Thanks also to Barbara Coster for her excellent
proofreading. Glenn Vogel produced a perfect cover for the book. Many thanks to Shari Countryman,
marketing manager, and Jade Henderson, marketing associate, for their efforts in bringing our work to the
attention of potential users. I am especially thankful for the work done during the copy editing process by Terri
Lee Paulsen. Terri’s attention to detail was very much appreciated and played a major role in the professional
appearance of this writing project. I also wish to thank Maria Sosnowski for a superb job on indexing.
We also thank V. Monica Young (Author’s Program) and Amy Bradley (External Submissions) at IBM Chicago
for their timely assistance in programming and permissions requirements. Kevin Renn of IBM’s External
Submissions facilitated permissions allowing the downloading of the SPSS Sample files directly from the
book’s SAGE website.
I, along with SAGE, would also like to acknowledge the contributions of the following reviewers:
About the Author

James O. Aldrich (Doctor of Public Administration, University of La Verne) is a retired lecturer in statistics
and research methods at California State University, Northridge. He has also taught graduate-level research
courses for the University of La Verne. Dr. Aldrich held the appointment of Instructor in the Department of
Pathology at the University of Southern California, School of Medicine, where he served as the principal
investigator and codirector of a National Cancer Institute research project. He has served on various
committees for the Los Angeles chapter of the American Statistical Association and has also taught
biostatistics, epidemiology, social statistics, and research methods courses for 20 years. The primary
computer program used for his coursework has been the IBM SPSS Statistics software package. SAGE
published, in 2013, Building SPSS Graphs to Understand Data, coauthored with Hilda M. Rodriguez.
First Encounters
Hi, and welcome to the IBM SPSS Statistics software package. Learning how to utilize the power of the SPSS program can have a tremendous impact on one’s life. This impact can be especially evident when one is searching for the truth, which is often hidden within a mass of data. One way to discover truth is to utilize the scientific method and mathematical calculations; both of these are made more easily accessible by using SPSS.
Science, when reduced to its simplest definition, can be said to be the measurement and analysis of observations made by humans. Taking this into account, one of the reasons for writing this book was to provide readers with the knowledge to use the power of the IBM SPSS Statistics software package to measure and analyze data from their own research/observations. This first chapter begins with a demonstration of the entry and analysis of data. It is the ability to analyze one’s own data, and see them come to life, that makes data analysis an exciting adventure into the unknown.

Many (or most) SPSS instructional textbooks only utilize existing datasets and provide minimal, if any, guidance on how to structure and enter data. Therefore, this first chapter, and the entire book, continues with the philosophy that it is wise to know how to enter personal data into the IBM SPSS software package. On leaving the academy and finding work in the real world, the ability to analyze data using SPSS can prove extremely useful in advancing one’s career. This third edition continues to provide the reader with many opportunities for actually entering data, not just opening existing datasets. Readers are encouraged to enter their own personal data, as this makes the discovery process that much more exciting.

There are few things in research more rewarding than making that final click on the mouse and watching your mass of numbers come to life with new meaning and purpose. Whether it’s a graph, a prediction equation, or perhaps a test showing a statistically significant difference between groups, the discovery of the unknown that was hidden within the data can be extremely gratifying. The rewards of data analysis can give, and often have given, new meaning to the lives of researchers and to entire societies that benefit from discovery. The major purpose of this book is to assist the student/statistician in exactly that—making discoveries with the assistance of the IBM SPSS Statistics software. This first chapter begins a journey leading to an understanding of how to use SPSS in making such discoveries in an effort to find statistical evidence supportive of truth.
With that being said, it is assumed that you know little about variables, values, constants, statistics, and those
other tedious things. But it is assumed that you know how to use a mouse to move around the computer
screen and how to click an item, select an item, or drag (move) an item.
An easy mouse-using and -typing convention has been adopted for you to respond to requests. For example,
if you are requested to open an existing file from the SPSS Menu, you will see click File, select Open, and
then click Data. In general, you will be asked to click an item, select (position the pointer over) an item, drag
an item, or enter data from the keyboard. Note that in SPSS, the columns in the spreadsheets run vertically
and the rows run horizontally, as in a typical spreadsheet such as Excel.
Objectives
In this section you are walked through your first encounter with SPSS and shown how to enter some data, analyze those data, and generate a graph. Once these steps are completed, you will have a better understanding of the data by viewing the table and graph.
If you see the IBM SPSS icon anywhere on the screen, simply click it; otherwise, locate your computer’s program files, and open SPSS from there. Once SPSS starts, a screen will appear, which can take different forms depending on the SPSS version you are using. There are some useful shortcuts in these SPSS opening windows, but for now simply close the window. When the window closes, you will see the Data Editor spreadsheet on the screen. This screen can appear in two different ways depending on which tab is clicked at the bottom of the Data Editor screen. These two tabs, Data View and Variable View, are together called the SPSS Data Editor. When you wish to enter or view variable information, you click the Variable View tab, and when you wish to enter or view data, you simply click the Data View tab. Figures 1.1 through 1.4 provide pictures of various portions of these two screens.
Let’s get started with a demonstration of the bullet point part of this introduction to SPSS. Within the text of
various bullet points you will often find parentheses containing figure numbers. These referenced figures will
assist the SPSS user when following the step-by-step instructions.
Figure 1.1 ⬢ Upper-Left Portion of the Variable View Screen of the SPSS Data Editor
Figure 1.2 ⬢ Lower Portion of the Variable View Screen of the SPSS Data Editor
• At the top of the screen, type the word Eagles in the cell (this is the cell below Name and to the right
of Row 1). The callout (balloon) shown in Figure 1.3 points to the cell in which you are to enter the
variable name “Eagles.” Cells are the little boxes at the intersection of columns and rows.
• Click the cell below Measure, and select Nominal (see Figure 1.1).
• At the bottom of the screen, click Data View (note that the screen’s appearance changes slightly).
• You will now enter the number of eagles observed on five consecutive days at the top of Holcomb
Mountain. The callout in Figure 1.4 shows exactly where to type the number 3 (Row 1 and Column
1); for now, don’t worry about the decimal points (this will be addressed in the next chapter).
• Click in Row 2, and type 4; click in Row 3, and type 2; click in Row 4, and type 1; and finally click in
Row 5, and type 6. Your screen should now look as shown in Figure 1.4. If you make a mistake in
entering the numbers, just click the cell and reenter the correct number.
• After you have entered the five pieces of data, check carefully to see if the entries are correct. If they
are, save your work as follows: Click File, and then click Save As.
• A window titled Save Data As will open, in which you will enter a name for your work (throughout this
book, work like this is referred to as a project). For this project, enter the name eagles_expl.sav in
the File Name box (you will use this saved dataset later). The Look in box (located in the middle of
the window), showing where the file will be saved, should have an entry titled Documents (if not, then
click the black arrow and scroll until you see Documents). Click Save. Your data have now been
saved in the Documents section of your computer.
• An Output window may open; if so, close it by clicking the white “x” in the red box. Another dialog
box may open asking if you wish to save the output; click No.
• Let’s continue with the exercise. On the SPSS Menu at the top of the screen, click Analyze, select
Descriptive Statistics, and then click Frequencies. A window will appear titled Frequencies. Drag
Eagles to the Variable(s) panel, or click Eagles and then click the right arrow to place Eagles in the
Variable(s) panel (both methods work equally well).
• Click the Statistics button (the Frequencies: Statistics window opens). In the Central Tendency panel, click Median and Sum, then click Continue.
• Click OK (another screen opens, titled Output IBM SPSS Statistics Viewer, which shows the results
of the analysis just requested). Look at Figure 1.5 for these results.
• On the Main Menu, click Graphs, select Legacy Dialogs, and then click Bar.
• The Bar Charts window opens; click Simple, and then click Values of Individual Cases. Click Define.
• The Define Simple Bar: Values of Individual Cases window opens. Click Eagles and drag it to the
Bars Represent box, or click the right arrow to place Eagles in that box. Click OK. A simple bar
graph will appear in the same Output IBM SPSS Statistics Viewer screen below the table, as shown
in Figure 1.6.
After you have reviewed the graph, you will save the Output IBM SPSS Statistics Viewer screen, which contains the results of your analysis and the graph. In the future this screen will simply be referred to as the Output Viewer.
• In the Output Viewer screen, click File, and then click Save As.
• A window titled Save Output As will appear. In the File name box, type eagles_expl (SPSS saves Output Viewer files with their own .spv extension, so this will not overwrite your eagles_expl.sav dataset). Note that the file name is all lowercase and does not include any embedded spaces (blanks). The Look in box indicates the location where your file will be saved and should have an entry titled Documents. Click Save.
• After saving your work, your Output Viewer screen will remain. Click the white “x” in the red box
found in the top right corner to make it go away.
Congratulations! You have just used SPSS (perhaps for the first time) to analyze some data and provide some
statistical results and a graph. Looking at the Frequencies table shown in Figure 1.5, we see that 16 eagles were observed over a period of 5 days, with a median of 3 per day. The bar graph seen in Figure 1.6 provides the details regarding each day’s observations. For example, we see that Day 5 yielded the most eagle sightings at 6, while the fewest were observed on Day 4, when only 1 was seen.
Admittedly, the statistical analysis and graph are not that exciting. But they do show you that SPSS is not difficult to use. Of course, you could have used a handheld calculator to do the same analysis in a few minutes. But suppose you had 50 different variables, such as height, weight, eye color, and so on, and thousands of cases for each of the variables! Using a calculator to analyze these data would be a monumental task. But SPSS can handle 10,000 cases just as easily as the five shown above.
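Everything in this walkthrough uses the SPSS menus. For readers who want to double-check the results outside SPSS, here is a minimal sketch in Python; pandas and matplotlib are assumptions of this sketch, not part of the book’s workflow:

```python
import pandas as pd
import matplotlib.pyplot as plt

# The five daily eagle counts entered in the walkthrough above.
eagles = pd.Series([3, 4, 2, 1, 6], index=[1, 2, 3, 4, 5], name="Eagles")

print("Sum:", eagles.sum())        # 16, matching the Frequencies table
print("Median:", eagles.median())  # 3.0

# Rough equivalent of Graphs > Legacy Dialogs > Bar (values of individual cases).
eagles.plot(kind="bar")
plt.xlabel("Day")
plt.ylabel("Eagles observed")
plt.show()
```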
• If you wish to exit (quit using SPSS) at this time, click File, and then click Exit.
1.3 Summary
In this chapter, you learned how to enter variable names and data. You also learned how to generate a
basic table of statistics and a graph summarizing those statistics. In the next chapter, you will learn to
navigate in SPSS. You will be introduced to the Main Menu, the Toolbar editor, and the options avail-
able for these. Finally, you will be introduced to the various dialog boxes and windows in SPSS that
allow you to enter information regarding your variables.
1.4 Review Exercises

Answers for Exercises 1, 2, and 3 can be found in Appendix C, while answers for 4 and 5 can be found on the professor-facing portion of the companion website (study.sagepub.com/aldrich3e).
1.1 You have classified the size of several fish that were caught in a “catch and release” fishing contest for children as small, medium, and large. The numbers of fish caught by the children are 32 small, 21 medium, and 11 large. Note: When inputting these data and information, you are not required to
enter the names for the categories of the fish (small, medium, large). SPSS calls these categories
Labels and Label Values. You will learn to input this information in the next chapter. Input the variable
information and data, and build a frequency table and a bar graph. Naming and saving this dataset
is optional.
1.2 One day you are sitting in your professor’s office getting help on regression analysis. His phone
rings; he apologizes but says that he must take the call. As you wait for him to end his phone call, you
scan his bookshelves and make mental notes of the titles. You arrive at the following: 15 books on introductory statistical analysis, 12 on advanced statistics, 3 on factor analysis, 8 on various regression
topics, 13 on research methods, and 2 on mathematical statistics. You think to yourself, “Wow! This
guy must have an exciting life!” As in the previous exercise, don’t concern yourself with the category
labels for the textbooks. For now, just input the data and variable information, build a bar chart, and
generate a descriptive table. Naming and saving this dataset is optional.
1.3 There was a quarter-mile drag race held at the abandoned airport last week. The makes of the
winning cars were recorded by an interested fan. The results of her observations were as follows:
Chevrolets won 23 races, Fords won 19 times, Toyotas won 3, Hondas won 18, and KIAs won 8
races. As in the previous two exercises, don’t concern yourself with the categories’ labels for the
makes of the cars. Your task is to enter these data and generate a bar graph and a frequency table.
Naming and saving this dataset is optional.
1.4 The durability of outside house paint was studied under actual outdoor conditions. Four unique
brands—A, B, C, and D—were utilized in the test. A rating scale evaluated each paint’s durability
over a period of one year. The scale went from 1 to 10, with 10 being the most durable. The results
of the test follow: brand A = 6, brand B = 4, brand C = 8, and brand D = 9. You must now enter
these data, generate a bar graph (using legacy), and then name and save the dataset. Interpret the
graph and decide which brand is most durable and which is the least. Labels are not required at this
time—they are covered in the next chapter.
1.5 A demographer has collected the following data and must now use SPSS to build a simple bar graph to display her findings. At this time don’t be concerned about labels as the x-axis will automatically display the ordinal scale as shown in the table—just build the simple bar graph using SPSS.
Naming and saving this dataset is optional. The data are given in the following table:
Appendix A

Table A.1 ⬢ Variable View Settings for the Class Survey Dataset

| Name | Type | Width | Decimals | Label | Values | Missing | Columns | Align | Measure |
| class | Numeric | 8 | 0 | Morning or Afternoon Class | 1 = Morning, 2 = Afternoon | None | 8 | Left | Nominal |
| predict_grde | Numeric | 8 | 0 | Student’s Predicted Final Grade | 1 = A, 2 = B, 3 = C, 4 = D, 5 = F | None | 8 | Left | Nominal |
| gender | Numeric | 8 | 0 | Gender | 1 = Male, 2 = Female | None | 8 | Left | Nominal |

Two further variables appear in the original table; only their value labels are recoverable here: 1 = Much anxiety, 3 = Little anxiety, 4 = No anxiety; and 1 = Excellent, 2 = Very good, 4 = Below average, 5 = Poor.
Table A.2 ⬢ The Class Survey Data

1 100 83 1 2 4 2
1 50 68 3 2 2 1
1 78 68 3 2 2 1
1 50 78 3 1 2 1
1 97 74 2 2 3 2
1 41 71 3 2 2 1
1 30 72 3 1 1 2
1 31 83 2 1 1 1
1 71 63 2 2 2 1
1 85 89 1 2 3 1
1 86 93 2 2 2 1
1 67 64 2 2 1 1
1 52 100 2 1 2 1
1 88 83 1 2 4 1
1 25 23 1 1 1 1
1 100 100 1 2 2 2
1 14 71 3 2 2 2
1 60 75 3 2 1 2
2 93 84 1 2 2 1
2 94 93 1 1 4 1
2 90 89 1 1 2 2
2 78 80 2 1 2 1
2 50 84 3 2 1 1
2 74 50 2 2 3 1
2 62 93 2 1 4 1
2 80 81 2 2 2 1
2 87 97 1 1 2 1
2 25 61 3 2 1 1
2 50 82 2 1 2 1
2 99 93 2 2 3 1
2 50 64 3 2 1 1
2 100 100 1 2 2 1
2 66 62 3 2 2 1
2 50 100 3 2 1 1
2 100 94 1 2 4 1
2 26 53 4 2 1 3
2 41 72 4 2 1 1
Appendix B

The normal curve shown in Figure B.1 presents percentages and probabilities associated with areas under
the curve. This normal curve figure can be used, in conjunction with the z-table (Figure B.8), to identify
specific z-scores that correspond with those areas. These z-scores are then used to answer various research
questions presented in the remaining pages of this appendix. Important Note: All the examples below utilize
the critical values when the rejection area is .05 (SPSS’s default), but the same principles hold true for any
other level such as .01, .001, or perhaps .1.
To illustrate how SPSS uses the standard normal curve (the model) to estimate unknown sample values, let’s
look at the following examples.
Pickup trucks carrying cut firewood for home delivery were checked for weight at a state highway weigh
station. The weights of the individual loads of firewood were recorded for a period of 1 year and found to
have a mean weight of 1,500 pounds and standard deviation of 100. It was also noted that the weights very
closely followed the normal curve model. The weigh station chief wanted to take a random sample of next
year’s trucks passing through the weigh station rather than stopping and checking every truck. The chief then
asked the department’s statistician to develop a random sampling plan that was based on the prior year’s
experience with such trucks. This was an attempt to stop fewer firewood-carrying trucks and thus release
personnel to pursue more revenue-generating activities—like stopping bigger trucks. He wanted to determine
certain expected outcomes based on the sample data. Three questions will serve to illustrate the value of the
normal curve (as a model) in determining the probability of detecting certain weights of firewood in the sample
trucks.
Question 1. Determine the probability that a truck carrying between 1,450 and 1,600 pounds of firewood
would pass through the weigh station. Also report the percentage of trucks that would likely be carrying
firewood weighing between 1,450 and 1,600 pounds.
We begin by converting the basic raw scores, 1,450 and 1,600, to z-scores that can then be used with the
z-table (Figure B.8) and the normal curve (Figure B.1) to identify probabilities and percentages.
$$z_1 = \frac{x_1 - \bar{x}}{s} = \frac{1450 - 1500}{100} = -0.5$$

$$z_2 = \frac{x_2 - \bar{x}}{s} = \frac{1600 - 1500}{100} = 1.0$$
You now have the z-scores needed to read and interpret the normal curve’s areas under the curve. Look at
Figure B.2, showing two separate curves: the first is the normal curve, which is the model, the second is the
raw data, which approximates the model. If we look at the model and determine the proportion of the area
under the curve, we can infer the same for the curve that represents the raw data. Consulting the z-table
(Figure B.8) we determine that if you have a z-score of –.5, then the cumulative area under the curve up to
that point is .3085. Consulting Figure B.8 once again we see that the cumulative area under the curve up to a
z-score of 1.0 is .8413. By subtraction (.8413 – .3085 = .5328) we find that the area under the curve that we
are interested in is .5328. In other words, by determining that the area under the model’s curve is .5328, and
since the raw data approximates the same curve, we infer that the same area is under the curve representing
the raw data. Since .5328 is a proportion of the total area under the curve (which is 1), we may now state that
there is a .5328 probability (just a little over half) that a load of firewood will weigh between 1,450 and 1,600
pounds. It can also be said that 53.28% of the pickup trucks would carry between 1,450 and 1,600 pounds of
firewood.
Figure B.2 Normal Curve (the Model) and the Raw Score Curve
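The book works Question 1 with the printed z-table. The same cumulative areas can be checked programmatically; here is a minimal sketch using Python’s scipy.stats, which is an assumption of this sketch and not part of the book’s SPSS-based approach:

```python
from scipy.stats import norm

# P(1450 <= weight <= 1600) under a normal model with mean 1500 and sd 100:
# convert both raw scores to z-scores and subtract the cumulative areas.
z1 = (1450 - 1500) / 100   # -0.5
z2 = (1600 - 1500) / 100   #  1.0
p = norm.cdf(z2) - norm.cdf(z1)
print(round(p, 4))  # 0.5328, matching the z-table result above
```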
Question 2. What percentage of firewood loads would be expected to weigh at most 1,680 pounds? What is
the probability that a randomly selected truck will have a load of firewood weighing at most 1,680 pounds?
It should be noted that the curve in Figure B.3 in this example does not show the model’s curve but only the
curve showing the raw data and its z-transformation. Both curves were shown in the first question in order
to facilitate the visualization of the relationship between the model and the raw score curves. Now, on to
answering question 2.
$$z = \frac{x - \bar{x}}{s} = \frac{1680 - 1500}{100} = \frac{180}{100} = 1.8$$
Now that we have a z-score of 1.8, the normal curve table (Figure B.8) is consulted, and we see that the cumulative area under the curve, up to a z-score of 1.8, is .9641. Figure B.3 makes it possible for the reader to visualize the relationships between the raw score, the z-score, and the area under the curve.
The results indicate that 96.41% of the firewood loads carried by the pickup trucks would weigh at most 1,680
pounds. The probability that a randomly selected truck will carry firewood weighing at most 1,680 pounds is
.9641.
Question 3. What percentage of firewood loads would be expected to weigh at least 1,550 pounds? What is
the probability that a randomly selected truck will carry a load of firewood weighing at least 1,550 pounds?
As before, we begin by converting the raw score of 1550 to its z-score equivalent.
$$z = \frac{x - \bar{x}}{s} = \frac{1550 - 1500}{100} = \frac{50}{100} = 0.5$$
Now that we have the z-score of .5, we can look at Figure B.8 and see that the cumulative area up to
a z-score of .5 is .6915. Figure B.4 makes it possible to once again visualize and examine the relationships
between raw score, z-score, and the area under the curve. The cumulative area of .6915 must be subtracted
from 1 since we are only interested in firewood loads that weigh “at least” 1,550 pounds. Hence, we have 1 –
.6915 = .3085.
The results indicate that 30.85% of the firewood loads carried by the trucks would weigh at least 1,550
pounds. The probability that a truck would be carrying a load weighing at least 1,550 pounds is .3085.
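The same kind of cross-check works for Questions 2 and 3. In this sketch, scipy’s survival function (the area beyond a z-score) handles the “at least” case directly:

```python
from scipy.stats import norm

# Question 2: P(load <= 1680) is the cumulative area up to z = 1.8.
print(round(norm.cdf((1680 - 1500) / 100), 4))  # 0.9641

# Question 3: P(load >= 1550) is the area beyond z = 0.5, i.e., 1 - .6915.
print(round(norm.sf((1550 - 1500) / 100), 4))   # 0.3085
```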
There are many, many more types of normal curve problems. The purpose here was to demonstrate the
important theory that underlies the practical use of the IBM SPSS Statistics software package to solve such
problems.
The rejection area under the curve is the area describing the probability of the occurrence of rare outcomes.
Another way of referring to such “rare” outcomes is to think of them as having occurred by “chance.” This is
especially important when we are seeking to reject a null hypothesis of equality. An example would be when
we are looking for differences between two groups, say on test scores. Such research would set the null
hypothesis to state that the groups are equal. You look to reject the null, which then indicates that there is
a statistically significant difference. If the observed difference, as shown by the results of the test, exceeds
what would be expected by chance, then that value would fall in the rejection area of the curve. Three normal
curves are shown in Figures B.5, B.6, and B.7, which make it possible for the reader to see various rejection
areas.
Once you calculate the test statistic, you would see if the statistic falls in either the upper or lower rejection
area as shown in Figure B.5. Is the value more positive than 1.96 or more negative than –1.96? The values of
1.96 and –1.96 are known as critical values. If the calculated statistic is more positive or more negative than
the critical values, then the null is rejected and you have a statistically significant difference. The particular
case, as shown in Figure B.5, is also known as a two-sided or non-directional test, which is more fully
explained in Section B.3.
In Figure B.6, you have the situation where you are only looking for a test statistic value greater than 1.65.
Such a finding would then permit the rejection of the null and once again you have a statistically significant
difference that provides evidence in support of the alternative hypothesis. This is known as a one-sided or
directional test and is also discussed in the following section.
Figure B.7 shows the situation where the researcher hypothesizes that one group would have a z-score that
was significantly lower than the other. In this case you hope that the data generate a value more negative
than the critical value of –1.65, which would land in the rejection area and permit the rejection of the null
hypothesis. Such an outcome would then provide evidence in support of the alternative hypothesis.
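The critical values labeled in Figures B.5 through B.7 come from the inverse of the cumulative normal distribution. A short sketch, assuming the .05 level used throughout this appendix:

```python
from scipy.stats import norm

alpha = 0.05

# Two-sided test (Figure B.5): alpha is split between the two tails.
print(round(norm.ppf(1 - alpha / 2), 2))  # 1.96
# One-sided test (Figures B.6 and B.7): all of alpha sits in one tail.
print(round(norm.ppf(1 - alpha), 3))      # 1.645, which the book rounds to 1.65
```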
For this section on directionality, you are directed to look at the three normal curves given in the prior section (Figures B.5, B.6, and B.7).
If the researcher has evidence that a difference would only occur in one direction, then a one-tailed test of
significance could be used. An example might be an owner of a coffee vending machine service who suspects
that her machines are dispensing too much coffee. She might obtain a random sample of her machines and
determine the amount they are dispensing and then test whether the observed amounts are greater than
some desired amount. This is known as a one-sided or directional test as all of the rejection area is on one
side of the curve. Look at Figure B.6. She could also be concerned that the machine might be dispensing too
little coffee. In that case, look at Figure B.7 and you see that in order to reject the null hypothesis you would
need a value more negative than –1.65.
If the coffee machine owner has no evidence of direction (too much or too little coffee), then a two-tailed test
of significance is appropriate. If we apply the concept of a two-sided test to the coffee machine scenario,
we would say that she is concerned as to whether the machines are either dispensing too little or too much
coffee. She would be looking for test values either more negative or more positive than 1.96. This outcome is
depicted in Figure B.5.
Note on SPSS. Each statistical test in SPSS lists results for a two-sided test. This is because the differences
for the rejection area, in a directional and non-directional test, are very small. If the test is actually a one-sided
test, it is slightly easier to obtain a significant difference at a given level of significance. For a significance
level of .05, the rejection area for a directional test is all on one side of the curve, making it easier to reject the
null. For a two-sided (non-directional) test, the .05 area is divided by 2; thus, you have a rejection area of .025
on each side of the curve. If the direction of the test is critical, then there is a way to have SPSS perform the
one-sided version. It is a little tricky, but it can be done. Very briefly, when setting up the test as presented in
Chapters 13, 14, and 15, you must change the confidence level. For instance, if you want to have a one-sided
test with the rejection area of .05, you change the confidence level to 90% from the SPSS default of 95%.
This then puts the area of .05 on both sides of the curve—making it a pseudo non-directional test. However,
the SPSS output still specifies the two-tailed significance level Sig. (2-tailed), which you must then divide by
2 for the exact probability level.
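The divide-by-2 relationship described in this note can be illustrated numerically. The test statistic below is made up for illustration; it is not output from any SPSS procedure:

```python
from scipy.stats import norm

z = 1.80  # hypothetical test statistic, in the predicted direction

p_two_sided = 2 * norm.sf(abs(z))  # what SPSS reports as Sig. (2-tailed)
p_one_sided = p_two_sided / 2      # the directional probability described above
print(round(p_two_sided, 4), round(p_one_sided, 4))  # 0.0719 0.0359
```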
The term degrees of freedom is defined as the number of observations free to vary around a population value
(parameter). In short, the number of degrees of freedom is the maximum number of variates that can freely
be assigned before the rest of the variates are completely determined.
Let’s consider a concrete example to help clarify the concept of degrees of freedom. Consider the following:
2 + 4 + 6 + ? = 20. You can determine the missing number by subtracting the sum of the first three numbers
from 20, yielding 20 – 12 = 8. There are no degrees of freedom. Now, consider the following: 2 + 4 + ? + ? =
20. There is an unlimited combination of numbers, the sum of which is 14, including 7 + 7, 6 + 8, 9 + 5, 3.3
+ 10.7, and so on. However, if you choose, for example, the number 11 as one of the missing numbers, then
the other number is determined, and in this case, it is 3. Consequently, in this example, there is 1 degree of
freedom.
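The arithmetic of this example is easy to mirror in code. A trivial sketch of the second case, where one value is chosen freely and the last value is then completely determined:

```python
total = 20
given = [2, 4]      # the two values supplied by the example
free_choice = 11    # 1 degree of freedom: pick any number here...
last = total - sum(given) - free_choice
print(last)         # ...and the remaining value is forced to be 3
```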
Each statistical test of significance in SPSS has a particular formula for determining the degrees of freedom.
You will see the result of SPSS’s calculation of degrees of freedom in output tables designated as df.
Appendix C

1.1 You have classified the size of several fish that were caught in a “catch and release” fishing contest for children as small, medium, and large. The numbers of fish caught by the children are 32 small, 21 medium, and 11 large. Note: When inputting these data and information, you are not
required to enter the names for the categories of the fish (small, medium, large). SPSS calls these
categories Labels and Label Values. You will learn to input this information in the next chapter. Input
the variable information and data, and build a frequency table and a bar graph. Naming and saving
this dataset is optional.
Answer:
1.2 One day you are sitting in your professor’s office getting help on regression analysis. His phone
rings; he apologizes but says that he must take the call. As you wait for him to end his phone call,
you scan his bookshelves and make mental notes of the titles. You arrive at the following: 15 books
on introductory statistical analysis, 12 on advanced statistics, 3 on factor analysis, 8 on various
regression topics, 13 on research methods, and 2 on mathematical statistics. You think to yourself,
“Wow! This guy must have an exciting life!” As in the previous exercise, don’t concern yourself with
the category labels for the textbooks. For now, just input the data and variable information, build a
bar chart, and generate a descriptive table. Naming and saving this dataset is optional.
Answer:
1.3 There was a quarter-mile drag race held at the abandoned airport last week. The makes of the
winning cars were recorded by an interested fan. The results of her observations were as follows:
Chevrolets won 23 races, Fords won 19 times, Toyotas won 3, Hondas won 18, and KIAs won 8 races.
As in the previous two exercises, don’t concern yourself with the categories’ labels for the makes of
the cars. Your task is to enter these data into SPSS and generate a bar graph and a frequency table.
Naming and saving this dataset is optional.
Answer:
2.1 You have designed a data-collecting instrument that has the following five variables measured at
the scale level (labels are given in parentheses; decimals are set to 3 and align to center): (1) “miles”
(speed in miles per hour), (2) “kilometers” (speed in kilometers per hour), (3) “hours,” (4) “minutes,”
and (5) “seconds.” Input this information into the Variable View screen, and then enter four cases of
fictitious data in the Data View screen.
Answer:
2.2 You must set up the SPSS Data Editor to analyze the three variables listed below on 30,000
individuals. The variables are (1) “age” (label is age in years, no decimals, center-aligned and scale
data); (2) “education” (label is years beyond H.S., no decimals, center-aligned and scale data); and
(3) “family” (label is number of siblings, no decimals, center-aligned and scale data). Make up and
enter data for three cases—now you only have 29,997 more to enter!
Answer:
2.3 You are the range safety officer at a long-distance firearms training facility. You have collected
the ballistic information on four rifles—data are given below. You would like to set up a data file in
SPSS to collect many hundreds of similar cases in the future. The variables are (1) “caliber” (with two
decimals, center-aligned and scale data); (2) “five hundred” (with two decimals, label is 500-yard drop
in feet, center-aligned and scale data); (3) “one thousand” (with two decimals, label is 1,000-yard
drop in feet, center-aligned and scale data); and (4) “weight” (having no decimals, label is bullet
weight in grains, center-aligned and scale data). Set up the SPSS Variable View page for this range
safety officer. There is no need to enter data for this exercise; however, four fictitious cases are
shown in the table below.
Answer:
3.1 With this review exercise, you will open an SPSS sample file workprog.sav. Show the first eight
variables as they appear in the Variable View.
Answer:
3.2 In this review exercise, you must import an Excel file from your computer and show the
appearance of the Open Excel Data Source window and the first six rows of the Variable View
screen. There should be an Excel file used as a demonstration for the Excel data program within
your system files. Its name is demo.xls, and it will be opened as an SPSS spreadsheet; examine the
file and observe that you can analyze the data as in any other SPSS spreadsheet. Show the first six
variables as they appear in the Variable View.
Answer:
3.3 Open another one of SPSS’s sample files called customer_dbase.sav (it has 132 variables and
5,000 cases) and save it in your document files. It’s another dataset that you will use several times
throughout this book. Show the first 6 variables as they appear in the Variable View.
Answer:
4.1 The following is a list of variables that an investigator wishes to use to measure the health
and survival of a particular species of earthworm: age, length, weight, moisture content, breed,
environmental factors, and acid content of the soil. Your job is to assist this researcher in specifying
the correct levels of measurement for these key variables and set up the Variable View screen in the
SPSS software package.
Answer:
4.2 A social researcher is interested in measuring the level of religiosity of a sample of senior
citizens. Help her in establishing the levels of measurement for the following variables: “pray” (do you
pray?), “services” (number of times you attend formal church services per year), “money” (donated to
church), “volunteer” (hours per year of volunteer assistance), “member” (are you an official member
of a church?), “discuss” (how many times each week do you discuss religious doctrine?), and “times
pray” (how many times per week do you pray?). Input these variables into the SPSS Variable View
screen.
Answer:
4.3 A political consultant wished to measure the level of politicization of the candidates for a job at the
White House. He decided that the following variables would provide at least some of the evidence
required to assess the extent of their interest in politics: “vote” (did you vote in the previous election?),
“letter” (the most recent letter sent to a politician), “meeting” (the most recent political meeting
attended), “donate” (how much money did you donate to a politician in the past year?), “candidate”
(have you run for office?), “party” (are you a member of a political party?), and “campaign” (have you
worked on a political campaign?). Your job is to input these variables and their labels into SPSS and
specify their levels of measurement.
Answer:
5.1 A highway patrol officer wants to set up an SPSS file to record traffic violations. She wishes to
record data at the nominal, ordinal, and scale levels of measurement. The first item of interest (the
largest source of income for the highway patrol) is speeding. Input three variables that could record
speed at each level of measurement. The next item of interest is vehicle violations—in the same
dataset set up a variable at the correct level of measurement and with three categories, if necessary.
Impaired driving is another important violation. How would you measure and record information for
this violation? Show how these data could be collected and present the appearance of SPSS’s
Variable View.
Answer:
If you were actually entering data for the first variable (“speed”), then the observed speeds would be recorded
in miles per hour (mph). The following are the suggested values for the above categories, but you can dream
up your own—(1) speedcat: 1 = <10 mph, 2 = >10 and <20 mph, and 3 = >20 mph; (2) speedyesno: 1 = yes
and 2 = no; (3) vehicle: 1 = serious danger, 2 = minor danger, and 3 = both; and (4) impaired: 1 = intoxicated,
2 = medical reason, and 3 = tired driver.
5.2 A child psychologist is investigating the behavior of children in the play area of Balboa Park. Help
him set up an SPSS file to measure the following variables on individual children: length of time that
the child was observed, pieces of play equipment used, other children in the play area, interaction
with others, interaction with parent, and child’s general demeanor. Input these variables into SPSS at
measurement levels of your choosing but appropriate for the variable being analyzed. Show SPSS’s
Variable View once your information has been entered.
Answer:
The following are the suggested values for the categorized variables—(1) “interaction”: 1 = yes and 2 = no,
(2) “guardian”: 1 = yes and 2 = no, and (3) “demeanor”: 1 = sad, 2 = happy, and 3 = very happy.
5.3 The following sample data were collected by the owner of a private 640-acre forest reserve. He
did a sample of 10 acres as a trial survey for the entire reserve. He needs to set up and test a
computer file system using SPSS’s Data Editor. The 10-acre sample was subdivided into 2.5-acre
parcels, with each yielding the following data: hardwood trees, softwood trees, new-tree growth,
stage of decay for fallen trees, soil moisture content, and crowded conditions. Your dataset will have
four cases (4 × 2.5 = 10) and seven variables. Enter some fictitious data for the newly created
variables on the four 2.5-acre plots.
Answer:
The following are the suggested codes for the categorized variables—(1) “newgrowth”: 1 = low, 2 = moderate,
and 3 = high; (2) “decay”: 1 = low, 2 = moderate, and 3 = high; and (3) “crowding”: 1 = yes and 2 = no.
6.1 An urban planner was tasked with recording the walking speeds (in miles per hour) of people
at a downtown government center. He recorded walking speeds of the same individuals over a
period of 5 days. A sample of the data is provided in the chapter along with the variable information.
There are three variables. You are to set up a dataset in SPSS and then use the Compute Variable
feature to create a fourth variable called “avgspeed.” There are five cases (the 5 days) that record
walking speed for the same three individuals. Save the dataset as prob_6.1_walk_speed.sav, which now includes the new variable (“avgspeed”)—you will use it in the next exercise. This dataset
may be found in Review Exercise 6.1 and also can be downloaded from the companion website at
prob_6.1_walk_speed.sav.
Answer:
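As a supplement to the Compute Variable dialog, the same step can be sketched in SPSS Syntax. This assumes the three walkers’ speeds are stored in variables named speed1, speed2, and speed3 (hypothetical names):
* Average the three walking speeds into the new variable, then save the file.
COMPUTE avgspeed = MEAN(speed1, speed2, speed3).
EXECUTE.
SAVE OUTFILE='prob_6.1_walk_speed.sav'.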
6.2 Open the dataset from Review Exercise 6.1 (prob_6.1_walk_speed.sav). Use the data from the
variable (“avgspeed”) created in that exercise to recode the values into a nominal (string) variable.
Your task is to use SPSS’s Recode into Different Variables feature and form two categories for the
average walking speeds of the individuals. The two categories are based on the average speed of
2.9 miles per hour for all walkers. All speeds above the mean are to be classified as Fast; those
speeds that are equal to or below the mean are classified as Slow. You will create a new nominal
or string variable called “catspeed.” Show the Data View with this new variable and how many were
classified as slow and fast. This dataset may also be downloaded from the companion website at
prob_6.2_walk_speed.sav.
Answer:
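A syntax sketch of the Recode into Different Variables step, assuming the “avgspeed” variable exists from the previous exercise:
* Declare the string target, then split at the 2.9 mph mean (2.9 itself is coded Slow).
STRING catspeed (A8).
RECODE avgspeed (LOWEST THRU 2.9='Slow') (ELSE='Fast') INTO catspeed.
EXECUTE.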
6.3 Using the data provided in Review Exercise 6.3, you must set up a dataset in SPSS and then
transform the data using the Compute Variable and arithmetic functions to calculate new variables
giving the log and square root of the original test scores. You must end up with a dataset consisting
of 10 cases and three variables named “test,” “logtest,” and “sqrttest.” Show SPSS’s Data View once
the operations are complete.
Answer:
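In syntax, the two transformations are one COMPUTE each. This sketch assumes the base-10 logarithm is wanted; LN(test) would give the natural log instead:
* Create the log and square root versions of the original test scores.
COMPUTE logtest = LG10(test).
COMPUTE sqrttest = SQRT(test).
EXECUTE.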
7.1 Open the SPSS sample file called workprog.sav and print the data for the first six variables of the
first 10 cases.
Answer:
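One way to print selected cases is the LIST command. A minimal sketch; substitute the actual first six variable names from workprog.sav (the three names below are only illustrative):
* List example variables for the first 10 cases.
LIST VARIABLES=age marital ed
 /CASES=FROM 1 TO 10.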
7.2 You have a meeting with your research group, and you wish to discuss some of the variables you
have selected for a project. The dataset is stored in an SPSS sample file called workprog.sav. What
is the quickest and easiest way to print all the information about your study variables?
Answer:
Use the File/Display Data File Information method to generate the Variable Information table in the Output Viewer.
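The equivalent syntax for the working file is a single command:
* Print the dictionary (variable information) for the open dataset to the Output Viewer.
DISPLAY DICTIONARY.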
7.3 You must analyze several variables of the workprog.sav dataset (SPSS sample file) and then
prepare a clean handout for your colleagues. You need basic descriptive statistics on all your scale
data and a frequency table for level of education for the categorical variables. Generate the output,
give it the title “Work Program Study,” and print the “cleaned” output.
Answer:
8.1 How can you use the SPSS Help function to get information on how to do a square root
transformation of data in an attempt to get a more normal distribution of values?
Answer:
Use the Help button on the Main Menu and Topics to obtain this information. Next click Data
Transformations>Computing Variables>Functions>Arithmetic functions.
8.2 You need help to determine the best way to summarize categorical (nominal) data for a public
presentation. As part of the summarizing process you also wish to build a graph from the data that
are stored as a large SPSS dataset.
Answer:
8.3 You have a large dataset opened in SPSS, and now you must summarize and describe several of
the nominal variables it contains. You decide to click Analyze on the Main Menu, then Frequencies;
the Frequencies window opens, and you realize that you need help—what do you do next?
Answer:
9.1 Open the SPSS sample dataset called bankloan.sav and calculate the following statistics for all
variables measured at the scale level: N, Range, Minimum, Maximum, Mean, Mean Std. Error, Std.
Deviation, Variance, Skewness, and Kurtosis. Print the SPSS Descriptive Statistics output that is
produced.
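In syntax, the request might look like the following sketch; the variable list is illustrative, and you would substitute all of bankloan.sav’s scale variables:
* Request the listed statistics for the scale variables.
DESCRIPTIVES VARIABLES=age income debtinc creddebt othdebt
 /STATISTICS=MEAN SEMEAN STDDEV VARIANCE RANGE MIN MAX SKEWNESS KURTOSIS.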
9.2 Open the SPSS sample dataset called bankloan.sav and produce frequency tables for all
variables measured at the nominal or ordinal level.
Answer:
9.3 Open the SPSS sample dataset called bankloan.sav and determine if the variables “age” and
“household income in thousands” are normally distributed.
Answer:
10.1 You are given the task of building a pie chart that summarizes the five age categories of 582
individuals recorded in an SPSS sample file titled satisf.sav. The variable is named “agecat” and
labeled Age category. Build a 3-D pie graph that displays the names of the categories, the numbers
of observations, and the percentages of the total for each slice. Also, answer the following questions:
(a) What percentage of people are 18 to 24 years old?; (b) How many individuals are 50 to 64 years
old?; (c) What is the largest age category?; (d) What is the quantity, and its respective percentage,
for the individuals who are 25 to 34 years old?; and (e) What is the quantity, and its respective
percentage, for the smallest category?
Answer:
(a) 7.90%; (b) 147; (c) 35 to 49 years old; (d) 127, 21.82%; (e) 32, 5.50%
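A quick way to get a basic pie chart in syntax is through FREQUENCIES; the 3-D effect and the data labels described above can then be added in the Chart Editor:
* Pie chart of the age categories with percentages.
FREQUENCIES VARIABLES=agecat
 /PIECHART PERCENT
 /ORDER=ANALYSIS.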
10.2 You must summarize the data for 8,083 individuals. Specifically, you are tasked with building
a 3-D pie graph showing the number and percentage of people in each of four regions. Open
dmdata3.sav (found in SPSS or the companion website), select the discrete variable named
“Region,” and build a pie graph. Embellish the graph to display the names of the categories, the
numbers of observations, and the percentages of the total for the slices to answer the following
questions: (a) What is the largest region, and what percentage of the total respondents does it
represent?; (b) Which region is the smallest, and how many individuals does it have?; (c) What is
the number of customers, and its respective percentage, for the North?; (d) What is the number of
customers, and its respective percentage, for the West?; and (e) Rank the regions from the smallest
to the biggest.
Answer:
(a) East, 25.79%; (b) South, 1,965; (c) 1,967, 24.34%; (d) 2,066, 25.56%; (e) South, North, West, East
10.3 You must visually display the relationship between two categorical variables using the
population pyramid graphing method. The variables you use are found in the SPSS sample file called
workprog.sav. The variable “prog” (program status) is the split variable, and “ed” (level of education)
is the distribution variable. Build the population pyramid to compare these discrete variables split
into two groups of program status, 0 and 1. It has been determined that the categories of 0 and 1
are approximately equal; therefore, these distributions can be directly compared. Also, look at the
finished graph and answer the following questions: (a) By looking at the graph, does it appear that
program status and level of education are related?; (b) Which of the six categories contains the most
observations?; and (c) Which category has the least number of observations?
Answer:
(a) Looking at the graph suggests that level of education does not differ across program status. This should be confirmed with a chi-square test but not now, as this test is not presented until Chapters
status. This should be confirmed with a chi-square test but not now, as this test is not presented until Chapters
20 and 21. (b) Most of the observations are contained in the “did not complete high school” category, with a
program status of “0”—244 individuals. (c) The least number of observations are found in the “some college”
category, with a program status of “0”—92 individuals.
11.1 For this exercise, you will build a simple histogram using the SPSS sample file known as
workprog.sav. Open the file, select the variable named “age,” and build a simple histogram. Use the
graph you build to answer the following questions: (a) Are the data skewed to the right, skewed to
the left, or normal?; (b) How many of these 1,000 individuals were aged between 15.5 and 16.5?; (c)
How many were aged between 18.5 and 19.5?; (d) How many people are aged between 19.5 and
20.5?; and (e) What is the average age for these 1,000 individuals?
Answer:
(a) normal; (b) 60; (c) 240; (d) 180; (e) 18.48
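The histogram can also be requested in syntax; the NORMAL keyword overlays a normal curve, which helps with question (a):
* Simple histogram of age with a normal curve for reference.
GRAPH
 /HISTOGRAM(NORMAL)=age.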
11.2 Open the SPSS sample file called bankloan.sav and build a boxplot and answer the following
questions: (a) What are the minimum and maximum ages, excluding any outliers and extremes?;
(b) What are the limits of the interquartile range?; (c) What is the interquartile range for this age
distribution?; (d) What is the median value?; and (e) Does the graph depict normally distributed data
or perhaps a negative or positive skew?
Answer:
(a) 20 min. and 56 max.; (b) 29 and 41; (c) 41 – 29 = 12; (d) 34; (e) positive skew
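A syntax sketch for the boxplot; EXAMINE also prints the median and quartiles used in questions (b) through (d):
* Boxplot plus descriptive statistics for age.
EXAMINE VARIABLES=age
 /PLOT=BOXPLOT
 /STATISTICS=DESCRIPTIVES.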
11.3 For this exercise, you will open the SPSS sample file called workprog.sav and select two
discrete variables and one continuous variable to build a histogram paneled on both columns and
rows. For the x-axis use the variable labeled as Age in years and named “Age.” The discrete variable
named “Marital” and labeled as Marital status will be the column-paneled variable. The row-paneled
variable is named “Ed” and labeled as Level of education. Look at the finished graph, and answer
the following questions: (a) Which of the row panels (Level of education) contains the most people?;
(b) Which of the six groups had the most participants in the age interval 17.5 to 18.5 years, and how
many are in that group?; (c) Which group had a distribution of ages that most closely resembled a
normal distribution?; and (d) What is the shape of the distribution for unmarried individuals with a
high school degree?
Answer:
(a) Most of the individuals in the work program did not complete high school; (b) The group of married people
with a high school education was the most populated, with 70 individuals; (c) Of all the groups, the group of
married individuals without a high school diploma most closely resembles the normal distribution; (d) Positive
skew.
12.1 Your research associate needs to determine the standard error of the mean for the weights
of a random sample of 24 El Salvadorian pigs and also construct the 95% confidence interval for
the point estimate of the mean. For this exercise, you must open and download the dataset found
in the companion website called prob_16.1_pig_farmer.sav. The data may also be found in Review
Exercise 16.1.
Answer:
Use Explore under Descriptives (Analyze/Descriptive Statistics/Explore), move the variables in the Explore window, and run the analysis. The Descriptives table in the resulting SPSS output gives the standard error of the mean and the 95% confidence interval.
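A syntax sketch of the Explore request, assuming the pig weights are stored in a variable named weight (a hypothetical name):
* Standard error of the mean and the 95% confidence interval for the mean.
EXAMINE VARIABLES=weight
 /PLOT=NONE
 /STATISTICS=DESCRIPTIVES
 /CINTERVAL 95.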
12.2 Raw data from a simple random sample of mid-level bank managers found a mean and
standard deviation of the sample. In the banking industry, such salaries are known to be normally
distributed. Of special interest to the researcher was a salary of $1,200 per week. When this amount,
$1,200, was transformed to a z-score, it was found to be equal to 1.60. Given this information, can
you state the probability that a randomly selected individual from the population of mid-level bank
managers would have a salary of at least $1,200? What percentage of individuals would you expect
to have salaries of at least $1,200?
Answer:
Consult the normal curve table in Appendix B (Figure B.8) and find that a z-score of 1.6 (–∞ to z of 1.60) is equal to a cumulative area of .9452 under the curve. Armed with this information, we can state that the probability that a person would have a salary of “at least” $1,200 is 1 − .9452 = .0548. The percentage of individuals expected to have salaries of at least $1,200 per week is 5.48%.
12.3 Open SPSS sample file bankloan.sav and determine if the variables with the labels of Age in
years and Household income are normally distributed. You may have to look at Section 9.4 for help
in selecting the correct test.
Answer:
The Kolmogorov–Smirnov test reveals that the two variables do not approximate the normal curve.
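The legacy syntax for the one-sample Kolmogorov–Smirnov test; the two variable names are illustrative stand-ins for Age in years and Household income:
* Test both variables against the normal distribution.
NPAR TESTS
 /K-S(NORMAL)=age income.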
13.1 You are a seller of heirloom garden seeds. You have several machines that automatically load
the seeds into packages. One of your machines is suspected of sometimes being inaccurate and
under-loading or over-loading the packages. You take a random sample of 20 packages and record
their weights (in ounces). In the past it has been determined that the average weight should be 2.88
ounces.
Write the null and alternative hypotheses and select the correct statistical test to determine if the mean of your
sample provides evidence that the machine is malfunctioning. The dataset may be found in Review Exercise
13.1 or downloaded from the website at prob_13.1_garden_seeds.sav.
Answer:
Check the sample data for normality using the Kolmogorov–Smirnov one-sample nonparametric test. If
normal, then use the one-sample t test with a test value of 2.88. H0: µ = 2.88 and HA: µ ≠ 2.88. You fail to
reject the null and determine that the machine is within tolerance—it is not malfunctioning. The mean value of
your sample, 2.92, is not significantly different from 2.88.
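A syntax sketch of the one-sample t test, assuming the package weights are in a variable named weight (a hypothetical name):
* Compare the sample mean against the historical value of 2.88 ounces.
T-TEST
 /TESTVAL=2.88
 /VARIABLES=weight.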
13.2 There is an annual race to reach the top of Kendall Mountain in the small town of Silverton,
Colorado. A health scientist believes that the time to reach the top has significantly changed over
the past 10 years. The average time for the first 12 runners to reach the summit in the 2005 race
was 2.15 hours. She records the times (in hours) for the top 12 runners in 2015 as given in Review
Exercise 13.2.
Can you help the scientist by writing the null and alternative hypotheses and selecting the appropriate test in
an attempt to produce evidence in support of her research hypothesis? The data for these runners can also
be downloaded from the companion website at prob_13.2_mountain_race.sav.
Answer:
H0: µ = 2.15 and HA: µ ≠ 2.15. The t test’s significance value of .044 allows you to reject the null and, therefore,
provides evidence in support of the alternative. There is now statistical evidence that the times are
significantly different. The direction indicates runners in the 2015 race having taken significantly more time to
reach the summit.
13.3 An instructor in a community college auto mechanics class asked his students if they thought
that gas consumption, as measured in miles per gallon (mpg), had improved for the Ford Focus in
the past 5 years. It was a team project; the students did some research and found that, on average,
the Ford Focus was rated at 24.3 mpg in 2010. No data for the current year (2015) were available,
so they worked out a random sampling plan and collected the data found in Review Exercise 13.3.
The plan was to somehow compare their sample data with the 5-year-old data. Can you assist these students
in writing the null and alternative hypotheses and selecting the correct statistical analysis? Do the analysis
in an attempt to develop evidence that supports the alternative hypothesis and therefore the students’ idea.
These data on miles per gallon can be entered directly or downloaded from the companion website at
prob_13.3_miles_per_gallon.sav.
Answer:
First of all, we note that we have scale data, and we will be comparing mean values for miles per gallon. A
t test comes to mind, but do we have a normal distribution? Let’s check for normality with the nonparametric
one-sample Kolmogorov–Smirnov test. The test indicates that the distribution is not normal, so we decide
on the nonparametric one-sample binomial test. In the Settings window, we use the binomial test and the
Binomial Options window with a Hypothesized proportion of .5 and a Custom cut point of 24.3 (the 2010
average miles per gallon). The results indicate that we must reject the null hypothesis: the values are not equally distributed above and below the 24.3 cut point (the sample mean was 26.642). The students must conclude that there is
a significant change in the miles per gallon over the past 5 years. Furthermore, we can conclude that gas
mileage has increased for the Ford Focus.
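The legacy syntax for this binomial test, assuming the variable is named mpg (a hypothetical name):
* Test a 50/50 split around the 24.3 mpg cut point; cases at or below 24.3 form the first category.
NPAR TESTS
 /BINOMIAL(0.50)=mpg(24.3).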
14.1 Two 12-man teams were randomly selected, one team from Marine Corps Air Station Miramar
and another from Marine Corps Air Station Yuma. The two teams were to be compared on their
Combat Fitness Test (CFT). Their scores ranged from a low of 263 to a perfect score of 300.
Miramar scores:
267 278 295 280 268 286 300 276 278 297 298 279
Yuma scores:
263 272 286 276 267 284 293 270 272 296 279 274
The Yuma team leader and researcher had the idea that the scores were unequal. Can you help the Yuma
team leader write the null and alternative hypotheses and select the appropriate test(s) to see if there is
evidence in support of his idea? The teams’ scores are given in the tables found in Review Exercise 14.1 or may
be downloaded from the companion website at prob_14.1_miramar_marines.sav.
Answer:
The data were measured at the scale level, and you are looking at the differences between means.
Furthermore, the samples are independent. If the distributions for both teams can be shown to approximate
the normal curve with equal variances, we would select the t test for independent samples. The
Kolmogorov–Smirnov test indicates that both teams had scores that are normally distributed, so we proceed
with the t test. Levene’s test fails to reject the null of equal variances, so we use the equal-variances results of the t test. The null hypothesis is stated as H0: µ1 – µ2 = 0, while the team leader’s idea is stated as the
alternative hypothesis, HA: µ1 – µ2 ≠ 0. We fail to reject the null; therefore, there is no statistical evidence
to support the team leader’s idea that the Combat Fitness Test scores are unequal. The Marines are equally
prepared for combat.
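A syntax sketch of the independent-samples t test, assuming a grouping variable team coded 1 = Miramar and 2 = Yuma and scores in a variable named cft (hypothetical names):
* Compare mean CFT scores between the two teams; the output includes Levene's test.
T-TEST GROUPS=team(1 2)
 /VARIABLES=cft.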
14.2 The local bank president had the idea that the money held in individual savings accounts would
be significantly different for males and females. A random sample of the dollars in male and female
savings accounts was recorded as presented in Review Exercise 14.2. Write the null and alternative
hypotheses and select the correct test to seek evidence in support of the bank president’s contention
that male and female saving habits are significantly different.
Complete the statistical test in an attempt to develop evidence supporting the bank president’s idea, which
is the alternative hypothesis. These data can also be downloaded from the companion website at
prob_14.2_bank_saving.sav.
Answer:
You have scale data, and the task is to determine if there is a significant difference between the mean savings
amounts for males and females. Since you have two independent groups (males and females) and scale
data, we first think of a t test. Check to see if the male and female savings distributions are normal by using
the Kolmogorov–Smirnov test. The test provides evidence that they are normal, so we proceed with the t test
for independent samples. Let’s state the null hypothesis as H0: µ1 – µ2 = 0 and the alternative (the bank president’s contention) as HA: µ1 – µ2 ≠ 0.
The t test shows equal variances (see Levene’s test) and a Sig. of .595, so we fail to reject the null hypothesis.
The t test does not detect a significant difference in the savings habits of males and females. There is no
statistical evidence in support of the bank president’s contention.
14.3 For this review exercise, you will select and open the SPSS sample file called bankloan.sav.
You will test for significant differences in the categories of education (“ed”) and whether they have
previously defaulted on a loan (“default”). There are five educational categories (“ed”) and two for
the “default” variable. Write the alternative hypothesis and null hypothesis, and use the appropriate
statistical test to see if the distribution of levels of education is the same for the categories that had
previously defaulted.
Answer:
You are looking for evidence of significant differences between five categories of education and whether the
individual defaulted on a loan. Since you have categorical data for both variables, you have to rule out a t
test. You do have independent groups (those who defaulted and those who did not), but you need a test that
will compare the ranks rather than the means. The Mann–Whitney U test is designed to look for significant differences between two independent groups on a variable measured at the ordinal level.
The alternative hypothesis is that the distributions of level of education are not the same for the categories
of those who have previously defaulted on a loan. The null hypothesis is that the distributions of level of
education are the same across the categories of those who have previously defaulted.
In the Nonparametric Independent Samples window, customize tests in the Objective tab; in the Fields tab,
customize assignments, and move the ordinal variable “education” to the Test Field box and the nominal
variable “default” to the Groups box. In the Settings tab, customize tests, and select the Mann–Whitney test.
The test results indicate that you now have statistical evidence that the distributions of those who defaulted
on loans are not the same for different levels of education.
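The legacy syntax equivalent of this Mann–Whitney request, using the bankloan.sav variable names and assuming “default” is coded 0 and 1:
* Compare the distribution of education levels across the two default groups.
NPAR TESTS
 /M-W=ed BY default(0 1).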
15.1 The researcher has the idea that listening to hard rock music can directly influence one’s
perception of the world. The professor randomly selects a group of 15 individuals from his
“Introduction to Psychology” lecture hall class of 300 freshman students. He wishes to test his theory
on his sample of 15 students by giving them the World Perception Test (WPT), having them listen to
loud hard rock music, and then administering the WPT again. The test results are given in Review Exercise 15.1
or can be downloaded from the companion website at prob_15.1_rock_music.sav.
Write the null and alternative hypotheses and see if you can produce statistical evidence in support of the
professor’s idea that listening to hard rock music can actually change your perception of the world.
Answer:
The data were measured at the scale level, so you can look for differences in means, which suggests some
type of a t test. Since you are measuring the same individuals in a pretest–posttest design, you should use
the paired-samples t test to look for differences between the means. The null hypothesis is written as H0: µ1 – µ2 = 0, and the alternative as HA: µ1 – µ2 ≠ 0.
There is a requirement that both pretest and posttest scores be distributed normally, so we first check for
normality using the Kolmogorov–Smirnov test. That test indicates both distributions approximate the normal curve, so we proceed with
the paired-samples t test. The results of the t test show a Sig. of .000, indicating that we can reject the
null hypothesis of equality. We can say that we now have statistical evidence in support of the professor’s
contention that listening to hard rock music changes one’s perception of the world.
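A syntax sketch of the paired-samples t test, assuming the two WPT administrations are stored as wpt_pre and wpt_post (hypothetical names):
* Compare mean WPT scores before and after the music.
T-TEST PAIRS=wpt_pre WITH wpt_post (PAIRED).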
15.2 Data were collected from 10 major oil well drilling operators that recorded the number of hours
lost per week due to work-related accidents. A rigorous safety program was instituted, and the
number of lost hours was once again recorded following the introduction of the safety program. The
two tables in Review Exercise 15.2 present the safety data or they may be downloaded directly from
the companion website at prob_15.2_oil_well.sav.
A research consultant was hired to examine the data and determine if the safety program significantly
changed the weekly hours lost from on-the-job injuries. Write the null and alternative hypotheses, and select
and conduct the appropriate test to seek evidence that the program was successful.
Answer:
You have data measured at the scale level; therefore, you can look for significant differences between
means—this suggests a t test. You are measuring the same group twice in a pretest–posttest design;
therefore, the paired-samples t test would be the correct test. You should first check for normality, then
conduct the t test.
The Kolmogorov–Smirnov test indicates normality for both distributions. The null hypothesis is written as H0:
µ1 – µ2 = 0, while the alternative hypothesis is HA: µ1 – µ2 ≠ 0. The t test results in a Sig. value of .003,
informing us that the null can be rejected; we now have evidence that there is a significant difference in the
mean number of weekly hours lost due to on-the-job injury. Next, we can look at the Paired Samples Statistics
table and see that the mean hours lost due to injury was reduced from 49.6 to 44.4. The consultant can now
say that the safety program was a success.
15.3 A chemical engineer added a chemical to a fast-burning compound that changed the oxygen
consumption once the reaction started. He did a pretest to measure the oxygen consumption
index; then he added the chemical and recorded the posttest oxygen index. The data are given
in Review Exercise 15.3 or they may be downloaded from the companion website at
prob_15.3_chemical_eng.sav. Write the null and alternative hypotheses, select the correct test, and
look for significant differences between the pretest and posttest oxygen index values.
Answer:
Just by looking at the values in both distributions, you might conclude that they are not normal. To be sure,
conduct the Kolmogorov–Smirnov test, which agrees with your initial assessment of nonnormality. Since you have non-normal data, you select the related-samples nonparametric procedures: the sign test and the Wilcoxon signed rank test. Select the Analyze, Nonparametric, Related Sample approach, and click Customize analysis in the
Objective tab; move both variables to the Test Fields box, then click Customize tests in the Settings tab, and
select the sign and Wilcoxon tests.
The null hypothesis is written as H0: medianpretest = medianposttest, while the alternative hypothesis is HA:
medianpretest ≠ medianposttest. Both the sign and the Wilcoxon signed rank tests find statistical evidence in support
of the alternative hypothesis, since the null of equality is rejected. There were statistically significant changes
in the oxygen index when the chemical additive was added to the compound.
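The legacy syntax for both related-samples tests, assuming variables named pretest and posttest:
* Sign test and Wilcoxon signed rank test on the paired oxygen index values.
NPAR TESTS
 /SIGN=pretest WITH posttest (PAIRED)
 /WILCOXON=pretest WITH posttest (PAIRED).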
16.1 An El Salvadorian pig farmer, Jose, had the idea to add a by-product from the production of cane
sugar to his pig feed. The idea was that the pigs would eat more, gain weight, and be worth more at
market time. He had 24 weaner pigs weighing from 20 to 40 pounds. He randomly divided the pigs
into three groups of eight. He concocted three different feed types, each containing different levels
of the cane sugar by-product (low, medium, and high sugar content). The farmer decided to record
the pounds of feed consumed by each pig for 1 week. The data are given in Review Exercise 16.1 or may be downloaded from the companion website at prob_16.1_pig_farmer.sav.
Answer:
You have scale data; therefore, you can look for differences between the means. There are three means for
the pounds of feed consumed; therefore, t tests are not appropriate—you must use the ANOVA procedure,
which provides for three or more means. You do the Kolmogorov–Smirnov test to find if all three distributions
are normal—they are. You next write the null hypothesis as H0: µ1 = µ2 = µ3 and the alternative hypothesis
as HA: One or more of the three feed types result in unequal feed consumption. If the null hypothesis (H01)
is rejected, then the following additional null hypotheses should be tested: H02: µ1 = µ2, H03: µ1 = µ3, H04:
µ2 = µ3.
The ANOVA test finds a significant difference between the means (rejecting the null), so we look to the post
hoc analysis Scheffe’s Multiple Comparisons to specify the means that contributed to the significant F value
(12.499). We find the only significant difference in feed consumption to be between the high- and low-sugar
feed. We recommend that Jose continue to use the high-sugar feed based on the evidence that the pigs did
eat more.
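A syntax sketch of the one-way ANOVA with Scheffe post hoc comparisons, assuming variables named consumed and feedtype (hypothetical names):
* Overall F test plus the pairwise comparisons discussed above.
ONEWAY consumed BY feedtype
 /STATISTICS=DESCRIPTIVES
 /POSTHOC=SCHEFFE ALPHA(0.05).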
16.2 A chemical engineer had three different formulas for a gasoline additive that she thought
would significantly change automobile gas mileage. She had three groups of 15 standard eight-cylinder test engines that simulated normal driving conditions. Each group received a different gasoline
formulation (A1, A2, and A3) and was run for several hours. Simulated mileage was recorded for
each gasoline formulation as presented in Review Exercise 16.2 or they can be downloaded from
the companion website at prob_16.2_chem_gas.sav. Your job is to investigate the mileage numbers
in an effort to provide evidence in support of her contention that the groups would have significantly
different gas mileage. Write the null and alternative hypotheses. If you find a difference, thereby
rejecting the null, you must identify the groups contributing to the significant F statistic with post
hoc analysis. Can you provide evidence in support of the chemical engineer’s contention that her
formulas will result in significant differences in the gas mileage for these test engines?
Answer:
You have scale data and three group means to compare; therefore, the logical choice for analysis is
ANOVA. You check the three mileage distributions for normality, and they are determined to be normal by
the Kolmogorov–Smirnov test. You next write the null hypothesis as H0: µ1 = µ2 = µ3 and the alternative
hypothesis as HA: One or more of the three gasoline formulations result in unequal gas mileages. If H01 is
rejected, then the following null hypotheses should be tested: H02: µ1 = µ2, H03: µ1 = µ3, H04: µ2 = µ3.
The ANOVA test computes an F value of 12.499, which is significant. You next do a Scheffe post hoc multiple
comparisons test and find significant differences between A1 and A3, and between A2 and A3. We can say that
we now have evidence in support of the engineer’s idea that there are differences in two of the three mileage
comparisons. Comparisons between A1 and A2 were found not to be significantly different.
16.3 Bacteria counts were taken at the four Southern California beaches of Santa Monica, Malibu,
Zuma, and Ventura. The researcher’s idea was that the different beaches would yield significantly
different bacteria counts. The data for the beaches are given in Review Exercise 16.3 or they may
be downloaded from the companion website at prob_16.3_bacteria.sav.
Check the distributions for normality—just by looking, you would suspect that they don’t approximate the
normal curve.
Select the correct testing approach based on your normality findings, and write the null and
alternative hypotheses. If you find significant differences in the bacteria counts at the four beaches,
do the additional work to identify the specific beaches that contribute to the overall finding. What is
the answer to the researcher’s idea that the beaches have statistically significant different bacteria
counts?
Answer:
The four distributions of bacteria counts at the beaches are found to be non-normal. Therefore, the one-way
ANOVA should not be used, and the Kruskal–Wallis test for nonparametric data is the test of choice. The null
is that the median ranks for the beaches are the same. The alternative hypothesis is that one or more of the
median ranks are different. The Kruskal–Wallis test shows that the bacteria counts were not the same for the
four beaches. The null of equality is rejected, with a significance level of .036. This finding requires that you
double click on the Hypothesis Test Summary in the Output Viewer and then request the pairwise comparisons. This
reveals that the overall significance is obtained from the rather large difference between the Santa Monica
and Ventura average ranks of 8.57 and 20.71, respectively. We can say that the researcher’s idea that the
beaches had different bacteria counts is supported in only one of the pairwise comparisons.
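The legacy Kruskal–Wallis syntax, assuming a count variable named bacteria and a beach variable coded 1 through 4 (hypothetical names):
* Compare the rank distributions of bacteria counts across the four beaches.
NPAR TESTS
 /K-W=bacteria BY beach(1 4).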
17.1 The high school track team coach recorded the time taken for the 100-yard dash by eight team
members for three consecutive track meets during the regular season. His past experience informed
him that they would improve their times throughout the season as they grew stronger and smarter.
He had the idea that their level of improvement would qualify for statistical significance. Can you help
the coach write the null and alternative hypotheses, select the correct test(s), interpret the analysis,
and then answer his question? The coach’s data are presented in Review Exercise 17.1 or can be
downloaded from the companion website at prob_17.1_track_team.sav.
Answer:
The run times were measured at the scale level; therefore, you can look for differences between the means.
There are three measurements taken on the same individuals over a period of several weeks. The repeated
measures ANOVA would be the first choice. We must first check for normality with the Kolmogorov–Smirnov
test—the three run times are all found to approximate the normal curve.
We write the alternative and null hypotheses as follows. The alternative hypothesis is HA1: One or more of
the mean run times are unequal. The null hypothesis states the opposite and is written as H01: µ1 = µ2 = µ3.
In this example, H01 states that there are no differences between the means of the run times. The track team
coach would prefer to reject the null hypothesis, which would provide statistical evidence for the idea that the
run times are significantly different.
If there is evidence of overall significance, leading to the rejection of the null hypothesis (H01), the researcher
would most likely wish to identify which of the three groups are different and which are equal. The following
null and alternative hypotheses will facilitate that task. If the first null hypothesis (H01) is rejected, then we
may test the following additional null hypotheses: H02: µ1 = µ2, H03: µ1 = µ3, H04: µ2 = µ3. The alternative
hypotheses for these new null hypotheses are HA2: µ1 ≠ µ2, HA3: µ1 ≠ µ3, HA4: µ2 ≠ µ3.
17.2 A farm manager was interested in studying several first-time strawberry pickers over a period of
4 weeks. He felt that there was a significant difference in the number of pints picked per hour from
one week to the next. Can you help him write the null and alternative hypotheses, input the data,
select the correct tests, interpret the results, and answer his question concerning significant changes
in the number of pints picked? The data can be found in Review Exercise 17.2 or can be downloaded
from the companion website at prob_17.2_strawberries.sav.
Answer:
You have scale data (number of pints picked), and therefore you can select a test that compares the means.
The paired-samples t test won’t work since you have more than two separate time periods. You should use
the repeated measures ANOVA since you are measuring the same individuals at four different times. You
must first check the distributions of the number of pints picked for normality. The Kolmogorov–Smirnov test
indicates normal distributions for all four picking weeks, so you proceed with the ANOVA test.
The alternative and null hypotheses for overall significance and all possible comparisons are as follows: The
alternative hypothesis is HA1: One or more of the mean numbers of pints picked over the four time periods
are unequal. The null hypothesis states the opposite and is written as H01: µ1 = µ2 = µ3 = µ4.
In this example, H01 states that there are no differences between the mean numbers of pints picked over the
four time periods. The farm manager would prefer to reject the null hypothesis, which would provide statistical
evidence for the idea that the numbers of pints picked are significantly different.
If there is evidence of overall significance, leading to the rejection of the null hypothesis (H01), the farm
manager would most likely wish to identify which of the four time periods are different and which are equal.
The following null and alternative hypotheses will facilitate that task.
If the first null hypothesis (H01) is rejected, then we may test the following additional null hypotheses: H02:
µ1 = µ2, H03: µ1 = µ3, H04: µ1 = µ4, H05: µ2 = µ3, H06: µ2 = µ4, H07: µ3 = µ4. The alternative hypotheses
for these new null hypotheses are as follows: HA2: µ1 ≠ µ2, HA3: µ1 ≠ µ3, HA4: µ1 ≠ µ4, HA5: µ2 ≠ µ3, HA6: µ2 ≠ µ4, HA7: µ3 ≠ µ4.
The second table in the output, Descriptive Statistics, clearly indicates a steady increase in the mean number
of pints picked from Week 1 to Week 4. The production of these 10 workers is definitely on the increase, but
now we want to answer the question of statistical significance. We first examine Mauchly’s Test for Sphericity
and see the borderline significance value of .054 (remember, we don’t want to reject the null for this test), but
we make the decision to proceed. As you will see in the next table, there are tests other than Mauchly’s that
can guide our use of ANOVA. Also, the ANOVA has the ability to tolerate minor deviations. The following table,
Tests of Within-Subjects Effects, confirms our decision to proceed as the tests that do not assume sphericity
found significance for difference between the four times. The final table, Pairwise Comparisons, identifies
significant differences between all possible time comparisons. All of the null hypotheses were rejected; thus,
we have evidence to support the alternatives. We can definitely inform the farm manager that, for these
workers, there is a statistically significant increase (look at the direction of the means) in strawberry production
over the 4 weeks.
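A syntax sketch of the repeated measures design, assuming the four weekly counts are stored as week1 through week4 (hypothetical names):
* Within-subjects ANOVA across the four weeks with Bonferroni-adjusted pairwise comparisons.
GLM week1 week2 week3 week4
 /WSFACTOR=week 4 Polynomial
 /EMMEANS=TABLES(week) COMPARE ADJ(BONFERRONI)
 /PRINT=DESCRIPTIVE
 /WSDESIGN=week.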
17.3 For this exercise, you must open the SPSS sample file called bankloan.sav. A bank president is
interested in comparing the last three variables: “preddef1,” “preddef2,” and “preddef3.” These three
variables were three different models created to predict whether a bank customer would default on
a bank loan. Since they were all created from the same basic information, we can treat them as the
same object and analyze them using some type of repeated measures method. Can you help the
bank president in determining if the three models are the same? Also, write the null and alternative
hypotheses that you will be testing.
Answer:
For repeated measures on scale data, we might use the ANOVA approach, but the Kolmogorov–Smirnov
test shows non-normality for all variables.
Let’s write the alternative and null hypotheses for this design. The alternative hypothesis is HA1: One or more
of the three prediction models resulted in unequal default predictions. The null hypothesis states the opposite
and is written as H01: µ1 = µ2 = µ3.
In this example, H01 states that there are no differences between the prediction models. The bank president
would prefer to reject the null hypothesis, which would provide statistical evidence for the idea that the
prediction models are significantly different.
If there is evidence of overall significance, leading to the rejection of the null hypothesis (H01), the bank
president would most likely wish to identify which of the three models are different and which are equal. The
following null and alternative hypotheses will facilitate that task. If the first null hypothesis (H01) is rejected,
then we may test the following additional null hypotheses: H02: µ1 = µ2, H03: µ1 = µ3, H04: µ2 = µ3. The
alternative hypotheses for these new null hypotheses are HA2: µ1 ≠ µ2, HA3: µ1 ≠ µ3, HA4: µ2 ≠ µ3.
To test the hypotheses, we decide to use the Related-Samples Friedman’s Test, which indicates no statistical
difference between the ranks for the three model types. The above null hypotheses are not rejected, and all
the models are equal. We double click on the SPSS output to obtain the graph showing the ranks for the
three different default models just to add some visual evidence to our conclusion of no significance. You can
confidently inform the bank president that there is no difference in the three default models.
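The legacy Friedman syntax, using the three model variables named in the exercise:
* Compare the ranks of the three default-prediction models.
NPAR TESTS
 /FRIEDMAN=preddef1 preddef2 preddef3.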
18.1 A corn farmer is interested in reducing the number of days it takes for his corn to silk. He has
decided to set up a controlled experiment that manipulates the nominal variable “fertilizer,” having
two categories: 1 = limestone and 2 = nitrogen. Another nominal variable is “soil type,” with two
categories: 1 = silt and 2 = peat. The dependent variable is scale and is the “number of days until
the corn begins to silk.” The data are given in Review Exercise 18.1; note that they must be entered
into Data View in one continuous string of 40 cases. Once the data are entered, you must look for
significant differences between the four study groups. You will also look for any interaction effects
between fertilizer and soil and any influence they may have on the number of days to the showing
of silk. Write the null and alternative hypotheses, select and conduct the correct test(s), and interpret
the results. The data for this exercise or the entire dataset may be downloaded from the companion
website at prob_18.1_corn_farmer.sav.
Answer:
Since we have two independent nominal variables (each having two levels) and a dependent variable
measured at the scale level, we should think of a two-way ANOVA. The first thing we do is check to see if the
values of the dependent variable approximate the normal distribution. The Kolmogorov–Smirnov test confirms
that it is normally distributed.
The next table tells us that the error variances of the dependent variable are equal across groups since we fail
to reject the null of equality (Sig. is .275). This adds to our confidence that our findings can be taken seriously,
since this is one of the requirements for our test.
We next write the null and research hypotheses for both main effects:
Null for fertilizer type is H01: µnitrogen = µlimestone; Alternative is HA1: µnitrogen ≠ µlimestone.
Null for soil type is H02: µpeat = µsilt; Alternative is HA2: µpeat ≠ µsilt.
Proceed with the two-way ANOVA, and make sure you request Descriptive Statistics, as they are especially
useful in learning what is happening with your data. You are interested in interpreting the mean values in the
table in a way that will inform you about the main effects of soil type (peat and silt) and fertilizer (nitrogen and
limestone) on the number of days it takes corn to silk. The interaction effects can also be interpreted from this
table. The values of 59.9 and 58.1 answer the question about the main effect of fertilizer. (For significance,
look at the next table. For the main effect of soil type, look for a significant difference between 62.4 and 55.6.
For the interaction effect, look at the same table.)
We fail to reject the null for fertilizer and reject the null for soil type; therefore, we have evidence that the soil
type does affect the days before corn silks. Silt soil significantly reduces the time to silk, which is a positive
outcome for this corn farmer. There is also an interaction effect between fertilizer and soil type.
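A syntax sketch of the two-way ANOVA, assuming variables named days, fertilizer, and soil (hypothetical names):
* Main effects and the fertilizer-by-soil interaction, with descriptives and Levene's test.
UNIANOVA days BY fertilizer soil
 /PRINT=DESCRIPTIVE HOMOGENEITY
 /DESIGN=fertilizer soil fertilizer*soil.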
18.2 A psychologist had the idea that different types of music and room temperature would influence
performance on simple math tasks. She had two independent variables measured at the nominal level: (1)
“music type,” hard rock and classical, and (2) “room temperature,” comfortable and hot. The dependent
variable was a series of minimally challenging mathematical problems that were scored on a 0 to 100 scale.
She randomly selected 24 students and then once again randomly assigned them to one of four groups. Your
task is to select and then do the correct test, write the null and alternative hypotheses, and then interpret
the results. Was there any significant change in task performance as a result of music type or room
temperature, or did these two variables act together to cause change? The table in Review Exercise 18.2
presents the results for the 24 students. These data may also be downloaded from the companion website at
prob_18.2_music_temp.sav.
Answer:
Since we have scale data for the dependent variable and multiple independent nominal variables (two or
more), we select the two-way ANOVA procedure. The Kolmogorov–Smirnov test indicates that the data are
normally distributed. Levene’s Test for Equality of Error Variances indicates that we can continue with the
analysis.
Null for music type is H01: µhard rock = µclassical; Alternative is HA1: µhard rock ≠ µclassical.
Null for room temperature is H02: µcomfortable = µhot; Alternative is HA2: µcomfortable ≠ µhot.
The Descriptive Statistics table informs us that we will look for significance between 82.17 and 76.83 to either
confirm or deny a statistically significant main effect of music type. Any main effect from room temperature
will be identified by comparing 83.42 and 75.58. Interaction will be indicated in the Tests of Between-Subjects
Effects table. Results indicate that no significance was found for any comparison, and the nulls remain in
force. There was no evidence generated to support the psychologist’s idea that math performance would be
influenced by room temperature and/or type of music. Post hoc analysis is not necessary.
18.3 The inspector general for a large state’s motor vehicle department decided to collect some data
on recent driving tests. The idea was to see if scores on the driving test (dependent scale variable)
were significantly different for male and female (nominal independent variable) instructors. He also
wanted to know if the time of day the test was given might also influence the scores. He first randomly
picked two instructors and then collected data on recent tests they had administered. Time of day
that the test was given was categorized as either early morning or late afternoon (the second nominal
independent variable). He decided to randomly select six morning and six afternoon tests for each
of his picked instructors. In the end, he had four unique groups consisting of six test takers each.
You must write the null and alternative hypotheses and then select the correct test, interpret the
results, and answer the inspector’s questions. The data are given in Review Exercise 18.3 or may be
downloaded from the companion website at prob_18.3_drive_test.sav.
Answer:
We note that there are two nominal independent variables and one dependent scale variable. It is the perfect
situation for a two-way ANOVA procedure. We first check the values of the dependent variable (scores on a
test) for normality, and the Kolmogorov–Smirnov test confirms its normality.
Null for instructor’s gender is H01: µSusan = µTom; Alternative is HA1: µSusan ≠ µTom.
Null for time of day is H02: µmorning = µafternoon; Alternative is HA2: µmorning ≠ µafternoon.
The Descriptive Statistics table informs us that we will look for significance between 69.67 and 78.08 to
either confirm or deny a statistically significant main effect of gender. Any main effect from time of day will be
identified by comparing 78.17 and 69.58. Interaction will be indicated in the Tests of Between-Subjects Effects
table. Results indicate significance for the main effects of both gender and time of day, as well as for their interaction. We reject both of the null hypotheses. We now have
statistical evidence to support the inspector general’s idea that the factors of “gender of the instructor” and
“time of day” influence the driving test scores.
19.1 A metallurgist has designed a way of increasing the strength of steel. She has discovered a chemical that is added, during the manufacturing process, to samples of molten metal that have already been measured for strength. These pre-additive values are recorded as the variable called
“preadd” in the data table shown in Review Exercise 19.1. She believes that the “preadd” values may
influence the “postadd” measure of the steel’s strength. She is looking for significant differences in
strength for the four different manufacturing methods. If differences are found, she wishes to identify
which ones contribute to the overall significance. Can you help her select the correct statistical
procedure? She also needs help in writing the null and alternative hypotheses. The data can also be
downloaded from the companion website at prob_19.1_metallurgist.sav.
Answer:
You might initially think of the one-way ANOVA as you have two scale variables and one nominal and you are
looking for differences in groups. However, the scientist stated that she suspected that the “preadd” values
may have an undue influence on the “postadd” values. This suggests that she needs some method to control
for any influence of the “preadd” variable—the ANCOVA seems to fit this requirement. Let’s begin by checking
the distributions for normality and then the homogeneity of regression slopes. The Kolmogorov–Smirnov
test finds both variables normally distributed. The initial run of the ANCOVA procedure is to check for the
homogeneity of regression slopes, which indicates that we could then proceed with the main test.
Before starting the analysis, we write the null and alternative hypotheses as follows. We state the null
hypothesis as H01: µA = µB = µC = µD. We state the alternative hypothesis as HA1: One or more of the four manufacturing methods result in unequal mean strength.
If the null hypothesis (H01) is rejected, the researcher then wishes to identify which of the four groups are different and which are equal. The following additional null hypotheses will facilitate that task: H02: µA = µB, H03: µA = µC, H04: µA = µD, H05: µB = µC, H06: µB = µD, H07: µC = µD.
The alternative hypotheses for these new null hypotheses are as follows: HA2: µA ≠ µB, HA3: µA ≠ µC, HA4: µA ≠ µD, HA5: µB ≠ µC, HA6: µB ≠ µD, HA7: µC ≠ µD.
Running the analysis, we see the Descriptive Statistics table showing small differences between the four
methods used in the steel production process. It remains to be seen if these differences are significant. The
results of the first ANCOVA, shown in the first Tests of Between-Subjects Effects table, indicate no interaction
(see Sig. of .331); thus, we have homogeneous regression slopes—we may proceed.
Our second Tests of Between-Subjects Effects table shows an overall significance of .043. Levene’s test indicated an F value of .632 and a significance of .603, providing evidence of homogeneity of variances. We next examine the Pairwise Comparisons table and find that A & B and A & C are significant.
Thus, we have rejected the overall null and those for A & B and A & C. We can advise the scientist that the
method used to manufacture these samples had a significant impact and that there was significant difference
between methods A & B and A & C.
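A two-step syntax sketch of the ANCOVA, assuming a factor named method and the preadd/postadd variables from the exercise (the factor name is hypothetical):
* Step 1: check homogeneity of regression slopes via the covariate-by-factor interaction.
UNIANOVA postadd BY method WITH preadd
 /DESIGN=preadd method preadd*method.
* Step 2: if the interaction is not significant, run the ANCOVA with pairwise comparisons.
UNIANOVA postadd BY method WITH preadd
 /EMMEANS=TABLES(method) COMPARE ADJ(BONFERRONI)
 /PRINT=DESCRIPTIVE HOMOGENEITY
 /DESIGN=preadd method.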
19.2 A botanist measured the 3-day growth, in inches, of his marijuana plants at two different
times (variables: “pregrowth” and “postgrowth”) under four different growing conditions (variable:
“peatsoil”). He felt that the initial growth rate influenced the second rate of growth. The scientist’s
main concern was the effect of soil type on growth rate during the second growth period. The problem
was that he somehow wanted to statistically account for any differences in the second growth period
that might be related to the first rate of growth. His ultimate quest was to identify any significant
differences in the four samples that were grown in soils containing different percentages of peat.
Select the correct statistical method, write the null and alternative hypotheses, do the analysis,
interpret the results, and answer the botanist’s questions. The data may be found in Review Exercise
19.2 or downloaded from the companion website at prob_19.2_marijuana.sav.
Answer:
The botanist has two scale variables and one categorical variable. One of the scale variables (“pregrowth”) is
thought to have an effect on the second (“postgrowth”) that needs to be statistically controlled. The ANCOVA
is the logical choice for analysis, but we first use the Kolmogorov–Smirnov to test for normality, and we
find that both scale distributions pass. We next test for the homogeneity of regression slopes, and the test
shows that we fail to reject the null of no interaction (Sig. = .332), which provides evidence of homogeneous
regression slopes. The Descriptive Statistics table shows means that look pretty much alike—but we must
test for significance.
We next write the null and alternative hypotheses as follows. We state the null hypothesis as H01: µ1 = µ2 = µ3 = µ4. We state the alternative hypothesis as HA1: One or more of the four groups have mean inches of growth that are unequal.
If the null hypothesis (H01) is rejected, the researcher then wishes to identify which of the four groups are
different and which are equal. The following additional null hypotheses will facilitate that task: H02: µ1 = µ2,
H03: µ1 = µ3, H04: µ1 = µ4, H05: µ2 = µ3, H06: µ2 = µ4, H07: µ3 = µ4.
The alternative hypotheses for these new null hypotheses are as follows: HA2: µ1 ≠ µ2, HA3: µ1 ≠ µ3, HA4: µ1 ≠ µ4, HA5: µ2 ≠ µ3, HA6: µ2 ≠ µ4, HA7: µ3 ≠ µ4.
The main table of interest is Tests of Between-Subjects Effects, which shows whether the percentage of peat
in the soil had a significant effect on the rate of growth during the second growth period. The significance level
of .048 (5th table below) permits us to reject the null (H01: µ1 = µ2 = µ3 = µ4) of equality of all groups;
therefore, we have evidence that soil type did affect the rate of growth of the marijuana plants. Levene’s test returns a nonsignificant value of .498, providing evidence of equality of variances. We can investigate
further by an examination of the Pairwise Comparisons table, which indicates significant differences for the
12% & 16% and 14% & 16% peat groups. Therefore, we can also reject the null hypotheses H06: µ2 = µ4 and H07: µ3 = µ4.
19.3 An epidemiologist/psychologist was interested in any relationship between levels of childhood vaccinations and cognitive ability. He obtained records on randomly selected children who had
received three levels of vaccinations during their first year of life. He randomly placed them in three
groups defined by rates of vaccination (“vaccinated” is the nominal variable), where 1 = high, 2 =
low, and 3 = none. The children had been tested for cognitive ability at 5 years of age (“precog” is
the scale variable) and again at 10 years of age (“postcog” is another scale variable). The scientist’s
main reason for conducting the investigation was to search for any differential effects that the levels
of vaccination might have on the children’s cognitive ability. However, he was concerned about the
potential effect that the “precog” scores might have on the “postcog” values. His major research
question was whether the three levels of vaccination affected the children’s cognitive ability at 10
years of age.
Help this scientist pick the appropriate statistical test that would offer a way to control for differences in the
“precog” values. Write the null and alternative hypotheses, run the analysis, interpret the results, and answer
his questions. This dataset may be found in Review Exercise 19.3 or may be downloaded from the companion
website at prob_19.3_vaccinations.sav.
Answer:
The epidemiologist/psychologist has collected the data from prior observations. This makes it impossible
to alter the research design in a way to control for differences in cognitive ability at age 5. He needs to
consider using ANCOVA, which could statistically control for any earlier differences in cognitive ability. He
has two scale variables and one nominal, which qualifies it for the ANCOVA approach. If the distributions
of cognitive ability scores approximate the normal and if the regression slopes are homogeneous, then we
can use the ANCOVA procedure. We run the Kolmogorov–Smirnov test and confirm that the “precog” and
“postcog” values are normally distributed. The customized run of the ANCOVA analysis indicates that the null
hypothesis of no interaction cannot be rejected; therefore, we have evidence that the regression slopes are
homogeneous and we can proceed with the ANCOVA approach.
We next write the null and alternative hypotheses as follows. We state the null hypothesis as H01: µ1 = µ2 = µ3. We state the alternative hypothesis as HA1: One or more of the three groups have mean cognitive ability scores that are not equal to the others.
If the null hypothesis (H01) is rejected, the researcher then wishes to identify which of the three groups are
different and which are equal. The following additional null hypotheses will facilitate that task: H02: µ1 = µ2,
H03: µ1 = µ3, H04: µ2 = µ3. The alternative hypotheses for these new null hypotheses are HA2: µ1 ≠ µ2, HA3: µ1 ≠ µ3, and HA4: µ2 ≠ µ3.
The next SPSS output examined is the Descriptive Statistics table. Looking at the means gives us a hint that we might find significant differences in children's cognitive ability across the levels of vaccination. The
next table, showing the results of Levene’s Test of Equality of Error Variances, indicates that confidence is
increased in any significant levels that may be identified. This is because a significance level of .227 results
in the decision not to reject the null of equal variances across all groups on values of the dependent variable.
The table Tests of Between-Subjects Effects indicates that the overall null hypothesis for vaccination (H01: µ1 = µ2 = µ3) can be rejected. This finding provides evidence of a difference in children's cognitive ability across their levels of vaccine exposure. We examine the table titled Pairwise Comparisons and
find significant differences in cognitive ability between groups of children in the high and none categories of
vaccination. Also, we find statistically significant differences for the low and none categories of vaccination
levels. We reject H03: µ1 = µ3 and H04: µ2 = µ3. We can inform the epidemiologist/psychologist that, for this
sample of children, the evidence supports the idea that vaccines negatively affect children’s cognitive ability
at age 10.
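The "customized run" mentioned above amounts to adding the interaction term to the design. A syntax sketch using the variable names given in the exercise ("vaccinated," "precog," and "postcog"):

* Test for homogeneity of regression slopes.
UNIANOVA postcog BY vaccinated WITH precog
  /DESIGN=vaccinated precog vaccinated*precog.
* If the interaction is not significant, rerun without it for the ANCOVA proper.
UNIANOVA postcog BY vaccinated WITH precog
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /EMMEANS=TABLES(vaccinated) WITH(precog=MEAN) COMPARE ADJ(BONFERRONI)
  /DESIGN=precog vaccinated.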
20.1 A medical biologist was studying three bacteria that were known to be equally present in
samples taken from the healthy human digestive system. We shall label them as A, B, and C. The
transformed numbers of bacteria observed by the scientist were A = 3,256, B = 2,996, and C = 3,179.
The question is whether the difference from the expected shown in these values qualifies as being
statistically significant. Write the null and alternative hypotheses, input the data, run the analysis, and
interpret the results. The data are given in Review Exercise 20.1 or may be downloaded from the
companion website at prob_20.1_medical.sav.
Answer:
You are given the frequencies of three categories (A, B, and C), and you know the frequencies should be
equally distributed among the categories but you suspect that they are not. You select a chi-square goodness-
of-fit test, and you want to test whether there are an equal number of the three bacteria dispersed in the
sample. The null hypothesis (H0) states that they are equally distributed in the sample. The alternative
hypothesis (HA) states that they are not equally dispersed. The data are inputted as two variables, “bacteria”
and “frequency.” The test results indicate a chi-square value of 11.347 at 2 degrees of freedom and a
significance level of .003. You reject the null, and you now have evidence that the bacteria are not equally
distributed in this sample.
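If you would rather input and weight the data with syntax than with the Data View, a sketch follows; the variable names "bacteria" and "frequency" are given above, and the numeric codes 1 through 3 standing in for bacteria A, B, and C are our own choice.

DATA LIST LIST /bacteria frequency.
BEGIN DATA
1 3256
2 2996
3 3179
END DATA.
VALUE LABELS bacteria 1 'A' 2 'B' 3 'C'.
* Weight the categories by their observed counts, then run the goodness-of-fit test.
WEIGHT BY frequency.
NPAR TESTS /CHISQUARE=bacteria /EXPECTED=EQUAL.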
20.2 For this problem, you will open the SPSS sample file telco.sav and use the first variable, named
“region” (labeled Geographic indicator). The variable “region” represents five different zones in which
the 1,000 cases reside. The researcher believes that the cases are not equally distributed among the
five zones. You are to write the null and alternative hypotheses and then study the variable “region”
in an attempt to develop evidence in support of the researcher’s hypothesis.
Answer:
You are concerned with checking to see how many people live in each of the five different regions and then
determining if they are equally populated. The independent regions represent five categories that contain
frequencies suggesting that chi-square might be a reasonable approach. The null hypothesis (H0) is that the
cases are equally dispersed among the five geographic regions. The alternative hypothesis (HA), which is the
researcher’s idea, is that the people are not equally dispersed among the five regions. Open the dataset, and
then use the nonparametric one-sample test approach in the Fields tab; remember to customize the test and
specify the chi-square test. The results indicate that you fail to reject the null, as you have a Sig. of .695. There is no statistical evidence that the cases are unequally dispersed; the data are consistent with an equal dispersion of cases among the five regions.
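The equivalent syntax, using the newer one-sample nonparametric procedure, is a single command ("region" must be defined as a nominal variable, as it is in telco.sav):

NPTESTS
  /ONESAMPLE TEST (region) CHISQUARE(EXPECTED=EQUAL).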
20.3 A high school principal was concerned that several of her teachers were awarding “A” grades
at vastly different rates—she had heard many complaints from students and parents. She compiled
the data on the teachers and decided to see if there was in fact a statistically significant difference.
Her first look at the grades gave her the idea that there was indeed a difference. Can you write the
null and alternative hypotheses, select the appropriate statistical test, conduct the analysis, and then
interpret the results? The data are as follows: the teacher’s name, with the number of “A” grades
given in parentheses: Thomas (6), Maryann (10), Marta (12), Berta (8), and Alex (10). The data can
be inputted as shown in Review Exercise 20.3 or can be downloaded from the companion website at
prob_20.3_school_grades.sav.
Answer:
We have five categories (the five teachers) and frequencies (number of “A” grades awarded by each of the
five teachers), and we wish to determine if the numbers of “A” grades are equal in the five categories. This is
a one-sample goodness-of-fit chi-square test. The principal believes that the teachers do not award an equal
number of “A” grades; therefore, this becomes the alternative hypothesis. The alternative hypothesis (HA) is
that the grades are not equally dispersed among the five teachers, and the null hypothesis (H0) is that the
grades are equally dispersed among the five teachers. We input the data in two columns—one for teacher
and one for frequency—then use the Weight Cases function found under Data on the Main Menu. Finally,
we use the nonparametric one-sample test to obtain the results. A chi-square value of 2.261 at 4 degrees of
freedom and significance of .688 informs us that we fail to reject the null hypothesis. Therefore, we conclude
that any grading differences are simply attributable to chance—the principal should not be overly concerned
with the complaining students and parents.
21.1 A community activist believed that there was a relationship between membership in the police
SWAT Team and prior military experience. He collected data from several police departments in an
effort to support his belief. He found that there were 57 members of the SWAT team with prior military
experience and 13 members with no prior military service. There were also 358 police personnel who
had military experience but were not members of SWAT and another 413 with no military experience
and not members of SWAT. You must write the null and alternative hypotheses, select the correct
statistical method, do the analysis, and interpret the results. The activist’s data can be found in
Review Exercise 21.1 and may be downloaded from the companion website at prob_21.1_swat.sav.
Answer:
The activist believes that individuals having prior military experience tend to seek out and join the police
department’s SWAT team. He is basically saying that there is a relationship between military experience and
SWAT team membership—this is the alternative hypothesis (HA). The null hypothesis (H0) is that there is no
relationship between prior military experience and SWAT team membership. We have frequency data and
four categories; therefore, a logical choice for analysis is the chi-square test of independence. Input the data
as three variables—“military experience,” “SWAT,” and “numbers”—in each of the four categories (mil + swat),
(no mil + swat), (mil + no swat), and (no mil + no swat). Run the data using Analyze/Descriptive/Crosstabs,
then click Statistics and Options. Interpretation of the chi-square value of 31.442 at 1 degree of freedom
and significance of .000 informs us that we must reject the null hypothesis of independence. The community
activist has statistical evidence to support his idea that there is a relationship between membership in the
SWAT team and prior military experience.
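A syntax sketch of the whole run follows; the short names "military," "swat," and "numbers" are our stand-ins for the three variables described above.

DATA LIST LIST /military swat numbers.
BEGIN DATA
1 1 57
0 1 13
1 0 358
0 0 413
END DATA.
VALUE LABELS military 0 'no military' 1 'military' /swat 0 'not SWAT' 1 'SWAT'.
WEIGHT BY numbers.
* Chi-square test of independence with observed and expected cell counts.
CROSSTABS /TABLES=military BY swat
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.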
21.2 For this exercise, you will open the SPSS sample file customer_dbase.sav and determine
if there is a relationship between gender and the size of their hometown for these 5,000 bank
customers. The bank official conducting the research believes that “size of hometown” is definitely
related to “gender.” Your task is to assist the bank official in uncovering evidence in support of his
belief. Write the null and alternative hypotheses, select the appropriate statistical method, conduct
the analysis, and interpret the results.
Answer:
You have categorical data consisting of the two variables: “size of hometown,” which has five categories,
and two categories for “gender.” The sample consists of 5,000 cases, which must be subdivided into gender
and size of hometown. Once this is done, you must determine the expected number in each category and
then determine if the difference is significant. Begin by writing the null and alternative hypotheses, which will
serve as a guide for your work. The null hypothesis is H0: Gender is independent of (not related to) the size
of one’s hometown. Another way to state the same thing is that the numbers of males and females coming
from hometowns of different sizes are the same. The alternative hypothesis is HA: Gender and the size of
hometown are related. That is, the numbers of females and males coming from different-sized hometowns are
not the same. Remember that the alternative hypothesis is the bank official’s belief and you are attempting to
develop evidence to support his belief.
You select the chi-square test of independence and use Analyze/Descriptive/Crosstabs to generate the
Crosstabulation table and request a chi-square statistic and expected values for each of the 10 categories.
The calculated chi-square of 3.021 at 4 degrees of freedom has a significance level of .554 and informs you that the null cannot be rejected. You must inform the bank official that there is no statistical evidence of a difference in the numbers of females and males originating from different-sized hometowns; the data are consistent with equal distributions.
You hope he doesn’t get angry, as this is really important to him. You decide not to tell him until tomorrow and
go home and have a single-malt scotch.
21.3 A nutritionist was developing an educational program to encourage healthy eating and was
seeking evidence to support her belief that males and females do not consume the same amount of
vegetables. She conducted a survey that categorized people by gender and whether they consumed
low, medium, or high amounts of vegetables. The numbers for males were as follows: low = 29,
medium = 21, and high = 16. The numbers for females were as follows: low = 21, medium = 25, and
high = 33. Write the null and alternative hypotheses, select the correct test, do the analysis (include
percentages for all categories), and interpret the results. The data can be found in Review Exercise
21.3 or downloaded from the companion website at prob_21.3_nutritionist.sav.
Answer:
You have categorized the data with simple counts in each category. You wish to determine if the counts in
the categories are equal or unequal. The investigator suspects that they are unequal; therefore, you seek
evidence to support this contention. Let’s begin by writing the null and alternative hypotheses. The null is
H0: Gender is independent of (not related to) the amount of vegetables consumed. You could also say that
males and females consume the same amount of vegetables. The alternative hypothesis is HA: Gender
and the amount of vegetables consumed are related. Or females and males consume different quantities of
vegetables.
We choose chi-square analysis since we have frequency data in independent categories. The data are
inputted as three variables—“gender,” “vegetable consumption,” and “frequency”—in each of the six
categories (male + low veg = 29), (male + med veg = 21), (male + high veg = 16), (female + low veg =
21), (female + med veg = 25), and (female + high veg = 33). Use Analyze/Descriptive/Crosstabs, then click
Statistics for chi-square test and the Options tab to request all categorized percentages.
Interpretation of the chi-square value of 6.412 at 2 degrees of freedom and significance of .041 tells us that
we can reject the null hypothesis of independence. The nutritionist has statistical evidence to support her idea
that there is a relationship between gender and quantity of vegetables consumed. She may now proceed with
the development of her educational program better informed about her target audience.
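A syntax sketch that includes the requested percentages follows; the short names "gender," "veg," and "frequency" are our stand-ins for the variables described above.

DATA LIST LIST /gender veg frequency.
BEGIN DATA
1 1 29
1 2 21
1 3 16
2 1 21
2 2 25
2 3 33
END DATA.
VALUE LABELS gender 1 'male' 2 'female' /veg 1 'low' 2 'medium' 3 'high'.
WEIGHT BY frequency.
* Chi-square test of independence with row, column, and total percentages.
CROSSTABS /TABLES=gender BY veg
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED ROW COLUMN TOTAL.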
22.1 Assume you have collected a random sample of first-year students at a local community college
and given them a general survey that included a number of items. A series of questions results in self-
esteem ratings, and part of their official record includes their IQ. You want to calculate a correlation
coefficient for these two variables including a significance level and then chart the results and add
the Fit Line. Select the correct correlation coefficient, write the null and alternative hypotheses,
and interpret the results. The SPSS Data View is given in Review Exercise 22.1 and may also be
downloaded from the companion website at prob_22.1_self_esteem.sav.
Answer:
We first test both distributions using the Kolmogorov–Smirnov nonparametric one-sample test and determine they are both normal. The null hypothesis is H0: ρ = 0, and the alternative is HA: ρ ≠ 0.
Because the self-esteem ratings are measured at the ordinal level, we use Spearman's correlation and find a moderate to strong coefficient of .599 with a significance level
of .04. We now have evidence that the population correlation is not equal to 0, and we may take our value
seriously. The chart you build also shows a definite positive relationship between these two variables.
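A syntax sketch follows; the short names "esteem" and "iq" are our assumptions; check the Data View of prob_22.1_self_esteem.sav.

* Normality check, then Spearman's rho with a two-tailed significance test.
NPAR TESTS /K-S(NORMAL)=esteem iq.
NONPAR CORR /VARIABLES=esteem iq
  /PRINT=SPEARMAN TWOTAIL.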
22.2 Let’s say you live on a little-used back road that leads to the ski slopes. Over the years, you
have noticed that there seems to be a correlation between the number of inches of snowfall and
traffic on your road. You collect a random sample of daily snow levels on the mountain and match
these with auto traffic. You now wish to analyze the data using correlation and a test of significance.
You also wish to visualize the data on a graph that includes a Fit Line. Write the null and alternative
hypotheses, calculate the coefficient and the significance level, and then build the graph. The SPSS Data View is given in Review Exercise 22.2 and may also be downloaded from the companion
website at prob_22.2_back_road.sav.
Answer:
You should first test both distributions using the Kolmogorov–Smirnov nonparametric one-sample test to
see if they are normally distributed. The test provides evidence that both snowfall and number of cars are
normally distributed, so you may proceed with the calculation of the Pearson’s correlation coefficient. The null
hypothesis is H0: ρ = 0, and the alternative is HA: ρ ≠ 0.
Pearson’s correlation finds a strong positive correlation coefficient of .92, with a significance level of .000.
We now have evidence that the population correlation is not equal to 0, and we may take the value of
.92 seriously. You reject the null that the population correlation coefficient is equal to 0. The chart you built
provides visual evidence in support of the hypothesis that there is a strong positive relationship between
inches of snowfall and car traffic on the back road.
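A syntax sketch follows; the short names "snowfall" and "cars" are our assumptions. The Fit Line itself is added afterward by double-clicking the chart in the Output Viewer and choosing Add Fit Line at Total in the Chart Editor.

NPAR TESTS /K-S(NORMAL)=snowfall cars.
CORRELATIONS /VARIABLES=snowfall cars
  /PRINT=TWOTAIL.
* Scatterplot of traffic against snowfall.
GRAPH /SCATTERPLOT(BIVAR)=snowfall WITH cars.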
22.3 Assume you own a furniture store and you decided to record a random sample of rainy days
and the number of patrons on those days. Calculate the correlation coefficient, build a graph, and
test the numbers for significance. The SPSS Data View is given in Review Exercise 22.3 and may
also be downloaded from the companion website at prob_22.3_furniture_rain.sav.
Answer:
We test the rainfall and number of patrons for normality using the Kolmogorov–Smirnov one-sample test and
find that they are both normally distributed. We may proceed with the calculation of the Pearson’s correlation
coefficient. The null hypothesis is H0: ρ = 0, and the alternative is HA: ρ ≠ 0.
Pearson’s correlation finds a strong negative correlation coefficient of –.979, with a significance level of .000.
The null hypothesis is rejected. We have statistical evidence that the population correlation coefficient is not
equal to 0, and we may take the value of –.979 seriously. The bivariate chart agrees with the correlation
calculation and shows a strong negative relationship between inches of rain and the number of patrons
coming to the store. More rain equals fewer patrons.
23.1 Can you help the manager of a senior citizen center at the local library determine if there was
any merit to her idea that the patron’s age and the number of books checked out were related? Her
thought was that as an individual got older, more books would be checked out. She would like to be
able to predict the number of books that would be checked out by looking at a person’s age. She
selected a random sample of 24 senior patrons and collected details of the age and the number of
books checked out during a 4-week period. Assist the manager by selecting the correct statistical approach, writing the null and alternative hypotheses, conducting the analysis, and interpreting the results.
The SPSS Data View, shown in Review Exercise 23.1, presents the data, including the predicted
values, or it may be found at the companion website at prob_23.1_library.sav.
Answer:
Correlation and regression seem to be the best approach to answer the manager's questions. The manager
has two variables (“age” and “number of books”) that are measured at the scale level, and the first question is
whether they both approximate the normal curve. The Kolmogorov–Smirnov test determines that the variables
are normally distributed. Other data assumptions are checked as we proceed through the analysis. Since we
have one independent variable, we decided to use simple linear regression.
The dependent variable is “number of books checked out”; in the Statistics window you need estimates and
model fit, and for Plots, you will need ZPRED and ZRESID. The results of both plots are satisfactory in that
the ZPRED values are close to the diagonal and the ZRESID values are scattered uniformly on the graph.
The Model Summary table indicates that 86% of the variance in the number of books can be accounted for
by using the model.
The ANOVA table shows an F value of 170.015 that is significant at .000, providing additional evidence that
our model has merit.
The final table, titled Coefficients, provides the intercept and slope needed to write the regression equation.
The Transform/Compute Variable feature can then be used to provide a prediction for all ages. The equation
used is –52.876 + (.876 * age); these results are shown in the table in Review Exercise 23.1.
The null hypothesis that “age” has no influence on the “number of books checked out” can be rejected. Thus,
we now have evidence in support of the alternative hypothesis (the manager’s idea) that age may directly
influence the number of books checked out at the library. We can advise the senior center manager that her
idea has merit and that she now has a useful prediction equation.
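A syntax sketch of the regression and the prediction step follows; the short names "age" and "books" are our assumptions, and the intercept and slope come from the Coefficients table described above.

REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT books
  /METHOD=ENTER age
  /SCATTERPLOT=(*ZPRED ,*ZRESID)
  /RESIDUALS NORMPROB(ZRESID).
* Predicted number of books for every case.
COMPUTE predbooks = -52.876 + (.876 * age).
EXECUTE.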
23.2 An economist at a university was studying the impact of crimes committed with guns on local
economies. Part of his study sought information on the relationship between the number of gun
control measures a lawyer/legislator introduced and his score awarded by the state bar on his
knowledge of constitutional law. His idea was that low-scoring lawyers would introduce more gun
control laws. He wished to quantify the strength and direction of any relationship and also see if the
number of laws introduced could be predicted by knowing the legislator’s constitutional law rating.
One specific value he wished to predict was the number of laws introduced by the average score of
76, a value not directly observed in the data.
His research has thus far shown that gun control laws have a negative impact on local economies. The
researcher selected a random sample of lawyers elected to office and then compiled public information on
the two variables of interest (“gun control” and “state bar rating”). As a consulting statistician, your task is to
select the correct statistical method, write the null and alternative hypotheses, do the analysis, and interpret
the results. The SPSS Data View, shown in Review Exercise 23.2, presents the data, including the predicted
values, or they can be downloaded from the companion website at prob_23.2_gun_crime.sav.
Answer:
It appeared that since the economist had two scale variables and was interested in correlation-type questions,
simple linear regression might be a good place to start. The first check is to use Kolmogorov–Smirnov to
test variables for normality—both distributions are normal. To satisfy additional data assumptions, you need
to generate plots for error residuals and for homoscedasticity. In the Statistics window, you must request
Estimates, Model Fit, and also ZPRED and ZRESID for the plots. The plots look good since the values are
aligned fairly well with the diagonal of the P-P Plot and are uniformly scattered on the residual plot.
According to the Model Summary, 80% of the variance in the number of gun laws that are introduced is
accounted for by the level of understanding of constitutional law by the legislator.
In the ANOVA table, we find an F value of 91.068, which is significant, providing evidence that our model is
sound and will provide useful information.
The Coefficients table gives us the intercept (a) and slope (b), which we need to write the prediction equation.
We use the Transform/Compute Variable feature to generate predicted values, which can then be compared
with the observed data. The equation, derived from the Coefficients table, is 12.929 + (–.121 × const_score); when used with the transform data feature, it results in the predicted values as shown in the Data View table in Review Exercise 23.2. The equation used to create a new predicted value when x = 76 is 12.929 + (–.121 × 76). You can use a handheld calculator to arrive at the value of 3.733 (rounded
to four laws introduced). SPSS will also calculate this value by using the Transform/Compute Variable. Just
substitute 76 for const_score when entering the prediction equation into the Numeric Expression box. If you
choose the SPSS approach, you must change the name placed in the Target Variable box in the Compute Variable window, as the new value will be created for all cases. Name it something like one_value (SPSS variable names cannot contain hyphens) and then click OK. The result is a column of the predicted number of laws, 3.73 (rounded to 4), that would be introduced by someone
with a constitutional knowledge score of 76.
The null hypothesis that a score on the constitutional law rating has no influence on the number of laws
introduced is rejected. There is evidence that supports the alternative hypothesis that the score on the
knowledge of constitutional law may influence the number of gun laws a legislator introduces.
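In syntax, the two Compute Variable runs described above look like this; "pred_laws" and "one_value" are illustrative target names.

* Predicted values for every observed case.
COMPUTE pred_laws = 12.929 + (-.121 * const_score).
* The specific prediction at a score of 76 (about 3.73, or four laws).
COMPUTE one_value = 12.929 + (-.121 * 76).
EXECUTE.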
23.3 A deacon at St. Joseph the Worker Church had the theory that attendance at formal church
services was a good indicator of the number of hours an individual would volunteer for church
functions. He randomly selected 12 individual volunteers and collected the required information. The
deacon wanted to measure the strength and direction of any association. He also wanted a method
whereby he might predict the number of hours volunteered by a person who attends church on
average four times per month. Since you are an active volunteer and a student of statistics, he asked
for your help. You have to select the appropriate statistical technique, write the null and alternative
hypotheses, do the analysis, and interpret the results. The data are presented in Review Exercise
23.3 or downloaded from the companion website at prob_23.3_st_joseph.sav.
Answer:
The deacon has two scale variables (“number of times of church attendance” and “hours of volunteer work”) and wanted to measure their relationship. It was a basic correlation problem, but to make the
prediction, we need to apply linear regression. We select simple linear regression since there is only one
independent variable: the number of times the parishioner attends formal services per month. Using the
Kolmogorov–Smirnov test, we determine that both the independent and the dependent variables are normal.
We also use SPSS to calculate values that will tell us whether the other assumptions required of the
regression model are met. In the Statistics window for regression, we click Estimates, Model Fit, and also
ZPRED and ZRESID. We generate plots for error residuals and homoscedasticity information. The plots
indicate that the data are okay for regression. The scatterplot could be better if it showed a more uniform
distribution, but it is acceptable for this very small sample.
The Model Summary shows that 74% of the variance in the number of volunteer hours is accounted for by the
number of formal church services attended each month. In the ANOVA table, we find an F value of 29.096, which is significant (.000), telling us that we can have confidence in our model.
The Coefficients table gives us the intercept and slope, which are required to make the predictions desired by
the deacon. We use the Transform/Compute Variable feature to generate predicted values, which can then
be compared with the observed data. The equation used to create a new predicted variable when x = 4 is
1.396 + (1.257 * 4). If we insert the average church attendance of 4 times a month into the equation and use
a handheld calculator, we get approximately 6 hours of volunteer work per month. Note: You may also use
SPSS to do the calculation as described in the answer to Review Exercise 23.2.
The null hypothesis that the level of church attendance has no relationship with the number of volunteer hours
is rejected. The deacon now has evidence that supports the alternative hypothesis that there is a relationship
between his two variables. He also has, at his disposal, a prediction equation that can be used to predict the
number of volunteer hours if church attendance figures can be obtained.
24.1 This exercise is an extension of the senior center manager’s problem in the previous chapter
(Review Exercise 23.1). You may recall that the manager developed a prediction equation that
estimated the number of books checked out at the library using the “patron’s age” as the single
independent variable. For the current exercise, used to illustrate multiple linear regression, we add
a second independent variable—“total years of education.” Using the single variable, the model
developed was able to account for 86% of the variance in the number of books checked out.
Although the senior center manager was happy with that result, she wishes to add “total years of
education” in the hope of improving her model. The manager wants to use a new equation (using two
independent variables) to make predictions and then compare those predictions with the observed
data to see how well it works. She also wishes to predict the number of books checked out by
someone aged 63 with 16 years of education, which was not directly observed in her data. Use
multiple linear regression, write the null and alternative hypotheses, conduct the analysis, write the
prediction equations, make the predictions, and interpret the results. Her data are presented in the
SPSS Data View in Review Exercise 24.1 or may be downloaded from the companion website at
prob_24.1_library_books.sav.
Answer:
If we use multiple linear regression, we must first check the three variables for normality. The
Kolmogorov–Smirnov test confirms that they are normally distributed, and we may proceed. We basically
follow the same steps taken in simple regression. However, we now move both independent variables (“age”
and “education”) to the Independent Variables box. In the Statistics window, we check Estimates and Model
Fit. In the Plot window, we move ZPRED to the Y box and ZRESID to the X box, while checking the box next
to Normal probability plot. This takes care of the initial output; later there will be more for the prediction part
of this exercise.
The ZPRED values are close to the diagonal, and the ZRESID values are scattered uniformly but with a slight
negative skew. The skew is not surprising with the small sample size, and we decide to proceed with the
regression analysis. The Model Summary table indicates that 93.8% of the variance in the number of books
can be accounted for by our model. The ANOVA table shows an F value of 48.882 (significance level is .000),
indicating that our model is statistically sound.
As with our simple regression, the Coefficients table is needed for the prediction part of this exercise. This
table provides the intercept and slopes needed to write the multiple regression equation. As before, the
Transform/Compute Variable function is used to provide the required predictions. The equation used for all
cases is –35.443 + (.876 * age) + (.374 * education). The first four cases of this analysis are shown in the
transformed Data View below. The results are very encouraging in that the predicted values for all cases are
very close to the observed. The specific case that the manager requested was when someone aged 63 and
with 16 years of education was entered into the equation—this resulted in a prediction of four books being
checked out in 1 month.
The null hypothesis that age and level of education are not related to the number of books checked out can
be rejected. Thus, we have evidence to support the alternative hypothesis that age and level of education are
related to the number of books checked out at the library. We can advise the senior center manager that her
idea is sound and that she has a useful prediction equation.
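A syntax sketch follows; the short names "age," "education," and "books" are our assumptions, and the coefficients are those reported above.

REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT books
  /METHOD=ENTER age education
  /SCATTERPLOT=(*ZPRED ,*ZRESID)
  /RESIDUALS NORMPROB(ZRESID).
* Predicted number of books for every case, using the reported coefficients.
COMPUTE predbooks2 = -35.443 + (.876 * age) + (.374 * education).
EXECUTE.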
24.2 This problem is based on the simple regression you did in the previous chapter (Review
Exercise 23.2). We just added another variable called “freedom index” to demonstrate an example
of multiple regression. You now have two independent variables (“constitutional law score” and
“freedom index”) and one dependent variable that counts the “number of laws introduced” by the
legislator.
The political consultant wants to determine if the scores on knowledge of constitutional law and score of the
freedom index are related to the number of gun control laws introduced. He also wishes to extend any findings
into the realm of prediction by using regression to estimate the number of laws introduced by a legislator
rated average on both these independent variables. He also wishes to use the equation to predict values
that can then be directly compared with the observed values. Use multiple linear regression, write the null
and alternative hypotheses, conduct the analysis, write the prediction equations, make the predictions, and
interpret the results. His data are presented in the SPSS Data View in Review Exercise 24.2 or may be
downloaded from the companion website at prob_24.2_gun_law.sav.
Answer:
Checking the three variables using the Kolmogorov–Smirnov test, we find three normal distributions. The
Model Summary indicates that 83.7% of the variance in the number of gun control measures introduced can
be accounted for by the two independent variables. The ANOVA table reports an F value of 53.865, which
is significant at .000. This indicates that our overall model is significant and that the amount of variance
accounted for (83.7%) can be taken seriously. The normal P–P Plot of standardized residuals could be better,
but it is within tolerance when taking into consideration our sample size. The plot for homogeneity of variances
is fine, so we have additional confidence in our model.
We next turn to the Coefficients table for the values needed for our prediction equation. The intercept is
13.243, the slope for “constitutional law score” is –.101, and the slope for “freedom index” is –.088. We insert
these values into the Compute Variable window as 13.243 + (–.101 * const_score) + (–.088 * freeindex).
Once OK is clicked, the new variable (named “predgunlaw”) is automatically inserted into your dataset. The
predicted values can now be directly compared, which gives you immediate insight into the value of the
equation. Next, we can insert the value of particular interest to the political consultant, which is the value of
the dependent variable for a legislator rated average on both the constitutional law score and freedom index
measures. We do a descriptive analysis and find 75.67 for constitutional law score and 20.63 for freedom
index. Plugging these into the equation, we can either do it with a handheld calculator or use SPSS to arrive
at four gun control measures that are predicted to be introduced by an average-rated legislator.
The null hypothesis that the “constitutional law score” and “freedom index” are not related to the number of
gun control laws introduced is rejected. There is evidence in support of the alternative hypothesis that these
variables are related. The political consultant has a useful prediction equation that can predict the number of
gun control measures that will be introduced if the legislator’s score on constitutional law and freedom index
are known.
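The prediction steps described above look like this in syntax; "const_score," "freeindex," and "predgunlaw" are the names used above, and "avg_pred" is an illustrative target name.

COMPUTE predgunlaw = 13.243 + (-.101 * const_score) + (-.088 * freeindex).
* Prediction for a legislator at the average of both measures (75.67 and 20.63).
COMPUTE avg_pred = 13.243 + (-.101 * 75.67) + (-.088 * 20.63).
EXECUTE.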
24.3 As we have done in the previous two exercises, we bring forward from the previous chapter a
simple linear regression problem and add an additional variable. In that exercise (Review Exercise
23.3), you had one independent variable, which was “the number of times an individual attended
church during a month.” For this current exercise, you will add another independent variable, which
is “the number of times one prays in a day.” The deacon of the church wants to see if the earlier
prediction equation could be improved by adding this additional variable. As before, he wants to
compare the performance of the new equation with the actual observed values. In addition, he
wishes to predict the number of volunteer hours for those rated as average on the two independent
variables. Use multiple linear regression, write the null and alternative hypotheses, do the analysis,
and interpret the results. The deacon’s new data are shown in the SPSS Data View in Review
Exercise 24.3 or may be downloaded from the companion website at prob_24.3_church_attend.sav.
Answer:
Use multiple linear regression since you have two independent variables. Check all data, independent and
dependent variables, for normality. The Kolmogorov–Smirnov test shows that all are normally distributed. In
the regression statistics window, click Estimates and Model Fit, and in the Plots window, move ZPRED to
the Y box and ZRESID to the X box. These will generate output that further informs you whether regression is
appropriate for your data. We find that the P–P Plot is just acceptable and the scatterplot excellent; therefore,
we proceed with our regression analysis.
The Model Summary shows that 79.6% of the variance in the number of volunteer hours is accounted for by
the number of church services attended each month and the number of times an individual prays each day.
The ANOVA table shows an F value of 17.54, which is significant (.001) and informs us that the overall model
is good.
The Coefficients table shows an intercept of .874 and a slope for “church attendance per month” of .478
and for “prayers per day” of 1.257. We next use the Transform/Compute Variable feature to calculate the
predicted values, which can then be compared with the observed data. The equation inserted in the Numeric
Expression window is .874 + (.478 * churchattend) + (1.257 * pray). This gives us the predictions that can be
compared with the actual observations. Next, we see what the equation predicts for the average values of 6
(“church attendance per month”) and 4 (“prayers per day”). The equation now becomes .874 + (.478 * 6) +
(1.257 * 4) and predicts that such a person would be expected to volunteer 9 hours per month.
The null hypothesis that the level of church attendance and number of times one prays daily has no
relationship to the number of volunteer hours is rejected. The deacon now has evidence that supports the
alternative hypothesis that there is a relationship between these variables.
25.1 A major in the Air Force wanted to find a way to predict whether a particular airman would be
promoted to sergeant within 4 years of enlisting in the military. He had data on many characteristics
of individuals prior to enlistment. He chose three variables that he thought might be useful in
determining whether they would earn the early promotion. He selected a random sample of 30
individuals and compiled the required information. Your task is to develop a prediction equation
that might assist the major in efforts to predict early promotion for his young airman. Write the
research question(s) and the null and alternative hypotheses. The major’s variable information and
data are presented in the SPSS Data View in Review Exercise 25.1 or may be downloaded from the
companion website at prob_25.1_air_force.sav.
Answer:
You can identify this problem as requiring a regression method since we are attempting to make a prediction
based on observed data. Since we have a discrete dependent variable (“promoted to sergeant in 4 years”)
with two categories (yes or no), we should immediately think of binary logistic regression. Levels of
measurement for our independent variables can be either discrete or continuous. Okay, now that we have selected the statistical approach, we next write the research questions and the null and alternative hypotheses.
Questions: (1) Can we predict whether an airman will be promoted to sergeant within 4 years given the pre-
enlistment data? and (2) If we can accurately predict the outcome, then which of the variables are the most
important? The null hypothesis is the opposite of the research questions and would state that knowledge of
the pre-enlistment variables would be of no help in predicting early promotion to sergeant.
First, let’s check the data for multicollinearity. You want lower correlations between independent variables and
higher correlations between the independent and dependent variables. The pattern observed is acceptable,
and we make the decision to proceed.
The tables under Block 0: Beginning Block show the prediction of the dependent variable with none of our
selected variables in the equation at 76.7% (Classification Table). The Variables in the Equation table shows
significance under these same conditions. The results shown in the Omnibus Tests of Model Coefficients
under Block 1: Method = Enter are much more interesting, and since significance is .001, we now have
evidence that we have a good fit. The Model Summary indicates that between 43% and 64.9% of the
variance in our dependent variable can be accounted for by the model. The Hosmer–Lemeshow test provides
additional support for the model since we fail to reject the null. The chi-square value of 5.677 at 8 degrees of
freedom and a significance level of .683 is further evidence that the model is worthwhile.
The new Classification Table indicates that the predictive power is 90%, an increase of 13.3% from the original
Classification Table. This percentage increase gives us more confidence that our equation will have the ability
to actually make predictions. The final table, Variables in the Equation, adds more evidence that our model
can be useful in predicting the binary outcome. The most desirable result would be for all variables to be
significant. However, the one variable (“test”) showing no significance at .340 could be easily removed from
the equation with no negative impact. Try it and see what happens!
We can now answer the research question and say that we can write the prediction equation. The two
variables that are significant are “sports activity in high school” and “whether the airman had a hunting license
in high school.” The performance on the induction test does not have a positive effect on the equation’s
performance in predicting early promotion to sergeant.
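A syntax sketch follows; the short names "promoted," "sports," "hunting," and "test" are our stand-ins for the variables described above.

* Multicollinearity screen, then the logistic run; GOODFIT requests the Hosmer–Lemeshow test.
NONPAR CORR /VARIABLES=promoted sports hunting test /PRINT=SPEARMAN.
LOGISTIC REGRESSION VARIABLES promoted
  /METHOD=ENTER sports hunting test
  /PRINT=GOODFIT
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).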
25.2 A social scientist wanted to develop an equation that would predict whether a male student
would be successful in getting a date for the senior prom. The scientist had access to many student
records and took a random sample of 40 students. She chose four characteristics that she felt would predict whether a male would get a date or not—a binary outcome. These variables are shown
in Review Exercise 25.2 in the SPSS Variable View. The Label column shows the description of
the variable. Your job is to select the correct statistical approach and then assist the social scientist
in developing the equation. Write the research question(s) and the null and alternative hypotheses.
The variable information and data are given in the table found with Review Exercise 25.2 or may be
downloaded from the companion website at prob_25.2_prom_date.sav.
Answer:
The tip-off as to the type of statistical method needed is that you need a prediction equation—hence,
regression is required. Since you have a discrete dependent variable that has only two possible outcomes
(getting a date or not), you should select binary logistic regression. For this method, you can have a mix of
discrete and continuous independent variables. There are really just two research questions or alternative
hypotheses (HA): (1) Can you predict whether the high school student will get a date for the prom? and (2)
If we can develop a reliable prediction equation, which variables are the most important? The null hypothesis
(H0) is the opposite and states that knowledge of the variables would not help in predicting whether someone would get a date for the prom.
Before using the logistic approach, we check the variables for multicollinearity. We generate a bivariate
correlation table to see how the independent variables correlate with one another (we want low correlations)
and then separately with the dependent variable (high). The pattern observed is passable, and we decide to
proceed with the analysis.
The first Classification Table under Block 0: Beginning Block shows an 80% prediction with no variables. The
Variables in the Equation table shows a Wald statistic of 12.3 and significance of .000 under these same
conditions. The Omnibus Tests of Model Coefficients under Block 1: Method = Enter shows a significance of
.000, which provides the first evidence of a good fit for our model. The Model Summary shows that between
39.5% and 62.4% of the variance in the dependent variable (getting a date or not) can be accounted for by
the model. The Hosmer–Lemeshow test provides additional support for the model since we fail to reject the
null. The chi-square value of 3.775 at 6 degrees of freedom and a significance level of .707 informs us that
the model is worthwhile.
The Classification Table indicates that the predictive power is now 82.5%, which represents an increase over
the original table. Variables in the Equation adds more evidence that our model can be useful in predicting the
binary outcome. In this table, we see the most desirable result in that all variables are found to be significant.
Thus, we have evidence that all variables contribute to the model for predicting the outcome.
To answer our two research questions, we can say that we developed a prediction equation that made it
possible to predict whether the high school student would get a date for the prom. Furthermore, we found that
the variables had a positive significant impact on our equation.
25.3 For this review exercise, you will use the SPSS sample file titled customer_dbase.sav. You are
a statistical consultant with a contract to help a phone company executive develop a way to predict
whether a customer would order the paging service. Based on prior experience, the executive feels
that customers using voice mail (“voice”), caller ID (“callid”), and electronic billing (“ebill”) would also
be inclined to utilize the paging service (“pager”). He is seeking statistical evidence and an equation
to support his intuitive feeling. He also wishes to utilize the equation to predict future customers’
behaviors. Select the appropriate statistical method, open the dataset, select the variables, do the
analysis, and then interpret the results.
Answer:
You will need to develop a prediction equation utilizing the method that accommodates categorical data. The
executive wants to predict whether someone will purchase the paging service or not—a categorical outcome.
And to be more specific, it is a binary outcome. The logical choice for analysis is binary logistic regression.
The two research questions or alternative hypotheses (HA) are as follows: (1) Can you predict whether the
customer will purchase the paging service? and (2) If a reliable prediction equation can be developed, which
of the selected variables are the most important? The null hypothesis (H0) is the opposite and states that the
variables would be of no use in predicting whether someone purchases the paging service.
We first check the selected variables for multicollinearity. We request a Spearman bivariate correlation table
to see how the independent variables correlate with one another. We want these to have low correlation
coefficients. The ideal would be for the independent variables to then have high coefficients with the
dependent variable. The observed pattern is not perfect, but we believe it will not have a negative impact on
the development of an equation, so we proceed with logistic regression.
The first Classification Table shows a 75.6% prediction with no variables. The Variables in the Equation table
shows a Wald statistic of 12.3 and significance of .000 under these same conditions.
The Omnibus Tests of Model Coefficients under Block 1: Method = Enter shows a significance of .000,
providing evidence that the model fit is good. The Model Summary shows that between 35.7% and 53.2% of
the variance in the dependent variable (pager service purchase or not) can be accounted for by the model.
The Hosmer–Lemeshow test provides additional support for the model, since we fail to reject the null. The
chi-square value of 6.624 at 4 degrees of freedom and a significance level of .157 informs us that the model
is worthwhile.
The Classification Table for Block 1 indicates that the predictive power has increased to 85.1%, up from
75.6% in the original table. The Variables in the Equation table adds more evidence that our model can be
useful in predicting the binary outcome: all the variables are found to contribute significantly to the model.
This result gives us additional confidence in our predictive model.
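Once the coefficients are in hand, the executive can score future customers. As a sketch (the B values themselves are read from the B column of the Variables in the Equation table and are not reproduced here, and we assume the usual 0/1 No/Yes coding of the three predictors), the fitted logistic equation takes the form

predicted probability = 1 / (1 + e^-(B0 + B1(voice) + B2(callid) + B3(ebill)))

A future customer whose predicted probability exceeds the .5 cut value would be classified as a likely pager purchaser.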
To answer our two research questions, we can say that we developed a prediction equation that makes it
possible to predict whether a customer will purchase the paging service. The second question was also
answered: all three predictor variables contributed positively and significantly to the equation.
26.1 For this exercise, you will use the SPSS sample file called customer_dbase.sav. This dataset is
composed of 5,000 cases and 132 variables. You will select the first 10 scale variables and search for
underlying latent variables (factors) within these variables. The idea is that you must explore the data
in an attempt to reduce the 10 variables into smaller clusters, referred to as components in principal
component factor analysis. Write the null and alternative hypotheses, open the dataset, select the
correct statistical approach, and search for any underlying latent factors in the first 10 scale variables.
Answer:
Before getting started on the analysis, we write the null and alternative hypotheses. The null hypothesis (H0)
is that there are no underlying structures and that all the variables load equally. The alternative hypothesis
(HA) is that there are one or more underlying components (factors) and the variables do not load equally.
Go to Analyze/Dimension reduction/Factor, and then select and move the 10 variables (“Age in years” through
“Spouse years of education”) for analysis; then open the Descriptives window and check Univariate, Initial,
Coefficients, Significance, and KMO. In Extraction, click Unrotated and Scree. Click OK, and you have now
generated all the output you will need to determine whether you have evidence in support of the alternative
hypothesis. Next, we interpret each generated table to learn more about the 10 variables.
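If you click Paste instead of OK, SPSS writes the equivalent syntax to a Syntax Editor window. A minimal sketch is shown below; the ten variable names are our assumption about the first 10 scale variables in customer_dbase.sav, so confirm them in Variable View before running.

* Assumed sketch of the FACTOR syntax for Review Exercise 26.1.
* The variable list is hypothetical; substitute the names shown in your Variable View.
FACTOR
  /VARIABLES age ed employ income lninc debtinc creddebt othdebt lnothdebt spoused
  /PRINT UNIVARIATE INITIAL CORRELATION SIG KMO EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION NOROTATE.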
The table titled Correlation Matrix is not shown, but a number of variables have correlations below the
recommended .3; in spite of this, we elect to proceed. Remember that high correlations are best when
doing factor analysis. Two additional tests presented in the next table give us evidence supporting the
effort to uncover underlying factors. The KMO and Bartlett's Test table shows a KMO value of .698, which is
good, since any value greater than .6 is considered satisfactory. Bartlett's test shows significance at .000,
which is also a positive indicator that any identified factors can be taken seriously.
The table titled Communalities gives the proportion of variability in each original variable that is accounted
for by the high-loading factors, that is, those with eigenvalues >1. For example, the value of .934 means that
93.4% of the variance in the variable “Debt” is accounted for by the high-loading factors, which are numbers
1, 2, and 3 in the next table.
The Total Variance Explained table shows the eigenvalues for the 10 new factors. Look at the column called
Total and notice the value of 4.382 for the first factor: it means that 43.816% of the total variance is accounted
for by this first factor (4.382/10 × 100). Factor 2 shows an eigenvalue of 1.696, and the % of Variance column
shows that 16.962% (1.696/10 × 100) of the total variance is explained by this factor. The column for
Cumulative % indicates 60.777%, which is the percentage of the total variance that can be accounted for by
the first two factors.
The Scree Plot shows three factors with eigenvalues greater than 1.0. These three factors account for
73.317% of the variance for all variables. We can say that the scree plot provides additional support for a two-
or three-factor solution to our factor analysis problem.
The final table is called Component Matrix, and it shows the factor-loading values for factors with eigenvalues
of 1.0 or more. These values are interpreted as you would any correlation coefficient. Zero indicates no
loading, while negative values, such as –.174 for Component 2, indicate that as the particular variable score
increases, the component score decreases. Positive values, such as .194 for Component 1, indicate that as
the variable score increases, so does the component score.
The Component Matrix can be very useful when the data analyst is attempting to name the newly discovered
latent variable (factor). For instance, look at Factor 1, where we see the highest loadings on “debt” and
“income.” This is suggestive of a name dealing with financial conditions within a particular household. Factor
2 suggests a name associated with education. Factor 3 should be dropped.
26.2 This review exercise uses the same SPSS sample file dataset as in the previous exercise
(customer_dbase.sav). However, this time you will examine eight scale variables dealing with pet
ownership (#29 through #36). You are to look for underlying latent variables that would permit the
reduction of these eight variables. Write the null and alternative hypotheses, select the correct
procedures, examine the initial statistics, and interpret the results.
Answer:
The null hypothesis (H0) is that there are no underlying structures and that all the variables load equally. The
alternative hypothesis (HA) is that there are one or more underlying components (factors) and the variables
do not load equally.
Open customer_dbase.sav, and go to Analyze/Dimension reduction/Factor; then select and move the eight
variables. In the Descriptives window, check Univariate, Initial, Coefficients, Significance, and KMO. In
Extraction, click Unrotated and Scree. Click OK, and you generate several tables that are ready for your
interpretation.
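The pasted syntax would look roughly like the sketch below; the eight pet-ownership variable names are our assumption, so verify them against positions #29 through #36 in Variable View before running.

* Assumed sketch of the FACTOR syntax for Review Exercise 26.2.
* The eight pet-ownership variable names below are hypothetical.
FACTOR
  /VARIABLES pets pets_cats pets_dogs pets_birds pets_reptiles pets_small pets_saltfish pets_freshfish
  /PRINT UNIVARIATE INITIAL CORRELATION SIG KMO EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION NOROTATE.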
After looking at the very low correlation coefficients in the initial Correlation Matrix, you see that there is no
hope for a successful solution of this factor analysis problem. However, in this example, we proceed to explain
the following tables, which point out the results of this initial data screening. Just keep in mind that you
would be justified in stopping at this point: the initial analysis indicates that the null cannot be rejected and
that there is no evidence to support the alternative.
You may have noticed that the KMO and Bartlett’s test are not reported, another indication that there was no
underlying factor for these eight variables. The Total Variance Explained table indicates very little difference
in the eigenvalues for all variables. The scree plot is also atypical for a successful solution. A scree plot
showing the underlying components (factors) would drop to some elbow point and then level off with steadily
diminishing eigenvalues. The Component Matrix table clearly shows that there are no unique factor
loadings and that the extracted factors essentially repeat the information in the original variables.
26.3 For this review exercise, you will use the SPSS sample file titled telco.sav. This dataset has
1,000 cases and 22 variables measured at the scale level. You will select the first 16 of the scale
variables (up to wireten/wireless over tenure, #25) and attempt to identify any underlying factor(s)
that would permit data reduction. State the null and alternative hypotheses, select the statistical
method, and proceed with the analysis and interpretation.
Answer:
We begin by writing the null and alternative hypotheses. The null hypothesis (H0) is that there are no
underlying structures and that all the variables load equally. The alternative hypothesis (HA) is that there are
one or more underlying components (factors) and the variables do not load equally.
Go to Analyze/Dimension reduction/Factor, and then select and move the 16 variables for analysis; then open
the Descriptives window, and check Univariate, Initial, Coefficients, Significance, and KMO. In Extraction,
click Unrotated and Scree. Click OK, and you have all the output needed to answer the questions regarding
the underlying factors.
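As with the previous exercises, clicking Paste produces syntax such as the sketch below; the 16-variable list is our assumption about telco.sav's scale variables up to wireten, so check Variable View and adjust before running.

* Assumed sketch of the FACTOR syntax for Review Exercise 26.3.
* The 16-variable list is hypothetical; adjust to match your Variable View.
FACTOR
  /VARIABLES tenure age address income ed employ longmon tollmon equipmon cardmon wiremon longten tollten equipten cardten wireten
  /PRINT UNIVARIATE INITIAL CORRELATION SIG KMO EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1)
  /EXTRACTION PC
  /ROTATION NOROTATE.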
The Correlation Matrix does show many correlations exceeding the recommended .3 level, so we proceed
with the analysis of the output. The KMO value of .654 and Bartlett's significance of .000 are also evidence
that the variables share enough common variance for the factor analysis to proceed.
The Communalities table indicates the proportion of variability in each original variable that is accounted for
by the high-loading factors, which are numbers 1, 2, 3, and 4 in the next table.
The Total Variance Explained table shows the eigenvalues for the 14 new factors. Look at the column called
Total and notice the value of 5.226 for the first factor. It means that 37.327% (5.226/14 × 100) of the total
variance is accounted for by this first factor. Factor 2 shows an eigenvalue of 2.796 and the % of Variance
column shows that 19.973% (2.796/14 × 100) of the total variance is explained by this factor. The column for
Cumulative % indicates 57.300%, which is the percentage of the total variance that can be accounted for by
the first two factors. The first four major factors (those with eigenvalues >1.0) account for 76.365% of the total
variance for all variables.
The Scree Plot shows four factors with eigenvalues greater than 1.0. These four factors account for 76.365%
of the variance for all variables. We can say that the scree plot provides additional support for a two-, three-,
or four-factor solution.
The Component Matrix shows the factor-loading values for the four factors having eigenvalues of 1.0 or more.
We interpret these values as we do any correlation coefficient. We can readily see strong loadings on
Factors 1 and 2, which would argue for a two- or three-factor solution.
Looking at Factor 1, it appears that the high loadings are associated with the convenience of uninterrupted
communication; perhaps you could name this factor “convenient communication.” Factor 2 loaded more on
equipment-related issues and might be named “reliability.”
1) Those datasets with “expl” in their name are used in chapter examples.
2) Those datasets beginning with “prob” are used in the Review Exercises.
3) Those datasets with only a name and “.sav” are SPSS’s sample files.