Interpersonal Skills: By: S.Sohail Sajjad
SPSS performs statistical analyses ranging from basic descriptive statistics, such
as averages and prevalence, to advanced inferential statistics, such as
regression models, analysis of variance (ANOVA) and factor analysis.
SPSS also contains several tools for manipulating data, including functions
for recoding data, macro programming in the Visual Basic editor, merging data
and aggregating complex data sets.
FEATURES AND BENEFITS
For Small and Medium Enterprises (SMEs)
Resources & best practices.
Techniques for cleaning data.
Access Data in Relational Databases.
MS Excel:
Widely available (part of MS Office Suite)
A spreadsheet program, not dedicated statistical software
Finance, math, and statistics applications
SPSS:
Robust software for sophisticated statistical applications
Can be extended with add-ins
WHY SPSS?
Strengths:
Very robust statistical software
Many complex statistical tests available
Good built-in coaching help for interpreting results
Easily and quickly displays data tables
Can be expanded:
- Using the syntax feature
- Purchasing add-ins
Limitations:
Can be expensive
Not intuitive to use
Typically requires additional training to maximize features (at a cost)
Graphing feature not as simple as Excel
Understanding SPSS
Data Editor:
The Data Editor displays the contents of the active data file. The
information in the Data Editor consists of variables and cases.
- In Data View, columns represent variables, and rows represent cases
(observations).
On start up, click Type
in data, then click OK.
Entering Data
Click the Variable View tab at the bottom of the Data Editor window.
You need to define the variables that will be used. In this case, the needed
variables are: staffNo, FName, LName, Position, Gender, DoB, Salary and
BranchNo.
In the first row of the Name column, type staffNo; in the second row, type
FName; in the third row, type LName; and so on.
New variables are automatically given a Numeric data type.
For string data type, click the button on the right side of the Type cell to
open the Variable Type dialog box. Select String to specify the variable
type. Click OK to save your selection and return to the Data Editor.
If you don't enter variable names, unique names are automatically
created. However, these names are not descriptive and are not
recommended for large data files.
Click the Data View tab to continue entering the data.
Entering Data (Contd.)
Click the Data View tab to continue entering the data.
The names that you entered in Variable View are now the column headings
in Data View.
Begin entering data in the first row, starting at the first column.
In the staffNo column, type s21. In the FName column, type John. In
the LName column, type White, and so on.
Move the cursor to the second row of the first column to add the
next employee's data, and repeat the above steps.
Entering Data (Contd.)
Currently, the Salary column displays decimal places, even though its values are
intended to be integers.
- Click the Variable View tab at the bottom of the Data Editor window.
- In the Decimals column of the Salary row, type 0 to hide the decimal.
Defining Data
In addition to defining data types, you can also define descriptive variable
labels and value labels for variable names and data values. These
descriptive labels are used in statistical reports and charts.
Labels can be up to 255 bytes. These labels are used in your output to
identify the different variables.
Click the Variable View tab at the bottom of the Data Editor window.
In the Label column of the FName row, type First Name. In the Label
column of the LName row, type Last Name. In the Label column of the DoB
row, type Date of Birth.
Adding Value Labels for Variables
Value labels provide a method for mapping your variable values to a string
label. In this example, there are two acceptable values for the Gender
variable.
A value of F means that the subject is female, and a value of M means that the
subject is male.
Click the Values cell for the Gender row, and then click the button on the right
side of the cell to open the Value Labels dialog box.
The value label is the string label that is applied to the specified numeric (or
string) value.
Type M in the Value field. Type Male in the Label field. Click Add to add this
label to the list. Repeat these steps with F and Female, and then click OK.
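Conceptually, a set of value labels is a lookup table from stored values to descriptive strings. The minimal Python sketch below illustrates the idea for the Gender variable; the names `gender_labels`, `display` and `is_defined` are hypothetical and are not part of SPSS.

```python
# Hypothetical value-label table for the Gender variable:
# stored value -> descriptive label used in output.
gender_labels = {"M": "Male", "F": "Female"}


def display(value, labels):
    """Return the label for a value, or the raw value if no label is defined."""
    return labels.get(value, value)


def is_defined(value, labels):
    """True if the value has a label, i.e. it is one of the expected codes."""
    return value in labels


raw_gender = ["M", "F", "F", "M"]
print([display(v, gender_labels) for v in raw_gender])
# ['Male', 'Female', 'Female', 'Male']
```

The `is_defined` check mirrors the idea that only labelled values are offered during data entry.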
Adding Value Labels for Variables
These labels can also be displayed in Data View, which can make your data
more readable.
Click the Data View tab at the bottom of the Data Editor window, and then from the menus choose: View -> Value Labels.
The labels are now displayed in a list when you enter values in the Data
Editor. This setup has the benefit of suggesting a valid response and
providing a more descriptive answer.
If the Value Labels menu item is already active (with a check mark next to
it), choosing Value Labels again will turn off the display of value labels.
When Value Labels Are Active
Using Value Labels for Data Entry
You can use value labels for data entry.
Click the Data View tab at the bottom of the Data Editor window.
In the first row, select the cell for gender. Click the button on the right side
of the cell, and then choose Male from the drop-down list. In the second
row, select the cell for gender. Click the button on the right side of the cell,
and then choose Female from the drop-down list.
Only defined values are listed, which ensures that the entered data are in
a format that you expect.
Handling Missing Data
Missing or invalid data are generally too common to ignore. Survey
respondents may refuse to answer certain questions, may not know the
answer, or may answer in an unexpected format.
If you don't filter or identify these data, your analysis may not provide
accurate results.
For numeric data, empty data fields or fields containing invalid entries are
converted to system-missing, which is identifiable by a single period.
Handling Missing Data
Click the Variable View tab at the bottom of the Data Editor window.
Click the Missing cell in the Salary row, and then click the button on the
right side of the cell to open the Missing Values dialog box.
In this dialog box, you can specify up to three distinct missing values, or
you can specify a range of values plus one additional discrete value.
Select Discrete missing values. Type 000 in the first text box and leave the
other two text boxes empty.
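The rules of the Missing Values dialog can be modelled in a few lines. The Python sketch below is a hypothetical illustration (the `MissingSpec` class is not an SPSS API): it accepts up to three discrete values, or one range plus one discrete value, and uses `None` to stand in for system-missing. Note that typing 000 for a numeric variable defines the number 0 as missing.

```python
class MissingSpec:
    """Hypothetical model of the Missing Values dialog rules (not an SPSS API)."""

    def __init__(self, discrete=(), value_range=None):
        # Up to three discrete values, or one (low, high) range plus one discrete value.
        limit = 3 if value_range is None else 1
        if len(discrete) > limit:
            raise ValueError(f"at most {limit} discrete missing value(s) allowed")
        self.discrete = set(discrete)
        self.value_range = value_range

    def is_missing(self, value):
        # None stands in for system-missing (shown as "." in Data View).
        if value is None or value in self.discrete:
            return True
        if self.value_range is not None:
            low, high = self.value_range
            return low <= value <= high
        return False


# Typing 000 for a numeric variable defines the value 0 as user-missing.
salary_missing = MissingSpec(discrete=[0])
salaries = [30000, 0, None, 12000]
valid = [s for s in salaries if not salary_missing.is_missing(s)]
print(valid)  # [30000, 12000]
```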
Missing Values for a Numeric Variable
Now that the missing data value has been added, a label can be applied to
that value.
Click the Values cell in the Salary row, and then click the button on the
right side of the cell to open the Value Labels dialog box.
Type 000 in the Value field. Type No Response in the Label field.
Click Add to add this label to your data file. Click OK to save your changes
and return to the Data Editor.
Missing Values for a String Variable
Missing values for string variables are defined in the same way as
missing values for numeric variables.
Running an Analysis
The Analyze menu contains a list of general reporting and statistical
analysis categories.
We will start by creating a simple frequency table (table of counts).
From the menus choose: Analyze -> Descriptive Statistics-> Frequencies
The Frequencies dialog box is displayed.
An icon next to each variable provides information about data type and
level of measurement.
In the dialog box, you choose the variables that you want to analyze from
the source list on the left and drag and drop them into the Variable(s) list
on the right.
Select the variable Salary and move it into the Variable(s) list.
You can obtain additional information by right-clicking any variable name
in the list. For example, you could click Salary and choose Variable
Information.
Click OK to run the procedure.
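A frequency table is simply a count of cases per distinct value, usually shown with percentages. The Python sketch below is a hypothetical illustration, not SPSS's own code: it uses `None` for missing cases, so Percent is computed over all cases while Valid Percent excludes missing ones.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, count, percent, valid percent) in ascending value order.

    None marks a missing case: it counts toward Percent (share of all cases)
    but is excluded from Valid Percent (share of non-missing cases).
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return [
        (value, n, 100.0 * n / total, 100.0 * n / len(valid))
        for value, n in sorted(counts.items())
    ]


salaries = [25000, 30000, 25000, 40000, None]  # hypothetical data
for value, n, pct, valid_pct in frequency_table(salaries):
    print(f"{value:>8} {n:>3} {pct:6.1f} {valid_pct:6.1f}")
```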
Examining Summary Statistics for Individual Variables
Level of Measurement
Different summary measures are appropriate for different types of data,
depending on the level of measurement:
Categorical: Data with a limited number of distinct values or categories (for example, gender or job category).
Level of Measurement
There are two basic types of categorical data:
- Nominal:
Categorical data where there is no inherent order to the categories. For
example, a job category of sales is not higher or lower than a job category of
marketing or research.
- Ordinal:
Categorical data where there is a meaningful order of categories, but there
is not a measurable distance between categories. For example, there is an
order to the values high, medium, and low, but the "distance" between the
values cannot be calculated.
Level of Measurement (Contd.)
Scale:
Data measured on an interval or ratio scale, where the data values
indicate both the order of values and the distance between values.
Summary Measures for Categorical Data
For categorical data, the most typical summary measure is the number or
percentage of cases in each category. The mode is the category with the
greatest number of cases. For ordinal data, the median (the value at which
half of the cases fall above and below) may also be a useful summary
measure if there is a large number of categories.
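The mode and the median of an ordinal variable can be computed from category counts and category ranks. The Python sketch below assumes a hypothetical low/medium/high scale and, for an even number of cases, takes the lower middle rank (`statistics.median_low`); this is one simple convention, not necessarily the one SPSS uses.

```python
import statistics
from collections import Counter

# Hypothetical ordinal categories, listed in their meaningful order.
ORDER = ["low", "medium", "high"]


def mode(values):
    """Most frequent category; ties go to the category seen first."""
    return Counter(values).most_common(1)[0][0]


def ordinal_median(values, order):
    """Median category, using the lower middle rank when the count is even."""
    ranks = [order.index(v) for v in values]
    return order[statistics.median_low(ranks)]


ratings = ["low", "low", "medium", "high", "high", "high", "medium"]
print(mode(ratings))                   # high
print(ordinal_median(ratings, ORDER))  # medium
```

The example shows that the mode and the median of the same ordinal data need not agree.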
Charts for Categorical Data
You can graphically display the information in a frequency table with a bar
chart or pie chart.
Click Charts.
Summary Measures for Scale Variables
There are many summary measures available for scale variables, including:
- Measures of central tendency. The most common measures of central tendency
are the mean (arithmetic average) and median (value at which half the cases fall
above and below).
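The difference between the two measures is easiest to see with a skewed variable. This short Python sketch uses the standard `statistics` module on hypothetical salary data in which one very high value pulls the mean upward while the median stays at the middle case.

```python
import statistics

# Hypothetical salaries; one very high value skews the distribution.
salaries = [21000, 24000, 26000, 30000, 95000]

mean_salary = statistics.mean(salaries)      # pulled upward by the 95000 outlier
median_salary = statistics.median(salaries)  # middle value, unaffected by the outlier
print(mean_salary, median_salary)
```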
Summary Measures for Scale Variables (Contd.)
Click Statistics.
Histograms for Scale Variables
Click Charts.
Cross Tabulation Tables
Cross tabulation tables (contingency tables) display the relationship between
two or more categorical (nominal or ordinal) variables. The size of the table is
determined by the number of distinct values for each variable, with each cell
in the table representing a unique combination of values. Numerous statistical
tests are available to determine whether there is a relationship between the
variables in a table.
From the menus choose: Analyze -> Descriptive Statistics -> Crosstabs
The cells of the table show the count or number of cases for each joint
combination of values.
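The counting behind a two-way table can be sketched by tallying pairs of values, one pair per case. The Python example below is a hypothetical illustration: the `crosstab` function and the gender/position data are assumptions, not SPSS output.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count the cases for each joint combination of a row and a column value."""
    counts = Counter(zip(rows, cols))
    col_values = sorted(set(cols))
    return {
        r: {c: counts[(r, c)] for c in col_values}
        for r in sorted(set(rows))
    }


# Hypothetical data: one entry per case (employee).
gender = ["M", "F", "F", "M", "F"]
position = ["Manager", "Assistant", "Manager", "Assistant", "Assistant"]
table = crosstab(gender, position)
print(table)
# {'F': {'Assistant': 2, 'Manager': 1}, 'M': {'Assistant': 1, 'Manager': 1}}
```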
A Simple Cross Tabulation
Counts vs. Percentages
It is often difficult to analyze a cross tabulation simply by looking at the
counts in each cell.
Open the Crosstabs dialog box again. (The two variables should still be
selected.)
Click Cells, and then under Percentages, select (check) Row.
Click Continue and then click OK in the main dialog box to run the
procedure.
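Row percentages divide each cell count by its row total, so every row sums to 100%. The Python sketch below applies this to a hypothetical gender-by-position table of counts; the function name is an assumption for illustration.

```python
def row_percentages(table):
    """Turn a nested {row: {column: count}} table into row percentages."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {
            col: (100.0 * n / row_total if row_total else 0.0)
            for col, n in cells.items()
        }
    return result


# Hypothetical counts (rows = gender, columns = position).
counts = {"F": {"Assistant": 2, "Manager": 1}, "M": {"Assistant": 1, "Manager": 1}}
pct = row_percentages(counts)
print(pct["M"])  # {'Assistant': 50.0, 'Manager': 50.0}
```

Comparing percentages rather than raw counts makes rows of different sizes directly comparable.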
Significance Testing for Cross Tabulations
The purpose of a cross tabulation is to show the relationship (or lack thereof)
between two variables.
A number of tests are available to determine whether the relationship between two
cross-tabulated variables is significant. One of the more common tests is chi-square.
One of the advantages of chi-square is that it is appropriate for almost any kind of
data, provided that the expected count in each cell is not too small.
- Open the Crosstabs dialog box again and click Statistics.
- Click (check) Chi-square.
- Click Continue and then click OK in the main dialog box to run the procedure.
Pearson chi-square tests the hypothesis that the row and column variables are
independent. The actual value of the statistic isn't very informative on its own.
The significance value (Asymp. Sig.) has the information we're looking for. The
lower the significance value, the stronger the evidence against the hypothesis that
the two variables are independent (unrelated); a value below 0.05 is conventionally
treated as significant.
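The Pearson statistic compares each observed count with the count expected under independence, (row total × column total) / N. The Python sketch below computes it for a hypothetical 2 × 2 table and, only for the 1-degree-of-freedom case, the p-value via `math.erfc`. It does not apply a continuity correction and is not SPSS's implementation.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # With 1 df the statistic is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


observed = [[30, 10], [10, 30]]  # hypothetical 2 x 2 counts
stat, df = pearson_chi_square(observed)
print(stat, df, p_value_df1(stat))
```

Tables larger than 2 × 2 have more degrees of freedom and would need a general chi-square distribution function.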
Adding a Layer Variable
You can add a layer variable to create a three-way table in which
categories of the row and column variables are further subdivided by
categories of the layer variable.
Open the Crosstabs dialog box again. Click Cells. Uncheck Row Percents.
Click Continue.
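A layered table is one row-by-column table per category of the layer variable. The Python sketch below counts triples of (layer, row, column) values; the branch, gender and position data and the function name are hypothetical.

```python
from collections import Counter


def layered_crosstab(layers, rows, cols):
    """One row-by-column table of counts for each category of the layer variable."""
    counts = Counter(zip(layers, rows, cols))
    row_values = sorted(set(rows))
    col_values = sorted(set(cols))
    return {
        layer: {r: {c: counts[(layer, r, c)] for c in col_values} for r in row_values}
        for layer in sorted(set(layers))
    }


# Hypothetical cases: branch (layer), gender (row), position (column).
branch = ["B003", "B003", "B005", "B005", "B005"]
gender = ["M", "F", "F", "M", "F"]
position = ["Manager", "Assistant", "Manager", "Assistant", "Assistant"]
tables = layered_crosstab(branch, gender, position)
print(tables["B005"])
# {'F': {'Assistant': 1, 'Manager': 1}, 'M': {'Assistant': 1, 'Manager': 0}}
```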
Reading Data
Data can be entered directly, or it can be imported from a number of
different sources such as SPSS-format data files; spreadsheet applications,
such as Microsoft Excel; database applications, such as Microsoft Access;
and text files.
Reading Data from Spreadsheets
Rather than typing all of your data directly into the Data Editor, you can read data
from applications such as Microsoft Excel.
Make sure that Read variable names from the first row of data is selected. This
option reads column headings as variable names.
If the column headings do not conform to the SPSS variable-naming rules, they are
converted into valid variable names and the original column headings are saved as
variable labels.
Reading Data from Spreadsheets (Contd.)
If you want to import only a portion of the spreadsheet, specify the range
of cells to be imported in the Range text box.
The data now appear in the Data Editor, with the column headings used as
variable names.
Since variable names can't contain spaces, the spaces from the original
column headings have been removed. For example, Marital status in the
Excel file becomes the variable Maritalstatus. The original column heading
is retained as a variable label.
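The renaming can be approximated as "strip invalid characters, keep the original heading as the label". The Python sketch below uses a simplified subset of the naming rules (letters, digits, underscores and periods; must start with a letter; cannot end with a period). The "V" prefix for headings that start with a digit is an arbitrary placeholder of this sketch, not necessarily what SPSS generates.

```python
import re


def to_variable_name(heading):
    """Simplified conversion of a column heading to an SPSS-style variable name."""
    # Keep only letters, digits, underscores and periods; this drops spaces.
    name = re.sub(r"[^A-Za-z0-9_.]", "", heading)
    # Names cannot end with a period.
    name = name.rstrip(".")
    # Names must start with a letter; the "V" prefix here is an arbitrary placeholder.
    if not name or not name[0].isalpha():
        name = "V" + name
    return name


def import_headings(headings):
    """Pair each generated variable name with the original heading as its label."""
    return [(to_variable_name(h), h) for h in headings]


print(import_headings(["Marital status", "Age", "2019 sales"]))
# [('Maritalstatus', 'Marital status'), ('Age', 'Age'), ('V2019sales', '2019 sales')]
```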
Reading Data from a Database
Data from database sources are easily imported using the Database
Wizard.
From the menus choose: File -> Open Database -> New Query
Select MS Access Database from the list of data sources and click
Next.
Click Browse to navigate to the Access database file that you want
to open.
Click OK in the login dialog box.
In the next step, you can specify the tables and variables that you
want to import.
Drag your entire table to the Retrieve Fields In This Order list. Click
Next.
In the next step, you select which records (cases) to import.
Reading Data from a Database (Contd.)
If you do not want to import all cases, you can import a subset of cases (for
example, males older than 30), or you can import a random sample of cases
from the data source. For large data sources, you may want to limit the
number of cases to a small, representative sample to reduce the processing
time.
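Behind the scenes, importing a subset amounts to adding conditions to the SQL query. The Python sketch below uses an in-memory SQLite database from the standard `sqlite3` module as a stand-in for the Access database; the `staff` table, its columns and the rows are hypothetical, and the SQL that SPSS generates for Access may differ (for example, `RANDOM()` is SQLite syntax).

```python
import sqlite3

# In-memory SQLite stand-in for the database; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staffNo TEXT, gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("SL21", "M", 45), ("SG37", "F", 38), ("SG14", "M", 28), ("SA9", "F", 29)],
)

# Subset: males older than 30.
subset = conn.execute(
    "SELECT staffNo, gender, age FROM staff WHERE gender = ? AND age > ?",
    ("M", 30),
).fetchall()
print(subset)  # [('SL21', 'M', 45)]

# Random sample of 2 cases (RANDOM() is SQLite syntax; other databases differ).
sample = conn.execute("SELECT staffNo FROM staff ORDER BY RANDOM() LIMIT 2").fetchall()
conn.close()
```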
Field names are used to create variable names. If necessary, the names are
converted to valid variable names. The original field names are preserved as
variable labels. You can also change the variable names before importing the
database.
Click the Recode to Numeric cell in the ID field. This option converts string
variables to integer variables and retains the original value as the value label
for the new variable.
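Recoding to numeric assigns an integer to each distinct string and keeps the string as the value label of that integer. The Python sketch below is a hypothetical illustration; numbering the codes in ascending string order is an assumption of this sketch and may not match the order SPSS uses.

```python
def recode_to_numeric(values):
    """Give each distinct string an integer code and keep the string as its value label."""
    codes = {}
    for value in sorted(set(values)):  # assumption: codes follow ascending string order
        codes[value] = len(codes) + 1
    recoded = [codes[v] for v in values]
    value_labels = {code: value for value, code in codes.items()}
    return recoded, value_labels


ids = ["SL21", "SG37", "SG14", "SG37"]
recoded, labels = recode_to_numeric(ids)
print(recoded)  # [3, 2, 1, 2]
print(labels)   # {1: 'SG14', 2: 'SG37', 3: 'SL21'}
```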
Reading Data from a Database (Contd.)
The SQL statement created from your selections in the Database
Wizard appears in the Results step. This statement can be executed
now or saved to a file for later use.
All of the data in the Access database that you selected to import
are now available in the Data Editor.
Computing New Variables
Add two new variables to your data file, namely Current Age
and Number of Years Worked. We will create a variable for
each employee's age at the time he or she started the job. It
is the computed difference between current age and
number of years at the current job, which should be the
approximate age at which the employee started that job.
Computing New Variables (Contd.)
From the menus in the Data Editor window choose: Transform -> Compute
Variable.
Type jobStart in the Target Variable text box.
Select Current Age in the source variable list and click the arrow button to
copy it to the Numeric Expression text box.
Click the minus (-) button on the calculator pad in the dialog box (or press the
minus key on the keyboard).
Select Number of Years Worked and click the arrow button to copy it to the
expression. Click OK.
The new variable is displayed in the Data Editor. Since the variable is added to
the end of the file, it is displayed in the far-right column in Data View and in
the last row in Variable View.
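The computation is applied case by case, and a missing operand makes the result missing. The Python sketch below illustrates the jobStart expression on hypothetical data, with `None` standing in for system-missing; it is not SPSS's own code.

```python
def compute_job_start(current_age, years_worked):
    """jobStart = Current Age - Number of Years Worked, computed case by case.

    None stands in for system-missing: if either value is missing,
    the result for that case is missing as well.
    """
    return [
        None if age is None or years is None else age - years
        for age, years in zip(current_age, years_worked)
    ]


current_age = [45, 38, None]
years_worked = [10, 5, 3]
print(compute_job_start(current_age, years_worked))  # [35, 33, None]
```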
Using Functions in Expressions
You can also use predefined functions in expressions. More than 70 built-in
functions are available, including:
- Arithmetic functions
- Statistical functions
- Distribution functions
- Logical functions
- Date and time aggregation and extraction functions
- Missing-value functions
Functions are organized into logically distinct groups, such as a group for
arithmetic operations and another for computing statistical metrics.
A brief description of the currently selected function (in this case, SUM) or
system variable is displayed in a reserved area in the Compute Variable
dialog box.
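Functions and plain arithmetic can treat missing values differently. In SPSS, an expression such as A + B is missing whenever either value is missing, whereas SUM adds whichever of its arguments are valid. The Python sketch below mimics that contrast with `None` as the missing value; `sum_valid` and `add` are hypothetical names, not SPSS functions.

```python
def sum_valid(*values):
    """Sum of the non-missing values; missing (None) only if every value is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None


def add(a, b):
    """Plain arithmetic: any missing operand makes the result missing."""
    return None if a is None or b is None else a + b


print(sum_valid(10, None, 5))  # 15
print(add(10, None))           # None
```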
Using Functions in Expressions (Contd.)
Select the appropriate group from the Function group list. The group
labeled All provides a listing of all available functions and system variables.
Double-click the function in the Functions and Special Variables list (or
select the function and click the arrow adjacent to the Function group list).
The function is inserted into the expression. If you highlight part of the
expression and then insert the function, the highlighted portion of the
expression is used as the first argument in the function.
Using Functions in Expressions (Contd.)
The function is not complete until you enter the arguments, represented
by question marks in the pasted function. The number of question marks
indicates the minimum number of arguments required to complete the
function.
Enter the arguments. If the arguments are variable names, you can paste
them from the variable list.
Using Functions in Expressions (Contd.)
Delete the variable Current Age and recalculate it from the variables jobStart and
Number of Years Worked.
Calculating the Length of Time between Two Dates
A number of tasks commonly performed with dates and times can be easily
accomplished using the Date and Time Wizard. Using this wizard, you can:
- Create a date/time variable from a string variable containing a date or time.
- Extract a part of a date or time variable; for example, the day of month from a
date/time variable which has the form mm/dd/yyyy.
One of the most common tasks involving dates is calculating the length of time
between two dates. For example, you can calculate an employee's current age from his or her
date of birth.
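The first two wizard tasks, turning a date string into a date value and extracting a part of it, can be sketched with Python's standard `datetime` module. The sample date string is hypothetical and is assumed to be in mm/dd/yyyy form.

```python
from datetime import datetime


def string_to_date(text):
    """Parse a date stored as text in mm/dd/yyyy form."""
    return datetime.strptime(text, "%m/%d/%Y").date()


dob = string_to_date("10/01/1965")
print(dob.day, dob.month, dob.year)  # 1 10 1965
```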
Enter CAge for the name of the result variable. Result variables cannot have the
same name as an existing variable.
Enter Current age as the label for the result variable. Variable labels for result
variables are optional.
Leave the default selection of Create the variable now, and click Finish to create
the new variable.
The new variable, CAge, displayed in the Data Editor is the integer number of years
between the two dates. Fractional parts of a year have been truncated.
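Counting whole years with the fraction truncated means subtracting one year when the anniversary has not yet been reached. The Python sketch below shows that rule with hypothetical dates; how SPSS itself handles edge cases such as 29 February birthdays is not assumed here.

```python
from datetime import date


def whole_years_between(start, end):
    """Complete years from start to end; any fractional year is truncated."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1  # the anniversary has not been reached yet in the end year
    return years


dob = date(1965, 10, 1)
print(whole_years_between(dob, date(2024, 9, 30)))  # 58
print(whole_years_between(dob, date(2024, 10, 1)))  # 59
```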
Sorting Data
Sorting cases (sorting rows of the data file) is often useful and sometimes
necessary for certain types of analysis.
To reorder the sequence of cases in the data file based on the value of one or
more sorting variables:
- From the menus choose: Data -> Sort Cases
If you select multiple sort variables, the order in which they appear on the Sort by
list determines the order in which cases are sorted. For string variables, uppercase
letters precede their lowercase counterparts in sort order.
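Sorting by several variables means comparing the first sort variable and using the next one only to break ties. The Python sketch below sorts hypothetical (branchNo, LName) cases this way. Python compares strings by character code, so "Brand" sorts before "brand"; note that this also places every uppercase letter before every lowercase letter (for example, "Z" before "a"), and SPSS's exact collation may depend on its locale settings.

```python
from operator import itemgetter

# Hypothetical cases: (branchNo, LName).
staff = [
    ("B005", "white"),
    ("B003", "Brand"),
    ("B005", "Beech"),
    ("B003", "brand"),
]

# Sort by branchNo first, then by LName within each branch (both ascending).
ordered = sorted(staff, key=itemgetter(0, 1))
print(ordered)
# [('B003', 'Brand'), ('B003', 'brand'), ('B005', 'Beech'), ('B005', 'white')]

# Descending order on the same keys.
descending = sorted(staff, key=itemgetter(0, 1), reverse=True)
```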