BUP 02 Data Management
Data Management
Data Manipulation I:
Data Manipulation: Creating a new data file from an existing data file according to the needs of the researcher.
Data Manipulation includes:
A. Inserting Variables
B. Inserting Cases
C. Go to Case / Variable
D. Merging Files
A. Inserting Variables:
Suppose that you want to add more variables to the data file you have just created. Inserting
variables into an existing data file is straightforward.
First create a new data file or open an existing data file.
To insert a new variable into the data file, use any one of the following techniques.
Dr. Md. Abdus Salam Akanda Website: http://du.ac.bd
Professor of Statistics, DU E-mail: [email protected]
where you want the variable and then click the toolbar item.
B. Inserting Case:
Sometimes you may need to add more cases at any position in a data file after it has been created.
You can do so by following any of the procedures below.
Technique 2:
Put the cursor in a cell of the row where you want the case.
Now click on Edit (Menu Bar) and then click on Insert Case. Now you can enter information
on different variables for that case (Individual).
Technique 3:
You can use the Toolbar menu to insert a case. Put the cursor in a cell of the row where you
want the case. Then click on the Toolbar item. Now you can enter information on different
variables for that case (Individual).
C. Go to Case / Variable:
Cases
Edit
Go to Case...
Enter an integer value that represents the current row number in Data View.
Note: The current row number for a particular case can change due to sorting and other
actions.
Variables
Edit
Go to Variable...
Enter the variable name or select the variable from the drop-down list.
You can use the Toolbar menu to find cases, variables and imputations.
D. Merging Files:
Sometimes you may encounter data files that contain (i) the same variables under the same or
different names, or (ii) different variables, and you need to combine them into a single data
file. This is done through a procedure known as Merge Files. To practise it, let us
create the following data files named spss-test1, spss-test2, spss-test3, spss-test4 and spss-test5.
Data file: spss-test1
Identification #   Education    Age          Time spent for Internet Use   Marital Status
                   (in years)   (in years)   (in hours)
(id)               (edu)        (age)        (inttime)                     (marista)
1                  16           30           3                             1
2                  18           32           5                             1
3                  17           37           4                             2
4                  15           30           1                             1
5                  14           26           4                             2
6                  13           27           1                             2
7                  15           28           4                             2
8                  11           24           2                             2
9                  16           32           3                             1
Marital Status: Married=1, Unmarried=2
3 17 37 4 2
4 15 30 1 1
5 14 26 4 2
6 13 27 1 2
7 15 28 4 2
8 11 24 2 2
9 16 32 3 1
Marital Status: Married=1, Unmarried=2
Data file: spss-test5 (spss-test3 selected for the cases with age >=30)
Identification #   Education    Age          Time Spent for Internet Use   Marital Status
                   (in years)   (in years)   (in hours)
(id)               (ys)         (ageyears)   (timeint)                     (ms)
1                  16           30           3                             1
2                  18           32           5                             1
9                  16           32           3                             1
3                  17           37           4                             2
4                  15           30           1                             1
A. First, adding cases from two files that have the same number of variables with
identically spelled names.
Merge the files: spss-test1 & spss-test2
Keep one data file (spss-test1) open and follow the instructions given below.
Click on Data (Menu Bar)
Merge Files
Add Cases
You will then see a Browse option. Use it to select the data file whose cases
you want to add to the open data file, and click on Open.
Click Continue
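The idea behind Add Cases can be sketched outside SPSS as stacking the rows of two datasets that share the same variable names. The following Python sketch is an illustration only, not the SPSS procedure itself: the first three rows follow spss-test1, while the rows of the second file and the helper name add_cases are hypothetical.

```python
# Minimal sketch of "Add Cases": append the cases (rows) of a second
# dataset to the first when both use identical variable names.
VARIABLES = ["id", "edu", "age", "inttime", "marista"]

# First three cases of spss-test1 as shown in the handout.
active = [
    {"id": 1, "edu": 16, "age": 30, "inttime": 3, "marista": 1},
    {"id": 2, "edu": 18, "age": 32, "inttime": 5, "marista": 1},
    {"id": 3, "edu": 17, "age": 37, "inttime": 4, "marista": 2},
]

# Hypothetical cases standing in for the second file.
incoming = [
    {"id": 10, "edu": 12, "age": 25, "inttime": 2, "marista": 2},
    {"id": 11, "edu": 14, "age": 29, "inttime": 6, "marista": 1},
]


def add_cases(first, second, variables):
    """Return the cases of `first` followed by the cases of `second`."""
    combined = []
    for row in first + second:
        missing = set(variables) - set(row)
        if missing:
            raise ValueError(f"case lacks variables: {sorted(missing)}")
        # Keep the variables in one fixed order for every case.
        combined.append({name: row[name] for name in variables})
    return combined


merged = add_cases(active, incoming, VARIABLES)
print(len(merged), "cases after merging")
```

The number of variables stays the same; only the number of cases grows.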
B. Second, adding cases from two files that have the same number of variables but
differently spelled names.
For example, ys in spss-test3 and edu in spss-test2 have the same meaning. In such cases, you
can add cases in the following way:
Merge the files: spss-test3 & spss-test2
Keep one data file (spss-test3) open and follow the instructions given below.
Click on Data (Menu Bar)
Merge Files
Add Cases
You will then see a Browse option. Use it to select the data file whose cases
you want to add to the open data file, and click on Open.
Click Continue
Now pair the corresponding variables (for example, ys with edu) so that each pair appears in the
box Variables in New Active Dataset. Then click on OK.
Then save the data file under a different name (not an existing file name).
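Pairing differently named variables amounts to renaming the variables of one file before stacking the cases. The Python sketch below illustrates this under stated assumptions: only the pairing of ys with edu is given in the text; the other pairings, the rows of the second file and the helper name rename_variables are assumed for illustration.

```python
# Minimal sketch: rename variables of the incoming file so that they
# match the active file, then append the cases.

# Incoming name -> active name. Only edu/ys is stated in the handout;
# the remaining pairs are assumed from the variable descriptions.
PAIRS = {"edu": "ys", "age": "ageyears", "inttime": "timeint", "marista": "ms"}

# Two cases using the spss-test3 variable names.
test3 = [
    {"id": 1, "ys": 16, "ageyears": 30, "timeint": 3, "ms": 1},
    {"id": 2, "ys": 18, "ageyears": 32, "timeint": 5, "ms": 1},
]

# Hypothetical case using the spss-test2 variable names.
test2 = [
    {"id": 10, "edu": 12, "age": 25, "inttime": 2, "marista": 2},
]


def rename_variables(rows, pairs):
    """Return copies of the rows with variable names mapped through `pairs`."""
    return [
        {pairs.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


combined = test3 + rename_variables(test2, PAIRS)
print(combined[-1])
```

Variables that have no partner in PAIRS, such as id, keep their names.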
Click Continue
Select: Match cases on key variables in sorted files
Select: Non-active data set is keyed table
Move id from the box Excluded Variables into the box Key Variables
Then click on OK.
Case 2: The open file contains the larger number of cases, and your study is interested in the file that
is not open (it contains fewer cases than the open file).
Merge the files: spss-test4 & spss-test5
First open the file spss-test5 and follow the instructions given below.
Click on Data (Menu Bar)
Sort Cases
Sort by: id, Sort Order: Ascending
Click OK. Now save the file.
Now keep spss-test4 open and follow the instructions given below.
Click on Data (Menu Bar)
Merge Files
Add Variables
A window will then appear.
Select: spss-_test5.sav[DataSet2] in the box: An open dataset
This is how you select the data file whose variables you want to add to
the open data file; then click on Open.
Click Continue
Select: Match cases on key variables in sorted files
Select: Active data set is keyed table
Move id from the box Excluded Variables into the box Key Variables
Then click on OK.
You will receive a warning. Click OK.
Then save the data file under a different name (not an existing file name).
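Matching on a key variable with a keyed table can be pictured as a lookup: the cases of one file are kept, and for each of them the extra variables are fetched from the table by id. The Python sketch below is only an illustration of that idea. The cases follow spss-test5; the table contents, the variable hhsize and the helper name merge_with_keyed_table are hypothetical, since the contents of spss-test4 are not shown here.

```python
# Minimal sketch of "Add Variables" with a keyed (lookup) table.

# Cases of spss-test5 in the order shown in the handout (not sorted by id).
cases = [
    {"id": 1, "ys": 16, "ageyears": 30, "timeint": 3, "ms": 1},
    {"id": 2, "ys": 18, "ageyears": 32, "timeint": 5, "ms": 1},
    {"id": 9, "ys": 16, "ageyears": 32, "timeint": 3, "ms": 1},
    {"id": 3, "ys": 17, "ageyears": 37, "timeint": 4, "ms": 2},
    {"id": 4, "ys": 15, "ageyears": 30, "timeint": 1, "ms": 2},
]

# Hypothetical keyed table: one extra variable (hhsize) per id.
table = [
    {"id": 1, "hhsize": 4},
    {"id": 2, "hhsize": 3},
    {"id": 3, "hhsize": 5},
    {"id": 4, "hhsize": 2},
    {"id": 5, "hhsize": 6},
]


def merge_with_keyed_table(cases, table, key):
    """Sort the cases by `key` and attach the table variables that match."""
    lookup = {row[key]: row for row in table}
    extra_names = [name for name in (table[0] if table else {}) if name != key]
    merged = []
    for case in sorted(cases, key=lambda row: row[key]):
        match = lookup.get(case[key], {})
        # None stands in for a missing value when the key has no match.
        extras = {name: match.get(name) for name in extra_names}
        merged.append({**case, **extras})
    return merged


result = merge_with_keyed_table(cases, table, "id")
for row in result:
    print(row["id"], row["hhsize"])
```

Table rows whose id does not occur among the cases (id 5 here) contribute nothing, and a case without a match (id 9 here) receives a missing value.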
A. Splitting File:
You may encounter situations where you need to analyze data by the categories
of one or more categorical variables. For example, you might want to examine
income or education separately for each sex, either for comparison or for organizing the
output by groups. SPSS provides the Split File tool for this purpose.
Split File splits the data file into separate groups for analysis based on the values of one or
more grouping variables. If you select multiple grouping variables, cases are grouped by each
variable within categories of the prior variable on the Groups Based On list. For example, if
you select Sex as the first grouping variable and Religion as the second grouping variable,
cases will be grouped by Religion classification within each Sex category.
Note that you can specify up to eight grouping variables, and cases should be sorted by the values
of the grouping variables.
To split a file, let us create the following data file or open an existing data file:
Now select the variable Gender and move it to the box Groups Based on:
Click OK
Now any analysis you carry out on this data file will be reported separately for each group.
You may also want to compare income among people of different religions and sexes at the
same time. Repeat the above steps, but this time select the variable Religion along with the Sex
variable, move both to the box Groups Based on: and click OK.
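The effect of splitting a file can be sketched as sorting the cases by the grouping variables and then analysing each group separately. The Python sketch below is an illustration under assumptions: the data values, the numeric codes for sex and religion, and the helper name split_file are all hypothetical.

```python
# Minimal sketch of Split File: sort by the grouping variables, then
# form one group of cases per combination of their values.
from itertools import groupby
from statistics import mean

# Hypothetical cases; the codes for sex and religion are illustrative.
rows = [
    {"sex": 1, "religion": 1, "income": 100},
    {"sex": 2, "religion": 1, "income": 200},
    {"sex": 1, "religion": 2, "income": 300},
    {"sex": 1, "religion": 1, "income": 500},
    {"sex": 2, "religion": 2, "income": 600},
]


def split_file(rows, group_vars):
    """Return {group key: list of cases}, ordered by the grouping variables."""
    def key(row):
        return tuple(row[name] for name in group_vars)

    ordered = sorted(rows, key=key)  # groupby needs sorted input
    return {k: list(group) for k, group in groupby(ordered, key=key)}


by_sex = split_file(rows, ["sex"])
by_both = split_file(rows, ["sex", "religion"])

for group_key, members in by_both.items():
    print(group_key, mean(row["income"] for row in members))
```

With two grouping variables the cases are grouped by religion within each sex category, which mirrors the order of the variables in the Groups Based on list.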
B. Case Selection
Sometimes you may be interested in particular cases only. For this
situation, SPSS provides the Select Cases option. For instance, suppose you want to
analyze the data for male respondents/cases only. Then you have to select only the male
cases. To do so, use the following instructions:
Click on Data (Menu Bar)
Select Cases
Select If condition is satisfied
Click on If
Again suppose you want to study only Hindu and Christian people. Then select these cases
following the instruction given below:
them as follows:
gender=2 & religion=1
Now click on Continue and then OK.
If you want to study the individuals who are either Muslim or female, select the
desired cases by moving the Gender and Religion variables into the expression box
of Select Cases: If and writing the condition as follows:
gender=2 | religion=1
Now click on Continue and then OK.
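The difference between the two conditions (& versus |) can be sketched as a filter over the cases. In the Python sketch below the respondents are hypothetical; the codes gender=2 for female and religion=1 for Muslim are taken from the expressions above, and select_cases is an illustrative helper name.

```python
# Minimal sketch of Select Cases: keep only cases that satisfy a condition.

# Hypothetical respondents (gender 2 = female, religion 1 = Muslim).
respondents = [
    {"id": 1, "gender": 1, "religion": 1},
    {"id": 2, "gender": 2, "religion": 1},
    {"id": 3, "gender": 2, "religion": 2},
    {"id": 4, "gender": 1, "religion": 3},
]


def select_cases(rows, condition):
    """Return the cases for which condition(row) is true."""
    return [row for row in rows if condition(row)]


# gender=2 & religion=1: both conditions must hold.
both = select_cases(respondents, lambda r: r["gender"] == 2 and r["religion"] == 1)

# gender=2 | religion=1: at least one condition must hold.
either = select_cases(respondents, lambda r: r["gender"] == 2 or r["religion"] == 1)

print([r["id"] for r in both])
print([r["id"] for r in either])
```

The | condition always selects at least as many cases as the & condition, because every case that satisfies both conditions also satisfies at least one.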
Data Transformation:
Compute
The Compute command is used to compute values for a variable based on numeric
transformations of other variables. Using this command we can create new variables or
replace the values of existing variables (for new variables we can also specify the variable type and
label). Note that we can compute values for numeric or string (alphanumeric) variables only.
We can also compute values selectively for subsets of the data based on logical conditions. In a
computation we can use mathematical and/or logical operators, and over
70 built-in functions, including arithmetic functions, statistical functions and other functions.
The general form of the Compute command is as follows:
Compute newvariable = Arithmetic or Logical expression.
The following steps are followed to compute variables:
I. From the menu choose
Transform
Compute
SPSS will display the Compute Variable dialog box.
II. Type the name of a single target variable. It can be an existing variable or a new
variable.
III. Write an Arithmetic or Logical Expression in the Numeric Expression field.
To build an expression, either paste components from the variable list into the Expression
field and then edit them, or type directly in the Expression field. To build a Numeric
Expression we can use Existing Variable Names, Arithmetic Operators, Constants and
Functions. We can also use the Calculator Pad, Variable List and Function List.
Calculator Pad
We can use the calculator pad to build an Arithmetic or Logical Expression. To use the calculator
pad, click its buttons with the mouse. Complex expressions can be built with the
Calculator Pad. There are three types of operators and one function in the calculator pad.
II. Relational Operator: Relational operators are used to compare elements/variables
of the same type. For instance, a string variable is compared with another
string variable, and a numeric variable/value is compared with another
numeric variable/value. The relational operators are:
Operator    Meaning/use
<           Less than
>           Greater than
>=          Greater than or equal
<=          Less than or equal
<> or ~=    Not equal
=           Equal
III. Logical Operator: Logical operators are used to build more complex
expressions. Suppose we want the people whose age is greater than or equal to 25 and
less than or equal to 60; then we can write: Age >= 25 AND Age <= 60. The AND
used in this expression is a logical operator. The logical operators are as
follows:
Operator    Use
AND, &      True when both conditions are true
OR, |       True when at least one of the two conditions is true
NOT         True when the condition is not satisfied
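How relational and logical operators combine can be sketched with the age example. The Python sketch below is an illustration only: the ages are hypothetical, and Python's and, or and not stand in for the operators AND, OR and NOT.

```python
# Minimal sketch: evaluate "Age >= 25 AND Age <= 60" for a few ages.
ages = [18, 25, 40, 60, 72]  # hypothetical values

# AND: both relational conditions must be true.
in_range = [age >= 25 and age <= 60 for age in ages]

# OR: at least one of the conditions must be true.
out_of_range = [age < 25 or age > 60 for age in ages]

# NOT: reverses the truth value of a condition.
negated = [not (age >= 25 and age <= 60) for age in ages]

for age, flag in zip(ages, in_range):
    print(age, flag)
```

Here the OR expression and the NOT expression select the same cases, because "not between 25 and 60" is the same as "below 25 or above 60".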
Functions
There are more than 70 built-in functions, including:
Arithmetic Functions
Statistical Functions
Logical Functions
Missing Value Functions etc.
Illustrative Examples
Example 1:
Suppose we want to compute the total marks obtained by the competitors in the written test
for a job in a firm, from the following data:
We will then see that a new variable Tmarks has automatically been created in the right-most
column of the data sheet. The data sheet now looks like:
Example 2:
Suppose we want to compute the Average marks obtained from 3 subjects on a test from the
following data:
Mathematics Statistics Economics
25 23 18
20 22 20
17 19 21
21 21 13
18 20 15
To get the Average marks, we use the following formula:
Average Marks = (Marks in Mathematics + Marks in Statistics + Marks in
Economics)/3.
We will denote the Average Marks by avmarks. To compute it, we follow these
steps:
(a) Click Transform and then Compute.
(b) In the Compute Variable dialog box, type avmarks in the Target Variable box
at the upper-left corner of the dialog box.
(c) Using either the calculator pad or the keyboard, write (Mathematics + Statistics +
Economics)/3 in the Numeric Expression box.
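The same computation can be sketched in Python to check the values that avmarks should take. This is only an illustration of the formula using the data of this example; it is not how SPSS evaluates the expression internally.

```python
# Minimal sketch of Example 2: avmarks = (Mathematics + Statistics + Economics)/3.
marks = [
    {"Mathematics": 25, "Statistics": 23, "Economics": 18},
    {"Mathematics": 20, "Statistics": 22, "Economics": 20},
    {"Mathematics": 17, "Statistics": 19, "Economics": 21},
    {"Mathematics": 21, "Statistics": 21, "Economics": 13},
    {"Mathematics": 18, "Statistics": 20, "Economics": 15},
]

# Create the new variable for every case.
for row in marks:
    row["avmarks"] = (row["Mathematics"] + row["Statistics"] + row["Economics"]) / 3

for row in marks:
    print(round(row["avmarks"], 2))
```

For the first case the total is 25 + 23 + 18 = 66, so avmarks is 66/3 = 22.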
Example 3:
Suppose we want to compute the yearly increment of the employees on the basis of their
salary from the following data, using the formula:
Increment = (10% of the salary) + 1000
Salary
10000
15000
12000
13000
14000
17000
15500
16500
17500
To do this, we follow these steps:
(a) Click Transform and then Compute.
(b) In the Compute Variable dialog box, type increm in the Target Variable box
at the upper-left corner of the dialog box.
(c) Using either the calculator pad or the keyboard, write (Salary*0.10) + 1000 in the
Numeric Expression box.
(d) Click OK.
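The increment formula can be checked with a short Python sketch. It uses the salaries of this example; writing 10% as salary * 10 / 100 is a choice made here to keep the arithmetic exact, and the list name increm simply mirrors the target variable.

```python
# Minimal sketch of Example 3: increm = (10% of the salary) + 1000.
salaries = [10000, 15000, 12000, 13000, 14000, 17000, 15500, 16500, 17500]

# 10% of the salary written as salary * 10 / 100.
increm = [salary * 10 / 100 + 1000 for salary in salaries]

for salary, value in zip(salaries, increm):
    print(salary, value)
```

For a salary of 10000 the increment is 1000 + 1000 = 2000.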
Data Management
CONDITIONAL TRANSFORMATIONS
Illustrative Examples
Example 4:
Suppose we want to find the deduction from the salary for the transport facility for the
employees of a firm, from the following data of salary. It is given that 5% of salary is
deducted if salary is greater than 12,000 taka.
Salary
10000
15000
12000
13000
14000
17000
15500
16500
17500
To perform this computation, we follow these steps:
(a) From the menus choose
Transform
Compute...
The Compute Variable dialog box will then open.
(b) In the Compute Variable dialog box, type deduct in the Target Variable box at
the left corner of the dialog box.
(c) Using either the calculator pad or the keyboard, write:
salary * 0.05
in the Numeric Expression box.
(d) Click the If button below the calculator pad. This will open the
Compute Variable: If Cases dialog box.
(e) Select Include if case satisfies condition at the top of the dialog box.
(f) Using either the calculator pad or the keyboard, write
salary > 12000
in the Compute Variable: If Cases dialog box.
(g) Click Continue to return to the Compute Variable dialog box.
(h) Click OK.
A new variable deduct has now been created automatically in the right-most
column of the data sheet. The data sheet now looks like:
Salary deduct
10000 .
15000 750
12000 .
13000 650
14000 700
17000 850
15500 775
16500 825
17500 875
(b) In the Compute Variable dialog box, type deduct in the Target Variable box at
the left corner of the dialog box.
(c) Using either the calculator pad or the keyboard, write:
0
in the Numeric Expression box.
(d) Click the If button below the calculator pad. This will open the
Compute Variable: If Cases dialog box.
(e) Select Include if case satisfies condition at the top of the dialog box.
(f) Using either the calculator pad or the keyboard, write
salary <= 12000
in the Compute Variable: If Cases dialog box.
(g) Click Continue to return to the Compute Variable dialog box.
(h) Click OK.
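The two conditional computations of this example can be sketched in Python as two passes over the data. This is an illustration under assumptions: None plays the role of the system-missing value shown as a dot in the data sheet, and compute_if is a hypothetical helper name.

```python
# Minimal sketch of Example 4: a conditional Compute in two passes.
staff = [
    {"salary": s}
    for s in (10000, 15000, 12000, 13000, 14000, 17000, 15500, 16500, 17500)
]


def compute_if(rows, target, expression, condition):
    """Assign expression(row) to `target` only where condition(row) holds."""
    for row in rows:
        if condition(row):
            row[target] = expression(row)
        else:
            # Leave an existing value alone; otherwise mark it as missing.
            row.setdefault(target, None)


# Pass 1: deduct = 5% of salary if salary > 12000.
compute_if(staff, "deduct", lambda r: r["salary"] * 5 / 100, lambda r: r["salary"] > 12000)
first_pass = [row["deduct"] for row in staff]

# Pass 2: deduct = 0 if salary <= 12000.
compute_if(staff, "deduct", lambda r: 0, lambda r: r["salary"] <= 12000)
second_pass = [row["deduct"] for row in staff]

print(first_pass)
print(second_pass)
```

After the first pass the cases with a salary of 10000 and 12000 have no value for deduct; the second pass fills them with 0 and leaves the other cases untouched.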
Example 5:
Suppose we want to compute the yearly increment of the employees of an institution who
satisfy the following condition, from the following data:
Condition: Increment = 15% of the salary if Job Category = 3 and Experience is greater than or
equal to 5 years.
salary jobcat exper
10000 1 5
15000 2 6
17000 3 7
21000 3 8
18000 3 5
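The condition of this example combines two comparisons with AND. The Python sketch below illustrates the expected result for the data above; the variable name increment and the use of None for cases that do not satisfy the condition are assumptions of the sketch.

```python
# Minimal sketch of Example 5: increment = 15% of salary
# if jobcat = 3 and exper >= 5; otherwise no value is assigned.
employees = [
    {"salary": 10000, "jobcat": 1, "exper": 5},
    {"salary": 15000, "jobcat": 2, "exper": 6},
    {"salary": 17000, "jobcat": 3, "exper": 7},
    {"salary": 21000, "jobcat": 3, "exper": 8},
    {"salary": 18000, "jobcat": 3, "exper": 5},
]

for row in employees:
    if row["jobcat"] == 3 and row["exper"] >= 5:
        row["increment"] = row["salary"] * 15 / 100
    else:
        row["increment"] = None  # stands in for system-missing

for row in employees:
    print(row["salary"], row["increment"])
```

Only the last three employees belong to Job Category 3, and all three have at least 5 years of experience, so only they receive an increment.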
Now to find the Increment we follow the following steps:
1. Recode into Same Variables
Recode into Same Variables reassigns the values of existing variables or collapses ranges of
existing values into new values. For example, we can collapse Marks into Marks Range
categories. We can recode numeric and string variables, but we cannot recode numeric and
string variables together. If we select multiple variables, they must all be of the same type.
To recode the values of a variable into the same variable, we follow these steps:
(a)From the menus choose:
Transform
Recode
Into Same Variables
(b) Select the variable that we want to recode. (If we select multiple variables, they must all
be of the same type, numeric or string.)
We can define the values to recode in this dialog box. All value specifications must be of the same
data type (numeric or string) as the variables selected in the main dialog box. The value
to be recoded is entered as the Old Value; after entering its New Value, we click
the Add button. We can recode more than one Old Value into one New Value, but we cannot
recode one Old Value into more than one New Value.
Old Value: The value(s) to be recoded. We can recode single values or ranges of values.
New Value: The single value into which each old value or range of values is recoded.
Old --> New: The list of specifications that will be used to recode the variable(s). We can add,
change and remove specifications from the list.
Illustrative Examples
Example 1:
Suppose we want to define ‘Educational Status’ on the basis of ‘year of schooling’ from the
following data using the following specifications:
Year of schooling   New value (code)   Meaning (value label)
0                   1                  Illiterate
1-5                 2                  Primary
6-10                3                  Secondary
11-12               4                  Higher Secondary
13-16               5                  Graduate
17                  6                  Post Graduate
18+                 7                  Higher
yearsch
15
7
14
8
13
0
18
6
20
11
10
5
12
16
Now, to recode this data into new values, we follow these steps:
(a) From the menus choose:
Transform
Recode
Into Same Variables...
This will open the Recode into Same Variables dialog box.
(b) Select yearsch from the variable list (left window) and then click the arrow button
in the dialog box to move it across.
(c) Click on the Old and New Values option.
(d) We shall see the Recode into Same Variables: Old and New Values dialog box.
(e) Using the Old Value and New Value options, recode the variable in the desired format.
(f) Click Continue to return to the Recode into Same Variables dialog box.
(g) Click OK.
We shall see that the values of the existing variable yearsch have been replaced by the
new codes.
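The recoding of this example can be sketched in Python as a list of ranges, each with its new code. The sketch uses the specifications and data of this example; the names RULES and recode are illustrative, and leaving unspecified values unchanged is a choice made for the sketch.

```python
# Minimal sketch of recoding yearsch into the same variable.
import math

# (lowest old value, highest old value, new code); 18+ has no upper limit.
RULES = [
    (0, 0, 1),          # Illiterate
    (1, 5, 2),          # Primary
    (6, 10, 3),         # Secondary
    (11, 12, 4),        # Higher Secondary
    (13, 16, 5),        # Graduate
    (17, 17, 6),        # Post Graduate
    (18, math.inf, 7),  # Higher
]


def recode(value, rules):
    """Return the new code for `value`; unspecified values stay unchanged."""
    for low, high, code in rules:
        if low <= value <= high:
            return code
    return value


yearsch = [15, 7, 14, 8, 13, 0, 18, 6, 20, 11, 10, 5, 12, 16]

# Overwrite the variable with its recoded values (same variable).
yearsch = [recode(value, RULES) for value in yearsch]
print(yearsch)
```

Because the result is stored back in yearsch, the original years of schooling are no longer available after the recoding.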
Example 2:
Suppose we want to define the ‘Social Status’ on the basis of Income Variable given below
using the following specifications:
income
20000
1800
35000
56000
3200
17000
78000
22000
900
7000
32000
125000
45000
245000
Now, to recode this data into new values, we follow these steps:
(a) From the menus choose:
Transform
Recode
Into Same Variables...
This will open the Recode into Same Variables dialog box.
(b) Select income from the variable list (left window) and then click the arrow button
in the dialog box to move it across.
(c) Click on the Old and New Values option.
(d) We shall see the Recode into Same Variables: Old and New Values dialog box.
(e) Using the Old Value and New Value options, recode the variable in the desired format.
(f) Click Continue to return to the Recode into Same Variables dialog box.
(g) Click OK.
We shall see that the values of the existing variable income have been replaced by the
new codes.
2. Recode into Different Variables
Recode into Different Variables reassigns the values of existing variables or collapses ranges
of existing values into new values for a new variable. For example, we can collapse Marks
into a new variable containing Marks-Range categories. We can recode numeric and string
variables, but we cannot recode numeric and string variables together. If we select multiple
variables, they must all be of the same type. We can also recode numeric variables into string
variables and string variables into numeric variables.
To recode the values of a variable into a different variable, we follow these steps:
(a) From the menus choose:
Transform
Recode
Into Different Variables...
(b) Select the variable that we want to recode. (If we select multiple variables, they must all
be of the same type, numeric or string.)
(c) Enter an output (new) variable name for each new variable and click Change.
(d) Click Old and New Values and specify how to recode values.
We can define the values to recode in the Old and New Values dialog box. All value
specifications must be of the same data type (numeric or string) as the variables selected
in the main dialog box.
We can recode more than one Old Value into one New Value, but we cannot recode
one Old Value into more than one New Value.
If we want to recode a numeric variable into a string variable, we must also select
Output variables are strings.
Any old values that are not specified are not included in the new variable, and cases
with those values will be assigned the system-missing value for the new variable. To
include all old values that do not require recoding, select All Other Values for the old
value and Copy old value(s) for the new value.
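The treatment of unspecified old values can be sketched in Python. The sketch is an illustration only: the values and the mapping are hypothetical, None stands in for the system-missing value, and the copy_others flag plays the role of choosing All Other Values together with Copy old value(s).

```python
# Minimal sketch of recoding into a different variable.
def recode_into(values, mapping, copy_others=False):
    """Return a new list of recoded values; the input list is not changed.

    Values absent from `mapping` become None unless copy_others is True,
    in which case the old value is copied.
    """
    return [mapping.get(v, v if copy_others else None) for v in values]


old = [1, 2, 3, 9]          # hypothetical old values
mapping = {1: 10, 2: 20}    # hypothetical Old --> New specifications

strict = recode_into(old, mapping)
copied = recode_into(old, mapping, copy_others=True)

print(strict)
print(copied)
print(old)
```

The original variable keeps its values in both cases, which is the main difference from recoding into the same variable.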
Illustrative Examples
Example 3:
Suppose we want to define the ‘Social Status’ on the basis of Income Variable given below
using the following specifications:
income
20000
1800
35000
56000
3200
17000
78000
22000
900
7000
32000
125000
45000
245000
Now, to recode this data into new values, we follow these steps:
(a) From the menus choose:
Transform
Recode
Into Different Variables...
This will open the Recode into Different Variables dialog box.
(b) Select income from the variable list (left window) and then click the arrow button
in the dialog box to move it across.
(c) We shall see income --> ? in the Numeric Variable -> Output Variable box.
(d) Write the name of the new variable, i.e. nincome, in the Output Variable Name: box.
(e) Now label the new variable using the Output Variable Label: box.
(f) Click on the Change option.
(g) Click on the Old and New Values option.
(h) We shall see the Recode into Different Variables: Old and New Values dialog box.
(i) Using the Old Value and New Value options, recode the variable in the desired format.
(j) Click Continue to return to the Recode into Different Variables dialog
box.
(k) Click OK.
Example 4:
Suppose we want to define ‘Educational Status’ on the basis of ‘year of schooling’ from the
following data using the following specifications:
Year of schooling   New value (code)   Meaning (value label)
0                   1                  Illiterate
1-5                 2                  Primary
6-10                3                  Secondary
11-12               4                  Higher Secondary
13-16               5                  Graduate
17                  6                  Post Graduate
18+                 7                  Higher
yearsch
15
7
14
8
13
0
18
6
20
11
10
5
12
16
Now, to recode this data into new values, we follow these steps:
(a) From the menus choose:
Transform
Recode
Into Different Variables...
This will open the Recode into Different Variables dialog box.
(b) Select yearsch from the variable list (left window) and then click the arrow button
in the dialog box to move it across.
(c) We shall see yearsch --> ? in the Numeric Variable -> Output Variable box.
(d) Write the name of the new variable, i.e. scstatus, in the Output Variable Name: box.
(e) Now label the new variable using the Output Variable Label: box.
(f) Click on the Change option.
(g) Click on the Old and New Values option.
(h) We shall see the Recode into Different Variables: Old and New Values dialog box.
(i) Using the Old Value and New Value options, recode the variable in the desired format.
(j) Click Continue to return to the Recode into Different Variables dialog
box.
(k) Click OK.
We will see that yearsch has been recoded into the new variable scstatus, while yearsch itself
remains unchanged.