
Data Management: I. Importing Data From Excel To SPSS

Three key elements of data management processes are importing data into SPSS from Excel, labeling variables and values for clarity, and sorting and merging datasets. The document provides detailed steps for importing an Excel file into SPSS, inspecting the data using codebook and dictionary commands, labeling variables and values, sorting cases by variables, and merging datasets sorted by a common key variable. Modifying data through computing new variables from existing variables using compute and recode commands is also demonstrated.

Uploaded by

Lucyl Mendoza

DATA MANAGEMENT

Data management involves three key elements:


1. Importing data into SPSS;
2. Labelling data (variables and values); and
3. Sorting and merging data.

I. Importing data from Excel to SPSS


One of the more common formats for entering and storing raw data is Microsoft Excel. If you
want to work with your data in SPSS, however, you’ll need to import the data from Excel to SPSS.
This can be relatively straightforward, but there are a few things to look out for in the conversion
process.

Steps:
1. Save all your SPSS files and close SPSS. Open the Excel file you downloaded earlier,
HLTH1025_2016.xlsx.
2. Have a look at how the Data worksheet is set up. For instance, how many records
are there? How many variables? What format are the data for each variable? If the
data are numeric, make sure the format is set to “Number”.
3. When you’re happy with the Excel file, close it.
4. Open your SPSS syntax file.
5. Click File > Open > Data …
6. Click the “Files of type:” drop-down box and select Excel (.xls, .xlsx, .xlsm).
7. Navigate to and select the HLTH1025_2016.xlsx file that you saved to your working
directory.
8. Click “Paste” to paste the syntax to your syntax file.
9. A new dialogue box will appear. Tick the box “Read variable names from first row of
data”, as the H&S Excel file has variable names in the first row.
10. In that same dialogue box, select the “Data” worksheet to import (that’s where the
data are stored).
11. Click OK. This pastes the import syntax into your syntax file; it hasn’t imported the
data yet! To import the data, select the pasted syntax and run it.
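
For reference, the syntax pasted by these steps usually looks something like the sketch below. It is an illustration, not the exact output of your SPSS version: the folder path is a placeholder, and the dataset name in the last line is an assumption, chosen so that later syntax can refer to the main dataset as HLTH1025_2016.

```spss
* Sketch of the pasted import syntax; replace the placeholder path with your working directory.
GET DATA
  /TYPE=XLSX
  /FILE='C:\your_working_directory\HLTH1025_2016.xlsx'
  /SHEET=NAME 'Data'
  /CELLRANGE=FULL
  /READNAMES=ON.
EXECUTE.
* Assumed step: name the imported dataset so it can be activated by name later.
DATASET NAME HLTH1025_2016 WINDOW=FRONT.
```

The data are only read once these lines have been selected and run from the syntax file.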

II. Inspecting your data


There are a couple of useful commands for inspecting your data:
1. The codebook command is best run through the menus if you have a large number of variables
and you want to look at information for all of them.
1.1 Click Analyze > Reports > Codebook…
1.2 Select all of the variables you want information for, and move them from the left-hand box
to the right-hand box:

1.3 Click Paste.


1.4 Select and run your codebook syntax from your syntax file.
1.5 Have a look at the information presented about your variables in the Output window:
2. To display the data dictionary, all you need to type into your syntax file is:
display dictionary.
You should now see something like this in your Output Viewer window:
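
As a rough sketch, both inspection commands can also be typed directly into the syntax file. The variable names sex, weight and height are taken from later sections of this handout and are assumed to exist in your dataset; the syntax pasted from the Codebook dialogue box will normally contain additional subcommands.

```spss
* Codebook information for a few selected variables.
CODEBOOK sex weight height.
* Dictionary information for the same variables only.
DISPLAY DICTIONARY /VARIABLES=sex weight height.
```

Limiting the variable list keeps the output short when the dataset has many variables.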

III. Labelling your data

Later on, when you’re running some analyses and you want to look at, say, how many of your
respondents are male or female, you’ll run a procedure and the results will be displayed in the
Output Viewer window. If we left our data as they are, we’d get the results we want, but we
probably wouldn’t know how to read them. E.g., if I ran that procedure, here’s the output I’d get:

But which ones are male and which ones are female?! Because I haven’t labelled my data, it can
be difficult to remember what all the values for each variable refer to.

So here’s a tip: always label your variables and their values (particularly for categorical variables).

The syntax to label variables and values is pretty straightforward.


To give the variable sex a descriptive label, here’s what you could type in your syntax file:
variable labels sex 'Sex of respondent'.
Then, to label the categories of sex (in this case, male and female), we could type the following:
value labels sex 1 'Male' 2 'Female'.
After labelling my variable, sex, if I run the same procedure as above I get a table that very clearly
indicates what is being presented:
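
Put together, the labelling step and a follow-up table might be sketched as below. The handout does not name the procedure used for the table, so FREQUENCIES is an assumption here; the coding 1 = Male, 2 = Female is the one given above.

```spss
* Label the variable and its values, then request a frequency table.
VARIABLE LABELS sex 'Sex of respondent'.
VALUE LABELS sex 1 'Male' 2 'Female'.
FREQUENCIES VARIABLES=sex.
```

Label text must be enclosed in straight quotation marks; curly quotes copied from a word processor will cause a syntax error.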

IV. Sorting and merging data

The sort syntax is quite simple. If you want to sort cases by sex in ascending order (although for
this type of variable it probably doesn’t matter whether the order is ascending or descending), you
can simply type the following into your syntax file:
sort cases by sex(A).
The command to sort your data is sort cases. The “A” in brackets at the end of the statement
indicates we want the data sorted in ascending order – you can write “D” if you want descending.
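
The same command accepts more than one sort variable, each with its own direction. A minimal sketch, assuming the variables sex and IDnumber both exist in the active dataset:

```spss
* Sort by sex ascending, then by IDnumber descending within each sex.
SORT CASES BY sex (A) IDnumber (D).
```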

Let’s say we now have an additional variable, year, that is stored in a separate dataset. This
second dataset has the same participants as our HLTH1025_2016 dataset, and they are identified by
the variable IDnumber. But we have several years’ worth of data from this course, and when we
put it all together, we want to be able to identify which students belong to which cohort (year of
study).

The students we have information about belong to the second year of the study. We need to
merge this variable into our main dataset.

This is a relatively simple merge procedure, but demonstrates the key steps involved.

Sort and merge data

1. You already have open the main dataset (HLTH1025_2016). You now need to open the
second dataset, called HLTH1025_2016_yr. Do this using the get file syntax:

get file='HLTH1025_2016_yr'.

We also want to name this second dataset – it will help us to tell SPSS exactly which dataset we
want to do things with … we’ll see in a minute when we sort. So use the dataset name syntax to
name your new dataset:

dataset name HLTH1025_2016_yr window=front.

2. Now we need to sort cases by IDnumber in both datasets. Here’s where it’s helpful to have
datasets named, since we can ‘activate’ a dataset to run a procedure, then activate another dataset
to run the same or different procedure. It’s easy to activate a particular dataset if you’ve named it.
Use the following syntax:
dataset activate HLTH1025_2016.
sort cases by IDnumber(A).
dataset activate HLTH1025_2016_yr.
sort cases by IDnumber(A).

3. Now that you’ve written the syntax to open, name and sort your data, you will need to select
those lines of code and Run Selection (big green arrow button).
4. The final step is to merge the year variable into your main dataset. Here we are going to use
the menus to help us, and then paste the syntax and run it from our syntax file.
a. First, activate your main dataset (HLTH1025_2016):
dataset activate HLTH1025_2016.
b. Click Data > Merge Files > Add Variables …

c. Select the dataset HLTH1025_2016_yr to merge with your active dataset, and click
Continue:

d. A new dialogue box will appear, like this:


e. Tick the “Match cases on key variables” check box, and then tick the “Cases are sorted in
order of key variables in both datasets” check box.
f. Click the IDnumber variable in the “Excluded Variables:” box to the top left, and move it
using the arrow button into the “Key Variables:” box.
This tells SPSS to match cases in both datasets by the IDnumber variable, and we confirm
that the data in both datasets are sorted by IDnumber (the variable we want to match on).
g. The dialogue box should now look something like this:

h. Click Paste to paste the merge syntax to your syntax file.


When you click Paste (or OK) a warning message will come up to say that if your data are
not sorted in ascending order of the Key Variable(s) then the procedure will fail. Luckily we
already sorted the data as required, so we click OK.
i. Your syntax should look something like this:

j. Select the syntax and Run Selection.
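
As a hedged illustration of the merge described in these steps, the pasted syntax is typically a MATCH FILES command along the lines below. The exact syntax depends on your SPSS version and on the options chosen in the dialogue box, so treat this as a sketch that assumes both datasets are open, named as above, and already sorted by IDnumber.

```spss
* Run with the main dataset active; both datasets must already be sorted by IDnumber.
DATASET ACTIVATE HLTH1025_2016.
MATCH FILES /FILE=*
  /FILE='HLTH1025_2016_yr'
  /BY IDnumber.
EXECUTE.
```

Here /FILE=* refers to the active dataset, the second /FILE names the dataset holding the year variable, and /BY gives the key variable used to match cases.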


After you run a procedure like merging, it’s best to check a few things. First, I always check
my output to see whether there were any error messages. Hopefully you don’t receive any
error messages, and your output record looks something like this:
The next thing I normally check is the Data Editor window in Variable View, to see whether
my new variable is now at the bottom of my list of variables, and check whether it looks like
it merged correctly.
Looks good to me:

Then, I’d just double check using Data View that the values have actually merged for that
variable, and that they look right. I know that all records should have a value of “2” for the
variable year.

Again, it looks good:
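
A quicker alternative to scrolling through Data View is to tabulate the merged variable. This is only a suggested check, assuming the merged variable is called year:

```spss
* Every case should show year = 2; other or missing values suggest an unmatched IDnumber.
FREQUENCIES VARIABLES=year.
```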

V. Modifying your data


Normally when we collect data, we don’t get everything in the format that we actually want to
analyse. For example, in a survey we often collect information about height and weight – but
what we really want to know is their body mass index (BMI). We use the height and weight data
to calculate this, and create a new variable BMI. We might then also want to categorise our new
BMI variable into: underweight, normal weight, overweight & obese.

We’re now going to look at a couple of very commonly used commands to modify, and create
new variables from, existing data.

Key commands to modify your data:


You can access these procedures from the menus, under the Transform menu. There are many
things you can do here, such as create standardised variables (with a mean of 0, SD of 1), create
new variables from ntiles (tertiles, quartiles, quintiles, deciles, etc.), and many more. It’s worth
having a look to see what you can do.
The compute and recode commands can be used for simple data modification processes,
though, so we’ll write the code ourselves in our syntax file.

1. Compute a new variable


First, let’s compute a new variable, BMI, from weight and height [Note: BMI = weight/height², with weight in kilograms and height in metres]:

1.1 Compute a new variable, heightm (height in metres)

compute heightm=height/100.
execute.

1.2 We should check our newly created variable heightm – how should we do this?
1.3 Compute another new variable, BMI:

compute bmi=weight/heightm**2.
execute.

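[Note: exponentiation is denoted by ** in SPSS and is evaluated before division, so the height term is squared first and weight is then divided by the result]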

1.4 Check your new variable, BMI – how??
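
One possible way to check the computed variables, and then to group BMI into the categories mentioned at the start of this section, is sketched below. The variable name bmicat is hypothetical, and the cut-offs (18.5, 25 and 30) are the commonly used adult BMI bands, which is an assumption since the handout does not state them.

```spss
* Summary statistics show whether the values fall in a plausible range.
DESCRIPTIVES VARIABLES=height heightm bmi
  /STATISTICS=MEAN STDDEV MIN MAX.
* List the first 10 cases to compare the source and computed values.
LIST VARIABLES=height heightm weight bmi
  /CASES=FROM 1 TO 10.
* Assumed cut-offs: under 18.5, 18.5 to under 25, 25 to under 30, 30 and over.
* RECODE uses the first matching specification, so the highest band is listed first.
RECODE bmi (30 THRU HIGHEST=4) (25 THRU 30=3) (18.5 THRU 25=2) (LOWEST THRU 18.5=1) INTO bmicat.
VALUE LABELS bmicat 1 'Underweight' 2 'Normal weight' 3 'Overweight' 4 'Obese'.
EXECUTE.
```

Listing the bands from highest to lowest means a value that sits exactly on a boundary, such as 25, is placed in the higher category.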

Another Example:
