Data Management: I. Importing Data From Excel To SPSS
Steps:
1. Save all your SPSS files and close SPSS. Open the Excel file you downloaded earlier
– HLTH1025_2016.xlsx.
2. Have a look at how the Data worksheet is set up. For instance, how many records
are there? How many variables? What format are the data for each variable? If the
data are numeric, make sure the format is set to “Number”.
3. When you’re happy with the Excel file, close it.
4. Open your SPSS syntax file.
5. Click File > Open > Data …
6. Click the “Files of type:” drop-down box and select Excel (.xls, .xlsx, .xlsm).
7. Navigate to and select the HLTH1025_2016.xlsx file that you saved to your working
directory.
8. Click “Paste”, to paste the syntax to your syntax file
9. A new dialogue box will appear – you should tick the box “Read variable names from
first row of data”, as the H&S Excel file has variable names in the first row.
10. In that same dialogue box, select the “Data” worksheet to import (that’s where the
data are stored).
11. Click OK. This will paste the syntax to import the data – it hasn’t imported the data
yet!
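For reference, the pasted import syntax usually looks something like the sketch below. The folder path is a placeholder (an assumption; yours will be wherever you saved the file), and the exact subcommands SPSS pastes can differ between versions.

```spss
* Sketch of the pasted import syntax; the folder path is a placeholder.
get data
  /type=xlsx
  /file='C:\your\working\directory\HLTH1025_2016.xlsx'
  /sheet=name 'Data'
  /cellrange=full
  /readnames=on.
execute.
```

To actually import the data, select these lines in the syntax file and run them.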
Later on, when you’re running some analyses and you want to look at, say, how many of your
respondents are male or female, you’ll run a procedure and the results will be displayed in the
Output Viewer window. If we left our data as they are, we’d get the results we want, but we
probably wouldn’t know how to read them. E.g., if I ran that procedure, here’s the output I’d get:
But which ones are male and which ones are female?! Because I haven’t labelled my data, it can
be difficult to remember what all the values for each variable refer to.
So here’s a tip: always label your variables and their values (particularly for categorical variables).
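A minimal sketch of labelling syntax, assuming sex is coded 1 and 2. The codes and label text here are illustrative assumptions, so check how your own data are coded before running anything like this.

```spss
* The codes 1 and 2 and the label text are assumptions, not taken from the data file.
variable labels sex 'Sex of respondent'.
value labels sex
  1 'Male'
  2 'Female'.
* Count respondents in each category; the output now shows labels instead of bare codes.
frequencies variables=sex.
```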
The sort syntax is quite simple. If you want to sort cases by sex in ascending order (although for
this type of variable it probably doesn’t matter whether ascending or descending), you can simply
type the following into your syntax file:
sort cases by sex(A).
The command to sort your data is sort cases. The “A” in brackets at the end of the statement
indicates we want the data sorted in ascending order – you can write “D” if you want descending.
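For comparison, a sketch of the descending version, plus a sort on two variables. The second variable, age, is hypothetical and only there to show the pattern.

```spss
* Sort by sex in descending order.
sort cases by sex(D).
* Sort by sex ascending, then by age descending within each sex (age is a hypothetical variable).
sort cases by sex(A) age(D).
```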
Let’s say now, we have an additional variable – year – that we have in a separate dataset. This
second dataset has the same participants as our HLTH1025_2016 dataset, and they are identified by
the variable IDnumber. But we have several years’ worth of data from this course, and when we
put it all together, we want to be able to identify which students belong to which cohort (year of
study).
The students we have information about belong to the second year of the study. We need to
merge this variable into our main dataset.
This is a relatively simple merge procedure, but demonstrates the key steps involved.
1. You already have the main dataset (HLTH1025_2016) open. You now need to open the
second dataset, called HLTH1025_2016_yr. Do this using the get file syntax:
get file='HLTH1025_2016_yr'.
We also want to name this second dataset – it will help us to tell SPSS exactly which dataset we
want to do things with … we’ll see in a minute when we sort. So use the dataset name syntax to
name your new dataset:
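The naming command itself is not shown here. Judging by the dataset activate lines used in the next step, it was presumably along these lines:

```spss
* Name the dataset just opened with get file (name inferred from the dataset activate syntax).
dataset name HLTH1025_2016_yr.
```

The main dataset would likewise need to have been named HLTH1025_2016 (with dataset name, while it was the active dataset) for dataset activate HLTH1025_2016 to find it.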
2. Now we need to sort cases by IDnumber in both datasets. Here’s where it’s helpful to have
datasets named, since we can ‘activate’ a dataset to run a procedure, then activate another dataset
to run the same or different procedure. It’s easy to activate a particular dataset if you’ve named it.
Use the following syntax:
dataset activate HLTH1025_2016.
sort cases by IDnumber(A).
dataset activate HLTH1025_2016_yr.
sort cases by IDnumber(A).
3. Now that you’ve written the syntax to open, name and sort your data, select those lines of
code and click Run Selection (the big green arrow button).
4. The final step is to merge the year variable into your main dataset. Here we are going to use
the menus to help us, then paste the syntax and run it from our syntax file.
a. First, activate your main dataset (HLTH1025_2016).
b. Click Data > Merge Files > Add Variables …
c. Select the dataset HLTH1025_2016_yr to merge with your active dataset, and click
Continue.
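The pasted merge syntax is not shown here. A sketch of what Add Variables typically produces for a one-to-one merge on a key variable is given below; it assumes both datasets are named and sorted by IDnumber as in the earlier steps, and the exact syntax pasted can vary with your SPSS version.

```spss
* Make the main dataset the active one.
dataset activate HLTH1025_2016.
* Add the variables from the second dataset, matching cases on IDnumber.
* The asterisk stands for the active dataset.
match files
  /file=*
  /file='HLTH1025_2016_yr'
  /by IDnumber.
execute.
```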
Then, I’d just double check using Data View that the values have actually merged for that
variable, and that they look right. I know that all records should have a value of “2” for the
variable year.
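One quick way to do this check in syntax rather than by eye is a frequency table; this is a sketch, and it simply assumes the merged variable is called year.

```spss
* Every record should fall in a single row of the table, with the value 2.
frequencies variables=year.
```

Any missing values in the table would point to IDnumber values that did not match between the two datasets.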
We’re now going to look at a couple of very commonly used commands to modify, and create
new variables from, existing data.
1.1 Compute a new variable, heightm, by dividing height by 100:
compute heightm=height/100.
execute.
1.2 We should check our newly created variable heightm – how should we do this?
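One possible answer, sketched below, is to compare summary statistics for the old and new variables: every statistic for heightm should be exactly one hundredth of the corresponding statistic for height.

```spss
* Compare the original and computed variables side by side.
descriptives variables=height heightm
  /statistics=mean stddev min max.
```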
1.3 Compute another new variable, BMI:
compute bmi=weight/heightm**2.
execute.
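As with heightm, it is worth checking the result. A sketch, assuming weight is recorded in kilograms: adult BMI values computed from kilograms and metres mostly fall somewhere between about 15 and 40, so values far outside that range suggest a problem with units or data entry.

```spss
* Inspect the range of the new variable for implausible values.
descriptives variables=bmi
  /statistics=mean stddev min max.
```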
Another Example: