SAS: What You Need To Know To Write A SAS Program: Data Definition and Options Data Step Procedure(s)
This is a brief guide to the essentials you need to know to write a SAS program if you are working
on a PC. If you are working on CUNIX, refer to the handout SAS on the Cunix Cluster.
A SAS program consists of a series of SAS statements, which are used to define, read in,
manipulate, and analyze data. The typical SAS program is organized into three parts:
1. Data Definition and Options
2. Data Step
3. Procedure(s)
The data step takes most of the time, so plan accordingly. This document will review each of the
three parts in some detail, giving examples of each. It will also provide a few tips in a section on
Additional Information, and will show a complete sample program at the end. The examples
included here will all point to unix directory naming structures. If you are working on another
operating system, only the directory naming conventions will differ.
Since SAS statements are the basis of all three parts of a SAS program, there are a few
generalities which must be mentioned about them:
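Three rules that hold for every SAS statement can be seen in a minimal sketch: each statement ends with a semicolon, a statement may run over several lines, and several statements may share one line. The data set name one and the variables id and age are placeholders chosen for illustration only.

```sas
* Each statement ends with a semicolon, including this comment statement;
data one;
   /* One statement may span several lines */
   input id
         age;
   datalines;
1 34
2 51
;
run;

/* Several statements may share a line; keywords are not case-sensitive */
PROC PRINT data=one; run;
```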
The Data Definitions and Options are at the top of most SAS programs. The first thing you need
to do is to tell SAS where the data is. In other words, you must define the location of the data on
your computer or storage device. To define the location of the data in SAS, you need to know:
1. The type of data you are working with, i.e. is it raw data or a SAS data set?
2. If you are reading in raw data, you need to know the length, or lrecl, of the records in the
file.
3. Where the data is. For example:
   o survey1.dat: raw data in your directory on cunix
   o survey1.sas7bdat: a SAS data set in your directory on cunix
   o C:\Documents and Settings\User\My Documents\survey1.dat: raw data on Windows
   o C:\Documents and Settings\User\My Documents\survey1.sas7bdat: a SAS data set on Windows
For reading in raw data, you need a filename statement to identify its location on your computer.
On cunix, for example, a filename statement can assign a ddname (data definition name), rawin, to
the raw data file survey1.dat, which is located in the unix directory /p/s/sz/sas/ and has a
record length of 1880.
(Hint: If you are unsure of the directory name, give the unix command pwd to find out.)
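The filename statement described above could be written as in the following sketch. The ddname, path, and record length are the ones given in the text; substitute your own directory and file.

```sas
/* rawin is the ddname; lrecl gives the record length of the raw file */
filename rawin '/p/s/sz/sas/survey1.dat' lrecl=1880;
```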
For reading in a SAS data set, you need a libname statement, which assigns a ddname to the
directory that contains the data set.
Note: filename statements point to specific files, while libname statements simply point to
directories.
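A minimal sketch of a libname statement follows. It assumes the ddname sasdata, which this guide uses later in the two-part name sasdata.survey1, and reuses the unix directory /p/s/sz/sas/ as a stand-in for your own directory.

```sas
/* sasdata is the ddname; the quoted string is a directory, not a file */
libname sasdata '/p/s/sz/sas/';
```

With this in place, the file survey1.sas7bdat in that directory would be referred to as sasdata.survey1.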
SAS Options:
An options statement is used to define an environment for the program. It changes the standard
settings. Some common options include:
ls
defines the line size (number of characters per line) for output.
obs
limits the number of observations processed to allow for program testing
on a small subset rather than reading in the entire data set.
nocenter
writes procedure output flush left instead of centered.
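As a sketch, the three options above can be combined in one options statement. The values 80 and 100 are arbitrary choices for illustration.

```sas
/* 80-character lines, first 100 observations only, left-aligned output */
options ls=80 obs=100 nocenter;

/* Once testing is done, lift the limit so that all observations are read */
options obs=max;
```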
A temporary SAS data set is one that is created in the program and automatically erased when
the program is finished. This is often useful while you are in the testing stages of your analysis.
Creating a permanent SAS data set is useful once you have gotten your data in the form you need
it in and you expect to be working with the data set repeatedly.
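This section will review the following facets of the data step: reading in raw data, reading in
a SAS data set, selecting and modifying data, and creating a permanent SAS data set.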
Example:
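A minimal sketch of a data step that reads raw data, assuming the ddname rawin has been assigned in a filename statement. Only the variable state is named in this guide; the other variable names and all column positions are hypothetical.

```sas
/* Creates the temporary data set one (work.one) */
data one;
   /* rawin is the ddname assigned in the filename statement */
   infile rawin;
   /* Hypothetical column layout; $ marks state as a character variable */
   input id 1-4
         state $ 5-6
         age 7-8
         sex 9;
run;
```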
In the data step above, a temporary SAS data set called one is created. This will be referred to in
the SASLOG as work.one. This data set can be used in subsequent data steps and procedures
within the program. Because it is temporary, this file will be erased automatically when the
program completes its run.
Note: In the input statement, $ is used to indicate alphanumeric (character) variables, as in
state. Since character format is a superset of numeric format, all variables may be read in as
character. However, only numeric variables may be used in numeric analyses such as regressions.
So even if all of its values are numbers, a variable defined as character cannot be used for
such analysis.
If you have a SAS data set, either your own or one you've received, you can either run SAS
procedures directly on it, or you can read it in and make some changes using the data step. See
the Procedures section for an example on running analysis directly on a SAS data set.
If you need to revise the data, e.g. take a subset or create a new variable, you must first read
the data set in with a set statement in a data step.
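A minimal sketch of reading in an existing SAS data set. The ddname sasdata and the Windows directory are assumptions based on the file locations listed earlier in this guide; point the libname statement at your own directory.

```sas
/* The libname statement points to the directory holding survey1.sas7bdat */
libname sasdata 'C:\Documents and Settings\User\My Documents';

/* Copy the permanent data set into the temporary data set one */
data one;
   set sasdata.survey1;
run;
```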
Note: The set statement refers to a file called survey1.sas7bdat in the Windows directory that the
libname statement points to. The first part of the two-part name is the ddname you use in the
libname statement, which acts as a placeholder to be used in the set statement. It is not part
of the file's name. In your directory, the extension will always be .sas7bdat, regardless of the
ddname you use for the libname.
As in the previous example, a temporary SAS data set called one (or work.one) is created. It is
available to be used in subsequent data steps and procedures within the program. This data set
will automatically be erased when the program completes its run since it is temporary.
It is important to realize that any changes you make in the data step will only affect the
temporary data set. The original data set survey1.sas7bdat will remain unchanged.
If you need to select or modify data, it must be done within the data step. Some common SAS
statements for this are if statements, assignment statements, and where statements.
if statements
This command selects whole observations (cases), usually people, and performs an action
on those observations, e.g.:
if sex = 1;
Keeps only those observations where sex is equal to 1.
if racegrp in (4,5,6,8);
Keeps only those observations in race groups 4, 5, 6, or 8.
The values for variables in SAS statements must be quoted if the variable is a character
variable and must be unquoted if it is numeric. If you are unsure whether a variable is
character or numeric, you should run proc contents on the data set.
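The quoting rule can be sketched with a small made-up data set. The data values and the data set names men and nyonly are hypothetical; sex and racegrp are numeric here and state is character.

```sas
/* Tiny illustrative data set */
data one;
   input sex racegrp state $;
   datalines;
1 4 NY
2 5 NJ
1 7 CT
;
run;

/* Numeric variable: the value is not quoted */
data men;
   set one;
   if sex = 1;
run;

/* Character variable: the value is quoted */
data nyonly;
   set one;
   if state = 'NY';
run;

/* Lists each variable with its type (Num or Char) */
proc contents data=one;
run;
```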
assignment statements
Used for creating new variables or changing the values of an existing one. Examples of
creating a new variable are:
newage=0;
yearetr=byear+65;
income=salary+interest+divdnds;
When changing the values of an existing variable, it is best to work on a new variable created
from the old one, so that you don't lose the original values, e.g. copy age to agegrp and then
recode agegrp.
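A minimal sketch of recoding through a copy: age is left untouched and the new variable agegrp carries the recoded values. The cutoffs 30 and 65, the group codes, and the data values are hypothetical.

```sas
data two;
   input age;
   /* Copy first, so that age keeps its original values */
   agegrp = age;
   /* Missing values sort below every number, so handle them first */
   if age = . then agegrp = .;
   else if age < 30 then agegrp = 1;
   else if age < 65 then agegrp = 2;
   else agegrp = 3;
   datalines;
25
47
70
.
;
run;
```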
where statements
Used to subset observations. For example, a data step that reads sasdata.survey1 and contains
the statement where sex=1; will create a work data set which keeps only those observations for
which sex is equal to 1. A keep statement in the same step tells SAS to keep only the listed
variables in the data set one.
A where statement is the only data modification statement that can be used in procedures as
well as in the data step. Used in a procedure, it does not write out a data set, but runs the
analysis on only those records that meet the condition, e.g. those for which sex is equal to 1.
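Both uses of the where statement can be sketched as follows. To keep the example self-contained, a small made-up work data set named survey1 stands in for sasdata.survey1; the variable list in the keep statement is hypothetical.

```sas
/* Stand-in for sasdata.survey1, with made-up values */
data survey1;
   input id sex age racegrp;
   datalines;
1 1 34 4
2 2 51 5
3 1 29 8
;
run;

/* In a data step: subset the observations and keep three variables */
data one;
   set survey1;
   where sex = 1;
   keep id sex age;
run;

/* In a procedure: no data set is written, only the analysis is restricted */
proc freq data=survey1;
   where sex = 1;
   tables racegrp;
run;
```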
If you would like a SAS data set to be saved for use beyond the program in which you create it,
you must create a permanent SAS data set. The steps are:
1. Decide where you want to save the file and put in a libname statement pointing to that
directory.
2. Use a two-part name in the data step, with the ddname from the libname statement being
the first part of the name.
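A minimal sketch of these two steps, assuming the ddname sasdata and the unix directory /p/s/sz/sas/ as a stand-in for your own. The output name survey2 is hypothetical.

```sas
/* Step 1: the libname statement names the directory to save into */
libname sasdata '/p/s/sz/sas/';

/* Step 2: the two-part name makes survey2 a permanent data set */
data sasdata.survey2;
   set sasdata.survey1;
run;
```

This would write survey2.sas7bdat to the directory named in the libname statement, where it remains after the program ends.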
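A complete step with data modifications would then look something like this:
data sasdata.survey2;   /* survey2 is a hypothetical name for the new permanent data set */
set sasdata.survey1;    /* assumes a libname statement for sasdata */
yearetr=byear+65;
income=salary+interest+divdnds;
agegrp=age;
run;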
3. Procedures
SAS procedures are used to perform an action on the data. This includes running any sort of
statistical analyses, including chi-squares, regressions, means, frequencies, and plots, as well as
just sorting or looking at your data (e.g. by running proc print). To use SAS Procedures:
1. Decide what statistical procedures are appropriate for your research. You and your
advisor/statistician have to do this. EDS does not provide statistical consulting.
2. Very Important!!! First check the data using simple procedures such as proc freq, proc
means, and proc print. You need to run frequencies on all the variables you are going to
use in your analysis so you know what your data looks like. If you are reading in an
existing SAS data set, proc contents will give you information on variable names and
types.
3. Look up the particular procedure command in the manual and choose the subcommands
and options you need.
Procedure statements follow the SAS statement form in that they begin with the keyword proc
followed by the procedure name and the relevant options. Any subcommands follow as separate
statements, and the step ends with a run statement.
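The data-checking procedures mentioned above can be sketched as follows. The small data set and its values are made up for illustration; the statistics requested in proc means are one possible choice.

```sas
/* Made-up data so that the procedures have something to run on */
data one;
   input sex racegrp age;
   datalines;
1 4 34
2 5 51
1 8 29
;
run;

/* Frequencies: tables is the subcommand naming the variables */
proc freq data=one;
   tables sex racegrp;
run;

/* Means: n, mean, min and max are options on the proc statement */
proc means data=one n mean min max;
   var age;
run;

/* Print only the first 10 observations to look at the data */
proc print data=one (obs=10);
run;
```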
You can also run a procedure directly on a permanent SAS data set. So if you don't need to
subset your data or make any variable edits, you don't even need to include the data step in the
program. A libname statement followed by the procedure is enough.
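A minimal sketch of a program without a data step, assuming the ddname sasdata and the unix directory /p/s/sz/sas/ as a stand-in for the directory that holds survey1.sas7bdat.

```sas
libname sasdata '/p/s/sz/sas/';

/* Variable names and types of the permanent data set */
proc contents data=sasdata.survey1;
run;

/* With no var statement, proc means analyzes all numeric variables */
proc means data=sasdata.survey1;
run;
```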
Additional Information
TITLE
Puts a title line on your output. (TITLE2 through TITLE10 give additional title lines.)
*
Comments out a statement in your program: everything from the asterisk to the next
semicolon is ignored. Much recommended.
/* followed by */
Another way of commenting. Everything in between is commented out, including
semicolons.
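A short sketch showing titles and both comment styles together. The title text and the small data set are made up for illustration.

```sas
data one;
   input sex racegrp;
   datalines;
1 4
2 5
;
run;

title 'Survey 1: frequencies';
title2 'All respondents';

* This statement is a comment and ends at the semicolon;

/* This block is skipped as well, semicolons included:
proc print data=one; run;
*/

proc freq data=one;
   tables racegrp;
run;
```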
Example of a Complete Program
/* Read in poll data, select out all females and save the
file as a permanent SAS data set. */