
SAS: What You Need To Know To Write A SAS Program: Data Definition and Options Data Step Procedure(s)

This document provides an overview of the essential components needed to write a SAS program, including: 1) Data definition and options to define the data location and environment. 2) The data step to read, modify, subset, and write data by creating temporary or permanent SAS datasets. 3) Procedures to perform analyses like sorting, computing statistics, and regression on the SAS data. Statements like filename, libname, data, set, if, and assignment are used to define, select, and modify data within the data step.

Uploaded by

anis_hasan2008
Copyright
© Attribution Non-Commercial (BY-NC)

SAS: What You Need to Know to Write a SAS Program

This is a brief guide to the essentials you need to know to write a SAS program if working on a
PC. If you are working on CUNIX refer to the handout, SAS on the Cunix Cluster.

A SAS Program consists of a series of SAS statements which are used to define, read in,
manipulate, and analyze data. The typical SAS program is organized into three parts:

1. Data Definition and Options
Define the data location and the environment.
2. Data Step
Read, modify, subset, and write the data.
3. Procedure(s)
Perform an action on the data, e.g. sort the data, compute means, run a regression, etc.

The data step takes most of the time, so plan accordingly. This document reviews each of the
three parts in some detail, giving examples of each. It also provides a few tips in a section on
Additional Information, and shows a complete sample program at the end. Most of the examples
included here use unix directory names, and a few use Windows paths. If you are working on
another operating system, only the directory naming conventions will differ.

Since SAS statements are the basis of all three parts of a SAS program, there are a few
generalities which must be mentioned about them:

• All SAS statements end with a ; (semicolon).
• Almost all SAS statements begin with a SAS keyword, e.g. data, set, proc, infile, input,
title, if, options, etc. Exceptions are assignment statements, e.g. age=curryr-birthyr.
• All SAS statements are free format, i.e. they can begin in any column and can run onto
additional lines without any regard for column location. However, it is advised for clarity
that you structure your program for easy reading on your screen.
• Quotes can be single or double but they must match.
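
As a small illustration of these rules, the two data steps below are equivalent; only the layout and the style of quotes differ. The data set names a and b and the variables x and name are made up for this sketch and are not part of the survey examples.

```sas
/* Everything on one line, single quotes */
data a; x = 1; name = 'Ann'; run;

/* The same statements spread over several lines, double quotes */
data b;
    x = 1;
    name = "Ann";
run;
```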

1. Data Definitions and Options

The Data Definitions and Options are at the top of most SAS programs. The first thing you need
to do is to tell SAS where the data is. In other words, you must define the location of the data on
your computer or storage device. To define the location of the data in SAS, you need to know:

1. The type of data you are working with, i.e. is it raw data or a SAS data set?
2. If you are reading in raw data, you need to know the length, or lrecl, of the records in the
file.
3. Where the data is. For example:
o survey1.dat
Raw data in your directory on cunix
o survey1.sas7bdat
A SAS data set in your directory on cunix
o C:\Documents and Settings\User\My Documents\survey1.dat
Raw data on Windows
o C:\Documents and Settings\User\My Documents\survey1.sas7bdat
A SAS data set on Windows

For reading in raw data, you need a filename statement to identify its location on your computer.
For example, on cunix:

filename rawin '/p/s/sz/sas/survey1.dat' lrecl=1880;

This statement assigns a ddname (data definition name), rawin, to associate with the raw data file
survey1.dat which is located in the unix directory /p/s/sz/sas/ and has a record length of 1880.
(Hint: If you are unsure of the directory name, give the unix command pwd to find out.)

For reading in a SAS data set, you need a libname statement instead. For example:

libname sasdata '/p/s/sz/sas/';

Note: filename statements point to specific files while libname statements simply point to
directories.

SAS Options:

An options statement is used to define an environment for the program. It changes the standard
settings. Some common options include:

• ls
defines the line size for output.
• obs
limits the number of observations processed to allow for program testing
on a small subset rather than reading in the entire data set.
• nocenter
writes all the output in the log and listing files flush left.

A statement with all these options defined would appear as follows:

options ls=78 obs=5 nocenter;

A full list of options is available in the SAS Language guide.
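
One point worth keeping in mind when testing with obs: an option stays in effect for all later steps until another options statement changes it. The sketch below shows a test-then-reset pattern; the data set demo and the variable i are made up for illustration.

```sas
/* Build a 20-observation data set to experiment with */
data demo;
    do i = 1 to 20;
        output;
    end;
run;

options obs=5;      /* later steps process only the first 5 observations */

proc print data=demo;
run;

options obs=max;    /* go back to processing all observations */

proc print data=demo;
run;
```

The first proc print lists 5 observations and the second lists all 20.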

2. The Data Step


The data step portion of a SAS program creates a SAS data set, either permanent or temporary,
from raw data or from another SAS data set. SAS procedures may only be run on SAS data sets.
Therefore all data must be converted to that format in order to run any analyses.

A temporary SAS data set is one that is created in the program and automatically erased when
the program is finished. This is often useful while you are in the testing stages of your analysis.
Creating a permanent SAS data set is useful once you have gotten your data in the form you need
it in and you expect to be working with the data set repeatedly.

This section will review the following facets of the data step:

1. Reading Raw Data
2. Reading a SAS Data Set
3. Selecting and Modifying the Data
4. Saving a Permanent SAS Data Set
5. Example of the SAS statements in a complete data step

Reading Raw Data

If you are reading raw data you need:

• A filename statement pointing to the location of the raw data.
• A list of the variables you want.
• Each variable's column position(s) in the file, e.g. 1-3.
• Each variable's type, e.g., numeric or alphanumeric.
• A data statement followed by a name you will assign to the SAS file.
• An infile statement pointing to the raw data defined in the filename statement.
• An input statement followed by the list of variables, their positions, and type.
• Assignment statements subsetting the data or creating or modifying variables (optional).
• A run statement signaling the end of the data step.

Example:

filename rawin '/eds/datasets/userfiles/temp01/data/hh.dat' lrecl=2341;

data one; infile rawin;
        input idnum   1-4
              age     7-9
              state $ 17-18
              sex     55-55;
run;

In the data step above, a temporary SAS data set called one is created. This will be referred to in
the SAS log as work.one. This data set can be used in subsequent data steps and procedures
within the program. Because it is temporary, this file will be erased automatically when the
program completes its run.

Note: In the input statement, $ is used to indicate alphanumeric variables as in state. Since
alphanumeric, or character, format is a superset of numeric format, all variables may be read in
in character format. However, only numeric variables may be used in numeric computations such
as means or regressions. So even if all of the values are numbers, a variable defined as
character cannot be used in those computations.
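
If a variable has already been read in as character but holds only digits, one common way to get a numeric version is the input function, which re-reads the text with a numeric informat. The sketch below is illustrative only: the data set names chars and nums and the variables agechar and agenum are hypothetical.

```sas
/* Hypothetical data: ages stored as character values */
data chars;
    input agechar $;
    datalines;
25
61
7
;
run;

/* Create a numeric copy; the original character variable is kept */
data nums;
    set chars;
    agenum = input(agechar, 8.);
run;

/* agenum can now be used where a numeric variable is required */
proc means data=nums;
    var agenum;
run;
```

Writing the result to a new variable, rather than trying to change the type of the old one, follows the same advice given later for assignment statements: the original values are not lost.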

Reading a SAS Data Set

If you have a SAS data set, either your own or one you've received, you can either run SAS
procedures directly on it, or you can read it in and make some changes using the data step. See
the Procedures section for an example on running analysis directly on a SAS data set.

If you need to revise the data, e.g. take a subset, create a new variable, etc., you must first read
the data set in.

To do this, you will need:

• A libname statement defining the location of the SAS data set.
• A data statement followed by a name you will assign to the new SAS data set.
• A set statement followed by the name of the SAS data set.

e.g.,

libname sasdata 'C:\Documents and Settings\User\My Documents\';

data one; set sasdata.survey1;
run;

Note: The set statement refers to a file called survey1.sas7bdat in the Windows directory
named in the libname statement. The prefix sasdata is the ddname you use in the libname
statement, which acts as a placeholder to be used in the set statement. It is not part of the
file's name. In your directory, the extension will always be .sas7bdat, regardless of the
ddname you use for the libname.

As in the previous example, a temporary SAS data set called one (or work.one) is created. It is
available to be used in subsequent data steps and procedures within the program. This data set
will automatically be erased when the program completes its run since it is temporary.

It is important to realize that any changes you make in the data step will only affect the
temporary data set. The original data set survey1.sas7bdat will remain unchanged.

Selecting and Modifying the Data

If you need to select or modify data, it must be done within the data step. Some common SAS
statements for this are if statements, assignment statements, and where statements.

• if statements
These statements select whole observations (cases), usually people, or perform an action
on the selected observations, e.g.:

if sex = 1;
    Keeps only those observations where sex=1.

if state = 'JN' then state='NJ';
    Changes any value of JN for state to NJ.

if racegrp in(4,5,6,8);
    Keeps observations in any of race groups 4, 5, 6 or 8.

The values for variables in SAS statements must be quoted if the variable is a character
variable and must be unquoted if it is numeric. If you are unsure whether a variable is
character or numeric, you should run proc contents on the data set.

You may also delete specific observations using the if statement:

if racegrp=5 then delete;

Warning! The effect of multiple if statements is cumulative: each one is applied only to the
observations that the earlier ones kept.

• assignment statements
Used for creating new variables or changing the values of an existing one. Examples of
creating a new variable are:

newage=0;
yearetr=byear+65;
income=salary+interest+divdnds;

In changing the values of existing variables, it is best to do this on a new variable created
from an old one, so that you don't lose the original values, e.g.

if 0 le age le 18 then agegrp=1;
else if 18 lt age lt 65 then agegrp=2;
else agegrp=3;

• where statements
Used to subset observations, e.g.

data one; set sasdata.survey1;
        keep sex age race income;
        where sex=1;
run;

This will create a work data set which subsets sasdata.survey1 to keep only those
observations for which sex is equal to 1. The keep statement tells SAS to keep only the
listed variables in the data set one.

The where statement is the only one of these statements that can also be used in a
procedure as well as in the data step, e.g.

proc freq data=sasdata.survey1;
        where sex=1;
run;

This does not write out a data set, but runs the analysis on those records for which sex is
equal to 1.
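
To make the warning about multiple if statements concrete, here is a small sketch that applies three of them in a row. The data set names people and subset and the four data lines are invented for illustration; the variable names follow the examples in this section.

```sas
/* Four made-up observations: sex and race group */
data people;
    input sex racegrp;
    datalines;
1 4
1 5
2 4
1 7
;
run;

data subset;
    set people;
    if sex = 1;                  /* drops the observation with sex=2 */
    if racegrp in (4,5,6,8);     /* of those left, drops racegrp=7 */
    if racegrp = 5 then delete;  /* of those left, drops racegrp=5 */
run;

proc print data=subset;
run;
```

Only the first observation (sex=1, racegrp=4) survives all three statements, so the order and combination of if statements should be checked carefully before trusting the counts in a subset.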

Saving a permanent SAS data set

If you would like a SAS data set to be saved for use beyond the program in which you create it,
you must create a permanent SAS data set. The steps are:

• Decide where you want to save the file and put in a libname statement pointing to that
directory.
• Use a two-part name in the data step, with the ddname from the libname statement being
the first part of the name.

e.g.

filename rawin 'C:\Documents and Settings\User\My Documents\survey1.dat'
        lrecl=1880;
libname sasdata 'C:\Documents and Settings\User\My Documents\';
data sasdata.survey1; infile rawin;
        input idnum 1-4
              age   7-9

        etc.

This will create a file in the directory C:\Documents and Settings\User\My Documents\.
It will be written to disk under the name survey1.sas7bdat.

Full Data Step

A complete step with data modifications would then look something like this:

data one; set sasdata.survey1;
        where sex=1;

        yearetr=byear+65;
        income=salary+interest+divdnds;
        agegrp=age;

        if 0 le age le 18 then agegrp=1;
        else if 18 lt age lt 65 then agegrp=2;
        else agegrp=3;
run;

3. Procedures

SAS procedures are used to perform an action on the data. This includes running any sort of
statistical analyses, including chi-squares, regressions, means, frequencies, and plots, as well as
just sorting or looking at your data (e.g. by running proc print). To use SAS Procedures:

• Decide what statistical procedures are appropriate for your research. You and your
advisor/statistician have to do this. EDS does not provide statistical consulting.
• Very Important!!! First check the data using simple procedures such as proc freq, proc
means, and proc print. You need to run frequencies on all the variables you are going to
use in your analysis so you know what your data looks like. If you are reading in an
existing SAS data set, proc contents will give you information on variable names and
types.
• Look up the particular procedure command in the manual and choose the subcommands
and options you need.
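
A first look at a data set, as recommended above, might be sketched as follows. It assumes the libname sasdata and the data set survey1 from the earlier examples, with the variables age, sex and state; substitute your own directory and names.

```sas
libname sasdata '/p/s/sz/sas/';

/* Variable names and types (Num or Char) */
proc contents data=sasdata.survey1;
run;

/* One frequency table for each variable listed */
proc freq data=sasdata.survey1;
    tables sex state;
run;

/* By default: N, mean, standard deviation, minimum, maximum */
proc means data=sasdata.survey1;
    var age;
run;

/* List the first 10 observations */
proc print data=sasdata.survey1 (obs=10);
run;
```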

Procedure statements follow the SAS statement form in that they begin with the keyword proc
followed by the procedure name, any subcommands, and the relevant options, e.g.

proc freq data=one;
        tables sex age;
run;

You can also run a procedure directly on a permanent SAS data set. So if you don't need to
subset your data or make any variable edits, you don't even need to include the data step in the
program, e.g.

libname sasdata '/p/s/sz/sas/data';

proc freq data=sasdata.survey1;
        tables sex age state;
run;

Additional Information

Some other nice (but optional) commands

• TITLE
Puts a title line on your output. (Also TITLE2-TITLE10 for additional lines of title.)
• *
Comments out a statement in your program: everything from the * up to the next
semicolon is ignored. Much recommended.
• /* followed by */
Another way of commenting. Everything in between is commented out, including
semicolons.
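
The sketch below shows both comment styles and the title statements together. The data set demo and its variable x are invented for this illustration.

```sas
* Build a tiny data set to print. This comment ends at the semicolon;
data demo;
    x = 1;
run;

/* This comment can hold semicolons; like this one; without ending early. */
title 'Demo listing';
title2 'One observation, one variable';
proc print data=demo;
run;

title;   /* a title statement with no text clears all titles */
```

Titles stay in effect for later procedures until they are replaced or cleared, which is why the sketch ends with an empty title statement.
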

Example of a Complete Program

/* Read in poll data, select out all females and save the
file as a permanent SAS data set. */

options ls=78 nocenter;

filename rawin 'C:\Documents and Settings\User\My Documents\survey1.dat'
        lrecl=1880;
libname sasdata 'C:\Documents and Settings\User\My Documents\';

data sasdata.survey1; infile rawin;
        input idnum   1-4
              age     7-9
              state $ 17-18
              sex     55-55;
        if sex=1;     /* keep females only (sex=1, per the comment above) */
        /* a missing age compares as less than 18, so it lands in agegrp 1 */
        if age lt 18 then agegrp=1;
                else if 18 le age lt 65 then agegrp=2;
                else if age ge 65 then agegrp=3;
run;

proc print data=sasdata.survey1 (obs=10);
title 'First 10 records';
run;

proc freq data=sasdata.survey1; tables state;
        where agegrp=2;
title 'Frequencies on Females';
title2 'Between the Ages of 18 and 65';
run;
