A Hands-On Introduction To SAS DATA Step Hash Programming Techniques (V2)
All other company and product names mentioned are used for
identification purposes only and may be trademarks of their
respective owners.
Workshop Objectives
Illustrate useful code examples to help SAS users:
Review basic merge, join and transpose processes
Understand what a Hash object is
Understand how a Hash object works
See basic Hash object syntax
Experience writing basic Hash object processes
Hash match-merge process using DATA step Hash construct
Hash algorithm using PROC SQL
Hash sort process
Tables Used in Exercises
Movies
Actors
1. A Brief Review of the Merge / Join Process
Why Merge or Join?
Data is often stored in separate data sets or tables
Tables are normalized or “split” into smaller data sets/tables
Each data set/table contains “like” or “similar” information
Common linking column exists
Merging or joining permits data to be combined as if stored in a single larger data set or table
Provides exciting insights into data relationships
Types of merges or joins:
Conventional matching merge/join
Outer merge/join
The Matching Process Explained
Process of combining tables side-by-side (horizontally)
Some or all of the tables' contents are brought together
Using a “key” against a table of key/value pairs
Matches rows in two or more tables
Disk-based operation
2. DATA Step Merge and PROC SQL Join Process
DATA Step Merge versus Join
Merge Results Features
1. Data must be sorted (or indexed) by the BY variable(s).
2. Requires a common variable name.
3. Results are not automatically printed.
[Venn diagram: circles A and B overlapping in region AB]
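For contrast with the merge features listed above, the sketch below joins two unsorted tables whose key columns have different names. The data set names (work.movies_demo, work.actors_demo), the column movie_title, and every row are hypothetical placeholders, not the workshop tables.

```sas
/* Hypothetical sample rows, deliberately left unsorted. */
data work.movies_demo;
  length title $20 rating $5;
  input title $ rating $;
  datalines;
Movie_C PG-13
Movie_A PG
Movie_B R
;
run;

/* The key column is named movie_title here (an assumption for illustration). */
data work.actors_demo;
  length movie_title $20 actor_leading $20;
  input movie_title $ actor_leading $;
  datalines;
Movie_B Actor_2
Movie_A Actor_1
;
run;

/* No PROC SORT step and no shared column name are needed for the join. */
proc sql;
  create table work.sql_join as
  select m.title, m.rating, a.actor_leading
    from work.movies_demo as m,
         work.actors_demo as a
    where m.title = a.movie_title;
quit;
```

CREATE TABLE is used so the result is stored rather than printed; a bare SELECT would send the rows to the output destination instead.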
Exercise #1
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer01” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #1 (continued)
Match-merging two data sets with a subsetting-IF
statement is a common approach used by SAS users.
PROC SORT DATA=libref.movies(KEEP=title rating category)
          OUT=movies_sorted ;
  BY title ;
RUN ;
PROC SORT DATA=libref.actors(KEEP=title actor_leading)
          OUT=actors_sorted ;
  BY title ;
RUN ;
DATA Match_Merge ;
  MERGE movies_sorted (IN=m)
        actors_sorted (IN=a) ;
  BY title ;
  IF m AND a ;
RUN ;
PROC PRINT DATA=Match_Merge NOOBS ; RUN ;
Exercise #2
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer02” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #2 (continued)
Match-joining two tables with a WHERE-clause is a
common approach used by PROC SQL users.
PROC SQL ;
  SELECT MOVIES.title, rating, actor_leading
    FROM libref.MOVIES,
         libref.ACTORS
    WHERE MOVIES.title = ACTORS.title ;
QUIT ;
Left Outer Merge / Join
A left outer merge or join returns the matched rows plus the
unmatched rows from the left (dominant) table, as illustrated
by shaded areas (A) and (AB) in the Venn diagram.
[Venn diagram: circles A and B overlapping in region AB; areas (A) and (AB) shaded]
Exercise #3
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer03” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #3 (continued)
A left outer merge matches rows from both data sets and
preserves all unmatched rows from the left data set.
DATA LEFT_OUTER_MERGE ;
  MERGE Movies_sorted (IN=m)
        Actors_sorted (IN=a) ;
  BY Title ;
  IF m ;
RUN ;
PROC PRINT DATA=LEFT_OUTER_MERGE NOOBS ;
RUN ;
Exercise #4
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer04” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #4 (continued)
A left outer join matches rows from both tables and
preserves all unmatched rows from the left table.
PROC SQL ;
  SELECT MOVIES.title, category, rating, actor_leading
    FROM libref.MOVIES
         LEFT JOIN
         libref.ACTORS
      ON MOVIES.title = ACTORS.title ;
QUIT ;
Right Outer Merge / Join
A right outer merge or join returns the matched rows plus the
unmatched rows from the right (dominant) table, as illustrated
by shaded areas (AB) and (B) in the Venn diagram.
[Venn diagram: circles A and B overlapping in region AB; areas (AB) and (B) shaded]
Exercise #5
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer05” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #5 (continued)
A right outer merge matches rows from both data sets
and preserves all unmatched rows from the right data set.
DATA RIGHT_OUTER_MERGE ;
  MERGE Movies_sorted (IN=m)
        Actors_sorted (IN=a) ;
  BY Title ;
  IF a ;
RUN ;
PROC PRINT DATA=RIGHT_OUTER_MERGE NOOBS ;
RUN ;
Exercise #6
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer06” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #6 (continued)
A right outer join matches rows from both tables and
preserves all unmatched rows from the right table.
PROC SQL ;
  /* MOVIES.title is missing on rows that exist only in ACTORS */
  SELECT MOVIES.title, category, rating, actor_leading
    FROM libref.MOVIES
         RIGHT JOIN
         libref.ACTORS
      ON MOVIES.title = ACTORS.title ;
QUIT ;
Full Outer Merge / Join
A full outer merge or join returns the matched rows plus the
unmatched rows from both the left and right tables, as
illustrated by shaded areas (A), (AB) and (B) in the Venn diagram.
[Venn diagram: circles A and B overlapping in region AB; areas (A), (AB) and (B) shaded]
Exercise #7
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer07” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #7 (continued)
A full outer merge matches rows from both data sets and
preserves all unmatched rows from both data sets.
DATA FULL_OUTER_MERGE ;
  MERGE Movies_sorted (IN=m)
        Actors_sorted (IN=a) ;
  BY Title ;
  IF m OR a ;
RUN ;
PROC PRINT DATA=FULL_OUTER_MERGE NOOBS ;
RUN ;
Exercise #8
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer08” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #8 (continued)
A full outer join matches rows from both tables and
preserves all unmatched rows from both tables.
PROC SQL ;
  /* MOVIES.title is missing on rows that exist only in ACTORS */
  SELECT MOVIES.title, category, rating, actor_leading
    FROM libref.MOVIES
         FULL JOIN
         libref.ACTORS
      ON MOVIES.title = ACTORS.title ;
QUIT ;
3. DATA Step Hash Object Programming
Hash Objects Defined
A Hash object is a data structure
Contains an array of items that maps “keys” to their
associated values
Implemented as a DATA step construct
Not available in PROCs
When the DATA step ends, the hash object is removed from memory
How Does a Hash Object Work?
The contents of a table are read into memory once
SAS can then repeatedly access memory, as necessary
Memory-based operations (nanoseconds) are typically
faster than disk-based (milliseconds) operations
Users experience faster search, table lookup and
merge (or join) operations
[Diagram: MOVIES table (TITLE) and ACTORS table (TITLE, ACTOR_LEADING, ACTOR_SUPPORTING), with Venn circles A and B overlapping in region AB]
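The load-once, look-up-repeatedly idea described above can be sketched as follows. The data set work.ratings, the hash name h, and all rows are hypothetical placeholders chosen for illustration; they are not part of the workshop exercises.

```sas
/* Hypothetical lookup table: titles and ratings are made up. */
data work.ratings;
  length title $20 rating $5;
  input title $ rating $;
  datalines;
Movie_A PG
Movie_B R
Movie_C PG-13
;
run;

data _null_;
  length title $20 rating $5;               /* host variables for the key and data */
  declare hash h (dataset:'work.ratings');  /* table is read into memory once */
  h.defineKey('title');
  h.defineData('rating');
  h.defineDone();
  call missing(title, rating);              /* avoid "uninitialized" notes */

  rc = h.find(key:'Movie_B');               /* rc = 0 when the key exists */
  put rc= rating=;

  rc = h.find(key:'Movie_Z');               /* nonzero rc: key is not in the hash */
  put rc=;
run;
```

Both lookups are served from memory; the table on disk is read only while the hash object is being loaded.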
Exercise #9
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer09” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step hash code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #9 (continued)
Match-merging with a hash DATA step construct uses the
DefineKey, DefineData, DefineDone and Find methods.
data hash_match_merge;
  if 0 then set libref.actors;                      /* add ACTORS variables to the PDV (never executed) */
  if _n_ = 1 then do;
    declare Hash HTitle (dataset:'libref.actors');  /* declare hash HTitle, loaded from ACTORS */
    HTitle.DefineKey ('Title');                     /* identify variable to use as key */
    HTitle.DefineData ('Actor_Leading',
                       'Actor_Supporting');         /* identify columns of data */
    HTitle.DefineDone ();                           /* complete hash table definition */
  end;
  set libref.movies;
  if HTitle.find(key:title) = 0                     /* look up each MOVIES title in the hash */
    then output;
run;
Exercise #9 Dissected
data hash_match_merge;
  /* Compile-time only: adds the ACTORS variables to the PDV so the hash has host variables. */
  if 0 then set libref.actors;
  /* Build the hash object once, on the first iteration of the DATA step. */
  if _n_ = 1 then do;
    declare Hash HTitle (dataset:'libref.actors');
    HTitle.DefineKey ('Title');
    HTitle.DefineData ('Actor_Leading',
                       'Actor_Supporting');
    HTitle.DefineDone ();
  end;
  /* Read MOVIES one row at a time. */
  set libref.movies;
  /* Find returns 0 when the title is in the hash and copies the data values into the PDV. */
  if HTitle.find(key:title) = 0 then output;
run;
Exercise #9 Results
[Figure: Movies and Actors tables and the resulting Hash_match_merge data set]
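The same hash lookup can be adapted to behave like the left outer merge reviewed earlier. This is a minimal sketch under stated assumptions, not one of the workshop exercises: the data sets work.movies_demo and work.actors_demo and all of their rows are hypothetical.

```sas
/* Hypothetical sample rows; names are placeholders, not the workshop data. */
data work.movies_demo;
  length title $20 rating $5;
  input title $ rating $;
  datalines;
Movie_A PG
Movie_B R
Movie_C PG-13
;
run;

data work.actors_demo;
  length title $20 actor_leading $20;
  input title $ actor_leading $;
  datalines;
Movie_A Actor_1
Movie_C Actor_3
;
run;

data work.hash_left_outer;
  if 0 then set work.actors_demo;                /* put ACTORS variables in the PDV */
  if _n_ = 1 then do;
    declare hash HTitle (dataset:'work.actors_demo');
    HTitle.DefineKey('title');
    HTitle.DefineData('actor_leading');
    HTitle.DefineDone();
  end;
  set work.movies_demo;
  /* No match: clear the lookup column so a value retained from an earlier row is not reused. */
  if HTitle.find() ne 0 then call missing(actor_leading);
  /* There is no subsetting IF, so every MOVIES row is written. */
run;
```

Dropping the "= 0 then output" test is what keeps unmatched MOVIES rows; the CALL MISSING is needed because variables read with SET are retained across iterations.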
Exercise #10
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer10” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #10 (continued)
The _METHOD option writes the join algorithm that PROC SQL
chose to the SAS log; sqxjhsh indicates a hash join.
PROC SQL _METHOD ;
  SELECT M.Title, Length, Category, Year, Studio, Rating,
         Actor_Leading, Actor_Supporting
    FROM libref.Movies M,
         libref.Actors A
    WHERE M.Title = A.Title ;
QUIT ;
SAS Log Results
NOTE: SQL execution methods chosen are:
sqxslct
sqxjhsh
sqxsrc( SASUSER.MOVIES (alias = M ) )
sqxsrc( SASUSER.ACTORS (alias = A ))
Exercise #10 Results
[Figure: Movies and Actors tables]
Exercise #11
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer11” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step hash code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #11 (continued)
The DATA step hash construct uses the DefineKey,
DefineData, DefineDone, Add and Output methods to
rearrange data in an alternate physical order.
data _null_;
  if 0 then set libref.Movies;                      /* add MOVIES variables to the PDV (never executed) */
  if _n_ = 1 then do;
    declare Hash HashSort (ordered:'a');            /* declare hash with ascending key order */
    HashSort.DefineKey ('Length','Title');          /* identify sort key(s) */
    HashSort.DefineData ('Title',
                         'Length',
                         'Category',
                         'Rating');                 /* identify columns of data */
    HashSort.DefineDone ();                         /* complete hash table definition */
  end;
  set libref.Movies end=eof;
  HashSort.add ();                                  /* add data with key to hash object */
  if eof then
    HashSort.output(dataset:'Movies_hash_sorted');  /* write sorted data */
run;
Exercise #11 Dissected
data _null_;
  /* Compile-time only: adds the MOVIES variables to the PDV so the hash has host variables. */
  if 0 then set libref.Movies;
  /* Build the hash object once, on the first iteration of the DATA step. */
  if _n_ = 1 then do;
    /* ordered:'a' keeps the entries in ascending key order. */
    declare Hash HashSort (ordered:'a');
    /* Entries are ordered by Length, then by Title. */
    HashSort.DefineKey ('Length','Title');
    /* These columns are stored in the hash and written to the output data set. */
    HashSort.DefineData ('Title',
                         'Length',
                         'Category',
                         'Rating');
    HashSort.DefineDone ();
  end;
  /* Read MOVIES one row at a time; eof is set to 1 on the last row. */
  set libref.Movies end=eof;
  /* Add the current row to the hash object. */
  HashSort.add ();
  /* After the last row, write the hash contents in key order. */
  if eof then
    HashSort.output(dataset:'Movies_hash_sorted');
run;
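As a variation on the hash sort above, the sketch below orders rows in descending key order and allows duplicate key values. It assumes SAS 9.2 or later for the multidata argument; the data sets work.movies_demo and work.movies_desc and all rows are hypothetical.

```sas
/* Hypothetical sample rows; two titles share the same length. */
data work.movies_demo;
  length title $20;
  input title $ length;
  datalines;
Movie_A 120
Movie_B 95
Movie_C 120
Movie_D 142
;
run;

data _null_;
  if 0 then set work.movies_demo;                /* put MOVIES variables in the PDV */
  if _n_ = 1 then do;
    /* ordered:'d' keeps keys in descending order; multidata:'y' permits repeated keys. */
    declare hash HashSort (ordered:'d', multidata:'y');
    HashSort.DefineKey('length');
    HashSort.DefineData('title', 'length');
    HashSort.DefineDone();
  end;
  set work.movies_demo end=eof;
  HashSort.add();                                /* add the current row */
  if eof then HashSort.output(dataset:'work.movies_desc');
run;
```

Without multidata:'y', the second row with length 120 would be rejected by the Add method because its key already exists; Exercise #11 avoids this by including Title in the key.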
Conclusion
Review basic merge and join processes
Understand what a Hash object is
Understand how a Hash object works
Illustrate basic Hash object syntax
Experience coding basic Hash object processes
Hash match-merge process using DATA step Hash construct
Hash algorithm with PROC SQL
Hash sort process
Questions?
A Hands-on Workshop by
Kirk Paul Lafler
[email protected]
@sasNerd