A Hands-On Introduction To SAS DATA Step Hash Programming Techniques (V2)

Author Biography

Kirk Paul Lafler is an entrepreneur, consultant and founder of
Software Intelligence Corporation, and has been programming
in SAS since 1979. As the author of six books, including Google
Search Complete! (Odyssey Press, 2014) and PROC SQL: Beyond
the Basics Using SAS, Second Edition (SAS Institute, 2013), Kirk
has written hundreds of papers and articles; been an invited
speaker at hundreds of SAS International, regional,
special-interest, local, and in-house user group conferences; and
is the recipient of 25 “Best” contributed paper, hands-on workshop
(HOW), and poster awards.
A Hands-on Introduction to
SAS® DATA Step Hash
Programming Techniques
A Hands-on Workshop by
Kirk Paul Lafler
Copyright © 2010 – 2017 by
Kirk Paul Lafler and Software Intelligence Corporation.
All rights reserved.

SAS is the registered trademark of SAS Institute Inc., Cary, NC, USA.

All other company and product names mentioned are used for
identification purposes only and may be trademarks of their
respective owners.
Workshop Objectives
Illustrate useful code examples to help SAS users:
• Review basic merge, join and transpose processes
• Understand what a Hash object is
• Understand how a Hash object works
• See basic Hash object syntax
• Experience writing basic Hash object processes
• Hash match-merge process using DATA step Hash construct
• Hash algorithm using PROC SQL
• Hash sort process
Tables Used in Exercises

Movies

Actors
1. A Brief Review of the Merge / Join Process
Why Merge or Join?
• Data is often stored in separate data sets or tables
• Tables are normalized or “split” into smaller data sets/tables
• Each data set/table contains “like” or “similar” information
• A common linking column exists
• Merging or joining permits data to be combined as if stored in a single
larger data set or table
• Provides exciting insights into data relationships
• Types of merges or joins:
   • Conventional matching merge/join
   • Outer merge/join
The Matching Process – Explained
• Process of combining tables side-by-side (horizontally)
• Some or all of the tables’ contents are brought together
• Uses a “key” against a table of key/value pairs
• Matches rows in two or more tables
• Disk-based operation

Visually, it looks something like this:


Table One Table Two Table Three ...
2. DATA Step Merge and PROC SQL Join Process
DATA Step Merge versus Join
Merge Results Features
1. Data must be sorted (or indexed) by the BY variable(s).
2. Requires a common variable name.
3. Results are not automatically printed.

Join Results Features

1. Data does not need to be sorted by the matching column(s).
2. Does not require identically named matching columns and,
unlike a merge, duplicate matching columns are
not automatically overlaid.
3. Results are automatically printed unless the NOPRINT
option is specified.
What Happens During a Merge / Join
When merging / joining two data sets / tables:
• Conceptually, a PROC SQL join constructs an intermediate Cartesian product
and selects the rows that satisfy the WHERE-clause expression
• A DATA step match-merge does not build a Cartesian product; it reads the
sorted data sets sequentially by BY group and keeps the rows that satisfy
the subsetting-IF expression

When joining more than two tables with PROC SQL:

• The optimizer determines the order of processing to reduce the
size of the intermediate Cartesian product
• The optimizer reconstructs the join into two or more
two-way joins
• Unwanted rows and columns are removed from the intermediate
tables
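The intermediate Cartesian product described above can be sketched with two tiny tables. This is an illustration only and is not one of the workshop exercises: the WORK tables M and A, their column names and their rows are made-up assumptions. The first query shows the conceptual product (3 x 3 = 9 rows); the second applies the WHERE clause and keeps the 2 rows whose titles match.

```sas
/* Hypothetical sample tables, not the workshop's MOVIES / ACTORS data */
data work.m;
  input title :$20. rating :$5.;
  datalines;
Jaws PG
Rocky PG
Scarface R
;
run;

data work.a;
  input title :$20. actor :$20.;
  datalines;
Jaws Scheider
Rocky Stallone
Titanic DiCaprio
;
run;

proc sql;
  /* Conceptual step 1: every row of M paired with every row of A (9 rows) */
  select m.title as m_title, a.title as a_title, rating, actor
    from work.m as m, work.a as a;

  /* Conceptual step 2: keep only the pairs with equal titles (2 rows) */
  select m.title, rating, actor
    from work.m as m, work.a as a
    where m.title = a.title;
quit;
```

For the first query SAS writes a note to the log that the query involves a Cartesian product join, which is a useful warning sign when a WHERE or ON condition has been forgotten.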
Conventional Match-Merge / Join
The result of a match merge or join produces a result set
of matched rows from all tables as illustrated by the
shaded area (AB) in the Venn diagram.

[Venn diagram: A | AB | B, with AB shaded]
Exercise #1
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer01” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #1 (continued)
Match-merging two data sets with a subsetting-IF
statement is a common approach used by SAS users.
PROC SORT DATA=libref.movies(KEEP=title rating category)
          OUT=movies_sorted ;
  BY title ;
RUN ;
PROC SORT DATA=libref.actors(KEEP=title actor_leading)
          OUT=actors_sorted ;
  BY title ;
RUN ;
DATA Match_Merge ;
  MERGE movies_sorted (IN=m)
        actors_sorted (IN=a) ;
  BY title ;
  IF m AND a ;
RUN ;
PROC PRINT DATA=Match_Merge NOOBS ; RUN ;
Exercise #2
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer02” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #2 (continued)
Match-joining two tables with a WHERE-clause is a
common approach used by PROC SQL users.
PROC SQL ;
  SELECT MOVIES.title, rating, actor_leading
    FROM libref.MOVIES,
         libref.ACTORS
    WHERE MOVIES.title = ACTORS.title ;
QUIT ;
Left Outer Merge / Join
The result of a left outer merge or join produces a result
set of matched rows plus unmatched rows from the
dominant left table as is illustrated by the shaded area (A)
and (AB) in the Venn diagram.

[Venn diagram: A | AB | B, with A and AB shaded]
Exercise #3
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer03” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #3 (continued)
A left outer merge matches rows from both data sets and
preserves all unmatched rows from the left data set.
DATA LEFT_OUTER_MERGE ;
  MERGE Movies_sorted (IN=m)
        Actors_sorted (IN=a) ;
  BY Title ;
  IF m ;
RUN ;
PROC PRINT DATA=LEFT_OUTER_MERGE NOOBS ;
RUN ;
Exercise #4
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer04” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #4 (continued)
A left outer join matches rows from both tables and
preserves all unmatched rows from the left table.
PROC SQL ;
  SELECT MOVIES.title, category, rating, actor_leading
    FROM libref.MOVIES
         LEFT JOIN
         libref.ACTORS
    ON MOVIES.title = ACTORS.title ;
QUIT ;
Right Outer Merge / Join
The result of a right outer merge or join produces a result
set of matched rows plus unmatched rows from the
dominant right table as is illustrated by the shaded area
(AB) and (B) in the Venn diagram.

[Venn diagram: A | AB | B, with AB and B shaded]
Exercise #5
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer05” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #5 (continued)
A right outer merge matches rows from both data sets
and preserves all unmatched rows from the right data set.
DATA RIGHT_OUTER_MERGE ;
  MERGE Movies_sorted (IN=m)
        Actors_sorted (IN=a) ;
  BY Title ;
  IF a ;
RUN ;
PROC PRINT DATA=RIGHT_OUTER_MERGE NOOBS ;
RUN ;
Exercise #6
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer06” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #6 (continued)
A right outer join matches rows from both tables and
preserves all unmatched rows from the right table.
PROC SQL ;
  /* title is taken from ACTORS so unmatched ACTORS rows still show a title */
  SELECT ACTORS.title, category, rating, actor_leading
    FROM libref.MOVIES
         RIGHT JOIN
         libref.ACTORS
    ON MOVIES.title = ACTORS.title ;
QUIT ;
Full Outer Merge / Join
The result of a full outer merge or join produces a result
set of matched rows plus unmatched rows from the left
and right tables as is illustrated by the shaded area (A),
(AB) and (B) in the Venn diagram.

[Venn diagram: A | AB | B, with A, AB and B shaded]
Exercise #7
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer07” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step merge code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #7 (continued)
A full outer merge matches rows from both data sets and
preserves all unmatched rows from both data sets.
DATA FULL_OUTER_MERGE ;
  MERGE Movies_sorted (IN=m)
        Actors_sorted (IN=a) ;
  BY Title ;
  IF m OR a ;
RUN ;
PROC PRINT DATA=FULL_OUTER_MERGE NOOBS ;
RUN ;
Exercise #8
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer08” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #8 (continued)
A full outer join matches rows from both tables and
preserves all unmatched rows from both tables.
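PROC SQL ;
  /* COALESCE supplies a title for unmatched rows from either table */
  SELECT COALESCE(MOVIES.title, ACTORS.title) AS title,
         category, rating, actor_leading
    FROM libref.MOVIES
         FULL JOIN
         libref.ACTORS
    ON MOVIES.title = ACTORS.title ;
QUIT ;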
3. DATA Step Hash Object Programming
Hash Objects Defined
• A Hash object is a data structure
• Contains an array of items that maps “keys” to their
associated values
• Implemented as a DATA step construct
• Not available in PROCs
• At the end of the DATA step, the hash object is deleted from memory
How Does a Hash Object Work?
• The contents of a table are read into memory once
• SAS can then repeatedly access memory, as necessary
• Memory-based operations (nanoseconds) are typically
faster than disk-based (milliseconds) operations
• Users experience faster search, table lookup and
merge (or join) operations
MOVIES Table
TITLE
Brave Heart
...
Christmas Vacation
Coming to America
...

ACTORS Table
TITLE                 ACTOR_LEADING   ACTOR_SUPPORTING
Brave Heart           Mel Gibson      Sophie Marceau
Christmas Vacation    Chevy Chase     Beverly D’Angelo
Coming to America     Eddie Murphy    Arsenio Hall
...                   ...             ...
...                   ...             ...
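The "read into memory once, then access repeatedly" idea can be sketched as follows. This is a minimal illustration under stated assumptions: WORK.ACTORS is a hypothetical three-row copy of the ACTORS rows shown on the slide, and the lookup titles in the DO loop are chosen only for demonstration. NUM_ITEMS is a hash object attribute that reports how many items are held in memory.

```sas
/* Hypothetical WORK copy of the three ACTORS rows shown on the slide */
data work.actors;
  infile datalines dlm=',';
  input title :$30. actor_leading :$20. actor_supporting :$20.;
  datalines;
Brave Heart,Mel Gibson,Sophie Marceau
Christmas Vacation,Chevy Chase,Beverly D'Angelo
Coming to America,Eddie Murphy,Arsenio Hall
;
run;

data _null_;
  if 0 then set work.actors;                  /* put the ACTORS variables in the PDV */
  declare hash HTitle(dataset:'work.actors');
  HTitle.defineKey('title');
  HTitle.defineData('actor_leading', 'actor_supporting');
  HTitle.defineDone();                        /* ACTORS rows are loaded into memory once, here */

  n_items = HTitle.num_items;                 /* number of items now held in memory */
  put 'Items loaded into memory: ' n_items;

  /* Repeated lookups are served from memory; WORK.ACTORS is not read again */
  do title = 'Coming to America', 'Brave Heart', 'Coming to America';
    if HTitle.find() = 0 then put title= actor_leading=;
  end;
  stop;                                       /* no SET executes, so end the step explicitly */
run;
```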


Hash Object Syntax
• The hash object is used by calling methods
• 26 known methods exist
• Basic syntax:
   • Name of the hash table (user-assigned)
   • Dot
   • Desired method by name
   • Specification to pass to the method
• Examples:
   HashKey.DefineKey( );
   HashKey.Find( );
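The name-dot-method-specification pattern above can be sketched in a complete step. This is a minimal sketch, assuming a hypothetical key variable CODE and data variable CITY that are not part of the workshop tables. Each method call returns a code that can be captured in a variable; 0 indicates success.

```sas
data _null_;
  length code $3 city $20;
  call missing(code, city);            /* avoids "uninitialized variable" notes */

  declare hash HashKey();              /* user-assigned hash object name */
  rc = HashKey.DefineKey('code');      /* name . method ( specification ) */
  rc = HashKey.DefineData('city');
  rc = HashKey.DefineDone();

  rc = HashKey.Add(key:'SAN', data:'San Diego');

  rc = HashKey.Find(key:'SAN');        /* rc = 0 and CITY is filled in */
  put 'Found:     ' rc= city=;

  rc = HashKey.Find(key:'LAX');        /* rc is nonzero: key is not stored */
  put 'Not found: ' rc=;
run;
```

Capturing the return code in RC, rather than calling the method as a stand-alone statement, lets the program test for a failed lookup instead of having SAS log an error.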
Hash Object Methods
Method      Description
ADD         Adds data associated with key to hash object.
CHECK       Checks whether key is stored in hash object.
CLEAR       Removes all items from a hash object without deleting hash object.
DEFINEDATA  Defines data to be stored in hash object.
DEFINEDONE  Specifies that all key and data definitions are complete.
DEFINEKEY   Defines key variables to the hash object.
DELETE      Deletes the hash or hash iterator object.
EQUAL       Determines whether two hash objects are equal.
FIND        Determines whether the key is stored in the hash object and, if it is, copies the associated data into the DATA step variables.
FIND_NEXT   The current list item in the key’s multiple item list is set to the next item.
FIND_PREV   The current list item in the key’s multiple item list is set to the previous item.
FIRST       Returns the first value in the hash object.
HAS_NEXT    Determines whether another item is available in the current key’s list.
HAS_PREV    Determines whether a previous item is available in the current key’s list.
LAST        Returns the last value in the hash object.
NEXT        Returns the next value in the hash object.
OUTPUT      Creates one or more data sets containing the data in the hash object.
PREV        Returns the previous value in the hash object.
REF         Combines the FIND and ADD methods into a single method call.
REMOVE      Removes the data associated with a key from the hash object.
REMOVEDUP   Removes the data associated with a key’s current data item from the hash object.
REPLACE     Replaces the data associated with a key with new data.
REPLACEDUP  Replaces data associated with a key’s current data item with new data.
SETCUR      Specifies a starting key item for iteration.
SUM         Retrieves a summary value for a given key from the hash table and stores the value to a DATA step variable.
SUMDUP      Retrieves a summary value for the key’s current data item and stores the value to a DATA step variable.
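A few of the methods in the table can be tried together in one short step. This is a sketch under stated assumptions, not one of the workshop exercises: the hash object H, the iterator HI and the three rows are hypothetical. It contrasts CHECK with FIND, uses REPLACE and REMOVE, and walks the remaining items with FIRST and NEXT, which are called through a hash iterator (HITER) object.

```sas
data _null_;
  length title $30 rating $5;
  call missing(title, rating);

  declare hash H(ordered:'a');               /* keep items in ascending key order */
  H.defineKey('title');
  H.defineData('title', 'rating');
  H.defineDone();

  H.add(key:'Rocky',    data:'Rocky',    data:'PG');
  H.add(key:'Jaws',     data:'Jaws',     data:'PG');
  H.add(key:'Scarface', data:'Scarface', data:'R');

  /* CHECK only tests for the key; FIND also copies the data into the PDV */
  rating = ' ';
  rc = H.check(key:'Scarface');
  put 'After CHECK: ' rc= rating=;           /* rating is still blank */
  rc = H.find(key:'Scarface');
  put 'After FIND:  ' rc= rating=;           /* rating now holds R */

  rc = H.replace(key:'Jaws', data:'Jaws', data:'PG-13');
  rc = H.remove(key:'Rocky');

  /* FIRST and NEXT belong to a hash iterator attached to H */
  declare hiter HI('H');
  rc = HI.first();
  do while (rc = 0);
    put title= rating=;                      /* Jaws, then Scarface */
    rc = HI.next();
  end;
run;
```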
Hash Match-Merge / Join
The result of a hash match merge produces a result set of
matched rows from all tables as is illustrated by the
shaded area (AB) in the Venn diagram.

[Venn diagram: A | AB | B, with AB shaded]
Exercise #9
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer09” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step hash code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #9 (continued)
Match-merging with a hash DATA step construct uses the
DefineKey, DefineData, DefineDone and Find methods.
data hash_match_merge;
  if 0 then set libref.actors;                     /* add ACTORS variables to the PDV; no rows are read */
  if _n_ = 1 then do;
    declare Hash HTitle (dataset:'libref.actors'); /* declare hash object HTitle, loaded from ACTORS */
    HTitle.DefineKey ('Title');                    /* identify variable to use as key */
    HTitle.DefineData ('Actor_Leading',
                       'Actor_Supporting');        /* identify columns of data */
    HTitle.DefineDone ();                          /* complete hash table definition */
  end;
  set libref.movies;
  if HTitle.find(key:title) = 0                    /* look up each MOVIES title in the hash table */
    then output;
run;
Exercise #9 Dissected
Callouts (1) through (4) mark the statements highlighted on successive slides.
data hash_match_merge;
  if 0 then set libref.actors;                     /* (2) add ACTORS variables to the PDV */
  if _n_ = 1 then do;                              /* (3) define the hash object once, on the first iteration */
    declare Hash HTitle (dataset:'libref.actors'); /* (1) declare the hash object, then define key and data */
    HTitle.DefineKey ('Title');
    HTitle.DefineData ('Actor_Leading',
                       'Actor_Supporting');
    HTitle.DefineDone ();
  end;
  set libref.movies;
  if HTitle.find(key:title) = 0 then output;       /* (4) output only the MOVIES rows found in the hash */
run;
Exercise #9 Results
Movies Actors

Hash_match_merge
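The same hash lookup can be adapted to behave like the left outer merge from Exercise #3. The following is a sketch only, not one of the workshop programs: WORK.MOVIES and WORK.ACTORS are small hypothetical stand-ins for the workshop tables, and "Some Unmatched Title" is an invented row. Every MOVIES row is written out; when FIND fails, CALL MISSING clears the actor variables so that values retained from the previous match do not carry over.

```sas
/* Hypothetical stand-ins for the workshop tables */
data work.movies;
  infile datalines dlm=',';
  input title :$30. rating :$5.;
  datalines;
Brave Heart,R
Christmas Vacation,PG-13
Some Unmatched Title,G
;
run;

data work.actors;
  infile datalines dlm=',';
  input title :$30. actor_leading :$20. actor_supporting :$20.;
  datalines;
Brave Heart,Mel Gibson,Sophie Marceau
Christmas Vacation,Chevy Chase,Beverly D'Angelo
Coming to America,Eddie Murphy,Arsenio Hall
;
run;

data work.hash_left_outer;
  if 0 then set work.actors;                  /* add ACTORS variables to the PDV */
  if _n_ = 1 then do;
    declare hash HTitle(dataset:'work.actors');
    HTitle.defineKey('title');
    HTitle.defineData('actor_leading', 'actor_supporting');
    HTitle.defineDone();
  end;
  set work.movies;
  /* Keep every MOVIES row; blank the actor columns when no match is found */
  if HTitle.find() ne 0 then call missing(actor_leading, actor_supporting);
run;

proc print data=work.hash_left_outer noobs;
run;
```

With this sample data the result has three rows, and the unmatched title appears with blank actor columns.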
Exercise #10
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer10” from the list.
3. The code should display in the SAS Editor.
4. Run the PROC SQL join code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #10 (continued)
The join algorithm that the PROC SQL optimizer selects can be displayed in
the SAS log by specifying the _METHOD option. In the log below, sqxjhsh
indicates that a hash join was used.
PROC SQL _METHOD;
  SELECT M.Title, Length, Category, Year, Studio,
         Rating,
         Actor_Leading, Actor_Supporting
    FROM libref.Movies M,
         libref.Actors A
    WHERE M.Title = A.Title;
QUIT;
SAS Log Results
NOTE: SQL execution methods chosen are:
sqxslct
sqxjhsh
sqxsrc( SASUSER.MOVIES (alias = M ) )
sqxsrc( SASUSER.ACTORS (alias = A ))
Exercise #10 Results
Movies Actors
Exercise #11
1. Click the Open folder icon located at the top of the SAS
Display Manager.
2. Select the SAS program named “Exer11” from the list.
3. The code should display in the SAS Editor.
4. Run the DATA step hash code by clicking the Submit icon.
5. Let’s discuss the exercise and corresponding output.
Exercise #11 (continued)
The DATA step hash construct uses the DefineKey,
DefineData, DefineDone, Add and Output methods to
rearrange data in an alternate physical order.
data _null_;
  if 0 then set libref.Movies;              /* add MOVIES variables to the PDV */
  if _n_ = 1 then do;
    declare Hash HashSort (ordered:'a');    /* declare hash object in ascending key order */
    HashSort.DefineKey ('Length','Title');  /* identify sort key(s) */
    HashSort.DefineData ('Title',
                         'Length',
                         'Category',
                         'Rating');         /* identify columns of data */
    HashSort.DefineDone ();                 /* complete hash table definition */
  end;
  set libref.Movies end=eof;
  HashSort.add ();                          /* add data with key to hash object */
  if eof then
    HashSort.output(dataset:'Movies_hash_sorted'); /* write sorted data */
run;
Exercise #11 Dissected
Callouts (1) through (5) mark the statements highlighted on successive slides.
data _null_;                                /* (2) no output data set is created by the DATA statement */
  if 0 then set libref.Movies;              /* (3) add MOVIES variables to the PDV */
  if _n_ = 1 then do;
    declare Hash HashSort (ordered:'a');    /* (1) declare the ordered hash, then define key and data */
    HashSort.DefineKey ('Length','Title');
    HashSort.DefineData ('Title',
                         'Length',
                         'Category',
                         'Rating');
    HashSort.DefineDone ();
  end;
  set libref.Movies end=eof;                /* (4) read each MOVIES row and flag the last one */
  HashSort.add ();                          /* (5) add data with key to hash object */
  if eof then
    HashSort.output(dataset:'Movies_hash_sorted'); /* write sorted data */
run;
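A variation on the hash sort: the exercise includes Title in the key, which keeps keys unique when two movies share a length. The sketch below shows another way to handle repeated key values, the MULTIDATA argument, together with a descending order. It is an assumption-laden illustration: WORK.MOVIES, its three rows and the output name MOVIES_BY_LENGTH_DESC are hypothetical, and the table is loaded with the DATASET argument instead of the ADD method.

```sas
/* Hypothetical sample rows; two movies share the same length */
data work.movies;
  infile datalines dlm=',';
  input title :$30. length;
  datalines;
Movie A,120
Movie B,95
Movie C,120
;
run;

data _null_;
  if 0 then set work.movies;                  /* add MOVIES variables to the PDV */
  declare hash HashSort(dataset:'work.movies',
                        ordered:'d',          /* descending key order */
                        multidata:'y');       /* allow several items per key value */
  HashSort.defineKey('length');
  HashSort.defineData('title', 'length');
  HashSort.defineDone();                      /* rows are loaded here */
  HashSort.output(dataset:'work.movies_by_length_desc');
  stop;                                       /* no SET executes, so end the step explicitly */
run;

proc print data=work.movies_by_length_desc noobs;
run;
```

The output data set lists the two 120-minute rows before the 95-minute row.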
Exercise #11 Results
Conclusion
• Review basic merge and join processes
• Understand what a Hash object is
• Understand how a Hash object works
• Illustrate basic Hash object syntax
• Experience coding basic Hash object processes
• Hash match-merge process using DATA step Hash construct
• Hash algorithm with PROC SQL
• Hash sort process
Questions?
A Hands-on Workshop by
Kirk Paul Lafler
[email protected]
@sasNerd

Thank you for attending!
