Haas MFE SAS Workshop
Lecture 3:
Peng Liu
http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
SAS SQL
Peng Liu
http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
PROC SQL - What
What does SQL can do?
Selecting
Ordering/sorting
Subsetting
Restructuring
Creating table/view
Joining/Merging
Transforming variables
Editing
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 3
PROC SQL - Why
The Advantage of using SQL
Combined functionality
Faster for smaller tables
SQL code is more portable for non-SAS
applications
Not require presorting
Not require common variable names to
join on. (need same type , length)
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 4
Selecting Data
PROC SQL;
SELECT DISTINCT rating FROM MFE.MOVIES;
QUIT;
The simplest SQL code, need 3 statements
By default, it will print the resultant query, use
NOPRINT option to suppress this feature.
Begin with PROC SQL, end with QUIT; not RUN;
Need at least one SELECT FROM statement
DISTINCT is an option that removes duplicate rows
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 5
Ordering/Sorting Data
PROC SQL ;
SELECT *
FROM MFE.MOVIES
ORDER BY category;
QUIT;
Remember the placement of the SAS statements has no
effect; so we can put the middle statement into 3 lines
SELECT * means we select all variables from dataset
MFE.MOVIES
Put ORDER BY after FROM.
We sort the data by variable category
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 6
Subsetting Data
- Character searching in WHERE
PROC SQL;
SELECT title, category
FROM MFE.MOVIES
WHERE category CONTAINS 'Action';
QUIT;
Use comma (,) to separate selected variables
CONTAINS in WHERE statement only for character
variables
Also try WHERE UPCASE(category) LIKE '%ACTION%';
Use wildcard char. Percent sign (%) with LIKE operator.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 7
Subsetting Data
- Phonetic Matching in WHERE
PROC SQL;
SELECT title, category, rating
FROM MFE.MOVIES
WHERE category =* 'Drana';
QUIT;
Always Put WHERE after FROM
Sounds like operator =*
Search movie title for the phonetic variation of
drama, also help possible spelling variations
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 8
Case Logic
- reassigning/recategorize
PROC SQL;
SELECT title, rating,
CASE rating
WHEN 'G' THEN General' ELSE 'Other'
END AS level
FROM MFE.MOVIES;
QUIT;
The order of each statement is important
CASE END AS should in between SELECT and FROM
Note there is , after the variables you want to select
Use WHEN THEN ELSE to redefine variables
Rename variable from rating to level
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 9
Creating New Data
- Create Table
PROC SQL;
CREATE TABLE ACTION AS
SELECT title, category
FROM MFE.MOVIES
WHERE category CONTAINS 'Action';
QUIT;
CREATE TABLE AS can always be in front of SELECT FROM
statement to build a sas file.
In SELECT, the results of a query are converted to an output object
(printing). Query results can also be stored as data. The CREATE
TABLE statement creates a table with the results of a query. The
CREATE VIEW statement stores the query itself as a view. Either way,
the data identified in the query can beused in later SQL statements or
in other SAS steps.
Produce a new dataset (table) ACTION in work directory, no printing
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 10
Creating New Data
- Create View
PROC SQL;
CREATE VIEW G_MOVIES AS
SELECT title, category, rating
FROM MFE.MOVIES
WHERE rating = 'G'
ORDER BY title;
SELECT * FROM G_MOVIES;
QUIT;
First step-creating a view,no output is produced; then display the
desired output results
Use ; to separate two block of code inside of proc sql
When a table is created, the query is executed and the resulting
data is stored in a file. When a view is created, the query itself is
stored in the file. The data is not accessed at all in the process of
creating a view.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 11
Join Tables (Merge datasets)
- Cartesian Join
PROC SQL;
SELECT *
FROM MFE.CUSTOMERS, MFE.MOVIES;
QUIT;
Terminology: Join (Merge) datasets (tables)
No prior sorting required one advantage over DATA MERGE
Use comma (,) to separate two datasets in FROM
Without WHERE, all possible combinations of rows from each
tables is produced, all columns are included
Turn on the HTML result option for better display:
Tool/Options/Preferences/Results/ check Create HTML/OK
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 12
Join Tables (Merge datasets)
- Inner Join using WHERE
PROC SQL;
SELECT *
FROM MFE.MOVIES, MFE.ACTORS
WHERE MOVIES.title = ACTORS.title;
QUIT;
Use WHERE to specify connecting columns (title)
table1.matchvar = table2.matchvar
Produce rows that have same movie title
The matching variable can be of different name
different datasets
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 13
Join Tables (Merge datasets)
- Inner Join using WHERE (Cont.)
PROC SQL;
SELECT M.title, M.rating, A.actor_leading
FROM MFE.MOVIES M, MFE.ACTORS A
WHERE MOVIES.title = ACTORS.title;
QUIT;
Short-cut for table names
Can be used in SELECT and WHERE statements
Need to be declared in FROM statement
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 14
Join Tables (Merge datasets)
- Join three tables
PROC SQL;
SELECT C.cust_no,
M.title,M.rating, M.category,
A.actor_leading
FROM MFE.CUSTOMERS C,
MFE.MOVIES2 M,
MFE.ACTORS A
WHERE C.cust_no = M.cust_no AND M.title = A.title;
QUIT;
Use AND in WHERE statement to specify two matching
conditions
Produce rows that satisfies all the conditions
Note: We use MOVIES2 in this example
Can join up to 32 tables in one SQL code
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 15
Join Tables (Merge datasets)
- Inner Joins using ON
PROC SQL;
SELECT M.title, rating,actor_leading
FROM MFE.MOVIES M
INNER JOIN MFE.ACTORS A
ON M.TITLE = A.TITLE;
QUIT;
Same result as using where
WHERE is used to select rows from inner joins
ON is used to select rows from outer or inner
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 16
Join Tables (Merge datasets)
- Left Outer Joins
PROC SQL;
SELECT MOVIES.title, actor_leading, rating
FROM MFE.MOVIES
LEFT JOIN
MFE.ACTORS
ON MOVIES.title = ACTORS.title;
QUIT;
Resulting output contains all rows for which the SQL expression,
referenced in the ON clause, matches both tables and all rows from
LEFT table (MOVIES) that did not match any row in the right (ACTORS)
table.
Essentially the rows from LEFT table are preserved and captured exactly
as they stored in the table itself, regardless if a match exists.
Need to specify a table name for the matching variable in SELECT
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 17
Join Tables (Merge datasets)
- Right Outer Joins
PROC SQL;
SELECT ACTORS.title, actor_leading, rating
FROM MFE.MOVIES
RIGHT JOIN
MFE.ACTORS
ON MOVIES.title = ACTORS.title;
QUIT;
Resulting output contains all rows for which the SQL expression,
referenced in the ON clause, matches both tables and all rows
from RIGHT table (ACTORS) that did not match any row in the
right (MOVIES) table.
Essentially the rows from RIGHT table are preserved and captured
exactly as they stored in the table itself, regardless if a match
exists.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 18
Join Tables (Concatenating)
- Outer Union
PROC SQL;
SELECT * FROM MFE.CUSTOMERS
OUTER UNION
SELECT * FROM MFE.MOVIES;
QUIT;
SQL performs OUTER UNION, similar to DATA steps
with a SET statement to Concatenate datasets.
The result contains all the rows produced by the first
table-expression followed by all the row produced by
the second table-expression.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 19
Transforming Data
- Creating new Variables
/*Creating new variables*/
PROC SQL;
SELECT title, length, category, year, rating,
2006-year AS age
FROM MFE.MOVIES;
QUIT;
You can create new variables within SELECT
statement, the name of new variable follows
after AS.
Note the order of the express is reversed
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 20
Transforming Data
- Summarizing Data using SQL functions
PROC SQL;
SELECT *,
COUNT(title) AS notitle,
MAX(year) AS most_recent,
MIN(year) AS earliest,
SUM(length) AS total_length,
NMISS(rating) AS nomissing
FROM MFE.MOVIES
GROUP BY rating;
QUIT;
Simple summarization functions available
All function can be operated in GROUPs
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 21
Editing Data
Insert observations.
PROC SQL NOPRINT;
INSERT INTO MFE.CUSTOMERS
VALUES(1 'Peng');
INSERT INTO MFE.CUSTOMERS
SET Cust_no=2,Name='Sasha';
QUIT;
There are two ways of inserting observations into a
table. Data type should be the same.
VALUES( ) new values are separated by space.
SET column name = newly assigned values, delimited
by commas.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 22
Editing Data
Deleting rows and Dropping columns
/*Deleting rows*/ /*Droping variables*/
PROC SQL; PROC SQL;
DELETE CREATE TABLE NEW
FROM MFE.MOVIES (DROP=rating) AS
WHERE length LE 100; SELECT *
QUIT; FROM MFE.MOVIES;
QUIT;
Deletingcolumns can be done in SELECT or
in DROP on created table
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 23
Editing Data
Update observations
/*Updating Observation*/
PROC SQL NOPRINT;
UPDATE MFE.CUSTOMERS
SET Name='Liu'
WHERE Cust_no=1;
QUIT;
UPDATE SET WHERE
Find the observation and set new value
If more than one observations satisfies the condition,
all are updated with the new data in SET statement
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND
Alexander Vedrashko 24