chp04 05 More SQL
Intermediate SQL
Contents
Chapter 4. Intermediate SQL
Objectives
Introduction
Context
Grouping and summarising information
A very common error with GROUP BY
The HAVING clause
Writing queries on more than one table - Joins
Avoiding ambiguously named columns
Outer JOINs
Using table aliases
SELF JOINS
Additional note on joins
Nested queries
The depth and breadth of nested queries
The UNION operator
The INTERSECT operator
The MINUS operator
ANY or ALL operator
Correlated sub-queries
Additional features
Objectives
At the end of this chapter you should be able to:
Introduction
This chapter examines queries on more than one table, summarising data, and combining the results of multiple
queries in various ways.
Context
This chapter forms the bridge between the chapter in which the SQL language was introduced and the coverage of
the data definition language (DDL) and data control language (DCL) provided in the next chapter, Advanced SQL.
Grouping and summarising information
Information retrieved from an SQL query can be placed into separate groups or categories with the GROUP BY
clause. GROUP BY is optional. If it appears in the query, it comes after any WHERE clause and before any ORDER
BY.
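Suppose we want to know how many employees work in each department. To answer this question, it is necessary to
place employees in the EMP table into separate categories, one for each department. This can be done as follows:
SELECT DEPTNO, COUNT(EMPNO) FROM EMP GROUP BY DEPTNO;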
The query is run in two steps. The first step groups all employees by DEPTNO. The second step counts the number
of employees in each group. Note that GROUP BY, like ORDER BY, can include more than one data item. For example,
if we specify:
GROUP BY DEPTNO, JOB
the results will be categorised first by department and then, within each department, into employees who do
the same job.
We have specified between the parentheses of the COUNT function above that we are counting EMPNOs. Suppose we
wanted to count the number of JOBs instead.
• If we specified this as COUNT(JOB) we would get the number of non-null values in the JOB column for each
department.
• If we specified this as COUNT(*) we would get the number of rows in each department group (irrespective of
JOB content).
• If we specified this as COUNT(DISTINCT JOB) we would get the number of distinct non-null values in the JOB
column for each department.
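The COUNT variants are easiest to tell apart when they run side by side on data that contains a NULL. The sketch below assumes SQLite, driven from Python's standard sqlite3 module, in place of the Oracle-style DBMS used in this chapter. It uses a made-up four-row EMP table in which one employee's JOB is NULL. The ORDER BY is there only to make the output order predictable.

```python
import sqlite3

# Hypothetical four-row EMP table; the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT, DEPTNO INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [
        (1, "ADAMS", "CLERK", 10),
        (2, "BAKER", "CLERK", 10),
        (3, "CLARK", None, 10),  # JOB not yet known, stored as NULL
        (4, "DAVIS", "ANALYST", 20),
    ],
)

rows = conn.execute(
    """
    SELECT DEPTNO,
           COUNT(EMPNO),        -- non-null EMPNO values in the group
           COUNT(*),            -- all rows in the group
           COUNT(JOB),          -- non-null JOB values in the group
           COUNT(DISTINCT JOB)  -- distinct non-null JOB values in the group
    FROM EMP
    GROUP BY DEPTNO
    ORDER BY DEPTNO
    """
).fetchall()

for row in rows:
    print(row)
```

For department 10 this gives (10, 3, 3, 2, 1). The group has three rows, but only two non-null JOB values and a single distinct job, CLERK.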
A very common error with GROUP BY
All column names in the select-list must either appear in the GROUP BY clause or be inside the brackets of an
aggregate function. Many people, when first using the GROUP BY clause, fall into the trap of asking for a data
item at the individual employee level, such as SAL. It is fine to display average salaries, as these are averaged
across the group and are therefore meaningful at the group level. However, if we asked to display individual
salaries, we would get the error message “not a group by expression”, because SAL is an attribute of an
individual employee, not an attribute of the group. The only individual-level items you can include in the
select-list are items shared by every row in the group, and these are exactly the items you are grouping by. This
is why the rule can be summarised as an easy self-check: all column names in the select-list must either appear
in the GROUP BY clause or be inside the brackets of an aggregate function.
The HAVING clause
The HAVING clause filters out specific groups, exactly in the same way that the WHERE clause filters out
individual rows. The HAVING clause always follows a GROUP BY clause, and is used to test properties of the
grouped information. For example, if we are grouping information at the department level, we might use a HAVING
clause to exclude departments with fewer than a certain number of employees. This could be coded as below,
obtaining the result in figure 2:
Figure 2: Result of SELECT DEPTNO,COUNT(EMPNO) FROM EMP GROUP BY DEPTNO HAVING COUNT(EMPNO) > 4
Department number 10 has four employees in our sample data set, and has been excluded from the results because
of the HAVING clause. The properties tested in a HAVING clause must be properties of groups, not of individual
rows. So, as in the select-list, one must either test the values of a grouped-by item, such as DEPTNO, or test
some aggregate property of the group, e.g. the number of members in the group (as in the example) or the group's
average salary. The HAVING clause, when required, always follows immediately after the GROUP BY clause to which
it refers. It can contain compound conditions linked by AND or OR, and parentheses may be used to nest
conditions.
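Here is a minimal sketch of HAVING. It assumes SQLite through Python's standard sqlite3 module and uses invented data: two employees in department 10 and three each in departments 20 and 30. The second query shows a compound HAVING condition that tests two aggregate properties of each group.

```python
import sqlite3

# Hypothetical EMP data: two employees in department 10, three in 20 and in 30.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, DEPTNO INTEGER, SAL REAL)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(1, 10, 800), (2, 10, 1200),
     (3, 20, 1500), (4, 20, 2000), (5, 20, 900),
     (6, 30, 700), (7, 30, 600), (8, 30, 500)],
)

# HAVING removes whole groups, testing an aggregate property of each group.
big_depts = conn.execute(
    """
    SELECT DEPTNO, COUNT(EMPNO)
    FROM EMP
    GROUP BY DEPTNO
    HAVING COUNT(EMPNO) > 2
    ORDER BY DEPTNO
    """
).fetchall()

# Compound HAVING: more than two members AND an average salary above 1000.
big_well_paid = conn.execute(
    """
    SELECT DEPTNO, COUNT(EMPNO)
    FROM EMP
    GROUP BY DEPTNO
    HAVING COUNT(EMPNO) > 2 AND AVG(SAL) > 1000
    """
).fetchall()

print(big_depts)
print(big_well_paid)
```

Department 30 passes the first test but is removed by the second, because its average salary is only 600.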
Writing queries on more than one table - Joins
The order in which the tables are listed in the table-list does not affect the result. However, listing both the
EMP and the DEPT tables after the FROM keyword is not sufficient on its own: without a join condition, every EMP
row would be paired with every DEPT row. We want to relate the display of a department name with the display of
the numbers and names of the employees who work in that department, so we require the query to relate employee
rows in the EMP table with their corresponding department rows in the DEPT table. This is done using a relational
join:
SELECT EMPNO, ENAME, DNAME FROM EMP, DEPT WHERE EMP.DEPTNO = DEPT.DEPTNO;
What this expresses is that we wish rows in the EMP table to be related to rows in the DEPT table by matching
rows from the two tables whose department numbers (DEPTNOs) are equal. We use the DEPTNO column of each employee
row in the EMP table to link that employee row with the corresponding department row in the DEPT table.
Figure 3: Result of SELECT EMPNO,ENAME,DNAME FROM EMP,DEPT WHERE EMP.DEPTNO = DEPT.DEPTNO
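You can check directly that listing both tables is not enough. This sketch assumes SQLite through Python's sqlite3 module and hypothetical three-row tables. Leaving out the join condition pairs every employee with every department, while the WHERE condition keeps only the matching pairs.

```python
import sqlite3

# Hypothetical miniature DEPT and EMP tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT);
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, DEPTNO INTEGER);
    INSERT INTO DEPT VALUES (10, 'ACCOUNTING');
    INSERT INTO DEPT VALUES (20, 'RESEARCH');
    INSERT INTO DEPT VALUES (30, 'SALES');
    INSERT INTO EMP VALUES (7566, 'JONES', 20);
    INSERT INTO EMP VALUES (7782, 'CLARK', 10);
    INSERT INTO EMP VALUES (7900, 'JAMES', 30);
    """
)

# No join condition: every EMP row is paired with every DEPT row.
cartesian = conn.execute("SELECT EMPNO, ENAME, DNAME FROM EMP, DEPT").fetchall()

# The join condition keeps only the pairs whose DEPTNO values match.
joined = conn.execute(
    """
    SELECT EMPNO, ENAME, DNAME
    FROM EMP, DEPT
    WHERE EMP.DEPTNO = DEPT.DEPTNO
    ORDER BY EMPNO
    """
).fetchall()

print(len(cartesian))  # 3 employees x 3 departments = 9 rows
print(joined)
```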
Suppose we want to list the names and jobs of employees, together with the locations in which they work. LOC
is stored in the DEPT table, so displaying employee information along with the locations of their departments
requires a join, as shown in figure 4. The SQL standard provides the following alternative ways to specify this
join:
Avoiding ambiguously named columns
DEPTNO has been used as the data item to link rows in the EMP and DEPT tables. In the DEPT table, DEPTNO acts
as the primary key, while in the EMP table it acts as a foreign key, linking each EMP row with the DEPT row for
the department to which the employee belongs. If we wish to refer to DEPTNO in the select-list, we need to
specify which instance of DEPTNO we mean: the one in the EMP table or the one in the DEPT table. We simply prefix
DEPTNO with the table name, placing a full stop (.) between the table name and the column name, for example
EMP.DEPTNO or DEPT.DEPTNO. In this way, the system can identify which instance of DEPTNO is being referenced.
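This sketch shows the ambiguity problem alongside two of the join syntaxes defined by the SQL standard, JOIN ... ON and JOIN ... USING. It assumes SQLite through Python's sqlite3 module and hypothetical two-row tables. The exact wording of the error message is SQLite's; other DBMSs phrase it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT);
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT,
                      DEPTNO INTEGER REFERENCES DEPT (DEPTNO));
    INSERT INTO DEPT VALUES (10, 'ACCOUNTING', 'NEW YORK');
    INSERT INTO DEPT VALUES (20, 'RESEARCH', 'DALLAS');
    INSERT INTO EMP VALUES (7566, 'JONES', 'MANAGER', 20);
    INSERT INTO EMP VALUES (7782, 'CLARK', 'MANAGER', 10);
    """
)

# DEPTNO exists in both tables, so an unqualified reference is rejected.
try:
    conn.execute("SELECT ENAME, DEPTNO FROM EMP JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO")
    error_message = ""
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Prefixing the column with its table name resolves the ambiguity.
on_rows = conn.execute(
    """
    SELECT ENAME, JOB, LOC, EMP.DEPTNO
    FROM EMP INNER JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO
    ORDER BY ENAME
    """
).fetchall()

# With JOIN ... USING the shared column is named once and may be used unqualified.
using_rows = conn.execute(
    "SELECT ENAME, JOB, LOC, DEPTNO FROM EMP JOIN DEPT USING (DEPTNO) ORDER BY ENAME"
).fetchall()

print(on_rows)
print(using_rows == on_rows)
```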
Outer JOINs
Suppose, for example, that we wish to list all departments with the employee numbers and names of their
employees, plus any departments that do not contain employees.
The basic form of the JOIN only extracts matching rows from the joined tables. To also obtain rows that do not
match a row in the other table, we use an OUTER JOIN. There are three types of OUTER JOIN: LEFT, RIGHT and FULL.
To demonstrate OUTER JOINs, we use the following tables.
Person table
The person table holds information about people (see figure 5). ID is the primary key. A person may or may not
own a car.
Car table
The car table holds information on cars and REG is its primary key (see figure 6). A car may or may not have
an owner.
Example: List all persons together with their car’s registration and model, including any person without any
car:
Example: List all cars together with their owner’s identification and name, including cars not owned by anyone.
Figure 8: Result of the RIGHT JOIN example
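The three kinds of outer join can be tried out with the following sketch. It assumes SQLite through Python's sqlite3 module and hypothetical PERSON and CAR rows. The foreign key column CAR.OWNER_ID is also an assumption, since the column names of figures 5 and 6 are not reproduced here. SQLite versions before 3.39 support neither RIGHT nor FULL OUTER JOIN, so the sketch uses two well-known rewrites instead:
• the tables of a RIGHT JOIN are swapped to make a LEFT JOIN;
• a FULL OUTER JOIN is emulated as the UNION of two LEFT JOINs.

```python
import sqlite3

# Hypothetical PERSON and CAR tables; OWNER_ID is an assumed foreign key column.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PERSON (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE CAR (REG TEXT PRIMARY KEY, MODEL TEXT,
                      OWNER_ID INTEGER REFERENCES PERSON (ID));
    INSERT INTO PERSON VALUES (1, 'ANNA');
    INSERT INTO PERSON VALUES (2, 'BEN');
    INSERT INTO PERSON VALUES (3, 'CAROL');             -- owns no car
    INSERT INTO CAR VALUES ('AB12 CDE', 'FORD', 1);
    INSERT INTO CAR VALUES ('XY99 ZZZ', 'VW', 2);
    INSERT INTO CAR VALUES ('QQ11 AAA', 'FIAT', NULL);  -- has no owner
    """
)

# LEFT JOIN: every person, with NULLs where no car matches.
persons_with_cars = conn.execute(
    """
    SELECT PERSON.ID, PERSON.NAME, CAR.REG, CAR.MODEL
    FROM PERSON LEFT OUTER JOIN CAR ON PERSON.ID = CAR.OWNER_ID
    ORDER BY PERSON.ID
    """
).fetchall()

# PERSON RIGHT JOIN CAR returns the same rows as CAR LEFT JOIN PERSON.
cars_with_owners = conn.execute(
    """
    SELECT CAR.REG, CAR.MODEL, PERSON.ID, PERSON.NAME
    FROM CAR LEFT OUTER JOIN PERSON ON PERSON.ID = CAR.OWNER_ID
    ORDER BY CAR.REG
    """
).fetchall()

# FULL OUTER JOIN emulated as the UNION of both LEFT JOINs;
# UNION drops the matched rows that appear in both halves.
full = conn.execute(
    """
    SELECT PERSON.ID, PERSON.NAME, CAR.REG, CAR.MODEL
    FROM PERSON LEFT OUTER JOIN CAR ON PERSON.ID = CAR.OWNER_ID
    UNION
    SELECT PERSON.ID, PERSON.NAME, CAR.REG, CAR.MODEL
    FROM CAR LEFT OUTER JOIN PERSON ON PERSON.ID = CAR.OWNER_ID
    """
).fetchall()

print(persons_with_cars)
print(cars_with_owners)
print(full)
```

UNION also removes any genuinely duplicate rows. That is harmless here, because each result row carries the primary key of at least one table.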
If you wish to show both the persons who do not own a car and the cars that have no owner (see figure 9), you
need to use the FULL OUTER JOIN:
Using table aliases
Aliases, or alternative names, can be specified in the table-list. For example, the above FULL OUTER JOIN query
can be written using aliases:
SELF JOINS
Sometimes it is necessary to JOIN a table to itself in order to compare rows from the same table. An example
of this might be if we wish to compare values of salary on an individual basis between employees.
Example: find all employees who are paid more than “JONES”
What is required here is to compare the salaries of employees with the salary paid to JONES. One way of doing
this involves JOINing the EMP table with itself, so that we can carry out salary comparisons in the WHERE clause
of an SQL query. However, if we wish to JOIN a table to itself, we need a mechanism for referring to the
different rows being compared. In order to specify the query to find out which employees are paid more than
JONES, we shall use two table aliases, E and J for the EMP table. We shall use E to denote employees we are
comparing with JONES, and J to denote JONES’ row. This leads to the following query:
The result can be seen in figure 10. We ensure that the alias J is associated with the employee JONES through
the second condition in the WHERE clause: AND J.ENAME = 'JONES'.
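Here is a minimal runnable version of the self join. It assumes SQLite through Python's sqlite3 module and hypothetical salaries; only JONES's salary of 2975 is taken from the text. The two aliases E and J let the same EMP table play two roles in one query.

```python
import sqlite3

# Hypothetical salaries; JONES earns 2975 as in the chapter's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(7369, "SMITH", 800), (7566, "JONES", 2975), (7698, "BLAKE", 2850),
     (7788, "SCOTT", 3000), (7839, "KING", 5000)],
)

# E ranges over every employee; J is pinned to JONES' row by the second condition.
richer_than_jones = conn.execute(
    """
    SELECT E.ENAME, E.SAL
    FROM EMP E, EMP J
    WHERE E.SAL > J.SAL
      AND J.ENAME = 'JONES'
    ORDER BY E.SAL
    """
).fetchall()

print(richer_than_jones)
```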
Additional note on joins
We have seen that JOINing two tables together involves one JOIN condition. In general, JOINing N tables together
requires N-1 JOIN conditions. Much work has gone into developing efficient algorithms for executing JOINs; in
spite of this, JOINs remain relatively expensive operations.
Nested queries
In SQL we can include one query within another. This is known as nesting queries, or writing sub-queries.
Example: Find all employees who are paid more than JONES. This can be seen as a two-stage task:
1. Find the salary paid to JONES.
2. Find all those employees who are paid more than the salary found in step 1.
SELECT EMPNO,ENAME,SAL FROM EMP WHERE SAL > (SELECT SAL FROM EMP WHERE ENAME = 'JONES');
This gives the results in figure 11.
These are indeed the employees who earn more than JONES (who earns 2975). Whenever a query appears to fall
into a succession of natural steps such as the one above, it can be easier to code it as a nested query. Just
bear in mind that some nested queries return a single value, while others return a set of values. Suppose we want
to find the employees who earn the same salary as employee 7566 (JONES). As the inner query returns just one
value, employee 7566’s salary, we use the equals sign:
SELECT EMPNO, ENAME, SAL FROM EMP WHERE SAL = (SELECT SAL FROM EMP WHERE EMPNO = 7566);
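The next sketch contrasts single-value and set-valued sub-queries. It assumes SQLite through Python's sqlite3 module and uses invented rows. The second query uses IN, because its inner query may return several salaries. Many DBMSs, Oracle among them, report an error if a sub-query compared with = returns more than one row. SQLite instead silently uses the first row, which is one more reason to prefer IN whenever a set may come back.

```python
import sqlite3

# Hypothetical data; employee 7566 (JONES) earns 2975.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL INTEGER, DEPTNO INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [(7566, "JONES", 2975, 20), (7788, "SCOTT", 3000, 20),
     (7100, "TAYLOR", 2975, 10), (7200, "WILSON", 3000, 30),
     (7499, "ALLEN", 1600, 30)],
)

# The inner query returns exactly one value, so "=" is appropriate.
same_as_7566 = conn.execute(
    """
    SELECT ENAME FROM EMP
    WHERE SAL = (SELECT SAL FROM EMP WHERE EMPNO = 7566)
      AND EMPNO != 7566
    """
).fetchall()

# The inner query may return several values, so IN tests membership of that set.
same_as_someone_in_20 = conn.execute(
    """
    SELECT ENAME FROM EMP
    WHERE SAL IN (SELECT SAL FROM EMP WHERE DEPTNO = 20)
      AND DEPTNO != 20
    ORDER BY ENAME
    """
).fetchall()

print(same_as_7566)
print(same_as_someone_in_20)
```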
The depth and breadth of nested queries
There is no practical limit to the depth to which queries can be nested. In addition, several sub-queries can
be included at the same level of nesting, combined using AND or OR:
Figure 12: Result of the example using ANY
The inner query retrieves the salaries for department 30. The outer query, using the ALL keyword, ensures that
the salaries retrieved are higher than all of those in department 30. Note that the NOT operator can be used
with IN, ANY or ALL.
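SQLite, used here through Python's sqlite3 module, does not support the ANY and ALL keywords. This sketch therefore substitutes a different but commonly used rewrite:
• > ALL (sub-query) becomes a comparison with the sub-query's MAX;
• > ANY (sub-query) becomes a comparison with its MIN.
The rewrite is only equivalent when the sub-query returns at least one row and no NULLs, which holds for the invented data. A Python cross-check then applies the literal definitions of ALL and ANY to the same rows.

```python
import sqlite3

# Hypothetical data; department 30 salaries are 1600, 1250 and 2850.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT PRIMARY KEY, SAL INTEGER NOT NULL, DEPTNO INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("ALLEN", 1600, 30), ("WARD", 1250, 30), ("BLAKE", 2850, 30),
     ("JONES", 2975, 20), ("SCOTT", 3000, 20), ("CLARK", 2450, 10), ("KING", 5000, 10)],
)

# SAL > ALL (sub-query) rewritten as SAL > (SELECT MAX(...)),
# SAL > ANY (sub-query) rewritten as SAL > (SELECT MIN(...));
# valid only when the sub-query returns at least one row and no NULLs.
above_all = conn.execute(
    "SELECT ENAME FROM EMP WHERE SAL > (SELECT MAX(SAL) FROM EMP WHERE DEPTNO = 30) ORDER BY ENAME"
).fetchall()
above_any = conn.execute(
    "SELECT ENAME FROM EMP WHERE SAL > (SELECT MIN(SAL) FROM EMP WHERE DEPTNO = 30) ORDER BY ENAME"
).fetchall()

# Cross-check against the literal definitions of ALL and ANY in Python.
emp = conn.execute("SELECT ENAME, SAL, DEPTNO FROM EMP").fetchall()
dept30 = [sal for _, sal, dept in emp if dept == 30]
expected_all = sorted((name,) for name, sal, _ in emp if all(sal > s for s in dept30))
expected_any = sorted((name,) for name, sal, _ in emp if any(sal > s for s in dept30))

print(above_all)
print(above_any)
print(above_all == expected_all, above_any == expected_any)
```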
Correlated sub-queries
A correlated sub-query is a nested sub-query that uses a value from a column in the outer query, and is therefore
executed, at least conceptually, once for each ‘candidate row’ considered by the main query. This causes the
correlated sub-query to be processed differently from an ordinary nested sub-query. With an ordinary nested
sub-query, the inner select runs first and executes once, returning values to be used by the main query. A
correlated sub-query, on the other hand, executes once for each candidate row considered by the outer query: the
inner query is driven by the outer query. The steps the DBMS takes when running a correlated sub-query are:
1. The outer query fetches a candidate row.
2. The inner query is executed, using the value from the candidate row fetched by the outer query.
3. Whether the candidate row is retained depends on the values returned by the execution of the inner query.
4. Steps 1 to 3 are repeated until no candidate row remains in the outer query.
Example
We can use a correlated sub-query to find employees who earn a salary greater than the average salary for their
department:
This is a correlated sub-query since we have used a column from the outer select, namely E.DEPTNO, in the WHERE
clause of the inner select.
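Here is a runnable sketch of the correlated sub-query. It assumes SQLite through Python's sqlite3 module and invented salaries, and uses an extra alias X for the inner EMP for clarity. The loop afterwards spells out, in Python, the four conceptual steps listed above, so the two results can be compared. A real DBMS is free to evaluate the query more cleverly, as long as the result is the same.

```python
import sqlite3

# Hypothetical EMP rows (ENAME, SAL, DEPTNO).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT PRIMARY KEY, SAL INTEGER, DEPTNO INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("CLARK", 2450, 10), ("KING", 5000, 10), ("MILLER", 1300, 10),
     ("JONES", 2975, 20), ("SMITH", 800, 20), ("SCOTT", 3000, 20),
     ("ALLEN", 1600, 30), ("WARD", 1250, 30)],
)

# The inner query refers to E.DEPTNO from the outer query, which makes it correlated.
sql_result = conn.execute(
    """
    SELECT E.ENAME, E.SAL, E.DEPTNO
    FROM EMP E
    WHERE E.SAL > (SELECT AVG(X.SAL) FROM EMP X WHERE X.DEPTNO = E.DEPTNO)
    ORDER BY E.ENAME
    """
).fetchall()

# The same logic as explicit steps: fetch a candidate row, run the inner
# query with that row's DEPTNO, then keep or discard the candidate row.
step_result = []
candidates = conn.execute("SELECT ENAME, SAL, DEPTNO FROM EMP ORDER BY ENAME").fetchall()
for ename, sal, deptno in candidates:
    (dept_avg,) = conn.execute(
        "SELECT AVG(SAL) FROM EMP WHERE DEPTNO = ?", (deptno,)
    ).fetchone()
    if sal > dept_avg:
        step_result.append((ename, sal, deptno))

print(sql_result)
print(sql_result == step_result)
```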
Additional features
Note that newer and more advanced features that can be used with SELECT, such as CASE and IF operators, are not
examinable in this module. However, having mastered this chapter and knowing that they exist, you are well placed
to use them should you later encounter a situation in practice where they are the only possible solution.
Review Questions
2. Give a SQL statement to list any departments that do not contain any employees.
3. Give a SQL statement to find which workers earn more than their managers (hint: remember that the MGR
attribute stores the EMPNO of an employee’s manager).
4. Give a SQL statement to list the total monthly pay, and the total number of employees, for each department.
5. Give a SQL statement to find all jobs with more than two employees.
7. Give a SQL statement to find whether anyone in department 30 has the same job as JONES.
Chapter 5. Advanced SQL
Contents
Chapter 5. Advanced SQL
Objectives
Introduction
Creating tables in SQL
Data types
Copying data by combining CREATE TABLE and SELECT
Copying table structures without data
The ALTER TABLE statement
Using ALTER TABLE to add columns
Modifying columns with ALTER TABLE
Removing tables using the DROP TABLE statement
Using DROP TABLE when creating tables
Adding new rows to table with INSERT
Changing column values with UPDATE
Removing rows with DELETE
Creating views in SQL
Views and updates
Renaming tables
Creating and deleting a database
Using SQL scripts
Objectives
At the end of this chapter you should be able to:
• Create, alter and remove views based on SQL tables, and describe some of the strengths and limitations of
views.
Introduction
This chapter introduces the means by which rows and entire tables are created, changed and removed in SQL. The
important concept of views is also covered.
• Data tables
• Indexes (data structures which speed up access to data)
• A column name
• An optional indicator of whether or not the column can contain null values
Data types
We shall focus on four specific data types found in almost any database environment.
VARCHAR (<length>)
2. NUMBER: This offers the greatest flexibility for storing numeric data. It accepts positive and negative
integers and real numbers, and has from 1 to 38 digits of precision. The syntax is:
NUMBER (<precision>, <scale>)
where precision is the maximum number of digits to be stored and scale indicates the number of digits to the right
of the decimal point. If scale is omitted, then integer (whole) numbers are stored. Note: some DBMSs, including
MySQL, do not provide a NUMBER type and expect you to choose a more specific numeric type. For example, to hold
integers you would use the INT data type, and to hold numbers with a fractional part you would use DECIMAL
(<precision>, <scale>) for exact values or DOUBLE for approximate floating-point values.
3. DATE: The format in which dates are represented within attributes of type date is: dd-mon-yyyy; for
example, 10-jan-2000. If an attribute is of type date, simply specify the word DATE after the name of the
attribute in the CREATE TABLE statement.
The final part of a column specification allows us to specify whether or not the column can contain null values,
i.e. whether or not it is a mandatory column. The syntax is simply to specify NOT NULL after the data type
specification if nulls are not allowed.
• The first item is the keyword “CONSTRAINT”, followed by an optional constraint name. Although specifying
a constraint name is optional, it is recommended that you always include it. If you later wish to refer to
the foreign key constraint - e.g. because you wish to remove it - then having a name in the CREATE TABLE
statement will make this much easier.
• The words FOREIGN KEY are followed by a list of the columns to be included in the foreign key, contained
in parentheses and separated by commas.
• REFERENCES is a mandatory keyword, indicating that the foreign key will refer to a primary key in another
table.
• The primary key specification starts with the name of the table containing the referenced primary key, and
then lists the columns comprising the primary key, contained in parentheses and separated by commas as
usual.
The full stops (……………) shown in the syntax above indicate that there may be more than one foreign key
specification.
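Here is a sketch of CREATE TABLE with NOT NULL columns and named PRIMARY KEY and FOREIGN KEY constraints. It assumes SQLite through Python's sqlite3 module. SQLite accepts Oracle-style type names such as NUMBER(7,2) and VARCHAR(10), but does not enforce their lengths or precision. It also enforces foreign keys only after PRAGMA foreign_keys = ON. The table and constraint names (DEPT_PK, EMP_PK, EMP_DEPT_FK) and the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Oracle-style type names are accepted, but their lengths/precision are not enforced.
conn.executescript(
    """
    CREATE TABLE DEPT (
        DEPTNO NUMBER(2)     NOT NULL,
        DNAME  VARCHAR(14),
        LOC    VARCHAR(13),
        CONSTRAINT DEPT_PK PRIMARY KEY (DEPTNO)
    );
    CREATE TABLE EMP (
        EMPNO    NUMBER(4)    NOT NULL,
        ENAME    VARCHAR(10)  NOT NULL,
        HIREDATE DATE,
        SAL      NUMBER(7,2),
        DEPTNO   NUMBER(2),
        CONSTRAINT EMP_PK PRIMARY KEY (EMPNO),
        CONSTRAINT EMP_DEPT_FK FOREIGN KEY (DEPTNO) REFERENCES DEPT (DEPTNO)
    );
    INSERT INTO DEPT VALUES (10, 'ACCOUNTING', 'NEW YORK');
    INSERT INTO EMP VALUES (7782, 'CLARK', '1981-06-09', 2450, 10);
    """
)

def try_insert(row):
    """Attempt an EMP insert; return the constraint error text, or '' on success."""
    try:
        conn.execute("INSERT INTO EMP VALUES (?, ?, ?, ?, ?)", row)
        return ""
    except sqlite3.IntegrityError as exc:
        return str(exc)

fk_error = try_insert((7900, "JAMES", "1981-12-03", 950, 99))   # there is no department 99
null_error = try_insert((7901, None, "1981-12-03", 950, 10))    # ENAME is NOT NULL
ok = try_insert((7902, "FORD", "1981-12-03", 3000, 10))         # satisfies every constraint
print(fk_error)
print(null_error)
```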
Example
Suppose we have a table which we use to keep track of recording artists. The ARTIST table could be created
with the following statement:
Copying data by combining CREATE TABLE and SELECT
An extremely useful variant of the CREATE TABLE statement exists for copying data. This uses a SELECT statement
to provide the column specifications for the table to be created and, in addition, the data that is retrieved
by the SELECT statement is copied into the new table structure. The syntax for this form of the statement is
as follows:
This form of the CREATE TABLE statement can be used to, for example:
• Copy subsets of tables using the select-list and WHERE clause to filter the data.
• Create tables which combine data from more than one table (using JOINs).
To examine the contents of the new table (see figure 1):
Copying table structures without data
Sometimes you may wish to copy the structure of a table without moving any of the data. For example:
DESCRIBE EMPSTRUCT
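The uses of CREATE TABLE ... AS SELECT described above can be sketched in SQLite through Python's sqlite3 module, with hypothetical EMP and DEPT rows. Copying only the structure relies on a common idiom: a WHERE condition that is never true (1 = 0). Whether the chapter's own EMPSTRUCT example used the same idiom is an assumption. SQLite has no DESCRIBE command, so PRAGMA table_info stands in for it. In SQLite, the copied tables receive column names and types but no PRIMARY KEY or NOT NULL constraints.

```python
import sqlite3

# Hypothetical EMP and DEPT tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT);
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL INTEGER, DEPTNO INTEGER);
    INSERT INTO DEPT VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO EMP VALUES (7566, 'JONES', 2975, 20), (7782, 'CLARK', 2450, 10),
                           (7788, 'SCOTT', 3000, 20);

    -- Full copy of a table, structure and data.
    CREATE TABLE EMPCOPY AS SELECT * FROM EMP;

    -- Subset of rows and columns, chosen by the select-list and WHERE clause.
    CREATE TABLE DEPT20 AS SELECT EMPNO, ENAME FROM EMP WHERE DEPTNO = 20;

    -- Data combined from two tables with a join.
    CREATE TABLE EMPDEPT AS
        SELECT ENAME, DNAME FROM EMP JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO;

    -- Structure only: the condition is never true, so no rows are copied.
    CREATE TABLE EMPSTRUCT AS SELECT * FROM EMP WHERE 1 = 0;
    """
)

def count(table):
    # Table names here are fixed strings from this script, never user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# PRAGMA table_info plays the role of DESCRIBE; field 1 of each row is the column name.
struct_columns = [row[1] for row in conn.execute("PRAGMA table_info(EMPSTRUCT)")]

print(count("EMPCOPY"), count("DEPT20"), count("EMPDEPT"), count("EMPSTRUCT"))
print(struct_columns)
```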
The ALTER TABLE statement
• Change an existing column from mandatory to optional (i.e. specify that it may contain nulls).
Using ALTER TABLE to add columns
Columns can be added to existing tables with this form of the ALTER TABLE statement. The syntax is:
Figure 2: The EMPSTRUCT relation
Modifying columns with ALTER TABLE
• Transform a column from mandatory to optional (i.e. specify that it can contain nulls).
There are a number of restrictions in the use of ALTER TABLE. For example, you cannot:
Removing tables using the DROP TABLE statement
The syntax is:
DROP TABLE <table-name>;
It is deceptively easy to issue this command and, unlike most systems one encounters today, there is no prompt
asking whether you wish to proceed. Dropping a table removes all the data and constraints on the table and,
finally, the table structure itself. To remove our copy of the EMP table, called EMPCOPY:
DROP TABLE EMPCOPY;
Using DROP TABLE when creating tables
Sometimes, when we wish to recreate an existing table, it will be necessary to drop the table before issuing
the new CREATE TABLE statement. Clearly this should only be done if the data can be lost, or can be safely
copied elsewhere beforehand, perhaps using a CREATE TABLE with a SELECT clause.
Adding new rows to table with INSERT
To insert a new row into the table DEPTCOPY (a copy of the DEPT table):
Example:
Suppose we have created a table MANAGER, which is currently empty. To insert the numbers, names and salaries of
all the employees who are managers into the table, we would code:
INSERT INTO MANAGER SELECT EMPNO, ENAME, SAL FROM EMP WHERE JOB = 'MANAGER';
To verify that the employees in the table are managers, we can select the data and compare it with the jobs of
those employees in the original EMP table (the result is shown in figure 3):
Figure 3: Result of the INSERT that selected data from elsewhere in the database
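This sketch shows both forms of INSERT. It assumes SQLite through Python's sqlite3 module; the DEPTCOPY values (50, 'MARKETING', 'LONDON') and the EMP rows are hypothetical. The final query repeats the verification step described above by joining MANAGER back to EMP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPTCOPY (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT);
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT, SAL INTEGER);
    CREATE TABLE MANAGER (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL INTEGER);
    INSERT INTO EMP VALUES (7566, 'JONES', 'MANAGER', 2975), (7698, 'BLAKE', 'MANAGER', 2850),
                           (7788, 'SCOTT', 'ANALYST', 3000);
    """
)

# Single-row INSERT; naming the columns avoids depending on their order in the table.
conn.execute("INSERT INTO DEPTCOPY (DEPTNO, DNAME, LOC) VALUES (50, 'MARKETING', 'LONDON')")

# Multi-row INSERT ... SELECT: the select-list must line up with MANAGER's columns.
conn.execute("INSERT INTO MANAGER SELECT EMPNO, ENAME, SAL FROM EMP WHERE JOB = 'MANAGER'")
conn.commit()

# Verify by joining back to EMP to see each copied employee's job.
check = conn.execute(
    """
    SELECT MANAGER.ENAME, EMP.JOB
    FROM MANAGER JOIN EMP ON MANAGER.EMPNO = EMP.EMPNO
    ORDER BY MANAGER.ENAME
    """
).fetchall()
print(check)
```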
Changing column values with UPDATE
<column-name> = <value>
Following the equals sign “=” there are two possibilities for the value to be assigned. An expression can be
used, which may include mathematical operations on table columns as well as constant values. Alternatively,
a SELECT statement can be used to return the value. For example, to give all the analysts in the EMPCOPY table
a raise of 10%:
UPDATE EMPCOPY SET SAL = SAL * 1.1 WHERE JOB = 'ANALYST';
To make KING the manager of every other employee, using a sub-query to find KING's employee number:
UPDATE EMPCOPY SET MGR = (SELECT EMPNO FROM EMP WHERE ENAME = 'KING') WHERE ENAME != 'KING';
We included the final WHERE clause to avoid updating KING’s own MGR field. Note that this is a poor example, as
it gives incorrect results if other employees are also named KING! To verify that the updates have taken place
correctly, we can query EMPCOPY again; the result is shown in figure 4.
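This sketch illustrates both kinds of assigned value. It assumes SQLite through Python's sqlite3 module and a hypothetical four-row EMPCOPY table. ROUND keeps the 10% raise to two decimal places. The second UPDATE identifies KING by his primary key rather than his name, which avoids the duplicate-name weakness mentioned above. That KING's EMPNO is 7839 is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPCOPY (EMPNO NUMBER(4) PRIMARY KEY, ENAME VARCHAR(10),
                          JOB VARCHAR(9), MGR NUMBER(4), SAL NUMBER(7,2));
    INSERT INTO EMPCOPY VALUES (7839, 'KING', 'PRESIDENT', NULL, 5000);
    INSERT INTO EMPCOPY VALUES (7566, 'JONES', 'MANAGER', 7839, 2975);
    INSERT INTO EMPCOPY VALUES (7788, 'SCOTT', 'ANALYST', 7566, 3000);
    INSERT INTO EMPCOPY VALUES (7902, 'FORD', 'ANALYST', 7566, 3000);
    """
)

# Value from an expression: a 10% raise for analysts, rounded to 2 decimal places.
conn.execute("UPDATE EMPCOPY SET SAL = ROUND(SAL * 1.1, 2) WHERE JOB = 'ANALYST'")

# KING identified by primary key (7839 is assumed), so a second employee
# called KING could neither break the update nor be wrongly excluded.
conn.execute("UPDATE EMPCOPY SET MGR = 7839 WHERE EMPNO != 7839")
conn.commit()

rows = conn.execute("SELECT EMPNO, MGR, SAL FROM EMPCOPY ORDER BY EMPNO").fetchall()
for row in rows:
    print(row)
```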
Removing rows with DELETE
If the WHERE clause is omitted from a DELETE statement, all of the rows will be removed from the table. However,
unlike the DROP TABLE statement, a DELETE statement leaves the table structure in place.
Example 1: To remove all employees named FORD from the EMPCOPY table:
DELETE FROM EMPCOPY WHERE ENAME = 'FORD';
Example 2: To remove any employees based in the SALES department:
DELETE FROM EMPCOPY WHERE DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE DNAME = 'SALES');
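This sketch runs the DELETE examples. It assumes SQLite through Python's sqlite3 module and hypothetical rows. The last statement omits the WHERE clause: every row disappears, yet the table can still be queried, because DELETE leaves its structure in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT);
    CREATE TABLE EMPCOPY (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, DEPTNO INTEGER);
    INSERT INTO DEPT VALUES (20, 'RESEARCH'), (30, 'SALES');
    INSERT INTO EMPCOPY VALUES (7902, 'FORD', 20), (7566, 'JONES', 20),
                               (7499, 'ALLEN', 30), (7521, 'WARD', 30);
    """
)

# Example 1: a simple WHERE condition.
conn.execute("DELETE FROM EMPCOPY WHERE ENAME = 'FORD'")

# Example 2: the rows to delete are identified through a sub-query on DEPT.
conn.execute(
    "DELETE FROM EMPCOPY WHERE DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE DNAME = 'SALES')"
)
remaining = conn.execute("SELECT ENAME FROM EMPCOPY").fetchall()

# No WHERE clause: every row goes, but the (now empty) table still exists.
conn.execute("DELETE FROM EMPCOPY")
rows_left = conn.execute("SELECT COUNT(*) FROM EMPCOPY").fetchone()[0]
print(remaining, rows_left)
```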
Creating views in SQL
Example: Create a view showing the names and hire dates of employees in the EMP table:
CREATE VIEW EMPHIRE AS SELECT ENAME,HIREDATE FROM EMP WHERE ENAME IS NOT NULL;
To examine the structure of the view EMPHIRE, we can use the DESCRIBE command, just as for table objects,
obtaining the result shown in figure 5:
DESCRIBE EMPHIRE
To see the data in the view, we can issue a SELECT statement just as if EMPHIRE were a table. For example,
figure 6 shows the result of running the query below:
Views and updates
The data in base tables cannot be updated via a view in the following situations:
• When the view is based on one table, but does not contain the primary key of the table.
• When a view is based on a GROUP BY clause or aggregate function, because there is no underlying row in
which to place the update.
• Where rows might migrate in or out of a view as a result of the update being made, e.g. one cannot update
ENAME values in EMPHIRE.
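This sketch builds the EMPHIRE view. It assumes SQLite through Python's sqlite3 module and hypothetical rows, with dates stored as ISO-format text. It shows that a view holds no data of its own. SQLite is stricter than the rules listed above: it treats every view as read-only unless INSTEAD OF triggers are defined, so even an update that another DBMS might allow is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, HIREDATE TEXT, SAL INTEGER);
    INSERT INTO EMP VALUES (7566, 'JONES', '1981-04-02', 2975);
    INSERT INTO EMP VALUES (7782, 'CLARK', '1981-06-09', 2450);
    CREATE VIEW EMPHIRE AS SELECT ENAME, HIREDATE FROM EMP WHERE ENAME IS NOT NULL;
    """
)

before = conn.execute("SELECT * FROM EMPHIRE ORDER BY ENAME").fetchall()

# A view stores no data of its own: a new base-table row shows up immediately.
conn.execute("INSERT INTO EMP VALUES (7900, 'JAMES', '1981-12-03', 950)")
after = conn.execute("SELECT * FROM EMPHIRE ORDER BY ENAME").fetchall()

# SQLite rejects updates through a view that has no INSTEAD OF trigger.
try:
    conn.execute("UPDATE EMPHIRE SET ENAME = 'SMITH' WHERE ENAME = 'JONES'")
    update_error = ""
except sqlite3.OperationalError as exc:
    update_error = str(exc)

print(before)
print(after)
print(update_error)
```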
Renaming tables
The syntax of this command is:
RENAME <old-table-name> TO <new-table-name>;
Figure 6: The relation created whenever the EMPHIRE view is accessed
The RENAME command is useful when carrying out DDL operations. For example, suppose we wish to remove a column
from a table while ensuring that every program using its other columns can continue to use the new copy without
any code changes:
1. Use the CREATE TABLE statement to make a copy of the old table, excluding the column which is no longer
required.
2. Remove the old table with DROP TABLE.
3. Rename the new copy of the table to the old (original) name.
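The three-step procedure can be sketched in SQLite through Python's sqlite3 module. The sketch uses a hypothetical EMP table whose COMM column is to be removed. SQLite has no RENAME statement; its equivalent is ALTER TABLE ... RENAME TO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, COMM INTEGER);
    INSERT INTO EMP VALUES (7566, 'JONES', NULL), (7499, 'ALLEN', 300);
    """
)

# Step 1: copy the table, leaving out the unwanted column (COMM here).
conn.execute("CREATE TABLE EMPNEW AS SELECT EMPNO, ENAME FROM EMP")
# Step 2: drop the original table, freeing its name.
conn.execute("DROP TABLE EMP")
# Step 3: give the copy the original name. SQLite spells this
# ALTER TABLE ... RENAME TO, where Oracle uses RENAME EMPNEW TO EMP.
conn.execute("ALTER TABLE EMPNEW RENAME TO EMP")

columns = [row[1] for row in conn.execute("PRAGMA table_info(EMP)")]
rows = conn.execute("SELECT EMPNO, ENAME FROM EMP ORDER BY EMPNO").fetchall()
print(columns)
print(rows)
```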
Creating and deleting a database
For example, to create a database called STUDENTS that holds student information, we write the create command
as follows:
Review Questions
1. Create the following tables, choosing appropriate data types for the attributes of each table. In your
CREATE TABLE statements, create primary keys for both tables, and an appropriate foreign key for the
student table.
2. Which of these two relations would you create first (TUTOR or STUDENT) and why?
3. Ensure that the STUDENT_NO field is sufficiently large to accommodate over 10,000 students. If it is not,
change it so that it can deal with this situation.
6. Insert a row in the TUTOR relation, using any data values of the appropriate type.
7. Insert a row in STUDENT for each row in TUTOR, using their TUTOR_ID as their STUDENT_NO and leaving
DATE_JOINED, COURSE and TUTOR_ID as unknown/inapplicable for the moment.
8. Use CREATE TABLE with a sub-query to make copies of your TUTOR and STUDENT relations.
9. Change the TUTOR_ID of the tutor named “Doe” to any new value of the correct type.
11. Create a view of the STUDENT relation that contains details of all students taking Computing.
12. Create a view containing the names of all tutors and the names of their tutees.