SYSTEMS BUILDING WITH ORACLE
The Theory and Practice of Database Design
William Smith
© William Smith 2004
All rights reserved. No reproduction, copy or transmission of this
publication may be made without written permission.
No paragraph of this publication may be reproduced, copied or transmitted
save with written permission or in accordance with the provisions of the
Copyright, Designs and Patents Act 1988, or under the terms of any licence
permitting limited copying issued by the Copyright Licensing Agency, 90
Tottenham Court Road, London W1T 4LP.
Any person who does any unauthorised act in relation to this publication
may be liable to criminal prosecution and civil claims for damages.
The author has asserted his right to be identified as the author of this
work in accordance with the Copyright, Designs and Patents Act 1988.
First published 2004 by
PALGRAVE MACMILLAN
Houndmills, Basingstoke, Hampshire RG21 6XS and
175 Fifth Avenue, New York, N. Y. 10010
Companies and representatives throughout the world
PALGRAVE MACMILLAN is the global academic imprint of the Palgrave
Macmillan division of St. Martin’s Press LLC and of Palgrave Macmillan Ltd.
Macmillan® is a registered trademark in the United States, United Kingdom
and other countries. Palgrave is a registered trademark in the European
Union and other countries.
ISBN 978-1-4039-0169-9
ISBN 978-0-230-00094-0 (eBook)
DOI 10.1007/978-0-230-00094-0
CONTENTS
PREFACE xi
2 LIGA UTOPIA 31
2.1 The Utopian Football League 31
2.2 UML Specification 33
2.3 UFL UseCases 35
2.4 Class diagrams 38
3 DATA MODELLING 49
3.1 A Clear Picture of the Data 49
3.2 Relationships 53
3.3 Keys 56
3.4 Entity–Relationship Diagrams 62
3.5 The UFL Entity–Relationship Diagram 68
3.6 ERD Issues 69
3.7 Key Terms 71
3.8 Questions 71
3.9 Exercises 72
4 NORMALIZATION 73
4.1 Set Theory and the Entity–Relationship Model 73
4.2 First Normal Form 74
4.3 A Is for Anomaly, Axiom, and Armstrong 76
4.4 Second Normal Form 80
4.5 Third Normal Form 83
4.6 Boyce–Codd Normal Form 84
4.7 Key Terms 86
4.8 Exercises 87
6 SQL 99
6.1 Taking Stock 99
6.2 Data Definition Language 100
6.3 Create 101
6.4 Alter 106
6.5 Grant 107
6.6 Insert 109
6.7 Select 111
6.8 Formatting Output 114
6.9 Create with Select 118
BIBLIOGRAPHY 537
APPENDIX 540
A.1 Installing Oracle 540
A.2 Using Oracle Enterprise Manager 541
A.3 Using SQL*Plus 545
A.4 Data Types, Classes and Exceptions 547
A.5 Performance Experiments 551
A.6 Entities and Relationships 553
A.7 PL/SQL Collection Methods 555
INDEX 556
PREFACE
I’m not interested in developing a powerful brain. All I’m after is just a mediocre brain,
something like the President of the American Telephone and Telegraph Company.
Alan Turing
This book comes from teaching introductory, intermediate and advanced level
courses at undergraduate and postgraduate level. Much of the structure and
content reflects support material for courses that establish the theoretical aspects
of database systems, underpin practical workshops or initiate advanced, research-
oriented studies.
I have included coverage of all of the central and optional database topics likely
to arise during the whole of a student’s undergraduate career. The material is also
suitable to support students following an MSc conversion programme.
The book is divided into four parts. Part 1, Getting Started, offers introductory
material on the development of data storage techniques, data models and data-
bases. It also sets out the background for the Liga Utopia case study that provides
most of the practical examples in the succeeding text.
The second part, Database Systems, contains the material for a course in data-
base theory, typically taken as part of the second year of study for an undergrad-
uate programme or as a major element of a conversion MSc. The volume of material
in this section is aimed at a full unit within such a modular programme. Sample
teaching plans, lecture slides, tutorial questions and coursework assignments are
available on the support web site.
Implementing the System, the third part, deals with practical issues. This offers a
guide to the problems and solutions associated with the implementation of the
Liga Utopia case study. This can be used to support group workshop tasks. It also
offers insights for individual project work.
The final part, Advanced Database Issues, reviews several topics of current
research interest and delves deeper into some of the material treated in Part 2.
William Smith
PART 1
GETTING STARTED
CHAPTER 1
A BRIEF HISTORY OF DATA
Technology is the knack of so arranging the world that we do not experience it.
Max Frisch
When William the Conqueror invaded England in 1066 and defeated Harold’s
army near Hastings, his immediate task was to pacify the country. After a phase of
castle building, land redistribution and military suppression of the Anglo Saxons,
William ruled in relative peace. He knew that in order to rule a country he needed a
stable government based on a sound income. He needed to raise taxes. In order to
do this he needed data about the land holdings, the agricultural output and the
military capacity of the population.
Therefore, in 1085 he ordered an inventory; the Domesday Book was completed the following year.
In a little over 400 handwritten pages, all of the wealth of England was described.
Towards the end of the 18th century, Britain’s youngest ever Prime Minister,
William Pitt, was facing two similar problems. Firstly, he had an expensive war to
fight against Napoleonic Europe. Secondly, his principal revenue, based on the
agricultural wealth of England, held by relatively few landowners, was giving way
to industrial wealth created in the burgeoning cities, centred on the new
manufactories. In 1799, he changed the fiscal face of England by introducing
Income Tax to tap into this wealth and pay for his war. In so doing, he multiplied
the number of taxpayers by perhaps a factor of 1000 and created a new role for the
Treasury in collecting and collating tax data on an unprecedented scale.
By the middle of the 20th century, not only governments but also large national
corporations were dealing with huge numbers of customers, offering a wide range
of products distributed through complex wholesale and retail networks. All of this
activity was controlled using paper records, card indexes, thousands of clerks and
mechanical office equipment that still relied on the human brain for most of its
logic.
One of the first commercial computer systems to be introduced in Britain was the
Lyons Electronic Office – LEO. In 1951, the LEO computer was operational and ran
the world’s first routine office computer jobs. It used computing equipment based
on programmable computers with tape storage to calculate, among other things,
the payroll for J Lyons & Co.
A technological revolution was on the horizon. The new science of electronics had
created a machine capable of storing complex instructions and applying them to
data at a seemingly incredible speed. The first use of this type of computing
machine had been in cracking German wartime codes. The instructions were
defined by physically varying the connected parts of the machine using plug
cables.
After the war, the machines had been improved so that the set of instructions
could be varied simply by loading a ‘program’ encoded on a roll of punched paper
tape. It was not long before it was realized that much of the logic of commercial
and administrative processes could be captured as a program and applied to records
stored on magnetic tape.
It is part of the physical nature of magnetic tape that the records are stored one
after the other along the tape. Just as with an audio cassette, if you want to play
track 6 you must fast-forward the tape to the position you need. If you then want to
play track 3, you must rewind and so on.
RANDOM ORGANIZATION: Data items recorded one after the other with no implicit ordering.
The data on these early reels of tape were said to be organized randomly if the
data items had no particular order. If the order of the data items was sorted in some
way, say by surname, they were said to be organized sequentially. The access
method, however, is serial whatever the organization of the tape.
The introduction of disk storage made a great difference to the way that data
could be processed. The data is stored on the magnetically coated surface of a disk
that rotates at high speed.
A read/write head ‘flies’ just above the disk and reads or writes the data using a
powerful focused magnetic field to manipulate the signals recorded on the disk.
The data is stored in concentric tracks on the disk (Figure 1.1). By moving the
recording head in or out over a radius of the disk, a different track passes under it.
Moving between data items is much faster, since the head can quickly move to fly
above the appropriate track and wait not more than one revolution of the disk for the
data to pass under it. Multiple disks on a single spindle with a head for each surface
increased the storage capacity of the disk pack. It was now possible to position the
head array over track 20, say, on platter 1 and without physically moving the head to
read track 20 on platter 2 simply by electronically switching the active head.
Storing related records on the same track but on different platters of the disk pack
means that a virtual cylinder of data is created (Figure 1.1).
This freedom of movement for the read/write head clearly has implications for
processing speed. On a tape drive, the head does not move; the tape merely passes
underneath it. One item of data follows another. The order of the items that are
stored clearly fixes the order of reading.
DIRECT ACCESS: A method of data access where the next position is not predicted by the current position. Compare with serial access.
A disk does not have its next position prescribed for it by the storage method. It
can read track 20, followed by track 6 followed by track 56. Because we cannot
predict its next position from its current position, a disk is described as a direct
access device. The operating system that is controlling the movements of the disk
is free to move it to wherever the next data item is to be found.
Additionally, more than one program at a time can make read or write requests at
different positions according to their own different requirements. A direct access
disk supports multi-tasking in a way that a serial device could never do.
It became common to find banks, for example, with customer records numbered
in the millions. Clearly, the size of the file that contained such data was huge and it
might span several disk devices.
The problem of efficient access to such files was shared by most applications, and
COBOL was augmented with a number of standard file handling procedures. These
might allow a program to step through a file, stopping at each customer record to
perform a task and then moving on until encountering the end of the file. This
would suit an application executed at the end of each month to print a statement
for each of a bank’s customers. Figure 1.2 shows how files can be organized either
randomly or sequentially.
INDEX FILE: A secondary file associated with a main data file and sorted on one or more unique attribute values, with a pointer to the occurrence of such an attribute value in the records of the main file.
An index works just like the catalogue in a library. A separate card file, sorted on
some unique identifier, is kept. In a library catalogue, the reader can search on
author’s surname. Since the record cards are kept in alphabetical order, it is rela-
tively easy to find the author’s card.
Searching through cards organized alphabetically, looking for the correct
author, is then very simple. By looking about halfway through the file, you should
find the ‘M’s. Knowing that the author is in the second half of the alphabet, by
looking halfway between the ‘M’s and ‘Z’s you will probably find the ‘S’s and so on.
When the correct card is found, it contains a reference that points to the shelf
where the author’s books are kept. The reader goes to the shelf and retrieves the
book. In this case, the unique identifier is the author’s surname. Figure 1.3 shows
an example of a sorted index of a small number of records, with each entry
pointing to the record number of the main data item. Adding new records in the
main file means reorganizing the index file.
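The halving search described above is a binary search. A minimal Java sketch of such an index lookup follows; the surnames, record numbers and class name are invented for illustration, and a real index file would of course live on disk rather than in two arrays.

// A minimal sketch of a library-catalogue style index lookup.
// The surnames and record numbers are invented for illustration.
public class IndexLookup {

    static String[] surnames = { "Grant", "Hepburn", "Smith", "Stewart" };
    static int[] recordNumbers = { 42, 7, 13, 58 };   // positions in the main file

    // Classic binary search over the sorted surname index:
    // halve the search interval until the key is found or the interval is empty.
    static int findRecord(String surname) {
        int low = 0, high = surnames.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            int cmp = surname.compareTo(surnames[mid]);
            if (cmp == 0) return recordNumbers[mid];  // found: return the pointer
            if (cmp < 0)  high = mid - 1;             // key is in the first half
            else          low = mid + 1;              // key is in the second half
        }
        return -1;                                    // not present in the index
    }

    public static void main(String[] args) {
        System.out.println(findRecord("Smith"));      // prints 13
    }
}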
There were several disadvantages associated with using a language like COBOL to
manage large files in this way. Firstly, each program had to have a section in it that
described what kind of data was being used. Since there might be many programs
to perform all the processing associated with the business’s needs, if the data struc-
tures changed, all the programs would need to be updated with the amended
structure.
Secondly, there would have to be separate programs to create the indexes and to
update them when data was added or deleted and, of course, to keep the main data
file in a consistent state. These are known as maintenance tasks.
Over time, the program code for these common maintenance tasks was incorpo-
rated into code libraries that could be linked to application programs at compile
time. This way of working was more efficient, but it still meant that the size of the
executable program was very large and made heavy demands on the memory of the
computer.
As operating systems design advanced, it became possible to enable more than
one user to run a copy of each program simultaneously (Figure 1.4). This multi-user
facility had two implications: there would now be multiple copies of already large
executables resident in the machine’s memory and the data files had to be
protected from two or more users reading and updating the same data item at the
same time. Adding memory to the machine could solve the first problem. The
second problem needed substantial additions to the standard code libraries to
allow synchronized access to data items.
The use of simple monolithic programs to undertake data processing tasks on
ever-larger data stores was becoming increasingly unwieldy and needed
rationalizing.
One idea was to separate the business logic from the storage and maintenance of
the data. Creating a separate executable program that was responsible for storing a
single image of the data structure and managing the standard maintenance func-
tions would make it possible to strip the entire encumbrance from the user
programs.
One copy of the manager program would be resident in the computer and several
copies of the smaller user programs would communicate with it, requesting stan-
dard functions like searching for a particular data item, updating it free from inter-
ference from other users and storing the update reliably. The idea of the database
had arrived.
Figure: several COBOL application programs communicating with a single database manager.
Separation of the business logic in the user program from the common mainte-
nance tasks in the database manager could be said to preview the later develop-
ment of the client–server paradigm. The chief differences were the method of
communication and the fact that ‘client’ and ‘server’ occupied the same memory
space in the same computer.
In order to achieve these aims, the Database Management System (DBMS) had
objectives as set out in Table 1.1. These are general specifications for a complex
software engineering project.
The design brief says nothing about how the data is visualized and defined; it
does not prescribe the storage method and it does not lay down a particular
internal language to perform queries on the data.
The separation of logic from data enables:
• efficient and reliable data management
• reliable and safe schema management
All database management software follows these guidelines and in addition will
have further, more particular, definitions that allow a broad characterization of
DBMSs into five groups that were developed by the major computer corporations
over the 30 year period from 1965 to 1995.
IBM started development work on the first database, IMS, in the mid-1960s. Early
versions of the software were released in 1968. IMS was based on a view of the data
that became known as hierarchical.
Each type of real-world object was modelled as a record type. Records describing
the same type of object had the same structure and consisted of fields describing
the common characteristics of the object.
Thus, there might be a record type Customer that would have fields for surname,
first name, date of birth etc. If this were a banking application then there might also
be a separate record type called Account that described the particular accounts a
customer might hold. Branches in the bank would be represented as another record
type and so on.
Figure 1.6 illustrates the logic of this view. The relationship is in one direction
only, parent to child, and in the physical database is captured as a pointer
connecting, for example, each customer to the account held. This pointer is actu-
ally the disk address of the owned record. If, as in the case of the branch/customer
relationship, the branch were responsible for many customers, then the pointer
would indicate the first customer and a chain of pointers would connect all the
customers from the same branch.
Figure 1.6 Hierarchical database: example schema. In this example, there are five record types: Bank (the root), Branches, Customers, Accounts and Transactions. The pointers indicate ownership. Access to the database is through the root of the hierarchy, the Bank.
Each record has to have additional spaces to store child pointer(s) and chain
pointer(s). The pointers would be disk addresses of the storage positions of the
records. The central point about this data model is that the relationship between
record types was expressed through ownership. Thus, branches might own
customers who would in turn own accounts. This view is sometimes also called a
parent/child relationship.
Searching in such a database implies following the links of the pointers. To find a
particular branch we start at the bank/root and find the first branch owned by the
bank.
The chain of branches is then traversed until the particular branch that meets the
search criteria is found. Descending to the first customer, that chain can be
traversed to find all the customers in a branch. This is known as a navigational
query, where record identity and relationships are established by pointer values.
Figure 1.7 shows that Branch Able has Customers 567, 9348, 2894, 2849 and 8723,
which has an end pointer in the Next Customer field.
The customer records for a particular branch may not be stored together on the
disk, nevertheless they can be retrieved very quickly by following the chain pointer
trail in NextBranch and NextCustomer, directing the read head to the precise posi-
tion indicated by the pointer value.
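The following Java sketch imitates a navigational query of this kind. In IMS the pointers are disk addresses maintained by the DBMS; here ordinary object references stand in for them, and the record types, field names and values are invented to mirror the branch/customer example.

import java.util.ArrayList;
import java.util.List;

// A sketch of a navigational query over a parent/child hierarchy.
// In a hierarchical DBMS the pointers are disk addresses; here object
// references stand in for them, and the names are illustrative only.
public class NavigationalQuery {

    static class Customer {
        int number;
        Customer nextCustomer;        // chain pointer to the next customer of the branch
        Customer(int number) { this.number = number; }
    }

    static class Branch {
        String name;
        Branch nextBranch;            // chain pointer to the next branch owned by the bank
        Customer firstCustomer;       // child pointer to the first owned customer
        Branch(String name) { this.name = name; }
    }

    // Traverse the branch chain from the root until the branch is found,
    // then descend to its first customer and follow the customer chain.
    static List<Integer> customersOf(Branch firstBranch, String branchName) {
        List<Integer> result = new ArrayList<>();
        for (Branch b = firstBranch; b != null; b = b.nextBranch) {
            if (b.name.equals(branchName)) {
                for (Customer c = b.firstCustomer; c != null; c = c.nextCustomer) {
                    result.add(c.number);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Branch able = new Branch("Able");
        Branch baker = new Branch("Baker");
        able.nextBranch = baker;
        Customer c1 = new Customer(567), c2 = new Customer(9348), c3 = new Customer(2894);
        able.firstCustomer = c1;
        c1.nextCustomer = c2;
        c2.nextCustomer = c3;
        System.out.println(customersOf(able, "Able"));   // [567, 9348, 2894]
    }
}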
The definition of the hierarchical tree of relationships is established before any
data is loaded. Linking the disk positions of stored data means that the schema
design must be very carefully thought out. Once the data is loaded, it is not easy to
vary the shape of the tree. However, this same reliance on disk position offers
extremely high-performance data access. This is both because the disk head can be
directed straight to a precise disk position and because related records like the list
of customers for a particular branch can be stored by the database in contiguous
positions on the disk.
Starting in 1969 and working through to 1972, the Conference on Data Systems
Languages (CODASYL), which had previously agreed the standard for COBOL, was
developing a different data model based on the concept of a network of relation-
ships (Figure 1.8).
Figure 1.8 CODASYL database: example schema. The same five record types as in Figure 1.6 are present, and the pointers still indicate ownership. Access to the database is through the root of the network, the Bank. Notice that the logical relationships can be bi-directional.
Each relationship is still expressed through ownership, but the network database
allows a parent record type to own more than one type of child record. Further-
more, each relationship, again expressed through disk position, is capable of being
represented bi-directionally. There are now ascent and descent pointers to move
between record types. Chain pointers enable a traversal of instances of a record
type.
Figure 1.9 still shows that branch Able has customers 567, 9348, 2894, 2084 and
5849, which has an end pointer in the Next Customer field. Although the customer
records for a particular branch may or may not be stored together on the disk,
nevertheless we can retrieve them very quickly by following the chain pointer trail.
What is new here is the ascent pointer, allowing navigation from Able’s last
customer, 5849, back to the Able branch record. From here, it is possible then to
move along the branch chain pointers to inspect another Branch and its
customers.
Ted Codd, an IBM researcher, published a paper (Codd, 1970) that sought to apply
the well-known mathematical concepts of set theory to the problems of handling
large collections of data. Groups of similarly structured objects are known as rela-
tions. Applying this theory to databases gave rise to the relational data model.
In the hierarchical and network models, the connections among record types are
held as physical addresses. The content of such records therefore consists of different
data representations. There are the values of real world characteristics of the object
being modelled such as ‘Grant’ or ‘Hepburn’ in the surname field. Then there is a
child pointer with value ‘0185674345965’, which refers not to any characteristic of
the real-world object, but to the disk position of the first child of the record.
Figure: a Branch relation holding rows of pure data values (Able, Baker, Charlie, Delta, Echo), with no stored pointers.
The data and the metadata now contain only one kind of data representation.
End-users can more easily visualize both the data and the metadata. The data
model for any particular application is still stored as a schema in the database, but,
as far as the user is concerned, it contains no mention of how the data is physically
stored. Instead, it contains only rules for connecting relations based on equality of
shared data values. This data model contains only information from the real world
and nothing about the physical structure of the database. It is an abstract model
centred only on the data (Figure 1.11).
Figure 1.11 Relational database: example schema. The five record types of Figures 1.6 and 1.8 now have relationships expressed through values of the characteristics of each of the types. Access to the data is through the value of any characteristic of any of the types. Notice that the logical relationships can only be uni-directional.
Because the data model was derived from formal mathematical theories, it was
possible to envisage formal proofs of the systems analyst’s work in defining the
schema view of the real world. The relational model unified data and schema in a
provably correct way.
In 1975, a major research effort was initiated at the Berkeley campus of the
University of California. This was known as the Ingres project. Teams of computer
scientists who worked on the Ingres project went on to join the commercial project
teams responsible for products such as Oracle and Sybase. Ingres became a
commercial product as well. By the early 1980s there were commercial products
available ready to challenge the position of older packages like IMS and IDMS.
Relational databases are more flexible in their design aspects and it is often
possible to alter the structure of relations and the definition of their interconnec-
tions without unloading and reloading the data. The data access language that
became known as SQL (Structured Query Language) is based on a value-only view of
the database, so that it is independent of physical storage or changes in the file
system. SQL enables impromptu queries that do not rely on hard-wired navigational
pointer trails. It is easier to train non-computing specialists to use the language.
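As an illustration of such a value-based, impromptu query, the sketch below issues one through JDBC from a Java program. The connection string, schema, table and column names are placeholders; the point is simply that the query mentions only data values, never storage positions or pointer trails.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// A sketch of an ad hoc, value-based query issued through JDBC.
// The connection string, table and column names are illustrative only;
// nothing here depends on how or where the rows are physically stored.
public class AdHocQuery {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/UTOPIA", "ufl", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT surname, forename FROM customers WHERE branch = ?")) {
            stmt.setString(1, "Able");                 // filter purely by value
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("surname") + ", "
                                     + rs.getString("forename"));
                }
            }
        }
    }
}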
The major disadvantage for relational databases is that the flexibility that comes
from the abstract view may cause performance problems for the very reason that
there are no well-worn navigational pointer trails to aid the speedy resolution of
queries on large data (Figure 1.12).
One of the driving forces behind the development of databases was the need for
data abstraction: the separation of data and its storage from the application logic.
This was thought desirable in order to standardize methods of storage, to offer
protection to the data from multiple users and to give the data a safe environment
for program developers.
Throughout the same period, academics were examining ways of strengthening
these aspects of programming languages.
As early as 1967, researchers introduced a new language feature called the class,
which could be extended through an inheritance mechanism. These capabilities
laid the groundwork for object-oriented programming.
What was good for data could also be good for logic. Another important feature
of object orientation is that program code or methods could also be defined within
a class (Figure 1.13). Since the class could be protected from amendment and inher-
ited, it became one answer to the general problem of providing a safe development
environment (Figure 1.14).
Figure: an object combines state (attributes such as NAME: MARK, DOB: 14/02/64, JOB: Lecturer), relationships and behaviour (methods such as GetAge() and ChangeJob()).
By 1983, the Smalltalk-80 and C++ languages had been defined. These allowed
the inheritance of classes, dividing the data elements and the methods (or proce-
dural elements) into public and private sections.
Data or methods in the private sections could be accessed only through the
public section. Enclosing or protecting data and method in this way is called
encapsulation and is meant to ensure that programmers make use of the class only
in the way that its authors intended (Figure 1.15). By incorporating standard
classes into their programs, developers could hope for reliable, trustworthy, reus-
able class libraries.
Once a class is securely defined, it can be protected from amendment and used as
a foundation for a new, more specialized class. Thus a class called Person, which
contains attributes for surname, forename, address, date of birth etc., can be inherited
by the class Employee, which also needs these same characteristics.
Additional attributes can be added outside of the inheritance to complete the
Employee data. The Employee class can also be inherited by the class Doctor or the
class Nurse. In each case the basic personal and employment details have been
incorporated reliably and consistently into the data structure (Figure 1.16).
Figure 1.16 Trust Me, I’m a Person: the characteristics of the Person class are inherited by the Employee class with additional characteristics. In a hospital schema, the Doctor and Nurse classes inherit from both Person and Employee with further characteristics specific either to a doctor or to a nurse. A Patient class would be in a different sub-tree, inheriting directly from Person.
During the 1980s the idea of the class was extended to include complex defini-
tions (Figure 1.17). Multiple inheritance meant that a class could inherit from two
or more different class definitions.
Data attributes could be atomic, consisting of simple types (integer, float, string
etc.) or could consist of other classes. Collections of instantiated objects referred to
as lists (ordered) or sets (unordered) could form a single attribute in a class defini-
tion. One of the attributes of a car is a set of wheels (Figure 1.18).
Figure 1.18 Complex objects II: a car consists of other objects (Chassis, Seats,
Wheels etc.); a Wheel consists of other objects (Tyre, Hub cap, Trim etc.).
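A hedged Java sketch of such a complex class follows: the Car’s attributes include a simple atomic value and a collection of Wheel objects, each Wheel itself built from other objects. All class and field names are invented for illustration.

import java.util.HashSet;
import java.util.Set;

// A sketch of a complex class: a Car's attributes are themselves objects
// and collections of objects (a set of Wheels), each Wheel in turn being
// made of other objects. The class and field names are illustrative.
public class Car {

    static class Tyre   { String size; }
    static class HubCap { String style; }

    static class Wheel {
        Tyre tyre = new Tyre();                // a wheel consists of other objects
        HubCap hubCap = new HubCap();
    }

    String registrationNumber;                 // a simple atomic attribute
    Set<Wheel> wheels = new HashSet<>();       // a collection-valued attribute

    public static void main(String[] args) {
        Car car = new Car();
        car.registrationNumber = "UT04 PIA";
        for (int i = 0; i < 4; i++) car.wheels.add(new Wheel());
        System.out.println(car.wheels.size() + " wheels");   // 4 wheels
    }
}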
Within a running program, an object is identified by its position in memory, not by the value of any of its attributes. If any of the object’s attributes refer to other
objects or to object collections such as lists or sets, then these are stored as pointers
to the other objects or to the head of the list or set. When the program stops, the
data is destroyed.
The lack of a consistent model for persistent objects was a major reason for the
lack of penetration of languages like Smalltalk and C++ into the commercial arena.
The need for the development of object databases was underlined by the kind of
application that objects are good at supporting. These all have a wide variety of
data types and almost always include some kind of graphic or image storage and
retrieval requirement (Table 1.2).
Unlike the relational model, the object model has no basis in mathematics; there
is no consistent definition of an object. The precise definition is empirical and a
decision for those who specify each of the languages that claim to be object-
oriented. The development of object databases mirrored the earlier development of
standard data storage functions for COBOL. Measures for storing objects were first
proposed as extensions of particular object-oriented languages like Smalltalk or
C++, or even specific manufacturers’ versions of these languages.
However, many of the object databases share certain characteristics. The
metadata or data dictionary is kept as a class hierarchy, allowing at least single
inheritance. References to instantiated, stored objects are kept as pointers to disk
or file locations and are relative to an arbitrary root of the data. Complex objects,
which refer to other objects or to collections, are also modelled using pointers to
disk or file locations. These can be single pointers to just one object or the head of a
linked list to a collection. Fast access to the data is achieved using a navigational
language embedded in the database software and which allows traversal of the
pointer framework. This access method identifies each object by its pointer value,
not by any value of its attributes (Table 1.3).
Mention of a pointer framework inevitably leads to a comparison with the
CODASYL or network database model, and there are striking similarities. A
complex schema where the DBMS is responsible for transparent maintenance of
the pointer connections implies inflexibility of the definition but speed of access.
OOPL                           OODBMS
Complex objects                Persistence
Object ID                      Disk management
Methods and messages           Data sharing
Inheritance                    Reliability
Polymorphism                   Security
Extensibility                  Ad hoc queries
Computational completeness
The object database has the added complication of providing classes with inheri-
tance and of storing the program code for methods both as original source and,
perhaps, as target microcode for the host machine.
This hints at two major problems for the architects of an OODBMS: firstly, how
should the OIDs of a complex object structure in the database map to the OIDs of
the same object structure once it is retrieved into the memory of a client (Figure
1.19), and secondly where should the code for an object’s methods execute?
Figure 1.19: a swizzle transform function maps OIDs based on disk location in the stored database to OIDs based on memory location in the client’s main or virtual memory.
Clearly, any user program that is designed to access an object database should be
compatible with the way that objects are identified in the database. A persistent
complex object consists of individual simple objects connected by persistent
pointers referring to their permanent locations. Retrieving such an object into the
RAM of the client machine for interrogation or update implies using a programming
language whose syntax is comfortable with a mapping of persistent to transient
A BRIEF HISTORY OF DATA 25
(volatile memory) pointers. Most object database vendors provide a variant of C++
or some other language that offers such mapping transparently. A so-called swizzle
function provides for the mapping and allows new connections and new objects to
be stored persistently following a client’s editing session.
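The sketch below gives a much simplified picture of what a swizzle function does, assuming persistent OIDs can be represented as plain numbers standing for disk positions; the class names and loading logic are invented, and a real OODBMS would do far more (locking, write-back, page management).

import java.util.HashMap;
import java.util.Map;

// A simplified illustration of pointer swizzling: persistent OIDs (here just
// long values standing for disk positions) are mapped to transient in-memory
// references as objects are faulted in from the store. All names are invented.
public class SwizzleCache {

    static class StoredObject {
        long oid;                     // persistent identity (a disk position)
        String payload;
        StoredObject(long oid, String payload) { this.oid = oid; this.payload = payload; }
    }

    private final Map<Long, StoredObject> inMemory = new HashMap<>();

    // "Swizzle": return the in-memory object for an OID, loading it on first use.
    StoredObject resolve(long oid) {
        return inMemory.computeIfAbsent(oid, this::loadFromStore);
    }

    // Stand-in for reading the object from secondary storage.
    private StoredObject loadFromStore(long oid) {
        return new StoredObject(oid, "object read from disk position " + oid);
    }

    public static void main(String[] args) {
        SwizzleCache cache = new SwizzleCache();
        StoredObject a = cache.resolve(185674345965L);
        StoredObject b = cache.resolve(185674345965L);
        System.out.println(a == b);   // true: both persistent references map to one object
    }
}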
The question of where method code runs raises questions of database security if
the code runs in the server and of compatibility if the code is destined to run in the
client. The code can either be pre-compiled and stored as microcode for the server’s
CPU, or the source might be delivered to the client, which must then have an
appropriate compiler to generate the necessary microcode for the client CPU.
One solution is to provide a limited version of an OOPL in which method code
may be written. Constructs, such as direct memory access to the server’s RAM, are
removed so that only friendly code may be generated and the method code stored
on the server.
When objects are retrieved from the database, they are transferred to a page of
memory in the server (Figure 1.20). That page is mirrored in the client, along with a
page manager. Client application code is applied to the objects in the client page
and changes are mirrored to the server page. Changes caused by trusted method
code running in the server are applied to objects in the server page and mirrored
immediately to the client page.
The application code can be written in any language that can access the client
page objects. Methods can be stored once in the server as compiled microcode.
Object databases are good for applications that require a rich data model and
where the queries and transactions are well known in advance and can be repre-
sented in the connections between complex objects. Such queries are essentially
pre-joined and execute with more efficiency than those under the relational model.
Object databases are stretched to perform well with ad hoc queries, where a form of
Object SQL often underpins such functionality. The natural retrieval mode of an
OODBMS is navigational, based on the OID pointer network. Supporting queries
based on the values of attributes is alien to the data model and using indexes to
improve performance introduces yet more impedance into the system.
The object database shares a major disadvantage with the CODASYL databases: the
inflexible schema repository. Because the connections between objects are hard-
wired, changing the existing class hierarchy can mean unloading the data,
amending the structure and then reloading the data. Schema evolution is a desir-
able feature if the database is to adapt to changing business needs. The relational
database, because connections are expressed through equality of attribute value,
has advantages in this respect over all other database types.
Of course, vendors are keen to offer as wide a range of facilities as possible in
order to compete. Often this means that the natural advantages of persistent
objects are compromised. Much the same could be said about the vendors of rela-
tional databases, who, for similar reasons, have taken their products away from the
initial pure and provable data model to approach object orientation from the other
direction.
Figure: an object layer performs transforming and type checking above a relational engine; SQL operates on the relations, which are held in secondary storage.
In such products, the principal data access method remains SQL and the product overall retains back-
ward compatibility with previous versions of strictly relational products.
When the data model and the data access method are not based on the same view
there is said to be an impedance mismatch. This applies just as much to the object
database vendor who provides a special version of SQL to query or update persis-
tent objects as to the relational database vendor who provides an OO layer above a
storage system based on relational concepts.
In the object-relational model, type inheritance is possible: sub-types can be derived from types. Clearly, complex objects can be
modelled: object attributes can consist of other objects and even object collec-
tions.
The development of object relational databases seems to be more marketing
driven than market driven. It appears more concerned with matching object databases feature for feature than with responding to a genuine consumer
demand. The relational model sits perfectly with the SQL processing model, both
being derived from the same mathematical principles. Introducing an object view
of the data places an impedance between the data model and the favoured
processing model. Using a view does not reduce the existing impedance between
the relational model and an object or record-based processing model since the
underlying structure is unchanged.
1.12 QUESTIONS
1.13 EXERCISES
(ii) Search the Internet for examples of applications where the characteristics of
each of the database types are considered an advantage or a disadvantage.
Write a short report matching what you find with the material in this chapter.
CHAPTER 2
LIGA UTOPIA
Some people think football is a matter of life and death but I can assure them it is much more
serious than that.
Bill Shankly
In Utopia, the Utopian Football Association, UFA, regulates football. This body is
responsible for interpreting the rules of football; for the disciplinary hearings of
cases reported to it; for developing coaching programmes in the amateur game; for
organizing the national team; and for keeping a register of players and coaching
staff for all the leagues of professional and amateur teams in Utopia. In addition,
UFA organizes a number of competitions between the clubs, notably the UFA Cup
for professional sides and the UFA Urn for amateur clubs.
There are 80 professional football clubs in Utopia. They are organized and regu-
lated by the Utopian Football League, UFL. The clubs are divided into four divi-
sions of 20 clubs each. UFL sets the fixture list of matches between clubs at the start
of each season, which lasts from August of each year to May of the following year.
Figure 2.1 shows the clubs and some of the players in the league. Each club plays
all of the other clubs in its division twice: once at home and once away. UFL main-
tains a league table for the four divisions based on a points system of 3 points for a
win and 1 point each for a drawn match. Teams with equal points scores are
ordered by the difference between goals scored and goals conceded.
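As a small illustration of that rule, the following Java sketch computes points and orders a few invented clubs by points and then goal difference; the club names and results are not taken from the case study.

import java.util.Arrays;
import java.util.Comparator;

// A sketch of the UFL league-table rule: 3 points for a win, 1 point each
// for a draw, with teams on equal points ordered by goal difference.
// The club names and results are invented for illustration.
public class LeagueTable {

    static class Entry {
        String club;
        int won, drawn, goalsFor, goalsAgainst;
        Entry(String club, int won, int drawn, int goalsFor, int goalsAgainst) {
            this.club = club; this.won = won; this.drawn = drawn;
            this.goalsFor = goalsFor; this.goalsAgainst = goalsAgainst;
        }
        int points()         { return 3 * won + drawn; }
        int goalDifference() { return goalsFor - goalsAgainst; }
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry("Beauty United", 10, 5, 31, 18),
            new Entry("Purity FC",     11, 2, 29, 20),
            new Entry("Vanity Rovers", 10, 5, 27, 19)
        };
        // Sort by points, then by goal difference, both descending.
        Arrays.sort(table, Comparator.comparingInt(Entry::points)
                                     .thenComparingInt(Entry::goalDifference)
                                     .reversed());
        for (Entry e : table) {
            System.out.println(e.club + "  " + e.points() + " pts  GD " + e.goalDifference());
        }
    }
}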
The UFL also keeps a record of the team sheets for each game, including substitu-
tions of players during the game. Goal scorers are noted as well as those players
who receive official cautions (yellow card) or are sent off (red card) by the match
officials. Serious infringements can lead to disciplinary hearings that result in
match suspensions.
The clubs employ the players. Each player is allocated a squad number, which he
wears on his shirt. This number identifies him to match officials who check the
team sheets submitted by the club before each game. The squad number does not
indicate the position at which the player plays. This is indicated on the team sheet
and reflects his general classification, maintained by the club, as a goalkeeper,
defender, midfielder or forward.
Players may play for several clubs during their careers. They have a contract with
the club that indicates when they joined the club and when they left the club. Each
club also has a manager as well as a specialist coaching staff, all of whom also have
to be registered with the UFA. Players, managers and trainers are all subject to the
UFA disciplinary code.
UFA also regulates the transfer of players between clubs in the league as well as into
the league from other countries and out of the league to other countries. Details of
contracts are kept so that over time these records become a history of the players’
careers.
The UFL currently keeps all its records on a card index filing system but wishes to
computerize. The UFL Secretary has heard of a database package called Oracle and
his regular golf partner has told him about a method of systems development
called object orientation. Apparently, systems designed by this method can be
implemented on a local area network of PCs using either the Java or Visual Basic
languages.
The UFL has therefore called in a consultancy company called Perfect Solutions
Unlimited (PSU) to investigate the opportunities for computerization and to
produce a prototype implementation as part of an evaluation exercise before a
production system is commissioned. The prototype design will look only at first
division clubs and a selection of their players.
2.2 UML SPECIFICATION
The Unified Modeling Language is fast becoming a standard language for analyzing
and specifying the parts of software systems. Object technology began to become
available for commercial use in the early 1990s. With the technology came
attempts to support object analysis and design. Each author proposed their own
competing and incompatible design languages. The lack of a standard method for
modelling systems based on object technology was seen as a hindrance to its pene-
tration in the market.
Booch, Rumbaugh and Jacobson decided to join in designing a common unified
language. UML 1.0 was adopted by the Object Management Group1 as a standard
in 1997. The OMG is a consortium of software vendors who, with an eye to the
commercial benefits of standards, aim to promote common frameworks for
systems development. As well as the UML standard, OMG are responsible for the
publication of the CORBA2 middleware specification.
Various levels of abstraction can be expressed in UML, ranging from high-level
user requirements, to detailed models of object behaviour. Some UML tools can
generate code from UML models. UML includes Object Constraint Language, to
provide the necessary level of precise formal details for implementations. Figure
2.2 shows the hierarchy of diagrams used in UML.
Designing applications before coding gives an assurance that functionality is
complete and correct, end-user requirements are met, and that scalability, robust-
ness, security and extendability issues have been addressed before implementation.
1 http://www.omg.org/
2 Common Object Request Broker Architecture
Figure 2.2 The hierarchy of UML diagrams: structural diagrams (class, object, component and deployment diagrams), behavioural diagrams (UseCase, sequence, collaboration, statechart and activity diagrams) and model management diagrams (package, sub-systems and model diagrams).
UML may be used to define systems boundaries and display major functionality
from user perspectives. The data component structure may be modelled with class
diagrams and the behaviour of objects (class instantiations) within the system
illustrated with state transition diagrams.
This is not a book on Systems Analysis and Design so there will be no detailed
discussion of UML. The next section, on UseCases, illustrates their use in deter-
mining the client requirements for the UFL system.
2.3 UFL USECASES
Figure 2.3 UFL UseCase: overview. The UFL Executive gets reports, while UFL Clerks maintain matches and discipline and maintain clubs and players.
Figure 2.4 UFL UseCase: maintain matches and discipline. A UFL Clerk enters match details (teams and goals), creates the matches for a season, maintains match data and enters discipline data from the UFA.
The second-level UseCase can be further decomposed, as Figure 2.5 shows. Here
the Maintain Match Data use case has been broken down into its component opera-
tions. The UseCase diagram informs the discovery of processes as well as data
elements within the required system.
For the sake of completeness, and also to underpin the functional design of the
Java implementation of the UFL application, Figures 2.6 and 2.7 show the second-
level use case diagrams derived from the overview.
At this level, the top level has been split into three functions undertaken by
clerks in the organization. It may be that this is sufficient detail to conclude that
the three activities will translate into three screens in the proposed system.
The reports diagram shows that there are three reports that the UFL executive
finds necessary.
Figure 2.5 UFL UseCase: maintain match data. A UFL Clerk selects the season, adds goal scorers and calculates the score.
Figure 2.6 UFL UseCase: clubs and players. A UFL Clerk creates and maintains club and match data, creates and maintains player data and maintains contract data.
Figure 2.7 UFL UseCase: reports. A UFL Clerk can view or print the league table, the team sheets and the match results.
2.4 CLASS DIAGRAMS
When a fair representation of the business functions has been captured in the
UseCase diagrams, it is time to look at the major items of data needed to perform
the day-to-day processing and the other regular maintenance represented in the
diagrams. A class diagram is used to do this. A class consists of data items and
internal mini-programs called methods that work on the data items. To make sure
the list of classes is complete, look through the UseCases to identify nouns that
might lead to data items and verbs that give clues about the methods associated
with the data.
There is a standard way of representing classes within UML and a standard way of
representing the relationships that exist between them.
It is a good idea to define an Address class. Many of the other classes, such as
Club, Player or Coach may have one or more addresses. If a common pattern is
defined then the other classes that use it will always have the same format.
Figure 2.8 shows that the diagram for the Address class consists of a rectangle
divided into two sections: at the top is the name of the class and in the second part
we define the data items that will hold values when the class type is instantiated as
an object. The names of these data items do not tell us very much about their
purpose, so a better version would be as shown in Figure 2.9.
Figure 2.8 Address: FirstLine: String; SecondLine: String; ThirdLine: String; FourthLine: String; FifthLine: String
Figure 2.9 Address: Road: String; District: String; City: String; County: String; Postcode: String
2.4.2 ENCAPSULATION
The class is still incomplete. Classes can have public and private data. They can also
have program code or methods associated with their definition. It makes sense to
protect or encapsulate the data items in a private section so that access to the data
can be controlled and the address verified. The next version, in Figure 2.10, has
private data items only and defines public and private methods.
Address
Private
Road: String
District: String
City: String
County: String
Postcode: String
Public
Address GetAddress()
Boolean SetAddress(x: Address)
Private
Boolean VerifyAddress(x: Address)
Three method definitions have been added in a new third section of the class.
These definitions are called method signatures and they each show four types of
information about the method. Firstly, two of the signatures are public and one is
private. This means that the first two can be called directly by other objects sending
messages to an Address object. The third method is internal and private and can
only be called by one of the first two. Secondly, the names of the methods have
been defined: GetAddress, SetAddress and VerifyAddress.
Each of the methods has a return type. When the GetAddress method is invoked,
an answer is sent back in the form of an address structure of five lines. When
SetAddress is invoked, a Boolean value (True or False) is sent back, signifying
whether the data was acceptable and was stored.
Input parameters for SetAddress and VerifyAddress have been defined. SetAddress
receives an input parameter in the form of five lines of address data, which it will
assign to the data items. The program code for the public method, SetAddress, will
call the private method VerifyAddress, which could, for example, check whether the
input parameter contains mandatory values for Road and Postcode and whether
Postcode is in the Utopia Mail format.
It is interesting to note that here the input parameters are defined recursively as
being of type Address in the midst of defining that very same address type.
The Address data consists only of simple atomic data types. They are called
atomic because they are indivisible: they have no underlying structure and can be
manipulated only as a whole. Atomic data types vary according to implementation
language.
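One possible Java rendering of this Address class is sketched below. The mapping of the diagram onto Java is a design choice rather than a prescription, and the postcode check is only a stand-in for the real Utopia Mail format rule.

// A possible Java rendering of the Address class of Figure 2.10: the data is
// private, and the only way in or out is through the public methods. The
// postcode pattern is a placeholder for the real Utopia Mail format rule.
public class Address {

    // Private state: accessible only through the public methods below.
    private String road;
    private String district;
    private String city;
    private String county;
    private String postcode;

    // Public: return a copy of the current address.
    public Address getAddress() {
        Address copy = new Address();
        copy.setFields(road, district, city, county, postcode);
        return copy;
    }

    // Public: store a new address, but only if it passes verification.
    public boolean setAddress(Address x) {
        if (!verifyAddress(x)) return false;
        setFields(x.road, x.district, x.city, x.county, x.postcode);
        return true;
    }

    // Private: callable only from within the class, e.g. by setAddress.
    private boolean verifyAddress(Address x) {
        return x != null && x.road != null && !x.road.isEmpty()
            && x.postcode != null && x.postcode.matches("UT[0-9]{2} [A-Z]{3}");
    }

    private void setFields(String road, String district, String city,
                           String county, String postcode) {
        this.road = road; this.district = district; this.city = city;
        this.county = county; this.postcode = postcode;
    }
}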
2.4.3 COLLECTIONS
Figure 2.11 A Department class with a collection-valued workforce attribute: written workforce <employee> it is a list attribute (ordered); written workforce {employee} it is a set attribute (unordered).
Collections are held as linked lists in memory. Each node of the linked list points
to the next node. Inside the node is a pointer to the object that is a member of the
collection. The object itself cannot be in the node, since it may be a member of
several collections.
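The Java sketch below makes the same point with an ordinary library list rather than a hand-built linked list: the collection holds references to its member objects, not copies, so one object can belong to several collections at once. The Department and Employee names are illustrative.

import java.util.ArrayList;
import java.util.List;

// A sketch of a collection-valued attribute: the workforce list holds
// references to Employee objects, not copies, so the same Employee can be
// a member of several collections at once. The names are illustrative.
public class Department {

    static class Employee {
        String name;
        Employee(String name) { this.name = name; }
    }

    String name;
    List<Employee> workforce = new ArrayList<>();   // ordered collection (a list)

    public static void main(String[] args) {
        Employee mark = new Employee("Mark");

        Department computing = new Department();
        Department projectTeam = new Department();
        computing.workforce.add(mark);       // this list node points at mark...
        projectTeam.workforce.add(mark);     // ...and so does this one

        // Both collections refer to the one object.
        System.out.println(computing.workforce.get(0) == projectTeam.workforce.get(0)); // true
    }
}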
A complex class is one that is made up, at least in part, of other classes. Figure 2.12
is a first attempt at defining the Player class.
Figure 2.12 consists of a rectangular box divided into three sections. The top
section gives the name of the class and must be unique in the eventual class
diagram.
The second section lists the data items associated with the Player class. Some of
them are simple atomic data types such as String or Integer; others consist of other
object types such as Address that will be defined elsewhere in the class diagram.
Player
Private
Name: string
UFA ID: String
HomeAddress: Address
CorrespondenceAddress: Address
Club: Organization
Squad Number: Integer
Position: PositionTypes
Public
String GetName()
Boolean: SetName(String)
...
In the third section, the method signatures are listed. These are essentially decla-
rations for the methods or program code associated with the Player class. All of
these have return types and some of them accept input parameters. For example,
GetAddress returns an Address object and has no input parameter. SetAddress accepts
an input parameter of type Address and returns a Boolean value indicating success
or failure of the operation.
2.4.5 INHERITANCE
Several of the object types identified in the UFL prototype share some common
characteristics. For example, the Utopian Football Association and the Utopian
Football League are both organizations. Players, managers, trainers and referees are
all people and all have to be registered with the UFA.
The object-oriented approach allows the definition of general classes and then
other more specialized classes that inherit the attributes of the general class. The
specialized class may then have other data attributes and methods that relate to its
more specialized role in the application. A general class definition may be checked
for accuracy and completeness. Developers can then be assured that every subclass
that inherits from it has the same accuracy and completeness. It is a good way to
ensure uniformity of design and to avoid errors either in the scope of data attrib-
utes or in the program code for inherited methods.
Person
ufa_id string
surname string
forename string
employed <contracts>
...
SetSurname (string)
SetUfa_ID(string)
...
Player
squad_no integer
scored <goals>
cases <discipline>
...
MakeContract (string)
SelectForTeam (integer)
Score(goal)
...

Coach
CoachCertificate date
...
SetCoachCertificate
GetCoachCertificate
...
The Player class has extra data attributes, squad_no, scored and cases, and three
methods, MakeContract, SelectForTeam and Score. The Coach class has a different
specialized date attribute called CoachCertificate and two specialized methods
called GetCoachingCertificate and SetCoachingCertificate (Figure 2.14).
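A possible Java sketch of this inheritance structure follows. The element types of the collections (contracts, goals, discipline cases) are simplified to strings, the method bodies are placeholders and the names used in the small demonstration are invented; only the shape of the inheritance is the point.

import java.util.ArrayList;
import java.util.List;

// A sketch of the inheritance shown in Figure 2.13: Player and Coach inherit
// the general Person attributes and methods and add their own. The collection
// element types and the method bodies are simplified placeholders.
class Person {
    protected String ufaId;
    protected String surname;
    protected String forename;
    protected List<String> employed = new ArrayList<>();   // stands in for <contracts>

    public void setSurname(String surname) { this.surname = surname; }
    public void setUfaId(String ufaId)     { this.ufaId = ufaId; }
}

class Player extends Person {
    private int squadNo;
    private List<String> scored = new ArrayList<>();        // stands in for <goals>
    private List<String> cases  = new ArrayList<>();        // stands in for <discipline>

    public void makeContract(String club)   { employed.add(club); }
    public void selectForTeam(int matchId)  { /* add to a team sheet */ }
    public void score(String goal)          { scored.add(goal); }
}

class Coach extends Person {
    private String coachCertificate;

    public void setCoachCertificate(String date) { this.coachCertificate = date; }
    public String getCoachCertificate()          { return coachCertificate; }
}

public class InheritanceDemo {
    public static void main(String[] args) {
        Player p = new Player();
        p.setSurname("Smith");                 // inherited from Person
        p.makeContract("Utopia Rovers FC");    // the club name is invented
        Coach c = new Coach();
        c.setSurname("Jones");                 // the same inherited method
        c.setCoachCertificate("2004-08-01");
        System.out.println(c.getCoachCertificate());
    }
}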
2.4.6 POLYMORPHISM
Figure 2.14 UFL classes (attributes and methods):
Person: ufa_id string; surname string; forename string; employed <contracts>; SetSurname(string); SetUfa_ID(string)
Player: squad_no integer; scored <goals>; cases <discipline>; MakeContract(string); SelectForTeam(integer); Score(goal)
Coach: CoachCertificate date
Match: match_id integer; matchdate date; scored <goals>; SetMatch_ID(string); SetMatchDate(date)
Club: name string; stadium string; division string; manager coach; trainers <coach>; squad <players>; plays <matches>; SetName(string); SetStadium(string)
Position: allowed {goalkeeper, defender, midfield, forward}
Contract: started date; finished date
Figure 2.15 The Clock class and its descendants:
Clock: private Int hour; Int minute; Int second; public displayTime()
analogueClock: public displayTime()
digitalClock: public displayTime()
When an attribute or a method is redefined in the subclass, it takes precedence over any inherited attribute or method with the same name when the class is instantiated as an object.
The Clock class (Figure 2.15) has a displayTime method that does not perform any
action. The analogueClock class and the digitalClock class both inherit from the
Clock class. Each of the descendants has a method, displayTime, which overrides
the inherited method.
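A Java sketch of the same idea follows: both subclasses override displayTime, and which version runs depends on the actual object, not on the declared type of the variable. The display formats are invented.

// A sketch of the polymorphism in Figure 2.15: each subclass overrides
// displayTime, and the version that runs depends on the actual object,
// not on the declared type. The output formats are invented.
class Clock {
    protected int hour, minute, second;

    Clock(int hour, int minute, int second) {
        this.hour = hour; this.minute = minute; this.second = second;
    }

    public void displayTime() { /* the base class performs no action */ }
}

class AnalogueClock extends Clock {
    AnalogueClock(int h, int m, int s) { super(h, m, s); }

    @Override
    public void displayTime() {
        System.out.println("big hand on " + (minute / 5) + ", little hand on " + (hour % 12));
    }
}

class DigitalClock extends Clock {
    DigitalClock(int h, int m, int s) { super(h, m, s); }

    @Override
    public void displayTime() {
        System.out.printf("%02d:%02d:%02d%n", hour, minute, second);
    }
}

public class ClockDemo {
    public static void main(String[] args) {
        Clock[] clocks = { new AnalogueClock(23, 0, 0), new DigitalClock(23, 0, 0) };
        for (Clock c : clocks) {
            c.displayTime();   // the overriding method of each subclass is invoked
        }
    }
}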
2.6 QUESTIONS
2.7 EXERCISE
PART 2
DATABASE SYSTEMS
CHAPTER 3
DATA MODELLING
The tendency has always been strong to believe that whatever received a name must be an
entity or being, having an independent existence of its own. And if no real entity answering to
the name could be found, men did not for that reason suppose that none existed, but
imagined that it was something peculiarly abstruse and mysterious.
John Stuart Mill, A System of Logic
The Object Diagram derived in Section 2.4 will form the basis of the analysis for the
creation of a database schema for the UFL application. Because it follows object-
oriented principles, it also contains a great deal of the specifications for the
processing logic of the proposed system. Each of the object definitions shows the
object’s data elements as well as the methods it contains.
However, the system will be implemented by storing the data in an Oracle rela-
tional database. While this product does allow for the storage of data and, to some
extent, the storage of standard procedures, it is not a fully featured object database
and there will have to be some compromises made as the original design progresses
into reality.
The object diagram will be retained as an important systems design document. It
provides an inventory of the data and processes in the developing system. It will be
used to audit the database schema as well as the functionality of the Java or Visual
Basic programs.
The first step is to take the data parts of the object diagram and to prepare a
diagram suitable to act as a template for the schema of a relational database.
A diagram will be used because a picture can portray a lot of information – it has a
high information density. A diagram is preferable because, if it conforms to a
Using a data model provides the designer and the programmer with substantial
advantages: the detailed data design makes use of a few building blocks, and the
manipulation of data is performed in the same ways for all data structures.
The entity–relationship model was originally proposed by Peter Chen (1976) as a
way to unify the network and relational database views. Subsequently there has
been a proliferation of theoretical extensions and practical applications of it.
The model is used routinely for system analysis and design. Every year there is an
international conference devoted to it. It is simple enough that the basic concepts
can be learned readily, yet sufficiently powerful for scientific and technical appli-
cations. Entity–relationship is often abbreviated as ER. One refers therefore to the
ER model.
3.1.1 ENTITIES
years ago. Clearly, it cannot contain the values Blue or 150. Club has a value within
a list of the valid names of clubs. The relation Players has six attributes. Each
attribute has a permitted range of values. Attribute names must be unique in the
relation scheme and must describe different aspects of the entity’s nature.
It is important to distinguish a database scheme (logical design) from a database
instance (data in the database at a point in time). A relation-scheme is a list of
attributes and their corresponding domains. A relation is created in the database
and contains values. Its structure is based on the relation-scheme (Figure 3.3).
Players (players-scheme)¹
¹ v1 is a value within the domain D1, v2 is a value within the domain D2 and so on. The
symbol ∈ means ‘is a member of’ or ‘is an element of’: 1 ∈ {1..9}.
When the players relation is stored in the database, it consists of the tuples
t1, t2 and so on. Each tuple has the values indicated.
The Players relation consists of tuples, and tuples are members of the Players rela-
tion (Figure 3.4).
Relation-schemes define the tuple structure in terms of attributes and domains.
Relations are based on relation-schemes. Relations consist of tuples, which contain
values in conformance to the specification of the relation-scheme (Figure 3.5).
The character ∈ means ‘is a member of’ or ‘is an element of’, so this formula
means that the tuple defined by the values given is an element of the rela-
tion players. The elements of a set are enclosed in {}; t1 is a member of the
relation Players.
The first step in producing the entity–relationship diagram is to identify the enti-
ties: the real-world objects, events or concepts specified in the class diagram.
Certain of the classes may not have an independent existence but are included
only to provide a user-defined data type. An example of this is the address class,
which is used to specify a consistent compound data element within other classes.
The Golden Rule to start this process is that classes will generally equate to entities
(Figure 3.6).
Figure 3.6 UFL entities. Translating the object diagram gives nine entities.
The UFL object diagram has nine major classes: Seasons, Clubs, Players, Matches,
Contract, Discipline, Positions, Teamsheets and Goals.
3.2 RELATIONSHIPS
One entity may be involved in several relationships. In one instance, all the
members may be involved (Figure 3.7). All players, for example, have a contract.
Not all players score goals. A different subset of players receive match suspensions
for disciplinary offences arising from matches. Roles can be defined in such cases
(Figure 3.8).
The roles of player are employee, representative, scorer or defendant. This is more
satisfactory than partitioning employees, representatives, scorers and defendants into
four distinct non-overlapping entity sets.
Sometimes the same entity set can appear more than once in a relationship,
playing different roles (Figure 3.9).
The entity set Person appears twice in the relationship in Figure 3.9 and plays the
roles of Father and Child in the first and second occurrences respectively. Such rela-
tionships are also called recursive.
Dimensionality is the number of different entity sets participating in a relation-
ship. It can be one or more, as should be clear from previous examples. A recursive
relationship is unary. A relationship of dimensionality two is called binary. A rela-
tionship of dimensionality three is called ternary.
3.2.1 CARDINALITIES
A relationship has certain numerical characteristics that indicate the nature of the
roles played by the entity sets that take part in the relationship. The minimum and
maximum occurrences of each entity set in the relationship indicate its cardi-
nality in the relationship.
The relationship coverage in Figure 3.10 is total on matches. Every match must
have at least one player. If it is possible that one player may be selected for zero
matches, the relationship coverage is partial on players. This aspect of the relation-
ship is also called the participation or ordinality. The ordinality can only be zero
or one. If the ordinality is zero the participation is optional; if it is one, the partici-
pation is mandatory.
An entity’s arity in a relationship is the maximum number of its instances that
can refer to one instance in the other entity set. It is important only to distinguish
between one and many.
In the players/matches relationship, players has arity n and matches also has arity
n. An equivalent but more concise way to express the combination of participation
and arity is to define two integer numbers for every entity set in a relationship,
called the minimum and maximum cardinality, abbreviated as MinC and MaxC.
They represent the minimum and maximum occurrence of any member of the
entity set in the relationship. They are normally written within square brackets –
[MinC, MaxC] – in the relationship definition (Figure 3.10).
MinC of an entity set indicates the absolute minimum number of instances from
the other entity set that must be assigned to every one of its members. MaxC of an
entity set indicates how many of its members can refer to one member of the other
entity set.
When determining the participation in a relationship between entity set A and B,
look to see how the word every or the phrase at least one can be used with an appro-
priate verb to describe the relationship. Alternatively, look to see how the word zero
can be used with an appropriate verb (Figure 3.11).
When determining the arity of the relationship of A to B, look to see if the word
‘many’ or the phrase ‘only one’ can be used with an appropriate verb (Figure 3.12).
3.3 KEYS
Relationship                            Arities
Players PLAY IN Matches                 N:M
Players HAVE Contracts                  1:N
Person IS THE FATHER OF Person          1:N
If two tuples in a relation held identical values in every attribute, they
would effectively be describing the same real-world object or concept and one of
them would be redundant.
3.3.1 SUPERKEYS
All relations have one or more attributes which, when taken together, provide a
unique set of values which serve to set each tuple apart from its neighbours within
the entity set. Any set of attributes that serves in this way is called a superkey.
In the sample Players relation in Figure 3.1, the Nationality attribute on its own or
even in combination with the Club attribute does not contain values that differen-
tiate between the tuples shown. However, the set of attributes {Surname, Nation-
ality, Club} does distinguish the rows and for this fragment of the relation is a
superkey. Figure 3.16 shows other examples of attribute sets that are superkeys for
this fragment.
[Dateofbirth]
[Surname, Dateofbirth]
[Surname, Forename, Dateofbirth]
[UFA_ID, Club]
[UFA_ID]
However, a superkey should serve not just for this relation fragment, but also for
the whole relation. Players will contain tuples with attribute values over the whole
range of their respective domains. Clearly, there will be several players who share
the same birthday, and even a pair of twins with the same surname and birthday.
By taking into account the possibility of attributes displaying the range of their
domain values, a more precise definition of a superkey may be offered.
This definition still yields many superkeys, some of which consist of attribute
sets that require several facts to be known about a player before a search can be
mounted in the relation. A superkey that minimizes the search information needed
is clearly of interest (Figure 3.17).
[UFA_ID, Club]
[UFA_ID, Surname, Club]
[UFA_ID]
[UFA_ID, Surname, Dateofbirth, Club].
[Surname, Forename, Dateofbirth]
The superkey [UFA_ID] is issued by the Utopian Football Association for each
player and is based on an attribute set with one member, containing a unique iden-
tifier, in a domain where duplicates are not allowed. Three other superkeys contain
this attribute in combination with other attributes. Any set of attributes that
contains a superkey is also a superkey.
The central interest is to pick a number of superkeys which do not contain super-
fluous attributes and which might serve as the principal identifier for each row
when it is stored in the database. Each of these optimized superkeys has the poten-
tial to be chosen and is called a candidate key. In order to give a rule for minimizing
the number and size of the superkeys, a new definition is introduced.
Three of the superkeys in Figure 3.17 cannot become candidate keys because they
each contain the subset [UFA_ID], which is itself a superkey.
Figure 3.18 shows the candidate keys and either may be chosen as the unique
identifier when the players scheme is defined in the database. UFA_IDs are known
to be unique and the combination of Surname, Forename and Dateofbirth should be
unique in the Utopian Football League.
[UFA_ID]
[Surname, Forename, Dateofbirth]
When one of the candidate keys is chosen by the database designer as a unique
identifier for a table (relation), it becomes a Primary key. DBMSs allow the specifi-
cation of a primary key at the time that the table is created. The DBMS will then
check every insertion of a row for a unique value in the primary key attribute.
Not all entity sets, although they may have candidate keys, have a primary key
chosen by the database designer. An entity set that depends on another entity set
for its existence may not have a primary key specified at the time the table is
created.
STRONG AND WEAK ENTITIES
An entity set which has a primary key is a strong entity set. An entity set which
does not have a primary key is a weak entity set.
Clearly, a weak entity set cannot be left as a table definition without some sort of
key to link it into the schema of relationships. If this were done, there would be no
way to distinguish its rows. The important characteristic of a weak entity set is that
it depends for its existence on its relationship with a strong entity set.
Consider the entity set Goals. It is clear that a goal cannot exist without a player
to score it (Figure 3.19).
A link can be created within the relation Goals by inserting an attribute indi-
cating the identity of the scorer. The only sure way of identifying that player is to
use the primary key of the Players table.
Figure 3.20 shows fragments of the Players and Goals relations as they might be
implemented in the database. Players has a primary key, [UFA_ID]. Goals has no
primary key but the primary key of the player who scored each goal has been
inserted into the Goals table.
An attribute of an entity set that is the primary key of another entity set is called a
Foreign key. Notice that, while the [UFA_ID] attribute in the Players table has
unique values, the foreign key [UFA_ID] attribute in the Goals table has duplicated
values. This enables, for example, Pierre Diocletian to score three goals. The relations
Players and Goals are members of the 1:n (one-to-many) relationship, SCORES.
In the same way, a disciplinary hearing cannot exist without a player who is
accused of an offence. Players are quite capable of having a completely unblem-
ished record with no disciplinary hearings. In the relationship between Players and
Discipline, Players is the strong entity. It has a primary key. Each row in Discipline
must contain a reference to the player who committed the offence. Discipline is
dependent on Players. It is a weak entity.
Characterizing relations as strong or weak depends on the role they play in a
particular relationship. The key to identifying strong and weak relations is to
look at the cardinality of the relationship. An entity with a [0,1] role in the rela-
tionship is very definitely strong. It can exist without being connected to any
instances of the second relation. A MinC value of zero gives the clue. An entity with
a cardinality of [1,N] is just as definitely weak. It must be associated with at least
one row of the other relation in order to exist, and furthermore, more than one of
its rows can refer to the same row in the other table. A MinC value of 1 and MaxC
value of N give the clue.
Remember that the cardinality of a relation is about the role it plays in a partic-
ular relationship. An entity may display different cardinalities in different relation-
ships. Identifying an entity as strong or weak depends on which relationship is
being considered. Figure 3.21 shows two roles that Orders takes in two different
relationships.
In the first relationship, a Customer can exist without making any orders.
Customers is the strong relation and Orders is weak. In the second, a row in Lineitems
cannot exist without being associated with a row in Orders. In this relationship,
Orders is strong. The values of MinC and MaxC determine the strong or weak status.
If the MinCs for the two relations are 0 and 1, then the relation with zero partici-
pation is strong. If both MinCs are 1, look at the MaxCs. If the MaxCs are 1 and N,
then the relation with arity 1 is strong. If both MaxCs are N then the relationship is
M:N and must be decomposed before strong and weak characteristics can be
determined.
Oracle allows the database designer to nominate attributes in weak entity sets as
foreign keys when the table structure is created. Such nomination will imply the
imposition of integrity rules into the schema.
Since tuples in the weak relation depend on the strong relation for their exis-
tence, if a tuple from the strong relation were deleted, any tuple in the weak rela-
tion that contained a reference to the deleted row would also have to be deleted.
Deleting rows in the weak relation might then trigger further deletes in other rela-
tionships where it played a strong role.
It should also be impossible to insert a row in the dependent table unless a row in
the strong relation already exists with an appropriate primary key.
These two rule examples indicate that the database is ensuring referential integ-
rity. Foreign key values in a weak relation cannot exist without a counterpart in the
primary key values of the strong relation.
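As a concrete illustration, these rules can be seen in the statements the database would reject once such a foreign key is in place. This is only a sketch: the Goals column names (match_id, minute, goal_type) and the sample key values are assumptions introduced for the example, not part of the UFL specification.

    -- rejected: no row in Players carries this UFA_ID, so the foreign key
    -- value in Goals would have no counterpart in the strong relation
    insert into goals (ufa_id, match_id, minute, goal_type)
    values ('UT99999', 1, 27, 'penalty');

    -- rejected (unless a cascade rule is declared): rows in Goals still
    -- reference this player, so deleting him would orphan them
    delete from players
    where  ufa_id = 'UT23685';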
Over the years since Chen first proposed this method of diagramming a database
design, a number of academics, systems analysts and writers have published their
own ideas on how an ER diagram should look. This book uses what is known as the
Information Engineering approach. The principal difference from Chen’s original
proposal is that the cardinalities in the relationships are coded into the line
symbols.
The Clubs entity, shown in Figure 3.22, has four attributes. The primary key is
Name. All of the entities in the UFL schema design are shown in detail in the
appendix. All of the relationships are shown in the full entity–relationship
diagram, given later in Figure 3.32.
Figure 3.22 ERD entities and attributes. The Clubs entity has the attributes Name,
Manager, Stadium and Division. An entity is shown as a rectangular box with the
name of the entity in it. Attributes are shown as ovals connected to the entity.
Keys are underlined.
Figure 3.23 shows an ERD for a typical 1:n relationship. Players must have at least
one contract, and one contract cannot refer to many players. The cardi-
nality of the relationship is [1,1] on Players and this is shown by the double bar
across the line connecting the Players entity to the relationship.
Figure 3.23 The Players–Contracts relationship: cardinality [1,1] on Players and
[1,n] on Contracts.
A contract refers to at least one player but many contracts can refer to the same
player. The cardinality is [1,n] on Contracts and this is shown by the bar and the
crow’s foot on the line connecting Contracts to the relationship.
Figure 3.24 shows that a player may have zero discipline cases and many players
cannot be defendants in one case. The cardinality of Players is [0,1]. The optional
participation of players in the relationship is shown by the zero over the line
connecting Players to the relationship.
Figure 3.24 The Players–Discipline relationship: cardinality [0,1] on Players and
[1,n] on Discipline.
A discipline case must involve at least one player and many cases can involve the
same player. The cardinality of Discipline is [1,n] and the symbols on the
connecting line show this.
When a 1:N relationship is proposed, the full ER diagram must show the primary
key attribute of the strong entity and the foreign key attribute of the weak entity.
The participation of an entity in a relationship can only have one of two possible
values, 0 or 1. The arity of an entity can have one of two possible values, 1 or n. The
cardinality of an entity in a relationship therefore can have only four possible
combinations. There are only four different line encodings to consider. Figure 3.25
shows the diagram conventions for these cardinalities.
Figure 3.26 shows an ER diagram for the players/matches relationship. In this rela-
tionship, a player plays in a minimum of 0 matches and a maximum of n matches.
A match must have a minimum of 1 player and a maximum of n players.
Figure 3.26 The Players–Matches relationship: cardinality [0,n] on Players and
[1,n] on Matches.
Implementing this M:N relationship directly would require an entity to hold a
foreign key value for every related instance in the other entity set. Holding
multiple foreign keys offends against the definition of an entity, which can only
have one occurrence of each attribute in its attribute set. There appears to be a
paradox.
The answer is to decompose or resolve the M:N relationship, creating two 1:N
relationships and a third entity to express the foreign keys of both Players and
Matches (Figure 3.27).
The teamsheet entity records the players selected for a particular match and the
matches in which a player appears. It must have at least two attributes: a foreign
key to Players and a foreign key to Matches. Whenever decomposition is performed,
the cardinalities follow the same pattern (Figure 3.28).
The original relations have the same participation in their new relationships as
they had before. Players has an optional participation with Teamsheets and Matches
has a mandatory participation. The arity of each original relation in its new
relationship is always 1. The new relation always has mandatory participation and
an arity of n in its relationships with each of the original relations.
Players and Matches are unchanged. They retain their previous primary keys and
are free to participate in other relationships with other entities. Teamsheets
depends upon the M:N relationship between Players and Matches for its existence.
This dependency is expressed through the foreign keys it contains. It has no other
attributes, so it may be termed an abstract entity. In the UFL application, it serves no
other purpose than to link players and matches and to express their M:N
relationship.
There are at least three such entities in the full UFL diagram. Contracts has
already been mentioned in an example of a 1:N relationship with Players. However,
Contracts needs not just a player but also a club to be complete (Figure 3.30). The
Contracts entity has a 1:N relationship with Clubs. It also has attributes of its own:
Joined and Left, which show when a player joined and left a club. It tracks the
player’s career and the club’s employment history.
Figure 3.30 Contracts resolves relationships to both Players and Clubs: cardinality
[1,1] on Players and on Clubs, and [1,n] on Contracts in each relationship.
Recording goals is a vital part of the UFL scenario. It is not good enough just to
put the final score in the match entity. Players need a record of their goals as well.
This 1:N relation has already been shown as an example and Goals is part of another
1:N relationship with Matches.
But just a minute! The early part of this section was spent proving such a relation
and inventing a new entity, Teamsheets, which services just this connection. Is
there not a way in which Goals and Teamsheets could be combined? This would
certainly save a lot of separate process programming and, in a real application,
space.
In this case, the answer is, unfortunately no.
The reasons are to be found in Figures 3.30 and 3.31. Players, if selected, can
appear only once in a match, but they can score any number of goals. In addition, if
they are selected, they may not score any goals in a match. A match may not have
any goals but it must have players. The participation of matches in the two M:N
relationships is different. The two relationships, Teamsheets and Goals are
describing completely different aspects of the connection between Players and
Matches.
Figure 3.31 Goals resolves a second pair of relationships between Players and
Matches: cardinality [0,1] on Players and on Matches, and [1,n] on Goals in each
relationship.
In fact, the rule that players can only be selected once for a match should be
enforced by giving teamsheets a compound primary key {UFA_ID, Match_id}. This
would ensure that the combination could only occur once in the Teamsheets
relation.
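A sketch of how the Teamsheets relation might be declared makes the point. The data types, and the assumption that Matches is identified by a Match_id column, are illustrative rather than part of the UFL specification.

    create table teamsheets (
      ufa_id   varchar2(10) not null references players (ufa_id),
      match_id number(6)    not null references matches (match_id),
      -- the compound primary key allows a player to be selected only once per match
      constraint pk_teamsheets primary key (ufa_id, match_id)
    );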
This would not work for goals. The compound key would prevent the recording
of more than one goal by a player in any one match. Further, Goals has other attrib-
utes of its own. A record is kept of the minute the goal is scored and its type – goal,
penalty, own goal. Goals is not an abstract entity. If it could be combined with
Teamsheets, these attributes would be wasted for most of the rows in a combined
relation.
Players and Matches have two quite different M:N relationships, which, for struc-
tural and semantic reasons, require two separate resolutions into 1:N relationships.
The full entity–relationship diagram for the UFL application is shown in Figure
3.32. The nine entities are present, participating in ten relationships. Each entity
either has a primary key or has a foreign key to bind it into a relationship. The
cardinalities are shown in the lines that connect entities with relationships.
Figure 3.32 The full UFL entity–relationship diagram. The nine entities appear with
their attributes (for example UFA_ID, Forename, Nationality, Squad_No and Position
for Players; Name, Manager, Stadium and Division for Clubs; Joined and Left for
Contracts; Season, AwayClub and Result for Matches; Date for Discipline) and the
relationships that connect them.
Some entities have more than one key. Players has a primary key, UFA_ID, which
is used to express a strong relationship with Discipline, Goals, Contracts and
Teamsheets. It has a foreign key, Position, which is used to express its weak relation-
ship with Positions.
Matches takes part in two relationships, with Teamsheets and with Goals, where
its primary key expresses a strong relationship to these entities. It also takes a weak
role with Seasons: a foreign key links each tuple to the permitted values for the
season in which a match is played.
When an otherwise strong relation also participates as the weak side of a relation-
ship, it is often a way of enforcing domain value restriction on an attribute.
There are some aspects of the UFL diagram that need clarification and explanation.
Players has an attribute, Club, which is not used in any relationship. Yet this
attribute relates to the primary key of Clubs and shows the current club of a player.
It is a calculated attribute and strictly does not belong in Players.
Calculated attributes can always be inferred from other values in the database.
The current club of a player can be found by examining the Contracts relation for a
player’s career history. The tuple with no date value in the Left attribute will show
to which club he is currently attached.
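In SQL terms, that inference might look something like the sketch below, assuming the Contracts columns used later in Chapter 4 (UFA_ID, Club, Joined, Left) and an illustrative player identifier.

    -- the open contract (no Left date) shows the player's current club
    select club
    from   contracts
    where  ufa_id = 'UT23685'
    and    left is null;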
There is some risk in including a calculated attribute. If the player moves club,
the Left attribute will be updated: a new Contracts tuple will be created and inserted
into the Contracts relation to show the new club and the date he joined. If the
calculated attribute is not updated in the same operation, there is a danger of the
database becoming inconsistent. The Players relation will show one value for his
current club and the Contracts relation will show another.
Calculated attributes are strictly outside the formal definitions of entities and
their tuple structures. They also offend against the principles of normalization,
outlined in the next chapter.
They are sometimes used in production databases, when performance issues are
critical, as a way of reducing the number or complexity of queries submitted by
users.
Leaving the attribute Club in the Players relation gives an opportunity later to
illustrate the trigger facility of Oracle, which can be used to assure database consis-
tency by preventing unwanted anomalies from occurring.
3.6.2 SUSPECTED M:N
Surely a club plays in many matches and matches must be between at least and no
more than two clubs. Is there not an M:N relationship here that has been mistak-
enly characterized as 1:N? Moreover, are not the attributes HomeClub and
AwayClub describing essentially the same characteristic of Matches, therefore
offending against the rules for entity structure? If so, this should have been
resolved into the additional relationship of Figure 3.33.
Figure 3.33 A possible resolution: a Fixtures entity between Matches and Clubs, with
cardinality [1,1] on Matches and on Clubs, and [1,n] on Fixtures in each relationship.
If the resolution is carried out, Fixtures will have two attributes, for foreign keys
to Matches and Clubs, and must also carry an attribute indicating whether the club
is home or away in this fixture. A fixture will consist of two tuples for each match.
However, it is known that a single match will only ever be related to two clubs
(i.e. n is 2). It can be argued that one match will relate to only one member of Clubs
in the role of Home and to only one member of Clubs in the role of Away. The arity
of clubs in the home role is 1 and in the away role is 1. Many matches can relate to
one club in the home role and many matches can relate to one club in the away
role. The arity of matches is n with respect to each of the roles of Clubs.
Matches is involved in two parallel 1:N relationships with Clubs (Figure 3.34).
These are expressed in one foreign key to Clubs in the Home role and another
foreign key to Clubs in the Away role. The attributes HomeClub and AwayClub are
foreign keys for each of these two relationships respectively and offend against
neither the rules of entity structure nor normalization.
Figure 3.34 The Clubs–Matches relationship: each of the two parallel 1:N relation-
ships has cardinality [1,1] on Clubs and [1,n] on Matches.
3.7 KEY TERMS
Primary key The candidate key chosen as a unique identifier when a table is
created in the database.
3.8 QUESTIONS
(i) Explain what is meant by the terms attribute, domain and tuple.
(ii) Explain what is meant by the terms strong relation and weak relation. Explain
how an entity can simultaneously be a strong and a weak relation.
(iii) Explain the terms participation, arity and cardinality.
(iv) Explain why an M:N relationship should always be decomposed in an ERD.
3.9 EXERCISES
CHAPTER 4
NORMALIZATION
It is often stated that of all the theories proposed in this century, the silliest is quantum theory.
In fact, some say that the only thing that quantum theory has going for it is that it is
unquestionably correct.
Michio Kaku, Hyperspace
E. F. Codd’s first article on the application of set theory to stored data led to the
development of the relational data model. Part of this application of set theory
gives the definitions of entity sets participating in relationships. By using just
these definitions, the entity–relationship diagram can illustrate the entity types in
a database schema and the relationships that exist between these entity types.
If the entity–relationship model is derived from mathematical principles, is
there a way to check our data analysis in a rigorous way to ensure its correctness? In
fact, the entity–relationship model depends on this rigour at the initial stages to
ensure that the query language, SQL, works properly.
The query language is based on defined ways to manipulate sets, subsets and the
relationships between them. It is founded on the relational calculus: a series of set
manipulation functions within set theory that, in turn, depend upon the sets being
well formed. The sets – their attributes, the primary and foreign keys, the relation-
ships in the schema – must conform to certain and definite rules that enable the
database design to be checked for accuracy, consistency and correctness.
These rules define a series of so-called Normal Forms that allow the proposed
entity sets and their attributes to be verified against the underlying mathematical
principles of set theory and the needs of a query language based on the relational
calculus.
The first rule is that attributes of an entity should have atomic values and that each
attribute should occur only once in the attribute set for each entity. Attributes
describing the same characteristic of an entity must occur only once.
Version I of the Players relation in Figure 4.1 attempts to record the disciplinary
history of each player. In the Utopian Football League, discipline is generally good
and few players get more than two suspensions in their career. Disciplinary hear-
ings are held following a referee’s report and generally result in a match suspension
for the player.
This design allows the date and result of any number of hearings to be recorded
for each player. However, there are attributes that contain multiple values. If this
table were to be queried for the dates of Gregory’s hearings, a single string value –
the concatenation of the two dates – would be retrieved.
The same thing will happen for a query on the outcomes of his hearings. Further,
it is not clear what the names of the attributes are. There seems to be an attribute
called Disciplinary hearing divided into two sub-attributes: Date and Result. All of
this offends against the rule that attributes must be single-valued and that these
values must come from one domain. Entities must have atomic attributes.
The second design (Figure 4.2) attempts to rectify this by allowing two hearings
to be recorded. Each of the attributes has distinct names; the values in each are
atomic and come from single domains.
This relation is in trouble already, because if Gregory is involved in another inci-
dent there will be no space to record the hearing, and for three players space is
wasted because they have led blameless lives (on the pitch at least!).
A query on this table would have to refer to what is essentially the same charac-
teristic of a player by two different attribute names. Moreover, it can be seen that
details of disciplinary hearings are not something that every player has.
Although this is a design that a relational database would accept, it has logical
faults in terms of the definition of an entity. As far as the rules for the construction
of an entity set are concerned, dh1_date and dh2_date are repeated attributes and
may not even be attributes for some players.
The solution to this is to recognize that Discipline is in fact a separate entity from
Players and has an attribute set with two members, Date and Result (Figure 4.3). It
has a relationship with Players, expressed through a foreign key.
Discipline is a weak relation, dependent on the strong relation, Players. The third
design splits the original versions into two entities, adding a foreign key to Disci-
pline that indicates the row in Players for whom this is a history.
The rest of the normalization process concerns the connection between
key and non-key attributes in a relation. There are difficulties that can arise later
when the query language is applied to an implemented relation if this connection
is not well-formed.
The particular problems associated with relations that do not have a well-formed
key are known as update anomalies and they can arise when tuples are inserted,
deleted or modified.
Figure 4.4 shows a tentative design for the contracts relation that records the
employment history of players. It identifies the player, shows when he joined or
left a particular club, and names his manager.
If a well-formed key is not chosen for this relation and for the other entities with
which it has a relationship, it can exhibit all the different forms of the update
anomaly.
It may be possible to insert a new contract for a player who does not exist in the
Players relation and assign him to a club that does not exist in the Clubs relation.
If the player with UFA_ID UT23685 leaves Beauty United, the Left attribute in the
Contracts relation must be updated. If an update to the contracts relation is
attempted, putting a date value into the Left attribute but identifying the tuple
only by UFA_ID = 'UT23685', three tuples will be modified instead of just the one
intended. What is worse, the data in two of the tuples will be overwritten and
previous values lost.
If this same player were to change his surname, the Player relation would be
updated. This would have no effect on the Contracts relation. There would then be,
within the database, a UFA_ID referring to a player by two different surnames in
two relations.
In the Contracts relation (Figure 4.4), it can be seen that if a set of attributes is
considered so that X is {UFA_ID, Club} and a second set, Y is {Surname, Manager}, the
same combination of UFA_ID and Club values implies the same values of Surname
and Manager in combination (Figure 4.5).
At first glance it might seem probable that the two key attributes UFA_ID and
Club would play a deterministic role in the contracts relation. However, looking at
the career of the player Erasmus, it can be seen that he rejoined Purity FC after a
spell at Sobriety FC. The same combination of attributes for UFA_ID and Club does
not determine the same combined set of values for Surname, Joined, Left and
Manager. Therefore there is no functional dependence between these two sets of
attributes.
There may be more than one functional dependency within a relation.
Remember that the definition is in terms of a set of attributes influencing another
set of attributes. Just as several superkeys could be determined in a relation, more
than one functional dependency can probably be found (Figure 4.7).
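Whether a proposed dependency such as {UFA_ID, Club} → {Surname, Manager} actually holds in the stored data can be checked with a grouping query. This is a sketch only, assuming the Contracts columns shown in Figure 4.4.

    -- any UFA_ID/Club pair associated with more than one Surname or Manager
    -- value would contradict the proposed functional dependency
    select ufa_id, club
    from   contracts
    group  by ufa_id, club
    having count(distinct surname) > 1
        or count(distinct manager) > 1;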
Reflexive rule: If Y is a subset of X, then X → Y
Augmentation rule: If X → Y, then XZ → YZ (notation: XZ stands for X ∪ Z)
Transitive rule: If X → Y and Y → Z, then X → Z
From the first three rules, it is possible to infer three more rules, shown in Figure
4.9.
Closure of a set F of functional dependencies is the set F+ of all the functional
dependencies that can be inferred from F using Armstrong’s inference rules. If we
look at the rule examples above, then we have at least the functional dependencies
of Figure 4.10.
Second Normal Form is a check that we have chosen a key attribute or set of attrib-
utes that determines the values of all of the non-key attributes in the relation. It
also specifies that the non-key attributes must be functionally dependent on the
full key. If we have a single attribute as a superkey then that poses no problem,
because if a single attribute determines all the other attributes, they must be func-
tionally dependent on the full key. However, all the functional dependencies in
the Contracts relation that would yield a potential candidate key involve more than
one attribute.
There are several superkeys, based on functional dependencies that relate all the
attributes in the relation (Figure 4.11). In all cases, not all of the non-key fields are
fully functionally dependent on the whole of the superkey.
Decomposition rule: If X → YZ, then X → Y and X → Z
Union rule: If X → Y and X → Z, then X → YZ
Pseudo-transitivity rule: If X → Y and WY → Z, then WX → Z
{UFA_ID} → {Surname}
{Club} → {Manager}
Discarding the superkeys that contain these dependent attributes leaves us only
two superkeys, {UFA_ID, Joined} and {UFA_ID, Left}, that might become candidate keys.
Although a relation may be in 2NF, there may be non-key attributes that have a
deterministic relationship. The second design still allows updates of the Manager
attribute in Contracts without synchronizing the Manager attribute in Clubs,
causing an anomaly.
The revised Contracts relation exhibits a dependency between the non-key fields
Club and Manager. The Manager attribute has a transitive dependency on the key
(Figure 4.14).
The way to remove the dependency is again to leave the LHS of the transitive
dependency in the relation and create a second relation based on the dependency.
When this is done, it must be remembered that duplicate tuples are not allowed or
the new relation will not have a candidate key.
Figure 4.15 shows the revised Contracts relation. The transitive key, Club, and its
dependent attribute, Manager, form the basis of the Clubs relation. Removing a
transitive dependency always yields two relations with a strong/weak relationship.
The transitive attribute is a foreign key in the original relation; it is a primary key in
the new relation.
4.6 BOYCE–CODD NORMAL FORM
This rule is important only when the candidate keys are composite and they
overlap. That situation does occur in Contracts, but none of the parts of the candi-
date keys is a determinant. There is no dependency between UFA_ID (the overlap-
ping attribute) and any of the other attributes.
If the player’s Position had been included in the attributes for contracts, then the
situation would be different. The candidate keys would be the same, but the over-
lapping attribute, UFA_ID, would be a determinant for Position (UFA_ID ® Position).
Since UFA_ID is not itself a candidate key, the relation would not be in BCNF. Not
all determinants would be candidate keys. Boyce–Codd violations are rare when
the removal of partial key dependencies and transitive dependencies has brought
the relation to 3NF.
The BCNF rule ensures that all the non-key fields are describing an aspect of the
entity in question; it rules out any non-key field that describes a characteristic of a
part of a composite key.
In selecting a primary key, the candidate key {UFA_ID, Joined} is preferable to
{UFA_ID, Left} because the attribute Left can contain the null value for a player who
is still with a club. This latter combination still yields a valid key, but in practical
terms, and, given that there is a good alternative, {UFA_ID, Left} should be rejected
because it may lead to unnecessary complications in queries on the relation if it is
implemented in the database.
The candidate key, {UFA_ID, Joined} is chosen as the primary key of the Contracts
relation (Figure 4.16). This satisfies the rule for Boyce–Codd Normal Form.
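Expressed as Oracle table definitions, the normalized design might be sketched as follows. The column names follow the figures in this chapter, but the data types and sizes are assumptions.

    create table clubs (
      name    varchar2(30) constraint pk_clubs primary key,
      manager varchar2(30) not null
    );

    create table contracts (
      ufa_id varchar2(10) not null,
      club   varchar2(30) not null
             constraint fk_contracts_club references clubs (name),
      joined date         not null,
      left   date,        -- null while the player is still with the club
      constraint pk_contracts primary key (ufa_id, joined)
    );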
It may be that neither of these options is available because the business rule is
essential and because the query is so long that it cannot be rescheduled. In these
circumstances, there is a temptation to use a third tactic – de-normalize the data
structure to enable the query to be broken down into manageable parts.
This route is sometimes taken in large production databases, although, with
increasing CPU and disk speeds, it is becoming less common. The decision to de-
normalize should never be taken lightly because the database’s consistency is at
stake. Introducing de-normalized structures opens the database to the update
anomalies reviewed at the start of this chapter. If the decision is taken and the
affected tables are part of the online database, then it must be accompanied by the
creation of automatic controls to protect the consistency of the data. In Oracle,
these controls are known as triggers and are discussed in detail in Chapter 10.
A more secure option is to create temporary tables that hold the de-normalized
data as a snapshot of the data at a certain time and which can be segregated from all
other users. Such temporary tables will not contain updates made after they were
created, in which case the query must be qualified as having only a temporal
consistency with the main data.
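Oracle's create table ... as select (covered in Section 6.9) is one way to take such a snapshot. The following sketch assumes the Contracts and Clubs tables of the previous section; the snapshot name is illustrative.

    -- a de-normalized copy, frozen at the moment it is created
    create table contracts_snapshot as
      select c.ufa_id, c.club, c.joined, c.left, cl.manager
      from   contracts c, clubs cl
      where  cl.name = c.club;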
Second Normal Form – 2NF
Take any attributes that exhibit only partial dependency on the superkeys out of
the relation to form a separate relation with the partial key. Leave the partial key
in the original relation. Remove any duplicated tuples from the new relation to
preserve functional dependency.
Boyce–Codd Normal Form – BCNF
Identify candidate keys from the 3NF relation; ensure that all determinants are
candidate keys. Remove any offending dependencies by extracting the dependent
attribute to another relation. Choose a primary key from the candidate keys in the
original relation. Base the choice on minimal attribute set membership, no null
values in keys, and convenience. Assign the attributes of the primary key as prime.
4.8 EXERCISES
(i) Describe the kinds of adverse effects on database consistency resulting from
creating a relation that is not in:
– First Normal Form
– Second Normal Form
– Third Normal Form
– BCNF
(ii) Design the entities for a database to serve one of the following:
– An athletics competition
– A criminal intelligence system
– A car hire company
Show that your design has all entities in BCNF.
CHAPTER 5
THE ROOTS OF SQL
Figure 5.1 The roots of SQL: tuple relational calculus, relational calculus,
relational algebra, predicate logic, propositional logic, algebra and calculus.
5.1 CALCULUS
The word calculus has come to mean a way of calculating. Differential calculus is
applied to very small differences of values. It allows a study of the relationships
between quantities. When the relationship is expressed in an equation involving
two or more quantities that are capable of containing values varying over a contin-
uous range, the rate of change of values (gradient) can be determined through this
method.
When the quantities may hold only one of two values, true or false, the problem
becomes one of logic; other methods need to be adopted.
A sentence composed of only one atom can only have one of two values, true or
false. A sentence composed of two atoms, each having two possible values, can
have four states and so on. A truth table can demonstrate the semantics or meaning
of expressions in propositional logic.
The two propositions, p and q, can each have two possible values; so there are
four combinations. Only if both p and q are true can the sentence p∧q be true. The
truth table that defines the basic propositional connectives is outlined in Figure
5.4.
p   q   ¬p   ¬q   p∨q   p∧q   p⇒q   p⇔q
T   T   F    F    T     T     T     T
T   F   F    T    T     F     F     F
F   T   T    F    T     F     T     F
F   F   T    T    F     F     T     T
∃   there exists
∀   for all
∀c ∃s1 ∃s2 (likes(s1,c) ∧ ¬likes(s2,c))   For each course c there exists a student s1
that likes the course, and a student s2 that
doesn’t like it
The sentence in Table 5.2 must be capable of resolving into either a true or a false
value by reference to the Courses relation and the likes and dislikes of the Students.
Some more examples of predicates are given in Figure 5.6.
For all courses in set c whose elements are BIS101 and BIS106, there is a student who
likes all the courses in c and a student who dislikes all the courses in c. The targets of
the query are the two free variables, s1 and s2; c is a quantified variable (Figure 5.7).
The student set consists of {Bill, Jana, Wilson}. The values of the likes and dislikes
predicates are shown in the truth table in Figure 5.8.
Bill likes both courses so, in the s1 role, his likes predicate is true. The dislikes
predicate for Bill in the s2 role is false.
The domain of a query consists of the tuples that may be assigned to the free
variables of the formula (Figure 5.9).
There is a Cartesian product between the students in two roles, s1 and s2 and it
will have nine different combinations.
An assignment of values to the free variables of a formula is a tuple that provides
a true or false value to the formula. The selection of a query is defined by the set of
assignments to the free variables that satisfy the formula when the two predicates
are connected by the conjunction operator.
Figure 5.8 shows the likes and dislikes values that each of the students brings indi-
vidually to the s1 and s2 roles. The final evaluation in Figure 5.10 is for the
conjunction operator, ∧. In order for a tuple to evaluate true there must be a true in
the s1 role and a true in the s2 role.
s1 role        s2 role        ∧
Jana F Jana F False
Bill T Bill F False
Wilson F Wilson T False
Jana F Bill F False
Jana F Wilson T False
Bill T Jana F False
Bill T Wilson T True
Wilson F Jana F False
Wilson F Bill F False
Only one tuple of the Cartesian product returns a true value. Therefore the selec-
tion set is as shown in Figure 5.11.
s1 s2
Bill Wilson
The tuple relational calculus is simply a way of formally defining the result of a
query. It has nothing to say about the mechanism by which the selection set can be
achieved. Because it is composed of a specified grammar, the set can be defined
unambiguously.
This would not be the case if the set were to be defined, say, in English, which is
subject to international, national and even regional differences in usage.
Other forms of syntax have been developed which also rely on a specification of
the resultant set. Among these is the Structured Query Language (SQL). A formal
sentence in SQL is composed of three main clauses, as shown in Table 5.3.
Clause Purpose
Select ... Define the selection set in terms of free variables
From ... Define the domain in terms of bound variables (relations)
Where ... Define the formula relating the bound and free variables
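Mapped onto the students and courses example, the three clauses might be used as in the sketch below. The relation and column names (students, courses, likes and dislikes, with student and course_id columns) are assumptions introduced for the illustration; the nested not exists sub-queries express the 'for each course' quantification.

    select s1.name as s1, s2.name as s2     -- the selection set: the free variables
    from   students s1, students s2         -- the domain: students in the two roles
    where  not exists (                     -- the formula: no course in the set fails either predicate
             select *
             from   courses c
             where  c.course_id in ('BIS101', 'BIS106')
             and (    not exists (select * from likes l
                                  where  l.student   = s1.name
                                  and    l.course_id = c.course_id)
                   or not exists (select * from dislikes d
                                  where  d.student   = s2.name
                                  and    d.course_id = c.course_id)));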
The project operator extracts entire attribute columns from a relation into a
resultant or intermediate set. A projection may be specified on just one attribute or
an attribute set. Project is applied to all the tuples of a single relation.
The select operator extracts entire tuples from a single relation into a resultant or
intermediate relation. Tuples are extracted when they meet selection criteria
based on values in one or more attributes.
The union operator combines two relations, which must consist of the same
attribute sets, into a single resultant or intermediate relation consisting of all of the
tuples from the two original relations and with the same attribute set.
The difference operator compares two relations, R and S, which must also
consist of the same attribute sets, and extracts all the tuples in R that do not also
occur in S.
The Cartesian product operator combines all of the tuples in relation R with all
of the tuples in relation S. The resultant set consists of the sum of the attribute sets
of the contributing relations. The number of tuples in the result will be the number
of tuples in R multiplied by the number of tuples in S.
The union and difference operators are drawn straight from set theory. A relation
is a set of tuples, so set operations should apply. The result of combining two rela-
tions with a set operator is another relation. All of the elements of the new relation
must be tuples, having the same structure, so the scope of set operators is limited to
union compatible relations. Two relations are union compatible if both have the
same number of attributes, the names of the attributes are the same in both and
attributes with the same name in both relations have the same domain.
From the five basic operators, a further five secondary operators can be derived.
The intersection operator is also drawn from standard set theory and can only be
applied against two union compatible relations.
INTERSECTION   R ∩ S = R – (R – S)
The set of tuples in R which also occur in S
THETA JOIN   R ⋈F S = σF(R × S)
The select condition, F, is a conjunction of terms of the
form A op B where A is an attribute in R, B is an attribute
in S and op is one of =, <, ≤, >, ≥, ≠
EQUIJOIN   R ⋈A S = σA(R × S)
The select condition, A, is a conjunction of equalities
NATURAL JOIN   R ⋈A S = σA(R × S)
The select condition equates all and only those attributes
in R and S with the same name
The three set operators within the relational algebra may be illustrated by use of a
Venn diagram.
There are two sets or relations R and S represented by two circles containing the
tuples that are the members of each set. Some tuples may be members of both sets
but within each set, there are no duplicate tuples (Figure 5.13).
Figure 5.13 Venn diagrams for the three set operators: union R ∪ S, difference
R – S and intersection R ∩ S.
Projecting two columns, Name and Stadium, from Clubs yields a result set for each
row in Clubs. The project operator creates vertical slices (all rows) through the rela-
tion (Figure 5.15).
The selection criteria in Figure 5.16 use the Oracle operator like which compares
the value in the stadium attribute for each row with the ‘B%’ string. The % symbol
is a wildcard, so the term means any string beginning with a B and followed by
anything. The select operator creates a result set of horizontal slices (all attributes)
through the relation.
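In SQL, the two operations described for Figures 5.15 and 5.16 might be written as follows, assuming the Clubs table created in Chapter 6.

    -- projection: two columns, every row
    select name, stadium
    from   clubs;

    -- selection: every column, only the rows whose stadium begins with B
    select *
    from   clubs
    where  stadium like 'B%';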
CHAPTER 6
SQL
Query: Whether the difference between a mere computer and a man of science be not, that
the one computes on principles clearly conceived, and by rules evidently demonstrated,
whereas the other doth not?
Bishop George Berkeley
A method has been established whereby information about our scenario may be
captured in a formal and methodical way. The object-oriented analysis method-
ology leads into using the Unified Modeling Language (UML) as a diagramming
standard.
The UseCase diagrams help define what the UFL application will do; the class
diagram enables a view of the data the application will use.
Because the Oracle database will be used, it has been necessary to convert the
classes into logical structures compatible with the relational model. That model
has been defined in terms of entities and relationships and applied to create an
entity–relationship diagram.
The process of normalization then gives a formal method to assure the correct-
ness of the ER diagram. The process of transferring the design schema into reality
can now begin. Oracle establishes and maintains the physical structures that will
become part of the application by means of SQL.
SQL, the Structured Query Language, is a mature, powerful and versatile rela-
tional query language. The history of SQL extends back to IBM research begun in
1970.
In 1974, IBM began the System/R project and developed SEQUEL, or Structured
English Query Language. System/R was implemented as an IBM prototype in 1975.
It was then completely rewritten in 1976–1977 to include multi-table and multi-
user features.
The American National Standards Institute (ANSI) standardized SQL in 1986 and
the International Organization for Standardization (ISO) standardized it in 1987.
The United States government’s Federal Information Processing Standard (FIPS)
adopted the ANSI/ISO standard. In 1989, a revised standard known commonly as
SQL89 or SQL1 was published.
Due partially to conflicting interests from commercial vendors, much of the
SQL89 standard was intentionally left incomplete, and many features were labelled
implementer-defined. In order to strengthen the standard, the ANSI committee
revised its previous work with the SQL92 standard ratified in 1992 (also called
SQL2).
This standard addressed several weaknesses in SQL89 and set forth conceptual
SQL features which, at that time, exceeded the capabilities of any existing RDBMS
implementation. In fact, the SQL92 standard was approximately six times the
length of its predecessor. Because of this disparity, the authors defined three levels
of SQL92 compliance: Entry-level conformance (only the barest improvements to
SQL89), Intermediate-level conformance (a generally achievable set of major advance-
ments) and Full conformance (total compliance with the SQL92 features).
More recently, in 1999, ANSI/ISO released the SQL:1999 standard (also called
SQL3). This standard addresses some of the more advanced and previously ignored
areas of modern database systems, such as object relational database concepts, call
level interfaces and integrity management. SQL:1999 replaces the SQL92 levels of
compliance with its own degrees of conformance: Core SQL:1999 and Enhanced
SQL:1999.
SQL is divided into two main groups of commands. There are commands to define
storage structures – the data definition language (DDL) – and commands to manage
the stored data – the data manipulation language (DML).
The DDL consists of five principal verbs: create, grant, revoke, alter and drop.
These form the basis of commands to create tables, indexes and other structures; to
grant or revoke access to other users; to alter the internal definitions of stored
structures; and to drop the structures from the data dictionary along with any data
they may contain.
6.3 CREATE
The first task in implementing the UFL application is to translate the design in the
ER diagram into Oracle tables. In general, each entity will equate to one table.
Each attribute has a name and is specified as being of a particular type such as an
integer, a string, a date and so on. These simple types are described in Table 6.1. A
full list of the standard Oracle data types is given in Section 22.4.
Each of these types either has a specific size or is capable of having its size
defined. Generally, if the data type has a size parameter, this must be present when
a column is defined. The three most frequently used data types are strings,
numbers and dates. Table 6.1 shows the Oracle data types used in the UFL
application.
The number type can hold integers, fixed decimal currency or variable decimal
values depending on the precision and scale definitions (Table 6.2).
Figure 6.1 shows the first attempt at creating the Players table using the SQL
command create table. The command specifies the name of the table and the names
of each of the columns. The columns have been assigned a data type and some defi-
nitions have been restricted to disallow null values when a row is inserted or when
a value is updated. A primary key has been identified and, as a side effect of the
command, an index with a system assigned name will have been created automati-
cally. Whenever a row is inserted into the players table, the index will be
maintained.
The syntax of the create command simply requires an indication of the type of
database object to be created and its name. Here the database object is a table and
its name is players. The column definitions are enclosed in brackets and each
column definition is separated from the next by a comma.
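A sketch of the kind of statement Figure 6.1 contains is given below. The column list, data type sizes and the choice of UFA_ID as primary key follow the ERD, but the details are assumptions rather than a reproduction of the figure.

    create table players (
      ufa_id      varchar2(10) primary key,   -- a system-named index is created for the key
      surname     varchar2(30) not null,
      forename    varchar2(30) not null,
      dateofbirth date         not null,
      nationality varchar2(20) not null,
      club        varchar2(30)
    );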
6.3.1 CONSTRAINTS
Look at the Club entity in the ERD of Figure 6.2. It has four attributes. Each of these
attributes has a domain of values and one of the attributes is the primary key. The
Club entity is a strong relation and is in BCNF.
This entity translates into a table with four columns and with Name as the
primary key.
A constraint can be specified at the time that the table is created and is a clause
within the create table command (Figure 6.3). It has a name and its specification is
stored separately from the table definition in the data dictionary. Because of this, a
constraint can also be added after the table has been created and it can be dropped
or altered after table creation.
Constraints offer a more flexible way of specifying the nature of a table’s
columns or the behaviour of the table resulting from changes in the data values in
other tables that participate in a relationship.
Constraints may be defined as the Clubs table is created by including a constraint
clause within the definition for each column. This is called a column constraint
(Figure 6.3). In this first example, the constraint is defined in the clause relating to
the specification of the name column of Clubs. The constraint is called pk_clubs and
it defines the name column as a primary key. The index created will be called
pk_clubs.
The alternative is to define the constraint after defining all the columns.
Adopting this syntax means that the columns involved must be indicated and a
table constraint is created. In Figure 6.4, the alternative syntax at the end of the
column definitions again indicates a constraint called pk_clubs and the column
with which it is associated. Again, this results in the creation of an index on the
name column; the index will be called pk_clubs. The other constraints require the
other columns to have real and valid values set when a row is inserted or updated.
Before the clubs table can be re-created using this new format, the first version
must be destroyed. The drop table command (Figure 6.5) deletes any data in the
table as well as the table structure itself. Use it with care!
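The following sketch shows the drop command together with the two constraint styles; the data type sizes are assumptions.

    drop table clubs;   -- removes the first version and any data it holds

    -- column constraint form: the constraint is declared with the column
    create table clubs (
      name     varchar2(30) constraint pk_clubs primary key,
      manager  varchar2(30) not null,
      stadium  varchar2(30) not null,
      division varchar2(10) not null
    );

    -- table constraint form: the same constraint declared after the columns
    -- (an alternative to the statement above, shown here as a comment)
    --   constraint pk_clubs primary key (name)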
The relationship that exists between clubs and players can be defined in the
structure of the table Players. There are two ways to do this. The previously defined
table Players can be dropped and a new version created (Figure 6.6) or the table defi-
nition can be altered.
In Figure 6.6, the Players table is dropped. A new version of Players is then
created, using constraints to define a primary key and disallowing null values in
the other columns. A further table constraint called fk_club, which is associated
with the club column, is then defined. This constrains values placed in the Players’
column club to match pre-existing values in the name attribute of the Clubs table.
In other words, a foreign key has been specified in Players that is the basis of the
1:N relationship between Players and Clubs. Every time a row is inserted in the
Players table, the database will ensure that the player is assigned to a valid club or
else it will disallow the insertion.
If an attempt is made to delete a row in Clubs and there are players assigned to
that club, the deletion will be disallowed because the result would be a value in
some Players’ rows referencing a non-existent row in Clubs. Notice that commas
separate the table constraints.
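A sketch of the re-created Players table along the lines of Figure 6.6 follows; the column list and sizes carry over the assumptions of the earlier sketch.

    drop table players;

    create table players (
      ufa_id      varchar2(10) constraint pk_players primary key,
      surname     varchar2(30) not null,
      forename    varchar2(30) not null,
      dateofbirth date         not null,
      nationality varchar2(20) not null,
      club        varchar2(30) not null,
      -- the foreign key implements the 1:N relationship between Clubs and Players
      constraint fk_club foreign key (club) references clubs (name)
    );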
Constraints may be used to enforce structural or referential integrity for primary
and foreign keys. Other types of constraint may be used to assure behavioural
integrity; that is, conformation to the business rules affecting the application. This
is usually associated with checking values entered in attributes when rows are
inserted or updated.
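For example, a check constraint could be added to Players to impose a hypothetical business rule that squad numbers lie between 1 and 99; the rule itself is illustrative rather than part of the UFL schema:

alter table players
  add constraint ck_squad_no check (squad_no between 1 and 99);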
6.4 ALTER
The constraint in Figure 6.8 allows a cascade delete of rows in Players consequent
on a proposed deletion of a row in Clubs. In tables that have foreign key constraints
based on a weak relationship to the primary key of a deleted Players’ row, there may
be collateral deletions. For example, the Goals table and the Discipline table contain
a dependency on Players. Deleting a club could result in deleting all the players
assigned to that club, all the goals scored by those players and all the disciplinary
actions against those players. You can see why it is called a cascade delete. If the
foreign key constraint in Goals does not have a cascade delete defined, then a
cascade delete triggered in Players from Clubs will be aborted and the originating
action cancelled.
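A foreign key that permits a cascade delete, in the manner of Figure 6.8, can be put in place by dropping and re-adding the constraint; this is a sketch rather than a reproduction of the figure:

alter table players drop constraint fk_club;

alter table players
  add constraint fk_club foreign key (club)
      references clubs (name) on delete cascade;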
6.5 GRANT
She also has a resource role which enables her to create a private schema and,
within that, to create tables, indexes and other data structures. She can insert rows
into her tables, issue queries, update or delete rows and drop or alter structures
owned by her.
In addition, this user has been granted certain system privileges by the DBA.
These are shown in Figure 6.10 and include create type, execute any type and unlim-
ited tablespace.
Myfriend may be permitted to exercise any privilege on the Clubs table and to pass
these on to any other user (Figure 6.12).
The scope of access given can be restricted by specifically indicating the privi-
leges granted. Myfriend can be allowed only to query and update existing rows but
not insert or delete rows and not to alter or drop the table (Figure 6.13).
The keyword public allows the privilege to be given to all users. The Clubs table
can be made available to all users, allowing them to read the data but not to change
it in any way (Figure 6.14).
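Sketches of the three grant statements described for Figures 6.12 to 6.14 might read:

grant all on clubs to myfriend with grant option;

grant select, update on clubs to myfriend;

grant select on clubs to public;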
6.6 INSERT
The insert verb allows rows to be inserted into a table. The command must conform
to any constraints on the table such as disallowed null values, unique primary keys
or foreign keys referencing other tables. If there are check constraints on any
column, these will also be triggered by the insertion of a row (Figure 6.15).
Figure 6.16 shows the simplest form of the insert command. The table expression
is a single table. All four columns have been cited in the columns section since the
table was created with not null constraints for all columns.
The values clause lists the string constants corresponding to the order of the
columns section. If the columns section is omitted, the values must be ordered as the
columns were specified in the table creation statement. Multiple insert statements
can be grouped together as in Figure 6.17 to populate the Positions table.
Now that there is one row in the Clubs table, rows can be inserted in the Players table
(Figure 6.18). Because the Players table has a foreign key constraint on it, inserts
will fail if there is not a corresponding club to which players may be assigned.
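A sketch of such inserts follows; the club row must exist before a player row can reference it. The manager, stadium, division and date of birth values are purely illustrative:

insert into clubs (name, manager, stadium, division)
values ('Beauty United', 'Juno', 'Nirvana Park', 'Premier');

insert into players (ufa_id, surname, forename, dateofbirth,
                     squad_no, position, club)
values ('BT46286', 'MORE', 'MARC',
        to_date('12/02/1980', 'DD/MM/YYYY'),
        16, 'defender', 'Beauty United');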
Program files to create and populate the design schema are available from this
book’s web site.
6.7 SELECT
The SQL select command is used to read the data stored in tables available to the
user. It consists of three major clauses used to specify the target of the query, the
domain or data source, and any criteria to be applied against the source.
The database will check that the columns in the select clause are to be found in
the selection domain, the from clause. Any ambiguity, such as columns in more
than one table having the same name, must be resolved with pseudonyms. If the
selection predicate is left out then all the rows in the selection domain will
contribute to the result set (Figure 6.19).
select UFA_ID, forename,           -- selection set: these are the columns that
       surname, dateofbirth, club  --   will be projected into the result
from players                       -- selection domain: the source of the data
where club = 'Beauty United';      -- selection predicate: conditions for inclusion
                                   --   in the selection set
The example in Figure 6.20 shows a query that demonstrates the first two clauses
of a select statement. Firstly, the target selection set is defined as five columns from
the Players table and secondly the Players table is confirmed as the domain or
source of the query. The column Dateofbirth is given a pseudonym or alias. There is
no selection criterion, so the whole Players relation contributes to the result.
Twenty-two rows will be selected.
If all the attributes are required and no pseudonyms are needed then a useful
shorthand is available (Figure 6.21).
6.7.1 RESTRICTIONS
In order to introduce a restriction on the rows of players in the result, a where clause
is used. The where clause in Figure 6.22 contains a selection predicate or selection
criterion that all returned rows should have a value in the club column that equals
‘Beauty United’. This query returns 11 rows.
Oracle’s like operator, illustrated in Figure 6.23, performs string comparisons
between its argument and the value of Clubs in each row of the Players table. The
syntax of the argument, ‘B%’, means that the target string should have a B as its
first character, followed by any other characters. The % symbol is a wildcard for
this operator.
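A query of the kind Figure 6.23 describes might look like this:

select surname, forename, club
from players
where club like 'B%';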
More than one restriction can be included in the where clause by using the logical
operators and, or. Care must be taken when combining these operators to ensure
that they are not mutually exclusive (Figure 6.24).
A player cannot have a surname that begins with a B and at the same time begins
with an S. The group of players whose surnames begin with S and B is perhaps
understandable in English, but not in SQL. The selection predicate is applied to
each row and must be satisfied in total if that row is to be included in the result set.
In fact, this is an impossible condition.
The correct operator is an or (Figure 6.25). This is the correct way of putting the
question in English as well.
Comparison restrictions, using the comparison operators <, <=, =, >=, > and <> can also
be applied (Figure 6.26). The comparison operators can operate on numbers, strings
or dates and the output can be sorted by one or more attributes of the selection
domain, whether it is in the selection set or not (Figure 6.27).
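A sketch of a comparison restriction combined with an order by clause; the restriction itself is illustrative:

select surname, forename, squad_no
from players
where squad_no > 10
order by surname;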
Table 6.3 shows some of the SQL*Plus formatting commands and their effects.
set linesize 75
set pagesize 40
set headsep |
set num 6
clear breaks
clear computes
rem The SQL query, output must be ordered for the breaks to work
select club, ufa_id, forename, surname, squad_no, position
from players
order by club, position;
breaks cleared
computes cleared
Squad
Club. UFA Reg. Forename. Surname. Number Position
------------- -------- --------- ----------- ------ ----------
Beauty United BT46286 MARC MORE 16 defender
QT873456 EDWARD KANT 4
UT23685 GEORGE CONSTANTINE 5
FT246517 ALAN GREGORY 9 forward
TY48429 WINSTON PLATO 7
VT432167 PIERRE DIOCLETIAN 11
SC159647 RAMESH THACKERAY 2 goalkeeper
AT245876 BRIAN HEGEL 27 midfield
KL23659 CHARLES PAUL 8
ED24659 VICTOR SOCRATES 12
UT236965 HENRY DICKENS 6
************* -------- **********
club count 11
--------
TOTAL 22
Figure 6.29 (p. 117) shows the SQL*Plus output after the formatting statements
have been applied and the SQL statement executed.
The create command allows a table to be created and populated with data as a result
of a select statement. Essentially this command is copying data from one table to
another, creating the second table as part of the same query.
FULLNAME
-------------------
ANTONIO BACON
PETER AUSTEN
EDUARDO ERASMUS
FRANCIS JUSTINIAN
PAUL JOHN
NIKOS ARISTOTLE
EDGAR TIBERIUS
DEREK MARX
JOHN INNOCENT
LUCIUS AUGUSTUS
LUIGI SHAKESPEARE
11 rows selected.
Figure 6.30 shows a simple example. Data from the Players table is extracted with
the select statement, which concatenates the forename and surname attributes of
Players and populates purity_players. The new table has one attribute, fullname.
Columns take the same name and data type as in the selection set. The select
clause can contain any SQL function, join, sub-query, union or other clause
permitted for a standalone select. Column constraints on the target table may be
added afterwards with an alter table statement.
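A sketch of the create with select that Figure 6.30 describes; the restriction to Purity FC is inferred from the eleven rows shown above:

create table purity_players as
  select forename || ' ' || surname fullname
  from players
  where club = 'Purity FC';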
The insert command also allows data to be copied from one table to another pre-
existing table by using a select statement in place of the values clause. If the
columns for insertion are included in the insert clause, values from the selection set
must match for data type, but do not have to have the same name.
Figure 6.31 shows an insert statement into purity_players that adds just the fore-
name column of all players. The second select statement yields 33 rows: the 11
original rows and the 22 just inserted.
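A sketch of the insert with select described for Figure 6.31:

insert into purity_players (fullname)
  select forename
  from players;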
FULLNAME
---------------------------------------------------
ANTONIO BACON
PETER AUSTEN
EDUARDO ERASMUS
FRANCIS JUSTINIAN
PAUL JOHN
NIKOS ARISTOTLE
EDGAR TIBERIUS
DEREK MARX
JOHN INNOCENT
LUCIUS AUGUSTUS
LUIGI SHAKESPEARE
ANTONIO    \
PETER       |
BRIAN        > inserted rows
MARC        |
EDUARDO    /
...
6.11 SEQUENCES
Oracle provides a method of generating numbers for abstract keys, which are guar-
anteed to be unique in a multi-user environment. A sequence is a pseudotable with
two pseudocolumns: currval and nextval.
A sequence to generate, say, student IDs can be created much in the same way as
a table (Figure 6.32). The starting number and the increment may be specified. The
defaults are 1 and 1. Because a sequence is a pseudotable, it is subject to the same
concurrency control as normal tables. Only one user at a time can update its value.
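A sketch of the sequence creation and the selections that produce the output below (Figure 6.32); the starting value is taken from that output:

create sequence newstudent
  start with 1000000
  increment by 1;

select newstudent.nextval from dual;
select newstudent.currval from dual;
select newstudent.nextval from dual;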
NEXTVAL
----------
1000000
1 row selected.
CURRVAL
----------
1000000
1 row selected.
NEXTVAL
----------
1000001
1 row selected.
Selecting the value of nextval updates it and returns the new value. Selecting the
value of currval returns its current value without any update. Currval is not avail-
able until after the first selection of nextval, which initializes the sequence.
The dual table used in the examples in Figure 6.32 is provided as part of the
Oracle installation and, by convention, is used whenever a select does not other-
wise have a legitimate table to include in its from clause. Newstudent is a sequence (a
pseudotable), not a table, and so cannot appear in this clause.
A sequence can be used in an insert or create statement as part of the select clause.
Figure 6.33 shows first the creation of a sequence called newperson, starting at 5000
and incrementing by 5. This sequence is included in the select clause of a create
command with a pseudonym of identity. The Players table contributes forename and
surname to the random_people table.
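A sketch of the statements Figure 6.33 describes:

create sequence newperson
  start with 5000
  increment by 5;

create table random_people as
  select newperson.nextval identity, forename, surname
  from players;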
An unrestricted select on this table reveals that each row has been given a unique
number in the identity attribute, starting at 5000 and incrementing by 5.
6.12 UPDATE
The update command can be used to change the value of a single attribute in a
single row or many attributes in many rows. It takes a similar form to the select
statement except for the order of clauses (Table 6.4).
update players
set club='Dingbat United'
where ufa_id='AT245876';
update players
*
ERROR at line 1:
ORA-02291: integrity constraint (A27WS.FK_CLUB) violated - parent key
not found
If a single statement is due to update 50,000 rows and the violation occurs at the
first row it will not be discovered until just before the statement enters its pre-
commit state. All 50,000 write operations will have to be rolled back.
6.13 DELETE
The delete statement is very similar to the update statement. Deletions are carried
out at the row level. Rows are selected for deletion according to a selection predi-
cate applied against a selection domain (Table 6.5).
table along with all remaining rows, re-create the table and re-populate it from the
temporary table.
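A minimal sketch of a restricted delete; the predicate is illustrative:

delete from players
where club = 'Beauty United';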
Foreign key            Specify foreign key relating to primary key of another table.
constraint             Referential integrity will be imposed

Insert into            Row inserted into table with attribute values specified in
A(a, b, c)             statement
values(1, 2, 3)

Insert into            Rows in selection set from B inserted into table A, data types
A(a, b, c)             a, d; b, e; and c, f must be compatible
(select d, e, f
 from B ...)
6.15 QUESTIONS
6.16 EXERCISES
(i) Create a table called Beautylist, which consists of all the players who play for
Beauty United.
(ii) Practise updates and deletions in Beautylist.
CHAPTER 7
MORE SQL
A join takes place when more than one table is cited in the from clause of a select
statement or in a sub-query associated with an insert, update or delete command.
The relational algebra has only one basic operator that can be applied to two
tables and that is the Cartesian product. Joining two tables in this way is very
expensive in terms of operational resources because the result will be a relation
with all the attributes of both tables and with every row of the first table combined
with every row of the second.
Performing a Cartesian product on a table with 6 attributes and 200 rows and
another table with 4 attributes and 1000 rows will yield a table with 10 attributes
and 200,000 rows!
Repeated operations like this would clearly fill the allocated tablespaces and put
a huge strain on memory and even CPU time. The output table would also be of very
little use in terms of information value.
While the Cartesian product is the theoretical basis of the join operation, the
goal is never to use it. More efficient and analytic join methods can be derived,
using the select and project constructs to limit the processing time for a join and to
render the output useful in terms of the application it is serving.
There are at least four secondary algebraic operators derived from the Cartesian
product: the theta join, the equijoin, the natural join and the semijoin (Table 7.1).
Joins take place in select statements. The selection set can take any attribute from
the tables in the selection domain; the join condition, together with any other
restrictions, is placed in the where clause or selection predicate.
More than two tables may be specified in the from clause. A join can only be
applied between two tables at a time, since the join is a binary operator. The query
is processed by joining first two tables, using the join condition that links them, to
give an intermediate relation. Sufficient attributes are projected into this interme-
diary to apply the next join condition, and so on (Figures 7.1 and 7.2).
It is important to ensure that there are enough join conditions to avoid a Carte-
sian product in the result. The minimum number of join conditions is one less than
the number of tables in the from clause (Figure 7.3).
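A sketch of a three-table join of the kind Figure 7.3 describes, with two join conditions and a restriction to a single match; the table and column names follow the UFL schema, but the exact selection set in the figure may differ:

select c.name, p.surname, g.minute
from clubs c, players p, goals g
where c.name = p.club
and p.ufa_id = g.ufa_id
and g.match_id = 1;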
Any rows not meeting the selection predicate do not occur in the resulting rela-
tion. All players have clubs, but only those players who have scored a goal will
appear in the output of the query in Figure 7.3. In fact, those players who have
scored two goals will occur twice in the output, any with three goals will appear
three times and so on. In this case, that is what is wanted: a list of players who have
scored goals with a note of the minute in the match when they scored each goal.
This match report shows that six goals were scored and that Purity FC won 4–2.
Another report showing a list of players, their age and length of service with their
current club demonstrates a join between the Players and contracts tables. The
query in Figure 7.4 shows this join. The output is formatted with SQL*Plus
commands and the query makes use of date arithmetic as well as dealing with the
null value.
Ages can be calculated by comparing sysdate (today’s date and time now) and
dateofbirth for each player. The SQL function months_between returns the number
of months between two dates; dividing by 12 gives years. Length of service is given
by applying months_between to sysdate and the joined attribute of Contracts. Each
player may have had several contracts with various clubs. Their current contract
has a null value in the left attribute of contracts.
Both Players and Contracts have identically named attributes. To avoid ambiguity
a pseudonym is given to these tables in the from clause. These pseudonyms are used
in the selection set and in the selection predicate.
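A sketch of the join Figure 7.4 describes; the Contracts column names (ufa_id, joined, left) are assumptions based on the description above, and the SQL*Plus formatting commands are omitted:

select p.surname,
       trunc(months_between(sysdate, p.dateofbirth) / 12) age,
       trunc(months_between(sysdate, c.joined) / 12) service
from players p, contracts c
where p.ufa_id = c.ufa_id
and c.left is null;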
The queries in Figures 7.3 and 7.4 are examples of inner joins. Inner joins only
give results where the full join conditions are met. Rows not meeting the join
conditions are excluded from the intermediate and final output relations. This
exclusion at the intermediate stage will affect the results of later joins if there are
any. However the query optimizer orders the joins, the result will be the same. It is
desirable that exclusions take place early in the execution phase, so that subse-
quent joins are made between smaller intermediate relations.
The manager of Beauty United needs a list of all his players showing which ones
have scored goals. In order to get the goalscorers, a join of Players with Goals is
needed. However, a simple inner join will exclude the players who have not scored.
In this case, an outer join is used (Figure 7.5).
The (+) symbol next to the join condition for goals.ufa_id indicates an outer join.
For Players who have matching ufa_id attributes in goals, the attributes minute and
type are projected into the intermediate relation as normal. Players with no match
are added to the intermediate relation with pseudo-columns for minute and type,
which contain null values. These rows show up with blank spaces in the final output.
Oracle allows only one outer join in a query. The or logical operator and the in set
operator are disallowed in the selection predicate.
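A sketch of the outer join Figure 7.5 describes:

select p.surname, g.minute, g.type
from players p, goals g
where p.club = 'Beauty United'
and p.ufa_id = g.ufa_id (+);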
7.1.3 SUB-QUERIES
Perhaps the most complex select restrictions are the comparison operators in, any,
all and exists and their negations not in, not exists, which test for membership of a
set. The set can be composed of constants or it can be the result of a query within
the query. In the latter case, it is called a sub-query. If the sub-query is guaranteed
to return only one row then the comparison operators can be used as well as the set
membership operators.
The first example in Figure 7.6 constructs an intermediate set composed of two
string constants. The value of the club attribute in Players is tested for equality with
SURNAME NAME
------------------------- -----
DIOCLETIAN Beauty United
CONSTANTINE Beauty United
PLATO Beauty United
AUGUSTUS Purity FC
INNOCENT Purity FC
5 rows selected.
any member of this constructed set. The overall query yields 11 rows, the players of
Beauty United.
The second example constructs a comparison set from a select statement applied
against the clubs table. The intermediate set will have one member with a single
attribute having the value Purity FC, managed by Mercury. Again, the value of the
club attribute in players is tested for equality with any member of this intermediate
set. The overall query yields 11 rows: the players of Purity FC.
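Sketches of the two forms Figure 7.6 describes; the second club name in the constant set is illustrative:

select surname, club
from players
where club in ('Beauty United', 'AC Grace');

select surname, club
from players
where club in (select name from clubs
               where manager = 'Mercury');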
If a list of all players was required with an indication of who had scored at
least one goal, the selection criteria would need to be changed. In this case, just the
existence of a player’s ufa_id in the goals table would indicate that he had scored at
least once.
A sub-query with the in set operator gives the answer required, as shown in Figure
7.7. Again, only rows for which there is a full correspondence in the join condi-
tions will appear in the result.
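A sketch of the sub-query with the in operator described for Figure 7.7:

select surname, forename
from players
where ufa_id in (select ufa_id from goals);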
The example in Figure 7.8 provides a list of all players showing who has scored,
who has been disciplined and who has had a relatively blameless, if uneventful,
career. The illustration of the outer join above showed how to achieve such a report
when only one outer join was required. Here, two outer joins are needed and Oracle
will only allow one inside each query.
The important word there is ‘query’. It would be possible to write two outer join
selects: on Players with Goals and on Players with Discipline and then combine them
with a union operator.
This logic will not produce the required report. There are players who have
scored but who have not been disciplined and there are players who have been
disciplined but who have never scored a goal. Such players would appear in the
null-padded list of one of the outer joins as well as appearing either as a goal scorer
or as having had a disciplinary penalty. Players who have neither scored nor been
disciplined would appear in both null-padded lists.
The answer is three select statements, none of which is an outer join. These are
then combined using the union operator to form the final list. First, an inner join
on Players and Goals reveals the goal scorers. Secondly, an inner join on Players and
Discipline lists those who have been disciplined. Thirdly, an inner join with
conjoined sub-queries lists those who have neither scored nor been disciplined.
The union operator is a set operator and requires its operands to be union
compatible. That is, each must consist of the same attributes. This is achieved by
using a pseudonym called note.
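A sketch of the shape of the query Figure 7.8 describes; the literal values used for the note column, and the use of not in sub-queries for the third branch, are assumptions:

select p.surname, 'goal scored' note
from players p, goals g
where p.ufa_id = g.ufa_id
union
select p.surname, 'disciplined' note
from players p, discipline d
where p.ufa_id = d.ufa_id
union
select surname, 'no incidents' note
from players
where ufa_id not in (select ufa_id from goals)
and ufa_id not in (select ufa_id from discipline);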
The DBMS must determine how best to retrieve the data, basing its decision on the
table structure and the presence or not of an appropriate index. When a query is
started that requires data to be retrieved from multiple tables, the DBMS must also
decide in which order to access and join the individual tables. This decision is
helped if statistical information is available on the tables that need to be accessed.
By knowing the likely number of entries matching a particular query, the DBMS
can make informed estimates of the number of rows resulting from a particular
table access and restriction. This information is important when deciding how best
to bring the tables together.
Oracle introduced this method of determination, which it refers to as cost-based,
from version 7. Before that, the optimizer relied on built-in rules regarding tables
and indexes. The query execution plan cost estimates reflect what the database
knows about gathered statistics on unique and duplicated value attributes.
The revised relational algebraic formulations will reflect what is known about
foreign key relationships, primary key constraints and other structural rules
embedded in the schema (Figure 7.9).
Oracle reveals its query execution plan through the explain plan mechanism. The
first step is to create a table called plan_table into which rows will be inserted when
explain plan is invoked on a query. The script utlxplan.sql is found in the ora92/
rdbms/admin directory and automates the creation of this special table. Cost-based
plans cannot be prepared unless statistics have been gathered about the tables in
the query. The analyze command takes care of this (Figure 7.10).
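A sketch of the preparatory steps; the script path follows the directory given above, and the analyze syntax is standard Oracle, although the figure may gather statistics differently:

@ora92/rdbms/admin/utlxplan.sql

analyze table clubs compute statistics;
analyze table players compute statistics;
analyze table goals compute statistics;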
A plan is created and plan_table is populated by prefixing the query with explain
plan for (Figure 7.11).
The last script, in Figure 7.11, utlxpls, calls a routine that is stored in the database
during installation. This performs a query on the table plan_table and outputs the
formatted result as shown in Figure 7.12.
The plan in Figure 7.12 is read bottom-up. The first operation reads all rows from
the players table into a local memory buffer. Oracle has correctly estimated that
explain plan for                          -- populate plan_table with the query plan
select surname, club                      --   for the select statement given
from players
where club in (select name
               from clubs
               where manager = 'Mercury');

@utlxpls                                  -- Oracle-supplied script to query plan_table
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 385 | 5 |
|* 1 | HASH JOIN | | 11 | 385 | 5 |
|* 2 | TABLE ACCESS FULL | CLUBS | 1 | 17 | 2 |
| 3 | TABLE ACCESS FULL | PLAYERS | 22 | 396 | 2 |
--------------------------------------------------------------------
17 rows selected.
this will result in an intermediate buffered relation of 22 rows (✓). Larger tables
might be paged out to disk by the virtual memory system.
Next, rows from the Clubs table that conform to predicate 2 are read into a buffer.
Oracle has correctly estimated that this will amount to 1 row (✓).
Oracle has decided that a hash join on players.club and clubs.name in the
unsorted buffers is the cheapest way of implementing the in operator and the sub-
query. The estimate for the intermediate relation is 11 rows (✓). The last operation
is to project the required attributes into the final selection set, estimated at 11 rows
(✓).
Alternative and more costly strategies are illustrated in Figures 7.13 and 7.14.
In Figure 7.15, Oracle plans first to make a selection on goals, estimating 2 rows
will be returned on predicate 5 ("GOALS"."MATCH_ID"=1). This is an underesti-
mate, but it is based on the average number of rows in the Goals table for different
match_ids.
Next, a hash join is performed on predicate 2 between Players and Clubs. Because
of the foreign key constraint, Oracle can determine that every player will be in the
joined relation, which will therefore have 22 rows.
Finally, this intermediate relation is joined with the restricted Goals interme-
diate. Because no attributes are projected from Goals, a left semijoin can be carried
out. Oracle is slightly underestimating the final result at 3 rows.
@utlxpls;
----------------------------------------------------------------
| Id | Operation |Name | Rows | Bytes | Cost|
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 171 | 8 |
|* 1 | HASH JOIN SEMI | | 3 | 171 | 8 |
|* 2 | HASH JOIN | | 22 | 1034 | 5 |
| 3 | TABLE ACCESS FULL | CLUBS | 19 | 399 | 2 |
| 4 | TABLE ACCESS FULL | PLAYERS | 22 | 572 | 2 |
|* 5 | TABLE ACCESS FULL | GOALS | 2 | 20 | 2 |
------------------------------------------------------------------
1 - access("PLAYERS"."UFA_ID"="GOALS"."UFA_ID")
2 - access("PLAYERS"."CLUB"="CLUBS"."NAME")
5 - filter ("GOALS"."MATCH_ID"=1)
Restricted joins       Reduce the result set with a join condition based on attribute
                       values: theta joins, natural joins, equijoins and semijoins
                       specify which kind of attributes and result projection

Outer joins            Include those rows failing the join condition with dummy
                       attributes in the selection set. One outer join per SQL
                       statement. Oracle (+) operator

Unions                 Basic operator. Accumulation of two result sets into one. Sets
                       must have the same attributes (name and type)

Query optimization     Translate the relational algebra into file and buffer operations;
                       choose the most efficient (lowest cost) execution plan. Cost
                       usually expressed in disk I/Os and CPU time

Explain plan           Reveal the execution plan and estimates for cost of operations
7.5 QUESTIONS
7.6 EXERCISES
(i) Store the result of the queries in Questions (i)–(vi) (above) in suitable temporary
tables.
(ii) Determine the execution strategy for each of the queries above.
CHAPTER 8
DATA SECURITY
8.1 TRANSACTIONS
The database software checks an SQL statement for basic conformity with the struc-
tural rules constructed by the DBA. These will include ensuring that the user has
access rights; that a proposed update is permissible in terms of the domain ranges
of attributes; and so on.
However, an SQL statement that results in the update of a single row in a table,
because of the complexity of its restriction criteria, may have to be translated into
many database operations in order to achieve that aim. It is also possible that a
block of SQL insert or update statements, each logically dependent on the success of
its predecessors or successors, may be submitted.
A systems failure such as a power cut may interrupt a database file operation of
any of these multiple statements. One of them may contain a syntax error and fail
after several have succeeded. It may be that the partial success of the block, with
some updates completed and others incomplete, offends against the business rules
of the organization.
There may be several independent SQL program blocks submitted simulta-
neously by different users and any one of them might be trying to read or write data
already updated by another user.
There are four problems here:
• What is the unit of work for a user submitted job? Is it a single database file opera-
tion? Is it a single SQL statement or is it a block of interdependent SQL state-
ments?
• If a block of statements is the job unit and it fails midway, how can already
successful updates, which depend on subsequent failed updates, be prevented
from compromising the adherence of the database to its business rules? If a
single SQL statement is the job unit, how does the system cope with a single
statement which is designed to update 25,000 rows, and which fails after
updating 24,999 rows, leaving the last row out of step with the others?
• How can a temporary update from one SQL statement be protected from being
overwritten by a conflicting update to the same row from another user’s state-
ment?
• Even if different users’ updates are segregated in a temporary filespace, how do
we reliably transfer their potentially conflicting contents into the database
proper at the end of a job?
8.2 ATOMICITY
A transaction may consist of just one SQL statement or it may involve multiple
statements that are logically dependent on each other and that must all be
successfully completed. A transaction cannot be defined simply as any particular
number of SQL statements, since it has a variable length.
In order to achieve a clear definition of the transaction as a unit of work, it is
necessary to define precisely when that work is completed. For this, a new SQL
command is needed: the commit statement. Issuing the commit command finally
and irrevocably ends the transaction. Until that time, the work is not permanently
saved in the database.
A transaction can be defined as any number of SQL statements contained within
two commit statements. The first marks the end of the previous transaction and the
start of a new work unit. The second marks the end of that new work.
(Figure: the transaction states Active, Pre-commit, Committed, Failed and Aborted.)
The transaction may also move from the pre-commit to the aborted state if a roll-
back command is issued instead of a commit.
Try the sequence of SQL statements in Figure 8.2 in terms of transactions. There
are three transactions. The first has just one statement before the commit. The
second transaction has three SQL statements: a select, an update and another select.
This second transaction includes a mistaken update that does not offend against any
database constraint, but is then manually reversed by the rollback; the last select (in
a third transaction, not yet ended) shows the reverted value.
Before the rollback is issued, the mistaken update can only be seen within the
transaction and only by the user executing it. Other users cannot read this interme-
diate result.
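A sketch of the kind of sequence Figure 8.2 describes; the column and the values used in the updates are illustrative:

update players set squad_no = 3 where ufa_id = 'SC159647';
commit;

select squad_no from players where ufa_id = 'SC159647';
update players set squad_no = 99 where ufa_id = 'SC159647';  -- the mistaken update
select squad_no from players where ufa_id = 'SC159647';      -- shows 99, but only to this user
rollback;

select squad_no from players where ufa_id = 'SC159647';      -- a new transaction: the value has reverted to 3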
8.3 CONSISTENCY
A single SQL statement can still contain instructions to update, insert or delete
rows in a way that would offend against the business policies of an organization.
Before the database prepares a statement for execution, it is checked for
syntactic validity: unmatched or missing brackets, misspelled keywords and so
on. In other words, does the statement conform to the defined grammar?
If it does, then it is checked for semantic validity. Does the user have access to
the items in the from clause; do they exist; are the names of columns misspelled?
After semantic checking, the statement is passed for execution. At the end of this
phase, the statement is checked for conformity with constraints imposed by the
DBA: has the statement changed or deleted a primary key, leaving foreign keys
unreferenced; does an update exceed the percentage increase allowed on this row?
If these checks find a constraint violation, the statement will fail.
If the statement passes all these checks, it will execute and enter the pre-commit
stage. Choosing to commit the statement will take the database to a state which is
consistent with the syntactic, semantic and constraint rules embedded in its
schema. Choosing to roll back the statement will take the database back to a state
that was already consistent with these rules.
An SQL statement is therefore guaranteed to preserve consistency of the database
by virtue of the syntactic, semantic and constraint checks imposed upon it. The
serial execution of multiple SQL statements must similarly be guaranteed to
preserve consistency. If they are grouped within an atomic transaction, then either
all the statements succeed and are committed or one fails and causes the whole
transaction to be rolled back.
If transactions from different users are executed serially so that each one does not
start until the previous one is committed, then they also guarantee consistency.
However, this is an impractical proposal since many non-conflicting transactions
would be queued. It is a far better idea to allow concurrent execution of transactions
and to provide a mechanism to segregate their intermediate results.
8.4 ISOLATION
8.4.1 SERIALIZABILITY
A schedule is the sequence that indicates the chronological order in which instruc-
tions of concurrent transactions are executed. A schedule for a set of transactions
must consist of all instructions of those transactions and must preserve the order in
which the instructions appear in each individual transaction.
Only read and write instructions are considered because generally only these
operations give rise to conflict between transactions. In between reads and writes,
transactions may perform calculations or other manipulations in local buffers
without conflict.
A bank example illustrates the scheduling problem. A customer has a current
account and a savings account. He telephones his branch, asking for £50 to be
transferred from his current account to his savings. On the same day, he decides to
transfer another £40 from the current account to the savings account. Figure 8.3
shows the two transactions, T1 and T2.
The two transactions are not concurrent and do not interfere. At the close of busi-
ness his current account is –£90 and his savings account is +£90.
Figure 8.4 shows a serialized schedule in which the instructions of T1 and T2 are
concurrent and which preserves the consistency of the database: the sum of A and
B remains the same.
Figure 8.5 shows a concurrent transaction that does not preserve the sum of A
and B. T1’s write operation on A is based on the original value of A but takes place
after T2 has updated it. T2’s £40 debit update will be overwritten by T1’s £50 debit
and lost.
Both transactions base their calculation of B’s update on the same original value
of B. T2’s update takes place after T1’s, overwriting it. The £50 deposit is lost.
At the end of the two transactions the current account balance is –£50 and the
savings account is +£40. Not only did the customer fail to transfer the full £90, but
he lost £10 in the process!
Transaction 1 – T1, followed by Transaction 2 – T2:
T1: read(A)
T1: A = A – 50
T1: write(A)
T1: read(B)
T1: B = B + 50
T1: write(B)
T2: read(A)
T2: A = A – 40
T2: write(A)
T2: read(B)
T2: B = B + 40
T2: write(B)
T1 and T2 interleaved:
T1: read(A)
T1: A = A – 50
T1: write(A)
T2: read(A)
T2: A = A – 40
T2: write(A)
T1: read(B)
T1: B = B + 50
T1: write(B)
T2: read(B)
T2: B = B + 40
T2: write(B)
T1 and T2 interleaved:
T1: read(A)
T1: A = A – 50
T2: read(A)
T2: A = A – 40
T2: write(A)
T2: read(B)
T1: write(A)
T1: read(B)
T1: B = B + 50
T1: write(B)
T2: B = B + 40
T2: write(B)
Instructions I 1 and I2 of two transactions conflict if and only if there exists some
item accessed by both and at least one of these instructions writes to it (Figure 8.6).
T1 T2 Conflict status
I1 = read (A) I2 = read (A) I1 and I2 do not conflict
I1 = read (A) I2 = write (A) They conflict
I1 = write (A) I2 = read (A) They conflict
I1 = write (A) I2 = write (A) They conflict
(Figure: a schedule of T3 and T4 containing conflicting read and write operations on item A.)
The serialized schedule in Figure 8.8 can be rewritten because there are two pairs
of non-conflicting instructions that can be temporally interchanged to achieve a
serial schedule:
The serialized schedule has two pairs of non-conflicting instructions:
T1: read(A)
T1: write(A)
T2: read(A)
T2: write(A)
T1: read(B)
T1: write(B)
T2: read(B)
T2: write(B)

Temporal interchange of these instructions yields a serial schedule <T1, T2>:
T1: read(A)
T1: write(A)
T1: read(B)
T1: write(B)
T2: read(A)
T2: write(A)
T2: read(B)
T2: write(B)
If S and S′ are two schedules with the same set of transactions, S and S′ are view
equivalent if three conditions are met:
• For each data item, if transaction Ti reads its initial value in schedule S, then
transaction Ti must also read its initial value in schedule S′.
• For each data item, if transaction Ti executes a read on it in schedule S, and that
value was produced by transaction Tj (if any), then, in schedule S′, transaction Ti
must also read the value of it that was produced by transaction Tj.
• For each data item, the transaction (if any) that performs the final write opera-
tion on it in schedule S must perform the final write operation on it in schedule
S′.
(Figures: example schedules of read and write operations on items A and B by transactions T1, T2 and T3, illustrating view equivalence.)
(Figure: a schedule of read and write operations on items A, B, C and D by transactions T1, T2, T3 and T4.)
T1 conflicts with T2 over item B; T1 accessed B first
T2 conflicts with T4 over item B; T2 accessed B first
T3 conflicts with T4 over item C; T3 accessed C first
T1 conflicts with T3 over item C; T1 accessed C first
(Figure: the precedence graph for this schedule, with edges T1→T2 and T2→T4 labelled B, and T1→T3 and T3→T4 labelled C.)
8.5 DURABILITY
Once a transaction commits, the system must guarantee that the results of its oper-
ations will never be lost, in spite of subsequent failures. This same condition must
also hold true if the transaction aborts.
The same database component that assures atomicity also provides durability
assurance: the recovery manager guarantees commit and rollback.
During a transaction, there may be many temporary updates of the data. In order
to preserve recoverability the previous values of the affected rows must be kept
until the transaction ends. This guarantees the possibility of rollback. Tracking the
progress of updates to the data during a transaction is called logging and implies
the use of a separate tablespace used for this purpose.
In order to protect against catastrophic system failure like a disk crash, it is a wise
strategy to store the log tablespace on a second disk away from the user’s database
tables. This brings performance benefits as well as improving security; disk I/O to
the log and to the database can take place in parallel (Table 8.2).
If a schedule is to reach the pre-commit state for each of the transactions in it, there
must be some kind of mechanism to guarantee the schedule.
There are two principal types of concurrency control approaches for resolving
conflict. Either, when two operations conflict, the second is made to wait until the
first has completed or the operations conflict is noted and resolved at the pre-
commit stage.
The first approach assumes that there will often be conflict and that all opera-
tions need protection to be put immediately in place on their first access of a data
item in case a subsequent operation causes conflict. That second operation will be
put into a wait state. This approach leads to a group of concurrency measures that are
known as pessimistic.
Optimistic measures are those that assume that conflict will be relatively rare
and avoid all the overhead of providing immediate protection to database opera-
tions. Instead, they determine if conflict has occurred at the pre-commit stage and,
if it has, roll back the conflicting transactions and possibly restart them. If the
savings in resource utilization, gained from avoiding immediate protection,
outweigh the work lost in rollbacks, then optimism in concurrency control can
offer greater throughput of work and still guarantee database consistency.
8.6.1 LOCKING
The simplest pessimistic scheme of concurrency control uses some kind of locking
system. Locks indicate conflict and are blocking in that they cause wait states. A
database will lock data at the row, page or table level and use at least two kinds of
locks to indicate what kind of operation caused the lock: a read lock or a write lock.
A read operation within an SQL statement cannot access the data until it has
requested and acquired a read lock on the target row and the same condition
applies for a write operation. Conflicts only occur when one of the competing
transactions, T1 or T2, wishes to initiate a write. The lock manager arbitrates lock
requests using the rules of Table 8.3.
T1                            T2                            Lock manager
Read lock on (A) acquired     Request read lock on (A)      Granted
Read lock on (A) acquired     Request write lock on (A)     Not granted, wait for lock
Write lock on (A) acquired    Request read lock on (A)      Not granted, wait for lock
Write lock on (A) acquired    Request write lock on (A)     Not granted, wait for lock
                      Lock request by T2
Lock held by T1       Read            Write
Read                  No conflict     Conflict
Write                 Conflict        Conflict
T1                     T2                          Lock manager
Write lock on (A)      read lock on (D)            Granted
Write lock on (B)      read lock on (E)            Granted
Write lock on (C)      read lock on (F)            Granted
                       read lock on (G)            Granted
                       read lock on (H)            Granted
                       read lock on (J)            Granted
                       REQUEST TABLE READ LOCK     Not granted, wait for lock
T2 must wait until the first transaction releases its locks for the escalation to proceed. The
exclusive table lock cannot be granted while another transaction is holding a single row level lock.
8.6.2 DEADLOCKS
Mutually exclusive lock systems can also give rise to deadlocks. The four condi-
tions for deadlock are set out in Table 8.6. A deadlock will continue forever if it is
not detected and one of the conditions removed. Most databases resolve deadlock
by arbitrarily aborting one of the transactions, causing a rollback and then
restarting it. That is, pre-emption is introduced, removing one of the deadlock
conditions.
Deadlock can be detected using a wait-for graph that plots the state of all transac-
tions in a wait state. Each transaction is a node in the graph and directed edges
indicate the holder of the sought-for resource. A cyclic graph indicates that dead-
lock has occurred (Figure 8.15).
T1 has B, wants A          T2 has A, wants C
T3 has D, wants B          T4 has C, wants D
The goal of the validation phase is to guarantee that only serializable schedules
remain. The technique is actually to find an equivalent serializable schedule.
Each transaction is assigned a sequential Transaction Number (TN) during execu-
tion. If the transactions are run in the order of TNs then an equivalent serial
schedule is obtained.
Each transaction has a ReadSet (RS) – the items it reads – and a WriteSet (WS) –
the items it writes. Tracking the transaction and maintaining its WriteSet and
ReadSet can add a lot of overhead for transactions that perform a lot of full table
scans or row updates. The transaction is validated on the basis of the three condi-
tions in Table 8.7.
Consider two transactions, Ti and Tj, where Ti < Tj. If any one of the three
serializability conditions in Table 8.7 is true, then Ti is serializable.
There are three possible classes of conflict: W–R, R–W and W–W and two possible
orderings of the transactions: i before j or j before i. Therefore there are six (3 × 2)
possible conflict orderings to consider. Each condition guarantees that the transactions
go in one order only: i before j.
For condition 1 all conflicts are ordered i before j (true serial execution).
For condition 2, there are no Wi–Rj or Rj–Wi conflicts since Ti’s WriteSet does not
intersect with Tj’s ReadSet. There are no Wj–Ri or Wj–Wi conflicts since the write phase
(and hence the read phase) of Ti precedes the write phase of Tj.
This leaves the possibility of Ri–Wj and Wi–Wj, both of which are ordered i before
j.
For condition 3, there are no Wi–Rj or Rj–Wi conflicts since Ti’s WriteSet does not
intersect with Tj’s ReadSet. There are no Wi–Wj or Wj–Wi conflicts since Ti’s
WriteSet does not intersect with Tj’s WriteSet. Wj–Ri is not possible since the read
phase of Ti precedes the write phase of Tj. This leaves only the possibility of Ri–Wj,
which again is ordered i before j.
Satisfaction of any of these rules proves either a true serial execution or a
serializable execution for Ti, which can then be validated and proceed to commit. If
none of the conditions holds true because, for example, Ti’s WriteSet does intersect
with Tj’s ReadSet and the two write phases overlap, then Ti must be rolled back and
restarted.
Assigning TNs at the beginning of transactions is not completely optimistic,
since a transaction would immediately be certain to fail the validation test if its
predecessor transactions were still running. Instead, TNs are assigned at the end of
the read phase. Note that this guarantees the satisfaction of the second half of
condition 3.
Using timestamps for concurrency control involves marking each data item with
the time it was last written and the time it was last read.
Every transaction gets a unique timestamp (TS) at startup. The transaction may
read a data item if its timestamp is greater than the item’s Write TS. If the transac-
tion’s timestamp is greater than the item’s existing Read TS, then the transaction’s
TS replaces the item’s Read TS.
The transaction may write to an item if its timestamp is greater than either the
item’s Write TS or the item’s Read TS. In this case, the transaction’s TS replaces the
item’s Write TS.
If either of these conditions is not met, the transaction is aborted and restarted.
Forcing timestamp order imposes a much tighter restriction than other
concurrency schemes and may lead to cascading aborts. The rolling back of a failed
transaction, undoing the revised item timestamps, may cause other transactions to
abort. There is also a price to be paid for the constant updating of timestamps that
is necessary even for a read-only transaction.
Multi-version timestamping techniques involve keeping versions of the same
data items distinguished by their timestamps. Transactions are again allocated a
timestamp as they start. Read requests result in the appropriate version being deliv-
ered. Write requests, creating a new version, can only succeed if there are no reads
between the new version and the creation of a new version (a write from another
transaction). Timestamping is not a popular protocol and is not offered in any of
the major vendor products. For the time being, it remains an object of interest to
researchers.
8.7 RECOVERY
In order to support the recoverability of transactions a database will keep some kind
of log that records not only all uncommitted changes to the database, but in
certain cases, the previous values of updated rows.
The log information can be used by the database to determine the new and
previous values of data items and these values may be supplied to read operations,
according to their position in a schedule.
Logging is a means of ensuring that the database is protected against cata-
strophic failure; it aids the isolation of transactions and guarantees durability. Logs
are recorded in a separate tablespace from the normal data. A prudent DBA will
place it on a separate disk with, perhaps, a second mirrored copy.
As they start, transactions are issued with a numerical or timestamp identifier. As
a transaction proceeds, any write operation to update, insert or delete a row leads
to the creation of a log entry. These log entries may be interspersed with the records
of other transactions but not for the same rows. Concurrency control sees to that.
The log file is normally viewed as having a circular organization. It has a fixed
length and records are overwritten on top of the records of already committed
transactions. When the end of the file is encountered, the logging process recycles
to the beginning of the file and continues to overwrite the records of committed
transactions. This method of work is known as incremental logging.
Two strategies can be adopted in a logging process. The differences lie in the form
of the log entry and where the update is stored pending the end of the transaction.
The deferred update strategy means that the database remains unchanged until
commit. An after image of any updated and, so far, uncommitted values is kept in
the log. When the transaction is committed, all of the updates are copied from the
log to the database. If the transaction should be aborted, the log entries are very
easily marked as complete and available for overwriting; nothing needs doing to
the database proper.
While it is active, a transaction can read the results of its own updates from the
after images in the log. If concurrency control allows it, other transactions read the
original values directly from the database.
This strategy means that updates are stored and accumulated in the log very
quickly and the query reaches the pre-commit stage faster. The commit process is,
however, much longer and therefore involves a greater risk of failure during this
critical stage.
The immediate update strategy means that the updated and, so far, uncommitted
values of a row are written straight into the database. It is essential that a before
image and an after image of each updated row should be stored in the log.
A transaction may read its own updates directly from the database; other transac-
tions may read the original values from the before images in the log, if concurrency
control allows it.
After commit, the log entries are marked as complete and available for overwrite.
Locks are released on the already updated rows in the database. A rollback takes
much longer in this case, as the before images have to be copied back to the data-
base. In practice, there are more commits than rollbacks in most production data-
bases, so the second strategy is generally preferred.
8.7.1 JOURNALS
In order to give a second level of defence against system failure, most databases
offer the option of journaling log entries and row updates. A separate tablespace is
made available on a different disk and every change to any row and every log entry
is recorded. The recording is generally made in terms of update vectors rather than
image copies in order to conserve space. An update vector simply records the actual
change instruction that was applied to the row or log entry.
From time to time, backup copies of the database may be taken. Large data files
might be copied to tape or CD and the full database might occupy several tape
spools or CDs. If the backup is taken when there are no active transactions, then
the journal can also be backed up and cleared.
A systems administrator would be well advised to ensure that data log and
journal files are kept on separate disks to minimize the damage done by a system
crash. Many corporations for whom the database is a critical survival factor keep
logs and journals mirrored at remote sites (Table 8.8).
The ANSI/ISO SQL standard defines three undesirable phenomena that may be
experienced by a database (Table 8.9).
Oracle offers only two levels of transaction isolation: read committed and
serializable.
When a transaction runs at the read committed isolation level, a query statement
sees only data committed before the query began; it never sees either uncommitted
data or changes committed during query execution by concurrent transactions.
(However, the query does see the effects of previous updates executed within its
own transaction, even though they are not yet committed.)
This is a system known generally as Multi-version Concurrency Control. In
effect, a query sees a snapshot of the database as of the instant that it begins to run.
Serial queries may see different data, even though they are within a single transac-
tion, if other transactions commit changes during execution of the earlier queries.
As they start, Oracle queries are given a numerical identifier, the System Change
Number (SCN). The increasing numbers also indicate the order of start time. Infor-
mation maintained in Oracle’s rollback segments is used to provide statement level
read consistency. The rollback segments contain the old values of data that have
been changed by uncommitted or recently committed transactions. Figure 8.17
shows how Oracle provides statement-level read consistency using data in rollback
segments.
Referring to Figure 8.17, it can be seen that as the statement with SCN 100023
proceeds, it reads blocks written with SCNs lower than 100023, confident that
these must be the result of statements that committed before it began.
When it encounters a block labelled SCN 100024, this indicates a result from a
statement that started after 100023 began and which may or may not have
committed. Because Oracle uses an immediate update log method, the before image
will be in the rollback segment. Statement 100023 now turns to the rollback
segment and reconstitutes the previous version of the row, written by SCN 10011
before returning to the data block committed by SCN 10019. All of the data read by
100023 is committed although there may be a pending update on one of the data
blocks it bypassed.
Since, in read committed mode, each new query statement starts with a new snap-
shot that includes all transactions committed up to that instant, subsequent
queries in the same transaction may see the effects of newly committed concurrent
transactions. Unrepeatable reads and phantoms in a transaction are possible at this
level of isolation. Only within a single query is an absolutely consistent view of the
database seen.
Read committed transactions will wait if they try to change a row updated by an
uncommitted concurrent transaction. The second transaction that tries to update a
given row waits for the other transaction to commit or roll back and release its lock
on the row. If that other transaction rolls back, the waiting transaction can proceed
to change the previously locked row as if the other transaction had not existed.
However, if the other transaction commits and releases its locks, a read
committed transaction proceeds with its intended update. Lost updates may there-
fore occur in read committed mode.
The partial transaction isolation provided by read committed mode is adequate for
many applications, and this mode is fast and simple to use. However, for applica-
tions that do complex queries and updates, it may be necessary to guarantee a more
rigorously consistent view of the database than the read committed mode provides.
initial view of the database, so there is no logical conflict in using the new version
of the row as the starting point for the new transaction.
Note that only updating transactions may need to be retried – read-only transac-
tions will never have serialization conflicts.
The serializable mode provides a rigorous guarantee that each transaction sees
the same wholly consistent view of the database. However, the application has to
be prepared to retry transactions when concurrent updates make it impossible to
sustain the illusion of serial execution. Most commonly, serializable mode is neces-
sary when a transaction performs several successive queries that must see identical
views of the database.
Oracle provides data concurrency and integrity between transactions using locking
mechanisms. The normal policy adopted by the lock manager is to lock data at the
lowest level of restriction that guarantees data integrity while allowing the highest
degree of data concurrency.
All locks acquired by statements within a transaction are held for the duration of
the transaction, preventing destructive interference (including dirty reads and lost
updates) from concurrent transactions. Any changes made by one transaction
become visible only to other transactions that start after the first transaction is
committed and its locks are released.
Oracle does not use read locks. Because of multi-version concurrency control,
read operations do not interfere with writes. A simple select query reads unlocked
data in the database and uses the rollback segment to discover the previous values
of locked, updated rows.
DML operations (select for update, insert, update and delete), which lead to write
operations, acquire data locks at two different levels: for specific rows and for
entire tables.
A transaction acquires a table lock whenever a row lock in that table is acquired.
This is necessary to reserve DML access to the table on behalf of the transaction and
to prevent DDL operations that would conflict with the transaction.
A table lock will be automatically requested and acquired by DML statements
and will be one of two types: row share (RS) or row exclusive (RX). Other modes are
available for manual programming using the lock table command.
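As an illustration only (the table name is borrowed from the Liga Utopia schema), an explicit table lock might be requested as follows; ordinary DML acquires its RS or RX lock implicitly:

lock table contracts in row exclusive mode;
-- or, refusing to wait if a conflicting lock is already held:
lock table contracts in exclusive mode nowait;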
A row share table lock (RS) acquired by a select for update statement indicates that
it holds row locks (TX) in the table preparatory to updating, but the updates will be
performed by a subsequent statement in the same transaction. When the subse-
quent statement begins it will use the already acquired TX locks and the table lock
is converted to a row exclusive table lock. A row share table lock held by a transac-
tion prevents other transactions, such as those containing DDL statements, from
acquiring exclusive write access to the whole table.
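A hedged sketch of the kind of statement that acquires an RS table lock and TX row locks; the particular row restriction is illustrative, not taken from the book's figures:

select salary
from contracts
where ufa_id = 'VT432167'
and left is null
for update;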
A row exclusive table lock (RX) generally indicates that the transaction holding
the lock is holding TX locks for one or more updates to rows in the table. These
probably result from inserts, updates or deletes. A row exclusive table lock prevents
other transactions from acquiring the more restrictive whole-table locks, such as those required by DDL statements.
An RX lock held by a transaction allows other transactions to acquire an RX lock
in order to insert, update, delete or lock rows concurrently in the same table. The
TX locks resolve any conflicting row writes. Clearly, only one transaction at a time
can hold a TX lock on a single row.
Select queries are least likely to interfere with other SQL statements because they
only read data. Insert, update and delete statements can have implicit queries as part
of the statement.
A select query acquires no data locks. Other transactions can update a table being
queried by a select statement, including the specific rows being queried. Such
queries are often referred to as non-blocking queries. A query does not have to wait
for any data locks to be released; it can always proceed.
Implicit queries in update, insert or delete statements usually return an interme-
diate relation consisting of ROWIDs to be updated, copied as inserts or deleted.
Only when the statement starts its write phase does it start to acquire TX and RX
locks.
Lock escalation sometimes occurs when numerous locks are held at one level of
granularity (rows) and a database raises the locks to a higher level of granularity
(page or table). If a single user locks many rows in a table, some database products
will automatically escalate the user’s row locks to a single table lock. The number of
locks is reduced, but the restrictiveness of what is being locked is increased.
Oracle has no mechanism to escalate locks. Lock escalation greatly increases the
likelihood of deadlocks. Oracle automatically detects deadlock situations and
resolves them by rolling back one of the statements involved in the deadlock,
thereby releasing one set of the conflicting row locks. This pre-emption breaks the
cycle of waiting on which the deadlock depends.
Multi-table deadlocks can usually be avoided if transactions accessing the same
tables lock those tables in the same order, either through implicit or explicit locks.
For example, all application developers might follow the rule that when both a
master and detail table are updated, the master table is locked first and then the
detail table. If such rules are properly designed and then followed in all applica-
tions, deadlocks are very unlikely to occur.
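For illustration, the rule might be expressed with explicit locks taken in a fixed master-then-detail order; the choice of tables is an assumption made for the sketch:

-- every transaction that touches both tables takes its locks in this order
lock table clubs in row exclusive mode;
lock table players in row exclusive mode;
-- ... updates to clubs and players follow, then commit;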
Oracle keeps its transaction log in a rollback segment. The old values of updated
rows are written in the log, identified by a transaction identifier, pending a commit
or rollback. The updated values are written straight to the database.
When a row is updated or deleted during a transaction, data blocks in the user
tablespace and in the rollback segment are changed. Oracle records all changes to
data blocks, including rollback entries, in the redo log. The redo log is essentially a
journal mechanism.
This second recording of the rollback information is very important for active
transactions (not yet committed or rolled back) at the time of a system crash. If a
system crash occurs, Oracle automatically restores the rollback segment informa-
tion, including the rollback entries for active transactions, as part of instance or
media recovery. Once the recovery is complete, Oracle performs the actual roll-
backs of transactions that had been neither committed nor rolled back at the time
of the system crash.
Redo entries record data that can be used to reconstruct all changes made to the
database, including the rollback segments. The online redo log also protects roll-
back data. When the database is recovered using redo data, Oracle reads the change
vectors in the redo records and applies the changes to the relevant rollback
segment or data tablespace.
Journals: Records all data updates and log entries. May be used to restore the database or log from a previously saved version.
8.11 QUESTIONS
(i) Describe in detail, giving examples, the kinds of events that a transaction
manager is meant to prevent.
(ii) Show in each case which property or properties of a transaction offer a specific
remedy.
(iii) Explain the necessary conditions for deadlock. Illustrate what methods can be
used to resolve deadlock and what condition they eliminate.
(iv) Explain why it is that optimistic concurrency control methods, which might
offer better throughput statistics, are not more used by database vendors.
8.12 EXERCISES
(i) Research the Oracle concurrency control system and write brief notes on all the
types of locks at its disposal.
(ii) The aim of a transaction management system is to provide data safety while
not over-burdening the server and compromising performance. Starting with
the material in this chapter and researching reliable sources on the Internet,
write a short report, evaluating Oracle against other vendors in this regard.
CHAPTER 9
FURTHER SQL
Science means simply the aggregate of all the recipes that are always successful;
the rest is literature.
Paul Valéry, Moralités
Up until now, retrievals have been concerned with information exactly as it was
entered into the database. Often, elements of this data have been chosen for the
selection set through restrictions and projections (via SQL select statements).
However, the selection set has always been a subset of the tuples held in the
relation.
There is, however, a set of problems for which such a strategy is inappropriate.
For example, if it is necessary to know how many players play for each Liga Utopia
club, the method of selection is limited to the SQL statement in Figure 9.1. This
results in an ordered list, with duplicates, of clubs. To answer the question asked, a
manual count of each repetition of each club has to be carried out. This is hardly a
convenient solution.
At this juncture, SQL needs to move beyond the confines of the relational algebra
and allow for the aggregation (summarization) of tuples. This is achieved using the
group by clause in the select statement (Figure 9.2).
In essence, the group by clause is instructing the DBMS to produce a single
summary tuple for each set of players that play for a particular club. In this
example, only one tuple is produced per club, regardless of how many players play
for that club. This still does not answer the question. To find out how many players
play for each club, aggregate rows produced by the group by clause need to be
queried with what is known as an aggregate function.
select club
from players
order by club;
CLUB
------------------------------
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Beauty United
Purity FC
Purity FC
Purity FC
...
select club
from contracts
group by club;
CLUB
----------------
Beauty United
Purity FC
2 rows selected.
There are a number of such functions specified in SQL. Some of the most
commonly used aggregate functions are listed in Table 9.1.
Figure 9.3 shows the count function in use. Like each of the other aggregate func-
tions, count performs a calculation on an attribute(s) of the underlying tuples for
CLUB COUNT(*)
------------------------- ----------
Beauty United 11
Purity FC 11
2 rows selected.
each aggregated row. In this example, the use of a specific attribute is spurious – the
use of any attribute, or all attributes (*), will result in the same count. The other
functions in Table 9.1 are not so forgiving and require a numeric attribute to be
specified as the basis of their calculation.
The group by clause in the Figure 9.3 statement means that, as before, only two
rows will be returned by the query. The count function meanwhile has kept a tally
of how many underlying tuples have contributed to each aggregated row in the
final output.
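The statement behind Figure 9.3 is not reproduced above; it is presumably of this form:

select club, count(*)
from contracts
group by club;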
Each of the common functions can be included in the selection set to calculate
not only the number of players associated with each club, but also the club’s total
salary bill for their players, as well as the average, minimum and maximum salaries
for such players (Figure 9.4).
It should be noted, somewhat counter-intuitively, that each of these aggregate
functions could be used across a set of non-aggregated tuples! The total number of
players in the league, their total salary bill and the average, minimum, and
select club, count (*), sum (salary), avg (salary), min (salary),
max (salary)
from contracts
group by club;
2 rows selected.
1 row selected.
maximum salaries for all players, would be produced via the statement in Figure
9.5.
This is because once an aggregate function has been used on the select line, an
implicit aggregation of all of the selected tuples into a ‘single group’ row will occur.
Only aggregate functions are allowed in the selection set of such a statement. Any
non-aggregated attributes in the selection set must appear in a group by clause. The
statement in Figure 9.6 therefore results in an error.
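Figures 9.5 and 9.6 are not reproduced above; they are presumably along these lines. The first aggregates the whole table into a single summary row; the second fails because club is neither aggregated nor named in a group by clause:

select count(*), sum(salary), avg(salary), min(salary), max(salary)
from contracts;

select club, count(*)
from contracts;
ERROR at line 1:
ORA-00937: not a single-group group function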
Aggregate functions can be nested in the selection set as in Figure 9.7. Care must
be taken to ensure that the query evaluates to a single group only for the outermost
function.
Without a group by clause, the max function performs an implicit aggregation
across the whole table, leaving nothing for the count function to aggregate.
Select count(max(salary))
From contracts;
ERROR at line 1:
ORA-00978: nested group function without GROUP BY
Inserting the group by clause in Figure 9.8 means that the max function will
return two aggregated rows, one for each club, leaving the count function to return
a single value aggregated across the whole table. Because only one group by is
allowed within a query, this is the maximum level of function nesting permitted.
Any further analysis must be done with a sub-query, as in Figure 9.9.
Here the selection domain is itself an aggregated sub-query that returns eight
rows (there are four position values for each club). Each row consists of the club
select count(max(salary))
from players, contracts
where contracts.ufa_id=players.ufa_id
and left is null
group by players.club;
COUNT(MAX(SALARY))
------------------
2
1 row selected.
select count(max(salary))
from (
select players.club, avg(salary) as salary from players, contracts
where contracts.ufa_id=players.ufa_id
and left is null
group by players.club, position)
group by club;
name and an average salary for each position type. These rows are then aggregated
into two rows by the max function grouped by club and finally counted. When
using a sub-query it is important to match return column names and types with the
main query’s usage.
9.2 HAVING
It is possible to use a where clause as well as a group by clause in a single select state-
ment. However, the where clause operates as a restriction on the base tuples, not on
the aggregated rows produced by the group by clause. For instance, the SQL state-
ment in Figure 9.10 generates an error.
CLUB COUNT(*)
------------------------------------
Beauty United 8
Purity FC 10
2 rows selected.
CLUB COUNT(*)
------------------------- ----------
Purity FC 10
1 row selected.
no rows selected
The problem with this syntax is that there are two where clauses in the same state-
ment: one for the base tuples and one for the aggregated rows. This would be both
difficult to interpret and easy to get wrong! In any event, the Oracle syntax checker
will refuse to allow this query.
The solution is the introduction of the having clause to act as an alias for the
aggregate where clause.
This new clause allows both base and aggregate restrictions in the same SQL
statement. The statement in Figure 9.13 will exclude players with a salary of less
than €500,000 from being aggregated into the count for their club. Further, once
aggregation has taken place, any club with fewer than ten well-paid players is
restricted from appearing in the final output.
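Figure 9.13 is presumably of this shape, with the where clause filtering the base tuples before grouping and the having clause filtering the aggregated rows afterwards:

select club, count(*)
from contracts
where salary >= 500000
group by club
having count(*) >= 10;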
Some care in formulating queries is still needed and an appreciation of how the
query is implemented assists in this. The query in Figure 9.14 is syntactically
correct but yields no rows. The aim was to count those players who earn more than
€500,000 but less than the average salary for their club. Some players fit into
this category, but the query is not finding them. Why?
The answer is in the group by clause. The base rows are being grouped on salary so
that salary can appear in the having clause. This satisfies a first-level syntax rule: all
base attributes must be grouped in an aggregate query.
The avg(salary) aggregate is therefore being calculated for each set of players in
the same club and with the same salary. Since no one earns more than the average
in such circumstances, the query returns no rows.
The having clause is applied only when the preceding parts of the statement have
produced an intermediate relation. If the first part of the query is applied sepa-
rately, then Figure 9.15 shows that 18 players have salaries over €500000. Two, who
play for Purity FC, have been grouped on salary to give an average of their identical
salaries of €750000. In all cases, because of the grouping the player’s salary is equal
to the average salary of that group. Notice that the count(*) aggregate is similarly
counting only players with the same salary who play for the same club.
When the having clause is applied to this intermediate relation, it unsurprisingly
yields no rows.
The answer, as with many problems in computing, is to divide the query into its
component parts. First, find the average salary for each club in a sub-query and,
using roles, join that intermediate relation with the full contracts relation and
perform the restricted count.
Figure 9.16 shows the result for this strategy. The contracts relation has role a
and the intermediate result of the sub-query has role b. This intermediate relation
CLUB SUM(COUNTSTAR)
------------------------- ------------
Beauty United 5
Purity FC 6
2 rows selected.
is itself the result of an aggregation and has two columns: club and average (an alias
for avg(salary)).
A join between a (restricted on salary>500000) and b, predicated on equality of
the club attribute, yields a further intermediate relation. This is then aggregated in
the group by clause and further restricted in the having clause. These operations can
clearly be seen (reading upwards) in the explain plan output for this query shown
in Figure 9.17.
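Figure 9.16 itself is not reproduced here. One plausible shape for the query just described, with contracts in role a and the aggregated sub-query in role b (the alias names are assumptions), is:

select a.club, count(*)
from contracts a,
     (select club, avg(salary) as average
      from contracts
      group by club) b
where a.club = b.club
and a.salary > 500000
group by a.club, a.salary, b.average
having a.salary < b.average;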
However, the original question has not been fully answered. The query in Figure
9.16 needs further summarization to yield a report similar to that in Figure 9.11. In
Figure 9.18, the query from Figure 9.16 has been embedded in another query. The
count(*) has been aliased and the new encapsulating query is now using the result
of the previous statement as an intermediate relation in its selection domain.
Ever since its standardization by the American National Standards Institute (ANSI)
in 1986, and its adoption as an international standard by the International Organi-
zation for Standardization (ISO) in 1987, SQL has continued to evolve as a database
language.
This evolution has taken the form of named, standardized, formal revisions of
the language. The first of these was SQL89 (released in 1989), which introduced,
among other minor changes, the concept of declared integrity constraints. It is this
version of SQL that is often mistakenly referred to as SQL1.
The next revision of the SQL standard was released in 1992 and is formally
known as SQL92, more commonly referred to as SQL2. This was a major update to
the standard, adding some 600 pages to the formal specification. These additions
included enhanced schema manipulation and dynamic creation and execution of
SQL statements, as well as network environment features for connection and
session management. It is this version of SQL that is invariably referred to as stan-
dard SQL and which has, to different degrees, been implemented by the major rela-
tional database vendors.
It is important to note that each release of the specification has had as one of its
major goals complete upward compatibility with all previous releases of the stan-
dard. For example, a database application developed using an SQL89-based
product should still function correctly if transferred to an SQL92-based
environment.
The current release of SQL is formally referred to as SQL:1999. It is more
commonly referred to as SQL3. This is by far the most major revision, not only of
the language, but also of the underlying data model upon which the language is
based. This is reflected in the fact that SQL3 was in development for the best part of
a decade prior to its official release as SQL:1999, and development of parts of its
original specification has been postponed until a future release of the standard
(Table 9.2).
The reason for the delay lies in the fact that SQL:1999 has transformed SQL from
a relational data language into an object relational data language applied to an
object relational data model. The principal sections of the SQL:1999 Foundation,
sometimes known as Core SQL:1999, are shown in Table 9.3. The full document, in
PDF form, can be obtained from the American National Standards Institute (ANSI, http://www.ansi.org/).
Most major vendors claim compliance with the SQL3 Core (Table 9.4). This
group underpins relational database interoperability; the object technology offers
the advantage of a standard in the face of competition from object database
vendors, as does the OLAP section with respect to proprietary multi-dimensional
database vendors.
9.4 VIEWS
SQL views are the logical external realizations of a database’s schema from the
perspective of an end-user.
A good relational database implementation will usually be normalized to at least
BCNF. However, although this reduces data inconsistencies, it does so at the cost of
increased DML complexity. Normalization generally replaces a small number of
large tables with a large number of small tables. Therefore, SQL data extraction and
modification of normalized tables is complicated by the common necessity to join
small tables together in order to retrieve data with some inherent relationship.
An SQL view enables the construction of a virtual table that emulates the struc-
ture of an un-normalized table. This may be done in order to simplify query access,
without breaking the underlying relational schema upon which it is based.
As SQL views are logical abstractions on top of a relational schema, they are most
often used to limit the view of the underlying schema to that deemed appropriate
for each set of end-users. Rather than allow access to the complete Players table,
only the player_v view of it, described in Figure 9.19, would be accessible to Beauty
United’s management. Such restrictions on access need to be enforced through the
setting of explicit security groups, with associated access rights, for each set of
users.
The single table view of Figure 9.19 provides a logical subset of the Players SQL
table that only contains the row details of those currently playing for Beauty
United. Figure 9.20 shows the result of selecting all elements of this SQL view.
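The definition of player_v (Figure 9.19) is not shown above; it is presumably of this form:

create view player_v as
select *
from players
where club = 'Beauty United';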
Figure 9.21 shows a logical view constructed over the join of two fully normal-
ized relational tables: Players and Contracts. Again, this example view is limited to
those players currently playing for Beauty United.
In the example above it is necessary to view only the current contract details
of each Beauty United player. Therefore, the predicate contracts.left IS NULL has
to evaluate to true, i.e. the contract is currently open-ended. As there is a 1:M rela-
tionship between Players and Contracts, without this restriction there would be a
separate view-row created for each past contract (regardless of club) that exists for
each current Beauty United player. This would result in the mistaken creation of
multiple rows of information for a single player. Figure 9.22 shows the result of
selecting player details from such a syntactically correct but logically incorrect view.
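The joined view of Figure 9.21 is likewise not reproduced; a sketch, with the column list inferred from the updatable-columns listing later in the chapter, might be:

create view players_v as
select players.ufa_id, surname, forename, dateofbirth, squad_no, position,
nationality, players.club, joined, left, salary
from players, contracts
where players.ufa_id = contracts.ufa_id
and contracts.left is null
and players.club = 'Beauty United';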
Views are, under certain conditions, updatable as well as selectable. In effect, the
end-user updates the logical view, and the database server updates the appropriate
underlying table. Figure 9.23 shows the result of an update operation on the SQL
view, players_v, of Figure 9.21. As can be seen, only the relevant row of the Contracts
table has had its salary attribute altered by the update operation.
A potential problem with an updatable SQL view is that an attribute used in the
view’s defining expression could itself be updated. The updated row(s) may then
no longer satisfy the view’s definition and would therefore be dropped from the
view. Figure 9.24 shows an example of this row migration. To avoid this situation a
with check option clause can be used in a view definition.
13 rows selected.
UPDATE players_v
SET salary = salary*1.1
WHERE ufa_id = 'VT432167';
1 row updated.
1 row selected.
3 rows selected.
Figure 9.25 shows an example of the check option in operation. This optional
clause restricts view updates to prevent the removal of a row from the view that has
not been removed from the view’s base table(s). The clause therefore acts as an
additional data integrity and security guarantee.
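The view of Figure 9.25 presumably differs from that of Figure 9.19 only in its final clause:

create or replace view player_v as
select *
from players
where club = 'Beauty United'
with check option;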
In order for a view to allow updatable operations, it cannot be based on a select
statement that contains any of the following:
• SET operations
• DISTINCT operators
• aggregate or analytical functions
• grouping or sorting operations
• collection or sub-query selection expressions
UPDATE player_v
SET club = 'Purity FC';
11 rows updated.
no rows selected
UPDATE player_v
SET club = 'Purity FC';
UPDATE player_v
*
ERROR at line 1:
ORA-01402: view WITH CHECK OPTION where-clause violation
If a base table were not key-preserved and multi-row updates were allowed via the view,
data inconsistencies could occur.
In the joined-table view of Figure 9.21, the relationship between the two base
tables, Players and Contracts, is 1:M: each player could have many contracts over a
period of time, even though logically only one of those contracts could ever be
current. However, as the primary key of the Contracts table is a composite of ufa_id
and joined, such a logical restriction is not enforced, and it is therefore technically
feasible to have more than one current contract for any particular player. This
implies that a change to a single row of the Players base table cannot be guaranteed
to affect only a single row in the players_v view.
Conversely, a change to the values of a Player’s attributes within a single row of
the view should not be allowed as there could be an indirect relationship with
other view rows based on the same underlying base row, and inconsistency would
result if only one of those related view rows was altered. For example, it would be
perfectly possible to insert the two rows of Figure 9.26 into the Contracts table.
This would result in the two rows of Figure 9.27 being logically inserted into the
players_v view of Figure 9.21. However, both of these view rows would relate to a
single row in the Players table.
In order to guarantee that such inconsistencies are prevented, only the table on
the child (weak) side of a 1:M joined-table view is deemed to be key-preserved.
Updates to any attributes of the parent table of such a join-table view are always
disallowed. Figure 9.28 documents an attempt to perform such an illegal update.
SET squad_no = 14
*
ERROR at line 2:
ORA-01779: cannot modify a column which maps to a non key-preserved
table
COLUMN_NAME UPDATABLE
------------------------------ ---
UFA_ID NO
SURNAME NO
FORENAME NO
DATEOFBIRTH NO
SQUAD_NO NO
POSITION NO
NATIONALITY NO
CLUB NO
JOINED YES
LEFT YES
SALARY YES
11 rows selected.
shows the corrected query and the result of querying the players_v view of Figure
9.21.
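The updatable-columns listing shown above can be obtained from the data dictionary. A sketch (USER_UPDATABLE_COLUMNS is a standard Oracle dictionary view, though the exact query used in the book's figure may differ):

select column_name, updatable
from user_updatable_columns
where table_name = 'PLAYERS_V';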
For view columns that are not directly updatable, Oracle provides instead of trig-
gers that perform a user-defined procedure in place of an intercepted DML opera-
tion. Such procedures usually update the view’s relevant underlying base table(s).
Section 13.5.1 describes, and gives examples of, these PL/SQL triggers.
Views Virtual table created from restriction and/or join on base tables.
Used for security purposes or to simplify queries
View updates Updates are generally allowed to a single table view. Join view
updates allowed to key preserved (weak) base tables
9.6 QUESTIONS
9.7 EXERCISE
(i) Create views to support the SQL queries from this chapter. Test the perfor-
mance of the queries using the tables directly compared with the view queries.
Explain plan will reveal the plans, which should give an indication of
performance.
CHAPTER 10
SERVER-SIDE PROGRAMMING
You pull the trigger and after that you do not understand anything that happens.
Jean-Paul Sartre, Dirty Hands, Act 5, Scene 2
SQL is a declarative language, not procedural or imperative. The art of writing SQL
instructions is based on the ability to define clearly what is required, rather than
programming a set of instructions to meet such requirements.
Select name
From players
Where salary > 1000000;
The question, 'Which players earn over €1,000,000?', is translated into an SQL query.
This unambiguously states that the requirement is to see the names of all players
earning over €1,000,000 (Figure 10.1). At no point is there any attempt to show
how these names would be picked from the physical files that make up the
database.
In a relational database, it is the role of the query parser and optimizer to trans-
late the defined requirement into the most efficient query plan and to decide the
appropriate execution strategy of file access and buffer sorts and merges to provide
the answer to such a query. Because the SQL language is based on a formal grammar
connected to the relational calculus, it is capable of provably correct machine
translation into discrete operations.
As early as the 1930s, Alan Turing proved mathematically that all problems of
programming logic and computation could be accommodated by the combination of a small set of basic constructs: sequence, selection and iteration.
An SQL query, no matter how complex, provides a single answer to a single ques-
tion. If a set of questions requires answering, it would be perfectly possible to issue
a series of SQL queries, one per question, and record the answer to each. However,
if the asking of the second question was dependent on the outcome of the first, the
specifics of the third question varied according to the answer to the second, and
the fourth question was repeated for each of the items given in the answer to the
third, this could not be automated using SQL alone.
In such a situation, each of the questions could be considered as a single instruc-
tion, with each instruction being related to other instructions through sequence,
selection and iteration operations. In essence, this would result in a set of declara-
tive statements being woven together through a procedural language.
The requirement to bind various queries together through a procedural program-
ming language is a very common one, and various mechanisms have been built to
facilitate this. These can be classified as either server-side or client-side program-
ming mechanisms.
The client-side mechanisms can be subdivided into the embedding of SQL statements
into programming languages (for example, using SQLJ to embed SQL into Java), or the
use of middleware to translate native programming language instructions into SQL
instructions (for example, using JDBC to connect a Java program to a relational database).
Chapter 12 contains a thorough discussion of client–server models and the use of
JDBC.
This chapter concentrates on server-side programming mechanisms that allow
for the procedural programming of sets of queries. It is important to note that, in
the context of database servers, to run a program server-side implies that it will
execute internally within the DBMS itself, rather than just share the DBMS’s phys-
ical host.
10.1 PL/SQL
DECLARE
   ...1
BEGIN
   ...2
   DECLARE
      ...4
   BEGIN
      ...5
   END
EXCEPTION
   ...3
   DECLARE
      ...6
   BEGIN
      ...7
   END
END
PL/SQL is a block-structured language. This structure allows for the definition, should they be required, of local and global vari-
ables. Figure 10.3 shows this block structure. Variables declared in section 1 can be
used anywhere in the program. The main execution block (2), contains a nested
declaration and execution block as does the main exception block (3). Variables
declared in (4) are scoped only for (5) and variables declared in (6) are scoped only
for (7).
PL/SQL allows for the creation of five specific programming constructs: anony-
mous block, stored procedure, stored function, package and trigger, each of which will
be described in this chapter.
Each PL/SQL program has development, compilation, deployment, execution
and removal phases. Development consists of the creation of the procedural logic
source code; compilation verifies the source code, binds any declared variables,
and translates it into suitable executable object code (known as p-code); deploy-
ment places the executable code in the database (accessible via the data dictio-
nary); execution runs the executable code; and removal expunges the
executable code from the database.
An anonymous block is so called because it is a block of code without a perma-
nent identifier, and differs from the other constructs in that it is immediately
executed upon successful compilation and therefore does not have separate
deployment and removal phases. All other PL/SQL programs have an identifier
and, once successfully compiled, are stored in the database awaiting execution,
and will remain there until explicitly removed.
In essence, an anonymous block is an encapsulated set of procedural instructions
that are compiled for immediate execution. Since it is not stored permanently in
the database, the source code is lost at the end of the session that created it. Other-
wise, it may be kept in an external text file and loaded into the client tool when
needed. In this respect, and, in terms of its compilation and immediate execution,
its behaviour and status are equivalent to an SQL batch file or an interactive SQL
query.
The PL/SQL compiler and execution component, known as the PL/SQL engine, is
integrated within the Oracle database server, as well as in various client-side tools
such as Oracle Forms, and Oracle Reports. Its role is to compile and execute the
procedural logic statements in the PL/SQL program, and to pass all SQL statements
to the SQL query executive, also located within the database server, for action.
Figure 10.4 shows the architecture of the PL/SQL engine.
As an illustration of the need for procedural constructs and an example of the
development of a PL/SQL program, consider the requirement for Liga Utopia to
calculate each club’s total points for a particular season. The calculation will
involve selecting each match that a club played in that season, determining
whether the club won, drew or lost the match, allocating three points for a win and
one for a draw, and then adding those points to the season’s total for the club. This
calculation will then need to be repeated for all other clubs in the league.
Figure 10.5 provides the basic pseudo-code for such a calculation. This involves
all three basic programming constructs – sequence, selection and iteration – and is
therefore a prime candidate for expressing in a procedural programming language
such as PL/SQL.
[Figure 10.4: a PL/SQL block is submitted to the PL/SQL engine; its PL/SQL statement executor runs the procedural code within the Oracle server, passing SQL statements on for execution.]
Figure 10.6 adds further detail to the pseudo-code through the inclusion of
specific SQL statements and a set of local variables. This highlights a major incon-
sistency between the declarative set-based nature of SQL and the procedural
record-based approach of PL/SQL.
That discrepancy is known as an impedance mismatch, and it is evidenced by the
fact that an SQL statement, such as select name from clubs; will return an unknown
number of rows, whereas PL/SQL, like any procedural language, can only work
with a single row or record at a time. The cursor mechanism handles such a
mismatch. This permits a program, having issued a potentially multi-row select
statement, to examine each row in the returned result on a row-by-row basis.
To translate the detailed pseudo-code of Figure 10.6 into workable code, it is neces-
sary to define the local variables used in the declaration section of a block. The
standard syntax for variable declarations is given in Figure 10.7.
DECLARE
variable_name1 Oracle data type;
variable_name2 Oracle data type;
...
variable_nameN Oracle data type;
In the case of the pseudo-code of Figure 10.6, the variable declarations would be
as shown in Figure 10.8.
DECLARE
v_season VARCHAR2(8);
v_club VARCHAR2(25);
v_total_points NUMBER(3);
v_match_id NUMBER;
v_goals_for NUMBER(2);
v_goals_against NUMBER(2);
v_players_club VARCHAR2(25);
v_goals_type VARCHAR2(10);
However, certain of the variables defined in Figure 10.8 are required to hold the
contents of columns belonging to tables defined in the Liga Utopia database. It
would be preferable to link the variable definition explicitly to the equivalent data-
base attribute definition. Doing this will keep the PL/SQL code synchronized with
any changes to the table definition. Figure 10.9 demonstrates how to link PL/SQL
DECLARE
v_season Matches.Season%TYPE;
v_club Clubs.Name%TYPE;
v_total_points NUMBER(3);
v_match_id Matches.Match_id%TYPE;
v_goals_for NUMBER(2);
v_goals_against NUMBER(2);
v_players_club Clubs.Name%TYPE;
v_goals_type Goals.Type%TYPE;
The declaration specifies the name of the database table, followed by the name of
the column in the table. The %TYPE instruction causes a call to the data dictionary
to fetch the type definition used to define the column in the table’s create state-
ment. This is then used to define the PL/SQL variable.
Changing the data type of a table attribute may require changes to the source
code. If, for example, an attribute is changed from a character string to a number,
some of the operators applied to it in the program may no longer be appropriate.
The declaration block remains constant; the execution block may need amend-
ment. This will be determined by recompiling the PL/SQL source code and debug-
ging any compilation errors.
If, as in the case of the season–points–club problem, there is a requirement to
process the results of a select statement iteratively, a temporary holding area for the
result set will need to be declared in the PL/SQL declaration section. This type of
variable is referred to as a cursor, and its declaration syntax is shown in Figure
10.10.
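Figure 10.10 is not reproduced above; the general form of an Oracle cursor declaration is:

CURSOR cursor_name [RETURN return_type] IS
select_statement;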
Cursor_name is the name of the PL/SQL variable that will be created to contain
the result set produced by the execution of the specified select statement. The
return-type is an optional specification that defines the structure for each of the
individually referenced rows. Figure 10.11 shows the three cursors that are required
for the ongoing season–points–club problem.
CURSOR c_club IS
select name from clubs;
CURSOR c_match IS
select match_id from matches
where (homeclub = v_club or awayclub = v_club)
and season = v_season;
Note that it is permissible, indeed often necessary, to place variables in the select
statement of a cursor definition. These variables will be type checked at compile-
time, but not evaluated until run-time. Neither of the first two cursors has a return
type specified. This is common practice when a cursor returns rows that always
consist of a single atomic value, i.e. one column only. Return types are used to
enforce type checking when fetching cursor rows, and they are particularly helpful
in situations where each row returned consists of attributes with different data
types.
Defining the return type of a cursor can be done in one of two ways. If the query
is simple and selects all the columns from just one table then the %ROWTYPE
attribute can be used. If the select statement is more complex and does not select all
columns, uses operators applied to columns or involves more than one table, then
the programmer must define a special return type for the cursor and for the struc-
ture to receive the cursor’s returned rows.
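Figure 10.12 is not reproduced above; the simple, whole-table case it presumably illustrates would look like this:

CURSOR c_player RETURN Players%ROWTYPE IS
select * from players;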
The %ROWTYPE of Figure 10.12 is another example of an anchored declaration. In
this case, it relates the type of row returned from the cursor to the attribute defini-
tions for the Players table. Players%ROWTYPE is an example of a structured data
definition, in that the type can be broken down into a set of constituent attribute
types, one for each named column in the Players table. Figure 10.13 shows the anal-
ysis of this structured data definition.
Players%ROWTYPE
ufa_id        varchar2(10)
surname       varchar2(25)
forename      varchar2(25)
dateofbirth   date
nationality   varchar2(20)
club          varchar2(30)
DECLARE
v_surname Players.surname%TYPE;
v_player_record Players%ROWTYPE;
BEGIN
OPEN c_player;
FETCH c_player INTO v_player_record;
v_surname := v_player_record.surname;
...
END
The open cursor instruction causes the cursor’s defined select statement to be
executed (in the database server), and the result set to be placed in the declared
cursor variable c_player. The fetch statement retrieves the next data row (starting at
the first) from the cursor variable, c_player and places it into the structured vari-
able, v_player_record. The name and type of each element of v_player_record matches
the name and type of each attribute defined in the Players table. The begin and end
statements delimit the execution section of the block.
Returning to the season–points–club example, the third cursor defined in Figure
10.11 uses a user-defined cursor return type. The record returned from each cursor
fetch is not anchored to a row in a database table. It illustrates a constructed vari-
able record type that will have been defined in the PL/SQL block’s declaration
section. Figure 10.15 shows that the cursor will fetch each selected data row into a
variable of type v_cursor_rec_type, the v_goals variable defined in the example. The
two parts of a v_cursor_rec_type record (players_club and goals_type) can then be
referenced as v_goals.players_club and v_goals.goals_type.
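Figure 10.15 is not reproduced in full; a sketch of the record type declaration it presumably contains, anchored to the column definitions used earlier, is:

TYPE v_cursor_rec_type IS RECORD (
players_club Clubs.Name%TYPE,
goals_type Goals.Type%TYPE);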
v_goals v_cursor_rec_type;
The local variables defined in Figure 10.9, the cursor declarations of Figure 10.11
and the cursor record definition of Figure 10.15 form the declaration section of the
implementation given in Figure 10.16.
The declaration section of any block is immediately followed by the keyword
begin. This indicates the beginning of the compulsory execution section of the
block, and implicitly signifies the end of the declaration section.
Figure 10.17 gives a first-cut implementation of the execution section of the
season–points–club example. This implementation is a direct translation from the
pseudo-code of Figure 10.6, and needs to be amended in order to handle a number
of omissions that are not immediately obvious.
Cursors are dynamic in that they are evaluated at run-time, and can use variables
as part of their select statement. The advantage of this is demonstrated in the code.
Such run-time binding allows the cursors to be evaluated multiple times within an
iterative structure, each time with a new value in the predicate clause of the associ-
ated select statement. When the cursor, c_match is opened, causing its select state-
ment to be executed for the first time, it will create a result set containing all the
matches (for the input season) of the first in a list of clubs. When c_match is opened
for the second time, it will result in a result set containing all the matches (for the
input season) of the second in the list of clubs, and so on. In order to reallocate the
memory space used to contain the result set of c_match it is necessary to remove the
previous contents of its result set from memory. This is achieved by using the close
statement to delete the contents of the cursor prior to reopening it with new values
in the predicate clause of its select statement.
PL/SQL uses a colon in front of the equals sign to differentiate the assignment
operator from the comparison operator, and uses <> to denote a not equal condi-
tion. Iteration is achieved within a loop...end loop construct, all operations within
DECLARE
v_season Matches.Season%TYPE;
v_club Clubs.Name%TYPE;
v_total_points NUMBER(3);
v_match_id Matches.Match_id%TYPE;
v_goals_for NUMBER(2);
v_goals_against NUMBER(2);
v_goals v_cursor_rec_type;
CURSOR c_club IS
select name from clubs;
CURSOR c_match IS
select match_id from matches
where (homeclub = v_club or awayclub = v_club)
and season = v_season;
it being repeated until the loop is explicitly terminated. Selection is decided upon
through the if...then...elsif...else...end if statement. The elsif and else clauses are
both optional, as demonstrated in Figure 10.17.
The weaknesses exhibited in the first-cut solution are as follows:
• The input source for the setting of the v_season variable has not been established.
• There is no mechanism in place to exit the loop...end loop statements once any of the cursor result sets have been fully processed.
• Once the total club points for the season have been calculated, they are overwritten by those of the next club, prior to being reported to the client initiating the program.
BEGIN
v_season := input;
OPEN c_club;
LOOP
FETCH c_club INTO v_club;
v_total_points := 0;
OPEN c_match;
LOOP
FETCH c_match INTO v_match_id;
v_goals_for := 0;
v_goals_against := 0;
OPEN c_goals;
LOOP
FETCH c_goals INTO v_goals;
IF (v_club = v_goals.players_club AND
v_goals.goals_type <> 'own goal') OR
(v_club <> v_goals.players_club AND
v_goals.goals_type = 'own goal') THEN
v_goals_for := v_goals_for + 1;
ELSE
v_goals_against := v_goals_against + 1;
END IF;
END LOOP;
CLOSE c_goals;
IF v_goals_for > v_goals_against THEN
v_total_points := v_total_points + 3;
ELSIF v_goals_for = v_goals_against THEN
v_total_points := v_total_points + 1;
END IF;
END LOOP;
CLOSE c_match;
END LOOP;
CLOSE c_club;
END;
/
The first problem, establishing an input source for the variable v_season, is solved
through a substitution variable that prompts the user to enter a value for it at run-
time. Figure 10.18 demonstrates a very simple six line anonymous block that takes
input via a substitution variable identified as &input. Figure 10.19 shows the output
generated by the execution of this block. It should be noted that ‘old 4:’ and ‘new 4:’
relate to the contents of line four of the block pre- and post-variable substitution.
DECLARE
v_number number;
BEGIN
v_number := &input;
END;
/
The second problem, concerning the exiting of the loop statement once the
cursor records have all been processed, is resolved through the checking of a special
cursor attribute after each fetch operation. A forced exit (Figure 10.20) from within
the loop is performed if the %NOTFOUND attribute returns a value of true. The exit
is contained in a separate statement from the fetch. Its effect is to perform a jump to
the first statement after the next end loop if its condition is evaluated as true.
LOOP
FETCH c_cursor INTO v_cursor;
EXIT WHEN c_cursor%NOTFOUND;
...
END LOOP;
Each declared cursor has four attributes associated with it. These are explained in
Table 10.1. Three of these attributes contain a Boolean value (true or false) and may
be used directly in the condition of an if statement or in the exit condition of a
loop. The %ROWCOUNT attribute may be used in the end condition of a for loop or
in any other situation where the number of rows fetched from the cursor needs to be
known.
BEGIN
v_season := &input;
OPEN c_club;
LOOP
FETCH c_club INTO v_club;
EXIT WHEN c_club%NOTFOUND;
v_total_points := 0;
OPEN c_match;
LOOP
FETCH c_match INTO v_match_id;
EXIT WHEN c_match%NOTFOUND;
v_goals_for := 0;
v_goals_against := 0;
OPEN c_goals;
LOOP
FETCH c_goals INTO v_goals;
EXIT WHEN c_goals%NOTFOUND;
IF (v_club = v_goals.players_club AND
v_goals.goals_type <> 'own goal' ) OR
(v_club <> v_goals.players_club AND
v_goals.goals_type = 'own goal') THEN
v_goals_for := v_goals_for + 1;
ELSE
v_goals_against := v_goals_against + 1;
END IF;
END LOOP;
CLOSE c_goals;
IF v_goals_for > v_goals_against THEN
v_total_points := v_total_points + 3;
ELSIF v_goals_for = v_goals_against THEN
v_total_points := v_total_points + 1;
END IF;
END LOOP;
CLOSE c_match;
DBMS_OUTPUT.PUT_LINE(v_club || ' ' || v_total_points);
END LOOP;
CLOSE c_club;
END;
/
Although the solution provided in Figure 10.22 fully answers the season–points–
club problem, there are often potential run-time issues that need to be addressed
when designing a PL/SQL program. For example, Figure 10.25 shows a very simple
anonymous block used to display the surname of a player whose ufa_id is input.
The select statement will, at most, return one row of data (ufa_id is the unique
primary key of the table) so there is no need to use a cursor to handle a possible
multiple row result set. However, there is a possibility of an incorrect ufa_id being
entered. This would result in no rows being returned from the select statement and
the forcing of a NO_DATA_FOUND error condition (Figure 10.26).
To provide the programmer with control in situations such as this, where poten-
tial errors can be anticipated, PL/SQL provides the optional Exception Section. This
section is located between the last Execution Section statement and the block’s
terminating end statement. Incorporating an exception handler in this section
allows the error condition to be anticipated and dealt with inside the program.
Figure 10.27 demonstrates how the NO_DATA_FOUND error condition can be
trapped and handled so that the program fails gracefully.
Figure 10.28 demonstrates this program in operation. The NO_DATA_FOUND
exception is one of Oracle’s predefined error conditions that is included with every
Oracle installation. Other pre-defined exceptions are listed in Table A.4.
As well as predefined error conditions, there are two other groups of error condi-
tions that can be handled in PL/SQL: non-predefined database exceptions, and
user-defined exceptions.
A non-predefined database exception is an existing Oracle error condition that
does not possess a pre-defined exception name. For example, Oracle has an error code,
ORA-02291, raised when a foreign key value has no matching parent row; in the block
below, the PRAGMA EXCEPTION_INIT directive associates the declared exception
invalid_match_club with that error number so that it can be trapped by name.
DECLARE
v_ufa_id varchar2(10);
v_name varchar2(25);
BEGIN
v_ufa_id := &ufa_id;
select surname into v_name from players
where ufa_id = v_ufa_id;
dbms_output.put_line('Player ' || v_ufa_id || ' is ' || v_name);
END;
/
Enter value for ufa_id: 'VT432167'          (correct keyboard entry)
old 5: v_ufa_id := &ufa_id;
new 5: v_ufa_id := 'VT432167';
Player VT432167 surname is DIOCLETIAN
DECLARE
v_ufa_id varchar2(10);
v_name varchar2(25);
BEGIN
v_ufa_id := &ufa_id;
select surname into v_name from players
where ufa_id = v_ufa_id;
dbms_output.put_line('Player ' || v_ufa_id || ' surname is ' ||
v_name);
EXCEPTION
WHEN NO_DATA_FOUND THEN
dbms_output.put_line('Player ' || v_ufa_id || ' does not exist!');
END;
/
There are two alternative ways of handling such an error condition in a PL/SQL
program. Firstly, it is possible to use the catch-all exception handling clause WHEN
OTHERS in the exception section. This will handle any Oracle error condition that
is not already being specifically handled. The main problem with this approach is
that it is too general to distinguish between different types of error. It therefore
does not allow different responses from the program when faced with varying error
conditions. It is most often used as a final check to ensure that a PL/SQL program
exits cleanly when an unanticipated error is encountered.
DECLARE
v_season varchar2(10);
v_home_club varchar2(25);
v_away_club varchar2(25);
invalid_match_club EXCEPTION;
PRAGMA EXCEPTION_INIT (invalid_match_club, -2291);
BEGIN
v_season := &season;
v_home_club := &hometeam;
v_away_club := &awayclub;
insert into matches values
(newmatch.nextval, v_home_club, v_away_club, SYSDATE, v_season);
dbms_output.put_line('Match: ' || v_home_club || ' V ' || v_away_club || ' inserted OK');
EXCEPTION
WHEN invalid_match_club THEN
dbms_output.put_line('Error: club does not exist!');
WHEN OTHERS THEN
dbms_output.put_line('Error: (' || SQLERRM || ') TRAPPED');
END;
/
returns an error message associated with the most recent Oracle error (Figure
10.31).
Exceptions are evaluated top-to-bottom, so the generic WHEN OTHERS clause
appears last in the exception section of a PL/SQL block.
A user-defined error condition is declared in a similar way to a non-predefined
database error.
There is no need for the pragma association as the error is not directly related to
an Oracle error condition.
The error condition is user-specific, i.e. related to the business logic of the
program, as opposed to the data logic of the database server, so the raising of the
error condition also has to be user-defined. Figure 10.32 gives an example of the
raising of such an error condition when the number of players in a given club is less
than eleven.
Anonymous PL/SQL blocks are perfectly acceptable for interactive queries that
require an element of procedural processing, such as the example used in the
previous section. There are a number of overheads and limitations associated with
the use of PL/SQL anonymous blocks that require addressing.
The first of these problem areas concerns the cost associated with compiling the
PL/SQL block of code each time it requires processing. If the code is to be processed
rarely this is an acceptable overhead. However, if the code is to be processed often,
and is unlikely to be subject to change between executions, then the cost of the
second and subsequent compilations is an unnecessary overhead to the user.
The second problem lies in the fact that an anonymous block has to be available
at the client for transmission to a PL/SQL engine for compilation and execution.
If there is only one client then there is no problem. However, each subsequent
client will have to gain access to a copy of the code. This could be resolved by
placing the source code in a file on a central repository from which it can be loaded
to individual clients. This, unfortunately, introduces problems of its own in terms
DECLARE
v_club varchar2(25);
v_player_count number(2);
insufficient_players EXCEPTION;
BEGIN
v_club := &clubname;
select count(*) into v_player_count
from players where club = v_club;
IF v_player_count < 11 THEN RAISE insufficient_players;
END IF;
EXCEPTION
WHEN insufficient_players THEN
dbms_output.put_line
('Club ' || v_club || ' only has ' || v_player_count ||
' players!');
WHEN OTHERS THEN
dbms_output.put_line('Error: (' || SQLERRM || ') TRAPPED');
END;
/
of increases in network traffic (to obtain the source code), and potential inconsis-
tencies between the code at different clients if the source code at the repository is
subject to change over time. It is also probable that the database operators will find
loading such files troublesome and error-prone.
The solution to both of these problems is thankfully the same. Rather than store
the source code at a central repository, store the compiled code at the database
server and allow it to be executed from the client. This avoids unnecessary compila-
tion each time the code needs processing, and allows any client (with sufficient
security clearance) to initiate such processing without first having to access the
source code.
PL/SQL provides such a solution in the form of PL/SQL stored programs, of which
there are two specific categories: stored procedures and stored functions.
From a programming point of view there is very little additional coding required to
change an anonymous block into a stored procedure. Indeed, it could be argued
that the act of giving an anonymous block a name is all that is required to convert
it. However, apart from an identifier, a stored procedure has one other major
programming advantage over an anonymous block: the ability to accept input
parameters.
Figure 10.33 shows how a procedure is given an identity, taking the source code
from the season–points–club example. A block header is created that contains the
name of the procedure, and the DECLARE statement is replaced by the IS keyword.
Stored programs allow for input, output and input–output parameters. The
v_season substitution variable is replaced by the input parameter i_season.
v_goal v_cursor_rec_type;
CURSOR c_club IS
select name from clubs;
CURSOR c_match IS
select match_id from matches
where (homeclub = v_club or awayclub = v_club)
and season = i_season;
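The header itself is not reproduced above; it presumably begins along these lines, with the declarations (including the cursors above) and the execution section following the IS keyword:

CREATE OR REPLACE PROCEDURE club_points (i_season IN Matches.Season%TYPE) IS
-- declarations follow here, then BEGIN ... END club_points;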
SQL*PLUS> @c:\points.sql
Procedure created.
Once the procedure has been successfully compiled, it is available for execution.
Such execution is initiated via a client request either through a client process such
as SQL*Plus, or from another program already executing within the database
server. The form of a procedure request from a separate client process is given in
Figure 10.35.
The form of a procedure request made from within an executing PL/SQL process
is given in Figure 10.36.
procedure_name (input_parameters);
Figure 10.37 shows the procedure call from SQL*Plus and the resulting output.
set serveroutput on
SQL*PLUS> execute club_points ('2003/04')
AC Compassion 0
Beauty United 12
Bountiful Athletic 0
Forgiveness FC 0
Freedom United 0
Grace Town 0
Honesty United 0
Knowledge City 0
Liberation United 0
Nirvana City 0
...
set serveroutput on
execute club_points
BEGIN club_points; END;
*
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00306: wrong number or types of arguments in call to 'CLUB_POINTS'
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored
As well as allowing the use of input parameters, procedures can also use output
and input–output parameters. Such parameters, when combined with the block-
structured nature of PL/SQL, enable dynamic interchange of variables between
calling and receiving PL/SQL programs.
The match_goals procedure to find the number of goals scored for, and against, a
particular club, in a particular match (Figure 10.40), is called from a separate
club_points procedure that calculates the total points scored for each club in a
particular season (Figure 10.41). The output resulting from such an interaction is
shown in Figure 10.42. In essence, this example takes the earlier procedure of
Figure 10.33 and splits its functionality across two new procedures.
Note that the logic of the new procedure has been suitably altered to ensure that
the club the player was playing for at the time of the match is credited with any
goals scored by that player, rather than, as mistakenly calculated in our earlier
examples, the club for which the player is currently playing. Hence the difference
in output values!
Load and compile match_goals before club_points. If match_goals does not exist
when club_points is compiled, the call to match_goals in it will cause a compilation
error.
Input–output (IN OUT) parameters, such as those used in match_goals, allow vari-
ables to be passed from calling programs to receiving programs, for the receiving
program to alter their values accordingly, and for the updated values to be passed
back to the calling program for further calculation. Figure 10.42 shows a section of
the output produced by the execution of the club_points procedure. This demon-
strates the changed values of the io_goals_for and io_goals_against input–output
variables before and after each call to match_goals. Input variables are read-only,
whereas output variables are write-only, in that they can only be assigned values
from within the called procedure.
BEGIN
  OPEN c_club;
  LOOP
    FETCH c_club INTO v_club;
    EXIT WHEN c_club%NOTFOUND;
    v_total_points := 0;
    OPEN c_match;
    LOOP
      FETCH c_match INTO v_match_id;
      EXIT WHEN c_match%NOTFOUND;
      v_goals_for := 0;
      v_goals_against := 0;
      DBMS_OUTPUT.PUT_LINE ('Execute match_goals( ' || v_club || ',' ||
                            v_match_id || ',' || v_goals_for || ',' ||
                            v_goals_against || ' )');
      match_goals(v_club, v_match_id, v_goals_for, v_goals_against);
      -- assumed scoring rule: 3 points for a win, 1 for a draw
      IF v_goals_for > v_goals_against THEN
        v_total_points := v_total_points + 3;
      ELSIF v_goals_for = v_goals_against THEN
        v_total_points := v_total_points + 1;
      END IF;
    END LOOP;
    CLOSE c_match;
    DBMS_OUTPUT.PUT_LINE (v_club || ' ' || v_total_points);
  END LOOP;
  CLOSE c_club;
END club_points;
/
v_goal_rec v_cursor_rec_type;
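Figure 10.40 gives the full text of match_goals. As a hedged sketch of how it might be put together, based on the io_ parameter names in the text, the call made from club_points and the c_goals cursor shown later in the clubstat package body (the scoring detail and the treatment of the goal type column are assumptions):

CREATE OR REPLACE PROCEDURE match_goals
  (i_club           IN     clubs.name%TYPE,
   i_match          IN     matches.match_id%TYPE,
   io_goals_for     IN OUT NUMBER,
   io_goals_against IN OUT NUMBER) IS
  TYPE v_cursor_rec_type IS RECORD
    (name clubs.name%TYPE, goal_type goals.type%TYPE);
  v_goal_rec v_cursor_rec_type;
  CURSOR c_goals RETURN v_cursor_rec_type IS
    select clubs.name, goals.type
    from goals, clubs, contracts, matches
    where goals.ufa_id = contracts.ufa_id
    and clubs.name = contracts.club
    and matches.match_id = goals.match_id
    and goals.match_id = i_match
    and matches.matchdate between contracts.joined
        and NVL(contracts.left,'31-DEC-9999');
BEGIN
  OPEN c_goals;
  LOOP
    FETCH c_goals INTO v_goal_rec;
    EXIT WHEN c_goals%NOTFOUND;
    -- credit the goal to the club the scorer was contracted to at the time;
    -- the goal type column is ignored in this sketch
    IF v_goal_rec.name = i_club THEN
      io_goals_for := io_goals_for + 1;
    ELSE
      io_goals_against := io_goals_against + 1;
    END IF;
  END LOOP;
  CLOSE c_goals;
END match_goals;
/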
A function always returns a result, even if only a null, to its calling point. A PL/SQL
function is like a function in any other programming language. It is called, as part
of an assignment operation, to carry out a predetermined calculation, often with
run-time input parameters, and returns the calculated result to an assignment vari-
able. An example of a function call is in Figure 10.43, which calculates the total
salary of all players currently contracted to play for a particular club:
v_club_salary := club_sal(i_club);
The name of the function is club_sal, and the name of the club whose salary bill is
to be calculated is represented by the input parameter i_club. When this assign-
ment expression is executed (generally from within another block), club_sal is
called with the current value of the i_club variable substituted as its input param-
eter. The value returned from the club_sal function is then assigned to the
v_club_salary variable.
A PL/SQL function evaluates to an assignable value, and as such is inextricably
tied to an assignment operation. To achieve the same result with club_sal as a
procedure would require v_club_salary to be declared within club_sal as an output
parameter.
Procedures can only pass back values through out or in–out parameters. Func-
tions pass back values intrinsically as their return type. Functions can accept out
and in–out parameters, but their use is considered a side-effect and is poor
programming practice.
One of the main advantages of using a function rather than a procedure is that a
function can be used in the selection set of a select statement. The club_sal function
can easily be called for each club in the league (Figure 10.44).
The coding of a function is almost identical to that of a procedure, save for its
declaration, which must specify a return type, and the need for a return expression
within the execution section. Figure 10.45 shows the code for the club_sal function.
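A hedged sketch of a function along the lines the text describes (the contracts columns follow the UFL schema; returning a null total when a club has no current contracts is an assumption that matches the later test in Figure 10.51):

CREATE OR REPLACE FUNCTION club_sal (i_club IN clubs.name%TYPE)
  RETURN NUMBER IS
  v_total NUMBER;
BEGIN
  -- total salary of all players currently contracted to the club
  SELECT SUM(salary) INTO v_total
  FROM contracts
  WHERE club = i_club
  AND left IS NULL;
  RETURN v_total;
END club_sal;
/

It can then be invoked for every club in the league with a query of the kind Figure 10.44 suggests, for example select name, club_sal(name) from clubs;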
It is often the case that a number of procedures and functions are created for
specific application areas, to provide a set of related functionality. In such cases, it
would be convenient to group the related programs within a common framework.
It is precisely this role that is provided by PL/SQL packages.
It is not only procedures and functions that can be contained within a package. It
is also possible to include any combination of the following PL/SQL constructs:
- variables
- cursors
- constants
- exceptions
- PL/SQL tables
All such elements fall within the scope, or context, of the package that declares
them, and as such can be considered as ‘global variables’ accessible by any of that
package’s programs.
A particular advantage of using packages to collect related functions and proce-
dures is one of performance improvement over the use of standalone functions and
procedures. In a standalone situation every program is loaded into process memory
the first time it is called from a client process. If there are three standalone func-
tions, all of which are called from a single procedure, there will be four separate
loads of compiled p-code from disk to memory (one for the procedure and one for
each of the functions). This is potentially a very inefficient mechanism, and it is far
better to load all of the related programs in one operation. As soon as an element of
a package is referenced, all elements of that package are loaded into process
memory. This is a much more efficient mechanism, provided that only related
items are included in any particular package.
A package is somewhat similar to the concept of an object in an object-oriented
programming language. An object has a set of attributes and a set of methods that
can act upon the attributes. Each method within the object can call any other
method belonging to that object. Any other object can call a method of the given
object provided it has sufficient privileges and knows the signature of the particular
method it is interested in. The call must use the fully scoped name of the object’s
method and supply any required parameters (Figures 10.46 and 10.47).
object_name.method_name(input_parameters);
END club_points;
v_goal_rec v_cursor_rec_type;
CURSOR c_goals RETURN v_cursor_rec_type IS
select clubs.name, goals.type from goals,
clubs, contracts, matches
where goals.ufa_id = contracts.ufa_id
and clubs.name = contracts.club
and matches.match_id = goals.match_id
and goals.match_id = i_match
and matches.matchdate between contracts.joined
and NVL(contracts.left,'31-DEC-9999');
BEGIN
...
END match_goals;
END clubstat;
Figure 10.49 shows a section of the body of the PL/SQL package for club-related
statistics. In this example, the only elements defined are the named procedures and
function. A package body is required only when its specification includes func-
tions, procedures or cursors.
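A hedged sketch of the corresponding package specification, inferred from the subprogram names used in the body and from the later call clubstat.club_sal, would be:

CREATE OR REPLACE PACKAGE clubstat IS
  PROCEDURE club_points (i_season IN VARCHAR2);
  PROCEDURE match_goals (i_club           IN     clubs.name%TYPE,
                         i_match          IN     matches.match_id%TYPE,
                         io_goals_for     IN OUT NUMBER,
                         io_goals_against IN OUT NUMBER);
  FUNCTION club_sal (i_club IN clubs.name%TYPE) RETURN NUMBER;
END clubstat;
/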
Figure 10.50 shows a standalone package specification that requires no package
body. The user-defined exceptions declared in the package are available to any PL/
SQL module, provided their fully scoped names are used.
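As a sketch, a specification of the kind Figure 10.50 describes might be little more than a list of exception declarations (only no_salaries_found is confirmed by the example that follows; further UFL-wide exceptions would normally be collected here):

CREATE OR REPLACE PACKAGE ligaerror IS
  no_salaries_found EXCEPTION;
END ligaerror;
/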
Figure 10.51 shows an anonymous block that makes use of one of the pack-
aged exceptions and provides action code for it. User response to the input state-
ment results in the exception being raised.
DECLARE
  v_sal  NUMBER;
  v_club Clubs.name%TYPE;
BEGIN
  v_club := '&club';
  v_sal := clubstat.club_sal(v_club);
  dbms_output.put_line(v_sal);
  if v_sal is NULL then
    raise ligaerror.no_salaries_found;
  end if;
EXCEPTION
  when ligaerror.no_salaries_found then
    dbms_output.put_line
      ('No salaries assigned to club ' || v_club);
END;
BEGIN
FOR v_week IN 1..38 LOOP
IF matchdates.matchdates_table(v_week) >= SYSDATE THEN
dbms_output.put_line
(matchdates.matchdates_table(v_week));
END IF;
END LOOP;
END;
ERROR at line 4:
ORA-06550: line 4, column 44:
PLS-00225: subprogram or cursor 'MATCHDATES' reference is out of scope
ORA-06550: line 3, column 3:
PL/SQL: Statement ignored
A package declaration gives its variables global scope; the package description
shown in Figure 10.54 would enable the anonymous block to reference
matchdates_table successfully.
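A hedged sketch of the kind of declaration Figure 10.54 implies, with the PL/SQL table placed in the package specification so that any calling block can see it (the collection type name and the procedure parameter are assumptions):

CREATE OR REPLACE PACKAGE clubspkg IS
  TYPE date_table_type IS TABLE OF DATE INDEX BY BINARY_INTEGER;
  matchdates_table date_table_type;         -- now visible to any calling block
  PROCEDURE matchdates (i_start IN DATE);   -- fills matchdates_table with the 38 week dates
END clubspkg;
/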
set serveroutput on
execute clubspkg.matchdates('14-AUG-2003')
Week 1 - 14-AUG-03
Week 2 - 21-AUG-03
...
Week 38 - 29-APR-04
PL/SQL procedure successfully completed.
SQL> BEGIN
2 FOR v_week IN 1..38 LOOP
3 IF clubspkg.matchdates_table(v_week) >= SYSDATE THEN
4 dbms_output.put_line
5 (clubspkg.matchdates_table(v_week));
6 END IF;
7 END LOOP;
8 END;
14-AUG-03
21-AUG-03
...
29-APR-04
PL/SQL procedure successfully completed.
Figure 10.55
PL/SQL triggers are blocks of code that contain the details of DML event notifica-
tions that the block will respond to, as well as the programmed response to such a
notification being received. Figure 10.57 shows the general structure and syntax of
a trigger.
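The shape that Figure 10.57 describes can be illustrated with a small, hypothetical trigger (the name goals_ai_trigger and its action are illustrative only, not the book's):

CREATE OR REPLACE TRIGGER goals_ai_trigger       -- unique name within the trigger namespace
AFTER INSERT ON goals                            -- the event and table of interest
FOR EACH ROW                                     -- optional row-level granularity
BEGIN
  DBMS_OUTPUT.PUT_LINE('Goal recorded for match ' || :NEW.match_id);
END goals_ai_trigger;
/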
The first line of a PL/SQL trigger registers the unique name by which it will
be called and executed. Note, however, that PL/SQL triggers operate within a
different namespace than other named blocks, and it is the programmer’s responsi-
bility to ensure that the same name is not also used for a procedure, function or
package.
The second line of the trigger registers interest in the events to which the trigger
(acting as an event consumer) wishes to be notified, and the SQL table to which
those events will apply. The registration syntax also indicates whether events of
interest are those that are associated with DML operations that have already taken
place or are associated with DML operations that are about to take place.
For instance, if the operation of interest was an insert into a particular table, the
event of interest is either that an insert has taken place or that an insert is about to
take place. The after keyword is used to indicate that interest is in those operations
that have taken place, and the keyword before is used to indicate that interest is in
those operations that are about to take place.
Because a before-trigger occurs prior to the execution of the DML operation that
initiated or fired it, it is possible for the trigger’s processing to alter the data
contents of that DML operation. This is a very powerful mechanism, and is most
often used to add transparently derived column values to inserted table rows. The
cost of using a before-trigger that alters the data contents of the SQL operation upon
which it is based is the need for an additional logical read of the (possibly updated)
data once the trigger has completed and before the data is written to disk.
The for each row optional expression indicates the granularity of event notifica-
tion. With this expression, an event notification will be raised for each and every
row affected by the DML statement of interest. Without such an expression a
single notification will be raised regardless of the number of rows affected. The
decision as to the appropriate level of granularity to adopt is, of course, a matter for
the application designer.
An optional when clause acts as a notification filter for a row-level trigger. Any
notification whose data values do not evaluate to true in the filter is simply
discarded, which avoids unnecessary calls to the trigger's execution section. Because
notification filtering is row-oriented, a when clause can only be used with a
row-level trigger.
A row-level trigger has access to the before-image and after-image of each data
row affected by the triggering event. This information is passed to the trigger as
part of the event notification that fires it. If the triggering event is an update opera-
tion, then both a before-image and after-image will be populated, and will respec-
tively contain the attribute values before the update took place and the attribute
values once the update takes place. With a before-trigger these after-image values
are subject to change prior to the after-image being applied to the database.
An insert trigger only has after-image data available, and a delete trigger only has
before-image data available. After-image attributes are identified with the prefix
NEW, and before-image attributes with the prefix OLD. In Figure 10.58, the notifi-
cation filter is based on the value of the club attribute once the update that caused
the trigger to fire has been processed.
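As a hypothetical illustration of such a filter (the trigger name and its action are not taken from Figure 10.58), note that the when clause refers to the after-image without the colon prefix used in the trigger body:

CREATE OR REPLACE TRIGGER contracts_club_filter
AFTER UPDATE ON contracts
FOR EACH ROW
WHEN (NEW.club = 'Beauty United')                -- filter on the after-image value of club
BEGIN
  DBMS_OUTPUT.PUT_LINE('Contract updated for ' || :NEW.ufa_id);
END contracts_club_filter;
/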
The PL/SQL trigger in Figure 10.59 is a typical example of a row-level trigger used
for audit purposes. The trigger responds to a notification that an update has been
made to the contracts table by inserting a row into the audit table.
-- Header as implied by the trigger name (contracts_au_trigger: after update)
-- and the surrounding text; the body is that of Figure 10.59.
CREATE OR REPLACE TRIGGER contracts_au_trigger
AFTER UPDATE ON contracts
FOR EACH ROW
DECLARE
  v_log_details VARCHAR2(100);
BEGIN
  v_log_details :=
    (:OLD.UFA_ID || ' Old Salary: ' ||
     :OLD.salary || ' New Salary: ' || :NEW.salary);
  INSERT INTO AuditLog (datetime, user_id, details)
  VALUES (LOCALTIMESTAMP, USER, v_log_details);
END contracts_au_trigger;
/
The SQL statement for creating the audit table is given in Figure 10.60. Note the
use of the Oracle 9i data type TIMESTAMP. This is used because it acts as a more
precise measure of time than the more familiar DATE type. The TIMESTAMP data
type records time to fractions of a second (by default, millionths) rather than to the
whole seconds of the DATE type. The PL/SQL function LOCALTIMESTAMP in Figure 10.59
returns the current date and time in the session time zone. If this is set to a regional
zone such as Europe/London, it will return UK local time and automatically adjust
for British Summer Time.
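A minimal sketch of the audit table that Figure 10.60 creates, with the column names taken from the insert statement above and the column sizes assumed:

CREATE TABLE AuditLog (
  datetime TIMESTAMP,
  user_id  VARCHAR2(30),
  details  VARCHAR2(100)
);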
When a trigger is compiled, its generated p-code is saved in the database and is only
loaded and invoked when the DBMS generates a suitable event notification
following an operation of interest. Such invocation is implicit from the perspective
of the trigger developer and completely transparent from the perspective of the
database client. It is this transparency that makes PL/SQL triggers an ideal mecha-
nism for auditing purposes.
The user issuing the DML operations is unaware that their actions are being
monitored, and, if necessary, recorded. The user who issues the update operation
shown in Figure 10.61 is completely oblivious to the fact that this operation gives
rise to a series of event notifications, each of which initiates the row-level trigger
contracts_au_trigger.
UPDATE contracts
SET salary=salary*1.10
WHERE club = 'Beauty United'
AND left IS NULL;
Such a user is completely unaware that the before and after details of each row
affected by the update operation, together with the identity of the user initiating
the change, and the date and time of each update, are recorded and held in an audit
log. Figure 10.62 demonstrates the contents of such an audit log following the
update operation of Figure 10.61.
11 rows selected.
Figure 10.62 Implicit PL/SQL insert statements affect the auditlog table.
A trigger can respond to more than one type of event notification. However, it is
only possible for a trigger to be associated with events that affect one specific SQL
table.
In order to distinguish which DML operation is responsible for firing a PL/SQL
trigger, an event identifying function, a type of conditional predicate, is provided
for each of the three DML types. UPDATING is set to TRUE if an update operation has
been executed, INSERTING is set to TRUE if an insert operation has been executed, and
DELETING is set to TRUE if a delete operation has been executed (Figure 10.63).
-- Header as implied by the trigger name (contracts_aiud_trigger: after insert,
-- update or delete) and the surrounding text.
CREATE OR REPLACE TRIGGER contracts_aiud_trigger
AFTER INSERT OR UPDATE OR DELETE ON contracts
FOR EACH ROW
DECLARE
  v_log_details VARCHAR2(100);
BEGIN
  IF UPDATING THEN
    v_log_details :=
      ('U Original Record: ' || :OLD.UFA_ID || :OLD.CLUB ||
       :OLD.JOINED || :OLD.LEFT || :OLD.SALARY ||
       ' Updated Record: ' || :NEW.UFA_ID || :NEW.CLUB ||
       :NEW.JOINED || :NEW.LEFT || :NEW.SALARY);
  ELSIF INSERTING THEN
    v_log_details :=
      ('I Inserted Record: ' || :NEW.UFA_ID || :NEW.CLUB ||
       :NEW.JOINED || :NEW.LEFT || :NEW.SALARY);
  ELSE
    v_log_details :=
      ('D Deleted Record: ' || :OLD.UFA_ID || :OLD.CLUB ||
       :OLD.JOINED || :OLD.LEFT || :OLD.SALARY);
  END IF;
  INSERT INTO AuditLog (datetime, user_id, details)
  VALUES (LOCALTIMESTAMP, USER, v_log_details);
END contracts_aiud_trigger;
/
Statement-level triggers are a useful mechanism where the details of the rows affected by a DML
event are not of interest, but the event itself is.
The creation of a single record per DML operation that records the user, time of
operation and nature of operation may be all that is required from an audit trigger.
Another example would be to use a before-statement-level trigger to ensure that
certain users are only allowed to carry out DML operations on a particular table at
specific times during the day. Such a trigger would abort those operations
attempted by that set of users outside of those times.
It is possible to apply multiple triggers to a single DML operation. For instance, it
is perfectly acceptable to have a before insert row-level trigger, an after insert row-
level trigger, a before insert statement-level trigger, and an after insert statement-
level trigger, all associated with insert operations to the same SQL table. In essence,
each PL/SQL trigger registers interest in the same event, and will be notified when
such an event takes place. However, the notifications of that event will arrive in a
pre-defined order, thus ensuring that the triggers themselves are fired in sequence.
Table 10.2 documents that sequence.
A single SQL table can potentially have 12 separate trigger types defined upon it,
four for each of the three DML operation types: insert, update and delete. Moreover,
multiple instances of each specific type (with distinguishing names) can be created
and fired. It would be perfectly possible to have two, or more, before update row-
level triggers, each interested in a separate attribute of the same SQL table.
Such multiple instances of same-type triggers have no pre-defined order of
execution. In order to minimize unnecessary programming complexity they are
probably best avoided. Indeed, there are only a limited number of circumstances
where it would be useful to have more than one trigger for any particular DML
operation.
10.10 EXERCISE
(i) Implement the season–points–club example in this chapter. Extend the func-
tionality to provide a sorted league table with points and goals for and against.
CHAPTER 11
DATABASE ARCHITECTURE
An Oracle database is a logical storage unit. It has a name and its data is stored in
several physical files, spread across the host machine’s disks. The files are used for
user data, for data recovery and to store control information. The files have a
different internal format according to their purpose and they belong exclusively to
the named database.
A database manager like Oracle is not just a single piece of software. It consists of
a number of different programs, which run as separate processes on the host
machine. Each process has a specific function within the general objectives of effi-
ciently storing and presenting the data, securing and protecting transactions and
ensuring reliability and availability.
Most Oracle installations run in a client–server configuration: programs running
on client machines communicate with the database host through a network.
Providing an interface for remote access to the data management functions and
hence to the data itself is a vital component of the overall package. The interface
should be able to deal with any number of different hardware choices and network
standards and provide a seamless entrance for user programs.
11.1 TABLES
The data area of an Oracle installation may consist of one or more physical files on
the disk. The sizes of these files are set in the installation parameters. As the data
grows, further physical files can be added. Oracle views each of the physical files as
belonging to a tablespace. A tablespace consists of one or more physical files and
can span one or more disks. There will be a tablespace for system use and a
tablespace for user data (Figure 11.1).
Within a tablespace there is a logical division known as a segment. There are four
kinds of segment, each with different internal organization. Tables are held in a
data segment. Indexes are stored in an index segment. Within the system
tablespace, Oracle uses a rollback segment for data security. A temporary segment
may be created to hold intermediate structures during a query. Segments can span
physical files within the same tablespace.
A segment consists of extents. An extent is a contiguous allocation of data
blocks: the fundamental unit of storage. The block size in bytes is specified in the
installation parameters. Once established, it is constant.
When a table is created, a logical segment is created for it. An initial extent,
consisting of a number of data blocks, is allocated to the segment. Rows in the table
are stored in a data block. The block size may be large enough to contain several
rows, or the row may be large and span two or more neighbouring blocks.
As the number of rows in the table grows, further extents are allocated to the
table segment. Eventually an extent consisting of blocks from another physical file
within the tablespace may be allocated. Segments can span files on different disks
within the same logical tablespace.
Because rows are not necessarily inserted into a table in one operation, the
extents that make up a segment may not be contiguous on the disk. The segment
maintains a map of its extents. Each extent contains a map of its contiguous data
blocks (Figure 11.2).
When a table is created in Oracle, initial space is allocated for the storage of a
number of rows. Each row occupies a number of bytes which approximates to the
sum of the sizes of its attributes. The initial storage space is allocated
by default, but may be set in the create command. When the number of rows
increases to fill the initial space, an extension extent is allocated to the segment.
The size of the extension extent is set by default, but again may be varied in the
create command. Further extension space will be allocated as the table grows until
the entire tablespace is occupied. At this time, an additional physical file must be
added to the tablespace by the DBA.
The Oracle system keeps track of space allocation for each user table and maps
the allocated space so that the table may be viewed by the user as a logical block of
data, even if the table spans several physical files. The space allocated for the initial
and extension extents is configurable. Bigger sizes mean fewer extents in the table
segment and less mapping work when performing queries; it may also mean wasted
space for small tables.
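As an illustration of setting these values in the create command (a sketch only: the storage figures and tablespace name are arbitrary, and the column list is an abbreviated guess at the UFL Goals table):

CREATE TABLE goals (
  match_id NUMBER,
  ufa_id   VARCHAR2(10),
  type     CHAR(1)
)
TABLESPACE user_data
STORAGE (INITIAL 1M NEXT 512K);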
11.2 INDEXES
When a table is created with a primary key, Oracle automatically creates an index
for the table in an index segment. The index, like any user object, occupies an
initial extent, and if its size is large may need extension extents to be added. Oracle
uses indexes based on binary tree structures.
The index makes use of a special pseudocolumn, ROWID, that Oracle maintains
for every table. This indicates the row number of each table row mapped over the
entire table’s space allocation (Figure 11.3).
Name ROWID
AC Compassion 16
Beauty United 1
Bountiful Athletic 10
Forgiveness FC 4
Freedom United 14
Grace Town 19
Honesty United 3
Knowledge City 17
Liberation United 8
Nirvana City 9
... ...
A binary tree is a data structure used for sorting and searching. Each data item
occupies a node in a tree. A node has two pointers, left and right, pointing to two
sub-trees. A data item whose value is less than that of an existing node is placed in
that node's left sub-tree; a data item with a greater value is placed in the right
sub-tree. The shape of the tree reflects the order of insertion and may have
asymmetrical sub-trees.
The root of the tree in Figure 11.4 is occupied by the key data item Liberation
United. Searching for Grace Town, go left at Liberation and right at Forgiveness. The
direction is determined by a comparison of the search data and the node data.
Follow the left pointer if the search data is less than the node data; follow the right
pointer if it is greater.
This tree has four levels between its root and the leaf nodes. Any node can be
found by a maximum of four comparisons. Knowing the depth of the tree and
therefore the work involved in finding a key is important in estimating optimal
strategies for queries.
Figure 11.5 shows how a binary tree could be implemented in an index structure.
The index consists of rows and columns like a table. The first column is the ROWID
of the index rows, the second is the key value and the third is a value for the ROWID
in the base table, Clubs. Two further columns, Left and Right, contain ROWIDs to
other rows in the index.
Starting at the root, Liberation United, follow the Left value (4) to Forgiveness FC.
Then follow the Right value (6) to Grace Town. This has a Right value of (7), pointing
to Honesty United, which has zeros in Left and Right and must be a leaf node. Several
clubs have values for Left and Right, indicating that they have not yet been placed
in the tree. Inserting those clubs will create a new level in the tree and the existing
zeros of some leaf nodes will change.
In addition to the automatic creation of an index following the designation of the
Name attribute of the Clubs relation as a primary key, other indexes can be manually
created on other attributes or sets of attributes. Any such indexes will be automati-
cally maintained as new rows are added and if any key attribute values are amended.
The creation of additional indexes will aid some of the queries needed in the UFL
application, but they will add to the maintenance overhead of the application.
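For example, the compound index StadMan described in Section 11.3 could be created manually with a statement of this form (a sketch):

CREATE INDEX StadMan ON Clubs (Stadium, Manager);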
Users may not query or use indexes directly. That is open only to the Oracle query
optimizer mechanism.
The tree in this example is a simple binary tree. A balanced binary tree, a B*-tree,
has symmetrical sub-trees. A B*-tree is more efficient for searching. Branch nodes
in a B*-tree show only a key range, indicating the contents of the two sub-trees. The
actual key values and corresponding ROWIDs are kept in the leaf nodes. Because
the sub-trees are symmetric, all leaf nodes are at the same level below the root. All
searches involve the same number of comparisons and cost the same number of
disk I/Os. See Section 15.1.1 for a further discussion of B*-tree indexes.
11.3 METADATA
The distinguishing feature of relational databases from their predecessors was the
unity of metadata with data. The hierarchical and Codasyl databases keep struc-
tural information about the record and its position in terms of direct disk
addresses. Relational databases keep this information in a data dictionary, using
the same table structures as the data to which it refers.
Although the metadata or data dictionary schema is complex, it is understandable in
terms of the relational model. Oracle keeps information about the existence of tables in
a table called Tables and of the columns of user tables in a table called Columns.
These data dictionary tables are stored in the same physical files as user tables but
in a segment of the file reserved for system use. Unless the DBA designates them
publicly searchable, they are not normally visible to users.
Figure 11.6 shows that user smithw has created a table called Clubs with four
attributes. User markc has also created a table called Clubs, but the attributes do not
have the same names and do not have precisely the same types.
This is not a problem; the two tables are stored quite separately and, because the
two users’ objects are completely segregated by Oracle’s security mechanisms, the
two tables do not interfere with each other. smithw’s updates will be applied only to
the table owned by him and markc cannot access tables not owned by him.
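The same kind of information can be inspected in a running system through Oracle's standard data dictionary views; for example, a query of this sort (USER_TAB_COLUMNS and its columns are standard) lists the columns of the connected user's own Clubs table:

SELECT table_name, column_name, data_type, data_length
FROM user_tab_columns
WHERE table_name = 'CLUBS'
ORDER BY column_id;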
Indexes are recorded in a table called Indexes. This also has a compound foreign
key linking its tuples to the base user table.
Figure 11.7 shows that user smithw has created an index called Clubs_PK, based
on the primary key (Name attribute) of his Clubs table. He has created similar
primary key indexes for the tables Players and Goals. He has also created an index
called StadMan based on the Clubs table but with a compound key of Stadium and
Manager in that order.
Within a tablespace, there will be data segments for each table, index segments for
each index and a rollback segment where the database manager stores recovery
data during transactions. The rollback segment is used to provide read consistent
data for transactions and to re-establish a consistent database for failed
transactions.
Figure 11.8 shows the physical files that go to make up an Oracle database. Each
database is logically divided into one or more tablespaces. One or more datafiles
are explicitly created for each tablespace. The combined size of a tablespace’s
datafiles is the total storage capacity of the tablespace. SYSTEM tablespace has 100
MByte storage capacity, while USER tablespace has 400 MByte. The combined
storage capacity of the database in this example is the total storage capacity of the
tablespaces – 500 MByte.
(Figure 11.8 also shows the redo log and two 2 MByte control files, OR1.ctl and OR2.ctl.)
If user data segments and index segments fill the whole 400 MByte then further
physical files will need to be created and added to the USER tablespace.
The most effective way of organizing these files is to place them all on separate
disks, each with its own I/O controller. This allows a degree of parallelism for a
single job, because a large table can be stored within the same tablespace yet spread
across several disks, and it reduces contention if several jobs require access to
tables stored on different disks. Both effects enhance performance. Security for the database is also
improved if the tablespaces and the redo log are segregated. Mirroring the redo log
provides an additional level of protection.
Every Oracle database has one or more physical datafiles. A datafile can be associ-
ated with only one database. A database’s datafiles contain all the database data.
The data in a datafile is read, as needed, during normal database operation and
stored in memory caches or buffers.
An Oracle database has a set of two or more redo log files. The set of redo log files
for a database is known as the redo log. A redo log is made up of redo entries, each
of which is a group of change vectors describing a single transaction applied to the
database. The primary function of the redo log is to record all changes made to the
data and to the rollback segment. If data modifications cannot be permanently
stored because of a disk failure, the changes can be obtained later from the redo log
and work is never lost.
Every Oracle database has a control file. A control file contains entries that
specify the physical structure of the database. It contains the database name and
the names and locations of the datafiles and redo log files.
Every time an instance of an Oracle database is started, its control file is used to
identify the database and redo log files that must be opened for database operation
to proceed. If the physical makeup of the database is altered, the database’s control
file is automatically modified by Oracle to reflect the change.
The shared pool of the System Global Area (SGA) holds, for each SQL statement submitted, the parse tree and execution plan for the corresponding statement. Multiple user
processes that issue identical SQL statements share a single parse tree and execu-
tion plan, leaving more shared memory for other uses.
The SGA and the set of Oracle background processes (Table 11.1) constitute an
Oracle instance. At this point, no database is associated with these memory struc-
tures and processes.
Database Writer: Writes modified blocks from the database buffer cache to the datafiles. Writes occur only when more data needs to be read into the SGA and too few database buffers are free.
Log Writer: Redo log entries are generated in the redo log buffer of the SGA, and written sequentially into the online redo log file.
Checkpoint: A checkpoint writes all pending modifications in the SGA to the datafiles. This process signals the database writer at checkpoints and updates the control files to show the most recent checkpoint.
System Monitor: Performs crash recovery when a failed instance restarts. Also coalesces free extents within the tablespaces to make free space contiguous and easier to allocate.
Process Monitor: Performs process recovery when a user process fails, cleaning the cache and freeing any locked resources.
Dispatcher: Present when a multi-threaded server is used. Each dispatcher routes requests from connected user processes to the shared servers and returns responses.
Mounting a database associates the database with that instance. The instance
finds the database control files and opens them. The control files are read to get the
names of the database’s datafiles and redo log files.
Opening a mounted database makes it available for normal database operations.
Oracle opens the online datafiles and the online redo log files. The database is now
available for any valid user to connect and to issue SQL statements (Figure 11.9).
Server processes are created, once an instance is established, to handle requests from
user processes. A server process interacts with the stored data to carry out requests
of the associated user process. For example, if a user queries some data that is not
already in the database buffers of the system global area, the server reads the data
blocks from the datafiles into the SGA and returns the results to the user process.
(The accompanying figure shows the instance: the database buffer cache, redo log
buffer and shared pool within the SGA; the DBW, LGWR, checkpoint and PMON
background processes; the dispatcher and multi-threaded server; and the datafiles,
redo log and control file, with user processes connecting across the network.)
Oracle can be configured to vary the number of user processes per server. In a
dedicated server configuration, a server process handles requests for a single user
process. A new server is launched as each new user process connects.
A multi-threaded server configuration allows many users to share a single server,
minimizing the number of server processes and maximizing the utilization of
available system resources. When a multi-threaded server is configured, a
dispatcher is launched as a separate background process.
The dispatcher is specific to the network protocols of the user process. It
multiplexes the communication channels with connected users across the
network, performs any necessary data type conversion and manages the connect
status of the user processes. The dispatcher works at the presentation and session
levels of the OSI network model. (For a discussion on the multi-threaded mecha-
nism, see Section 12.1.1.)
The following example illustrates an Oracle configuration where the user and asso-
ciated server process are on separate machines (connected via a network).
Access Control: Determine whether the user who issued this statement has authorization to access the schema objects in the query.
Syntactic Checks: Check that the query conforms to SQL syntax. Are the clauses well-formed and are SQL functions correctly spelt?
Semantic Integrity Control: Confirm the existence of elements referred to in the selection set, the join or restriction conditions. Ensure that the query will not offend against integrity rules.
Query Decomposition and Optimization: Construct alternative algebraic expressions that will render the required result and choose the most efficient plan.
Access Plan Management: Store the plan; map table reads, index usage, temporary tables, buffer sorts, merges and writes to action the preferred plan.
Execution Control: Perform the query, checking for consistency violations caused by updates, insertions and deletions.
Concurrency Control: Engage control structures (locks and lock lists) for transaction isolation. Preserve read consistency or serializable transaction mode.
Logging: For write operations, maintain the database buffers, the rollback buffers and the redo log buffers.
Intranet: Local area network; clients and server use TCP/IP. No Internet connection.
Internet: Clients and host at different locations connected directly to the Internet.
Extranet: Two or more LANs at different locations connected via a secure channel through the Internet.
The seven layers of the OSI network model provide a conceptual framework for
communications across a network. Each node on a network must provide within its
networking software the functionality contained within the OSI stack if communi-
cations are to be established and run reliably.
TCP/IP is overwhelmingly the protocol of choice for open systems networking.
Where a mixture of hardware platforms and different operating systems must
communicate transparently across networks constructed from different physical
media, TCP/IP has the advantage of being independent of equipment and software
vendors.
Figure 11.11 shows the OSI/ISO seven layer model for network services. TCP/IP
provides standard services that approximate to the transport and network layers.
Network equipment manufacturers are generally responsible for compliance with
international standards operating below IP. These might be LAN specifications
such as Ethernet or Token Ring, which regulate packet structures and the physical
interfaces that connect nodes and interconnect networks. Often, their software,
which provides a link between IP and the network media, is bundled with oper-
ating systems like Windows XP or Linux.
Application vendors are responsible for ensuring that the functionality of the
presentation and session layers above TCP is provided by their products (Figure
11.12). Oracle has developed software which is positioned between the application
and the network stack and which provides presentation and session layer services.
A user process communicates with a server process through the program interface.
The interface provides presentation and session services for both the client and the
server.
On the client side, the software may be bound to the application when it is
compiled and standard Oracle libraries are linked to the executable. Alternatively,
it may be present in external drivers (libraries) such as ODBC or in Oracle-provided
Java classes such as JDBC. On the server side, separate processes receive client
requests and route them to the appropriate server process (Figure 11.13).
(Figure 11.13: the client application, through an embedded library or an ODBC/JDBC driver, contacts the listener, which routes the request via a dispatcher to the multi-threaded server.)
11.10 EXERCISES
(i) Investigate structures other than B-trees that can be used for indexes. Write a
report which discusses their advantages or disadvantages over B-trees.
(ii) Write a program to load data items from a file into a simple B-tree. Investigate
the various orders in which these data can be output using a tree walk.
PART 3
IMPLEMENTING THE
SYSTEM
CHAPTER 12
CLIENT–SERVER MODELS
The first section of this book introduced the idea of separating the common func-
tions for data storage and retrieval from the interface and business logic functions.
This results in smaller application programs communicating with a database
manager and requesting data access services.
This raises two questions: how do the applications communicate their requests
to the DBMS and how does the DBMS handle requests arriving simultaneously
from the applications?
(Figure: several COBOL applications communicating with a single database manager.)
‘suspended’ or ‘wait’ queue. Jobs that are ready to run are kept in the ‘run’ queue
from where they are given their time slot on the CPU.
The information needed to allow a process to be stopped and restarted is called
the process context. The queues are made up of process context blocks that
describe the current state of the process. A PCB contains the code segment – the
memory address or addresses where the program is loaded, an instruction counter
showing which instruction is next to be loaded to the CPU, the addresses of the
program’s data segments and so on. The scheduler runs on the CPU every time a
process is ‘stopped’ in order to perform queue management. Allocating CPU time
to a process consists of loading the process context into the registers of the CPU
and passing control to the next instruction in the process context. This is called a
context switch. The scheduler manages the queues, moving jobs between queues
and from the run queue to the CPU according to the scheduling algorithm (Figure
12.2).
One solution is to reserve a segment of shared memory for each application: the DBMS can read requests from this segment, act on them and place results back in the reserved segment for the application to
pick up. The memory segment acts like a letterbox, enabling a dialogue between an
application and the DBMS.
Clearly, each such dialogue can either take place in separate segments or else
access to a shared segment has to be controlled so that one application’s request
does not interfere with another’s returned results. The use of semaphores and
monitor functions in operating systems allows shared segments of memory to be
managed efficiently and safely.
The operating system scheduler maintains a run queue of distinct processes using
a linked list of process context blocks. By analogy, it should be possible for a single
process, required to handle concurrent service requests, to maintain a similar
internal queue structure consisting of the contexts of each of the service requests
within the single process. Within a process, each of these service contexts is called
a thread of control.
The thread contexts are managed internally by the process, much as the sched-
uler manages the process queues. Within the CPU time slice allocated by the oper-
ating system, a dispatcher thread allocates time to each of the user threads that it is
currently managing. There will be at least one thread for every concurrent service
request, together with the manager thread (dispatcher). Each of the thread
contexts will contain information about the process’s common data areas, the
addresses of any thread specific data and a program counter showing which
instruction within the process is to be loaded next. Processes capable of handling
multiple service requests by this method are said to be multi-threaded.
It can now be seen how the DBMS software and many copies of the user applica-
tion can be resident in the same machine; the applications and the DBMS commu-
nicate and the DBMS can handle multiple concurrent requests from the
applications. Until the early 1980s, all of these processes would have been resident
in the same mainframe. Although the mainframe had plenty of memory to accom-
modate the processes, a single CPU still had to run all the applications as well as the
DBMS.
The development of cheap personal computers and of faster and more reliable
network standards led system architects to consider removing the user applications
to individual PCs and to communicate requests remotely to a dedicated high-
performance machine where the DBMS would not have to share CPU time directly
with users.
In the 1980s the key phrase client–server came into vogue. This simply meant that
the server software is positioned on one machine and multiple clients on several
other machines. The client software is responsible for the user interface and for
most of the business logic. It communicates with the server through a network. The
server, based on a multi-threaded architecture, receives service requests from each
of the client machines and responds to the client through the network 1. The key
difference is that the server machine’s CPU is relieved of the work now associated
with the client processes. As the PC gained power, it became able to support a
graphical user interface (GUI). Machines hosting the server software can have
special I/O devices: fast channel hard drives, large memory caches, designed to
augment their performance in meeting client requests.
Clearly, a new mechanism was required to facilitate communication between the
client and the server. The first solution was the Remote Procedure Call (RPC). In
order to understand how the remote procedure call works, consider Figure 12.3.
#include <stdio.h>
This simple program has a main function that receives control when the program
executes. It has a function, add(), which is called in the main function. The func-
tion add() takes two integer parameters and returns a single integer, the result of
adding the two input integers. Compiling this program results in more than just the
transformation of the simple statements into machine code.
A mechanism is inserted by the compiler to enable the line of control to jump to
the address of the subroutine, retaining access to the two input parameters as well
as information on how to return to the calling position (Figure 12.4).
The program makes use of a stack structure to communicate between the main
function and the add() function. The data is marshalled to enable the procedure
call.
Making a remote procedure call uses this same principle but there are two addi-
tional problems. The addresses of variables cannot be passed, since the client
program and the server program are running on two machines with different and
autonomous memory spaces. Allowance must be made for the fact that the two
machines may have different processors with different data representations. An
integer on one may occupy 64 bits, while on the other it may occupy 128 bits.
1 Bruce Nelson originated the idea of the Remote Procedure Call (RPC) in his PhD thesis in
1981. He later co-authored a paper on the subject: Birrell and Nelson (1984).
The marshalling for a remote procedure call makes use of a standard data type
called external data representation (XDR). Parameters are passed as values rather
than addresses and are converted to XDR before transmission across the network.
There is a defined protocol within RPC of call and response so that a simple
dialogue where the add() function is part of the server might look like the exchange
illustrated in Figure 12.5.
It should be noted that in this exchange the client, having sent the request, waits
for the server response. The client is said to have made a blocking call. This is a
synchronous mode of operation.
When the client does not need to wait for a response but moves on, leaving the
server to complete the request, or when the client is itself multi-threaded and
spawns a thread to make the call while the main line of control moves on, the call is
non-blocking and the mode of operation is asynchronous.
Most UNIX or Linux systems feature a tool called Rpcgen that takes a specification
for server functions written in a syntax similar to the C language. It produces
headers and C program source for a simple client and for a lightweight (multi-
threaded) server. The specification file used by Rpcgen can be thought of as an
interface definition for the client and the server.
Client: Translate values of a and b into machine-independent structures (XDR).
Client: Assemble translated values and procedure request into transmission block.
Client: Establish connection to server.
Server: Respond to connection request.
Client: Transmit block to server; wait for return message.
Server: Receive transmission block.
Server: Translate data from XDR to machine-dependent format; call local procedure.
The programmer then completes the client logic and the code for the actual
server functions. A sample client–server program written in C can be found on the
web site associated with this book.
When the client and the server are both Java programs, a variation of the RPC is
used. This is called Remote Method Invocation (RMI) and it is specific to the Java
language.
The RMI architecture is more complex than RPC because of the nature of Java as
an interpreted language and because of the Java object model.
When designing a client and server that will use RMI 2, the first stage is again a
definition of the interface: the methods that will be called from the client. Method
signatures (declarations) are specified in Java in an interface file. This is compiled
using the RMI compiler (rmic) which produces a stub class and a skeleton class
(Figure 12.6).
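As a hypothetical illustration (the interface name Adder is not from the book; the method mirrors the earlier add() example), the interface file in which the method signatures are declared might look like this:

import java.rmi.Remote;
import java.rmi.RemoteException;

// The methods a client may invoke remotely on the server.
public interface Adder extends Remote {
    int add(int a, int b) throws RemoteException;
}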
The stub and skeleton provide methods for marshalling the data and for estab-
lishing a communications channel across a network. There is no need for data type
conversion since all Java programs use the same basic types, but there is a need for
objects identified by memory pointers to be dereferenced from one memory space
to the other and for complex structures to be serialized for transmission across the
network as byte streams.
The client classes and the server classes are written in Java with references to the
stub or skeleton classes as appropriate and compiled with the Java compiler, javac.
Each Java class actually runs in a Java Virtual Machine (JVM). The JVM is the
active process recognized by the operating system. Network communication can
only be made between processes. The client JVM cannot call to the server JVM
directly. The client class will be trying to invoke a method within a server class that
is hidden from the operating system within the server JVM.
An additional process is therefore needed to identify the specific JVM (there
could be several in a multi-user machine) in which the requested method resides. A
process called the RMI registry runs on the server machine. It is assigned a specific
port number and listens for requests on that port.
When a server class starts up, it registers its presence with the local RMI registry.
The client invokes a method request via the registry process (using the machine
address and the registry port number). The request is then passed on to the appro-
priate JVM process that registered the method. The skeleton class associated with
the server class then establishes a network channel with the stub class associated
with the calling client.
From then on, the dialogue is similar to the RPC protocol. The server class
receives the request along with associated parameters, acts on it and responds with
a result. If the request is successful, this may be a return value or a stream of values.
If the request causes an error, the response will be an exception message that must
be handled by the client.
This way of invoking a remote method depends on the server class registering the
method as it starts up. If the server has not already started, it may be started auto-
matically through a procedure known as activation.
The client needs no changes in order to use activation. The server class is made
activatable so that if it is not already instantiated and registered it may be started
when a request is received.
An additional process, the RMI daemon (rmid) must be running as well as the RMI
registry. The rmid provides a JVM from which other JVM instances may be
spawned.
A special setup class is written for each activatable class on the server. The setup
class creates all the information necessary to instantiate the activatable class,
without actually creating an instance of the remote object. The setup class passes
information about the activatable class to rmid, registers a remote reference and a
name with the rmi registry, and then exits.
The client makes the request in the normal way. The rmi registry has an activation
entry for the requested method so passes it to the rmid, which starts up a new JVM
invoking the server class with it. A communication channel between the client and
the newly activated class is opened through the stub and the skeleton classes as
before.
An application written against one vendor's proprietary call interface could not simply be moved to another DBMS, and would even need code modifications in order to work properly with that data-
base software. This became quite a problem for third-party software houses whose
accounting, personnel, purchase order or CAD systems were expected to use
Oracle, Ingres, Sybase or SQL Server according to their customers’ preferences.
Eventually the database vendors agreed on a standard, issued as the X/OPEN Call
Level Interface and later confirmed in SQL:1999. CLI is primarily aimed at a
distributed transaction processing (DTP) environment with data sources resident
in a variety of database systems. The aim of CLI is that it should comply with the
functionality defined within SQL92. Further revisions to the standard will bring it
up to compliance with SQL:1999. The CLI is a minimum standard. Many database
vendors have additional functions that extend CLI functionality (Figure 12.8).
It is possible to write a program that accesses a database at the Call Level but such
an application, despite the X/OPEN standard, will probably still need to be sepa-
rately compiled and linked with vendor-specific libraries. The CLI standard has
reduced the necessary modifications (if any) in adapting an application to use
different products.
Alternatively, a higher level of functionality, at some intermediate stage between
SQL and the Call Level, can be defined as an API. The client is written to comply
with this and a run-time service is provided to translate the intermediate level to
the Call Level. This is the concept behind ODBC and JDBC.
(Figure: a single application sitting above separate Oracle, SQL Server and Informix ODBC drivers.)
This architecture means that the application does not need to be changed in order
to access Oracle, Informix or SQL Server. Access is achieved using the correct ODBC
driver that translates the ODBC calls it receives from the client application into Call
Level (RPC) functions specific to the remote database for which it was written.
The Driver Manager is a dynamic link library (DLL) that manages communication
between applications and drivers. It maintains a list of data source names that
correspond to the different databases to which applications on the client machine
might connect. These names indicate the type of database, its connection details
and which driver to load for applications requesting connection to the datasource.
The application needs only to be linked to the Driver Manager and calls ODBC
functions in the Driver Manager, not the driver. The application identifies the data
source name in making such calls and the Driver Manager loads the correct driver
into memory on the client machine.
The Driver Manager passes ODBC calls from the application to the driver. Drivers
are also dynamic link libraries that implement the functions in the ODBC API.
Each driver is specific to a particular database management system. A driver for
Oracle cannot directly access data in an Informix DBMS.
The driver translates ODBC calls passed to it from the application into OCI func-
tions and forwards them to the database. Returned data or messages are packaged
as the returns of ODBC functions and forwarded to the application. As far as the
database server is concerned, it is receiving OCI calls with appropriate parameters.
The translation process and the actual nature of the client application are hidden
from it.
Multiple drivers can coexist, which allows the application simultaneous access
to more than one data source. This makes ODBC a potential vehicle for imple-
menting a heterogeneous distributed database. The application uses ODBC in such
a manner that the application is not tied to any one DBMS, although it might be
tied to a limited number of DBMSs that provide similar functionality (Table 12.1).
In addition to the tasks performed at the application level the ODBC driver
undertakes data conversion between the client and the server and maintains the
connect session with the DBMS. It thus provides presentation and session level
services above a network stack such as TCP/IP.
Using ODBC requires the Driver Manager and the vendor specific driver to be
present on the client machine. The driver manager is distributed with most
Windows variants. The vendor is usually responsible for providing the driver.
In the end though, the database can only receive CLI requests. Sun defines four
types of driver that can provide the translation between the high-level JDBC
request and the database. These are described in Table 12.2.
Software vendors like Borland, who offer the JBuilder Java developer tool,
include a type 1 JDBC–ODBC bridge with their package. This has the advantage of
compatibility with a Windows environment that includes the ODBC driver
manager. When the application is deployed, a specific ODBC driver must also be
installed on the client machines and the datasource names separately
administered.
3 The JDK Version 2.0 Enterprise Edition (J2EE) is available from http://java.sun.com/j2ee/docs.html (July 2003).
Oracle and other database vendors offer type 2 JDBC–OCI drivers. When the
application is deployed, these have a complex installation requirement for special
libraries and networking software. The number of communications phases
involved in using this type of driver can have performance implications, but the
speed of the native code library usually compensates for this (Figure 12.10).
Type 3 drivers are generally written by the client development team for specific
circumstances that prevent the use of other types of driver. A type 3 driver commu-
nicates with an external server, possibly on a remote machine using a protocol
such as RMI or IIOP 4. This server then performs the database access or may even
forward the request to another process to perform the access. Because of their
complexity and the need to write bespoke software, type 3 drivers have not
achieved significant usage.
Type 4 drivers are written in pure Java and specific versions are supplied by
each database vendor. Figure 12.10 shows that this driver type has the simplest
configuration. [Figure 12.10 compares the configurations: a type 2 driver calls a vendor-supplied DLL from its JDBC classes, a type 3 driver forwards requests to a remote middleware package, and a type 4 driver issues CLI calls to the database directly from pure Java classes.] Presentation and session layer services are included within the
driver as well as the translation facility for CLI calls. This may account for their
relative popularity. There are doubts about performance for Java-only drivers,
although advances in price/performance ratios for client workstations make this
factor increasingly less significant. The driver software libraries must be included
as part of the deployment package with the application.
The ability to interface a Java application with more than one vendor database
makes JDBC a simple vehicle for the construction of a distributed database. In a
situation where the databases are owned and maintained by separate organiza-
tions, there may be situations where the pooling of information would be advanta-
geous (Figure 12.11).
[Figure 12.11: a single application makes JDBC API calls through Oracle, SQL Server and Informix JDBC drivers at the same time.]
For example, the motor insurance industry exchanges data about claims in order
to combat fraud. If a central association were given limited access to each member’s
database, then a relatively simple Java application could summarize claims infor-
mation. This would enable queries to show all the claims involving the same
vehicle, regardless of underwriter.
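A minimal sketch of such a pooling query follows. It is an illustration only: the connection URLs, the credentials, the claims table and its columns are all invented here, not taken from the case study; the point is simply that one loop can interrogate several data sources through whichever driver each URL selects.

import java.sql.*;
import java.util.*;

public class ClaimsPool {
  // Hypothetical data sources for two underwriters; each URL selects its own driver
  // (assumes the relevant drivers have already been loaded with Class.forName).
  static final String[] URLS = {
    "jdbc:oracle:thin:@hostA:1521:claimsA",
    "jdbc:odbc:claimsB"
  };

  public static List findClaims(String vehicleReg) throws SQLException {
    List claims = new ArrayList();
    for (int i = 0; i < URLS.length; i++) {
      Connection conn = DriverManager.getConnection(URLS[i], "assoc", "secret");
      try {
        PreparedStatement stmt = conn.prepareStatement(
          "SELECT claim_no, claim_date FROM claims WHERE vehicle_reg = ?");
        stmt.setString(1, vehicleReg);
        ResultSet rs = stmt.executeQuery();
        while (rs.next()) {
          claims.add(URLS[i] + ": " + rs.getString(1) + " " + rs.getDate(2));
        }
        stmt.close();
      } finally {
        conn.close(); // always release the remote session
      }
    }
    return claims;
  }
}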
jdbc:<subprotocol>:<subname>
jdbc:odbc:MyAccessDatabase
jdbc:<subprotocol>:<<host>:<port>:<sub-subname>>
jdbc:oracle:thin:@orion.uel.ac.uk:1521:bisora
import java.sql.*;

public JDBCConnection()
    throws SQLException, ClassNotFoundException {
  // load and register the Oracle JDBC driver class
  Class.forName("oracle.jdbc.driver.OracleDriver");
The driver manager has a method for constructing a connection, thus ensuring
that the connection is associated with a particular driver. The URL supplied as the
first parameter tells the getConnection method what type of driver to use, the loca-
tion of the database host, the port number of the listener and the database name.
Additional parameters give the username and password. Statements are associated with a connection, so the Connection class has a method, createStatement, that calls a statement constructor.
...
Connection conn = DriverManager.getConnection(
    "jdbc:oracle:thin:@orion.uel.ac.uk:1521:bisora", "scott", "tiger");
Statement stmt = conn.createStatement();
The ResultSet class provides a flexible structure, similar to an array but with several
important differences. Firstly, the ResultSet is dynamically configured on assign-
ment with the same column names and data types as the columns in the returned
data. The columns in the ResultSet may be referred to by name or by an index: 1, 2, 3
etc.
Secondly, although the ResultSet can contain as many rows as are returned by the
query, access to the data by the program is limited to one row at a time. The
ResultSet is the mechanism for overcoming the impedance mismatch between set-oriented SQL and a record-oriented programming language.
The ResultSet may have different mechanisms for moving through the returned
rows. These are determined by properties of the statement to which the ResultSet is
assigned. The default behaviour is to use the concept of a cursor indicating the
current row. Immediately following the query execution the cursor may be vali-
dated by calling ResultSet.next(). This returns a Boolean, indicating whether the
operation was successful (the query returned at least one row) and, if it was, estab-
lishes a cursor pointing at the first row.
The contents of this current row may now be accessed in the program. Calls to
ResultSet.next() move the cursor forward, enabling program access to the subse-
quent rows. When the cursor is moved beyond the last row, it returns false. By
default, the cursor only moves forward. The alternative is to make the cursor
scrollable, in which case it may be moved forwards and backwards.
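As a sketch of the default, forward-only behaviour (reusing the PARTS table and the conn connection from the surrounding fragments), a typical traversal looks like this:

Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT id, type FROM parts");
// next() advances the cursor and reports whether a row is available
while (rs.next()) {
  int id = rs.getInt(1);               // columns by index...
  String type = rs.getString("type");  // ...or by name
  System.out.println(id + " " + type);
}
rs.close();
stmt.close();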
Unless specified, the data returned by the query is treated as a snapshot taken at
execution time. Any subsequent changes caused by other users updating or
inserting rows will not be reflected in the ResultSet. The ResultSet may be made
sensitive to concurrent changes in the database. These changes will reflect
committed transactions by other users and uncommitted changes made in the
running program. The ResultSet can also be configured to make non-SQL updates or
to be read only.
Figure 12.15 shows how the options are implemented through the statement.
Both examples will make the ResultSet scrollable when it is constructed. In the first
fragment, the ResultSet is made insensitive to concurrent changes and read-only,
disallowing non-SQL updates. In the second example, it will reflect concurrent
changes and non-SQL updates are enabled.
...
Statement stmt = conn.createStatement(
ResultSet.TYPE_SCROLL_INSENSITIVE,
ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery("SELECT id, type FROM PARTS");
...
...
Statement stmt = conn.createStatement(
ResultSet.TYPE_SCROLL_SENSITIVE,
ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt.executeQuery("SELECT id, type FROM PARTS");
...
JDBC 2.0 thus gives two ways to make updates in the database: a standard SQL
statement and a ResultSet update.
Figure 12.16 shows both alternatives. The first consists of a single program line
and can clearly contain SQL to change just one row, as shown, or many thousands
of rows. The second method involves several program lines and may be adapted to
support a graphical user interface where multiple rows are retrieved and data in the
ResultSet underlies a grid display, allowing the user to browse the rows and interac-
tively update values in individual cells of the grid.
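The following sketch contrasts the two approaches; it is not a reproduction of Figure 12.16, and the column values are invented. The ResultSet route needs the scrollable, updatable statement options shown in the second fragment of Figure 12.15.

// 1. A single SQL statement: one program line, any number of rows affected
Statement stmt = conn.createStatement();
int rowsChanged = stmt.executeUpdate("UPDATE parts SET type = 'WIDGET' WHERE id = 101");

// 2. A ResultSet update: navigate to a row, change a cell, write it back
Statement stmt2 = conn.createStatement(
    ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt2.executeQuery("SELECT id, type FROM parts");
if (rs.next()) {
  rs.updateString("type", "WIDGET"); // change the value in the current row
  rs.updateRow();                    // propagate the change to the database
}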
Often the values of attributes in a query are not known before run-time. These
values may be set interactively by the user, who may not have the necessary skills
or permissions to supply the full statement. In such a case, the query may be
prepared with placeholders inserted where the interactive values should be intro-
duced. The example in Figure 12.17 shows the construction of a JDBC prepared
statement with two placeholders (the question marks). The actual values are then
inserted using the statement’s type sensitive methods. The first of these assigns the
contents of part_Id to the first placeholder as an integer; the second assigns the
contents of partType to the second placeholder as a string. Although, in this case,
the values are embedded in the program code, they could just as easily have come
from user input through the program’s GUI.
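A sketch of the construction described here follows; Figure 12.17 itself is not reproduced, so the SQL text is an assumption, but the placeholder-setting calls mirror the description above.

int part_Id = 101;                // values that would normally come from the GUI
String partType = "WIDGET";
PreparedStatement stmt = conn.prepareStatement(
    "INSERT INTO parts (id, type) VALUES (?, ?)");
stmt.setInt(1, part_Id);          // first placeholder, assigned as an integer
stmt.setString(2, partType);      // second placeholder, assigned as a string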
stmt.execute();
stmt.close();
conn.close();
Once the statement is executed, the statement and the connection are closed.
This is good programming practice. Although the JVM’s garbage collection
routines will usually recover the unassigned remnants of a finished program,
explicit recovery is generally quicker and more reliable.
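One common way of making the clean-up dependable, sketched here rather than taken from the book’s figures, is to put the close calls in finally blocks so that they run whether or not the SQL work raises an exception (url, user and password are assumed to hold the connection details):

Connection conn = DriverManager.getConnection(url, user, password);
try {
  Statement stmt = conn.createStatement();
  try {
    stmt.executeUpdate("UPDATE parts SET type = 'WIDGET' WHERE id = 101");
  } finally {
    stmt.close();  // release the statement even if the update fails
  }
} finally {
  conn.close();    // release the connection in all cases
}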
Most database systems allow procedural code to be stored in the database itself. The
PL/SQL language is used by Oracle and may be stored as a procedure or function
body within a package. The performance experiment in Section 17.6 uses a package
called makeCustomer that contains one function and one procedure.
Each time it is called, the procedure fillCustomer will create 50,000 rows in the
customer table and associate these with 200,000 orders and 2,000,000 rows in the
lineitems table.
The fragment of Java code in Figure 12.18 uses the cs.execute() method because
the procedure contains multiple SQL statements, both queries and inserts. None of
the queries returns any data to the caller, so there is no need to associate the state-
ment with a ResultSet.
CallableStatement cs = conn.prepareCall("{call MakeCustomer.Fillcustomer}");
cs.execute();
12.7 SUMMARY
ODBC and JDBC offer specific services for connection to a database and translate
local program calls to their specialized and proprietary APIs into remote CLI calls to
the database.
ODBC can be considered as specific to a Windows environment which, given
the spread of Microsoft desktop systems, may not be much of a limiting factor.
However, Windows is available for different CPU architectures, and client
programs that use ODBC must be rewritten and recompiled for those CPUs.
JDBC is only applicable to Java clients. Any machine with a JVM will run the
unamended client. Using a type 4 JDBC driver with such a client will ensure
complete portability. Doubts about the performance of an interpreted language
such as Java are being dispelled by advances in moderately priced hardware.
Multi-tasking: More than one user process resident in main memory. Access to the CPU is managed by the OS scheduler using process contexts.
ODBC: Microsoft standard API. Converts API calls in the client to CLI format. Transported to the server as RPC calls.
12.9 EXERCISES

CHAPTER 13
FURTHER PL/SQL
The discussion on PL/SQL was left at the end of Chapter 10, having introduced the
block structure of the language and the various forms this structure may take. The
simplest form is the anonymous block that has no name and is not stored on the
server, although, for convenience, the client may store it in an external file.
Stored procedures and stored functions do have names and are stored as data items
in a schema of the database. A package is also named and retained within the data-
base. It generally consists of several procedures, functions, declared variables and
cursors that share a related purpose and that may be referenced from within the
package according to the scoping rules.
Triggers are always stored; because their activation relies on database events, it is
essential that they are permanently available whenever the event occurs.
This chapter completes the survey of PL/SQL programming structures and tech-
niques, illustrating more complex use of the language elements and introducing
further constructs which give the seasoned programmer full control over the data-
base operation.
All of the PL/SQL introduced here and in the previous discussion may be written,
tested and stored in the database to create a library of standard routines. These are
accessible by the front-end applications developer and offer an economic, reliable
and powerful bank of reusable code.
One particular situation in which multiple trigger types are essential is in avoiding
the so-called mutating table problem. This problem generally occurs when a row-
level trigger attempts to reference a data item in the trigger’s related SQL table
through a select statement rather than from the before-image or after-image of the
current row.
A row-level trigger is created to enforce a complex legal contract drawn up between a club and its leading goal scorer. It ensures that he remains the league’s highest-paid player whenever the salary of any other league player is increased, or any new player is registered in the league. This causes a mutating table problem. The update or insertion of other rows in the contracts table fires the trigger, but the trigger seeks to update the leading goal scorer’s contract salary, not the contract of the player responsible for the event. An example of such a trigger is given in Figure 13.1. Its output, following an appropriate update request, is shown in Figure 13.2.
CREATE OR REPLACE TRIGGER contracts_biu_trigger
BEFORE INSERT OR UPDATE ON Contracts
FOR EACH ROW
DECLARE
v_max_sal NUMBER;
v_ufa_id CHAR(8);
BEGIN
SELECT ufa_id, salary INTO v_ufa_id, v_max_sal
FROM Contracts
WHERE salary = (SELECT MAX(salary) FROM Contracts
WHERE Contracts.left IS NULL);
IF :NEW.salary >= v_max_sal THEN
DBMS_OUTPUT.PUT_LINE('Player ' || v_ufa_id ||
' requires a salary in excess of ' || :NEW.salary ||
' to ensure his contractual advantage!');
END IF;
END contracts_biu_trigger;
In the vast majority of cases, a mutating table problem is the result of a row-level
trigger attempting to read or update the table that is the subject of the triggering
event. In Figure 13.1, an attempt is made to read the contents of the contracts table
from within a row-level trigger. This trigger is itself activated via inserts or updates
to that table. Such an operation is forcibly aborted by the DBMS in an attempt to
maintain data integrity.
update contracts
set salary=salary*1.1;

update contracts
*
ERROR at line 1:
ORA-04091: table SB.CONTRACTS is mutating, trigger/function may not see it
ORA-06512: at "SB.CONTRACTS_BIU_TRIGGER", line 6
ORA-04088: error during execution of trigger 'SB.CONTRACTS_BIU_TRIGGER'

The reason for such an apparently over-restrictive exclusion is that DML operations often affect more than a single row of a table, and, as such, require an exclusive lock on the table until the operation has been fully completed. To read the affected
table with a row-level trigger would potentially expose a partially updated view of
the table, and in doing so would break the isolation requirement of transactional
processing.
There is a clash of operational granularity between the conflicting parts of the
trigger. Statement-level triggers are not affected by such restrictions, in that they
either occur before an exclusive lock is taken out by the DML triggering operation,
or after all rows have been suitably altered by the operation and the exclusive lock
released.
The solution to the salary problem is to use a combination of before and after
statement-level triggers, a row-level trigger and a package. The syntax of the
before-statement-level trigger is given in Figure 13.3. It finds the current highest
paid league player, and, using the mutating_util package, records details of that
player’s current contract in an in-memory PL/SQL table.
The before row-level trigger, documented in Figure 13.4, retrieves details of the current highest-paid league player from the PL/SQL table, as saved by the before statement-level trigger, and compares that maximum salary with the new updated salary of the player whose contract is currently being processed.
If the previous maximum salary is no longer the maximum, and the player whose
salary is in excess of that maximum is not the leading goalscorer whose contract
must be honoured, then a flag is set in the PL/SQL package to indicate that an
adjustment will be required to the salary of the leading goalscorer.
Figure 13.5 shows the after statement-level trigger that will be fired when all asso-
ciated inserts or updates have completed. This trigger checks, via a call to the
package, whether the previous maximum salary has been exceeded, and if it has,
calculates the salary that the leading goal scorer will have to receive if his contract
is still to be honoured.
The package used by the various triggers is documented in Figure 13.6. Procedure
putrow is used by the before-statement trigger to store the salary of the highest paid
player prior to the DML update.
The procedure setflag is used by the row level trigger to set a Boolean variable if
this salary will need adjustment. It uses the after image of the update from the
transaction log rather than a separate query to determine this. Because of this, it
cannot cause a mutating table error. The after image is available because the trigger
is part of the same transaction as the update. The transaction isolation principle is
not breached, although the user initiating the update may not be aware of the
triggers.
The after statement-level trigger then uses the function getflag to pick up this
Boolean and the function getrow to return the original highest salary. It finds the
new maximum salary and synchronizes the star player’s salary.
The output produced by executing the original update request is shown in Figure 13.7. The triggering update statement gives all players the same percentage salary rise; the player receiving the highest salary therefore does not change, and there is no contractual problem to report to the user.
Figure 13.8 shows the output produced by the after statement-level trigger when
a contractual problem does occur due to an update or insert statement changing the
highest-paid league player.
Any trigger that causes a mutating table error will normally have to be broken
down into this kind of interrelated trigger and package structure.
PROCEDURE setflag
IS
BEGIN
v_update_flag := TRUE;
END setflag;
UFA_ID SALARY
---------- ----------
VT432167 998250
1 row selected.
UPDATE contracts
SET salary=salary*1.1;
24 rows updated.
UFA_ID SALARY
---------- ----------
VT432167 1098075
1 row selected.
This section considers the remaining basic constructs of the PL/SQL language and
describes a number of advanced features that have recently been introduced into
the language.
UPDATE contracts
SET salary=salary*1.1
WHERE ufa_id != 'VT432167';
22 rows updated.
UFA_ID SALARY
---------- ----------
AT245876 1098075
BT46286 1098075
UT236965 1098075
VT432167 1098075
4 rows selected.
Single-line comments begin with a double hyphen (--) and extend to the end of the line, terminated by the carriage return/line feed. Figure 13.9 shows an example of the use of single-line comments.
Multiple-line comment blocks begin with a slash-asterisk (/*) and end with an
asterisk-slash (*/). All lines appearing between these, whether valid PL/SQL syntax
or not, will be treated as comments, and as such will not be passed to the compiler.
Figure 13.10 shows an example of the use of multiple-line comments in a PL/SQL
program.
CASE v_position
WHEN 'GOALKEEPER' THEN v_g_count := v_g_count + 1;
WHEN 'DEFENDER' THEN v_d_count := v_d_count + 1;
WHEN 'MIDFIELD' THEN v_m_count := v_m_count + 1;
WHEN 'FORWARD' THEN v_f_count := v_f_count + 1;
ELSE DBMS_OUTPUT.PUT_LINE('No such position allowed');
END CASE;
The variable following the case keyword is referred to as a ‘selector’, and it is the
value of the selector that is matched against the various when constants. Once a
match is made, the statements following the related then are carried out and
control is then passed to the statement following the end case. If when constants are
mistakenly duplicated, only the first is executed. There is no fall through, as is
sometimes the situation with similar expressions in other programming languages.
An alternative format for the case statement is shown in Figure 13.12, where the selector is omitted and each when clause contains a Boolean expression in place of a constant.
CASE
WHEN v_position = 'GOALKEEPER' THEN v_g_count := v_g_count + 1;
WHEN v_position = 'DEFENDER' THEN v_d_count := v_d_count + 1;
WHEN v_position = 'MIDFIELD' THEN v_m_count := v_m_count + 1;
WHEN v_position = 'FORWARD' THEN v_f_count := v_f_count + 1;
ELSE DBMS_OUTPUT.PUT_LINE('No such position allowed');
END CASE;
PL/SQL also allows the use of a modified case statement as an assignment expres-
sion. Figure 13.13 illustrates the use of such a case assignment.
v_position :=
CASE i_selection
WHEN 1 THEN 'GOALKEEPER'
WHEN 2 THEN 'DEFENDER'
WHEN 3 THEN 'MIDFIELD'
WHEN 4 THEN 'FORWARD'
ELSE 'NOT ASSIGNED'
END;
PL/SQL offers the for-loop and the while-loop as alternatives to the more basic loop
iteration expression.
The for-loop allows for a fixed number of iterations as opposed to the infinite iter-
ations of the standard loop construct. An example of the for-loop statement was
used in the procedure matchdates in Figure 10.55, an extract of which is shown in
Figure 13.14.
CASE v_position
WHEN 'goalkeeper' THEN NULL;
WHEN 'defender' THEN v_d_count := v_d_count + 1;
WHEN 'midfield' THEN v_m_count := v_m_count + 1;
WHEN 'forward' THEN v_f_count := v_f_count + 1;
ELSE DBMS_OUTPUT.PUT_LINE('No such position allowed');
END CASE;

PL/SQL offers three constructs that can contain and manipulate single-dimensioned collections of data. The first of these is known as an associative array, although these have previously been referred to as both PL/SQL tables and index-by tables. The second collection type is the nested table, and the third is the variable-size array, although this name is commonly shortened to varray. This section will present a brief description of each of these three collection types.
Figure 10.52 first introduced the concept of a PL/SQL table as a variable used to
hold an indexed set of elements in memory. An associative array is a simple exten-
sion of the PL/SQL table structure with the addition of two new types of indexing.
DECLARE
TYPE associative_array_type1 IS TABLE OF DATE
INDEX BY PLS_INTEGER;
matchdates_table associative_array_type1;
TYPE associative_array_type2 IS TABLE OF Clubs%ROWTYPE
INDEX BY VARCHAR2(25);
club_table associative_array_type2;
v_date DATE := '14-AUG-2003';
BEGIN
Select * INTO club_table('Beauty United')
FROM Clubs WHERE Clubs.name = 'Beauty United';
FOR v_week IN 1..38 LOOP
matchdates_table(v_week):= v_date;
v_date := v_date + 7;
END LOOP;
END;
Originally, a PL/SQL table had only the option to index by binary_integer; this is
now extended to include indexing by pls_integer and varchar2.
A pls_integer is a more efficient implementation of a signed integer type, and in
general, acts as a substitute for the binary_integer type. A varchar2 index allows PL/
SQL tables to be referenced by character data, particularly useful if creating a
virtual collection based on a character-keyed SQL table. Associative arrays are
sparse collections; elements are created as necessary without maintaining any pre-
defined order between elements. They are specific to PL/SQL, and cannot be used as
column data in an SQL table. Figure 13.17 shows examples of the definition and
population of two separate associative arrays.
PL/SQL nested tables are similar to associative arrays except that they can be used as
column data within SQL tables and can thus be made permanent. However, unlike
associative arrays, the indexing of nested tables is non-definable, and uses a fixed
sequential integer index. Nested tables require the use of special constructor opera-
tions in order to be given an initial set of elements. Accessing an element is also
invariably sequential, as direct access would rely on the element’s subscript having
a direct equivalence to the element’s data values.
A nested table, even though it is initially constructed as a dense set of elements, is,
in fact, also a sparse collection. Individual elements can be physically deleted
without regaining space or index keys. Figure 13.18 shows an example of a PL/SQL
anonymous block that defines, populates and accesses a nested table collection.
The constructor of a nested table is a function based on the declared type of that
table. In the declaration section of Figure 13.18, bankhols is constructed using its
type, dates_table_type, as its constructor. As there is not an element list associated
with this declaration statement, bankhols will initially be constructed as a null
element nested table. That initial declaration is then overridden in the execution
section of the PL/SQL block.
The bankhols collection is reconstructed using the dates_table_type constructor
with a fully populated set of DATE elements as its input parameters. An element
(i.e. a row) of a PL/SQL nested table is accessed via its subscript within the collec-
tion set. In Figure 13.18, bankhols(i) refers to element i of the collection set
bankhols.
PL/SQL collections have a set of predefined methods associated with them. These
methods are used to query the collection set and to extract specific elements or
information pertaining to the collection as a whole. Section A.7 presents a list of
these methods together with a brief description of their purpose.
The Figure 13.18 example uses the COUNT method in order to find the maximum
number of elements that need to be processed.
DECLARE
TYPE dates_table_type IS TABLE OF DATE;
bankhols dates_table_type; -- nested table is NULL
v_club VARCHAR2(25);
v_match_date DATE;
v_count NUMBER := 0;
CURSOR c_clubs IS
SELECT name FROM CLUBS;
CURSOR c_matches IS
SELECT matchdate FROM MATCHES
WHERE homeclub = v_club or awayclub = v_club;
BEGIN
-- Constructor used to populate nested table with elements
bankhols := dates_table_type ('01-JAN-2004', '09-APR-2004',
'12-APR-2004', '03-MAY-2004', '31-MAY-2004', '30-AUG-2004',
'25-DEC-2004', '26-DEC-2004', '27-DEC-2004');
OPEN c_clubs;
LOOP
FETCH c_clubs INTO v_club;
EXIT WHEN c_clubs%NOTFOUND;
v_count := 0; -- reset the bank holiday count for each club
OPEN c_matches;
LOOP
FETCH c_matches INTO v_match_date;
EXIT WHEN c_matches%NOTFOUND;
FOR i IN 1..bankhols.COUNT LOOP -- COUNT is a collection method
IF v_match_date = bankhols(i) THEN -- elements are accessed
v_count := v_count + 1; -- via their subscripts
EXIT;
END IF;
END LOOP;
END LOOP;
CLOSE c_matches;
DBMS_OUTPUT.PUT_LINE
(v_club || ' plays matches on ' || v_count || ' bank holidays');
END LOOP;
CLOSE c_clubs;
END;
13.3.3 VARRAYS
PL/SQL varrays are somewhat similar to PL/SQL nested tables: both can be used as column data within an SQL table, both require constructors, and elements of each are referenced by their positional subscripts. However, varrays differ from nested tables in several important respects. A varray always contains a dense set of elements, whereas a nested table can have any of its elements deleted; varrays are stored directly as in-line objects within an SQL table, whereas nested tables are stored indirectly via an association to a system-generated store table; and varrays, unlike nested tables, maintain their element order when transferred between PL/SQL and SQL. In essence, varrays operate like arrays and nested tables operate like sets.
Figure 13.19 shows the simple change necessary to convert the PL/SQL nested
table example of Figure 13.18 into a working PL/SQL varray example.
DECLARE
TYPE dates_table_type IS VARRAY(10) OF DATE;
bankhols dates_table_type; -- VARRAY is NULL
...
BEGIN
...
END;
The major change to the collection type declaration is that it now requires a
dimension to indicate the initial maximum number of elements that can be held
within it. In the example, the dates_table_type is declared as holding a maximum of
10 elements. Each element of the bankhols varray is then initiated to null. If the size of the varray had been set to less than nine (the number of dates actually assigned), then the error shown in Figure 13.20 would occur.
DECLARE
*
ERROR at line 1:
ORA-06532: Subscript outside of limit
ORA-06512: at line 8
The PL/SQL cursors introduced earlier in this chapter are generally all that are
required for most simple PL/SQL applications. There are a number of extensions
that have been created to enable the construction of more complex applications.
These include: the ability to simplify the sequential processing of table rows
retrieved from SQL; the use of run-time definable cursors, known as cursor vari-
ables; and the ability to retrieve cursors as part of an attribute list of another PL/
SQL cursor, known as a cursor expression.
Figure 13.21 demonstrates the declaration and usage of a simple cursor. The cursor
is statically defined to represent a memory structure that will contain the result set
of a predetermined select statement. To process this result set it is first necessary to
open the cursor, then serially fetch each row, and finally to remove the cursor from
memory by issuing a close statement against it.
DECLARE
CURSOR c_club IS
SELECT * FROM clubs;
v_club Clubs%ROWTYPE;
BEGIN
OPEN c_club;
LOOP
FETCH c_club INTO v_club;
EXIT WHEN c_club%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(v_club.name);
END LOOP;
CLOSE c_club;
END;
/
Open, fetch until end and close can be considered a standard pattern of cursor
operation, and is indeed all that is required of many cursors. Based on this knowl-
edge a simpler mechanism for the serial processing of a cursor has been created,
and this is demonstrated in Figure 13.22.
DECLARE
CURSOR c_club IS
SELECT * FROM clubs;
BEGIN
FOR c_club_rec IN c_club LOOP -- Cursor FOR Loop
DBMS_OUTPUT.PUT_LINE(c_club_rec.name);
END LOOP;
END;

The return type of the fetch is implicitly declared to be a record structure that will contain attributes equivalent to those found in the cursor’s select clause. This cursor record is automatically assigned the name immediately following the for clause of the cursor for-loop. By using the name of the attribute scoped by the cursor record’s name, e.g. c_club_rec.name, it is possible to access attributes of the fetched row. However, such row attributes can only be referenced within the limited scope of the loop.
Cursor for-loops can be used with implicit cursors as well as explicit cursors. This
ability further reduces the amount of coding required to handle simple serial
processing of select statement result sets. Figure 13.23 shows the previous example
rewritten as an implicit cursor.
BEGIN
FOR c_rec IN (SELECT * FROM clubs) LOOP
DBMS_OUTPUT.PUT_LINE(c_rec.name);
END LOOP;
END;
/
A cursor is an in-line memory structure that holds the result set of a select opera-
tion. A cursor variable contains a pointer to an area of memory that contains a
memory structure that holds such a result set. Such a reference variable is a highly
flexible construct as it can be easily altered to point to any suitable result set. Figure
13.24 shows the use of a cursor variable to process each row of the Clubs table.
A cursor variable type is declared using ref cursor and an optional return type. A
cursor variable is declared as an instance of such a type. The cursor variable is then
assigned to point to a result set generated via an implicit cursor attached to an open-
for statement.
DECLARE
TYPE cursor_ref_type IS REF CURSOR RETURN Clubs%ROWTYPE;
cv_clubs cursor_ref_type; -- Cursor Variable
v_club Clubs%ROWTYPE;
BEGIN
OPEN cv_clubs FOR SELECT * FROM Clubs;
LOOP
FETCH cv_clubs INTO v_club;
EXIT WHEN cv_clubs%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(v_club.name);
END LOOP;
CLOSE cv_clubs;
END;

Figure 13.25 shows how easy it is to change the cv_clubs cursor variable to point to the memory location of different result sets, and then to process those result sets accordingly. The two FETCH statements in the example use the cv_clubs cursor variable to access indirectly the memory location of each result set created by the two implicit cursors.
As cv_clubs is an instance of cursor_ref_type, itself defined as a cursor ref with a
return type of clubs%rowtype, it can be associated with any cursor (implicit or explicit)
that contains a set of Clubs records. This type of cursor variable is referred to as
strongly typed in that it can only reference result sets that match a specific return
type. Cursor variables may be based upon cursor ref types that do not have explicit
return types. This allows greater flexibility, but they are considered weakly typed as
they permit the possibility of type mismatch exceptions occurring at run-time.
Cursor variables are primarily of use as in–out parameters to procedures
contained within packages. They can be generically defined in the block that calls
the procedure, dynamically associated with an appropriate select statement in the
procedure’s body, and passed back to the calling block.
DECLARE
TYPE cursor_ref_type IS REF CURSOR RETURN Clubs%ROWTYPE;
cv_clubs cursor_ref_type;
v_club Clubs%ROWTYPE;
BEGIN
OPEN cv_clubs FOR SELECT * FROM Clubs;
LOOP
FETCH cv_clubs INTO v_club;
EXIT WHEN cv_clubs%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(v_club.name);
END LOOP;
CLOSE cv_clubs;
OPEN cv_clubs FOR SELECT DISTINCT C.*
FROM Clubs C, Players P
WHERE C.name = P.club
AND EXISTS (SELECT COUNT(*)
FROM Players
WHERE Players.club = C.name
AND Players.nationality = 'UTOPIA'
HAVING COUNT(*) >4);
LOOP
FETCH cv_clubs INTO v_club;
EXIT WHEN cv_clubs%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(v_club.name);
END LOOP;
CLOSE cv_clubs;
END;
set serveroutput on
DECLARE
TYPE cursor_ref_type IS REF CURSOR;
cv_players cursor_ref_type;
v_club Clubs.name%TYPE;
v_player Players.surname%TYPE;
AC Compassion
Beauty United
----CONSTANTINE
----DICKENS
----DIOCLETIAN
----GREGORY
----HEGEL
----KANT
----MORE
----PAUL
----PLATO
----SOCRATES
----THACKERAY
Bountiful Athletic
Forgiveness FC
Freedom United
Grace Town
...
Triggers, based on DML statements, were introduced in Section 10.8; this section
briefly examines four additional categories of PL/SQL triggers. View-level triggers
substitute alternative, user-defined, processing in place of DML events on views.
Nested table triggers permit the replacement of DML events on individual collec-
tion set elements that form part of an SQL view. Schema-level triggers allow DDL
statements to act as triggering events. Database-level triggers respond to certain
database-level control events.
SET squad_no = 14
*
ERROR at line 2:
ORA-01779: cannot modify a column which maps to a non key-
preserved table
PL/SQL view-level triggers, known as Instead Of triggers, are used to handle the
problems associated with issuing DML statements against SQL views.
As explained in Section 13.1, the reason why DML statements are illegal on a
view’s non key-preserved base table(s) is that the intention as to which of the base tables should receive the change cannot be determined unambiguously.
Under the object relational extension contained in SQL:1999 a view can contain a
nested table as a virtual, non-atomic column type, and it is possible to construct a
trigger that responds to DML events at the row-level of that nested table.
As an example, rather than only holding details of a single contract associated
with a particular player, as in the players_v view of Figure 13.28, it is possible to
hold details of all the contracts associated with a player in a single view attribute.
Figure 13.29 shows the definition of such a view, and Figure 13.30 the contracts_t
nested table type definition it is based upon.
To reference rows of a nested table, in order to carry out DML operations upon
them, it is necessary to create a temporary table structure in memory to map its
contents. SQL provides the TABLE construct for this purpose. Figure 13.31 shows
the use of such a structure in order to insert a new contract element into the
players_v nested table view.
However, as the contents of a view’s nested tables are not directly updatable via
DML operations on that view, PL/SQL Instead Of triggers have to be constructed to
enable such nested table updates to take place.
Figure 13.32 shows the result of trying to execute the insert of Figure 13.31
directly without a trigger.
Figure 13.34 shows the insert now being intercepted by the new trigger that
successfully carries out the insert on the view’s contracts attribute.
1 row created.
Figure 13.35 shows that a select on the view, players_v, yields 1 row. The nested table, contracts_t, is an attribute of the view and has a set of two values for Diocletian. A more orthodox select on the base table, contracts, shows that Diocletian has two rows in the table.
SURNAME
-------------------------
CONTRACTS(CLUB, JOINED, LEFT, SALARY)
DIOCLETIAN
CONTRACTS_T(CONTRACT_T('Beauty United', '01-AUG-01', NULL, 650000),
CONTRACT_T('Pity City', '01-AUG-98', '31-JUL-99', 20000))
1 row selected.
3 rows selected.
Events include all of the standard SQL DDL statements. A catch-all schema event, DDL, can also be subscribed to; it is fired whenever any DDL command is issued against its associated schema.
Figure 13.36 shows a schema-level trigger that prevents the dropping of any
tables associated with Liga Utopia by anyone other than the DBA.
Additional events associated with schema-level triggers are logon, logoff, suspend
and servererror. These events allow the monitoring of schema access and the
logging of any failed attempts at SQL operations against a schema’s data objects.
Figure 13.37 shows the trigger required to record all logons/logoffs from a partic-
ular database schema. A logon trigger can only fire after the user has logged on, and
the logoff trigger can only be fired just before the user has logged out. Figure 13.38
provides the code for a trigger that records all attempts to insert rows with dupli-
cate primary keys into any SQL table belonging to a particular schema.
Database-level triggers are those that are concerned with the whole of the database
rather than just a specific schema belonging to that database. Events associated
with database-level triggers are logon, logoff, servererror, startup and shutdown. The
first three event types are the same as their schema-level counterparts, except that
they are not restricted to the event occurring on a particular schema of the
database.
The last two database event types occur after the database has been successfully
started, or just before the database is shut down. Database-level triggers facilitate the
monitoring of database access and allow for the logging of any failed attempts at
SQL operations against any of a database’s objects. Figure 13.39 shows an example
of a pair of startup and shutdown database-level triggers, together with some test
output.
13.7 EXERCISE
(i) Implement the PL/SQL examples shown in this chapter. Create alternative
examples to illustrate your understanding of these PL/SQL features.
CHAPTER 14
IMPLEMENTING IN JAVA
All I know most surely about morality and obligations, I owe to football.
Albert Camus
The initial requirements analysis for the Liga Utopia, using UseCase diagrams, was
introduced in Chapter 2. As well as the UseCases, a class diagram was developed
that was converted to an entity–relationship diagram (ERD). In subsequent chap-
ters, the UFL case study has been used to develop a normalized database implemen-
tation and to illustrate the SQL language, including PL/SQL.
All of this can now be brought together in the development of a first-cut solution
for the UFL. This will be done using an integrated development environment (IDE)
called JBuilder. This tool has been developed by Borland1 and is aimed specifically
at software engineering for the Java language. This chapter illustrates the use of
JBuilder Version 7, Enterprise Edition. This version offers a large number of addi-
tional classes to those supplied with the basic Java Development Kit (JDK), avail-
able from the original developers of the Java language, Sun Microsystems Inc.2
An IDE provides facilities for the automatic grouping of program source files in
directories, the generation of code using a visual designer and the maintenance of
relationships between source files, ensuring internal consistency to the project.
There is a make system that ensures that the project is recompiled if a source file has
been changed since the last compilation, a run-time environment and provision
for the configuration of a deployment package when the developed software is
released.
The UseCase diagrams indicated the requirement for a number of interactive
screen forms where UFL staff can enter and maintain data in the database. These
are divided into three main types.
1 http://www.borland.co.uk/
2 http://www.sun.com/ and http://www.java.com/
The simplest type is for the input or update of data in a single table. This may be
done with checks on foreign keys, but there is generally only one row displayed in
the form.
More complex is the need to display and maintain a one-to-many relationship in
a form. This would be necessary, for example, in a form to enter results for matches,
noting the goal scorers and the time and nature of their goals. The same would hold
true for a form to record the team selection for each club in a match.
Finally, there are reports that need to be generated, often in response to user-sup-
plied parameters that will affect the output of the report. The report forms may
need complex SQL statements and formatted output to printers.
Figure 14.1 shows a structure diagram of the screen forms resulting from inspec-
tion of the UseCase diagrams. The user must first pass through a login form and
establish a valid connection to the database. Thereafter each of the three form
types, Tables, Relationships and Reports, is available for use. Within each category,
detailed forms are provided for the individual tasks required in the specification.
This will form the basis for the design of the UFL application.
[Figure 14.1: the form structure – a Log in form, then three groups: Tables (Players, Clubs, Discipline), Relationships (Matches, Team sheets, Contracts) and Reports (Results, League table).]
In order to give an insight into the programming requirements for JDBC objects,
the simplest forms will be designed in JBuilder using interface components that are
not data-aware. The database connection and the various SQL statements will be
added manually to the Java code generated by the interface designer. The more
complex relationship forms and the report forms will be created using the data-
aware components available in the Enterprise Edition.
The first task in the construction of the UFL application is to create a project.
Selecting File |New Project on the JBuilder menu starts the Project Wizard. Its first
screen is illustrated in Figure 14.2. The project name is ufl and this has been trans-
lated into a directory under the user’s home where all the subdirectories belonging
to the project will be kept. The next screen allows the user to set project paths; the
default values are normally accepted. The third screen allows the insertion of
project text such as the author’s name, project description and copyright notice
that will be embedded in all generated code.
Once the project has been created, the application wizard can be used to generate
the outline Java code for the ufl application. Select File |New and choose the appli-
cation icon. The wizard’s first screen is illustrated in Figure 14.3. The package (or
class hierarchy) has been named ufl and the main application class will be UFLApp.
The next screen (Figure 14.4) shows how to specify the principal frame within
the application. When UFLApp is run, it will instantiate this frame class. UFLFrame1
will automatically register with the Windows system, taking the display properties
set by the user and adding standard frame components (glyphs) allowing for mini-
mizing, maximizing and closing the window. Registering with Windows will also
allow events from the keyboard or the mouse to be communicated to the frame.
Check the boxes to generate a menu bar, a status bar and an about box.
The third step is to create a run-time configuration (Figure 14.5). This will even-
tually contain a detailed specification for any special conditions required when the
application is compiled and run. To start with, this specifies the source files needed
to make the project and, because it is an application, that it should be run in a stan-
dard Java Virtual Machine.
The application wizard generates the Java code for the classes specified. Figure 14.6
shows the resulting JBuilder screen. The window is divided into three panes with a
toolbar and a menu bar. The top left pane is the project pane and shows the files
associated with the project. Each of the three classes created by the wizard, UFLApp,
UFLFrame1 and UFLFrame1_AboutBox, has a separate Java file.
The structure pane is at the bottom left and shows details of the UFLFrame1 class.
The methods and attributes of UFLFrame1 are listed here. If the code is edited, any
errors revealed by JBuilder’s interactive syntax checker will also be shown in this
pane.
The large pane on the right is the content pane. It has six tabs that enable a dif-
ferent type of editor to be applied to the class. Each editor shows a different view.
The figure shows the Source editor, and the Java code generated by the wizard for
the UFLFrame1 class is displayed.
Selecting the UML tab reveals a class diagram for UFLFrame1, showing the class
from which it is derived and the classes associated with its attributes. Figure 14.7
shows this diagram and also that the structure pane has changed to show the class
inheritance, associations and dependencies of the components that go to make up
UFLFrame1.
The Design tab reveals the visual designer. At its centre is the current UFLFrame1.
Above this is a tabbed palette of the various components that can be added to the
design.
To the right is the property inspector, which allows values for the visual or event
properties for a selected component to be inserted or amended. Again, the structure
pane changes to show a tree diagram of the components of the frame
The code generated for the frame and its corresponding visual design are very
basic. The status bar is there but is barely distinguishable because it contains no
default text and has the same background colours as the surrounding frame. The
menu bar is not shown in this diagram since it has a specialist editor.
Figure 14.8 shows the Design view and the result of selecting the status bar and
using its property inspector to add default text and to change the font of that text
and its foreground and background colours. Any component can be selected either
by clicking the mouse button directly on it in the Design editor or by clicking on
the component in the structure diagram on the left. The values in the property
inspector are changed by typing the value, by selecting it from a drop-down list or
by choosing values from a pop-up dialog. Type the word Status: in the text field of
the status bar’s property inspector; choose a background and a foreground colour
from the drop-down lists in those fields; and specify the font from the pop-up
dialog which appears when the mouse is clicked in the font field.
Selecting Run|Run Project from the JBuilder menu causes all the Java files to be
compiled and the resulting class files to be stored in the correct subdirectory as
specified in the project. The UFLApp class is then launched in a Java Virtual
Machine. UFLApp immediately instantiates UFLFrame1.
Figure 14.9 shows the running application. UFLFrame1 has registered with the
Windows system and has the normal minimize, maximize and close glyphs. The
frame can be resized by dragging the edges with the mouse. The status bar with its
default text is now clearly visible and the menu bar with two options, File and Help
is displayed.
The menu bar works. Select Help|About. The About box pops up with information
about the application, copied from the project setup. Select File |Exit: the applica-
tion closes.
Notice that a fourth pane has opened in the main JBuilder screen. This is the
message pane. It displays error messages from the compiler, run-time errors from the
JVM and so on. Once the application has exited and messages have been reviewed, it
can be closed with the X glyph (the close box) to give more screen space to the
content pane. Closing it while the application is active kills the running application.
The process of adding the screen forms detailed in Figure 14.1 can now begin. Each
of these forms will consist of a number of basic components, such as labels,
textfields, comboboxes and buttons.
A label is used to place some text on the form to indicate the purpose of an inter-
active component. A textfield is an interactive component that allows the user or
the program to enter and edit text. A button component is capable of detecting a
mouse click and then performing some action that has been programmed into it. A
combobox has a drop-down list of available choices and allows the user to select
one of them by clicking with the mouse. The selected item is placed in the selection
textfield. The combobox is a combination of a button, a listbox and a textfield.
Figure 14.10 illustrates these four types of components taken from the Swing
palette in JBuilder’s design editor.
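The same four component types can also be created directly in code. The following is a hand-written sketch, not output from the designer; the captions and the position names are simply borrowed from the case study.

import javax.swing.*;

public class ComponentDemo {
  public static void main(String[] args) {
    JFrame frame = new JFrame("Component demo");
    JPanel panel = new JPanel();
    panel.add(new JLabel("Position:"));                      // label: static caption
    panel.add(new JTextField(20));                           // textfield: editable text
    panel.add(new JComboBox(new String[] {                   // combobox: drop-down choices
        "GOALKEEPER", "DEFENDER", "MIDFIELD", "FORWARD"}));
    panel.add(new JButton("Connect"));                       // button: detects mouse clicks
    frame.getContentPane().add(panel);
    frame.pack();
    frame.setVisible(true);
  }
}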
In order to present an orderly user interface, these components must be grouped
and some form of layout imposed. A Java container is capable of holding these
simple components and has a layout manager associated with it.
Placing the components for the Login form in a container will segregate them
from the components of the Players form, which can also be placed in a separate
container, and so on for each of the forms. Each of the containers can have a
different layout manager.
There are several classes in the Swing Containers tab of the palette. The UFL appli-
cation will use two of them. The simplest is the JPanel. This is a simple rectangle.
The contentPane, which was added automatically to the application when it was created, is an example of a JPanel. The other container used is the JTabbedPane, which displays one of its child panels at a time, selected by a row of tabs. A further JPanel, appPanel, with a CardLayout manager to stack the forms, has been added. Using a stack of containers in this way allows a variation in layouts
as different functions of the program are activated.
Add a JPanel to appPanel and rename it Login. Add three JTabbedPanes to appPanel
and rename them Tables, Relationships and Reports respectively. Select Tables in the
component tree add three JPanels to it. Rename them Players, Clubs and Discipline
respectively. Select Relationships and add three JPanels. Rename them matchResults,
Teamsheets and Contracts. Select Reports and add two JPanels. Rename them Results
and leagueTables.
For all these added components use the property inspector to set opaque to True, their layout to null and their constraint (the tab entry) the same as their Name. The
main display framework of the UFL application is now complete and the current
design is shown in Figure 14.12.
Notice the component tree. Make sure that Login, Tables, Relationships and
Reports are at the same level and directly connected to appPanel. The property
inspector for Tables is also shown. Attributes for Name, constraints, layout and
opaque have been set. The constraints appear as the titles of the tabs.
Before the detailed design of the forms can begin, there is an important task to
perform. The JDBC libraries must be made available to JBuilder. The file
classes12.zip is part of the Oracle installation and contains the JDBC version 1.2
class library. Find the path to this file in your installation or else download a copy
from the Oracle web site. Select Tools|Configure Libraries from the JBuilder menu
and the library wizard is started. Select the Add button at the bottom of the left
pane and use the New Library dialog to install the fully qualified path to
classes12.zip as a library entry called OracleJDBC in the JBuilder library path. Figure
14.13 shows this dialog.
Figure 14.14 shows the design of the Login form. It consists of four JLabels that
contain string constants set in the text attribute of their property inspectors.
There are two JTextfields, where the user will enter a user name and a service, and
a JPasswordField, where the password will be entered. This latter component is a
special form of the JTextField; whatever is entered is masked with asterisks for
added security. There are two JButtons: Clear and Connect.
The components have had their foreground colours, default text, font sizes,
styles and text alignments set in the property inspectors. Their position and sizes
on the panel can be set by dragging with the mouse. Selecting more than one
component at a time can be done by holding the space key and left-clicking. The
property inspector for multiple selections shows the common attributes that can
be set for the selection. Right-clicking the selection exposes the alignment pop-up.
The selection can be lined up vertically and horizontally, spaced evenly and made
the same size.
Once the connection to Oracle has been made, the user has no way to expose the
Players form and get on with some work. Additional menu items must be placed in
the menu bar to permit this.
Menus have a special design editor (Figure 14.15). Click on the design tab in
JBuilder’s editor pane and then double click on JMenuBar1 in the component tree.
The menu designer is opened.
There is a placeholder for an additional item at the end of the existing menu
items. Double click on this and give it the name Database. Use the down arrow to
enter sub-menu items: Tables, Relationships and Reports.
Each of these will be used to activate the appropriate JTabbedPanel. Drag and
drop the Database item to place it next to the File item. Set its enabled attribute to
false. Until the connection to Oracle is made, the user is prevented from accessing
this menu item.
Clicking on the Login form’s Connect button in the running application has abso-
lutely no effect, as the button cannot detect the click and has no response
programmed into it. In order to detect events like mouse clicks and key presses, the
component must have a listener installed in it. Fortunately, this is relatively simple
and involves using the events tab of the button’s property inspector. Select the
Connect button in the design editor and the event tab in the property inspector.
Double-click in the mouseClicked attribute. The editor moves automatically to the
Source pane, revealing the Java code it has generated (Figure 14.16). In fact, it has
created two code fragments. One creates a mouseClicked listener and the second an empty action method called Connect_mouseClicked. The listener is specific to the mouse click event and does not, for example, detect the mouse cursor passing over the button. The action method is installed in the listener, and whenever the Connect button is clicked the action method will be called.
There are two things to do before Java code to create a connection to Oracle can
be written in the action method. First, although JBuilder has been made aware of
Connect.addMouseListener(new java.awt.event.MouseAdapter() {
public void mouseClicked(MouseEvent e) {
Connect_mouseClicked(e);
}
});
...
void Connect_mouseClicked(MouseEvent e) {
the location of the JDBC class library, the program must be made aware of those
classes. Add a line to the import statements at the top of the file (Figure 14.17).
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import com.borland.jbcl.layout.*;
// add a line to import the JDBC classes
import java.sql.*;
void Connect_mouseClicked(MouseEvent e) {
String connect_string = "jdbc:oracle:thin:@loopback:1521:";
try {
Class.forName("oracle.jdbc.driver.OracleDriver");
conn=DriverManager.getConnection(connect_string
+Service.getText(),
userName.getText(),
String.valueOf(passWord.getPassword()));
statusBar.setText("Status: Connected to Oracle");
jMenu1.setEnabled(true);
}
catch(Exception f){
statusBar.setText("Status: Not Connected to Oracle");
System.out.println(f.getMessage());
}
}
The connection attempt is wrapped in a try section because it may raise exceptions, such as a network failure or an SQL error. If an exception is raised in the try section, control falls straight through to the catch section with the exception as a parameter.
If all goes well, statusBar will have its contents changed to show that a connec-
tion has been made, the Database menu item will be enabled and control will jump
to the end of the catch section. In this case, that is the end of the method.
If an exception is raised, the catch section changes statusBar to show a warning
and the exception’s message is printed in the JVM’s monitor pane.
The Clear button can now have a listener installed and some very simple code can
be written for the action method (Figure 14.20).
Figure 14.21 shows the running application. The connect button has been
clicked and the status bar confirms that the connection to Oracle has been made.
The Database menu item is enabled.
void Clear_mouseClicked(MouseEvent e) {
userName.setText("");
passWord.setText("");
Service.setText("");
}
Listeners must now be installed in the three sub-menu items with action
methods to instruct appPanel’s layout manager to show the corresponding Tables,
Relationships or Reports tabbed panels. For these sub-menu items, the event to be
detected is actionPerformed.
Figure 14.22 shows the code inserted into the action method created for
jMenuItem1 when the actionPerformed event is selected for detection in the property
inspector. The Tables tabbed panel is brought to the top of the stack of panels by
the CardLayout.show() method. The switch statement detects which of Tables’ three
panels, Players, Clubs or Discipline, is currently selected, and performs setup action
accordingly.
void jMenuItem1_actionPerformed(ActionEvent e) {
cardLayout1.show(appPanel,"Tables ");
switch (Tables.getSelectedIndex()){
case 0: setupPlayers();
break;
case 1:setupClubs();
break;
case 2:setupDiscipline();
break;
default:break;
}
}
void setupPlayers(){
statusBar.setText("Status: Maintain Players Table ");
Save.setEnabled(false);
String query= "select ufa_id from players";
try {
Statement stmt1=conn.createStatement();
ResultSet rs=stmt1.executeQuery(query);
playerSelect.removeAllItems();
while (rs.next())
playerSelect.addItem(rs.getString(1));
}
catch(Exception f){
System.out.println(f.getMessage());
}
}
void Tables_stateChanged(ChangeEvent e) {
switch (Tables.getSelectedIndex()){
case 0: setupPlayers();
break;
case 1:setupClubs();
break;
case 2:setupDiscipline();
break;
default:break;
}
}
void playerSelect_actionPerformed(ActionEvent e) {
String query1=" select * from players where ufa_id =?";
try {
PreparedStatement stmt2=conn.prepareStatement(query1);
stmt2.setString(1,(String)playerSelect.getSelectedItem());
ResultSet rs=stmt2.executeQuery();
if (rs.next()){
Surname.setText(rs.getString(2));
Forename.setText(rs.getString(3));
DoB.setText(rs.getDate(4).toString());
Squad_No.setText(rs.getString(5));
Position.setText(rs.getString(6));
Nationality.setText(rs.getString(7));
Club.setText(rs.getString(8));
statusBar.setText("Status: Player selected. Edit fields ");
Save.setEnabled(false);
}
}
catch(Exception f){
System.out.println(f.getMessage());
}
}
The statement is then executed, assigning its returned row to the ResultSet, rs. If rs
holds a row, its columns are assigned to the textfields in the Players panel. The date
of birth needs special processing. It is extracted from rs as a Date object. The Date
class has a method toString that converts a Date into a String and formats it as yyyy-
mm-dd.
The Clear2 button re-establishes the Players form. If the user gets into a tangle
whilst editing a player’s attributes, clicking on the button triggers a mouseClicked
event. Add the event listener for the Clear2 button and call setupPlayers from the
action method (Figure 14.27). This reinitializes the form.
void Clear2_mouseClicked(MouseEvent e) {
setupPlayers();
}
The Save button must be enabled when editing activity is detected in any of the
textfields. Install a keyPressed event listener in each of the editable fields. A sample
of the single line of code for each of the action methods is shown in Figure 14.28.
void Surname_keyPressed(KeyEvent e) {
Save.setEnabled(true);
}
With the Save button enabled, code must be written for its mouseClicked event
handler. Install the appropriate event listener – the code for its action method is
shown in Figure 14.29.
The only thing left to do now in the Players form is to implement the New button
to create a new player. The intention is that clicking this button will create a pop-
up dialog where the new player’s attributes may be edited. Inserting a new player
also means that an initial contract must be created.
The newPlayer dialog will also demonstrate further use of the combobox to vali-
date user inputs and a method of creating a randomized ufa_id that is guaranteed to
be unique.
The newPlayer dialog will be a separate class from UFLFrame and will be created using
the dialog wizard. Click File |New on the main JBuilder menu and double-click the
dialog icon. The wizard is started. Change the class name to newPlayer and click Next.
The outline Java program for the dialog is created and added as a project file.
void Save_mouseClicked(MouseEvent e) {
String query= "update players set surname=?,
forename=?,dateofbirth=?,squad_no=?,
position=?,nationality=? where ufa_id=?";
try {
PreparedStatement stmt=conn.prepareStatement(query);
stmt.setString(1,Surname.getText());
stmt.setString(2,Forename.getText());
stmt.setDate(3,java.sql.Date.valueOf(DoB.getText()));
stmt.setString(4,Squad_No.getText());
stmt.setString(5,Position.getText());
stmt.setString(6,Nationality.getText());
stmt.setString(7,(String) playerSelect.getSelectedItem());
stmt.executeUpdate();
Save.setEnabled(false);
}
catch(Exception f){System.out.println(f.getMessage());}
}
The design process for the dialog is similar to that followed for the various forms
in UFLFrame1. The layout is shown in Figure 14.30.
Double click on the file newPlayer.java in the project pane. The design tab shows
a blank rectangle. Add a JPanel to UI-this in the component tree. Set its layout
manager to null or to XYLayout. This panel will be the principal container for the
dialog. Start by copying all the components; labels, buttons, textfields and
comboboxes from the Players form to panel1 in the dialog. Use a multi-select copy
and paste. The components will be copied with the same names and visual proper-
ties. These will be reflected in the source code for the dialog. Duplicated compo-
nent names can be used because the dialog is a different class from UFLFrame1.
Delete the Position, Nationality and Club textfields. Replace them with
JComboBoxes and rename the new components accordingly. Delete the playerSelect
combobox and replace it with a JTextField. Rename this to Ufa_ID. Add text fields
startDate and Salary with labels. Add labels for the two sections of the form. The
two etched border boxes that group the components are two overlaying trans-
parent panels. There are three buttons: Save, Generate and Cancel.
The source code generated by the dialog wizard now needs several amendments
to integrate newPlayer into the UFL application. First, two import statements must be
added so that the dialog can use JDBC and the random number generator in the
java.util library (Figure 14.31).
Second, the constructors generated for the dialog must be changed. Because
newPlayer is a separate class, the connection to Oracle will not automatically be
available in the scope of the class. It can, however, be passed as an additional
parameter to the constructor. There are two constructors for newPlayer, following
the inheritance rules for the JDialog class.
The first takes three parameters, the dialog's parent, its title and whether it is
modal3 or not. The constructor then goes on to call the constructor for its superclass,
passing on the parameters it has received. Add a fourth parameter, as shown
in Figure 14.32. This will not be passed on to the superclass and has scope only
within the constructor.

3 Modal: the frame or dialog must be closed before any other window in the application can
be made active. User focus is captured.

public newPlayer() {
  this(null, "", false, null);
}
The secondary constructor allows the dialog to be created with no parameter
values. Default values are sent in a call to the primary constructor. Add a null
default, to harmonize with the signature of the prime constructor, which now
requires four input parameters. The prime constructor will be called from
UFLFrame1, passing the value of the existing connection object.
Because the scope of the parameter is limited to the constructor, its value is
immediately transferred to conn1, a global Connection class attribute in newPlayer.
Add the declaration for conn1 to the attributes of newPlayer, it will later be used to
create SQL statements in the dialog.
In fact, the connection is used almost immediately. Code the fillFields method
(Figure 14.33) and issue a call to it at the end of the dialog’s jbInit method. The
fillFields method populates the Position, Nationality and Club comboboxes. Using
comboboxes restricts the user to choosing only valid entries for these fields. The
DoB and startDate field are initialized to today’s date. This gives the user a template
to enter these data fields in the correct format. The other textfields on the form are
set to blanks.
The method generateUfa_ID should also be coded (Figure 14.34) and a call to it
inserted just after the fillFields call in jbInit. This method generates the new player’s
ufa_id using a random number generator. There are two for-loops. The first gener-
ates two random alphabetic uppercase characters as the prefix of the ufa_id. The
second generates five numeric characters as the suffix. Each of these characters is
concatenated with the initially empty string, test, in each round of the loops.
void fillFields(){
Position.removeAllItems();
Nationality.removeAllItems();
Club.removeAllItems();
try {
Statement stmt=conn1.createStatement();
String query="select allowed from positions";
ResultSet rs=stmt.executeQuery(query);
while (rs.next()) Position.addItem(rs.getString(1));
query="select distinct nationality from players";
rs=stmt.executeQuery(query);
while (rs.next()) Nationality.addItem(rs.getString(1));
query="select name from clubs";
rs=stmt.executeQuery(query);
while (rs.next()) Club.addItem(rs.getString(1));
query="select SYSDATE from dual ";
rs=stmt.executeQuery(query);
if (rs.next()){
DoB.setText(rs.getDate(1).toString());
startDate.setText(rs.getDate(1).toString());
Surname.setText("");
Forename.setText("");
Squad_No.setText("");
Salary.setText("");
}
}
catch(Exception f){ System.out.println(f.getMessage());
}
}
The first loop, which iterates twice, uses rand, the random number generator,
through its nextInt method, to give a positive number between 0 and 65,000. The
generated numbers should be evenly distributed within the range. The modulus
operator, %, gives the remainder after its first operand is divided by the second
(Figure 14.35).
So, the modulus 26 of the initial random number is calculated. This yields an
even distribution of numbers in the range 0 to 25. The character ‘A’ is 65 in the
ASCII coding scheme; adding 65 to the result of the modulus will give a number
between 65 and 90. This corresponds to the letters ‘A’ to ‘Z’. Casting the number to
a char type, converting it to a String and concatenating it to the test string builds the
prefix of the ufa_id.
void generateUfa_ID(){
String test="";
String query="select surname from players where ufa_id=?";
boolean success=false;
int x;
char c;
Random rand= new Random();
PreparedStatement stmt;
ResultSet rs;
while (!success){
test="";
for (x=0;x<2;x++)
test=test +
String.valueOf((char)((rand.nextInt(65000)%26)+65));
for (x=0;x<5;x++)
test=test +
String.valueOf((char)((rand.nextInt(65000)%10)+48));
try {
stmt=conn1.prepareStatement(query);
stmt.setString(1,test);
rs=stmt.executeQuery();
if(!rs.next())
success=true;
}
catch(Exception f){
System.out.println(f.getMessage()); }
}
Ufa_ID.setText(test);
}
29 % 3 = 2
28 % 3 = 1
27 % 3 = 0
A similar thing is done in the second loop, except that the second modulus
operand is 10 and the added number is 48. This gives a number in the range 48 to
57: the ASCII characters ‘0’ to ‘9’. This loop iterates five times, building the ufa_id’s
numeric suffix.
The string test is then used as a parameter to the SQL query. If the ResultSet is
empty the generated ufa_id does not exist in the database, the variable success is
turned to true and the outer while-loop ends. If the result returns a row, the gener-
ated ufa_id is already assigned to another player. Another key is generated until the
Boolean variable success can be turned to true. The generated ufa_id is then guaran-
teed unique, placed in the textfield and the method ends.
Figure 14.36 shows the dialog running. The size of the pop-up frame can be
adjusted with the preferred size (in pixels) property of the container, panel1. The
next task is to code action methods for the three buttons on the form.
The Cancel button closes the dialog, returning control to its parent, UFLFrame1.
Create a mouseClicked listener for Cancel (Figure 14.37).
The Generate button calls fillFields and generateUfa_ID to reinitialize the form for
a second or subsequent new player, and re-enables the Save button. Install a
mouseClicked listener for Generate (Figure 14.38).
void Cancel_mouseClicked(MouseEvent e) {
this.dispose();
}
void Generate_mouseClicked(MouseEvent e) {
fillFields();
generateUfa_ID();
Save.setEnabled(true);
}
The Save button must initiate two SQL insert statements: one to add a row to the
Players table and one to add a row in the Contracts table (Figure 14.39). It is essential
that if one of them fails, so does the other. There cannot be a player without an
initial contract and there cannot be a contract without a corresponding player.
The answer, of course, is to place both statements in a single transaction. The
connection object, conn1, has a method, setAutoCommit(boolean). It is used here to
set Oracle’s autocommit feature off. The first statement will implicitly open the
transaction. There is an explicit commit to end it.
If an exception is raised, control drops through to the catch section. Notice that a
second, nested try/catch structure is needed here to call conn1.rollback(). JDBC state-
ments that have the potential to raise remote or SQL exceptions can only be called
inside a try section that can catch the exceptions.
If the original exception were caused by a network failure, then the rollback
would also fail, raising another remote exception, but the transaction is lost
anyway since the session has ended and autocommit is off. If the original failure
were caused by an SQL error, such as a constraint being disobeyed, the rollback
would succeed. Either way, the integrity of the database is secure.
Return to the Players form and install a mouseClicked listener for the New button
(Figure 14.40).
The action consists of only two lines. First, an instance of newPlayer called dlg is
created. Notice that the connection is supplied as the fourth parameter. The parent
is this, the UFLFrame1, the title is “New Player” and modal is true. The next line
shows dlg. The show method blocks and does not return until the dialog is
closed, because the dialog is modal.
The Players form and its associated dialog, newPlayer, are now complete.
void Save_mouseClicked(MouseEvent e) {
String query= "insert into players(ufa_id, surname, forename,"+
"dateofbirth, squad_no, position, nationality, club) "+
"values(?,?,?,?,?,?,?,?)";
try {
conn1.setAutoCommit(false);
PreparedStatement stmt=conn1.prepareStatement(query);
stmt.setString(1,Ufa_ID.getText());
stmt.setString(2,Surname.getText());
stmt.setString(3,Forename.getText());
stmt.setDate(4,java.sql.Date.valueOf(DoB.getText()));
stmt.setString(5,Squad_No.getText());
stmt.setString(6,(String) Position.getSelectedItem());
stmt.setString(7,(String) Nationality.getSelectedItem());
stmt.setString(8,(String) Club.getSelectedItem());
stmt.executeUpdate();
query= "insert into contracts(ufa_id, club, joined, salary)"+
"values(?,?,?,?)";
PreparedStatement stmt1=conn1.prepareStatement(query);
stmt1.setString(1,Ufa_ID.getText());
stmt1.setString(2,(String) Club.getSelectedItem());
stmt1.setDate(3,java.sql.Date.valueOf(startDate.getText()));
stmt1.setString(4,Salary.getText());
stmt1.executeUpdate();
conn1.commit();
conn1.setAutoCommit(true);
Save.setEnabled(false);
}
catch(Exception f){
try {conn1.rollback(); conn1.setAutoCommit(true);
}
catch(Exception d){
System.out.println(d.getMessage()); }
System.out.println(f.getMessage());
}
}
void New_mouseClicked(MouseEvent e) {
newPlayer dlg=new newPlayer(this, "New Player", true, conn);
dlg.show();
}
The matchResults form is used to maintain a 1:N relationship between the Matches
and Goals tables in the database. The result of a match can be inferred by counting
the goals for each team in a match. Because each match will often be associated
with more than one goal, the details of each goal (scorer, time of scoring and goal
type) must be displayed in a multi-row container. Although it is perfectly possible
to program this directly in Java, it is easier to use the data-aware components
supplied as part of the JBuilder Enterprise Edition.
The matchResults form makes use of components from the DataExpress and
dbSwing tabs in the Design editor’s palette. These components rely on a different
database connection from that demonstrated in the Login and Players forms.
Select the matchResults panel in the structure tree and click on the design editor
tab. Click on the Database component. The cursor changes to a crosshair, indi-
cating that the component has been selected. Click anywhere on the editor or
structure panes. The Database component is added to the structure tree in the Data
Access branch. Select the Database component and use its property inspector to
configure it. Change its name to uflOracle and set its connection property as shown
in Figure 14.41. Change the values to conform to your local Oracle installation.
You may need to pre-configure automatic access to oracle.jdbc.driver.OracleDriver
through the OracleJDBC library tag. Use the JBuilder menu
Tools|Enterprise Setup dialog to do this.
Change the settings in the Database configuration dialog until clicking on the Test
Connection button gives a Success response.
The data-aware components are divided into two types: those that access databases
or text files directly to retrieve data and those that provide a viewport of the
retrieved data in a form. Components in the DataExpress tab are largely of the first
type and components in the dbSwing tab are of the second type.
Components that read the database directly are based on the DataSet class. A
tableDataSet reads a complete database table; a queryDataSet reads the rows
returned in a single or multi-table SQL query. The dataset copies the data in the
returned rows into its structure, which can be thought of as a dynamic two-dimen-
sional array. The columns in the dataset conform to the selection set of the query;
each row in the dataset represents a returned row from the query. If the dataset is
updatable, read-intending-write locks are left on the rows in the database; if the
dataset is read-only then read locks (if supported) are left in the database.
The rows in the dataset can be inspected or changed and new rows added
through an appropriate viewport. At some point, the changes and insertions may
be saved or, in java-speak, resolved. The rows in the dataset are copied back into the
database in write operations that replace previous values or add rows. The dataset
identifies the source of each of its cells by its table name, its column name and the
ROWID of the data in the table. This information is established when the dataset is
opened. The dataset can be thought of as a snapshot of the data returned by its
query.
Because the dataset is a client-side memory structure it can be manipulated in
much the same way as an array. It has a cursor, or pointer to the current row, that
has focus through the viewport. The cursor can be moved programmatically to the
first row (using the dataset’s first() method) and each row examined in a loop struc-
ture that moves the cursor to the next row until the end of the dataset. The cursor is
moved forwards using the dataset’s next() method or backwards using the previous()
method. The viewport automates this process to give a visual presentation of the
rows without the need for programmer intervention.
Two Datasets can be linked together in a Master/Detail relationship. This
synchronizes the detail dataset to the master. Whenever a row in the master dataset
receives focus, the focus in the detail dataset is moved to a row that matches the
row value equality specified in the relationship.
The matchResults form uses five datasets linked in master/detail relationships to
present the goals scored in a particular match. Figure 14.42 shows the finished
matchResults form.
The datasets in the form are connected to visual components or viewports. Each
dataset is presented through a jdbTable component that shows the rows and
columns from the dataset and also marks the current row through a highlight. The
jdbTable enables each cell to be edited; the changes are reflected in the underlying
dataset: its rows are marked as changed until the changes are resolved (saved) back
to the database.
Each jdbTable is contained in a TableScrollPane that gives a vertical and hori-
zontal scrollable area to display just a few rows from the jdbTable.
In this form, the user selects a particular season from the scroll pane positioned
at the top left of the form. Whenever focus in this component changes, a master/
detail relationship synchronizes the match dates for that season in the scroll pane
to its right (Figure 14.43). Selecting a particular match date in this way triggers a
similar response, again synchronized through a master/detail relationship, to
show the matches played on that particular date. Selecting a match shows the goals
scored by the home team’s players in the scroll pane at lower left and the goals
scored by the away team players in the scroll pane on the right.
Every time the data changes in these last two panes, a component event causes
the match score to be calculated. This calculation allocates each player’s goals or
penalties to his team, while own goals are allocated to the opposing side.
The three scroll panes at the top of the form are read-only or non-resolvable.
Editing fixtures is no business of this form. The two goal panes are fully editable:
existing rows can be changed or deleted and new rows can be added.
Each of these panes has a jdbNavToolBar linked to it. This automates the editing
process. The icons can be clicked to move focus in the pane, add a new row, save
changes or refresh the dataset by taking a new snapshot if edits have got into a
tangle.
Select the editor pane and click on the QueryDataSet component in the DataExpress
tab. The cursor changes to crosshairs. Drop the QueryDataSet on the structure tree.
The dataset is added to the Data Access branch.
Double-click on the dataset to expose the dataset design editor. Change its name
in the property inspector and click on the ellipsis in the query property to expose
the query editor (Figure 14.44).
Specify the uflOracle database connection and make sure that the Execute query
immediately box is checked. The SQL statement can either be typed in directly or it
can be built interactively with the SQL Builder tool. Press the Test Query button
until it executes successfully. Then press the OK button.
Notice that, instead of just selecting all the season values from the Seasons table,
the query joins the Seasons table with the Matches table. The query only gives
seasons that have corresponding matches in the database. The distinct operator
restricts the query so that only one season is returned whether there is only 1
match or 200 matches associated with it.
The dataset editor now shows the columns in the dataset. Set the dataset’s resolv-
able property to false and clear the resolve order property (Figure 14.45).
SQL statement
select distinct seasons.season
from a27ws.seasons, a27ws.matches
where seasons.season= matches.season
order by seasons.season;
Column properties are shown in the editor. The list icon enables different
column properties to appear in the editor. The caption and width properties can be
set later when the dataset is connected to a viewport.
The matchDates dataset is established in the same way. Its query is given in Figure
14.46.
The distinct restriction ensures that only one row is returned for each combina-
tion of season and matchdate. Set the other properties for matchDates as for Seasons,
to ensure that it is read-only. The one difference here is that a master/detail rela-
tionship with the Seasons dataset will be set up to synchronize the rows displayed
in matchDates’ viewport (Figure 14.47).
Click on the ellipsis in the MasterLink property of matchDates and the MasterLink
editor pops up. Specify Seasons as the master dataset and the season column of each
dataset as the link columns. Test the link to ensure that it is working and click on
OK. The season column in matchDates is needed to make the link, but it will not be
displayed in the viewport. The dataset hides detail link columns ensuring they are
not editable. All edits and insertions to the detail dataset are kept linked to the
master in this way.
The Matches dataset has a simple query, shown in Figure 14.48. The master/detail
relationship with matchDates is created with the season and matchdate columns as
the link. Because Matches is the detail dataset, these two columns will not appear in
its viewport.
The homeGoals dataset has a rather more complex query. This is shown in Figure
14.49. The Goals table is joined with Teamsheets to retrieve the club that each
goalscorer was playing for at the time of the match.
This will also ensure that any inserted goalscorer was actually playing in the
match. The join with Players is necessary to retrieve the Surname column. This is
likely to be a better accuracy check for the user than the ufa_id. The awayGoals
dataset has the same query. Two datasets are needed so that edits and insertions
can make use of independently positioned current row cursors. The homeGoals
dataset is linked to Matches firstly through Match_ID and secondly by a link based
on Matches.HomeClub and homeGoals.Club. Figure 14.50 shows the MasterLink
dialog for homeGoals.
Although the awayGoals dataset has the same underlying query and thus is a
snapshot of the same rows in the database, its master/detail relationship is based
on Match_ID and Matches.AwayClub with awayGoals.Club.
This ensures that the same row snapshots will never be displayed in their respec-
tive viewports and thus updates and insertions will not cause lock conflicts. Since
locks are always at the row level and updates are resolved in the dataset through the
ROWID, this must be the case.
The tabbed panel, Relationships, contains a JPanel called matchResults and this will
be the basis of the results entry form.
Click on the jdbTableScrollPane icon in the dbSwing palette tab and draw it on the
panel. Repeat this four times and arrange the scroll panes as shown in Figure 14.51.
The scroll panes are containers, like JPanel. They offer additional functionality, as
their name suggests: vertical and horizontal scroll bars appear when their contents
fill more than the area set for them in the design editor.
Click on the jdbTable icon and click inside the first TableScrollPane. Rename the
jdbTable to seasonMaster. Initially, seasonMaster resembles an empty spreadsheet. It
has one column called A and one row called 1. Use the property inspector to assign
the dataset Seasons to seasonMaster. The display changes: seasonMaster displays the
first row of the dataset assigned to it and seasonMaster’s column has acquired a
name from the dataset. Change seasonMaster’s foreground font to serif|18. The text
of the values causes the column to widen, making the column header visible.
The database and dataset components are active in the editor. That is how the
data rows have been displayed. The jdbTable component is not active. Therefore,
the dataset’s focus or current row cannot be changed. The viewport to the dataset is
static. This display of the data is a handy way of checking that the datasets and the
database connection have been configured correctly.
The form for the maintenance of the relationship between the matches and Goals
tables in the database is now largely complete. Using data-aware components has
greatly simplified the creation of a moderately complex form. So far, there has been
no need to change the generated Java code; everything has been accomplished by
setting properties within the components.
void homeGoals_dataChanged(DataChangeEvent e) {
calcScores();
}
void awayGoals_dataChanged(DataChangeEvent e) {
calcScores();
}
The calcScores method is listed in Figure 14.54. Its basic strategy is to loop
through all the rows of the homeGoals dataset, allocating goals to counters for
either the home side or the away side. At the end of the loop the counters contain
the score, which can be copied to the form.
There are two problems with this. Firstly, look at the query for the dataset; it
contains all the goals for all the matches. The restriction to a particular match is
provided, through the master/detail relation, only in its viewport: the jdbTable.
There will have to be some sort of filter applied to the dataset as the loop progresses
through its rows so that only goals for the match in question are processed for the
score.
void calcScores(){
int homeCount=0;
int awayCount=0;
BigInteger m_id;
DataRow club = new DataRow(Matches);
Matches.getDataRow(club);
homeClubLabel.setText(club.getString("HOMECLUB"));
awayClubLabel.setText(club.getString("AWAYCLUB"));
m_id=club.getBigDecimal("MATCH_ID").toBigInteger();
DataSetView score=homeGoals.cloneDataSetView();
if (score.rowCount()>0){
score.first();
DataRow scoreRow = new DataRow(score);
do{
score.getDataRow(scoreRow);
if(m_id.equals(scoreRow.getBigDecimal("MATCH_ID").toBigInteger()) ){
if ( club.getString("HOMECLUB").equals(scoreRow.getString("CLUB")))
if (scoreRow.getString("TYPE").equals("own goal"))
awayCount++;
else
homeCount++;
else
if (scoreRow.getString("TYPE")=="own goal" )
homeCount++;
else
awayCount++;
}
}while(score.next());
}
homeScore.setText(Integer.toString(homeCount));
awayScore.setText(Integer.toString(awayCount));
score.close();
}
Secondly the dataset’s current row will be changed in the viewport if the loop
uses the dataset’s first() and next() methods to scan its rows. This will produce a
distracting visual ripple on the screen.
The first problem is solved if the match_id can be extracted from the current row
of the Matches dataset. Each of the homeGoals rows can be selected if they have the
same match_id.
In order to do this, a DataRow object must be instantiated, using the Matches
dataset as a parameter to its constructor. The resulting DataRow, named club in the
program, has a column structure derived from Matches and can hold the values
from one row of the dataset. The club object receives the values from Matches’
current row as a result of the call to the getDataRow method of the Matches dataset.
Values from club are then used to set text in the JLabels on the form indicating
the home team and the away team, and to assign the match_id to a local variable for
later comparison in the dataset loop. The columns of the club object are indicated
by string parameters to its getString or getBigDecimal methods.
The answer to the second problem is to make a copy of the homeGoals dataset.
Moving the current row cursor up and down the copy will have no effect on the
viewport. The copy, named score in the program, is instantiated from the
DataSetView class by a call to the dataset’s cloneDataSetView method. As well as
constructing the score object, the dataset’s rows are copied into it.
The score datasetview is tested to make sure there are some goals to assess and, if
so, score’s cursor is positioned at score’s first row and a DataRow called scoreRow is
derived from the column structure of score. The current row is copied into scoreRow
as the first action of each iteration.
Three nested if statements test the values in scoreRow for being associated with
the particular match, for being a home goal or an away goal and for being an own
goal or not. The local variable counters are incremented accordingly in each
iteration.
Finally, at the bottom of the datasetview, the next() method returns false and the
loop drops through. The local score counters are used to update the form and the
method ends by closing score to enable it to be efficiently garbage collected.
The matchResults form is now complete.
The match results report will list all the fixtures for a given season. Each match
listed will show the goalscorers and the final result. The screen design is shown in
Figure 14.55.
A jdbNavComboBox called selectSeason will provide the means for selecting the
season for the report. The items in the list portion of the combobox are taken
straight from a nominated column of a dataset. Create the QueryDataSet called
getSeasons. It is based on a very simple query (Figure 14.56).
Assigning this dataset to selectSeason and nominating the column season enables
the design editor to show the first row of the dataset in the combobox, just as in the
previous section. When the application is active, the combobox is populated auto-
matically and any of the season values can be selected to specify the report.
The other components in the Results panel are three buttons called Query, Word
and Clear3, a label for the combobox and then a JScrollPane (a Swing container)
holding a JTextPane called reportText. The JTextPane will hold preliminary results
from the report query and the JScrollPane provides scrollbars when the text
contents cannot be displayed in the available area.
The Query button has a mouseClicked event listener installed in the events tab of
its property inspector. The program code for the listener’s action method is shown
in Figure 14.57.
This method seems to be long and complicated, but structurally it is really quite
simple. A JDBC prepared statement is created with a query string that joins the
players, goals, matches and teamsheets database tables. The statement has one
parameter, which is set to the season selected in the selectSeason combobox.
void Query_mouseClicked(MouseEvent e) {
String rep; BigInteger m_id;
java.sql.Date m_date=java.sql.Date.valueOf("1901-01-01");
String homeClub=""; String awayClub=""; int homeCount=0;
int awayCount=0;
try {
PreparedStatement stmt =conn.prepareStatement("select surname, "+
  "players.ufa_id, type, minute, "+
  "matches.match_id, homeclub, awayclub, "+
  "teamsheets.club, matchdate "+
  "from players, goals, matches, teamsheets "+
  "where players.ufa_id=goals.ufa_id "+
  "and matches.match_id=goals.match_id "+
  "and teamsheets.match_id=goals.match_id "+
  "and teamsheets.ufa_id=goals.ufa_id "+
  "and matches.season=? "+
  "order by matchdate, match_id, minute");
stmt.setString(1,selectSeason.getSelectedItem().toString());
ResultSet rs=stmt.executeQuery();
while(rs.next()){
if (m_date.equals(rs.getDate(9)))
m_date=rs.getDate(9);
else{
m_date=rs.getDate(9);
if (!homeClub.equals(""))
reportText.setText(reportText.getText()+" "+HomeClub
+" "+String.valueOf(homeCount)+" "+awayClub +" "
+String.valueOf(awayCount)+"\n\n");
homeClub="";
reportText.setText(reportText.getText()+rs.getDate(9).toString()+"\n");
}
if (homeClub.equals(rs.getString(6))){
homeClub=rs.getString(6);
awayClub=rs.getString(7);}
else{
if (!homeClub.equals(""))
reportText.setText(reportText.getText()+"
"+homeClub+" "+String.valueOf(homeCount)+"
"+awayClub +" "+String.valueOf(awayCount)+"\n\n");
homeClub=rs.getString(6);
awayClub=rs.getString(7);
homeCount=0;
awayCount=0;
reportText.setText(reportText.getText()+"
"+rs.getString(6)+" vs "+rs.getString(7)+"\n");
}
if (homeClub.equals(rs.getString(8))){
if (rs.getString(3).equals("own goal"))
awayCount++;
else
homeCount++;
reportText.setText(reportText.getText()+"
"+rs.getString(1)+" ("+rs.getString(4)+") "+rs.getString(3)+"\n");
}
else{
if (rs.getString(3).equals("own goal"))
homeCount++;
else
awayCount++;
reportText.setText(reportText.getText()+"
"+rs.getString(1)+" ("+rs.getString(4)+") "+rs.getString(3)+"\n");
}
}
if (!homeClub.equals(""))
reportText.setText(reportText.getText()+"
"+homeClub+" "+String.valueOf(homeCount)+" "+awayClub +"
"+String.valueOf(awayCount)+"\n\n");
Word.setEnabled(true);
Query.setEnabled(false);
}
catch(Exception f){
statusBar.setText("STATUS: "+f.getMessage());
}
}
Notice that the component does not have a method for appending text, only a
setText() method. Each new piece of text is therefore concatenated with the previous
contents, so that each call to setText() effectively appends to the report listing rather than replacing it.
Figure 14.58 shows the application running with the Reports tabbed pane
exposed. The Results panel is shown and the Query button has just been clicked. The
reportText pane displays part of the raw report produced by the button’s action
method. The headers and sub-headers and the summary lines have been produced
by the if statements in the method.
In order to enable the fullest formatting and text manipulation for the report
text, clicking on the Word button will write the report contents to a temporary file
and call Microsoft Word with the file as a parameter to the call. The action method
for the Word button’s action method is shown in Figure 14.59.
This method uses the File class, so an additional import statement for java.io.*
will need to be added at the beginning of the program. The File class has a method
for creating a temporary file whose filename is guaranteed to be unique. Two
parameters are supplied: a prefix and a suffix. A random number is inserted between
these to create the unique filename in the default temp directory associated with
the current user.
A common method of inputting or outputting data in Java is through data
streams. The streams-based classes inherit from primitives that actually perform
very low-level interface management. A FileOutputStream is designed specifically
for file output and inherits device control from OutputStream. It provides a write()
method which outputs data one byte at a time. A PrintStream object is created from
an OutputStream-related object and provides higher level output methods print()
and println(), which can more easily send large data objects to the OutputStream's
destination.

void Word_mouseClicked(MouseEvent e) {
  File repFile;
  String rep;
  PrintStream outfile;
  try {
    repFile= File.createTempFile("ufl",".txt");
    outfile= new PrintStream(new FileOutputStream(repFile));
    outfile.println(reportText.getText());
    outfile.close();
    rep=repFile.getCanonicalPath();
    statusBar.setText("STATUS: Report file "+repFile.getCanonicalPath());
    Runtime.getRuntime().exec("C:\\Program Files\\Microsoft Office"
      + "\\Office10\\winword.exe /n \"" + rep + "\"");
  }
  catch (Exception f){
    statusBar.setText("STATUS: "+f.getMessage());
  }
}
The method code above shows first how a new FileOutputStream is associated
with the File object, repFile, and then used as a parameter for the construction of a
PrintStream object. This object, outfile, has high-level output methods, is associated
with file I/O interfaces and, finally, has a guaranteed unique name. The contents of
reportText are sent to the file in one statement and the file is then closed.
The next line of code copies the name of the temporary file to the status bar as a
signal of success. Notice that all of this code is in a try/catch structure because the
file operation can raise I/O exceptions.
Finally, the file is passed as a parameter to the launching of an external process,
the Microsoft Word program. This is achieved by using the exec method of the
application’s Runtime object. Each application has one such object and additional
instances cannot be constructed. The call is made via the Runtime class's static getRuntime()
method, which returns the application's Runtime object. This is then used to make
the exec call.
Notice the rather complicated way in which the parameters to the exec method
must be manipulated. The call to the Word program must supply a fully qualified
pathname to the executable file. This contains several \ characters, which are
themselves a special character in Java strings. To enable their presence in the
string, a \\ doublet must be used. The file parameter to Word must be enclosed in
double quotes ("). To enable a " within a string, it must be preceded by a \. Figure
14.60 shows the report in Microsoft Word.
Finally, the Clear3 button’s action is enabled. Install a mouseClicked listener in
the button’s property and write the action method to set reportText’s contents to an
empty string. Notice that clicking on the Query button deactivates it and activates
the Word button. Clicking on the Clear button should activate the Query button
and deactivate the Word button as well as clearing the text in reportText.
The matchResults report and the UFL application are now complete. Or at least,
complete as a demonstration. It is suitable to show as a first-cut solution to a client,
but it still has a number of empty panels and there is something wrong with the
query underlying the match results report. Can you see what it is? It is connected to
the fact that there is no 0–0 draw in the sample results.
14.7 EXERCISES

ADVANCED DATABASE ISSUES

CHAPTER 15
UNDERLYING STRUCTURES

15.1 INDEXES
Indexes are used to provide fast access paths during the execution of queries. The
optimizer will generally choose to use an index in place of a full table scan if the
index key is an attribute within the selection restriction or if it is a join attribute.
When a primary key constraint is specified on a table, an index is automatically
created. Indexes on attributes other than the primary key can be created once the
table is populated with data. Each index entry contains the index key value
together with a ROWID that points directly to the position of the row in the base
table that contains the key value.
The index is created, using a data structure which supports fast access to the
index entry and thus to the table row, when it is probed with a search value. Oracle
provides several such structures that are appropriate for use in different circum-
stances, taking into account the number of rows in the base table, the distribution
of key values, whether the keys are unique or duplicated and whether the base table
is subject to constant update and insertion or relatively stable.
The most commonly used index structure is the balanced tree, or B*-tree. Section 11.2
illustrated the concept of binary trees in indexes. In fact, Oracle uses a balanced,
multi-way B*-tree that equalizes access times to any row.
Figure 15.1 shows that the branch blocks of a B*-tree index contain pointers to
lower level branch blocks, enabling a probe for a key value. The actual indexed data
values, together with a ROWID for the base table, are held in the leaf blocks. The
leaf blocks are doubly linked, allowing a descent to the leaf block at the far right or
far left and then a range index scan across the leaves.
A unique index will contain only one ROWID for each key value. In a non-unique
index, the ROWID is included with the data value as the sort key. Duplicated key
values will thus occupy the same or adjacent leaf nodes. The leaf nodes are of fixed
block size. When a leaf block fills up it is split and the tree is reorganized; a further
branch level is added only when the existing branch blocks themselves fill.
All leaf blocks of the tree are at the same depth, so retrieval of any record from
anywhere in the index takes approximately the same amount of time. B*-tree indexes
automatically stay balanced. B*-trees provide excellent retrieval performance for a
UNDERLYING STRUCTURES 363
wide range of queries, including exact match and range searches. Inserts, updates and
deletes are efficient, maintaining key order for fast retrieval. B*-tree performance is
good for both small and large tables, and does not significantly degrade as the size of a
table grows.
A reverse key index reverses the bytes of each key value (except the ROWID). This
can prevent performance degradation in a parallel server environment where
modifications to the index are concentrated on a small set of leaf blocks. Conten-
tion between CPUs for this concentration of leaf nodes can be reduced by reversing
the keys of the index: the insertions become distributed across all the leaf blocks in the
index. This can make some OLTP applications faster if there is a great deal of
concurrent update activity.
However, a reverse key arrangement eliminates the ability to run an index range
scan. The keys are not ordered lexically, so adjacent keys are not stored in adjacent
blocks. Only fetch-by-key or full-index scans can be performed.
The purpose of an index is to provide pointers (a ROWID) to the rows in a table that
contain a given key value. In a bitmap index, a bitmap for each key value is used
instead.
Each bit in the bitmap corresponds to a possible ROWID. If the table contains
100,000 rows, the bitmap will have 100,000 bits. If a bit is set, then the row in
the corresponding position in the table contains the key value. A mapping function
converts the bit position to an actual ROWID, so the bitmap index provides the
same functionality as a regular index even though it uses a different representation
internally. If the number of different key values is small, bitmap indexes are very
space-efficient.
Since there is a bitmap for every distinct key value, this type of index is only effi-
cient where there are very few distinct key values. Bitmap indexes are suitable for
decision support systems where the data is relatively stable. Rows that satisfy some,
but not all, conditions in a selection predicate are filtered out before the table itself
is accessed.
Bitmap indexes are most efficient with low cardinality columns: columns in
which the number of distinct values is small compared with the number of rows in
the table. If a data warehouse application holds a table with one million rows, of
which a column has 10,000 distinct values, this column is a candidate for a bitmap
index. The index will have 10,000 entries, and each entry, corresponding to one
distinct key value, will have one million bits indicating the presence or otherwise
of the particular key value.
Figure 15.2 shows a table containing the results of a customer survey. Four of the
attributes have very restricted domains of allowed values. Job has three possible
values, Region has three, Gender has only two and Income has four.
Creating a bitmap index on each of these four attributes would result in the structures
shown in Figure 15.3. Each entry (or ‘bit’) in the bitmap corresponds to a single row
of the customer table. The value of each bit depends upon the values of the corre-
sponding row in the table. The bitmap region=‘north’ contains a 1 as its first bit:
this is because the region is ‘north’ in the first row of the customerSurvey table. The
bitmap region=‘north’ has a 0 for its other bits because none of the other rows of the
table contain ‘north’ as their value for region.
Since bitmap indexes for the same table will have the same size maps, AND and
OR conditions in the WHERE clause of a query can be quickly resolved by
performing the corresponding Boolean operations directly on the bitmaps before
mapping the result to ROWIDs.
An analyst investigating demographic trends of the company’s customers might
ask, ‘How many of our professional customers live in the south or east regions?’.
This query can be processed with great efficiency by merely counting the number of
1s in the resulting bitmap. No table access has been necessary. The logical OR of
bitmap 1 with bitmap 2 in Figure 15.4 produces intermediate bitmap 3. This is ANDed
with bitmap 4 to give the result, bitmap 5. Count the 1s to give the final result.
15.2 TABLES
It is possible to vary the standard interleaved block structure of table storage. The
principal motivation for this will be to improve query performance. When table
structures are changed to aid one particular query, care must be taken that other
queries do not have their execution plans compromised.
Job      Key            Bitmap
         Professional   0 1 1 0 0 1
         Manual         1 0 0 0 1 0
         Other          0 0 0 1 0 0

Region   Key            Bitmap
         north          1 0 0 0 0 0
         south          0 1 0 0 1 1
         east           0 0 1 1 0 0

Gender   Key            Bitmap
         female         0 1 1 0 1 1
         male           1 0 0 1 0 0

Income   Key            Bitmap
         bracket 1      1 0 0 0 0 0
         bracket 2      0 0 1 0 1 0
         bracket 3      0 0 0 0 0 1
         bracket 4      0 1 0 1 0 0

Figure 15.3

Operation                Map
1  Region = 'south'      0 1 0 0 1 1
2  Region = 'east'       0 0 1 1 0 0
3  1 OR 2                0 1 1 1 1 1
4  Job = 'professional'  0 1 1 0 0 1
5  3 AND 4               0 1 1 0 0 1

Count(*)
--------
3

Figure 15.4
15.2.1 INDEX-ORGANIZED TABLES

An index-organized table differs from an ordinary table in that the data for the
table is held in its associated index. Changes to the table data, such as adding new
rows, updating rows or deleting rows, result only in updating the index.
Instead of maintaining two separate storages for the table and the B*-tree index,
the database system only maintains a single B*-tree index which contains both the
key value and the associated column values for each row. The actual data is stored
in the B*-tree index rather than a ROWID pointer.
Index-organized tables are suitable for accessing data by the primary key.
Secondary indexes can still be built to provide efficient access by non-key attributes.
Index-organized tables provide faster key-based access to table data for queries
that involve exact match or range search, or both. The storage requirements are
reduced because key columns are not duplicated as they are in an ordinary table
and its index.
15.2.2 CLUSTERS
Clusters are an optional method of storing table data. A cluster is a group of tables
that share the same data blocks because they share common columns and are often
used together.
Players and Contracts share the ufa_id column. If these two tables were clustered,
Oracle would physically store all rows for each ufa_id from both the Players and
Contracts tables in the same data blocks.
Disk I/O is reduced and access time improves for joins of clustered tables. In a
cluster, a cluster key value is the value of the cluster key columns for a particular
row. Each cluster key value is stored only once in the cluster and once in the cluster
index, no matter how many rows of different tables contain the value. Therefore
less storage might be required to store related table and index data in a cluster than
is necessary in the non-clustered table format.
To identify data that would be better stored in clustered form, look for tables that
are related via referential integrity constraints and tables that are frequently
accessed together using a join. There is a reduction in the number of data blocks
that must be accessed to process such joins.
The cluster key is the column that the clustered tables have in common. This key
is specified when creating the cluster. Tables that are subsequently added to the
cluster must share this key. A cluster index must be created on the cluster key after
the cluster has been created. To locate a row in a cluster, the cluster index is used to
find the cluster key value, which points to the data block associated with that
cluster key value.
Figure 15.5 A query on Players for ufa_id = 'BR546732': several I/Os descend the index
on ufa_id before a final I/O uses the ROWID to read the row from the Players table.
With an ordinary index, several I/Os are needed to traverse the index blocks that
store the key value, and another I/O is performed to read or write the row in the table
or cluster. An index-organized table would not require the final I/O, since the full
row data is kept in the leaf nodes of the index (Figure 15.6).
Figure 15.6 The same query against a single-table hash cluster: the hash function maps
ufa_id = 'BR546732' directly to its data block, so one I/O reads the row.
The mathematical transformation used in hash clustering is called a hash func-
tion. This is chosen to generate an even distribution of numeric values, called hash
values, based on the shared key values of the clustered tables, which may not be
evenly distributed. To find or store a row in a hash cluster, Oracle applies the hash
function to the row’s cluster key value; the resulting hash value corresponds to a
data block in the cluster. Oracle then reads or writes the block directly. Usually
only one I/O operation is necessary to read or write a row in a hash cluster.
The hash function must be chosen so that under normal conditions two rows
with different key values do not produce the same hash key. If they do, a collision is
said to have occurred.
The most common technique is to use modulus arithmetic on a prime number
with the key value as the denominator. A prime number has no factors other than
itself and 1, so dividing it by a key value (or a number derived from an alphanumeric
key value) always leaves a remainder. That remainder is the result of a modulus
operator (Figure 15.7).

17 mod 3 = 2
101 mod 4 = 1
The prime number must be larger than the number of key values (the number of
rows in the table) so as to allow the possibility of each key value hashing to a
UNDERLYING STRUCTURES 369
different result. The storage space for the table is pre-allocated with the number of
blocks equal to the prime number minus 1. The remainder cannot be 0 and it
cannot equal the prime number. The job of a hash function is to provide the
optimum even distribution of rows among the available hash values of the cluster.
To achieve this goal, a hash function must minimize the number of collisions.
If a collision occurs and there is no space in the original block allocated for the
hash key, an overflow block must be allocated to hold the new row. The likelihood
of this happening is largely dependent on the average size of each hash key value
and corresponding data.
A single-table hash cluster can provide fast access to individual rows in a table, as
shown in Figure 15.6. If more than one table is clustered on a shared key, a hash
cluster will give improved performance for joins between those tables, provided
the join predicate is the shared key. A hash-clustered table may exhibit worsened
performance for queries requiring sort-ordered access since the rows are kept in
random order with respect to the key. Such access would require a full table scan
and a sort operation in buffers.
All join operations take place between two relations, since the relational algebra’s
join is a binary operator. The two relations remain unchanged and a third (joined)
relation is the result. If there are more than two tables in the selection domain then
the resulting third relation is joined with the next table until join operations are
complete (Figure 15.8).
The method of joining the relations depends on the selection made by the query
optimizer. Oracle has four principal methods available and they are used in
different situations. The decision will be based on the underlying storage structure
of the tables involved in the selection domain; whether indexes have been
specified for attributes involved in the join condition; the statistical distribution of
keys in the tables; and the numbers of rows in the participating tables.
The simplest method of joining two relations is arbitrarily to pick one of them,
usually the largest, as the outer or driving table. The other is known as the inner
relation. These names refer to their position in the loops.
The outer relation, or a projection of it, is read into an SGA buffer and a projec-
tion of the join attribute and ROWID from the inner table is read into a separate
SGA buffer. For each row of the outer relation, a scan is made of the entire inner
relation, searching for a match on the join condition. If a match is found then
attributes from the outer relation and attributes from the inner relation, found
using the ROWID, are placed in the intermediate relation. Which attributes are
thus projected will depend on which are required for subsequent joins or which are
required for the final selection set.
The smaller relation is chosen for the inner loop because it is scanned in full for
each of the outer loop iterations. Using an index to read the inner table rows is only
possible if the index already exists and if its key is the join attribute. Otherwise, a
full table scan of the inner table is necessary. Projecting just the join attribute and
ROWID of the inner table into buffers reduces memory usage in the SGA.
Nested loops can be used for equijoins or theta joins where the join attributes are
tested for any of the comparison operators.
A sort merge consists of a two-stage operation. First, a full table scan of each table is
performed, projecting into separate buffers at least the join attributes, together
with any other attributes that will be needed in subsequent operations.
The two buffers are then sorted by the join attribute and the two sorted buffers
are scanned in one pass. If the join attributes satisfy the join condition, attributes
are projected into the result. If they do not match then the buffer whose sort
attribute has the lowest value is scanned until a match occurs. The scan there-
fore progresses through the two buffers, advancing its position in each according
to the relative values of the join attribute.
A sort merge can only be performed for an equijoin, where the two join attributes
are being tested for equality. Figure 15.10 gives sample pseudo-code for a sort
merge.
Oracle can perform a cluster join only for an equijoin that equates the cluster key
columns of two tables in the same cluster. In a cluster, rows from both tables with
the same cluster key values are stored in the same blocks, so Oracle only accesses
those blocks.
Figure 15.11 shows the execution plan for a simple select query in which the
Players and Contracts tables are stored together in the same cluster: Step 2 accesses
the outer table (Players) with a full table scan. For each row returned by Step 2, Step
3 uses the Players.ufa_id value to find the matching rows in the inner table
(Contracts) with a cluster scan. A cluster join is nothing more than a nested loop
join involving two tables that are stored together in a cluster. Since each row from
the Players table is stored in the same data blocks as the matching rows in the Contracts table, Oracle can access matching rows more efficiently.
[Figure 15.11 Cluster join execution plan for SELECT * FROM players a, contracts b WHERE a.ufa_id=b.ufa_id: Step 1 nested loops; Step 2 full table scan of Players; Step 3 cluster scan of Contracts]
Oracle can only perform a hash join for an equijoin. Oracle performs a full table
scan on each of the tables, storing key value and ROWID in buffers. A hash struc-
ture is then built in memory for the inner table. The hash structure is then probed
for each key value in the outer table buffer. Matched key values are placed in the
result relation.
The hash structure is constructed in much the same way as a hash cluster. A hash
function is applied to the key value. This yields a result that determines that key
value’s position in the hash structure.
This join method has advantages over a sort merge in that only the smaller buffer needs to be pre-processed to produce the hash structure. The larger, outer
table buffer can be processed in serial order. Further, the construction of the hash
structure is likely to be more CPU and memory efficient than a sort operation.
Figure 15.12 shows the execution plan for a join between Players and Contracts.
Steps 2 and 3 perform full table scans of the Players and Contracts tables. Step 1
builds a hash table out of the rows coming from Step 2 and probes it with each row
coming from Step 3.
With the cost-based approach, the optimizer generates a set of execution plans
based on the possible join orders, join operations and available access paths. The
optimizer then estimates the cost of each plan and chooses the one with the lowest
cost. The optimizer estimates costs for disk, memory and CPU usage.
[Figure 15.12 Hash join execution plan for SELECT * FROM players a, contracts b WHERE a.ufa_id=b.ufa_id: Step 1 hash join; Steps 2 and 3 full table scans of Players and Contracts]
The cost of a nested loops operation is based on the cost of reading each selected
row of the outer table and each of the matching rows of the inner table into
memory. The optimizer estimates these costs using the statistics in the data dictio-
nary. The cost of a sort–merge join is based largely on the cost of reading all the
sources into memory and sorting them.
The optimizer also considers other factors when determining the cost of each
operation. A smaller sort area size is likely to increase the cost for a sort–merge join
because sorting takes more CPU time and I/O in a smaller sort area.
A larger multi-block read count is likely to decrease the cost for a sort–merge join
in relation to a nested loops join. If a large number of sequential blocks can be read
from disk in a single I/O, an index on the inner table for the nested loops join is less
likely to improve performance over a full table scan.
For join statements with outer join conditions, the table with the outer join oper-
ator must come after the other table in the condition in the join order. The
optimizer does not consider join orders that violate this rule.
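Although the choice of join method normally rests with the optimizer, Oracle provides hints with which a developer can request a particular method when experimenting or tuning. The statements below are a sketch only: USE_NL, USE_MERGE and USE_HASH are standard Oracle hint names, but the optimizer is free to ignore a hint if no suitable access path exists.

-- Request a nested loops join between Players and Contracts
SELECT /*+ USE_NL(a b) */ *
FROM players a, contracts b
WHERE a.ufa_id = b.ufa_id;

-- Request a sort merge join for the same selection domain
SELECT /*+ USE_MERGE(a b) */ *
FROM players a, contracts b
WHERE a.ufa_id = b.ufa_id;

-- Request a hash join
SELECT /*+ USE_HASH(a b) */ *
FROM players a, contracts b
WHERE a.ufa_id = b.ufa_id;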
15.4 KEY TERMS
Indexes: Key values extracted from a file or table along with a pointer to a source record or row. Keys are organized to provide efficient searches.
Reverse key indexes: Key values are inserted after reversing the bytes in the value. Randomizes the distribution of clustered values. Useful for range-partitioned parallel searches.
Bitmaps: Each row has an entry in the bitmap. The entry is 1 if the row value conforms to the indexing criteria, 0 if not. Predicate (where clause) connectors AND, OR and NOT are performed using bit-wise operators.
Clustered: Related (equijoin) rows from two or more tables are stored in clustered blocks. Join rows from the tables will always be found in the same block. Blocks are allocated sequentially.
Hash clustered: Related rows from two or more tables are stored in the same block. The position of the block is determined by applying a hash function to the common attribute(s).
Join methods: Algorithms for joining two relations to form an intermediate or final result.
Nested loops: The larger relation is read into a buffer; the smaller relation's join attribute and ROWID are read into another buffer. For each row of the large relation, all rows of the small relation are scanned; if the join condition is matched, the necessary attributes are projected into an intermediate relation.
Sort merge: The join attribute and ROWID of each relation are projected into a buffer. Each buffer is sorted on the join attribute and the two buffers are then scanned in tandem. Matching entries are placed in a third buffer, and attributes are projected, using the ROWIDs in the third buffer, to an intermediate result.
Cluster join: Only used for tables that are already clustered. The first table is scanned block by block; attributes from the second table are taken from the same block.
Hash join: Both tables are scanned and the join attribute and ROWID are projected into two hash structures. The smaller hash structure is scanned and, at each entry, the larger structure is probed with the hash value to determine a match.
15.5 EXERCISES
(i) Write a Java program to construct external (to the database) bitmap indexes
from a single table. Use your bitmaps to perform count aggregations as
described in this chapter.
(ii) Write a Java program to create a new table by joining two large tables or files using an external:
• sort merge
• hash join
Devise a method of quantifying the performance of each approach.
CHAPTER 16
DECISION SUPPORT SYSTEMS
The majority of database applications process many small concurrent update trans-
actions. A bank’s automated teller machine (ATM) system, which must allow for
thousands of withdrawal, deposit and transfer transactions to happen concur-
rently, is a typical example of such an application. This type of processing is known
as online transaction processing (OLTP), and databases used to support such
processing need to meet the strict ACID properties of transaction processing (see
Chapter 8). OLTP is often measured in terms of transaction throughput: the
number of committed transactions per second.
Online analytical processing (OLAP), in contrast, generally consists of read-only queries that do not require the same level of transactional support. OLAP applications are invariably associated with data warehouses and large collections of data, where they are used to provide easy navigation and investigation of historical data, enabling organizations to make informed
tactical and strategic decisions (Table 16.1).
At the operational level of business, users are concerned with the finest granu-
larity of data. The highest level of detail is concerned with individual customer
accounts, invoices and shipping information. Tactical managers are more inter-
ested in aggregated historical data to assist in planning decisions such as customer/
service agent ratios, reorder levels that optimize inventory without compromising
delivery and so on. Strategic information has the coarsest granularity of data.
Table 16.1 OLTP versus OLAP
OLTP                              OLAP
Updates operational data          Analysis of historical data
Stores detailed data              Stores summarized data
Repetitive processing             Ad hoc complex queries
Predictable usage pattern         Subject oriented
Transaction driven                Fast response time required
Application oriented              Analysis driven
Algorithmic decisions             Supports strategic decisions
Usually small changes             Usually read-only
Large number of transactions      Low transaction throughput
Many operational users            Few ‘managerial’ users
Analytic applications use historical data to answer many questions about enter-
prise performance. In a group of companies, the information can be analyzed on a
company-by-company basis or it can be consolidated for the group.
Planning applications allow organizations to estimate future performance. They
generate new data using predictive analytical tools. They use sophisticated statis-
tical methods to identify trends from historical company data and may use external data to assess the effects of economic, climatic, demographic or other environmental factors on likely organizational performance (Table 16.3).
Financial analysis systems allow organizations to evaluate past performance,
forecast revenue and spending, establish profit goals and monitor change effects
on the financial plan. Demand planning assists organizations in predicting market
demand based on factors such as sales history, special promotions and pricing.
Product demand forecasts determine manufacturing goals.
All OLAP systems rely on a consolidated view of transactional data known as a
multi-dimensional conceptual view. Figure 16.1 shows a simple example from
the UFL schema. Match statistics over several seasons have been transformed into
summaries of each club’s goalscoring record, analyzed by type of goal and season.
The cube shape shows the three principal dimensions, but there could be further
detail. The Club dimension, for example, could be subdivided into players.
[Figure 16.1 A data cube for the UFL schema: goals summarized by club (Bountiful Athletic, Pity City, Knowledge City, Wisdom United), by type of goal (goals, own goals, penalties) and by season (2001/2 to 2004/5)]
Adding a salary dimension might reveal whether there is a relationship between high scoring and salary for forwards, or between salary and clean sheets for goalkeepers.
Typical OLAP queries might include aggregation: summarizing the data across
one or more dimensions.
AGGREGATION: Find total Goals (for all clubs, seasons and goal types)
ROLLUP: Given total goals per player, find total goals per club
Drill down is the disaggregation of summarized data either along the same
dimension, using the hierarchy, or across another dimension.
DRILL DOWN: Given total goals per club, find total goals per player
Given total goals per club, find total goals per club per season
A data warehouse is a database that is designed for query and analysis rather than
for transaction processing. It often contains historical data derived from transac-
tion data, but it can include data from other sources. A separate OLAP server segre-
gates analysis workload from transaction workload and enables an organization to
consolidate data from several sources.
In addition to a database, a data warehouse environment includes an extraction,
transformation and loading (ETL) solution, an online analytical processing (OLAP)
engine, client analysis tools, and other applications that manage the process of
gathering data and responding to user requests.
A common way of introducing data warehousing is to refer to the characteristics
of a data warehouse as set out by William Inmon (1996) (Table 16.4).
Data warehouses are designed for data analysis. To learn something specific
about a company’s performance, a warehouse can be built that concentrates on
that aspect, say, manufacturing or customer service. The ability to define a data
warehouse by subject matter makes the data warehouse subject-oriented.
A typical data warehouse query scans thousands or millions of rows. Data ware-
houses often use denormalized or partially denormalized schemas to optimize
query performance. The process of normalization is designed to protect update,
deletion or insertion operations from creating inconsistencies in the data. If the
data is static once it is in the data warehouse, the denormalization, or pre-calcula-
tion, to prepare the data for lengthy read-only queries is permissible.
Figure 16.2 shows a simple architecture for a data warehouse. End users directly
access data derived from several source systems through the data warehouse.
Special tools must be used to load the data and create the materialized views before
the OLAP tools on the client side may begin to make queries.
A more complex model is shown in Figure 16.3. Data from transactional or
external sources is brought together in a staging database. Here the integration
process and part of the pre-calculation phase can be carried out by using ETL tools.
The type of database server holding the OLAP data also has architectural implica-
tions. A multi-dimensional OLAP (MOLAP) server uses the multi-dimensional data
model directly. MOLAP products are specialized database applications, dedicated
to serving decision support systems. Client tools query the multi-dimensional
database (Figure 16.4).
Considerable conversion work must be carried out during the loading phase, but
performance is likely to be enhanced because the data model and the query para-
digm have no impedance mismatch. A multi-dimensional database uses the idea of
a data cube to represent the dimensions of data available to a user. For example,
sales could be viewed in the dimensions of product, model, geography, time, or any
other additional dimension. In this case, the subject orientation, sales, is known as
the measure attribute of the data cube and the other dimensions are seen as
feature attributes. Hierarchies and levels can be created within a dimension (for
example, city and postcode levels within a regional hierarchy).
If the data warehouse is based on a relational database (ROLAP), the data can still
be viewed multi-dimensionally and the SQL language can be embedded in client
tools (Figure 16.5). Processing the materialized views or other schema constructs
will be less efficient.
Proprietary MOLAP servers naturally use proprietary schema constructs, and each vendor's model is different. Multi-dimensional schemas based on relational technology, by contrast, share common characteristics across products, and there are several alternative designs.
Data warehouses and data marts differ conceptually only in scope; they are built using the same methods and procedures. A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval
is usually used to support decision-making in the organization. That is why data
warehousing is considered almost synonymous with decision support systems. It is
possible to customize the data warehouse architecture for different groups within
the organization. Data marts are systems designed for a particular line of business.
Figure 16.6 shows a data mart architecture where data from a main organizational
data warehouse has been divided into sub-systems for the purchasing, HR and
marketing departments. Segregating the data in this way may mean that it can be
restructured so that specialist OLAP tools, designed for a particular business func-
tion, can be applied to the data.
The aim is to build a schema that presents a view of transactional data as summa-
ries or aggregations. These data are called measures. Measures are numeric values
that are measurable and additive.
Just tracking measures is not enough, however. The measures need to be looked at through ‘by’ conditions: sales by department, goals by club, defects by factory. These ‘by’ conditions are called dimensions. There is almost always a time dimension, but
there may be any number of other dimensions.
Therefore, in designing a star schema (defined in Figure 16.8), the first order of
business is usually to determine the measure and how it should be seen.
When mapping dimensions into tables, there are a few rules to keep in mind.
First, all dimension tables should have a single-field primary key. This key is often
just an identity column, consisting of an automatically incrementing number. The
value of the primary key is meaningless; the information is stored in the other
attributes. These other fields contain the full descriptions of the desired view. For
example, if there is a product dimension (which is common) there should be attrib-
utes in it that contain the product description, the category name, the sub-category
name etc. These fields do not contain codes that link to other tables. Because the
fields are the full descriptions, the dimension tables are often fat; they contain
many large fields.
The product dimension, for example, contains individual products. Products are
normally grouped into categories, and these categories may well contain sub-cate-
gories. For instance, a product with a product number of X12JC may actually be a
refrigerator. Therefore, it falls into the category of major appliance, and the sub-
category of refrigerator. There may be more levels of sub-categories where this
product could be classified.
The example in Figure 16.7 shows a hierarchy in a dimension table. This is how
the dimension tables are built in a star schema; the hierarchies are contained in the
individual dimension tables. No additional tables are needed to hold hierarchical
information.
Storing the hierarchy in a dimension table allows for the easiest browsing of the
dimensional data. In the above example, choosing a Product could hardly be
simpler. Drilling down into the Product dimension schema could be done by
choosing a Product and examining the Categories or Sub-Categories. There is no
need to join to an external table for any of the hierarchical information.
A star schema consists of a fact table and dimension tables for each of the dimen-
sions in the multi-dimensional view. Hierarchies for each dimension are repre-
sented within the dimension table as its attributes.
[Figure 16.7 The Product dimension table, with attributes Product ID, Product Code, Product Name, Category and Sub-Category]
The dimension tables are linked to the SalesFact table through foreign keys. In the simplified example in Figure 16.8, there are three dimension tables related to the fact table. The star schema gets its name from the grouping of the dimension tables around a single fact table, like the points of a star (Figure 16.8).
Drilling down Sales for the product and time dimensions needs a simple join
between three of the tables in the star schema (Figure 16.9).
[Figure 16.8 Star schema: the SalesFact table (Product ID, Time ID, Location ID, Amount) linked to the Product dimension (Product ID, Product Code, Product Name, Category, Sub-Category), the Time dimension (Time ID, DayofMonth, Month, Quarter, Year) and the Location dimension (Location ID, Country, City, Store)]
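The star schema of Figure 16.8 could be declared with DDL along the following lines. This is a minimal sketch only: the table and column names simply mirror the figure (with Sub-Category written as SubCategory to give a legal identifier), and a production design would add further attributes and, typically, bitmap indexes on the fact table's foreign keys.

CREATE TABLE ProductDimension (
  ProductID   NUMBER PRIMARY KEY,    -- surrogate identity key
  ProductCode VARCHAR2(20),
  ProductName VARCHAR2(100),
  Category    VARCHAR2(50),
  SubCategory VARCHAR2(50));

CREATE TABLE TimeDimension (
  TimeID      NUMBER PRIMARY KEY,
  DayOfMonth  NUMBER(2),
  Month       NUMBER(2),
  Quarter     NUMBER(1),
  Year        NUMBER(4));

CREATE TABLE LocationDimension (
  LocationID  NUMBER PRIMARY KEY,
  Country     VARCHAR2(50),
  City        VARCHAR2(50),
  Store       VARCHAR2(50));

CREATE TABLE SalesFact (
  ProductID   NUMBER REFERENCES ProductDimension,
  TimeID      NUMBER REFERENCES TimeDimension,
  LocationID  NUMBER REFERENCES LocationDimension,
  Amount      NUMBER(10,2));         -- the additive measure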
select sum(SalesFact.Amount)
from TimeDimension, ProductDimension, SalesFact
where ProductDimension.ProductID = SalesFact.ProductID
and TimeDimension.TimeID = SalesFact.TimeID
and ProductDimension.Category = 'Beachwear'
and TimeDimension.Month = 3
and TimeDimension.Year = 2003;

select sum(SalesFact.Amount)
from TimeDimension, ProductDimension, SalesFact
where ProductDimension.ProductID = SalesFact.ProductID
and TimeDimension.TimeID = SalesFact.TimeID
and ProductDimension.SubCategory = 'Bermudas'
and TimeDimension.Month = 3
and TimeDimension.Year = 2003;
A snowflake schema breaks the hierarchies in the dimension tables into separate, more normalized structures. Each point of the star then takes on a more complex shape. This leads to more complex queries and slower response times. Figure 16.11 represents the
beginning of the snowflake process. The category hierarchy is being broken out of
the ProductDimension table. Sub-categories are represented by a recursive relation-
ship. Since the purpose of OLAP is to speed queries, snowflaking is usually not
advisable. It may be justified if the data is subject to update.
The fact table holds the measures, or facts. The measures are numeric and additive across some or all of the dimensions. The fact table is potentially very large. If sales data has been collected on a daily basis for ten years and there are 500 different products and 200 different locations, then the fact table could be holding 365,000,000 records (10 years × 365 days × 500 products × 200 locations).
One of the most important decisions in building a star schema is the granularity
of the fact table. The granularity, or frequency, of the data is usually determined by
the time dimension. The finer the granularity, the more records will be in the fact
table. Changing from storing daily to monthly receipts will reduce the size of the
fact table by a factor of 30, but it will also reduce the scope of the drill down
operations.
One of the major benefits of the star schema is that the low-level transactions are
summarized to the fact table grain. This greatly speeds the queries that are
performed as part of the decision support.
The calculation above results in an estimate for the maximum size of the fact
table. Experience shows that not every product sells on every day at every store.
Where there is missing data, adding in zero values would not add anything to the
useful aggregations for this OLAP system. The fact table is left sparsely populated.
A data warehouse must be loaded regularly so that it can serve its purpose. The data
must consist of timely data so that the queries and analysis reflect the most accu-
rate and appropriate information.
Data from one or more operational systems needs to be extracted and copied into
the warehouse. The process of extracting data from source systems and bringing it
into the data warehouse is commonly called ETL, which stands for extraction,
transformation and loading. ETL refers to a broad process, and not three well-
defined steps.
Data warehouse environments have an additional burden in that they have not
only to exchange but also to integrate, rearrange and consolidate data from many
systems, thereby providing a new unified information base for business intelligence.
[Figure 16.12 Levels of aggregation: (a) total goals per club per season; (b) Beauty United's goal scorers (Winston Plato, Edward Kant, Pierre Diocletian) in the 2003/04 season; (c) the matches in which Pierre Diocletian scored in 2003/04, with the number of goals scored in each]
A major requirement of any OLAP application is to provide users with the ability to
visualize data at different levels of aggregation. For example, in the UFL system it
may be necessary to view a total count of goals scored by each club over a particular
season, a count of goals scored by each of the club’s players within that season, and
details of the matches in which those goals were scored. An example of such aggre-
gation is shown in Figure 16.12.
Figure 16.12(a) shows the highest level of aggregation required: the total number
of goals scored by each club per season. Beauty United scored a total of 11 goals in
the 2003/04 season. Figure 16.12(b) shows Beauty United’s goal scorers in the
2003/04 season, together with the total number of goals each of them scored.
Figure 16.12(c) shows the matches in the 2003/04 season in which the Beauty
United player Pierre Diocletian scored, together with the number of goals he scored
in each of those matches. It is easy to see that of Pierre’s six goals in Beauty United’s
total of eleven in the 2003/04 season, two were scored in the match played on 4
January 2004. The base data upon which the aggregations of Figure 16.12 are built
are the individual tuples of the Goals table (Figure 16.13).
The select statement in Figure 16.14 returns the total count of goals scored,
summarized by club and season. It is important to check which club a player was
playing for when he scored a goal. If this were not done, all goals scored by that
player would mistakenly accrue to their current club!
Figure 16.15 shows the SQL statement that will yield the total count of goals
scored, summarized by player, club and season.
In order to show the player’s full name it is necessary to concatenate their fore-
name and surname in the group by clause as well as in the selection set.
To view the total count of goals scored, summarized by match, player, club and
season, the select statement shown in Figure 16.16 is required.
To display both the total number of goals per player per club per season, and the
total number of goals per club, per season, it is necessary to merge the results of the
first two select statements. This can be achieved through the union construct.
Figure 16.17 shows an example of how this mechanism enables different levels of
aggregation to be handled within a single SQL query.
Remember, in a union operation the structure of each result set has to be iden-
tical. Therefore, in the example shown in Figure 16.17 it is necessary to add a
‘NULL’ column in the club-level aggregation select in order to represent the
concatenated player name and thus ensure compatibility with the player-level
aggregation select.
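A sketch of the union approach is shown below. The table and column names (Goals carrying the scorer's ufa_id, the club credited and a match_id, and Matches carrying the season) are assumed from the UFL schema for illustration only; note the null placeholder that keeps the two selection sets union-compatible.

select m.season, g.club, p.forename||' '||p.surname as player, count(*) as goals
from goals g, players p, matches m
where g.ufa_id = p.ufa_id
and g.match_id = m.match_id
group by m.season, g.club, p.forename||' '||p.surname
union
select m.season, g.club, to_char(null), count(*)   -- null placeholder for the player column
from goals g, matches m
where g.match_id = m.match_id
group by m.season, g.club;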
Until SQL:1999, this rather complicated use of the union construct was the only
way in which to achieve multiple levels of aggregation within a single SQL query.
SQL:1999 introduced new aggregating clauses and operations, among which was
the rollup clause. This clause is an extension of the group by clause and is used to
produce varying levels of summary information. Figure 16.18 shows how the use of
the rollup clause greatly simplifies the previous statement, yet still manages to
produce equivalent levels of aggregation.
The rollup clause is an optional part of the group by syntax and takes a bracketed
list of grouping columns as a parameter. The grouping columns must appear in the
selection set. All non-aggregated columns in the selection set must appear some-
where in the group by section of a statement.
Its purpose is to cause the output of additional levels of aggregation. Figure 16.18
shows a statement where all the non-aggregated columns from the selection set
appear in the rollup clause. As well as the usual group by aggregated rows being
output, there is summary information provided for the total of goals per club per
season, the total of all goals per season, and the total of all goals across all seasons.
Each grouping column placed within the rollup clause will generate an additional
level of summary information. The statement in Figure 16.18 has three grouping
columns in the rollup clause and so produces three additional levels of aggregation
on top of the standard aggregation associated with the group by clause. Level three
summarizes total goals per club, per season. Level two summarizes total goals per
season. Level one shows the grand total of all goals for all clubs, across all seasons.
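In the spirit of Figure 16.18 (which is not reproduced here), a rollup version of the aggregation might be written as follows, again using the illustrative Goals, Players and Matches columns assumed above.

select m.season, g.club, p.forename||' '||p.surname as player, count(*) as goals
from goals g, players p, matches m
where g.ufa_id = p.ufa_id
and g.match_id = m.match_id
group by rollup (m.season, g.club, p.forename||' '||p.surname);
-- produces the ordinary group by rows plus totals per club per season,
-- totals per season and a grand total across all seasons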
There may be occasions when fewer levels of aggregation are needed. It may be
desirable to remove the level one grand total of goals across all seasons. To do this,
it is necessary to carry out two levels of aggregation, rather than the previous three,
in the rollup clause. This is achieved by removing the level one aggregation
column, matches.season, from the rollup clause, and placing it back in the standard
group by clause, as shown in Figure 16.19. The group by clause, with its rollup option,
still contains all non-aggregated columns.
This statement calculates total goals per season per club per player, total goals
per season per club and total goals per season, across all clubs. However, total goals
across all clubs and all seasons are no longer calculated.
Rollup provides varying levels of aggregation across a single dimension of multi-
dimensional data. In the previous example, totals were calculated for the number
of goals scored per team, and per team-player per season. What if the total number
of goals per player were also required? Remember, players can transfer between
clubs. The total numbers of goals that they score per season, and across their career,
will not necessarily match the totals they score for a particular club.
What is required is a way of aggregating the base data across all possible dimen-
sions. SQL:1999 provides the cube extension to the group by clause precisely for
this purpose. Figure 16.20 shows the SQL statement required to calculate total
goals across all permutations of seasons, clubs and players.
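A cube version of the query, in the spirit of Figure 16.20, might look like this (same illustrative column names as before):

select m.season, g.club, p.forename||' '||p.surname as player, count(*) as goals
from goals g, players p, matches m
where g.ufa_id = p.ufa_id
and g.match_id = m.match_id
group by cube (m.season, g.club, p.forename||' '||p.surname);
-- aggregates over every combination of season, club and player,
-- including totals per player across all clubs and all seasons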
Figure 16.21 presents the various aggregations resulting from the use of the cube
option with all dimensions of the three non-aggregated columns. Pierre Diocletian
appears seven times in this output.
Each entry shows a different analysis of his scoring record: across all seasons,
across all seasons by club, for a particular season, for a particular season by club.
He has scored 10 goals while playing in the Utopia League: four goals for Purity
FC and six goals for Beauty United. In the 2002/03 season he scored four goals, all
for Purity FC. In 2003/04 he scored six goals, all for Beauty United.
Like the rollup operation, the cube operation can also be partially implemented.
Taking a summary column out of the cube clause back into the standard group by
clause removes levels of summarization. This is shown in Figure 16.22.
The removal of season from the cube clause ensures that both players and clubs
are always aggregated in relation to a particular season. This results in the dropping
of aggregated rows showing goals scored per club across all seasons, and goals
scored per player across all seasons.
As well as the rollup and cube clauses, SQL:1999 has added a third optional
clause to the group by operation. Grouping sets allows for the output of selected
levels of aggregation only. The SQL statement shown in Figure 16.23 calculates the
total number of goals scored per season, per club, and per player, only. The detail of
such aggregated totals is automatically removed from the statement’s output.
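A grouping sets version, corresponding to the statement described for Figure 16.23, might be sketched as follows (illustrative column names as before); each output row belongs to exactly one of the three named aggregation levels.

select m.season, g.club, p.forename||' '||p.surname as player, count(*) as goals
from goals g, players p, matches m
where g.ufa_id = p.ufa_id
and g.match_id = m.match_id
group by grouping sets (m.season, g.club, p.forename||' '||p.surname);
-- returns totals per season, totals per club and totals per player only;
-- columns outside the current grouping set are returned as null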
Data mining tools use an automated approach to explore and bring to the surface
complex relationships in very large datasets. Data mining tools seek to identify
associations between data items, to establish correlation patterns and to make
predictions (Table 16.5).
There are two principal aims. A dataset may be queried to discover links in behav-
iour. Prediction about future behaviour may be deduced from past behaviour
(Table 16.6).
There are several kinds of exploratory analysis: association rules, sequential
patterns, classification, clustering, Bayesian networks for inferring causality,
sequence similarity and visualization. They are all based on well-established math-
ematical and statistical techniques and rely upon being applied to very large
datasets to give meaningful results (Figure 16.24).
The rule is tested in the database and produces two measures of how strongly the left-hand side determines the right-hand side (Figure 16.25).
LHS → RHS
Support: the percentage of transactions containing all the items in both the LHS and the RHS of the rule.
Confidence: the percentage of the transactions containing the LHS items that also contain all the RHS items.
Care must be taken when testing a rule. Association tests measure correlation, not causality. Even more caution should be applied to transitive rules (Figure 16.26).
Unless there is strong support and confidence for the first two rules, they may be
dependent on non-overlapping transaction sets. A special promotion on motor
insurance would not lead to increased engine oil sales.
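Support and confidence can be computed directly in SQL. The sketch below assumes a hypothetical Purchases table with one row per transaction and item; it evaluates the rule {motor insurance} → {engine oil}.

select
  both_items.cnt / all_tx.cnt   as support,
  both_items.cnt / lhs_only.cnt as confidence
from
  (select count(distinct tx_id) as cnt from purchases) all_tx,
  (select count(distinct tx_id) as cnt
   from purchases
   where item = 'MOTOR INSURANCE') lhs_only,
  (select count(distinct p1.tx_id) as cnt
   from purchases p1, purchases p2
   where p1.tx_id = p2.tx_id
   and p1.item = 'MOTOR INSURANCE'
   and p2.item = 'ENGINE OIL') both_items;
-- support: proportion of all transactions containing both items
-- confidence: proportion of insurance-buying transactions that also bought oil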
Classification and regression are perhaps the most popular techniques in data
mining at the moment. A predictive model is generated, based on historical data.
The model then predicts new cases. The development of the model is known as
data training. The data used for data training must therefore include a large number
of cases where conditions and outcomes are known (Table 16.7).
There are several techniques used in predictive operations. The most notable are
decision trees, Bayesian methods and neural networks. They all build their model
from the training data. These techniques are used for everything from call centre
operators deciding on your telephone loan application to currency speculators
coming to a judgement on the future value of the euro.
The preceding sections serve to indicate the type of question that might be answered through OLAP, the kinds of systems architecture that might support those questions and the extensions to SQL that enable the queries. There is,
however, neither a precise definition nor a specification of a minimum OLAP envi-
ronment nor any universally accepted function or performance benchmark for
comparing vendor products. OLAP definitions are anything but exact.
Earlier chapters have indicated that accepted ideas in computing are either
founded on mathematical principles, like the relational model and the derived
language SQL, or they are defined through an authoritative standard such as the
Codasyl database or the many extensions to SQL included in SQL:1999.
Where neither of these conditions exists, the definition is largely in the hands of
the marketing departments of the major vendors. Naturally, such definitions vary
from vendor to vendor and are often skewed to favour a particular product. OLAP
definitions should be approached with some caution and scepticism.
E. F. Codd published a white paper (Codd and Associates, 1993) which is often
quoted as a definition of OLAP and a source of critical features for judging OLAP
products. Codd’s OLAP rules proved to be controversial due to being vendor-spon-
sored rather than mathematically based.
The OLAP white paper included 12 rules or tests for OLAP compliance. They were
followed by another six rules in 1995 and the rules were restructured into four
groups or features (Table 16.8).
The basic features start with the Multi-dimensional Conceptual View, which is,
perhaps, the most accepted criterion and involves modelling the data, not as two
dimensions as in the relational model, but as any number of dimensions as
required by the application. This leads to non-normalized data as it clusters associ-
ated data together.
Intuitive Data Manipulation would enable direct actions on cells in the view,
without recourse to menus or multiple actions. Codd gives no real rationale for this, and the feature adds little value to the evaluation process. Perhaps products
should give users the choice of how they approach the data.
In proposing Accessibility as a basic feature, the OLAP engine takes on the role
of middleware, sitting between heterogeneous data sources and the user front-end.
Most products can achieve this, but often with more data staging and batching
than vendors like to admit.
The Batch extraction feature requires that products offer their own staging data-
base for OLAP data as well as offering live access to external data. In effect, this is an
endorsement of multi-dimensional data staging plus partial pre-calculation of
large multi-dimensional databases and transparent access to underlying detail.
Today, this would be regarded as the definition of a hybrid OLAP, which is
becoming a popular architecture.
Required support for OLAP analysis models specifies that OLAP products should
support all four analysis models that were described in the white paper. These
models may be characterized as:
• parameterized static reporting
• slicing and dicing with drill down
• ‘what if?’ analysis
• goal seeking
Treatment of missing values specifies that all missing values are to be ignored
by the OLAP analyzer regardless of their source. This relates to the previous test and
is probably an almost inevitable consequence of how multi-dimensional engines
treat all data.
The reporting features, Codd’s third group, starts with a requirement for Flexible
reporting, ensuring that the dimensions can be laid out in any way that the user
requires in reports. Most products are capable of this in their formal report writers,
although this is not always the case for interactive viewers.
Uniform reporting performance requires no significant response degradation resulting from increases in the number of dimensions or the size of the database. Nowhere is it mentioned that the performance must be fast, merely that it be
consistent. Experience suggests that merely increasing the number of dimensions
or database size does not affect performance significantly in fully pre-calculated
databases. However, ad hoc reports usually take longer. Performance in good prod-
ucts is almost linearly dependent on the number of cells used to produce the
report. In fact, the principal factor that affects performance is the degree to which
the calculations are performed in advance and where live calculations are done.
Automatic adjustment of physical level requires that the OLAP system adjust
its physical schema automatically to adapt to the type of model, data volumes and
sparsity. Most vendors fall far short of this noble ideal.
The final group of features are concerned with dimension control. The Generic
dimensionality rule takes the purist view that each dimension must be equivalent
in both its structure and operational capabilities. The basic data structures,
formulae or reporting formats should not be biased towards any one dimension.
This has proven to be one of the most controversial of all the original 12 rules.
Technology-focused products largely tend to comply with it, so the vendors of
such products support it. Application-focused products usually make no effort to
comply, and their vendors bitterly attack the rule.
Complying with the Unlimited dimensions and aggregation levels rule is tech-
nically impossible. There is no such thing as an unlimited entity on a limited
computer. Few applications need more than about eight or ten dimensions and few
hierarchies have more than about six consolidation levels.
Unrestricted cross-dimensional operations requires that all forms of calcula-
tion must be allowed across all dimensions. Products that use relational storage are
often weak in this area. Other products with a true multi-dimensional database are
strong. These types of calculation are particularly relevant in applications that
analyze profitability.
An alternative, vendor-independent definition of OLAP has also been proposed. The definition is short and easy to remember: Fast Analysis of Shared Multi-dimensional Information (FASMI).
Fast means that the system is targeted to deliver most responses to users within
about five seconds, with the simplest analyses taking no more than one second and
very few taking more than 20 seconds. Research has shown that end-users are apt to
assume that a process has failed if results are not received within 30 seconds. Vendors
resort to a wide variety of techniques to achieve this goal, including specialized
forms of data storage, extensive pre-calculations and specific hardware
requirements.
Analysis means that the system can cope with any business logic and statistical
analysis that is relevant for the application and the user. This analysis may be done
with the vendor’s own tools or in a linked external product such as a spreadsheet.
Simply, all the required analysis functionality must be provided in an intuitive
manner for the target users. This could include specific features like time series
analysis, cost allocations, currency translation, goal seeking, ad hoc multi-dimen-
sional structural changes, non-procedural modelling, exception alerting, data
mining and other application-dependent features.
Shared means that the system implements all the security requirements for
confidentiality (possibly down to cell level) and, if multiple write access is needed,
concurrent update locking at an appropriate level. Not all applications need users
to write data back, but for the growing numbers that do, the system should be able
to handle multiple updates in a timely, secure manner. This is a major area of weak-
ness in many OLAP products, which tend to assume that all OLAP applications will
be read-only.
Multi-dimensional is a key requirement. The system must provide a multi-
dimensional conceptual view of the data, including full support for hierarchies and
multiple hierarchies, as this is certainly the most logical way to analyze businesses
and organizations. There is no specific minimum number of dimensions that must
be handled. The underlying database technology is not specified, providing that
the user gets a truly multi-dimensional conceptual view.
Information is all of the data and derived information needed, wherever it is and
however much is relevant for the application. The capacity of products is measured
in terms of how much input data they can handle, not how many gigabytes they
take to store it. The capacities of the products differ greatly – the largest OLAP prod-
ucts can hold at least a thousand times as much data as the smallest.
16.7 EXERCISES
(i) Use the UFL database to create further examples of Oracle’s OLAP aggrega-
tions. Illustrate the underlying mechanisms through Explain Plan.
(ii) Search the Internet for reliable sources to survey current OLAP products and
compare their facilities.
CHAPTER 17
DATABASE PERFORMANCE
“Dear Mr. Churchill, my new play is opening in London and I am enclosing two tickets to the
opening night performance. One ticket is for you, and one ticket is for a friend, if you have
one.”
George Bernard Shaw
“Dear Mr. Shaw, I am very pleased that you sent me the two tickets to the opening night
performance of your new play in London, but unfortunately I have another commitment that
night and I won’t be able to be there, but I would be most pleased if you could send me two
tickets to the second night’s performance, if there is one.”
Winston Churchill
Response time is the sum of service time and wait time. The method of processing jobs has a profound effect on response time. If jobs are queued for sequential processing, then the wait time for each job is the sum of the service times of all the preceding jobs (Figure 17.1).
[Figure 17.1 Sequential processing of ten jobs: each job's wait time and service time plotted against elapsed time]
If more resources are allocated, near parallel concurrent processing of jobs can be
achieved. Each independent task executes immediately using its own resources; no
wait time is involved (Figure 17.2).
[Figure 17.2 Concurrent processing of ten jobs: each job incurs service time only, with no wait time]
If, however, demand outstrips the available resources, the database dispatcher will spend more and more time arbitrating queues, which will add to wait time (Figure 17.3).
[Figure 17.3 Wait time plotted against contention]
Resources such as CPUs, memory, I/O capacity and network bandwidth are the key
to reducing service time. Adding resources makes higher throughput possible and
facilitates swifter response time (Table 17.1).
Capacity is not just about physical hardware components; the efficiency of the
O/S scheduler in maintaining job queues and the database dispatcher in balancing
thread contention can be an important influence. The ability to tune the sched-
uling algorithm or prioritize database jobs over background housekeeping can
improve performance without actually adding hardware (Figure 17.4).
[Figure 17.4 Response time plotted against demand]
Excessive demand gives rise to greatly increased response time and reduced
throughput. Linear increases in demand do not produce proportionate decreases in
performance. Often the relationship is exponential because more and more
resources are taken off the actual processing tasks to manage incoming jobs (Figure
17.4).
Systems administrators can directly affect demand, consumption and capacity.
These are the principal factors in any system of resource utilization. Wait time is
simply evidence of a system where these factors are poorly balanced (Table 17.2).
Decisions made during application development have the greatest effect on perfor-
mance. Once the application is deployed, the database administrator usually has
the primary responsibility for performance, but has limited capacity to change
fundamental factors.
When designing a system, performance targets should be set. If the application
does not meet that specification, the bottleneck should be identified (for example,
I/O contention), the cause determined, and corrective action taken. During devel-
opment, the application should be tested to determine whether it meets the design
performance goals before it is deployed.
Tuning is usually a series of trade-offs. Once bottlenecks have been identified,
other system resources may have to be sacrificed to achieve the desired results. For
example, if I/O is a problem, more memory or more disks may need to be
purchased. If a purchase is not possible, the concurrency of the system may have to
be limited, by rescheduling when certain user jobs may be submitted, to achieve
the desired performance.
With clearly defined performance goals, the decision on what resource to relin-
quish in exchange for improved performance is simpler.
At no time should achieving performance goals override the ability to
recover data. Performance is important, but data recovery and consistency are
critical.
With clearly defined performance goals, success depends on the functional objec-
tives established with the user community, the ability to measure objectively
whether the criteria are being met, and the capability to take corrective action to
overcome exceptions.
DBAs responsible for solving performance problems must remember all factors
that together affect response time. Sometimes what initially seems like the most
obvious source of a problem is actually not the problem at all.
Users might conclude that there is a problem with the database, whereas the
actual problem is with the network. A DBA must monitor the network, disk, CPU
and so on to identify the actual source of the problem rather than simply assume
that all performance problems stem from the database.
Figure 17.5 shows a series of tuning steps to be undertaken during the development
of an application. The steps are prioritized in order of diminishing returns: steps
with the greatest effect on performance appear first. For optimal results, therefore,
tuning issues should be resolved in the order listed: from the design and develop-
ment phases through instance tuning.
Tuning is an iterative process. Performance gains made in later steps may pave
the way for further improvements in earlier steps, so additional passes through the
tuning process may be useful.
For optimal performance, business rules may have to be adapted. These concern
the high-level analysis and design of an entire system. Configuration issues are also
considered at this level, such as whether to use a single multi-threaded server
system-wide or a cluster of parallel servers.
Performance is directly affected by systems design and implementation, but also
by inappropriate business rules.
Designers sometimes provide far greater detail than is needed when they write
business functions for an application. They document an implementation, rather
than simply the function that must be performed. If managers effectively distil
business functions or requirements from the implementation, then designers have
more freedom when selecting an appropriate implementation.
Consider, for example, the business function of cheque printing. The actual
requirement is to pay money to people; the requirement is not necessarily to print
pieces of paper. It would be very difficult to print a million cheques per day, but it
would be relatively easy to record that many direct deposit payments and send
them over a network to the bank for processing.
Business rules should be consistent with realistic expectations for the number of
concurrent users, the transaction response time, and the projected number of
records stored online.
For example: a bank employs one loan advisor and one supervisor. There is a
business rule that the supervisor must approve loans of over £250. Upon investiga-
tion, it is found that there is a long queue of customers. More loan advisors are
employed, but the bottleneck moves to the supervisor’s function. There is now a
queue of loan advisors waiting for the supervisor’s approval as well as a queue of
customers waiting for the advisors.
A change to the business rule makes the system more scalable. If the supervisor
only needs to approve loans exceeding £1000, by changing the scale of the rule
more loans can now be dealt with directly by the advisors. Greater concurrency is
achieved. Only with the rule change does employing more advisors become
effective.
In the data design phase, the data needs of the application must be determined.
Identifying relations and their attributes is important. The information must be
structured to ensure database consistency, but, where possible, also to meet perfor-
mance goals.
The database design process undergoes a normalization stage when data is
analyzed to eliminate data redundancy. With the exception of primary keys, any
one data element should be stored only once in the database. After the data is
normalized, however, there may be a need to denormalize it for performance
reasons. It might be decided that the database should store frequently used
summary values. Rather than forcing an application to recalculate the total price of
all the lines in a given order each time it is accessed, it might be more efficient to
store the total value for each order in the database.
Another data design consideration is avoiding data contention. Consider a data-
base 1 TByte in size on which one thousand users access only 0.5% of the data. This
‘hot spot’ in the data could cause performance problems.
This primarily concerns index design, to ensure that the data is neither over- nor
under-indexed. In the data design stage the primary and foreign key indexes are
determined. In the logical structure design additional indexes may be created to
support the application.
Performance problems due to contention often involve inserts into the same
block or inappropriate use of sequence numbers. Particular care should be exer-
cised in the design, use and location of indexes, as well as in using the sequence
generator and clusters.
Be certain the application is taking full advantage of the SQL language and the
database features designed to enhance application processing. Understanding how
the DBMS is executing SQL statements in the application can suggest improve-
ments to those statements.
Enable query-optimizing modes consistent with performance needs. Regular
collection of full table statistics ensures that the optimizer makes informed deci-
sions but imposes maintenance overheads.
PL/SQL server-side programming with pre-compiled packages may be more effi-
cient than code kept in the application.
Transaction isolation modes, lock management and transaction options have
features and techniques that can promote or hinder throughput.
Ensure that there is efficient data access. For example, look at the structure of
indexes in the light of processing needs. If an application has a low hit rate with
only one row from a large table taking part in each transaction, then a hash struc-
ture may be appropriate. High hit-rate processing, such as preparing a sorted report
from the whole table, would be better served by a B*-tree structure.
Ensuring efficient access may mean adding indexes or adding indexes for a
particular routine and then dropping them. It may also mean reanalyzing the
design after the database has been built.
If the query planner is consistently building a temporary index on a non-key
attribute, it may be advantageous to make this index permanent. This is particu-
larly so if the underlying table data is relatively static. If the table is subject to
regular updates or insertions, balance the costs of maintaining the extra index
against its advantages in query processing.
Disk I/O tends to reduce the performance of many applications. Most database
servers are designed so their performance need not be unduly limited by I/O (Table
17.3).
Never begin tuning without having first established clear objectives. You cannot
succeed without a definition of ‘success’.
‘Just make it go as fast as you can’ may sound like an objective, but it is very diffi-
cult to determine whether this has been achieved. It is even more difficult to tell
whether your results have met the underlying business requirements. A more
useful statement of objectives is: ‘We need to have as many as 20 operators, each entering 20 orders per hour, and the packing lists must be produced 30 minutes before the end of each shift’.
Keep the goals in mind as each tuning measure is considered. Estimate its perfor-
mance benefits in light of your goals.
Also remember that the goals may conflict. For example, to achieve best perfor-
mance for a specific SQL statement, it may be necessary to sacrifice the perfor-
mance of other SQL statements running concurrently on the database.
A trial should be set up against a representative proportion of the data. The test case should be measured with the trace facility and with timing features.
If the trial tests a variety of changes, such as SQL rewrites, index restructuring or
memory reallocations, then be sure to record the effect of each change by reverting
to the original scenario and applying one change at a time. Then test the changes
in combination. Finally, the trial should be checked for scalability by running
against increasing proportions of the full data.
Keep records of the effect of each change by incorporating record keeping into the
test script. Automated testing with scripts provides a number of advantages.
Cost effectiveness is improved in terms of the ability to conduct multiple trials
quickly. It helps ensure that tests are conducted in the same systematic way, using
the same instrumentation for each hypothesis being tested. Carefully check test
results derived from observations of system performance against the objective data
before accepting them.
One of the great advantages of having targets for tuning is that it becomes possible
to define success. Past a certain point, it is no longer cost-effective to continue
tuning a system. Although there may be confidence that performance targets have
been met, this must nonetheless be demonstrated to two communities: the users
affected by the problem and those responsible for the application’s success.
Oracle provides a number of diagnostic tools, which can be used to track database
operations both during the design test phases and into production.
Explain plan is a SQL statement listing the access plan selected by the query
optimizer. The plan is output via a PL/SQL package and shows an overview of the
chosen optimized strategy for executing a submitted SQL statement. When explain
plan is used, the statement does not proceed to execution.
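A typical explain plan session might look like the sketch below. It assumes that the standard PLAN_TABLE has been created (Oracle supplies the utlxplan.sql script for this) and uses the DBMS_XPLAN package to format the stored plan; the customers and orders column names are illustrative.

explain plan for
select *
from customers c, orders o
where c.customer_id = o.customer_id
and c.customer_id = 42;

-- format and display the plan that was just stored in PLAN_TABLE
select * from table(dbms_xplan.display);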
SQL Trace records SQL statements issued by a connected process and the
resources used in executing these statements. It collects all SQL events and Wait
events. SQL events include a complete breakdown of SQL statement activity, such
as the parse, execute and fetch operations. Data collected for server events include
resource usage metrics such as I/O and CPU consumed by a specific event.
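Tracing is usually enabled only for the session under investigation, and the raw trace file is then summarized with the tkprof utility. A minimal sketch follows (the trace file name is illustrative).

-- in the session under investigation
alter session set timed_statistics = true;
alter session set sql_trace = true;

-- run the statements to be measured, then switch tracing off
alter session set sql_trace = false;

-- at the operating system prompt, format the raw trace file
-- tkprof ora_01234.trc trace_report.txt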
Used together, a specific SQL statement can be analyzed by explain plan and
Oracle Trace. First, the plan reveals the major database operations to be undertaken
during the execution phase. These include full table scans, sorting, merge and hash
joins. Restriction predicates are linked to the operations as appropriate. If the
participating tables have had full histogram statistics collected, then costing in
terms of disk I/O and CPU usage can be estimated.
Secondly, the statement can be run with Oracle Trace enabled. Measurements of
actual resource utilization during each phase of execution can be taken and
compared with the plan estimates. SQL statements should be selected for tuning
based on their actual resource consumption. Revised statements or revised strate-
gies such as using pre-parsed queries and materialized views (denormalization) can
then be objectively compared on their new plans and actual resource costs.
Oracle Enterprise Manager provides a common user interface from which administrative tasks and
diagnostic applications can be run. Information about the configuration and other
management data is itself kept in the database.
Enterprise Manager has access to several diagnostic, testing and advisory packs
for investigating various aspects of the system. Recommended changes to the
configuration can then be put into effect through its administrative tools (Table
17.4).
Most benchmark data should be viewed with healthy scepticism when the purpose
is product selection. In any event, performance is only one of many factors influ-
encing an evaluation. Considerations such as the availability of trained DBAs, the
vendor’s technical support and total cost of ownership are also important determi-
nants.
However, a benchmark constitutes a comprehensive and repeatable test suite
that can be applied against a developing database implementation. Adapting
benchmark methods to test the implementation as it is tuned can give a reliable
estimation of the relative benefits of each change in hardware, operating system
and database configuration.
A frequent criticism of benchmarking is that vendors tune their product to
perform well in the test rather than produce generally reproducible performance
enhancements.
Pre-loading the data into buffers and storing the SQL by pre-parsing and pre-
computing the execution plans are two popular methods for saving disk I/O and
CPU time. Database storage itself can also be optimized for the test by pre-joining
selection domains. Oracle has materialized views that can store the results of a
multiple table join, allowing fast data access for the benchmark queries but
imposing an overhead on normal table maintenance outside of the test.
Perhaps the biggest issue is the different hardware platforms on which databases
run. Comparing two different products running on two different platforms clearly
may not permit individual performance factors to be identified. The experiment may
lose its objectivity when database and host are a closely tuned and optimized pair.
Benchmark data is cited by virtually every database vendor to claim that its
product is the fastest, has the lowest response time or has the greatest throughput.
To avoid vendors choosing a test skewed towards their product’s characteristics,
the Transaction Processing Performance Council (TPC)1 was created to oversee
uniform benchmark tests.
TPC-H and TPC-R are decision support benchmarks. They consist of a suite of busi-
ness-oriented queries and concurrent data updates to a range of standard sized
datasets. These benchmarks provide a simulation of decision support systems that
are applied against large volumes of data and execute queries with a high degree of
complexity.
TPC-H applies ad hoc queries for which the test platform may not be pre-opti-
mized. TPC-R allows additional optimizations based on advance knowledge of the
queries.
Both benchmarks report a measurement called the Composite Query-per-Hour
Performance Metric that reflects multiple aspects of the test system’s capability to
process queries. These aspects include the selected database size against which the
queries are executed, the query processing power when queries are submitted in a
single stream, and the query throughput when queries are submitted by multiple
concurrent users.
In order to illustrate the use of explain plan and Oracle Trace, a simple schema
consisting of three tables has been devised. The schema represents an order
processing system and has tables for customers, orders and lineitems. A customer
may make many orders; each order consists of one or more lineitems.
An SQL batch file and a PL/SQL package, makeCustomers, are available on the web
site associated with this book. The batch file creates the tables and the package
populates the tables with random but consistent data. It allocates primary keys
from sequences and ensures foreign key compliance (Figure 17.6). Running the
packages creates 50,000 customers, 200,000 orders and 2,000,000 lineitems.
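A hedged sketch of the sort of definitions the batch file contains is shown below; only the cust_id and order_no key columns are taken from the plans that follow, and the remaining column names, datatypes and sizes are illustrative assumptions.

create table customers (
  cust_id    number primary key,
  name       varchar2(40)
);

create table orders (
  order_no   number primary key,
  cust_id    number references customers,
  order_date date
);

create table lineitems (
  order_no   number references orders,
  item_no    number,
  item_cost  number(8,2)
);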
Using large tables makes the results more statistically significant. A single query
is needed to illustrate how explain plan can help in tuning operations. Figure 17.7
shows the query. It is a three-table join where each of the tables is quite large.
Further, the group by clause will cause sort operations. This query might be used to
create a monthly statement for customers showing their pending or unpaid orders.
Each customer might be working on a different 30-day account period, so this
query would be run at least once every working day. For the sake of this example,
the query is restricted to just one customer.
The query is analyzed using the explain plan syntax and then formatted output is
obtained from plan_table using the Oracle-supplied utlxpls.sql batch file. This
procedure is repeated as schema changes are made, so that their effect on the
performance of an unchanging query can be observed.
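Since the query itself appears only in Figure 17.7, the following is a hedged reconstruction: the join and restriction predicates are taken from the plans shown below, while the selected columns and the item_cost value column are illustrative assumptions.

explain plan for
select a.cust_id, b.order_no, sum(c.item_cost)
from customers a, orders b, lineitems c
where a.cust_id = b.cust_id
and b.order_no = c.order_no
and a.cust_id = 20
group by a.cust_id, b.order_no;

@C:\oracle\ora92\rdbms\admin\utlxpls.sql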
The first plan, shown in Figure 17.8, indicates a very high cost for the query. The
total is 4491 units. By examining the plan in detail, the root causes can be identi-
fied as the full table scans on lineitems and orders in operations 5 and 6. This leads to
high-cost joins, sorts and projects in operations 0, 1 and 2 because the intermediate
relations are unrestricted. Further, the size of the buffer required to perform the full
table access of lineitems is 51 MByte. This will cause considerable buffer contention
with other transactions in a multi-user production database.
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 840 | 1092|
| 1 | SORT GROUP BY | | 20 | 840 | 1092|
|* 2 | HASH JOIN | | 20 | 840 | 1090|
| 3 | NESTED LOOPS | | 5 | 75 | 159|
|* 4 | INDEX UNIQUE SCAN| SYS_C003748 | 1 | 5 | 1|
|* 5 | TABLE ACCESS FULL| ORDERS | 1 | 10 | 158|
| 6 | TABLE ACCESS FULL | LINEITEMS | 2000K| 51M | 899|
--------------------------------------------------------------------
2 - access("B"."ORDER_NO"="C"."ORDER_NO")
4 - access("A"."CUST_ID"=20)
5 - filter("B"."CUST_ID"=20)
When tuning a query, it is important to make only one change at a time. When-
ever full table access appears in the plan prior to that table participating in a join,
consider adding an index to the base table.
This is not an index based on a primary key to ensure relational integrity.
Lineitems does not have a primary key. The cost of operation 6, the full table access
of lineitems, can be reduced by indexing on the order_no attribute. This index will
have duplicate keys since there are several line items for each order. This is an
index created for performance reasons.
Figure 17.9 shows the SQL command to create this index. The index key is the
order_no attribute of the table lineitems. The index name is li_ndx. Once the index is
created, the database will automatically maintain it whenever rows are changed in
the base table. This will add a small cost to those changes, but the improvement in
the test query should compensate for this cost.
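Based on that description, the command amounts to the following (Figure 17.9 holds the actual statement):

create index li_ndx on lineitems (order_no);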
---------------------------------------------------------------------
| Id | Operation | Name | Rows|Bytes| Cost |
---------------------------------------------------------------------
| 0|SELECT STATEMENT | | 20 | 840 | 171 |
| 1| SORT GROUP BY | | 20 | 840 | 171 |
| 2| TABLE ACCESS BY INDEX ROWID | LINEITEMS |2000K| 51M| 2 |
| 3| NESTED LOOPS | | 20 | 840 | 169 |
| 4| NESTED LOOPS | | 5 | 75 | 159 |
|* 5| INDEX UNIQUE SCAN | SYS_C003748| 1 | 5 | 1 |
|* 6| TABLE ACCESS FULL | ORDERS | 5 | 50 | 158 |
|* 7| INDEX RANGE SCAN | LI_NDX | 4 | | 1 |
---------------------------------------------------------------------
5 - access("A"."CUST_ID"=20)
6 - filter("B"."CUST_ID"=20)
7 - access("B"."ORDER_NO"="C"."ORDER_NO")
Once the index has been created the explain plan command is re-run for the test
query and the utlxpls utility used to format the plan output. Figure 17.9 shows a
dramatic reduction in cost. Just over 81% of the costs of the first test have been
wiped out by adding the index. There is an additional operation in this plan, the
index range scan of li_ndx.
Because statistics have been collected for the underlying table, the optimizer can
estimate that only a few rows will need to be scanned. Operation 7 has a corre-
sponding low cost. This operation retrieves order_no and rowid from the index,
allowing a reduced-cost join with the orders table later.
This plan still has a full table access on the orders table that can be made more
efficient by adding an index on cust_id. This index will allow the optimizer to make
a better estimate of the number of rows from orders that will participate in the join
with customers and enable a range index scan of orders to produce a much smaller
intermediate relation for that join.
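Again, the index name and key column can be read from the plan that follows, so the command is essentially:

create index ord_ndx on orders (cust_id);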
Adding the index and re-running the explain plan command and the utlxpls
utility results in the output shown in Figure 17.10. Again, the cost reduction in the
estimated plan is spectacular. The second index has resulted in a cumulative saving
of almost 99% of the same query optimized under the original conditions.
---------------------------------------------------------------------
| Id| Operation | Name | Rows|Bytes|Cost|
---------------------------------------------------------------------
| 0|SELECT STATEMENT | | 20 | 840| 15|
| 1| SORT GROUP BY | | 20 | 840| 15|
| 2| TABLE ACCESS BY INDEX ROWID |LINEITEMS |2000K| 51M| 2|
| 3| NESTED LOOPS | | 20 | 840| 13|
| 4| NESTED LOOPS | | 5 | 75| 3|
|* 5| INDEX UNIQUE SCAN |SYS_C003748| 1 | 5| 1|
| 6| TABLE ACCESS BY INDEX ROWID |ORDERS | 5 | 50| 2|
|* 7| INDEX RANGE SCAN |ORD_NDX | 5 | | 1|
|* 8| INDEX RANGE SCAN |LI_NDX | 4 | | 1|
---------------------------------------------------------------------
5 - access("A"."CUST_ID"=20)
7 - access("B"."CUST_ID"=20)
8 - access("B"."ORDER_NO"="C"."ORDER_NO")
Notice that operation 7, the index range scan of ord_ndx, has been restricted with
the same predicate as the customers table. This has been automatically added by
the optimizer to ensure the minimum size possible for the intermediate relations
that will participate in the later join.
Remember, these results are estimates from the optimizer based on statistics.
Only by running the query, adding the indexes one by one and monitoring actual
resource utilization using the trace facility can the actual savings be proven.
Figure 17.11 shows a simple experiment with SQL Trace. First, the two indexes,
li_ndx and ord_ndx are dropped and then the trace facility is enabled. The query is
then executed three times (Table 17.5), building an index after each run. Disabling
the trace facility stops the recording and closes the trace output file. The trace data
is recorded in a file in a sub-folder of the instance folder. This default can be
changed by a DBA respecifying the value of USER_DUMP_DEST.
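A minimal sketch of the session-level commands involved is given below; Figure 17.11 holds the actual script, which may also enable timed statistics so that CPU times are recorded.

alter session set timed_statistics = true;
alter session set sql_trace = true;
-- run the test query, create li_ndx, run it again, create ord_ndx, run it a third time
alter session set sql_trace = false;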
The data is prepared for presentation by the separate executable tkprof. This is run
from an OS prompt and takes three parameters: the input file containing the trace
data, the sorted and formatted output file and an option specifying no aggregation
of data for multiple runs of the same SQL statement.
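Assuming a trace file with the usual <sid>_ora_<pid>.trc name, the invocation looks something like this (both file names are illustrative):

tkprof scotoracle_ora_1234.trc tuning_report.txt aggregate=no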
The output file (Figure 17.12) can be viewed with any editor or word processor.
Each SQL statement has a set of data showing resource utilization during its three
phases: parse, execute and fetch. Each of the columns in the tkprof output repre-
sents a different resource used by the SQL statement. Count shows the number of
times it was parsed, executed or fetched. Notice that after the statement is first
issued, very little resource is used for the parse or execute stages.
The count column shows that the full parse and full execute for the statement were
performed only once. Only the fetch phase is performed three times. The database
retains the statement in buffers and the syntactic, semantic and security checks are
not repeated for multiple runs of the same statement. There is minimal CPU
activity associated with the revisions to the execution plan caused by adding the
indexes.
This experiment shows the differences in the disk I/O as indexes are added
between runs. It confirms the optimizer’s estimates and shows how much benefit
can be gained by tuning the database structure.
For a fully objective test of all resource utilization, it would be necessary to log
out and close the instance between runs so that buffered data did not moderate the
results for the second and third repeat queries.
To make similar further efficiency gains in the query it will be necessary to
denormalize. Creating a materialized view of the query introduces a new perma-
nent structure in the database (Figure 17.13).
This will consist of the three tables pre-joined for all customers. It can be config-
ured so that changes in the underlying tables such as insertions or deletions are
automatically reflected in the view. The materialized view can be analyzed to
compute statistics in preparation for query optimization. The original query is now
reduced to a simple restriction on the view with a resource utilization of only
0.42% of that under the original conditions.
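A hedged sketch of such a view follows. The cust column and the mv_ndx index are taken from the plan below; the aggregated order_value column and the refresh options are assumptions, and the actual definition is in Figure 17.13.

create materialized view mv_custorderlines
build immediate
refresh complete on demand
as
select a.cust_id as cust, b.order_no, sum(c.item_cost) as order_value
from customers a, orders b, lineitems c
where a.cust_id = b.cust_id
and b.order_no = c.order_no
group by a.cust_id, b.order_no;

create index mv_ndx on mv_custorderlines (cust);
analyze table mv_custorderlines compute statistics;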
---------------------------------------------------------------------
| Id |Operation | Name |Rows|Bytes|Cost|
---------------------------------------------------------------------
| 0|SELECT STATEMENT | | 20| 640 | 8 |
| 1|TABLE ACCESS BY INDEX ROWID |MV_CUSTORDERLINES| 20| 640 | 8 |
|* 2| INDEX RANGE SCAN | MV_NDX | 20| | 3 |
---------------------------------------------------------------------
2 - access("MV_CUSTORDERLINES"."CUST"=20)
The penalty for this improvement is the long query to produce the view and the
daily re-establishment of the view to take in any changes to the data. Since this can
be scheduled as an overnight job, there will be no real effect on daytime perfor-
mance. The improvement has been won at very little cost. If the materialized view
must be kept in tighter synchronization then a trigger fired by updates to any of the
base tables could be created (Figure 17.14).
17.8 EXERCISES
CHAPTER 18
WEB INTERFACES
There are three major factors that preclude the use of standalone applications in
this context. Firstly, the user interface must be made available through web
browsers that do not offer anything more than the ability to interpret code
received from a URL. Secondly, there is no control over the run-time configuration
of the client computer. Thirdly, the client computer may have any combination of
different CPUs and operating systems. All that is common is that the client is
running a web browser.
This means that everything necessary for the successful running of an application
must be delivered to the browser from the web server. This package must contain
everything needed to provide a user interface, give access to the business logic
of the system and establish a connection to the database server.
There are several ways to achieve this. They involve different system architectures,
which determine where the components of the overall application reside, and a
choice of development languages.
A detailed treatment of these development choices is beyond the scope of this
book. However, a brief review of some of the principal methods for deploying
context-free browser-to-database applications is presented. The source code and
JBuilder project files for the case studies in this chapter may be downloaded from
the web site associated with this book.
The simplest way for the browser to display a user interface and to connect it to a
database server is the two-tier architecture. The example application in Chapter
14 uses a two-tier architecture. The client program provides both the user interface
and the business logic and connects directly to the database server. Portability
issues concerning the diversity of browsers and client machines mean that Java is
overwhelmingly the language of choice for a web-based two-tier application.
Nearly all web browsers have a Java Virtual Machine (JVM) that gives an environ-
ment for the applet classes to run. Normally this JVM has access to the standard
classes that support fundamental user interface components such as those found in
the java.awt package. If the applet uses any other support classes, these must be
downloaded from the server (Figure 18.1). This would apply, for example, to the
JDBC classes for a type-4 driver to give connection facilities to an Oracle database.
The code for the application in Chapter 14 would be capable of adaptation to an
applet. It should be borne in mind, however, that it uses a number of GUI support
classes that are specific to the JBuilder environment. The deployment configura-
tion would need to take this into account. The necessary classes must be down-
loaded from the web server along with the applet code. This would place a large
Internet traffic overhead on its operation.
There are also a small number of minor changes that would need to be made.
These are related to the way that downloaded Java code runs in a browser’s JVM.
Firstly, a browser cannot run code that is designed to run as a standalone applica-
tion class. It can only run a Java applet. An applet differs from an application in
that its principal class extends the Applet class and it does not have a main method.
When a Java application class is loaded into the JVM, control is passed to the main
method. This usually has only one line of code: a call to the init method that in
turn calls the constructor of the class.
A Java applet has only the init method and control is passed to this method when
an applet is instantiated in a browser’s JVM. This restriction is part of the security
regime of the browser’s JVM. It disallows certain functionality in an environment
where code may be downloaded from a web site whose authors may not have
totally benign intentions for the user.
Another part of this same security restriction is that the applet may only make a
network connection back to the web host from which it was loaded. This means
that, for a two-tier architecture, the web server software and the Oracle server must
be on the same machine.
The code in the applet can be minimized and simplified by making it responsible
only for the user interface and moving the business logic into PL/SQL stored func-
tions and procedures that can be kept in the database itself. The applet will then use
JDBC calls to these stored programs, which will execute standard SQL queries or
updates internally. The results of such queries can then be passed back to the applet
for display in the browser window.
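As a hedged illustration, a call to such a stored function might look like the fragment below; the function name get_club_name and its signature are assumptions rather than part of the UFL schema, and conn is the applet's existing JDBC connection.

// call a PL/SQL stored function through JDBC and read its return value
CallableStatement cs = conn.prepareCall("{ ? = call get_club_name(?) }");
cs.registerOutParameter(1, java.sql.Types.VARCHAR);
cs.setString(2, "AK675280");
cs.execute();
String club = cs.getString(1);
cs.close();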
There are considerable security drawbacks in using a two-tier architecture to
underpin a web interface. The most important is that the database connect infor-
mation may have to be embedded in the applet. The tokenized code produced by
the Java compiler is capable of being reverse engineered to discover this informa-
tion. Malicious attackers can use this information to write applets that gain access
to the database, albeit at a low level of permission, outside of the direct control of
the original system’s architect.
<html>
<head>
<title>Systems Building - JDBC Applet</title>
</head>
<body>
<h1>Systems Building - jdbcApplet</h1>
This page contains an example of an applet that uses the Thin JDBC
driver to connect to Oracle and query the players table.
The source code for the applet is in jdbcApplet.java on the <i>Systems
Building</i> web site.<p>
<hr>
<applet codebase ="." archive="classes12.zip"
code="jdbcApplet.class " width=400 height=380>
</applet>
</body>
</html>
The applet connects to Oracle databases through the OracleDriver class. This is not
part of the standard browser configuration, so the codebase attribute within the
applet tag indicates where such support classes may be found.
In this case, the codebase is the same directory in the web domain from which the
web page itself was loaded. The archive attribute within the same tag specifies a
particular target file where the required classes are to be found and indicates
implicitly to the browser that it is a zip file. The classes12.zip file is available for
download from the Oracle web site.
The applet tag then has a code attribute that contains the name of the applet class
and finally sets up an area in the displayed page of 400 pixels by 380 pixels where
the applet will run. The code for jdbcApplet is available for download from the web
site associated with this book. The user interface consists of simple components
from the java.awt package, selected for maximum portability.
The applet uses panels to divide its display into managed areas (Figure 18.3).
These panels can have properties set for background and foreground colours; their
positions are regulated by the layout manager of the component that contains
them. The applet has a BorderLayout manager, which uses the properties North,
South, Center, East and West to determine position.
Figure 18.3 The applet's display area divided into panels 1 to 5.
Three panels are added directly to the applet in positions North, Center and South.
Two further panels are added to the central panel in position North and Center rela-
tive to their container. The sizes of these panel containers will be adjusted by their
layout managers according to sizes of the components they contain and the space
available.
Panel 2 contains a label component that is assigned the text JDBC Applet1 Show
Players. Panel 3 has a label with the text Select a Player and a Choice component
(basic combobox) into which will be loaded the ufa_ids of all the rows in the
players table of the UFL database.
Panel 4 has a gridlayout set to divide the panel into cells, two columns wide and
six rows deep. This grid provides a convenient way to position the labels and
textfields that will display the individual details of players selected by ufa_id in the
selectPlayer component.
Any movement or click of the mouse will generate a Java event that can be
detected by the program. In this study, the principal interest is in detecting a
change in the selected item of the combobox, selectPlayer. Figure 18.4 shows a code
fragment that installs an ItemListener on this component.
...
selectPlayer.addItemListener(new java.awt.event.ItemListener() {
public void itemStateChanged(ItemEvent e) {
selectPlayer_itemStateChanged(e);
}
});
...
...
try{
selectPlayer.removeAll();
if (conn == null) {
DriverManager.registerDriver(new
oracle.jdbc.driver.OracleDriver());
conn=DriverManager.getConnection
("jdbc:oracle:thin:@localhost:1521:scotoracle","a27ws","orange");
}
Statement stmt = conn.createStatement ();
ResultSet rset = stmt.executeQuery ("select ufa_id from players");
while (rset.next ())
selectPlayer.addItem (rset.getString(1));
Status.setText("Status: Select a Player");
getPlayer();
}
catch (Exception e) {
System.out.println(e.getMessage());
}
void getPlayer(){
try {
PreparedStatement stmt1 = conn.prepareStatement("select * from
players where ufa_id=?");
stmt1.setString(1,selectPlayer.getSelectedItem());
ResultSet rs = stmt1.executeQuery ();
if (rs.next ()){
Forename.setText(rs.getString(3));
Surname.setText(rs.getString(2));
Nationality.setText(rs.getString(7));
Squad_no.setText(rs.getString(5));
Position.setText(rs.getString(6));
Club.setText(rs.getString(8));
Status.setText("Status: Player Found");
}
}
catch (Exception f) {
System.out.println(f.getMessage());
}
}
void selectPlayer_itemStateChanged(ItemEvent e) {
getPlayer();
}
Figure 18.7
When the selected item changes, the listener ensures that control is passed to the
selectPlayer_itemStateChanged method. This does nothing more than call another
method getPlayer where the database query is made. A code fragment for this is
shown in Figure 18.7.
Stmt1 is a PreparedStatement, initialized with a query string containing a ? place-
holder. The placeholder is replaced by the stmt1.setString method, taking the actual
selected item from selectPlayer. The statement is then executed and the returned
single row assigned to the ResultSet. The various fields of the returned query are
then assigned to the appropriate TextFields in the applet.
Every time that a new selection is made in the combobox, the action method will
ensure that the player’s details are shown in the textfields. Figure 18.8 shows the
applet display after selectPlayer’s selection has changed. The details of player
AK675280 are shown.
The important limitations of the two-tier architecture, as far as Java applets are
concerned, are the requirements that the applet can only establish a direct session
with an Oracle server on the same machine as the web server and the need for the
associated classes to be downloaded with the applet code.
Very often, this might place unnecessary stress on the host machine, where the
number of web clients and database clients seeking connections might be
numbered in the thousands at any particular time. In any event, the client–server
paradigm is meant to offer increased performance through division of the work
amongst relatively cheap but increasingly powerful server hosts.
The security restrictions on the Java applet are no bad thing, but most of them
are directed at protecting an unwitting client from downloading a malicious
applet. There is still scope for attack from those who might seek to damage or
undermine the service provision.
One solution is to place some kind of intermediary on the web host. This could be an
RMI server that would assume responsibility for all the JDBC logic and, perhaps,
other application logic. The web-page applet is thus relieved of the requirement to
connect directly to the database and need only be responsible for the presentation
logic – the user interface. The intermediate RMI server has the business requirements
embedded in its code in the form of standard and authenticated connect informa-
tion, SQL queries or PL/SQL calls.
The deployment configuration problem is simplified since only the RMI server
needs access to the JDBC classes. There is therefore no longer any need to download
the JDBC classes to the remote web browser. Further, the code running in the server
host’s JVM is not available for inspection by third parties.
Security is enhanced since the Oracle server is no longer being contacted directly
by thousands of unknown users running the applet in their browsers. The applet
will be smaller since it has less functionality and will be quicker to download with
the web page.
This solution is still complex. It requires additional compilation steps to
generate the RMI stub class. Ideally, the Java server program should also be multi-
threaded to enable it to respond to more than one concurrent client request. This
approach offers great flexibility, since the Java programs – the applet and the server
– can be engineered precisely to fulfil the business requirements. This flexibility is
paid for in terms of the need for software development time and expertise.
Remote Method Invocation (Figure 18.9) is a Java standard and is specified
through a number of standard classes that underpin the communications and data
marshalling necessary for a client to make a call to the server. The services are
defined in a remote interface. The server implements this interface. It directly
implements the service methods in its code body. It indirectly implements the
remote nature of the interface by inheritance from one of the standard RMI classes.
Converting the two-tier example into a three-tier client–server system requires that
the client applet be divided into two. The new client applet will have responsibility
only for the user interface and for detecting user interaction. Figure 18.10 lists the
project files required.
The JDBC code is taken out and forms the core of a new Java program, the RMI
server. The RMI server class contains two functions that can be called by the client.
These provide a service firstly to populate the combobox component in the client
with the ufa_ids of players in the database and secondly to provide the detail of a
player when his ufa_id is selected in the combobox.
These functions are wrapped in a server class that inherits from the
UnicastRemoteObject class. This latter is a standard Java class that enables the
networking connections on the server side.
To preserve type safety and to ensure that the server actually provides the func-
tions that the client will call, a third entity is required in an RMI application. This is
the interface class. The interface class simply consists of the signatures of the remote
methods. When the client class is compiled, the compiler checks that the remote
method calls conform to these signatures. When the server class is compiled, the
compiler checks that these methods are actually implemented in the server.
Figure 18.11 shows the code for the interface queryPlayers. The interface contains two
methods: getUfa_IDs and getPlayer. The various class imports provide type defini-
tions for the code used in the interface. The queryPlayers interface extends the
Remote interface. This information will enable the compiler to check that the client
calls the methods through a specially obtained remote object reference and not
through a local reference. The server will also be checked to ensure that it imple-
ments a class compatible with the Remote class.
Both getUfa_IDs and getPlayer return a Vector type and may, in case of any
networking or database malfunction or error, raise a RemoteException. Exceptions
provide a safe way of dealing with error conditions. Methods that raise exceptions
declare this in their signatures with a throws clause, obliging callers to handle them.
package rmiapp;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.*;
public interface queryPlayers extends Remote {
Vector getUfa_IDs() throws RemoteException;
Vector getPlayer(String p_id)throws RemoteException;
}
...
public class queryPlayersImpl extends UnicastRemoteObject implements
queryPlayers{...
In JBuilder, the server’s build property can be set to generate the source code for
the stub class queryPlayersImpl_Stub.java. This simply uses the rmic (rmi compiler)
program, part of the JDK installation. The rmic program can be called from a
command window if JBuilder is not in use. The standard Java compiler, javac, can
then be applied to the other source files.
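Assuming the source files sit in an rmiapp sub-directory, the equivalent command-line steps are roughly:

javac rmiapp/queryPlayers.java rmiapp/queryPlayersImpl.java rmiapp/rmiappClient.java
rmic -d . rmiapp.queryPlayersImpl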
Java programs do not run directly on the CPU as operating system processes and
therefore cannot make direct network connections. The JVM in which the
tokenized compiled server program is interpreted is a process and can make
network connections. However, on a multi-user or multi-server host there may be a
problem in identifying the particular JVM in which a requested class resides.
A special program, the rmiregistry, runs as a process to resolve this difficulty.
When the server class, queryPlayersImpl, is launched in a JVM, it first registers its
presence with the rmiregistry and also registers the location of its stub class. It then
enters a listening state, waiting for a client request.
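On the server host this amounts to something like the following commands; the classpath entries are assumptions and, for stubs to be downloadable by remote clients, a java.rmi.server.codebase property may also be required.

start rmiregistry
java -classpath .;classes12.zip rmiapp.queryPlayersImpl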
Figure 18.13 shows the main method from the queryPlayersImpl class. An object,
server, is instantiated using the constructor for the queryPlayersImpl class. The
Naming.rebind method registers the object reference for server with the rmiregistry
under the name //localhost/queryPlayersServer, the same name that the client
method, Naming.lookup, will use later in its search. The location of the stub class is
also registered. The server object then waits, listening for a client request.
...
public static void main(String args[]) {
try {
queryPlayersImpl server = new queryPlayersImpl();
System.out.println("Registering server
//localhost/queryPlayersServer");
//Bind this object instance to the name "queryPlayersServer"
Naming.rebind("//localhost/queryPlayersServer", server);
System.out.println("queryPlayersServer bound in registry"); }
catch (Exception e) {
System.out.println("queryPlayersImpl err: " + e.getMessage());
e.printStackTrace();
}
}
The Naming.lookup method, when invoked in the client, first contacts the
rmiregistry on the server host and obtains a remote object reference for the server
and a location for the server’s stub class file. The stub class is downloaded to run co-
resident in the client’s JVM. The stub acts as a local proxy for the remote server.
From now on, client requests, using the remote object reference, will be passed to
the local stub; these will be passed on to the remote stub and thence to the server
itself.
Figure 18.14 shows a code fragment from the rmiappClient class. The server
attribute is an instance of the queryPlayers class and therefore, by inheritance,
Remote. The local Vector object, v, is assigned the return value of server.getUfa_IDs().
If the call is successful, v has an element for each of the ufa_ids found in the data-
base. Mapping the elements of v to an Enumeration is one way of getting at an
unknown (at run-time) number of elements in the vector. A while loop assigns each
element of the Enumeration to the item list of the selectPlayer component.
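A hedged sketch of that fragment, using the names established above, might read as follows; in the actual client, server is a class attribute rather than a local variable.

try {
  // obtain the remote object reference registered by the server
  queryPlayers server =
      (queryPlayers) Naming.lookup("//localhost/queryPlayersServer");
  Vector v = server.getUfa_IDs();
  Enumeration e = v.elements();
  while (e.hasMoreElements())
    selectPlayer.addItem(e.nextElement().toString());
}
catch (Exception ex) {
  System.out.println(ex.getMessage());
}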
User interaction is dealt with in the same way as in the two-tier example. A
listener is installed in the selectPlayer component. When the listener detects that
the selected item has changed, control is passed to the action method installed in
the listener.
The method selectPlayer_itemStateChanged (Figure 18.15) illustrates an alterna-
tive way of dealing with a vector when the number of elements is known. The
number of elements is known because server.getPlayer returns a vector with six
elements, one for each of the attributes of the player whose ufa_id was passed as the
parameter; each element is assigned in turn to the corresponding display field.
void selectPlayer_itemStateChanged(ItemEvent f) {
try {
Vector v=server.getPlayer(selectPlayer.getSelectedItem());
Enumeration e = v.elements ();
Surname.setText(e.nextElement().toString());
Forename.setText(e.nextElement().toString());
Nationality.setText(e.nextElement().toString());
Squad_no.setText(e.nextElement().toString());
Position.setText(e.nextElement().toString());
Club.setText(e.nextElement().toString());
Status.setText("Status: Player found");
}
catch(Exception g){ }
}
Once the server is running in its JVM and has registered with rmiregistry, the
client can be launched.
Figure 18.16 shows the client running in JBuilder’s applet viewer. The combobox
has been loaded and a selection is about to be made.
Figure 18.17 shows the applet after the selection. The player details have been
found and synchronized with the combobox.
Table 18.1 summarizes the stages in RMI development.
The RMI client, running as an applet in a browser, can only contact the RMI server
if it is running on the same machine as the web server from which the page was
loaded. The server, because it is running as an application, is not subject to applet
security restrictions, and can connect to any host as necessary.
Java servlets have been described as the server-side equivalent of applets. That is to
say that they do not run independently as standalone programs. They rely on
running inside a servlet engine, usually associated with the web server. Because
they run on the server side, they do not themselves have a graphical user interface.
Servlets are written in Java: they are not protocol- or platform-dependent. A
servlet need not necessarily be associated with a web server but, since web applica-
tion development is the theme of this chapter, it is that aspect of their use that will
be the focus of this section.
Quite early on in the development of web technology, it was found that there
was a need for independent server programs to run on the web host. Their principal
purpose was to respond to requests via the web server for dynamically generated
HTML code that could be tuned to user demand. If HTML can be generated
on the fly, then ad hoc requests can be satisfied; it would be impossible to create
static web pages that answered every combination of user request. The
first response was the Common Gateway Interface (CGI). This allowed the web
server to call a separate program, passing user-supplied parameters to it. The
program would respond, embedding its output in HTML tags, which would be
passed on to the user as a dynamic web page.
The most popular CGI mechanism was Perl, an interpreted programming
language, but C++, Visual Basic or any other language could be used. The main
problems were performance and portability. Each request might mean another
copy of the executable or the interpreter being spawned in the memory of the web
server. Further, the executable had to be capable of running on the server host and
would need to be recompiled to move to another architecture, assuming a
compiler or interpreter even existed for that CPU.
Writing a servlet in Java solves the portability problem and, because the multi-
threaded servlet runs on the web server, one executable can service millions of
requests without spawning new processes or downloading applets.
A client program, usually a browser, accesses the web server and loads a page that
makes a request. The request is then processed by the web server’s servlet engine,
which passes it on to the servlet. The servlet then sends a response through the web
server back to the client. One critical advantage for servlet technology is speed.
Unlike CGI programs, servlets are loaded into memory once and run from memory
after the initial load. Servlets are spawned as a thread, and are by nature multi-
threaded. Since they are based on the Java language, they are platform-
independent.
This project will consist of a relatively simple web page that will call a servlet. The
servlet will make use of a datamodule class to connect to Oracle and retrieve the
contents of the players table in a dataset. The servlet then passes the players data,
formatted in HTML, back to the web page.
Using JBuilder is convenient because all the configuration data will be organized
automatically, and the Tomcat server, which comes with the Enterprise Edition,
will host the web page, run the servlet and instantiate the datamodule. Create a
new project in JBuilder; give it the name SBServlet. Make sure the directory names
are correct and press OK.
Create a webapp by selecting File |New and the Web tab. Name it SBServletwebapp
and click OK. The webapp is the configuration tool that will ensure that the Tomcat
server has all the necessary resources deployed to its domain.
Create a new servlet. Name it SBServlet. Check the Standard servlet box in the
dialog. Click on Next. Check the doGet and doPost boxes and the Generate SHTML
file box. Click on Finish. JBuilder creates a simple SHTML file in the webapp and the
skeleton of the servlet code.
Amend the code for the SBServlet.shtml file as shown in Figure 18.18. The code
creates a simple form area with a submit button in it. When the button is pressed in
the rendered web page, a post is performed. This is passed by the server to the
servlet’s doPost() method. The servlet tags below will perform a get as the page is
loaded. This will be passed to the servlet’s doGet() method. The rest of the HTML is
standard.
Figure 18.19 shows the web page in the design view. The two engraved lines
across the page are caused by the <hr> tags. The results of the get will be shown here
because the tags enclose the implied get call to the servlet in the <servlet> tags.
The code for the servlet is shown in Figure 18.20. Amend the generated code so
that it looks like this. Alternatively, the file can be downloaded from this book’s
web site.
<html>
<head>
<title>
SBServlet
</title>
</head>
<body bgcolor="#c0c0c0">
package sbservlet;
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;
import java.sql.*;
import com.borland.dx.dataset.*;
import com.borland.dx.sql.dataset.*;
import com.borland.datastore.*;
public class SBServlet extends HttpServlet {
static final private String CONTENT_TYPE = "text/html";
DataModule1 dm = sbservlet.DataModule1.getDataModule();
public void init() throws ServletException {}
public void doGet(HttpServletRequest request, HttpServletResponse
response) throws ServletException, IOException {
response.setContentType(CONTENT_TYPE);
PrintWriter out = response.getWriter();
out.println("<H2>The servlet has received a GET. This is the
reply. Press Submit for a list of Players</H2>");
}
public void doPost(HttpServletRequest request, HttpServletResponse
response) throws ServletException, IOException {
response.setContentType(CONTENT_TYPE);
PrintWriter out = response.getWriter();
out.println("<html>");
out.println("<head><title>SBServlet</title></head>");
out.println("<H2 bgcolor=\"#c0c0c0\">");
out.println("<H2>The servlet has received a POST. This is the
reply.</H2>");
Column [] columns = dm.getQueryDataSet1().getColumns();
out.println ("<table border = 1><tr>");
for (int i=1; i < columns.length; i++) {
out.print("<th>" + columns [i].getCaption() + "</th>");
}
out.println("</tr>");
dm.getQueryDataSet1().first();
while (dm.getQueryDataSet1().inBounds()) {
out.print("<tr>");
for (int i = 1; i < columns.length; i++) {
out.print ("<td>" + dm.getQueryDataSet1().format(i) + "</td>");
}
out.println("</tr>");
dm.getQueryDataSet1().next();
}
out.println("</table>");
out.println("</body></html>");
}
public void destroy () {}
}
There are several points of note. Firstly, the servlet class extends (inherits from)
the HttpServlet class: it has no orthodox constructor. The server will perform the
instantiation and take over all communication between the web page and the
servlet. The doGet method writes out just one line of text in HTML tags. This will be
displayed in the original calling page. The doPost() method writes out several lines
of text before processing the dataset. These include the tags for a new page. The
method’s output will therefore not be displayed in the calling page. A new content
page will be created dynamically.
Create a new datamodule for the project. Click File|New and select the datamodule
icon in the General tab. Accept the name datamodule1 and click on Finish. Fill in the
generated skeleton code as shown in Figure 18.21.
Again, there are some points to note. The DataModule1 class implements the
DataModule interface rather than extending a class. A datamodule acts as a
container for other data-aware components. It is not instantiated directly by its
clients. Instead, it has a static attribute, an instance of DataModule1, called
myDM.
If myDM does not already exist, a call to the getDataModule() method causes
myDM to be instantiated by the servlet engine. The constructor does nothing
except immediately call jbInit(). The object is instantiated in the server and the
components it contains are initialized. The calling servlet gets a reference returned
to it by which it can address the data-aware components contained within the
datamodule.
The datamodule may outlive the servlet that first called it. It then resides in the
server awaiting another call from any other servlet whose class definition makes it
aware of the datamodule1 class. Only a specific call to destroy or the closure of the
server will end the life of the datamodule.
This datamodule has some of the same data-aware components that were seen in
Chapter 14. The database component makes the connection to Oracle and the
QueryDataSet contains the queried rows of the players table. The connection string
and the SQL query are set in the component property inspectors. The Java code is
generated automatically.
When it is instantiated in the server, SBServlet obtains a reference to the
datamodule (Figure 18.20). The datamodule either already exists or is immediately
package sbservlet;
import com.borland.dx.dataset.*;
import com.borland.dx.sql.dataset.*;
import java.sql.*;
public class DataModule1 implements DataModule {
static private DataModule1 myDM;
private Database database1 = new Database();
private QueryDataSet queryDataSet1 = new QueryDataSet();
public DataModule1() {
try {
jbInit();
}
catch(Exception e) {e.printStackTrace();
}
}
private void jbInit() throws Exception {
database1.setConnection(new com.borland.dx.sql.dataset.ConnectionDescriptor(
    "jdbc:oracle:thin:@loopback:1521:scotoracle", "a27ws", "orange",
    false, "oracle.jdbc.driver.OracleDriver"));
queryDataSet1.setQuery(new com.borland.dx.sql.dataset.
QueryDescriptor(database1,
"select * from players", null, true, Load.ALL));
queryDataSet1.open();
}
public static DataModule1 getDataModule() {
if (myDM == null) {
myDM = new DataModule1();
}
return myDM;
}
public com.borland.dx.sql.dataset.Database getDatabase1() {
return database1;
}
public com.borland.dx.sql.dataset.QueryDataSet getQueryDataSet1() {
return queryDataSet1;
}
}
instantiated by the servlet engine. The returned reference is the object identifier of
myDM in the datamodule. This is assigned to the dm object in the servlet. SBServlet
can use a local reference because both it and the datamodule are running in the
same address space, the web server’s engine. It then processes the dataset through
the dm reference, getting its column names in an array of the Column class and
outputting the header row of an HTML table. It processes each row of the dataset,
getting each of the attribute values, referring to the columns of the current row by
number, using the dataset’s format() method. All of the output is enclosed inside
HTML tags, constructing a new page.
Before the project is run, ensure that the webapp has the appropriate configura-
tion information. Figure 18.22 shows the Run|Configurations editor. The Server tab
shows a tree of options. Check all the boxes in the Command line and Libraries
sections.
The running project is shown in Figures 18.23 and 18.24. When the SHTML web
page loads, it performs an implied get request. This is intercepted by the web server.
If the SBServlet does not already exist, it is immediately instantiated by the web
server’s engine and the get request forwarded to its doGet() method.
Figure 18.23 SBServlet: the submit form. The get has been processed.
Figure 18.24 SBServlet: the players list. The post has been processed.
The output text from the servlet appears between the two engraved lines.
Clicking the Submit button causes the page to send a post request. Again this is
passed on, this time to the servlet’s doPost() method. The database is queried and
the results displayed in a new page.
The bean provider or developer must generate the bean classes that contain the
business process logic. In addition, two types of interface must be created. The
home interface defines the methods a client uses to create, locate and destroy
instances of an enterprise bean. The remote interface defines the business methods
implemented in the bean. A client accesses these methods through the remote
interface.
Application roles
  Bean provider: Writes the bean. Creates the home and remote interfaces.
  Application assembler: Writes the application that uses the beans. The application
  may include GUI clients, applets, servlets or JSP pages. Assembly instructions are
  embedded in deployment descriptors.
Infrastructure roles
  EJB server provider: Provides a framework in which to run EJB containers. This must
  have, at a minimum, a naming service and a transaction service.
  EJB container provider: Provides tools to deploy enterprise beans and the runtime
  support for the beans. A container provides management services to one or more beans.
Deployment roles
  Deployer: Adapts the application to the target environment, modifying the properties
  of the enterprise beans. For example, deployers set transaction and security policies.
  Systems administrator: Monitors application execution; takes action if it behaves
  abnormally. Responsible for administering the enterprise's computing and networking
  infrastructure.
Once the bean is deployed in the EJB container, the client calls the create ()
method defined in the home interface to instantiate the bean. The home interface
is implemented in the container. Other methods declared in the home interface
permit the client to locate an instance of a bean and to remove a bean instance
when it is no longer needed.
Once the bean is instantiated, the client can call the business methods within it.
The client never calls a method in the bean instance directly, however. The
methods available to the client are defined in the remote interface of the bean, and
the remote interface is again implemented by the container. When the client calls a
method, the container receives the request and delegates it to the bean instance.
The case study at the end of the next section demonstrates how a JavaServer page
calls a simple bean to access the UFL database. Table 18.3 describes the different
types of enterprise bean.
JavaServer Pages (JSPs) are also becoming widely used in web applications to
provide an alternative to the relatively complicated RMI mechanism described
above. They also avoid the need for a static HTML page used to call servlets. JSP uses
a technology specified by Sun Microsystems as a convenient way of generating
dynamic HTML content that a browser can interpret. The JSP is directly referenced
in the browser’s calling URL. The generated HTML is the web server’s response.
JSPs are closely associated with Java servlet technology and with enterprise
beans. They allow the inclusion of Java code snippets and calls to servlets, beans or
other external Java components, as well as HTML tags. JSP code is distinct from
other Web scripting code, such as JavaScript. Anything that can be included in a
normal HTML page can be included in a JSP page.
The JSP is the web page. Eventually, all the Java snippets, bean references and
HTML code are translated by the compilation process into a single Java-only file.
This is compiled to a single class. When the browser calls a JSP in a URL, the web
server launches the class in its servlet engine, along with any other dependent
servlet or bean classes. The JSP class outputs the necessary HTML to construct the
page in the browser. There is no need to download any support libraries, so
network traffic is minimized and browser configuration is unnecessary (Table
18.4).
The example application uses the jspSB.java server page to output the HTML code
that is eventually seen in the web page. The JSP calls a simple bean that makes the
Oracle connection, performs a query and returns a ResultSet to the JSP. The JSP then
formats the contents of the returned ResultSet, embedding the columns and rows in
HTML table tags. The application is developed in JBuilder in order to use the auto-
mated configuration tools that ensure that the web application runs correctly.
Create a new project; name the project jspSB and make sure that this name is
reflected in the standard directories for the project. Use the Project|Project Properties
dialog, selecting the Server tab to configure a single server for all services. Select
Tomcat 3.3 as the web server.
Add a webapp to the project. Select File |New and select the web application icon
in the Web tab. Name it jspSBwebapp and click on OK. The web app holds the
configuration information, including a list of libraries and servlet or bean classes to
deploy to the work domain of the Tomcat server at run-time.
Add a JSP to the project. Select File |New and select the JavaServer Page icon in the
Web tab. Name it jspSB and check the Generate submit form and Generate sample bean
buttons. The generated code for the JSP contains a mixture of JSP code and HTML.
The important part of this code is the section that identifies the generated bean,
jspSBBean, in JSP tags. This enables the JSP page to call the bean methods from the
server. Edit the code in the JSP page as shown in Figure 18.25. All the source files for
this project are on the book’s web site, so the code can be downloaded and pasted
into this file.
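That bean-identification section typically takes a form like the one below; the id and scope values shown are assumed JBuilder defaults and may differ from the actual Figure 18.25 code.

<jsp:useBean id="jspSBBean" scope="session" class="jspsb.jspSBBean" />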
Mixing Java statements and HTML makes the source difficult to read, but it
should be clear that the java.sql.* package is imported in the first line. This
makes the JDBC classes available to the program. Later, a ResultSet is assigned to the
returned value of the bean's myPlayers method. The rather confusing code that
follows is caused by having to put the Java statements in JSP tags (<%...%>) and the
table (<TABLE>...</TABLE>), table header (<TH>...</TH>) and table data (<TD>...</TD>)
HTML tags around the content of the ResultSet fields. Notice also that the ResultSet
operations are in a try/catch structure because of the possibility of raised excep-
tions.
The ResultSet can be passed as a parameter because both the JSP class and the bean
will be instantiated in the same address space: Tomcat’s servlet engine.
<% }
else {%>
<P> Sorry, the query returned no rows! </P>
<% }
rset.close();
} catch (SQLException e) {
out.println("<P>" + "There was an error doing the query:");
out.println ("<PRE>" + e + "</PRE> \n <P>");
}%>
</body>
</html>
The simple bean created along with the JSP needs substantial amendment.
Change the code in jspSBBean.java to that shown in Figure 18.26. Again, this can be
downloaded and pasted into the file. The bean has two methods, myDate and
myPlayers. The first demonstrates how to construct a formatted date string and pass
it back to the JSP. The second makes the Oracle connection, executes a query and
passes a ResultSet back to the JSP.
Before the project is compiled, check that the run-time configuration is properly
set. Select Run|Configuration; in the dialog select the configuration jspSB and click
on Edit. The Server tab should show the tree for Server and Services options. The
Command line options 'make output path available' and 'make libraries available'
should both be checked. In the Library options, make sure that the required library,
OracleJDBC, is checked.
When the project is compiled, the bean source code produces a standard class file
in the output directory. The JSP code is used to generate a Java-only source file. This
converts the HTML to equivalent Java output statements. The source can be seen in
the .../classes/Generated Source directory. The source is copied to the
jspSBwebapp directory and compiled to a class.
Using the configuration information, the Tomcat server loads the jspSB servlet
class and the jspSBBean class. The resulting interaction is a stream of string outputs
that consists only of HTML. This stream is caught and displayed in the browser
window. Figure 18.27 shows this output.
If a user tries to determine the source code of the JSP by using View |Source in the
browser’s menu, Figure 18.28 shows a listing of what they would see. The browser is
only aware of receiving a stream of HTML code from the requested URL. The listing
is quite long: there is a block of six lines of <td> tagged attribute formatting for each
player in the returned ResultSet.
The actual source or class code for the JSP or the bean cannot be downloaded and
is secure from those who might wish to reverse engineer it.
package jspsb;
import java.text.*;
import java.util.Date ;
import java.sql.*;
18.8 XML
XML gives context to words and values in web pages, identifying them as
data instead of simple textual or numeric elements. An XML document depends
upon a stylesheet in order that its data may be rendered as a web page in a browser.
Every XML document must have a root or top-level element. This is the outer-
most element and contains all the other elements. Any name for the root element
may be chosen – hence the word extensible in XML. XML tags can be user-defined.
In HTML, tags are standardized.
A Document Type Definition (DTD) is a set of rules that define the allowable
structure of an XML document. DTDs derive their format from SGML and can be
either included in an XML document or referenced as an external file. A DTD uses
the tag structure to define the names of elements, attributes and entities of a docu-
ment without specifying their actual values.
<html>
<head>
<title>
jspSB
</title>
</head>
<body>
<h1>
Systems Building JSP uses JDBC Bean
</h1>
<p>Today is: Fri, 16 05 2003 AD at 02:32:11 PM BST.</p>
The file xmlPlayerlist.dtd, shown in Figure 18.29, first defines the xmlPlayer tag
as an element consisting of eight attributes. Each of those attributes is then defined
as a tag containing data. Finally, the xmlPlayerlist tag is defined as consisting of any
number of xmlPlayer tags. The file is named after the root tag, xmlPlayerlist.
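As a hedged sketch of the declarations involved (only five of the eight player elements are shown, with names taken from the players table used elsewhere in the book):

<!ELEMENT xmlPlayer (ufa_id, surname, forename, nationality, club)>
<!ELEMENT ufa_id (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT forename (#PCDATA)>
<!ELEMENT nationality (#PCDATA)>
<!ELEMENT club (#PCDATA)>
<!ELEMENT xmlPlayerlist (xmlPlayer*)>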
An XML document that conforms to the structural and notational rules of XML is
considered well-formed. A well-formed XML document does not have to contain
or reference a DTD. A well-formed document is parsable, i.e. it is syntactically
correct although it may not yet have all the information that makes it semanti-
cally meaningful.
<club>Purity FC</club>
</xmlPlayer>
...
Figure 18.32 shows how the XML document would appear in a browser without
the benefit of an applied style.
The player list view stylesheet shown in Figure 18.33 formats the same XML file
as a list of selected attributes (Figure 18.34). The player rows are sorted in surname
order.
The stylesheet inserts HTML list tags <dl> </dl> around the row data, which is
processed in a loop. The surname and forename values are embedded in a <dt> </dt>
pair and the other attributes are embedded in a <dd> </dd> pair.
The players table view derived from the same XML file is shown in Figure 18.35.
The same data has been given a completely different appearance by an alternative
stylesheet (Figure 18.36).
There are a number of data-aware components in the JBuilder design palette that
are specifically tailored either to output dynamic XML documents from SQL
queries or to store XML data in database tables. The dynamic output can be
formatted with a standard set of stylesheets for a particular application.
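For a flavour of the first of these, a dynamic XML player list can also be produced directly from an SQL query. The following is only a sketch, assuming Oracle's SQL/XML functions (XMLELEMENT, XMLFOREST, XMLAGG) are available; the Players column names are taken from the UFL examples but should be checked against the real schema.

-- A sketch: generate an xmlPlayerlist document from the Players table
SELECT XMLELEMENT("xmlPlayerlist",
         XMLAGG(
           XMLELEMENT("xmlPlayer",
             XMLFOREST(p.ufa_id   AS "ufa_id",
                       p.surname  AS "surname",
                       p.forename AS "forename",
                       p.club     AS "club"))))
FROM players p;

The result is a single XML value that can be streamed to the browser or written to a file and formatted with one of the stylesheets described above.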
RMI applet/server – Thin applet responsible only for GUI; may still need support archives and still needs a static web page. Business logic and ...
JSP – No static web page; the JSP is the page. Operates with the same security rating as the web server. JSP can use all types of bean. No real limitation on functionality. Complicated initial configuration. Easy to maintain complex systems.
18.10 Î EXERCISES
(i) Design the Clubs form, described in Chapter 14. Implement it as an applet in a
web page.
(ii) Adapt the implementation of the Players form from Chapter 14 to display in a
web page. Use servlet technology both to create the user interface and to
support the combobox queries.
(iii) Adapt the matchResults form in Chapter 14 to display in a web page. Use JSP
technology supported by data-aware components in a bean.
(iv) Adapt the matchResults report from Chapter 14 to display in a web page. Use
XML and a suitable stylesheet to format the report.
CHAPTER 19
PARALLEL DATABASES
There isn’t a Parallel of Latitude but thinks it would have been the Equator if it
had had its rights.
Mark Twain, Following the Equator
This chapter outlines some performance and reliability issues associated with tradi-
tional, single processor, centralized database systems, and describes a number of
architectural alternatives devised to alleviate such problems. Specific focus is given
to parallel database architectures that enable a single database to distribute its asso-
ciated query processing across multiple processors and disks.
(Figure 19.1: a client–server database – clients #1 to #n issue data requests (1) to the database server, which transmits the data back (2).)
The client may also perform some application-specific business logic,
such as data validation against entries that the end-user may wish to insert into the
database. Client-side data validation helps to limit the amount of network commu-
nication between client and server, and so reduces processing of invalid user
requests.
The responsibilities of the database server are to accept client requests, to service
those requests, and to return the result back to the client. The database server is
also responsible for ensuring consistent concurrent access, recovery and security of
the database system.
The database systems provided by most university computing labs generally
consist of a single database server, with many lab-based client workstations
running vendor-specific client software. In the case of Oracle, all client machines
would usually require a copy of SQL*Plus to handle user interaction, and a local
copy of Oracle Net networking software to communicate with the Oracle server.
Figure 19.2 details Oracle’s client–server architecture. Full details of Oracle
processes are discussed in Chapter 11.
In certain situations, this simple client–server architecture can be problematic.
Specific issues of concern are database reliability, data availability, transaction
throughput and database scalability.
The standard client–server system involves only a single server. Such systems are
referred to as centralized database systems. All of the data resides in a single central
location. If the database server fails then all of the data located at that central loca-
tion is unavailable and all client requests will fail.
Centralized database systems have a single point of failure. With small, non-
mission-critical systems this is not considered a particular handicap. If the server
fails, then the users (clients) will just have to wait until it is repaired and back
online.
(Figure 19.2: Oracle's client–server architecture – a database client runs SQL*Plus and Oracle Net; requests travel over a TCP/IP network connection to Oracle Net on the database server, which passes them to the RDBMS and the Oracle database.)
One remedy is to mirror the database server with a standby replica so that, if the
master fails, client connections are switched to the backup. This architecture allows
for transparent failover in the event of any server hardware failure.
The difficulty with such a model is in maintaining synchronicity between the
current master server and its mirrored replica. This problem, and the various solu-
tions proposed, is examined in Section 20.6 on replication. That section also
explains how replication of data can be used to optimize data availability as well as
increase database server accessibility.
The size of the database and the throughput of transactions against it can also
affect performance. A database that can handle 1 GByte of data and 100 transac-
tions per second may fail to handle 10 TByte of data with 1000 transactions per
second. Such a failure is referred to as a failure of scalability. This is a common
problem that besets database application systems that are inadequately tested for
future growth or throughput prior to rollout. It is also a problem that can lie
dormant for some time as the database continues to grow towards its non-docu-
mented maximum.
Solutions to problems of scale lie in the realm of parallelism: the ability to take a
large problem, break it into small parts, work on each of those parts simultaneously
and then merge the part solutions into a single solution to the original problem.
Parallel database architectures developed to achieve scalability are discussed in
Section 19.4.
Data that is accessed across a network, as in the case of a client–server database
system, is of course affected by the availability of the network connection between
the client and the server. In a centralized server architecture, if the network
connection to the database server is unavailable no client data processing can
occur. Once again, a single point of failure exists.
A possible solution to such total failure is to have more than one database server,
and to position each server at a different node of the network. In a local area
network these would, in all likelihood, be replication servers.
It was, until recently, generally thought preferable to partition the database
between a number of database servers, and to locate each server next to the group
of clients that make most use of that data subset.
This geographical distribution of data is used to achieve high locality of reference
and to limit the effect of any single database server failure or network inaccessi-
bility. The benefits are primarily to those clients that are collocated with the server.
However, increasingly reliable, secure and fast wide area network connections are
reducing the range of situations where systems designers would consider data
distribution, or fragmentation, as a fundamental requirement for a new informa-
tion system. Such distributed database architectures, and the various issues associ-
ated with them, are discussed in Chapter 20.
Scale-up Speed-up
Linear speed
5 CPUs 10 CPUs
1 GByte database 2 Gbyte database 5 CPUs 10 CPUs 16 CPUs
There are three main architectural models for parallel database servers: shared
memory, shared disk and shared nothing.
The shared memory parallel database model is generally built upon a symmetrical
multiprocessing (SMP) architecture. This model consists of a host computer with
multiple processors (CPUs). All of the host’s memory, and each of its attached
disks, is accessible by each one of its processors. As both memory and disks are
shared, this architecture is also referred to as shared everything. A simple SMP shared
memory model is outlined in Figure 19.4.
Memory
Interconnect
CPU
CPU
CPU
CPU
CPU
CPU
Even though it is possible for an SMP system to share a single disk, a more usual
configuration would include a set of integrated disks. The main drawback with the
SMP model is that the CPUs are connected via a shared bus to the same pool of
memory. There is a limit to the number of processors that can be effectively accom-
modated with an acceptable level of memory bandwidth contention. The optimum
number of processors is generally between 4 and 16.
A number of recent advances in memory-to-CPU interconnection, such as
crossbar switching, as shown in Figure 19.5, and the introduction of large
processor cache memories, have reduced memory contention and raised the
number of processors that can usefully be accommodated.
(Figure 19.5: CPUs connected to memory through a crossbar switch rather than a single shared bus.)
A related shared memory design is Non-Uniform Memory Access (NUMA), in which
each processor has its own local memory but can also address the memory attached
to the other processors. Access time therefore differs according to whether
the processor is accessing its own local memory or making a remote access to the
local memory of another processor. When a processor requires remote memory,
effective cache coherency protocols are used to fetch the remote memory into the
processor’s local memory.
Performance in NUMA systems, as with SMP, is limited by memory bus band-
width. Contention increases as CPUs are added to the system, and beyond a certain
point, performance can start to decrease rather than increase. Such a diminishing
return violates both linear scale-up and speed-up. The exact point at which adding
CPUs to a system results in minimal (or negative) performance improvement varies
with application type and system architecture. It is generally recognized that most
typical shared memory systems do not scale well beyond 64 processors. Even
though there is little interprocessor communication overhead resulting from using
common memory structures, because of memory bus and disk I/O contention,
neither SMP nor NUMA shared memory architectures will ever be able to offer true
linear scalability beyond a limited number of processors.
Nevertheless, an advantage of these shared memory systems, from the perspec-
tive of database vendors and system architects, is that applications currently
running on single-processor-based hardware do not have to undergo major
rewrites in order to enjoy an incremental performance improvement by being
ported to shared memory systems.
Unlike the single node shared memory model, the shared disk model consists of
multiple processing nodes; each with its own dedicated memory, connected to a
set of common disks. Each node can either contain a single CPU or can implement
its own SMP or NUMA shared memory architecture.
A shared disk model can be built either on top of a massively parallel processing
(MPP) distributed memory substructure, or around a cluster of loosely coupled
cooperating nodes. In MPP architectures, processing nodes communicate by
sending messages to each other over a very fast interconnect network, whereas
nodes in a cluster communicate via message passing over a high-speed bus. In a
shared disk cluster configuration, each node is connected to a common set of data-
base disks via a high-speed data bus.
In an MPP shared disk architecture, each node is physically connected to its own
private disk, but executes a system-level virtual disk layer that enables access to the
complete set of database disks. Oracle uses the notion of disk affinity to give an
MPP configuration shared disk access across its database disks. Figure 19.7 gives an
example of a clustered SMP shared disk architecture, and Figure 19.8 outlines an
MPP shared disk example.
As memory is not shared amongst the nodes of a shared disk parallel database,
each node has to have its own instance of the database server process and a data
cache. Consequently, given that disks are shared, a lock manager is required to main-
tain cache consistency across all the nodes, and locks are used to ensure that each
node in the cluster continues to have the same view of the data. The additional
(Figures 19.7 and 19.8: clustered and MPP shared disk configurations – each numbered node has one or more CPUs and its own memory, and all nodes connect to a common set of database disks.)
overhead in maintaining such locks and ensuring data cache consistency results in a
negative impact on scalability and performance. Also, as the shared disk model uses
data shipping to move data from disk to node, I/O bandwidth acts as an additional
constraint upon achievable levels of scalability. However, as there is no memory
contention, the shared disk model generally scales better than a shared memory
model.
As with the shared disk model, a shared nothing parallel database can either be
built on top of an MPP environment, or can be constructed over a clustered set of
nodes. In either case, each of the nodes will be attached to a dedicated disk (or disk
pool), with each disk containing a discrete portion of the overall database. Figure
19.9 shows a high-level view of such architecture.
Each shared nothing node maintains a separate instance of the database process.
In contrast to the shared disk model, a shared nothing node can only directly
access the portion of database data contained in its own physically attached
disk(s). If the data to be processed by a particular node resides on a disk connected
to another node then it needs to send an appropriate processing request message to
that node.
The recipient node will then process the requested data operation and return the
result set to the requesting node. Such function shipping is costly and should be
minimized in order to maximize the scalability of the shared nothing model. Theo-
retically, because neither memory nor disk is shared, the shared nothing model
could provide linear scalability for an unlimited number of processors. Indeed,
shared nothing machines containing many hundreds of processors have already
been developed.
However, such a situation is highly dependent on the degree of parallelism
supported by the operating system and application software running over such
architecture, and the effectiveness with which the database is distributed across
the nodes. If an operation performed at a particular node has local access to all the
data it requires then it will perform well.
If it does not, a function-shipping request will need to be passed to the node that
does possess the data. This can result in node hot spots where the same data is
frequently requested, and this in turn can reduce scalability because of the
resulting disk I/O and processor bottlenecks.
There are four different types of parallelism that can be applied to database applica-
tions (Table 19.1).
Consider a query that restricts both the Players and the Contracts tables and then
joins the results (Figure 19.12). Because the two restrict operations work on different
tables, they can execute at the same time on separate nodes. The same is true of the
two project operations. Both pairs of operations can there-
fore benefit from intra-query parallelism by executing on separate nodes. This type
of intra-query parallelism is known as independent parallelism.
It should be noted that the relationship between each of the two restricts, and
their corresponding projects, is that of a producer and a consumer. The project
operation takes as input the output of the restrict operation.
As there is no requirement for each restrict’s output to have been completed prior
to it being input to the corresponding project operation, it is possible for the two
projects to be performed in parallel with the two restricts, consuming each
restricted tuple as it is produced. This allows for a type of intra-query parallelism
known as pipelining. Figure 19.12 shows a high-level overview of both types of
intra-query parallelism.
(Figure 19.12: restricts on the Players and Contracts tables (table scans) feed projects and a sort-merge join for the predicate Players.club = 'Beauty United', illustrating intra-operator and intra-query parallelism.)
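As a rough illustration of how such a query might be handed to a parallel server, Oracle lets the optimizer be given a degree of parallelism with the PARALLEL hint. This is only a sketch; the degree of 4 and the join column are assumptions, while the table and other column names follow the UFL examples.

-- Ask Oracle to scan and join Players and Contracts in parallel
SELECT /*+ PARALLEL(p, 4) PARALLEL(c, 4) */
       p.surname, p.forename
FROM   players p, contracts c
WHERE  c.ufa_id = p.ufa_id
AND    p.club = 'Beauty United';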
Range partitioning involves splitting a data relation and assigning each row to a
particular disk on the basis of the value of one or more of its attributes. The Matches
table could be split according to which season the match occurred, with a separate
disk partition created for all the matches associated with a particular season. Such a
technique would enable all of a season’s match data to reside at a single node. This
would allow queries involving the Matches table to be processed in parallel
according to season. Figure 19.14 shows the Oracle SQL (DDL) statement required
to split the Matches table across four disk partitions.
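The statement in Figure 19.14 is not reproduced here, but a sketch of the kind of DDL involved is shown below; the column definitions, season boundaries and tablespace names are illustrative assumptions rather than the book's exact schema.

CREATE TABLE matches (
  match_id   NUMBER       PRIMARY KEY,
  season     NUMBER(4)    NOT NULL,
  match_date DATE,
  home_club  VARCHAR2(40),
  away_club  VARCHAR2(40)
)
PARTITION BY RANGE (season) (
  PARTITION season_2000 VALUES LESS THAN (2001)     TABLESPACE ts_s2000,
  PARTITION season_2001 VALUES LESS THAN (2002)     TABLESPACE ts_s2001,
  PARTITION season_2002 VALUES LESS THAN (2003)     TABLESPACE ts_s2002,
  PARTITION season_2003 VALUES LESS THAN (MAXVALUE) TABLESPACE ts_s2003
);

Each tablespace can then be placed on a separate disk so that a season's matches reside together.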
Range partitioning is particularly useful when there are distinct logical ranges of
data that require storing together and where the resulting partitions are likely to be
processed in parallel. In order to achieve maximum intra-operation parallelism, it
is essential that data is evenly distributed across all partitions.
Round robin partitioning involves evenly splitting a data relation between a fixed
number of partitions, regardless of attribute values. Therefore, if there are n
partitions, the relation's ith tuple is allocated to partition (i mod n). Given four
partitions numbered 0 to 3, the tenth tuple of a relation would always be placed in
partition 2. However, the round robin technique is only optimal in situations where
entire relations require processing and intra-operation parallelism can be maximized.
In situations where small subsets of a relation's tuples are required, as frequently
occurs when tuples are accessed associatively on the basis of particular attribute
values, it would usually be more efficient to partition according to the possible
values of the operation's predicate attribute.
Oracle does not support round robin partitioning.
If an operation is split across too many nodes, the operation initiation time becomes a significant fraction of the operation's total
execution time.
If it takes 0.1 seconds to initiate an operation request at a node, and one second
for a node to process 100 tuples, one node (i.e. no parallelism) would take 100.1
seconds to process 10,000 tuples. Ten nodes processing 1000 tuples each would
take 11 seconds ((0.1 × 10) + 10). One hundred nodes processing 100 tuples each
would also take 11 seconds ((0.1 × 100) + 1). One thousand nodes processing 10
tuples each would take 100.1 seconds ((0.1 × 1000) + 0.1). In reality, the cost of
using a large number of nodes is even higher than this as it would include the time
taken to collate the results of each of the separate sub-tasks.
Parallel architectures – using more than one CPU to access a single physical database.
19.7 Î EXERCISES
(i) Investigate the similarities and differences between concurrency control
schemes for parallel databases and distributed databases.
(ii) Investigate the current application of parallelism to object databases.
CHAPTER 20
DISTRIBUTED DATABASES
Good sense is of all things in the world the most equally distributed, for everyone thinks
himself so abundantly provided with it
René Descartes, Discourse on Method, part I
(Figure 20.1: a distributed database built from the datasets at nodes A, B and C, together with a copy of A held elsewhere.)
If an organization's groups of users were spread across a wide geographical area, this might
once have been a reason for designing data distribution into a new development.
The development of fast, secure networks has undermined this rationale and very
few distributed databases are designed from the ground up.
Keeping live copies of the data at different sites is another important and current
motivation for the use of distributed databases. This has implications for main-
taining high availability of the data and for disaster recovery.
Figure 20.1 shows a typical distributed database. It consists of a partial view of each of
the participants’ datasets. The individual nodes have their own local users who
issue local queries. Other users connect to the distributed database. They are global
users who issue global queries. The diagram says nothing about the kinds of data-
base servers contributing to the distributed view or how much each node contrib-
utes to that view.
A model of the complexity of a distributed database will give an insight into the
difficulty or otherwise of providing a satisfactory systems solution to any proposi-
tion. Three main factors have to be taken into account: autonomy, distribution
and heterogeneity (Figure 20.2).
Autonomy measures the degree to which individual nodes are susceptible to
outside control. Clearly, in an array of database servers, each one has its own transac-
tion and recovery mechanism. In a situation where there are local users submitting
queries in addition to remote queries against the allowed portion of the schema, it is
unlikely that there could be any relaxation or delegation of these controls.
Site autonomy means that each server participating in a distributed database is
administered independently from all other databases. Although several databases
can work together, each database is a separate repository of data that is managed
individually.
(Figure 20.2: the three dimensions of distributed database complexity – autonomy, heterogeneity and distribution.)
Autonomy represents the level of control that an individual database site has
over the data that it holds. It is therefore only of relevance to sites that have local as
well as global users. If all users of a distributed database are global users of its collec-
tive data then there is no local autonomy.
Heterogeneity is a measure not only of the variety of different vendors’ prod-
ucts, but also of the presence of different data models and query languages in the
distributed database. The more diverse this factor becomes, the more problematic
is the acquisition of a unified view and the provision of a single query mechanism.
Most distributed database implementations, especially those necessitated by busi-
ness acquisition or serving industry associations, will have high heterogeneity
ratings.
Distribution measures the degree to which distributed queries will need to take
data from more than one site. In a situation where a centralized database is repli-
cated across a series of other nodes, each node is capable of completing any query.
Provided updates, insertions and deletions are propagated to each replica, they
continue to offer a low distribution rating and offer high availability. If the distrib-
uted view has tables spread across many sites or if tables from many sites are
required to be joined in most queries, the distribution rating will be high.
However, the three axes of this model give a slightly misleading view. First, the
three dimensions are not of equal importance in assessing complexity; nor can
they be precisely measured. Further, they are not as orthogonal (mutually inde-
pendent) as the diagram makes them appear.
Perhaps a more realistic model is the taxonomy outlined in Figure 20.3, which
classifies distributed databases according to their architecture.
Database heterogeneity is considered the major dimension when classifying
distributed database architectures. Such heterogeneity is usually assumed to indi-
cate different DBMSs (such as Oracle and SQL Server) with a similar data model
(e.g. relational). However, it could also mean different operating systems, different
data models or even different versions of the same DBMS.
(Figure 20.3: a taxonomy of distributed databases, classified as homogeneous or heterogeneous, unfederated or federated, and single or multi-database.)
20.1.2 TRANSPARENCIES
(Figure 20.4: a transformation mechanism mediates between the user and the distributed database.)
The overall transparent view can be described in terms of eight types of transpar-
ency that can be linked in a dependency hierarchy. Figure 20.5 shows these
dependencies. From the diagram, it can be seen, for example, that scalability trans-
parency is dependent on migration and replication transparencies, which, in turn,
are dependent on access and location transparencies.
Access transparency means that a mechanism is provided whereby the user
accesses a remote resource in much the same way as a local resource. Mapping a
local drive letter in Windows to a subdirectory of a network file server is an
example of this. The actual mechanism is hidden: the remote resource is not actu-
ally a drive but the head of a directory tree below the remote drive root. Neverthe-
less, Windows Explorer presents exactly the same file manipulation operations for
the mapped drive as for a local drive: directory listings, file copies, cut and paste
and so on are all done in the same way.
This same drive mapping is also an example of location transparency. The
precise location of the remote subdirectory in the map is hidden. If the network
administrator had provided the subdirectory on the remote machine as a link to
yet another machine, the user accessing files on the mapped drive would be unaware
of the complex chain of network operations required to open a file for editing.
(Figure 20.5: the dependency hierarchy of transparencies, with scalability, performance and failure transparency at the highest level.)
editing.
Concurrency transparency means that if remote resources are accessed, and
possibly updated, by more than one user at the same time then each user process
will be unaware of any interference from the others. Transactions in a local data-
base have their isolation maintained through a lock manager, but extensions to
this are required for a distributed transaction. Remote print queues, established so
that users may share a high-capacity printing resource, are another instance where
user manipulation of their own jobs in the queue needs segregation. The detailed
process by which this is achieved is hidden from public view.
When a remote resource changes location, the change is hidden from the user by
migration transparency. The test for this is whether the same SQL statement works
before and after the location change, whether the same drive map needs no amend-
ment after the change and so on. Certainly, the DBA or the network administrator
will be aware of any such changes and may have to make appropriate alterations.
The transparency of the location change is, however, a matter for the user view and
if this is preserved, the transparency is maintained.
Network file servers may be mirrored or datasets in databases may be replicated
to provide assured availability or better performance. Replication transparency
means that the user is unaware of which precise copy is being used as a result of an
access request. The selection of a particular copy may be made based on optimized
response or availability, but that selection is made without the knowledge of the
user. In fact, the user should not even be aware of the existence of any copies of the
requested resource. Whether there are two or twenty replicas available, the
requesting process accesses them in the same way as a local resource (access trans-
parency) and is unaware of which host is meeting the request (location
transparency).
At the highest level of abstraction in the transparency diagram are three related
transparencies. Scalability transparency attempts to ensure that users are kept
unaware of any expansion in the scale of the distributed system.
Each of the participating databases will have its own data dictionary kept inside the
databases. This data dictionary is crucial for maintaining the storage structures and
the structural and behavioural rules applied to the data, and for supporting queries
from existing local users who may or may not be destined to issue distributed
queries. The data dictionary cannot be used directly as it contains all the local
definitions.
However, a flexible mechanism for superimposing a different way of looking at
the data was described in Chapter 9: the database view. Database views can be
constructed to offer maps of internal storage names and common naming conven-
tions that may be required for the distributed queries.
At the simplest level, if one database has an employee table with a surname
attribute and another has a payroll table with a familyname attribute, these differ-
ences can be resolved by views. It may even be possible to construct a view within,
say, a Codasyl database, that bridges its normal navigational query language with
SQL. Even if it is not possible to do this with the Codasyl database manager, it
would surely be possible to build a relatively simple software bridge.
Reference architectures for databases were proposed as long ago as 1972. The
American National Standards Institute established the Standards Planning and
Requirements Committee (SPARC), whose three-level schema model underlies the
reference architecture for distributed databases described here.
(Figure 20.6: the distributed database reference architecture, with a Global Conceptual Schema (GCS) built above the local schemas.)
Each contributing database has its own Local Internal Schema (LIS) related
directly to the stored data. A conceptual (or virtual) schema can be constructed at
each site using views or external software as appropriate. This is known as the Local
Conceptual Schema (LCS); conceptual because it does not refer directly to stored
data.
A coordinating host then collects the local conceptual schemas together into a
Global Conceptual Schema (GCS). The GCS is the union of the local conceptual
schemas. At this stage, the preparations for access transparency are well under way,
since the local conceptual schema should conform to a common data model and
hence a common query mechanism. The GCS has knowledge of the position of
each part of the schema, but not necessarily the way in which access transparency
was achieved in the individual site mappings between LIS and LCS.
1 This work led on to the establishment of the SQL standards for relational databases, culminating in SQL:1999 (also known as SQL3).
A subset of the GCS, the Export Schema (ES), can then be exported to serve the
needs of each group of users at the remote query hosts. This subset can vary
between just one table and the whole GCS depending on user query requirements.
Each of these export schemas is thus configured and customized to serve its user
population. The export schemas do not contain location details beyond the loca-
tion of the GCS, so this architecture supports a high degree of both access and loca-
tion transparency.
Consider a simple example: two companies within a group independently main-
tain relational databases. One company is a manufacturer; the other manages
private health care.
The Manufacturer database has an Employee table with attributes as shown in
Figure 20.7. The HealthCare database, on another site, has a Doctor table with
different attributes. These two tables can be made to appear congruent using
views. The differences in attribute names are masked by allocating pseudonyms in
the view. If there were also differences in attribute size or format, these could also
be hidden with SQL functions in the view creation (Figure 20.8).
A central coordinator has access to these views granted to it and combines them
into one view with a union operation (Figure 20.9).
Access to the global payroll view is granted to the headquarters site and the coor-
dinator location is further masked by the use of a synonym (Figure 20.10).
(Figure: the reference architecture in operation – export schemas (ES) at remote query hosts X, Y and Z provide location transparency for their queries; the Coordinator holds the GCS; participating sites A, B and C map their local internal schemas (LIS) to local conceptual schemas (LCS), providing access transparency, with a copy of A's data replicated elsewhere.)
Users at headquarters can now issue a query against the local synonym without
any necessary knowledge of the location of the data sources or the nature of the
database types (Figure 20.11).
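The flavour of the SQL behind Figures 20.8 to 20.11 can be sketched as follows. The table, column, link and synonym names here are assumptions standing in for the book's own definitions.

-- At the Manufacturer site: mask local names behind a view
CREATE VIEW payroll AS
  SELECT emp_id AS id, surname AS name
  FROM   employee;

-- At the HealthCare site: present the Doctor table in the same shape
CREATE VIEW payroll AS
  SELECT doctor_no AS id, familyname AS name
  FROM   doctor;

-- At the Coordinator: union the two local views over database links
CREATE VIEW global_payroll AS
  SELECT id, name FROM payroll@manufacturer_link
  UNION
  SELECT id, name FROM payroll@healthcare_link;

-- At headquarters: hide the Coordinator's location behind a synonym
CREATE SYNONYM payroll FOR global_payroll@coordinator_link;

-- A global user simply queries the synonym
SELECT id, name FROM payroll ORDER BY name;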
Because both of the example databases are relational, standard SQL is enough to
harmonize the differences in the local data dictionaries. There is no requirement
for both databases to come from the same vendor. If one of them were a Codasyl
database then more complex local processing might be necessary to produce the
desired effect. This might involve an additional purpose-built software layer for the
local conceptual schema to convert from a navigational to a relational view of the
data. Most vendors of modern network and hierarchical database software provide
a form of SQL that reduces the need for such specialist software.
Finally, in practical terms, the coordinator host and each remote query host need
a method of implementing the [host].object notation. This will generally involve a
client–server connection. The ES host will assume the role of client to the GCS host
for the schema to be exported and for queries to be submitted (Figure 20.12).
The coordinator will assume the role of client to the individual node servers for
the construction of the global conceptual schema. Where the database servers are
all Oracle installations this is relatively easily achieved through a database link. If
the nodes are relational and support SQL:1999 then communication can be
achieved through ODBC or JDBC since the databases will have common Call Level
Interfaces (see Sections 12.5 and 12.6). Special gateway products are available to
provide the connection in more complex situations.
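In the all-Oracle case, the connection is made with a database link; the sketch below uses invented credentials and an invented Oracle Net service name.

-- On the Coordinator: create a link to the HealthCare node
CREATE DATABASE LINK healthcare_link
  CONNECT TO hc_reader IDENTIFIED BY hc_password
  USING 'HEALTHCARE';   -- an Oracle Net service name

-- Remote objects are then referenced with the @link notation
SELECT id, name FROM payroll@healthcare_link;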
A global conceptual schema brings together a unified view of the distributed data.
Its principal functions are to hide precise locational detail from the export schema
and to resolve the data into the appearance of single entity sets. In a situation where
one node contains a complete entity set, this task is simplified and the GCS need
do no more than create a synonym for the view represented in a single local
conceptual schema.
(Figure: Manufacturer.payroll (view) contains the ID/name rows DF1547 Jones and BK1675 Brown; HealthCare.payroll (view) contains 234876 Morecombe and 456987 Wise; the union view at the Coordinator contains all four rows.)
However, when the individual local entity sets provide only a part of the desired view,
the data is said to be fragmented. In the case where each local conceptual schema
supplies only some of the rows of the view, the data is horizontally fragmented.
Horizontal fragmentation can be resolved through a union in the global concep-
tual schema. This is demonstrated in Figure 20.14. The local conceptual schemas
must be union compatible: they must have the same attribute names and types.
When each of the local conceptual schemas provides only some of the attributes
for the desired view, the data is vertically fragmented. A third company in the
distribution example provides in-house training for the group.
It maintains a database that contains the courses table along with appropriate
attributes. The Training Company’s DBA creates a view for this table and makes it
available to the coordinator (Figure 20.15).
A view of the training records for the HealthCare doctors can now be constructed
in the GCS host by means of a join operation between two remote sites (Figure
20.16).
Vertical fragmentation is always resolved through a join. Creating the
medicaltraining view merely saves the view definition in the Coordinator’s data
dictionary. Should a query involving this view be received, it will have to be mate-
rialized. This can only happen at one site, since ultimately the join takes place in
file and buffer operations. The join can take place at the Training site, the
HealthCare site or at the Coordinator site. Whichever one is chosen is a matter for
the distributed query optimizer (Figure 20.17).
Figure 20.15 (a) Local internal schema; (b) Local conceptual schema.
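A sketch of such a join view, reusing the illustrative link names from above and assuming column names like those shown for the groupcourses view, might be:

-- At the Coordinator: resolve vertical fragmentation with a
-- join between two remote sites
CREATE VIEW medicaltraining AS
  SELECT p.id, p.name, c.course_id, c.course_date, c.cost
  FROM   payroll@healthcare_link    p,
         groupcourses@training_link c
  WHERE  c.delegate_id = p.id;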
The most complex case occurs when there is both horizontal and vertical fragmen-
tation. This is called hybrid fragmentation and must be resolved in the global concep-
tual schema by the use of both union and join views.
All of the schema solutions so far have involved unmaterialized views. Only the
view definition is kept in the GCS. It is, of course, possible to keep a materialized
view in the coordinator’s database. However, even if this is worth doing for perfor-
mance reasons, care must be taken to ensure that the view does not go out of date
with respect to its base views or tables.
When a distributed query is issued from an export schema host, it will be directed
to the Coordinator host. The database there will be responsible for materializing the
views taken from the various local conceptual schemas, correlating the result and
passing it back to the originator of the query (Figure 20.18).
Taking into account the fragmentation effects in a distributed database and
recording the allocation of fragments to participants leads to a redrawing of the
ANSI/SPARC three-level model into a five-level diagram where the role of the
central coordinator in providing location transparency is made more clear (Figure
20.19).
(Figure: Coordinator.payroll (union view) – ID/name rows DF1547 Jones, BK1675 Brown, 234876 Morecombe, 456987 Wise. Training.groupcourses (view) – course_ID/delegate_ID rows AC34/234876, AD45/456987, MF23/234876, MX34/234876, FD25/DF1547, FD25/BK1675; the date and cost columns are elided.)
A distributed query needs to have the same checks and operations applied to it as a
local query. The differences are mainly to do with where those checks and opera-
tions happen. Figure 7.9 shows what happens to a local SQL query to transform it
from the high-level SQL language into a series of file reads or writes, buffer sorts or
sort merges.
In principle, distributed query processing is very similar (Figure 20.20). When
the user query is issued at the export schema site it can only be checked syntacti-
cally and semantically against the export schema. When this is done, the query is
passed to the Coordinator site.
Here, the query can be checked semantically against the global conceptual
schema. The query is then passed for optimization where a distributed execution
plan is devised. Remember, the Coordinator has the precise stored data details hidden
from it because access transparency is provided at the local sites.
(Figure 20.19: the five-level schema model – export schemas ES 1 to ES n sit above the global conceptual schema, beneath which fragmentation and allocation schemas map down to the local schemas.)
The Coordinator does have location information and it has information about view objects. In effect,
the distributed query is decomposed into a number of sub-queries, optimized for
locational efficiency, which can be satisfied by each local site. There may be some
processing of intermediate results by the Coordinator and the final result will
certainly be correlated here and passed back to the user. Figure 20.20 shows a
diagram of a general scheme for processing distributed queries.
When the queries are received at participating sites, they are subject to all the
normal processes of a local query. In fact, because of autonomy considerations, the
local site does not distinguish between a query from a local user and one from the
Coordinator. The received query is subject to further syntactic and semantic
checking and optimized to produce a local execution plan. This is put into effect
through the recovery manager, which maintains the transactional logging and
locking systems and the query executive, which performs the necessary file and
buffer operations to satisfy the query.
The results of the local query are passed back to the Coordinator. Only the Coordi-
nator is aware that it is an intermediate result in the global query. The local site has
no knowledge of the global query, only the local query that it has completed.
(Figure 20.20: the route of a distributed query – the user's server checks it against the export schema; the Coordinator's semantic checker and global query optimiser work against the global conceptual schema; each participating server's local query processor, semantic checker and optimiser, recovery manager (with its system logs and journals) and query executive then perform the file and buffer operations against the local conceptual and internal schemas.)
Each
local site has information only about its own local schemas. It has no knowledge of
the other local sites in the distributed database. Only if it also receives a copy of an
export schema and thus becomes both a local site and a remote query site can it
have an inkling of remote data beyond its realm. In this case, it would still only be
aware of the Coordinator as a source of that external data and the detailed location
information would continue to be hidden from it.
Access and location transparencies in the distributed database keep every node
on a need-to-know basis.
The costs of a distributed join query such as the medicaltraining example are the sum
of the two local queries, the network transfers of their results to the Coordinator, and
the join performed at the Coordinator site.
There are at least two alternatives to this strategy (Figure 20.22). The materialized
payroll view could be transferred to the Training site, the join performed there and
the results transferred back to the Coordinator.
Notice that although the transfer of the payroll rows from HealthCare to Training
seems to be done in one statement, it will involve an implicit temporary table on
the Coordinator since HealthCare is unaware of Training’s existence and cannot
perform the transfer directly. A third strategy would be to transfer the courses rows
to HealthCare and to perform the join there.
As far as the global query is concerned, the most efficient plan is the one that
restricts network traffic the most. If payroll is small compared to courses then the
second strategy might be best. The transfer of payroll to Training would have little
network cost and the join result would be small because only a restricted number of
rows from courses would match the transferred payroll rows.
No rational decision about this can be made without some information about the
relative sizes of the materialization of payroll and courses. In Section 7.3, the analyze
command was demonstrated as a means of collecting table statistics which the
local optimizer uses. The Coordinator could, from time to time, collect similar
statistics by materializing the views in the GCS and analyzing them.
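A sketch of how that might be done, assuming the illustrative global_payroll view introduced earlier:

-- Materialize the GCS view into a scratch table, then analyze it
CREATE TABLE payroll_snapshot AS
  SELECT * FROM global_payroll;

ANALYZE TABLE payroll_snapshot COMPUTE STATISTICS;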
Once the global query has been decomposed and the local queries dispatched to
the local nodes, the local query optimizer assumes responsibility for the final trans-
formation of the received query into optimum file and buffer operations. Not only
are statistics taken into account, but also information about table storage structure,
indexes and so on.
The transaction mechanism for assuring the data security of a centralized database
was discussed at length in Chapter 8. The properties of atomicity, consistency,
isolation and durability are just as important for a distributed database.
A database must guarantee that all statements in a transaction, distributed or
non-distributed, either commit or roll back as a unit. The effects of an ongoing
transaction should be invisible to all other transactions at all nodes; this transpar-
ency should be true for transactions that include any type of operation, including
queries, updates, insertions or deletions.
In a distributed database, transaction control must be coordinated with the same
characteristics over a network to maintain data consistency, even if a network or
system failure occurs.
The particular problem for distributed transactions is that the global query is
decomposed into a serialized set of local transactions that are submitted to autono-
mous local servers. If one of these should fail for any reason and the others succeed,
the distributed view will be compromised. Each local server will maintain local
consistency through its transaction mechanism, but there must be some form of
transaction control exercised through the Coordinator.
(Figure: two-phase commit state transitions – the coordinator moves from Initial through Wait to Global Abort or Global Commit, while a participating site moves from Initial through Ready to Abort or Commit.)
In the first, voting, phase the coordinator asks every participating server to prepare
its local part of the transaction, and each server votes either to commit or to abort. If
every vote is for commit, the coordinator issues a global commit command in the
second phase, and each participating server then
commits its local transaction and acknowledges the command to the voting
coordinator.
If there is so much as one vote against a global commit then the voting coordi-
nator issues a global abort command. Servers that were in a pre-commit state
perform a rollback, servers that were in a failed state move to the abort state. In
each case an acknowledgement is again sent back to the coordinator.
The two-phase commit mechanism attempts to guarantee that either all partici-
pating servers in a distributed transaction commit or all roll back. A two-phase
commit mechanism also protects implicit local DML operations performed by
integrity constraints or triggers.
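In Oracle, the protocol is engaged automatically whenever a single transaction changes data at more than one site through database links; the statements below are only a sketch using the illustrative links, tables and values from earlier in the chapter.

-- One transaction spanning two sites
UPDATE payroll@healthcare_link
SET    name = 'Morecombe-Wise'
WHERE  id   = '456987';

INSERT INTO groupcourses@training_link
  (course_id, delegate_id, course_date, cost)
VALUES ('FD26', '456987', SYSDATE, 150);

COMMIT;  -- Oracle coordinates the two-phase commit across both sites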
Most mainstream database packages implement 2PC. There are many articles in
the research literature that propose the strengthening of the 2PC mechanism to
take into account its perceived limitations, such as failure of the voting coordi-
nator during either of the phases or failure of a participant after voting. Any exten-
sions to 2PC, such as those that allow for compensating (reverse) transactions to be
issued if atomicity were compromised or for the election of a replacement voting
coordinator, would have to be included in specially constructed software.
(Figure: distributed concurrency control divided into pessimistic approaches – centralised or decentralised, with basic or conservative timestamping – and optimistic approaches.)
Optimistic control does not produce wait states or mutual exclusion. Therefore
deadlocks will not arise. Not having to maintain and check lock acquisition
reduces the transaction manager’s overhead. This favours throughput, but may
result in a large number of restarts if its use is not matched to the type of processing
in the distributed database (Table 20.1).
In pessimistic locking schemes, a central lock manager (if, indeed, there is one) maintains a lock table on global schema objects. Local
schema objects are under the control of the local database.
If the distributed database has only one repository for the lock table and only one
lock manager then this is a centralized system.
There are two ways of distributing the work of a centralized lock manager. The
Primary Copy method identifies a primary site for each item in the GCS. Each lock
request is made to that site. Several sites may thus assume responsibility for the
management of all of the global schema objects. Each one is responsible not only
for a particular schema object, but also for any replicas of that object that may
exist. This reduces the susceptibility of the overall distributed database to a single
point of failure in the locking system, but there is still a single point of failure, and
perhaps a bottleneck, for each schema object.
A distributed locking system copies the entire locking information to every node
that originates global queries. Each node is capable of updating this information
either to grant lock acquisition or to reclaim relinquished locks. However, permis-
sion to update the locks is restricted to the current holder of a token or baton which
circulates among the nodes.
Changes made by the baton-holder are broadcast as update messages before the
baton is passed on. Lock requests or releases are made at the node originating the
transaction and thus involve no network traffic, but are delayed until the node
receives the baton. The total number of messages is reduced, but response time is
increased because transactions are held at each node until the baton arrives. Simi-
larly, the baton cannot be passed on until acknowledgements of the broadcast
changes have been received.
Distributed timestamps and multi-versioning follow similar lines to their
centralized cousins. One major difficulty is obtaining accurate timestamps across
the network where each CPU's clock is drifting at a different rate. One proposed
solution to this problem is a ticket server that would issue sequentially ordered
tickets to transactions as they started. Transaction ordering in the validation phase
would then be based on ticket ordering.
20.6 Î REPLICATION
Replication in a distributed database is typically implemented by distributing
materialized views built on base tables that belong to individual local sites. Given
the autonomy of local sites within the distributed database, it might be desirable
that these tables remain directly accessible for local transactions. If the base tables
are updated directly, the local materialized view will be inconsistent with its base
tables, as will the replicas of it that are distributed around the nodes of the distrib-
uted database. Such autonomous local updates will have been made outside of any
global query.
The only solution is that autonomy must be reduced to disallow direct updates to
the base tables of replicated materialized views. The introduction of replication
into a distributed database has implications for the autonomy of local sites.
Conversely, if local sites must remain fully autonomous then replication of data
from those sites becomes problematic.
Replication within a distributed database also has implications for distributed
query optimization, for transaction control and for concurrency control.
Because of the replicas in the distributed database, the transaction has survived a
catastrophic network event. All of this has been hidden from the user issuing the
original global query. Failure transparency is enhanced when replication is present
at a cost of rather complicated procedures. There are several variations of this
simplified account of quorum voting protocols in the research literature. These are
concerned with multiple network partitions, identification of primary partitions,
failures of voters or coordinators.
The Utopian Professional Players Association (UPPA) is the trade union for players
in the UFL. It has a small budget and cannot afford the same kind of expensive
computing and software systems that the UFL has invested in.
Surprisingly, the UFL has agreed that it can have limited access to the main
Oracle system for taking players’ personal details into its membership system. The
membership system runs in Microsoft Access and one of its functions is to record
the monthly membership subscription paid by each of the players.
Access has a way of making the connection to Oracle based on the ODBC stan-
dard. Before the Access program can make the connection, the local ODBC driver
manager must have a data source configured for its use.
In Windows XP this is done through the Administrative Tools folder of the
Control Panel. Figure 20.26 shows the Data Source configurator screen. An Oracle
ODBC driver is added to the driver manager when Oracle is installed. The driver
configuration specifies which Oracle instance the data source will use and its
ODBC name. Fill in the form as shown, with suitable amendments, and click on
Test Connection until the success message appears.
After clicking on OK, the new data source, UFL, is listed in the Machine Data
Source tab (Figure 20.27).
This data source can be used to connect to the UFL database from any ODBC-
compliant program. As well as MS Access this includes Excel, PowerPoint or Word
and any user program, written in (say) C++ with the ODBC API.
Launch the MS Access program and create a blank database called ufltest. Select
File |Get External Data|Link Table from the Access main menu. The file selector
dialog appears. Select ODBC Databases () from the Files of Type drop down. The DSN
selector appears.
Select the UFL data source and supply the password for the Oracle user name
configured in it (Figure 20.28).
The tables in the user schema are displayed. Choose the Players table and check
the Save Password box. Click on OK.
The Oracle table is now installed in Access as an updatable view called
A27WS_PLAYERS (Figure 20.29). It is updatable because the password was saved
with the view definition and Access is free to connect with the data source to
synchronize data whenever the view changes. Opening the view in Access will also
bring the latest data into the view.
If the password were not saved with the view definition, a read-only view would
be created and a snapshot of the data would be taken. Any changes in the data on
the Oracle side would need a new snapshot to be taken manually.
A simple Access application that uses this table view can now be constructed. The
members table has just three fields. Figure 20.30 gives the details.
The query shown in Figure 20.31 appends rows into members. It takes the ufa_id
from each player and today’s date as the date of joining. Because the membership_id
is a primary key and is an autonumber type, it is automatically incremented for
each row inserted. Access calls this an append query. It is stored in the database as a
query named insert members.
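In SQL terms, the append query amounts to something like the following (Access SQL; the date_joined column name is an assumption, and membership_id is omitted because Access supplies the autonumber):

INSERT INTO members ( ufa_id, date_joined )
SELECT ufa_id, Date()
FROM   A27WS_PLAYERS;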
A join view can now be created between members and A27WS_PLAYERS. The join
is made on the common field, ufa_id. This is a select query and is stored as members
and players (Figure 20.32).
The subscriptions table has a simple structure, shown in Figure 20.33. This table
can be populated with the append query insert subs, shown in Figure 20.34.
The members form can now be created using the forms wizard (Figure 20.35). The
main form is based on the members and players view and contains a sub-form based
on subscriptions. The form and its sub-form are linked in a master/detail relation-
ship, based on membership_id. The Access database, ufltest.mdb, is available on the
book’s web site. The datalink to Oracle will need localizing.
The data link to the Players table is providing all of the transparencies described
earlier in the chapter. Once the link is configured, it behaves just like a local table
with no special processing or knowledge about location required. Any changes to
players’ data are recorded in the Oracle base table. Changes made directly to the
table show up in the view. Concurrent access is provided by normal lock manage-
ment on the Oracle side.
20.9 Î EXERCISES
(i) Research the various proposals in the literature for centralized and distributed
deadlock detection and resolution. Write a simulation of deadlock detection in
a language of your choice.
(ii) As network speeds and reliability increase, any benefits of distributing data for
performance improvement have long since disappeared. Discuss this state-
ment in relation to what you can find in the research literature and in reliable
sources on the Internet.
CHAPTER 21
TEXT AND DOCUMENT MANAGEMENT
Text load and indexing – recognize file format and convert to a common format for presentation; establish the word string pattern; word location analysis (inversion); suffix stemming.
Text analysis – document signatures; automated thesaurus.
Issue queries – adaptive query feedback.
The problems with text are both syntactic and semantic. Text is written in human
languages and humans are still generally better at interpreting and understanding
language than computers. The syntactic problem relates to structural uncertainty;
the semantic problem relates to the meaning of words and their contexts.
English has a grammar that sometimes seems defined more by the exceptions
than by the rules of usage. The structure of a language can sometimes
be defined only in the simplest terms. English language consists of words, which
make up sentences. One or more sentences make paragraphs and one or more para-
graphs make up documents. Sentences should contain a verb but not always. There
should be a subject, but this is sometimes inferred. Subject: verb: object. That last
sentence passed Microsoft Word’s grammar check with flying colours! (Without
the colons, Word objects to it.)
Most information retrieval systems take words as the basic building block of their
databases. Different file formats from text processors or HTML from Internet
sources can be read directly and are usually converted into a common presenta-
tional format in order to display the context of searches.
Divining the meaning of words within sentences, paragraphs and documents is a
critical factor in ensuring that queries bring back results that are relevant to the
searcher’s purpose. Many documents, even formal reports and manuals, contain
colloquialisms, ambiguities, analogies and other figures of speech, which make them
difficult to interpret. Establishing semantic linkage between documents which are
about the same subject but do not necessarily contain the same words will enhance
the search for relevant, pertinent and timely information.
Dealing with the syntactic and semantic problems associated with language
takes place during the three phases of an information retrieval (IR) system’s
development.
A definition of a word is needed before bulk loading of documents can begin. The
simplest definition is a string of alphabetic characters separated by spaces or punc-
tuation. However, some words contain punctuation. It’s is a word and so is £50.00.
Three groups of characters can be defined within the English ASCII character set
(Table 21.2). The word concordable refers to the concordances long maintained and
created manually by librarians and which are, essentially, inversions of important
documents.
Word    Occurrence
CAT     1,1,1,2
MAT     1,1,1,6
ON      1,1,1,4
SAT     1,1,1,3
THE     1,1,1,1; 1,1,1,5

WORD
-------
THE
CAT
SAT
ON
THE
MAT

6 rows selected.
Figure 21.2 In the beginning was the word... and then the occurrence.
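The occurrence quadruples record where each word appears, and a simple query can replay the original word order. The sketch below assumes an occurrences table with document, paragraph, sentence and word-position columns; the column names are illustrative.

-- Reproduce document 1, paragraph 1, sentence 1 in its original order
SELECT word
FROM   occurrences
WHERE  doc_no  = 1
AND    para_no = 1
AND    sent_no = 1
ORDER  BY word_no;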
Small connecting words like if, and and but take up a large proportion of English
language text. Just 48 conjunctions, prepositions and other words contribute more
than 50% of any normal document. Filtering these words out with a stop list is
another good method for ensuring that the word list contains entries with a rich
information content.
Queries will be enhanced if distinct forms of words are not filed as different words
in the database. Regular verb tenses: play, playing, played; adjectives: national,
nationally, sufficient; adverbs: systematically, successfully; and plurals should all
be filed as occurrences of their root, which is generally a noun or a verb. Removal of
suffixes such as ally, ient, ness and fully is relatively easy in English. Even double
suffixes, such as in hopefulness, can be filtered out.
This results in a smaller word list and allows a semantic link to be established
between the variously suffixed word roots. Plurals, adjectives and adverbs gener-
ally describe similar concepts to their roots.
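A crude suffix stripper can be written in a few lines of PL/SQL. The sketch below is
illustrative only: the function name and the suffix list are assumptions, and a
production stemmer, such as Porter’s algorithm, handles far more cases.

create or replace function strip_suffix (p_word in varchar2)
  return varchar2
is
  type t_suffixes is varray(6) of varchar2(10);
  v_suffixes t_suffixes := t_suffixes('fulness', 'fully', 'ally', 'ness', 'ing', 's');
begin
  for i in 1 .. v_suffixes.count loop
    -- strip the first matching suffix, keeping at least a three-letter stem
    if p_word like '%' || v_suffixes(i)
       and length(p_word) - length(v_suffixes(i)) >= 3 then
      return substr(p_word, 1, length(p_word) - length(v_suffixes(i)));
    end if;
  end loop;
  return p_word;
end strip_suffix;
/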
Filtering prefixes would also reduce words to their root, but is generally not a
good idea nor so easily achieved. Prefixes are often negations of the root: helpful,
unhelpful, theist, atheist. Removing the prefix would mean that opposing
concepts were being joined in the database.
Text inversion, identification of words as patterns of character classes, stop-word
filtering and suffix stemming are the major algorithmic techniques that can be
applied to the syntactic analysis of documents during the loading phase. Other
techniques exist, but they suffer from the law of diminishing returns: the computing
and programming effort that must be employed does not yield corresponding
improvements in query efficiency.
Much can be discovered about the meaning and context of words by the company
they keep. The data in the occurrences table gives a number of possibilities for
measuring proximity (Table 21.3).
If a word occurs in the same document as another, they may have a semantic
connection. If the word analysis occurs in the same sentence as systems then it
probably has a different association than if it were in the same sentence as
inorganic.
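Continuing the earlier sketch of an occurrences table (an assumption, not the book’s
schema), sentence-level co-occurrence can be counted with a self-join; the two search
words are illustrative.

select count(*) as co_occurrences
from   occurrences o1, occurrences o2, words w1, words w2
where  o1.word_id = w1.word_id
and    o2.word_id = w2.word_id
and    w1.word = 'ANALYSIS'
and    w2.word = 'SYSTEMS'
and    o1.doc_no  = o2.doc_no
and    o1.para_no = o2.para_no
and    o1.sent_no = o2.sent_no;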
Early text retrieval systems developed a similar syntax for queries based on
logical connectors, illustrated in Figure 21.3. Although results based on binary
logical or proximity operators could be ranked in terms of the number of ‘hits’
returned, the query relied on the presence or absence of the actual search terms in
the document.
Recall is the proportion of the relevant documents in the collection that a query
actually retrieves; precision is the proportion of the retrieved documents that are
relevant. For example, if a collection contains 20 relevant documents and a query
returns 25 documents of which 15 are relevant, recall is 15/20 = 0.75 and precision
is 15/25 = 0.6. Used together, the two ratios express the filtering capacity of the
system. Recall and precision tend to be inversely related. Attempts to introduce
factors that increase recall may diminish precision, and vice versa.
Methods to increase recall usually rely on including documents that are relevant to
the concepts contained in the search terms, although those precise terms may not
actually appear in the document.
There are two principal ways of doing this. One is to process the database looking
for words that often appear together in documents. If this correspondence meets
certain statistical tests, then it may be possible to generate an automatic search
thesaurus of related terms. If the words appear to be about related concepts or
topics then perhaps when a search is made for one term its related terms will indi-
cate an additional set of documents which, although they do not contain the
actual search term, may be about the same topic.
Automatic thesaurus generation is a problematic task and involves highly
complex statistical balances in attempting to produce an objective, calculated rele-
vance factor. Ranking the returned documents on the strength of their relevance
connection to the original search is one way of limiting the loss of precision incurred
in favour of recall.
A second method involves the characterization of a document as containing
certain key concepts or themes. After making an initial query, using normal or
expanded word matches, the user marks certain retrieved documents as being
particularly relevant. The system attempts to characterize these choices using
complex statistics to represent the user feedback. A document signature is
produced which is then reapplied to the database to retrieve other documents with
similar signatures but which were not included in the original retrieval.
The extended query can then be saved and reapplied against an increasing docu-
ment collection. The objective is to extend recall without compromising precision.
Most evaluations of IR systems use recall and precision among their quantitative
tests (Table 21.4). The problem for such evaluations is finding an objective measure
of relevance and estimating the relevant documents that are not retrieved (the
misses) in a search.
Oracle Text is a package supplied with the Enterprise Edition. It provides special-
ized tools for managing document collections. Text content is integrated with
structured data about the documents and may form part of a general information
system for an organization. Applications can access text content through both
object and relational interfaces. Text retrieval applications can query and analyze
documents from local or Internet sources. Documents may originate in most
common formats.
Table 21.4 Common criteria for evaluating an IR system

Recall
Precision
User effort      Amount of time a user spends conducting a search; amount of time a
                 user spends negotiating his inquiry and then separating relevant
                 from irrelevant items
Response time
Benefits         Search costs; cost effectiveness; cost benefits
The text services require users to be granted the CTXAPP role, which enables
them to create indexes in the context format, create preferences that specify
default indexing and query behaviour, make text queries and use the Oracle Text
PL/SQL packages.
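Granting the role is a one-line operation from a suitably privileged account (the
user name here is illustrative).

grant ctxapp to ufl_user;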
Dynamic text data, where documents are regularly added to the database, requires
repeated index maintenance. This can be done manually by the user or automati-
cally by the CTXSYS administrator launching the ctxsrv server, which performs
index updates as documents are added.
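As an alternative to running ctxsrv continuously, the CTX_DDL package can
synchronize a context index on demand after a batch of documents has been
inserted. The index name below is an assumption, anticipating the index created
later in this section.

begin
  ctx_ddl.sync_index('contract_text_idx');
end;
/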
Documents keep their original formatting when they are stored in the table
column. It is only during the indexing process that the word recognition software
strips the formatting for the index entries.
A BLOB cannot be copied directly into a row using a standard DML insert state-
ment. The SQL*Loader utility is used to insert or append rows. This utility runs
from a DOS prompt. Figure 21.5 shows the command line. There are two parame-
ters: the user’s authentication and the name of the control file that contains
instructions for the loader program.
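A typical invocation looks something like the following; the account details are
illustrative.

sqlldr userid=scott/tiger control=loader1.txt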
The contents of the control file (loader1.txt) are shown in Figure 21.6. This indi-
cates a datafile (loader2.txt) which contains the structured row data and an
external file reference for the text data, the table to be loaded and the columns
represented in the datafile. The datafile itself contains one row of data, including
the name of a Word document containing the full contract details.
loader1.txt
LOAD DATA
INFILE 'loader2.txt'
INTO TABLE contract_details
FIELDS TERMINATED BY ','
(contract_id, ufa_id, club, contract_date, ext_fname FILLER CHAR(80),
contract_text LOBFILE(ext_fname) TERMINATED BY EOF)
loader2.txt
1,VB376512,Purity FC,21-Jul-00, C:\Docs\Contract.doc,
Large objects are not stored in the same way as other data. The contents of a large
object (LOB) are held outside the row; what is actually stored in the table column is
a reference called a LOB locator. The locator is used internally by Oracle to reference
the actual contents of the LOB.
Once the file has been copied into the database by the SQL*Loader program, the
contract_text column can be indexed in the same way as previously. The MS Word
format is detected automatically when a context index is created and formatting is
stripped from words in the index (Figure 21.7).
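The statement behind Figure 21.7 takes roughly the following form; the index name
is an assumption.

create index contract_text_idx
  on contract_details (contract_text)
  indextype is ctxsys.context;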
The system has defaults for the way in which context indexes are created. It
detects the column type and uses filtering for binary column types like long raw,
BLOB or BFILE. This avoids formatting characters appearing in the index. If English
is the language specified when the database was installed, a standard stop list is used
to eliminate unnecessary words from the index. Fuzzy and stemming queries are
enabled and supported by the index.
The contains function is used to search the text column; the function appears in the
where clause, specifying the search condition. Word matches are made by using a
context index on the column holding the loaded text.
The contains function takes three parameters: the column to be searched, the
search term and an optional numeric label (Figure 21.8).
UFA_ID SUBSTR(TEXT,1,50)
---------- --------------------------------------------------
VB376512 Lucius has a knee tendon injury. He is not expect
AK675280 Antonio has an Achilles tendon strain which is res
2 rows selected
The function returns a relevance score based on how many documents contain
the search term compared to the total number of documents in the collection.
Testing the returned value as greater than zero gives those rows holding the search
term (Figure 21.9).
ufa_id club
---------- ------------
VB376512 Purity FC
1 row selected.
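A query of the kind behind Figure 21.9 might look like this; the search term is
illustrative.

select ufa_id, club
from   contract_details
where  contains(contract_text, 'knee', 1) > 0;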
When the text is held in a LOB column, it cannot easily be returned directly in an
SQL query. The LOB could be holding as much as 4 GByte. The first row of
contract_details actually holds about 4 MByte.
The relevance score is associated with the label in the contains function and can
be determined using the score operator (Figure 21.10). Since there is only one docu-
ment in the contract_details table, any successful query will have a relevance of 100.
As more documents are added, the relevance score will become more meaningful
and can be used for ranking the returned rows.
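For example, the following ranks matching rows by relevance; the label 1 ties the
score operator to the contains call, and the search term is illustrative.

select score(1) as relevance, ufa_id, club
from   contract_details
where  contains(contract_text, 'injury', 1) > 0
order  by score(1) desc;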
Proximity searching is also enabled, using the near operator. This takes as a
parameter a list of search words or phrases and a specification of how much separa-
tion may occur between words in the list. If the order variable is set to TRUE, the
terms must occur in the order they appear in the list. Notice in Figure 21.11 that the
near clause is quoted within the contains operator. Max_span is set to 50, so the
contains function will return greater than zero for any row where the three search
terms occur within 50 words of each other.
1 row selected.
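A hedged reconstruction of the sort of query shown in Figure 21.11 follows; the
search terms are illustrative.

select ufa_id
from   contract_details
where  contains(contract_text,
                'near((knee, tendon, injury), 50, TRUE)', 1) > 0;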
The normal Boolean operators (AND, OR, NOT) can also be applied to search
terms, as can stemming and fuzzy searches. Thesaurus searches are also available,
and a special thesaurus may be loaded for technical applications.
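These operators are simply embedded in the contains search string. For instance
(terms illustrative), the $ operator expands a term to its stemmed forms.

select ufa_id
from   contract_details
where  contains(contract_text, 'tendon and $strain', 1) > 0;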
All of the context features can be used by an external program such as a Java
application using JDBC. The user may then be presented with a graphical user
interface that constructs the select statement from user responses. The query
returns a hitlist that may be ordered by relevance. The document contents can then
be delivered to the application and displayed in a variety of ways.
Typically, a text query application allows the user to view the documents
returned by a query. The user selects a document from the hitlist and then the
application presents the document in some form. With interMedia Text, the docu-
ment can be rendered in different ways. For example, documents can be presented
with query terms highlighted.
Oracle Text provides most, if not all, of the standard indexing and querying
mechanisms and operators found in a typical IR system. In addition, it offers struc-
tured data describing the text data and seamless integration with a general organi-
zational information system. All queries, for whatever type of data, use SQL.
These advantages are won at a cost. The loading of documents into the database
where they will be subject to transactional control is complicated, and the tools
provided have not been made easier to use in several generations of Oracle release
versions. The administration of specialized thesauri to underpin theme searching
and expanded or narrowed term searching is similarly confusing and unintuitive.
Enhancing the text retrieval facilities beyond the functions provided to incorpo-
rate sophisticated user relevance feedback systems or applications using artificial
intelligence involves the same stages of systems development as for any major
information system. Easy production of a graphical user interface can be accom-
plished using, for example, the Java language with access to the database through
JDBC. In addition to the text management facilities, such a system would have
built-in reliability and data safety stemming from the standard transactional
control mechanisms of Oracle.
Query feedback    System learns user preferences for selecting relevance. This can
                  be based on document signatures: a statistic that records when a
                  word has the preferred meaning and is accompanied by certain
                  other words.
21.5 EXERCISE
(i) Design a schema for a large-scale document collection. Sample collections are
available for download from the Internet. One example is the RFC (Request for
Comment) archive.2 Load the sample collection into the database.
Alternative tracks:
• Design and implement a user interface using a Java application to make
queries and display results.
• Perform interactive queries with explain plan to analyze performance.
• Analyze recall and precision with the sample data.
2 http://www.rfc-editor.org/download.html
BIBLIOGRAPHY
Aho, A. V., Beeri, C. and Ullman, J. D. (1979) The theory of joins in relational databases. ACM
Transactions on Database Systems, 4(3), 297–314.
Armstrong, W. (1974) Dependency structures of database relationships. In Proc. IFIP Congress ,
Geneva, pp. 580–583.
Armstrong, W. W. and Delobel, C. (1980) Decomposition and functional dependencies in rela-
tions. ACM Transactions on Database Systems 5(4), 404–430.
Bayardo, R. J. Jr and Agrawal, R. (1999) Mining the most interesting rules. In Proc. 5th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 145–154.
Bernstein, P. A. (1976) Synthesizing Third Normal Form relations from functional dependencies.
ACM Transactions on Database Systems, 1(4), 277–298.
Birrell, A. and Nelson, B. J. (1984) Implementing Remote Procedure Calls. ACM Transactions on
Computer Systems , 2(1), 39–59.
Booch, G., Jacobson, I. and Rumbaugh, J. (1998) The Unified Modeling Language User Guide .
Reading, MA: Addison-Wesley.
Burns, T., Fong, E. N., Jefferson, D., Knox, R., Mark, L., Reedy, C., Reich, L., Roussopoulos, N.
and Truszkowski, W. (1986) Reference Model for DBMS Standardization, Database Archi-
tecture Framework Task Group (DAFTG) of the ANSI/X3/SPARC Database System Study
Group. ACM SIGMOD Record, 15(1), 19–58.
Camps, R. (1996) Domains, relations and religious wars. ACM SIGMOD Record, 25(3), 3–5.
Carey, M. J. (1994) Parallel Database Systems in the 1990s. In Proceedings of the 1994 ACM
SIGMOD International Conference on Management of Data . ACM Press. p. 466.
Carey, M. J. and Livny, M. (1989) Parallelism and concurrency control performance in distrib-
uted database machines. In Proceedings of the 1989 ACM SIGMOD International Confer-
ence on Management of Data. ACM Press. pp. 122–133.
Carey, M. J. and Stonebraker, M. (1984) The performance of concurrency control algorithms for
database management systems. In VLDB 1984, Proc. Tenth International Conference on
Very Large Data Bases , Singapore. San Francisco, CA: Morgan Kaufmann. pp. 107–118.
Carey, M. J., Mattos, N. M. and Nori, A. (1997) Object–relational database systems: principles,
products and challenges (Tutorial). Proceedings of the ACM SIGMOD International Confer-
ence on Management of Data.
Chamberlin, D. D. (1976) Relational Data-Base Management Systems. ACM Computing Surveys ,
8(1), 43–66.
Chaudhuri, S. and Dayal, U. (1997) An overview of data warehousing and OLAP technology.
ACM SIGMOD Record, 26(1), 65–74.
Chen, P. P. (1976) The entity–relationship model – toward a unified view of data. ACM Transac-
tions on Database Systems , 1(1), 9–36.
Codd, E. (1970) A relational data model for large shared data banks. Communications of the
ACM , 13(6), 377–387.
Codd, E. F. (1971) Normalized Data Structure: A Brief Tutorial. Proceedings of 1971 ACM-
SIGFIDET Workshop on Data Description, Access and Control . ACM Press. pp. 1–17.
Codd, E. F. and Associates (1993) Providing OLAP (on-line analytical processing) to user-
analysts: an IT mandate. Commissioned by Arbor Software (now Hyperion Solutions).
Colliat, G. (1996) OLAP, relational, and multidimensional database systems. SIGMOD Record,
25(3), 64–69.
Davidson, S. B., Garcia-Molina, H. and Skeen, D. (1985) Consistency in partitioned networks.
ACM Computing Surveys , 17(3), 341–370.
Database Task Group (1971) CODASYL: DBTG Report . Technical Report, ACM.
Deux, O. (1990) The story of O2. IEEE Transactions on Knowledge and Data Engineering, 2(1),
91–108.
DeWitt, D. J. and Gray, J. (1990) Parallel database systems: the future of database processing
or a passing fad? SIGMOD Record, 19(4), 104–112.
Dobkin, D. P., Jones, A. K. and Lipton, R. J. (1979) Secure databases: protection against user
influence. ACM Transactions on Database Systems , 4(1), 97–106.
Elmagarmid, A. K. and Du, W. (1990) A paradigm for concurrency control in heterogeneous
distributed database systems. In Proceedings of the Sixth International Conference on
Data Engineering. IEEE Computer Society Press. pp. 37–46.
Fadous, R. and Forsyth, J. (1975) Finding candidate keys for relational data bases. In Proceed-
ings of the 1975 ACM SIGMOD International Conference on Management of Data . ACM
Press. pp. 203–210.
Garcia-Molina, H. and Barbará, D. (1985) How to assign votes in a distributed system. Journal
of the ACM , 32(4), 841–860.
Garcia-Molina, H. and Kogan, B. (1987) Achieving high availability in distributed databases. In
Proceedings of the Third International Conference on Data Engineering . IEEE Computer
Society. pp. 430–440.
Gotlieb, L. R. (1975) Computing joins of relations. In Proceedings of the 1975 ACM SIGMOD
International Conference on Management of Data . ACM Press. pp. 55–63.
Gray, J. (ed.) (1998) The Benchmark Handbook, 2nd edn. San Francisco: Morgan Kaufmann.
Available on-line at: http://www.benchmarkresources.com/handbook/introduction.html.
Gray, J. and Graefe, G. (1997) The five-minute rule ten years later, and other computer storage
rules of thumb. ACM SIGMOD Record, 26(4), 63–68.
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F. and
Pirahesh, H. (1997) Data cube: a relational aggregation operator generalizing group-by,
cross-tab, and sub totals. ACM Journal of Data Mining and Knowledge Discovery , 1(1),
29–53.
Harman, D. (ed.) (1992) The First Text REtrieval Conference (TREC1), National Institute of Stan-
dards and Technology, Special Publication 500-207. (The Text REtrieval Conference (TREC)
started in 1992 as part of the TIPSTER Text program. Its purpose is to support research
within the information retrieval community by providing the infrastructure necessary for
large-scale evaluation of text retrieval methodologies. TREC reports are available at http://
trec.nist.gov/.)
Inmon, W. (1996) Building the Data Warehouse . New York: John Wiley & Sons.
Jarke, M. and Koch, J. (1984) Query optimization in database systems. ACM Computing
Surveys , 16(2), 111–152.
Khoshafian, S. and Valduriez, P. (1987) Sharing, persistence, and object-orientation: a database
perspective. In Advances in Database Programming Languages. ACM Press/Addison-
Wesley. pp. 221–240.
Khoshafian, S., Valduriez, P. and Copeland, G. P. (1988) Parallel query processing for complex
objects. In Proceedings of the Fourth International Conference on Data Engineering . IEEE
Computer Society. pp. 210–217.
Kimball, K. (1996) The Data Warehouse Toolkit . New York: John Wiley & Sons.
Kung, H. T. and Robinson, J. T. (1981) On optimistic methods for concurrency control. ACM
Transactions on Database Systems , 6(2), 213–226.
Labio, W. and Garcia-Molina, H. (1996) Efficient snapshot differential algorithms for data ware-
housing. In Proceedings of the 22nd International Conference on Very Large Data Bases, VLDB
’96. pp. 63–74.
Martin, J. (1990) Information Engineering. Englewood Cliffs, NJ: Prentice Hall (three volumes).
Munz, R. and Krenz, G. (1977) Concurrency in database systems – a simulation study. In
Proceedings of the 1977 ACM SIGMOD International Conference on Management of Data .
ACM Press. pp. 111–120.
Ng, W. and Yau, J. (2002) Adapting web interfaces by WHAT. In International Conference on
Adaptive Hypermedia and Adaptive Web Based System AH’2002 . Lecture Notes in
Computer Science Vol. 2347. Berlin: Springer-Verlag. pp. 551–555.
Oracle Corp. (2002) Oracle9i Application Developer’s Guide – Object-Relational Features.
Chapter 1, Introduction to Oracle Objects . Release 2 (9.2).
Ries, D. R. and Stonebraker, M. (1977) Effects of locking granularity in a database management
system. ACM Transactions on Database Systems , 2(3), 233–246.
Roussopoulos, N. (1998) Materialized views and data warehouses. ACM SIGMOD Record, 27(1),
21–26.
Sakai, H. (1980) Entity–relationship approach to the conceptual schema design. In Proceedings
of the 1980 ACM SIGMOD International Conference on Management of Data . ACM Press.
pp. 1–8.
Salton, G. (1968) Automatic Information Organization and Retrieval. New York: McGraw-Hill.
Sharman, G. C. H. (1976) A constructive definition of Third Normal Form. In Proceedings of the
1976 ACM SIGMOD International Conference on Management of Data . ACM Press. pp.
91–99.
Simsion, G. (1994) Data Modelling Essentials . New York: Van Nostrand Reinhold.
Skeen, D. (1981) Nonblocking commit protocols. In Proceedings of the 1981 ACM SIGMOD
International Conference on Management of Data . ACM Press. pp. 133–142.
Spertus, E. and Stein, L. A. (1999) A relational databases interface to the World-Wide Web. In
Proceedings of the Fourth ACM conference on Digital Libraries . ACM Press. pp. 248–249.
Stonebraker, M., Wong, E., Kreps, P. and Held, G. (1976) The design and implementation of
INGRES. ACM Transactions on Database Systems , 1(3), 189–222.
Thomas, R. H. (1979) A majority consensus approach to concurrency control for multiple copy
databases. ACM Transactions on Database Systems , 4(2), 180–209.
Tomasic, A., Garcia-Molina, H. and Shoens, K. A. (1994) Incremental updates of inverted lists
for text document retrieval. In Proceedings of the 1994 ACM SIGMOD International
Conference on Management of Data. ACM Press. pp. 289–300.
Van Rijsbergen, C. J. (1979) Information Retrieval . London: Butterworths. Available at http://
www.dcs.gla.ac.uk/Keith/Preface.html.
Welty, C. and Stemple, D. W. (1981) Human factors comparison of a procedural and a
nonprocedural query language. ACM Transactions on Database Systems , 6(4), 626–649.
Yourdon, E. (1989) Modern Structured Analysis . Englewood Cliffs, NJ: Prentice Hall/Yourdon
Press.
Zhang, A. and Elmagarmid, A. K. (1993) A theory of global concurrency control in
multidatabase systems. VLDB Journal, 2(3), 331–360.
APPENDIX
I am Sir Oracle,
And when I ope my lips let no dog bark!
William Shakespeare, The Merchant of Venice
The Oracle software can be downloaded from the Oracle web site (warning: 1.5
GByte) or purchased for minimal cost on a set of three CDs. Install the complete
software on the hard drive of your PC. Follow the install instructions, choosing the
Enterprise Edition install, and select all the software components available. You
will need about 3.5 GByte of free space on your disk. There are a few choices to be
made during installation:
• Select the standard example database without replication.
• Specify a name for the Oracle instance (e.g. SCOTORA). This is the name of the
Oracle process.
• Specify the communications protocols that will be configured for use (normally
TCP/IP).
• Specify a different name for the SID (System Identifier), e.g. SCOTORACLE. This
is the name of the database to which the process will connect.
• Specify passwords for the SYS and SYSTEM users. These are two DBA accounts
that own the data dictionary structures. Write down the passwords.
If you have shared access to a network host running the full version of Oracle 9i
then refer to your systems administrator for information on using the Oracle Enter-
prise Manager, passwords, connect strings and so on.
This book uses the standard configuration choices made for a local Enterprise
Edition installed on a single PC running Windows XP Professional Edition; you
may not see precisely the same screens if you are using any other configuration.
You will find that shortcuts to the main components of Oracle will make using
the various packages easier. Using Windows Explorer, go to the directory
C:\Documents and Settings\All Users\Start Menu\Programs\Oracle - OraHome92.
Create a shortcut for this folder and drag it to your desktop.
The standard installation configures the Oracle server to start when the machine
is switched on. When the machine is shut down the Oracle server will also be shut
down. This is normally safe, as Oracle will recover automatically from an unsched-
uled shutdown. To be absolutely safe, use the Enterprise Manager to dismount the
database and shut down the Oracle instance before shutting down the PC.
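The same thing can be done from the command line if you prefer; this is an
alternative to the Enterprise Manager route described above, and the password is
illustrative.

connect sys/password as sysdba
shutdown immediate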
Warning: the Enterprise Manager is a very powerful tool, especially when you are
logged in as a DBA. You can ruin the database and make it inoperable if you make
mistakes. If in doubt, cancel any operation about which you are not sure.
The user who will develop the UFL application must now be created. Click on the
Security folder and then on the Users folder. Right click on the Users folder and
select Create from the dropdown menu. The Create User Dialog appears. Specify a
user name and a password in the General tab of the dialog (Figure A.2); click on the
Role tab and from the list of different roles available, allocate the Connect and
Resource roles to this new user. To do this simply select the CONNECT role and use
the arrow symbol to copy it to the bottom section. Do the same with the
RESOURCE role. These will allow the new user to connect to the instance and
create tables, indexes and other structures.
Developer users should generally have the same rights and privileges as operator
users. This prevents a situation where parts of the application work for developers,
because of their higher status, but cannot be used by operators. When the
application is deployed, parts of it can be segregated between users based on their
group membership or their profile, not their rights.
Click on the System tab and allocate the UNLIMITED TABLESPACE privilege
from those available using the arrow symbol. The new user will be able to create
tables, indexes and other structures and populate them with data. The SHOW SQL
button opens a frame containing the SQL commands that will be executed when
the Create button is clicked.
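The generated SQL is roughly as follows; the user name, password and tablespace
names are illustrative.

create user ufl identified by ufl_password
  default tablespace users
  temporary tablespace temp;
grant connect, resource to ufl;
grant unlimited tablespace to ufl;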
The Enterprise Manager can also be used to inspect the properties of tables. Select
the Schema folder and then select the folder SCOTT. A list of SCOTT’s tables and
indexes appears. Right clicking on the EMP table displays a pop-up menu; selecting
View|Edit shows the structural definitions (attributes) of the table (Figure A.3).
Figure A.3 Oracle Enterprise Manager: View/Edit table. The attributes (column
definitions) of the EMP table.
Clicking on the tabs reveals other kinds of information about the selected table
(Figure A.4).
The structure of the index can be examined by selecting it in the SCOTT schema,
right-clicking and choosing View|Edit (Figure A.5).
Right-clicking on the EMP table and selecting View|Edit Contents from the pop-up
shows that the table has 14 rows (Figure A.6).
Nothing is private from the SYS user acting as SYSDBA. Allocating this role to
anyone for a production database creates a position of great trust and
responsibility.
Figure A.4 Oracle Enterprise Manager: Constraints tab. SCOTT has defined two
constraints for this table. PK_EMP defines a primary key that has resulted in an
index called PK_EMP being established.
Figure A.5 Oracle Enterprise Manager. Details of the PK_EMP index. It is asso-
ciated with the table EMP and has one attribute, EMPNO.
procedure fillCustomer is
  a integer; b integer;
  c integer; d integer;
  e integer; f integer;
  g number(6,2);
BEGIN
  -- seed the random generator so that runs are repeatable
  dbms_random.initialize (76835);
  -- generate 100,000 customers with random names (getstring is a helper
  -- function defined elsewhere in the performance experiments)
  for a in 1..100000 loop
    select newcustomer.nextval into d from dual;
    insert into customers (cust_id, surname, forename)
    values (d, getstring(15), getstring(12));
  end loop;   -- only the customer-generation loop is shown here
end fillCustomer;
[Entity diagrams for the UFL schema (Contracts, Discipline, Clubs, Players,
Teamsheets, Goals, Matches and Seasons), each showing its attributes: UFA_ID,
Club, Division, Stadium, Surname, Forename, Nationality, Squad_No, Position,
Match_ID, Minute, AwayClub, MatchDate, Season and so on.]
Figure A.17 Teamsheets entity.
Figure A.20 Seasons entity.
Table A.5 PL/SQL collection methods
COUNT Returns the number of elements that are contained within the
collection.
EXISTS Checks whether an element has been deleted or not – remember,
nested tables can contain sparse elements, i.e. gaps are allowed! It
does not work with Varrays, as these are dense collections.
FIRST Returns the first (existing) element’s subscript number (index) in a
collection. In general this will be 1 – unless element 1 has been
deleted!
LAST Returns the last (existing) element’s subscript number (index) of a
collection.
LIMIT This method is mainly of use with varrays. It returns the maximum
number of elements a collection can contain. This method returns
NULL with associative arrays and nested tables, as both are
unbounded collection types.
PRIOR Returns the subscript number of the previous (existing) element in
the collection class. If there is no preceding element, NULL is
returned.
NEXT Returns the subscript number of the next (existing) element in the
collection class. If there is no next element, NULL is returned.
EXTEND [(n)] Allows the creation of one or more NULL elements in the collection
set. This method does not work with associative arrays, as they are
implicitly extended as elements are added.
TRIM [(n)] Physically removes one or more elements from the end of a collec-
tion set.
DELETE [(i)] Used to delete specific elements from a collection, where (i) refers
to an element’s subscript. If no specific element is specified, all
elements are deleted. This method does not work with varrays, as
these are dense collections that do not allow gaps between
elements.
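A short anonymous block shows several of these methods in action on a nested
table; the type name and data are illustrative.

declare
  type t_names is table of varchar2(20);
  v_names t_names := t_names('Lucius', 'Antonio');
begin
  v_names.extend;                    -- add a third, initially NULL element
  v_names(3) := 'Carlo';
  v_names.delete(2);                 -- nested tables may be sparse, so a gap is allowed
  dbms_output.put_line('COUNT = ' || v_names.count);  -- 2
  dbms_output.put_line('LAST  = ' || v_names.last);   -- 3
  if not v_names.exists(2) then
    dbms_output.put_line('Element 2 has been deleted');
  end if;
end;
/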
INDEX
abort 148, 154, 157, 235, 511 Codd 16, 73, 84, 403–405
Access 6, 10, 12, 14, 17, 107, 133, 266, 277, 338, codebase 435
341, 495, 499, 507, 516–519 codepage 435–436
acyclic 149 collating sequence 3, 488
ambiguity 111, 127, 300 collections 23, 28, 40, 289, 291, 555
ANSI 100, 159, 176, 187, 422, 498, 504 columns 95, 97, 101–105, 109–111, 186–187, 240,
Apache 467, 469 242–243, 339–343, 453
API 265–268, 277, 517 comparison operators 102, 113, 129, 370
arbitration 152, 410 compilation 192, 203, 214, 263, 308, 422, 433, 436,
atoms 89–91, 98 440–411, 457
axiom 76, 79, 86 complex objects 22–23, 25, 28–29
compliant 269, 517
bandwidth 410, 477–480, 489 component 313, 315, 317–318, 320–323, 340–341,
Bayesian 400, 402 435–436, 455
BCNF 84–85, 87, 102, 179 composite key 59, 67, 81, 85, 243
Booch 33 concatenate 74, 118, 323, 332–333, 354, 388,
boolean 39, 41, 202, 231, 273, 282–283, 288, 332, 390–391
334–336, 364, 535 conceptual schema 62, 494, 498, 500–505, 508, 521
Borland 268, 308 concordance 525
bottleneck 409, 411, 414, 482, 513 concurrency 143–144, 146, 160–163, 404, 409, 496
Boyce–Codd 84–85 acquire locks 153, 163–164
bridge 268, 497 conflict 11, 140, 143–144, 146–156
browser 422, 432–435, 440, 445, 448, 457, 460–461, locks 152–155, 162–166, 478–479, 511–513
464, 466 optimistic 152, 155–156, 160, 165–166, 511–512
buffers 133–134, 136, 138, 245, 247–248, 369–372, pessimistic 152, 155, 165, 511–512
374, 416–417, 420, 424, 428 release locks 162, 177, 535
unrepeatable read 161
cache 245, 416–417, 477, 479 versions 11, 23, 25–27, 38, 74–76, 82–83,
candidate keys 58–59, 71, 80–87 103–104, 133, 157, 161, 163, 166, 176–177,
cartesian product 95 268–269, 308, 319, 417, 464–465, 467, 469,
cascade delete 106 493, 535, 540, 545
cascadeless schedule 149 conformance 49, 52, 100, 139, 143, 187, 374, 464
CGI 447–448 congruent 499
checkpoints 158, 191 conjunction 93, 96, 98
CLI 265, 268, 270, 277 connectors 374, 527, 536
client 25, 252, 260, 380, 404, 473 contention 363, 417, 479
clusters 366–369, 371–374, 404, 414–415, 421, CORBA 33, 269
479–481, 489, 497 correctness 73, 79, 88, 99, 151
COBOL 6–8, 14, 23, 264 crossbar 477–478, 489
DTD 462, 464 functions 29, 220–221, 232, 236, 276, 286–287,
489, 559
engine 192–193, 203, 211, 214, 379, 401–402, 404, fuzzy 532, 535
447–448, 451–453, 457–458, 462 garbage collection 275
Enterprise Java Bean 455–456 gateways 249, 388, 447, 494, 500
entities 50, 53, 62, 71, 74, 308 glyphs 314–315
attributes 50–51, 62, 71, 74, 102, 319, 374 GMT 232
tuples 16, 71, 78, 95, 98 GNU 422
entity relationship diagram 50, 52, 62–64, 68–69, Golden Rule 52
71–73, 99, 102, 308 grammar 89–90, 93–94, 98, 142, 189, 524
enumeration 444–445 granularity 153, 164, 230, 236, 281, 376, 386
equality 17, 26–27, 29, 31, 52, 96, 101, 112, 129, grouping columns 393–400
175, 199, 340, 349, 362, 369, 371, 493 GUI 260, 275, 432, 470
escalation 153, 164
events 228–236, 279–280, 299–300, 303–307, 310, head 5, 13
322–323, 325–326, 328–329, 418 hitlist 535
Excel 517 hits 527
exceptions homogeneous 494
handling 7, 16, 191, 208–209, 234, 259, 305, 442 host 500
raising 25, 164, 210, 257, 336, 489 HTML 433–434, 440, 448, 457–458, 460–462, 466
throwing 272, 442, 452 hub 5
trapping 206
try/catch 323, 336, 352, 355, 441, 452, 458 identifier 7–8, 22, 58–59, 452
exclusive locks 155 identity 16, 23, 27, 56
exist 60–62, 77, 208–211 IDMS 15, 18
expectations 139, 411–412, 414 IIOP 269
explain impedance mismatch 26–29, 194, 273, 380
histogram 419 IMS 11, 14, 18
explain plan 45, 71, 133–135, 137–138, 166, 175, index scan 362–363, 426
177, 188–189, 246, 371–373, 377, 407, 418–419, indexes 7–8, 29, 238, 243, 253–254, 361, 366–367,
423–426, 428–429, 505–506, 508–509, 515, 536, 373–374, 533
545 B*-tree 362
Utlxpls 134, 137, 177, 423–426 bitmap 363–365
explicit 165, 295–296 hash 136, 367–369, 372–375, 416, 419, 486, 488
export 494, 499–501, 504–505, 507–508 inverted 525
extensibility 27 reverse key 363
extensible 461–462 traversal 14–15, 23
extension 29, 91, 98, 239–240, 268, 270, 290, 300, trees 240–242, 362
393–394, 407 inheritance 21, 41–42, 277, 380, 535
extents 49, 238–240, 253–254 initiate 17, 35, 152–153, 200, 212, 214, 230, 233,
248, 252, 282, 285, 293, 336, 488–489
fact table 384–387 inline 307
failover 475, 489 integer 40, 349, 404
FASMI 406 integrator 420
faults 75, 474 integrity 50, 61, 100, 122, 144, 163, 209, 511
fetch 193, 196, 198, 202, 294, 307, 363, 417–418, interface 265–266, 447, 500, 539
427–428, 479 InterMedia 523, 535
filespaces 140 Internet 30, 166, 249, 269, 407, 422, 431–432, 436,
filters 135, 137, 177, 230–231, 348, 363, 424–425, 470, 522, 524, 529, 536
527–528, 532 interoperability 178
forecast 377 intersection 95, 98, 156
foreign key 59–66, 68–71, 104–106, 109–110, 385 inversion 525–527, 535
frequency 386, 421 ISO 100, 159, 176, 187, 251, 467, 469
RHS 78 sets 16, 62, 73, 88, 95, 123, 193, 204, 321, 330,
RMI 262–264, 269, 276, 439–442, 444–445, 447, 341–342
457, 470 SGA 245–248, 370, 416
activation 264, 279, 321, 356 SGML 461–462, 471
Rmic 263, 442 shadow 155, 511
Rmid 264 shared 406, 477
Rmiregistry 443, 445 shared memory 245–246, 416, 477–479, 481, 489,
ROLAP 382 497
roles 3, 41, 53–54, 56, 61, 69–71, 79, 92–93, shutdown 305, 541
107–108, 123, 175, 189, 192, 221, 248, 404, signals 5, 9, 355, 528
455–456, 500, 504, 523, 530, 541–543 simulation 420–422, 522
row 16, 122–124, 127, 135, 137, 163, 177, 184, 238, skew 403, 420, 488
363, 367, 429 slave 474
Smalltalk 19, 23
scalability 33, 414, 418, 422, 430, 455, 473, 475, SPARC 498, 504
478–482, 487, 489, 495–496 spawn 253, 261, 264, 447–448
scale 3, 6, 101, 139, 414, 421, 455, 475–477, 479, spreadsheet 346, 404, 406
495, 536 SQL
scheduler 6, 257–259, 276–277, 409–410 aggregation 170, 405
schemas 10–11, 16–17, 26, 29, 299, 303, 307, 407, alter 11, 18, 100, 106–107, 109, 118, 123, 216,
498–499, 521 230, 404, 427
SCN 160 analyze 134, 405–406, 429, 529, 558
scope 41, 95, 109, 221, 225–227, 295, 323, 330, average 136, 144, 169–170, 172, 174–178, 187,
332, 382, 386, 432, 439, 459, 497, 514 369, 476
script 134, 214, 418, 545, 547 constraints 102–106, 109–110, 118, 122
search term 527, 529, 533–535 count 167–178
security 11, 245, 440, 542 create 99–108, 110, 118, 120–121
segments 238 data definition language 100, 123, 164, 299, 304,
semantics 67, 90, 143, 254, 289, 428, 464, 505–506, 485, 531
524, 527, 536 data manipulation language 100, 123, 163–164,
semaphores 9, 259, 478 179, 187, 229–230, 233–235, 280–282,
sentence 90–91, 94, 98, 524–525, 527 299–303, 511
sequences 119–121, 133, 142, 144, 190–191, 193, Delete 61, 106, 122, 124, 234, 330
235, 248, 400, 402, 415–416, 423, 512, 525, 550, Distinct 54, 74, 259, 264, 333, 341–343, 363, 406,
558 457, 486, 494, 523, 527
Currval 119–120 Drop 100, 103–107, 109, 122–123, 304, 314, 316,
Nextval 119–121, 210, 554–555, 559 321, 341, 427, 517, 545, 550, 558
sequential 4, 7–8, 29, 155, 289, 291, 294, 373–374, Dual 120, 333, 559
400, 409, 488, 513 Grant 16, 100, 108–109, 123, 154, 499, 512–513,
serial 4, 6, 143–148, 155–156, 160, 162–163, 530
294–295, 372, 476, 482 Group By 167–178, 187, 353, 389–396, 398–399,
serialization 263, 508, 510 407, 423, 560
recoverable schedule 149 Having 173–178, 187
schedules 85–86, 144, 146–151, 155, 157, 165, Insert 109–110, 119–120
258, 276, 410–411, 417, 430 Max 170–172, 187, 280, 282–283, 534
view equivalence 147 Min 170, 187
Servererror 305 Order by 114–116, 130–131, 168–169, 344, 526
serveroutput 203–204, 215, 228, 297 restrictions 69, 112–115, 126, 129, 131, 133,
servers 10, 191, 246, 254, 259–260, 265–266, 276, 136–137, 140
363, 404, 453, 458, 460, 470–471, 489, 493, 511 ROWID 240, 242, 339, 345, 361–363, 366, 370,
servlets 447–448, 471 374, 426, 429
sessions 252 Select 109, 111–116, 118–121
set membership 16, 28, 40, 50–54, 58–60, 70, 75, sub-queries 129, 131, 133, 506
87, 96, 129, 270, 491, 516, 518–520, 542
Sum 95, 144, 170, 178, 187, 221, 238, 260–261, switch 5, 258, 276–277, 325–326, 328, 474,
386, 409, 423, 508 477–478, 489, 541
Synonym 499, 501 swizzle function 25, 29
Union 95, 118, 122, 130, 132–133, 390–393, Sybase 18, 100, 191, 265
498–499, 501, 503–504, 558 symbols 51, 62–63, 71, 88, 97–98, 112, 128, 542
Update 120–125 symmetric 242, 474, 477, 489
Views 177–186, 299–303, 428–430, 491–506, symmetric multi-processor 477–479
514–515 synchronization 9, 83, 195, 283, 326, 339–340, 342,
SQL1999 393, 399 430, 445, 474–475, 478, 489, 518, 533
SQL2 100 synchronous 261
SQL3 100, 178 synonymous 382
SQL89 100 SYSDATE 127, 129, 210, 226, 228, 333, 389–392,
SQL92 100, 159–160 394–396, 398–399, 559
SQLJ 190 Sysdba 108
stack 250–251, 260–261, 267, 318, 325 systematic 418, 527
staging 380–381, 404
standalone 118, 221, 224, 422, 432–433, 447 table scan 155, 361, 369–373, 419, 424, 484, 486,
standards 99–100, 159, 176–178 488
startup 156, 305–306 tables 104, 123, 226, 238, 242, 254, 287, 296, 304,
state 8, 34, 122, 141–143, 148, 152, 154, 165, 309, 317–318, 321, 325–328, 367, 370–371, 374,
257–258, 443, 510–511 458, 469, 517
state transition diagram 34 hash clustered 367–369, 372
stateful 471 index-organized 367–368
stateless 471 tablespaces 108, 125, 151, 157, 159, 165, 238–239,
static 294, 346, 380, 404, 416, 443, 447, 452, 457, 244–245, 253–254
470–471 tags 338, 434–436, 447–448, 451, 453, 457–458,
statistics 133–134, 166, 224, 370, 373, 377–378, 460–464, 466
401, 406, 415, 417, 419, 421, 423, 425, 427, 429, taxonomy 493–494, 511
509, 529, 536, 558 text 523–524, 527, 529–530, 535
status 61, 84, 105, 192, 247, 310, 313–314, 317, text retrieval 527, 535
324, 327–328, 355, 421, 437–438, 444–445, 542 themes 447, 535
stemming 412, 527, 532, 535–536 thesaurus 529, 535
stop list 527, 532 theta join 96, 98, 125, 370
storage 4–6, 23–24, 26–27, 237–238, 244, 248 threads 247–248, 252–253, 259, 261, 276–277, 410,
strings 21, 52, 97, 101, 320, 323–330, 332–335, 337, 414, 416, 440, 448, 498
350–352, 355–356, 524–525 three tier 440–441
strongly typed 296 tightly coupled 478
stub 263–264, 440, 442–443 time dimension 384–386
stylesheet 467, 469 Tkprof 427–428
sub-sets 53, 58–59, 73, 80–81, 86, 167, 179, 299, TNS 252
388, 418, 475, 487, 489, 499–500 tokens 251, 513
substitution 31, 201–202, 205, 213, 220, 291, 299, Tomcat 448, 458, 460, 462
307 topology 472
suffixes 332, 335, 354, 527 TPC 420–422, 430
summaries 378, 384 TPM 421
summarization 167, 175, 270, 377–379, 390, 393, transactions 141–142, 146, 155, 157–158, 165, 254,
397 415, 420, 496, 513, 521
Sun 268, 277, 308, 457 atomicity 21, 27, 39–40, 74, 86, 91, 141, 143, 151,
superkeys 57–59, 71, 78–82, 86 165, 197, 300, 482, 509–511
supertypes 26 commit 143, 148, 151, 158, 162, 165, 248, 511
supervisor 414 consistency 8, 23, 52, 143, 161–163, 165–166,
swap 416 244, 303, 380, 405, 414–415, 423, 473, 513
Swing 316–317, 323, 350 durability 151, 157, 166, 509, 511