Chapter 1: Introduction
Purpose of Database Systems
View of Data
Data Models
Data Definition Language
Data Manipulation Language
Transaction Management
Storage Management
Database Administrator
Database Users
Overall System Structure
Database Management System (DBMS)
 Collection of interrelated data
 Set of programs to access the data
DBMS contains information about a particular enterprise
DBMS provides an environment that is both convenient and
   efficient to use.
Database Applications:
     Banking: all transactions
     Airlines: reservations, schedules
     Universities: registration, grades
     Sales: customers, products, purchases
     Manufacturing: production, inventory, orders, supply chain
     Human resources: employee records, salaries, tax
       deductions
Databases touch all aspects of our lives
Purpose of Database System
In the early days, database applications were
  built on top of file systems
Drawbacks of using file systems to store
  data:
   Data redundancy and inconsistency
     Multiple file formats, duplication of information in
      different files
   Difficulty in accessing data
     Need to write a new program to carry out each new
      task
   Data isolation — multiple files and formats
   Integrity problems
     Integrity constraints (e.g. account balance > 0)
      become part of program code
     Hard to add new constraints or change existing
      ones
Purpose of Database Systems
(Cont.)
Drawbacks of using file systems (cont.)
   Atomicity of updates
     Failures may leave database in an inconsistent state
      with partial updates carried out
     E.g. transfer of funds from one account to another
      should either complete or not happen at all
   Concurrent access by multiple users
      Concurrent access needed for performance
     Uncontrolled concurrent accesses can lead to
      inconsistencies
       E.g. two people reading a balance and updating it at the
        same time
   Security problems
Database systems offer solutions to all the
  above problems
Levels of Abstraction

Physical level describes how a record (e.g.,
  customer) is stored.
Logical level: describes data stored in database, and
  the relationships among the data.
             type customer = record
                            name : string;
                            street : string;
                            city : string;
                          end;
View level: application programs hide details of data
  types. Views can also hide information (e.g.,
  salary) for security purposes.
View of Data
An architecture for a database system
Instances and Schemas
Similar to types and variables in programming languages
Schema – the logical structure of the database
     e.g., the database consists of information about a set of
      customers and accounts and the relationships between
      them
     Analogous to type information of a variable in a program
     Physical schema: database design at the physical level
     Logical schema: database design at the logical level
Instance – the actual content of the database at a particular point in time
     Analogous to the value of a variable
Physical Data Independence – the ability to modify the physical schema without changing
   the logical schema
     Applications depend on the logical schema
     In general, the interfaces between the various levels and
        components should be well defined so that changes in
        some parts do not seriously influence others.
Data Models
A collection of tools for describing
     data
     data relationships
     data semantics
     data constraints
Entity-Relationship model
Relational model
Other models:
   object-oriented model
   semi-structured data models
   Older models: network model and
      hierarchical model
Entity-Relationship Model
Example of schema in the entity-relationship model
Entity Relationship Model
(Cont.)
E-R model of real world
   Entities (objects)
     E.g. customers, accounts, bank branch
   Relationships between entities
     E.g. Account A-101 is held by customer
      Johnson
     Relationship set depositor associates customers
      with accounts
Widely used for database design
   Database design in E-R model usually
    converted to design in the relational model
    (coming up next) which is used for storage
    and processing
Relational Model
Example of tabular data in the relational model (column headers are the attributes):

 customer-id    customer-name   customer-street   customer-city   account-number
 192-83-7465    Johnson         Alma              Palo Alto       A-101
 019-28-3746    Smith           North             Rye             A-215
 192-83-7465    Johnson         Alma              Palo Alto       A-201
 321-12-3123    Jones           Main              Harrison        A-217
 019-28-3746    Smith           North             Rye             A-201
A Sample Relational Database
Data Definition Language (DDL)
Specification notation for defining the database
  schema
   E.g.
      create table account (
             account-number char(10),
             balance        integer)
DDL compiler generates a set of tables stored in a
  data dictionary
Data dictionary contains metadata (i.e., data about
  data)
   database schema
   Data storage and definition language
     language in which the storage structure and access
      methods used by the database system are specified
     Usually an extension of the data definition language
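A minimal SQL sketch tying the DDL example to the integrity-constraint discussion earlier in the chapter; the not null, primary key, and check clauses are illustrative additions, not part of the original example, and hyphens become underscores to keep the identifiers legal in standard SQL:

    create table account (
        account_number char(10) not null,
        balance        integer,
        primary key (account_number),
        check (balance >= 0)  -- a declarative version of the balance constraint
    );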
Data Manipulation Language
(DML)
Language for accessing and manipulating
  the data organized by the appropriate
  data model
   DML also known as query language
Two classes of languages
   Procedural – user specifies what data is
    required and how to get those data
   Nonprocedural – user specifies what data is
    required without specifying how to get those
    data
SQL is the most widely used query
  language
SQL
SQL: widely used non-procedural language
   E.g. find the name of the customer with customer-id 192-83-7465

               select customer.customer-name
               from   customer
               where  customer.customer-id = '192-83-7465'

   E.g. find the balances of all accounts held by the customer with
    customer-id 192-83-7465

               select account.balance
               from   depositor, account
               where  depositor.customer-id = '192-83-7465' and
                      depositor.account-number = account.account-number
Application programs generally access databases through
  one of
     Language extensions to allow embedded SQL
     Application program interface (e.g. ODBC/JDBC) which allow SQL
      queries to be sent to a database
Database Users

Users are differentiated by the way they expect to
  interact with the system
Application programmers – interact with system
  through DML calls
Sophisticated users – form requests in a database
  query language
Specialized users – write specialized database
  applications that do not fit into the traditional
  data processing framework
Naïve users – invoke one of the permanent
  application programs that have been written
  previously
   E.g. people accessing database over the web, bank
    tellers, clerical staff
Database Administrator
Coordinates all the activities of the database
  system; the database administrator has a
  good understanding of the enterprise's
  information resources and needs.
Database administrator's duties include:
     Schema definition
     Storage structure and access method definition
     Schema and physical organization modification
     Granting user authority to access the database
     Specifying integrity constraints
     Acting as liaison with users
     Monitoring performance and responding to
      changes in requirements
Transaction Management

A transaction is a collection of operations that
  performs a single logical function in a
  database application
Transaction-management component ensures
  that the database remains in a consistent
  (correct) state despite system failures (e.g.,
  power failures and operating system
  crashes) and transaction failures.
Concurrency-control manager controls the
  interaction among the concurrent
  transactions, to ensure the consistency of
  the database.
Storage Management

Storage manager is a program module that
  provides the interface between the low-
  level data stored in the database and the
  application programs and queries
  submitted to the system.
The storage manager is responsible for the
  following tasks:
   interaction with the file manager
   efficient storing, retrieving and updating of
    data
Overall System Structure
Application Architectures
Two-tier architecture: E.g. client programs using ODBC/JDBC to
 communicate with a database
Three-tier architecture: E.g. web-based applications, and
 applications built using “middleware”
Chapter 2: Entity-Relationship
Model
   Entity Sets
   Relationship Sets
   Design Issues
   Mapping Constraints
   Keys
   E-R Diagram
   Extended E-R Features
   Design of an E-R Database Schema
   Reduction of an E-R Schema to Tables
Entity Sets

 A database can be modeled as:
   a collection of entities,
   relationship among entities.
 An entity is an object that exists and is
  distinguishable from other objects.
     Example: specific person, company, event, plant
 Entities have attributes
   Example: people have names and addresses
 An entity set is a set of entities of the same
  type that share the same properties.
   Example: set of all persons, companies, trees,
      holidays
Entity Sets customer and loan
[Figure: entity set customer (customer-id, customer-name, customer-street, customer-city)
 and entity set loan (loan-number, amount)]
Attributes
 An entity is represented by a set of
  attributes, that is, descriptive properties
  possessed by all members of an entity set.
    Example:
          customer = (customer-id, customer-name,
                     customer-street, customer-city)
          loan = (loan-number, amount)

 Domain – the set of permitted values for each
  attribute
 Attribute types:
   Simple and composite attributes.
   Single-valued and multi-valued attributes
     E.g. multivalued attribute: phone-numbers
   Derived attributes
     Can be computed from other attributes
     E.g. age, given date of birth
Composite Attributes
Relationship Sets
 A relationship is an association among several
  entities
  Example:
        Hayes (customer entity) -- depositor (relationship set) -- A-102 (account entity)
 A relationship set is a mathematical relation
  among n ≥ 2 entities, each taken from entity sets E1, …, En:

        {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}

  where (e1, e2, …, en) is a relationship
   Example:
            (Hayes, A-102) ∈ depositor
Relationship Set borrower
Relationship Sets (Cont.)
 An attribute can also be property of a
  relationship set.
 For instance, the depositor
  relationship set between entity sets
  customer and account may have the
  attribute access-date
Degree of a Relationship Set
 Refers to number of entity sets that
  participate in a relationship set.
 Relationship sets that involve two entity
  sets are binary (or degree two).
  Generally, most relationship sets in a
  database system are binary.
 Relationship sets may involve more
  than two entity sets.
   E.g. Suppose employees of a bank may have jobs
   (responsibilities) at multiple branches, with different jobs at
   different branches. Then there is a ternary relationship set
   between entity sets employee, job and branch
Mapping Cardinalities
 Express the number of entities to
  which another entity can be
  associated via a relationship set.
 Most useful in describing binary
  relationship sets.
 For a binary relationship set the
  mapping cardinality must be one of
  the following types:
   One to one
   One to many
   Many to one
   Many to many
Mapping Cardinalities
[Figure: one-to-one and one-to-many mappings between entity sets A and B]
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities
[Figure: many-to-one and many-to-many mappings between entity sets A and B]
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities affect ER Design
 Can make access-date an attribute of account, instead of a
  relationship attribute, if each account can have only one customer
    I.e., the relationship from account to customer is many to one,
      or equivalently, customer to account is one to many
E-R Diagrams
 Rectangles represent entity sets.
 Diamonds represent relationship sets.
 Lines link attributes to entity sets and entity sets to relationship sets.
 Ellipses represent attributes
     Double ellipses represent multivalued attributes.
     Dashed ellipses denote derived attributes.
 Underline indicates primary key attributes (will study later)
E-R Diagram With Composite, Multivalued, and Derived
Attributes
Relationship Sets with
Attributes
Roles
 Entity sets of a relationship need not
    be distinct
   The labels "manager" and "worker" are called roles; they specify
    how employee entities interact via the works-for relationship
    set.
   Roles are indicated in E-R diagrams by labeling the lines that
    connect diamonds to rectangles.
   Role labels are optional, and are used to clarify semantics of
    the relationship
Cardinality Constraints
 We express cardinality constraints by
  drawing either a directed line (→),
  signifying "one," or an undirected line
  (—), signifying "many," between the
  relationship set and the entity set.
 E.g.: One-to-one relationship:
   A customer is associated with at most one
    loan via the relationship borrower
   A loan is associated with at most one
    customer via borrower
One-To-Many Relationship
 In the one-to-many relationship a loan
 is associated with at most one
 customer via borrower; a customer is
 associated with several (including 0)
 loans via borrower
Many-To-One Relationships
 In a many-to-one relationship a loan
  is associated with several (including 0)
  customers via borrower; a customer is
  associated with at most one loan via
  borrower
Many-To-Many Relationship
 A customer is associated with
  several (possibly 0) loans via
  borrower
 A loan is associated with several
  (possibly 0) customers via
  borrower
Participation of an Entity Set in a Relationship Set
 Total participation (indicated by double line): every entity in the entity set
  participates in at least one relationship in the relationship set
      E.g. participation of loan in borrower is total
           every loan must have a customer associated to it via borrower
 Partial participation: some entities may not participate in any relationship in the
  relationship set
      E.g. participation of customer in borrower is partial
Alternative Notation for
Cardinality Limits
 Cardinality limits can also express participation constraints
Keys
 A super key of an entity set is a set
  of one or more attributes whose
  values uniquely determine each
  entity.
 A candidate key of an entity set is a
  minimal super key
   Customer-id is candidate key of
    customer
   account-number is candidate key of
    account
 Although several candidate keys
  may exist, one of the candidate
  keys is selected to be the primary key.
Keys for Relationship Sets

 The combination of primary keys of
  the participating entity sets forms a
  super key of a relationship set.
   (customer-id, account-number) is the
    super key of depositor
   NOTE: this means a pair of entity sets can
    have at most one relationship in a
    particular relationship set.
     E.g. if we wish to track all access-dates to
      each account by each customer, we cannot
      assume a relationship for each access. We
      can use a multivalued attribute though
E-R Diagram with a Ternary
Relationship
Cardinality Constraints on Ternary Relationship
 We allow at most one arrow out of a
  ternary (or greater degree) relationship
  to indicate a cardinality constraint
 E.g. an arrow from works-on to job
  indicates each employee works on at
  most one job at any branch.
 If there is more than one arrow, there are
  two ways of defining the meaning.
   E.g. a ternary relationship R between A, B and
    C with arrows to B and C could mean
   1. each A entity is associated with a unique
    entity from B and C, or
   2. each pair of entities from (A, B) is associated
    with a unique C entity, and each pair (A, C) is
    associated with a unique B
   To avoid confusion we outlaw more than one arrow
Binary Vs. Non-Binary Relationships
 Some relationships that appear to be
  non-binary may be better represented
  using binary relationships
    E.g. A ternary relationship parents, relating
    a child to his/her father and mother, is best
    replaced by two binary relationships, father
    and mother
      Using two binary relationships allows partial
       information (e.g. only mother being known)
    But there are some relationships that are
    naturally non-binary
     E.g. works-on
Converting Non-Binary Relationships to Binary Form
 In general, any non-binary relationship can be represented using
  binary relationships by creating an artificial entity set.
    Replace R between entity sets A, B and C by
     an entity set E, and three relationship sets:
       1. RA, relating E and A             2. RB, relating E and B
       3. RC, relating E and C
    Create a special identifying attribute for E
    Add any attributes of R to E
    For each relationship (ai , bi , ci) in R, create
      1. a new entity ei in the entity set E       2. add (ei , ai ) to RA
      3. add (ei , bi ) to RB                      4. add (ei , ci ) to RC
Converting Non-Binary
Relationships (Cont.)
 Also need to translate constraints
   Translating all constraints may not be
    possible
   There may be instances in the translated
    schema that
    cannot correspond to any instance of R
     Exercise: add constraints to the relationships
      RA, RB and RC to ensure that a newly created
     entity corresponds to exactly one entity in
     each of entity sets A, B and C
   We can avoid creating an identifying
   attribute by making E a weak entity set
   (described shortly) identified by the three
   relationship sets
Design Issues
 Use of entity sets vs. attributes
  Choice mainly depends on the structure
  of the enterprise being modeled, and on
  the semantics associated with the
  attribute in question.
 Use of entity sets vs. relationship sets
  Possible guideline is to designate a
  relationship set to describe an action
  that occurs between entities
 Binary versus n-ary relationship sets
  Although it is possible to replace any
  nonbinary (n-ary, for n > 2) relationship
  set by a number of distinct binary
  relationship sets, a n-ary relationship set
  shows more clearly that several entities
  participate in a single relationship.
HOW ABOUT DOING
AN ER DESIGN
INTERACTIVELY ON
THE BOARD?
SUGGEST AN
APPLICATION TO BE
MODELED.
Weak Entity Sets

 An entity set that does not have a
  primary key is referred to as a weak
  entity set.
 The existence of a weak entity set
 depends on the existence of an
 identifying entity set
   it must relate to the identifying entity set
    via a total, one-to-many relationship set
    from the identifying to the weak entity set
   Identifying relationship depicted using a
    double diamond
Weak Entity Sets (Cont.)
 We depict a weak entity set by
  double rectangles.
 We underline the discriminator of a
  weak entity set with a dashed line.
 payment-number – discriminator of
  the payment entity set
 Primary key for payment – (loan-
  number, payment-number)
Weak Entity Sets (Cont.)

 Note: the primary key of the strong
  entity set is not explicitly stored with
  the weak entity set, since it is implicit
  in the identifying relationship.
 If loan-number were explicitly stored,
  payment could be made a strong
  entity, but then the relationship
  between payment and loan would be
  duplicated by an implicit relationship
  defined by the attribute loan-number
More Weak Entity Set Examples
 In a university, a course is a strong
  entity and a course-offering can be
  modeled as a weak entity
 The discriminator of course-offering
  would be semester (including year) and
  section-number (if there is more than
  one section)
 If we model course-offering as a strong
  entity we would model course-number
  as an attribute.
  Then the relationship with course would
  be implicit in the course-number attribute
Specialization
 Top-down design process; we
  designate subgroupings within an
  entity set that are distinctive from other
  entities in the set.
 These subgroupings become lower-
  level entity sets that have attributes or
  participate in relationships that do not
  apply to the higher-level entity set.
 Depicted by a triangle component
  labeled ISA (E.g. customer "is a"
  person).
 Attribute inheritance – a lower-level
  entity set inherits all the attributes and
  relationship participation of the higher-level
  entity set to which it is linked.
Specialization Example
Generalization
 A bottom-up design process –
  combine a number of entity sets that
  share the same features into a higher-
  level entity set.
 Specialization and generalization are
  simple inversions of each other; they
  are represented in an E-R diagram in
  the same way.
 The terms specialization and
  generalization are used
  interchangeably.
Specialization and Generalization
 (Contd.)
 Can have multiple specializations of an
  entity set based on different features.
 E.g. permanent-employee vs.
  temporary-employee, in addition to
  officer vs. secretary vs. teller
 Each particular employee would be
   a member of one of permanent-employee
    or temporary-employee,
   and also a member of one of officer,
    secretary, or teller
 The ISA relationship is also referred to as a
  superclass-subclass relationship
Design Constraints on a
Specialization/Generalization
 Constraint on which entities can be
 members of a given lower-level
 entity set.
   condition-defined
     E.g. all customers over 65 years are
      members of senior-citizen entity set;
      senior-citizen ISA person.
   user-defined
 Constraint on whether or not entities
 may belong to more than one lower-
 level entity set within a single
 generalization.
   Disjoint: an entity can belong to only one
    lower-level entity set
   Overlapping: an entity can belong to more
    than one lower-level entity set
Design Constraints on a
 Specialization/Generalization
(Contd.)
  Completeness constraint -- specifies
 whether or not an entity in the higher-
 level entity set must belong to at least
 one of the lower-level entity sets
 within a generalization.
   total : an entity must belong to one of the
    lower-level entity sets
   partial: an entity need not belong to one of
    the lower-level entity sets
Aggregation
 Consider the ternary relationship works-on, which we saw earlier

 Suppose we want to record managers for tasks performed by an
 employee at a branch
Aggregation (Cont.)
 Relationship sets works-on and manages
 represent overlapping information
   Every manages relationship corresponds to a
    works-on relationship
   However, some works-on relationships may
    not correspond to any manages relationships
      So we can't discard the works-on relationship
 Eliminate this redundancy via aggregation
   Treat relationship as an abstract entity
   Allows relationships between relationships
   Abstraction of relationship into new entity
 Without introducing redundancy, the
  aggregation represents that an employee
  works on a particular job at a particular
  branch, and that an employee, branch, job
  combination may have an associated manager
E-R Diagram With Aggregation
E-R Design Decisions

 The use of an attribute or entity set to
  represent an object.
 Whether a real-world concept is best
  expressed by an entity set or a
  relationship set.
 The use of a ternary relationship
  versus a pair of binary relationships.
 The use of a strong or weak entity set.
 The use of
  specialization/generalization –
  contributes to modularity in the design.
Enterprise
HOW ABOUT DOING
ANOTHER ER DESIGN
INTERACTIVELY ON
THE BOARD?
Summary of Symbols Used in E-R
Notation
Summary of Symbols (Cont.)
Alternative E-R Notations
UML

 UML: Unified Modeling Language
 UML has many components to
  graphically model different aspects of
  an entire software system
 UML Class Diagrams correspond to E-R
  diagrams, but with several differences.
Summary of UML Class Diagram
Notation
UML Class Diagrams (Contd.)
 Entity sets are shown as boxes, and
  attributes are shown within the box,
  rather than as separate ellipses in E-R
  diagrams.
 Binary relationship sets are represented
  in UML by just drawing a line connecting
  the entity sets. The relationship set
  name is written adjacent to the line.
 The role played by an entity set in a
  relationship set may also be specified by
  writing the role name on the line,
  adjacent to the entity set.
UML Class Diagram Notation (Cont.)
[Figure: UML notation for overlapping and disjoint generalization]

*Note reversal of position in cardinality constraint depiction
*Generalization can use merged or separate arrows independent
 of disjoint/overlapping
UML Class Diagrams (Contd.)
 Cardinality constraints are specified in
  the form l..h, where l denotes the
  minimum and h the maximum number of
  relationships an entity can participate in.
 Beware: the positioning of the constraints
  is exactly the reverse of the positioning
  of constraints in E-R diagrams.
 The constraint 0..* on the E2 side and
  0..1 on the E1 side means that each E2
  entity can participate in at most one
  relationship, whereas each E1 entity can
  participate in many relationships; in other
  words, the relationship is many to one from
  E2 to E1.
Tables
  Primary keys allow entity sets and
   relationship sets to be expressed
   uniformly as tables which
   represent the contents of the
   database.
  A database which conforms to an
   E-R diagram can be represented
   by a collection of tables.
  For each entity set and
   relationship set there is a unique
   table which is assigned the name
   of the corresponding entity set or
   relationship set.
Representing Entity Sets as Tables
 A strong entity set reduces to a table
  with the same attributes.
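For instance, the customer entity set from the earlier slides could reduce to the following table. A minimal SQL sketch; the column types are assumptions for illustration, and hyphens become underscores to keep identifiers legal in standard SQL:

    create table customer (
        customer_id     char(11),
        customer_name   varchar(30),
        customer_street varchar(30),
        customer_city   varchar(30),
        primary key (customer_id)
    );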
Composite and Multivalued Attributes
 Composite attributes are flattened out by
  creating a separate attribute for each
  component attribute
     E.g. given entity set customer with composite
     attribute name with component attributes
     first-name and last-name, the table
     corresponding to the entity set has two
     attributes
                name.first-name and name.last-name
 A multivalued attribute M of an entity E is
    represented by a separate table EM
     Table EM has attributes corresponding to the
     primary key of E and an attribute
     corresponding to multivalued attribute M
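A sketch of the table EM for the multivalued phone-numbers attribute mentioned earlier; the table name and column types are assumptions:

    create table customer_phone_numbers (
        customer_id  char(11),   -- primary key of the owning entity set E
        phone_number char(15),   -- one value of the multivalued attribute M
        primary key (customer_id, phone_number)
    );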
Representing Weak Entity Sets
 A weak entity set becomes a table that includes a column for
  the primary key of the identifying strong entity set
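A sketch for the payment weak entity set; payment_date and payment_amount are assumed attributes, and in a full schema loan_number would also be declared a foreign key referencing loan:

    create table payment (
        loan_number    char(10),      -- primary key of the identifying strong entity set
        payment_number integer,       -- discriminator of the weak entity set
        payment_date   date,          -- assumed attribute
        payment_amount numeric(8,2),  -- assumed attribute
        primary key (loan_number, payment_number)
    );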
Representing Relationship Sets as Tables
 A many-to-many relationship set is represented as a
  table with columns for the primary keys of the two
  participating entity sets, and any descriptive attributes of
  the relationship set.
    E.g.: table for relationship set borrower
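A sketch of the borrower table in SQL (column types assumed):

    create table borrower (
        customer_id char(11),   -- primary key of customer
        loan_number char(10),   -- primary key of loan
        primary key (customer_id, loan_number)
    );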
Redundancy of Tables
 Many-to-one and one-to-many relationship sets that are total
  on the many-side can be represented by adding an extra
  attribute to the many side, containing the primary key of the
  one side
 E.g.: Instead of creating a table for relationship account-
  branch, add an attribute branch to the entity set account
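A sketch of the resulting account table; the separate account-branch table disappears because the primary key of the one side is carried on the many side (types assumed, foreign key declaration omitted to keep the sketch self-contained):

    create table account (
        account_number char(10),
        branch_name    varchar(20),  -- replaces the account-branch relationship table
        balance        integer,
        primary key (account_number)
    );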
Redundancy of Tables (Cont.)
 For one-to-one relationship sets,
  either side can be chosen to act as the
  "many" side
   That is, extra attribute can be added to
   either of the tables corresponding to the
   two entity sets
 If participation is partial on the many
  side, replacing a table by an extra
  attribute in the relation corresponding
  to the "many" side could result in null
  values
 The table corresponding to a
  relationship set linking a weak entity set
  to its identifying strong entity set is redundant.
Representing Specialization as
    Tables
    Method 1:
     Form a table for the higher level entity
     Form a table for each lower level entity set, include primary
      key of higher level entity set and local attributes

        table          table attributes
      person        name, street, city
      customer      name, credit-rating
      employee      name, salary
     Drawback: getting information about, e.g., employee
      requires accessing two tables
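A sketch of Method 1 in SQL (column types assumed; the lower-level tables reference the primary key of person):

    create table person (
        name   varchar(30),
        street varchar(30),
        city   varchar(30),
        primary key (name)
    );
    create table customer (
        name          varchar(30) references person,  -- primary key of higher-level set
        credit_rating integer,
        primary key (name)
    );
    create table employee (
        name   varchar(30) references person,
        salary numeric(10,2),
        primary key (name)
    );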
Representing Specialization as
   Method 2: (Cont.)
     Tables
     Form a table for each entity set with all local and inherited
      attributes
        table          table attributes
      person         name, street, city
      customer       name, street, city, credit-rating
      employee       name, street, city, salary

     If specialization is total, table for generalized entity (person)
      not required to store information
       Can be defined as a "view" relation containing union of
        specialization tables
       But explicit table may still be needed for foreign key
        constraints
     Drawback: street and city may be stored redundantly for
      persons who are both customers and employees
Relations Corresponding to
Aggregation
  To represent aggregation, create a table containing
    primary key of the aggregated relationship,
    the primary key of the associated entity set
    Any descriptive attributes
Relations Corresponding to Aggregation (Cont.)
 E.g. to represent aggregation manages between relationship
  works-on and entity set manager, create a table
   manages(employee-id, branch-name, title, manager-name)
 Table works-on is redundant provided we are willing to store
  null values for attribute manager-name in table manages
END OF CHAPTER 2
E-R Diagram for Exercise 2.10
E-R Diagram for Exercise 2.15
E-R Diagram for Exercise 2.22
E-R Diagram for Exercise 2.15
Existence Dependencies
   If the existence of entity x depends on the
     existence of entity y, then x is said to be
     existence dependent on y.
         y is a dominant entity (in example below, loan)
         x is a subordinate entity (in example below, payment)
[Figure: loan (dominant entity) connected through relationship loan-payment
 to payment (subordinate entity)]
If a loan entity is deleted, then all its associated payment entities
must be deleted also.
Chapter 3: Relational Model
   Structure of Relational
      Databases
     Relational Algebra
     Tuple Relational Calculus
     Domain Relational Calculus
     Extended Relational-
      Algebra-Operations
     Modification of the Database
     Views
Example of a Relation
Basic Structure
 Formally, given sets D1, D2, …, Dn a relation r is a
  subset of
        D1 x D2 x … x Dn
  Thus a relation is a set of n-tuples (a1, a2, …, an)
  where each ai ∈ Di
 Example: if
       customer-name = {Jones, Smith, Curry, Lindsay}
       customer-street = {Main, North, Park}
       customer-city = {Harrison, Rye, Pittsfield}
  Then r = { (Jones, Main, Harrison),
                 (Smith, North, Rye),
                 (Curry, North, Rye),
                 (Lindsay, Park, Pittsfield)}
   is a relation over customer-name x customer-street
  x customer-city
Attribute Types

 Each attribute of a relation has a name
 The set of allowed values for each attribute is
  called the domain of the attribute
 Attribute values are (normally) required to be
  atomic, that is, indivisible
   E.g. multivalued attribute values are not atomic
   E.g. composite attribute values are not atomic
 The special value null is a member of every
  domain
 The null value causes complications in the
  definition of many operations
     we shall ignore the effect of null values in our main
      presentation and consider their effect later
Relation Schema

 A1, A2, …, An are attributes
 R = (A1, A2, …, An ) is a relation
  schema
    E.g. Customer-schema =
              (customer-name,
  customer-street, customer-city)
 r(R) is a relation on the relation
  schema R
     E.g. customer (Customer-schema)
Relation Instance
 The current values (relation instance)
  of a relation are specified by a table
 An element t of r is a tuple,
  represented by a row in a table
                                                        attributes
                                                      (or columns)
    customer-name   customer-street   customer-city


        Jones           Main            Harrison
                                                          tuples
        Smith           North             Rye
                                                        (or rows)
        Curry           North             Rye
       Lindsay          Park            Pittsfield

                       customer
Relations are Unordered
 Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
 E.g. account relation with unordered tuples
Database
 A database consists of multiple relations
 Information about an enterprise is broken up into
  parts, with each relation storing one part of the
  information
     E.g.: account   : stores information about accounts
           depositor : stores information about which customer
                       owns which account
           customer  : stores information about customers
 Storing all information as a single relation such as
    bank(account-number, balance, customer-name, ..)
  results in
     repetition of information (e.g. two customers own an account)
     the need for null values (e.g. represent a customer without
      an account)
 Normalization theory (Chapter 7) deals with how to
  design relational schemas
The customer Relation
The depositor Relation
E-R Diagram for the Banking
Enterprise
Keys

 Let K ⊆ R
 K is a superkey of R if values for K are
  sufficient to identify a unique tuple of
 each possible relation r(R)
   by ―possible r‖ we mean a relation r that
    could exist in the enterprise we are
    modeling.
   Example: {customer-name, customer-street} and
    {customer-name}
    are both superkeys of Customer, if no two
    customers can possibly have the same name.
Determining Keys from E-R Sets

 Strong entity set. The primary key of
  the entity set becomes the primary
  key of the relation.
 Weak entity set. The primary key of
  the relation consists of the union of
  the primary key of the strong entity
  set and the discriminator of the weak
  entity set.
 Relationship set. The union of the
  primary keys of the related entity
  sets becomes a super key of the relation;
  for a binary many-to-one relationship set,
  the primary key of the "many" entity set
  becomes the relation's primary key.
Schema Diagram for the Banking
Enterprise
Query Languages

 Language in which user requests
  information from the database.
 Categories of languages
   procedural
   non-procedural
 "Pure" languages:
   Relational Algebra
   Tuple Relational Calculus
   Domain Relational Calculus
 Pure languages form underlying basis of
  query languages that people use.
Relational Algebra

 Procedural language
 Six basic operators
   select
   project
   union
   set difference
   Cartesian product
   rename
 The operators take one or more
 relations as inputs and give a new
 relation as a result.
Select Operation – Example
• Relation r
        A  B  C   D
        α  α  1   7
        α  β  5   7
        β  β  12  3
        β  β  23  10

• σ_{A=B ∧ D>5}(r)
        A  B  C   D
        α  α  1   7
        β  β  23  10
Select Operation
 Notation: σ_p(r)
 p is called the selection predicate
 Defined as:
        σ_p(r) = {t | t ∈ r and p(t)}
  where p is a formula in propositional calculus
  consisting of terms connected by: ∧ (and),
  ∨ (or), ¬ (not)
  Each term is one of:
        <attribute> op <attribute>  or  <attribute> op <constant>
  where op is one of: =, ≠, >, ≥, <, ≤
Project Operation – Example
 Relation r:
        A  B   C
        α  10  1
        α  20  1
        β  30  1
        β  40  2

 Π_{A,C}(r)
        A  C        A  C
        α  1        α  1
        α  1   =    β  1
        β  1        β  2
        β  2
Project Operation
 Notation:

        Π_{A1, A2, …, Ak}(r)

  where A1, …, Ak are attribute names and r
  is a relation name.
 The result is defined as the relation of
  k columns obtained by erasing the
  columns that are not listed
 Duplicate rows removed from result,
  since relations are sets
Union Operation – Example
 Relations r, s:
        A  B        A  B
        α  1        α  2
        α  2        β  3
        β  1
          r           s

 r ∪ s:
        A  B
        α  1
        α  2
        β  1
        β  3
Union Operation
 Notation: r ∪ s
 Defined as:
        r ∪ s = {t | t ∈ r or t ∈ s}
 For r ∪ s to be valid:
  1. r, s must have the same arity
     (same number of attributes)
  2. The attribute domains must be
     compatible (e.g., 2nd column of r deals
     with the same type of values as does
     the 2nd column of s)
Set Difference Operation – Example
 Relations r, s:
        A  B        A  B
        α  1        α  2
        α  2        β  3
        β  1
          r           s

 r – s:
        A  B
        α  1
        β  1
Set Difference Operation
 Notation r – s
 Defined as:
        r – s = {t | t ∈ r and t ∉ s}
 Set differences must be taken between
  compatible relations.
   r and s must have the same arity
   attribute domains of r and s must be
   compatible
Cartesian-Product Operation – Example
 Relations r, s:
        A  B        C  D   E
        α  1        α  10  a
        β  2        β  10  a
                    β  20  b
          r         γ  10  b
                       s
 r x s:
        A  B  C  D   E
        α  1  α  10  a
        α  1  β  10  a
        α  1  β  20  b
        α  1  γ  10  b
        β  2  α  10  a
        β  2  β  10  a
        β  2  β  20  b
        β  2  γ  10  b
Cartesian-Product Operation
 Notation r x s
 Defined as:
        r x s = {t q | t ∈ r and q ∈ s}
 Assume that attributes of r(R) and s(S)
  are disjoint. (That is, R ∩ S = ∅).
 If attributes of r(R) and s(S) are not
  disjoint, then renaming must be used.
Composition of Operations
 Can build expressions using multiple
  operations
 Example: σ_{A=C}(r x s)

 r x s:
        A  B  C  D   E
        α  1  α  10  a
        α  1  β  10  a
        α  1  β  20  b
        α  1  γ  10  b
        β  2  α  10  a
        β  2  β  10  a
        β  2  β  20  b
        β  2  γ  10  b

 σ_{A=C}(r x s):
        A  B  C  D   E
        α  1  α  10  a
        β  2  β  10  a
        β  2  β  20  b
Rename Operation
 Allows us to name, and therefore to
  refer to, the results of relational-
  algebra expressions.
 Allows us to refer to a relation by
  more than one name.
 Example:
        ρ_x(E)
  returns the expression E under the name X
 If a relational-algebra expression E has arity n, then
        ρ_{x(A1, A2, …, An)}(E)
  returns the result of expression E under the
  name X, with the attributes renamed to A1, A2, …, An.
Banking Example
 branch (branch-name, branch-city, assets)
 customer (customer-name, customer-street, customer-city)
 account (account-number, branch-name, balance)
 loan (loan-number, branch-name, amount)
 depositor (customer-name, account-number)
 borrower (customer-name, loan-number)
Example Queries
 Find all loans of over $1200

        σ_{amount > 1200}(loan)

 Find the loan number for each loan of an amount greater than
  $1200

        Π_{loan-number}(σ_{amount > 1200}(loan))
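For comparison, the same two queries in SQL (a sketch; hyphenated attribute names become underscores):

        select * from loan where amount > 1200;

        select loan_number from loan where amount > 1200;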
Example Queries
 Find the names of all customers who have a loan, an
  account, or both, from the bank

        Π_{customer-name}(borrower) ∪ Π_{customer-name}(depositor)

 Find the names of all customers who have a loan and an
  account at bank.

        Π_{customer-name}(borrower) ∩ Π_{customer-name}(depositor)
Example Queries
 Find the names of all customers who have a loan at the
  Perryridge branch.

        Π_{customer-name}(σ_{branch-name="Perryridge"}
            (σ_{borrower.loan-number = loan.loan-number}(borrower x loan)))

 Find the names of all customers who have a loan at the
  Perryridge branch but do not have an account at any branch of
  the bank.

        Π_{customer-name}(σ_{branch-name = "Perryridge"}
            (σ_{borrower.loan-number = loan.loan-number}(borrower x loan))) –
        Π_{customer-name}(depositor)
Example Queries
 Find the names of all customers who have a
  loan at the Perryridge branch.

        Query 1
        Π_{customer-name}(σ_{branch-name = "Perryridge"}(
            σ_{borrower.loan-number = loan.loan-number}(borrower x loan)))

        Query 2
        Π_{customer-name}(σ_{loan.loan-number = borrower.loan-number}(
            (σ_{branch-name = "Perryridge"}(loan)) x borrower))
Example Queries
Find the largest account balance
 Rename account relation as d
 The query is:

        Π_{balance}(account) – Π_{account.balance}
            (σ_{account.balance < d.balance}(account x ρ_d(account)))
Formal Definition
 A basic expression in the relational
  algebra consists of either one of the
  following:
   A relation in the database
   A constant relation
 Let E1 and E2 be relational-algebra
  expressions; the following are all
  relational-algebra expressions:
   E1 ∪ E2
   E1 – E2
   E1 x E2
   σ_p(E1), where p is a predicate on attributes in E1
   Π_s(E1), where s is a list of some of the attributes in E1
   ρ_x(E1), where x is the new name for the result of E1
Additional Operations
We define additional operations that do
  not add any power to the
relational algebra, but that simplify
  common queries.
 Set intersection
 Natural join
 Division
 Assignment
Set-Intersection Operation
 Notation: r ∩ s
 Defined as:
        r ∩ s = {t | t ∈ r and t ∈ s}
 Assume:
   r, s have the same arity
   attributes of r and s are compatible
 Note: r ∩ s = r – (r – s)
Set-Intersection Operation – Example
 Relations r, s:
        A  B        A  B
        α  1        α  2
        α  2        β  3
        β  1
          r           s

 r ∩ s:
        A  B
        α  2
Natural-Join Operation
 Notation: r ⋈ s
 Let r and s be relations on schemas R
  and S respectively.
  Then, r ⋈ s is a relation on schema R ∪ S
  obtained as follows:
   Consider each pair of tuples tr from r and ts
    from s.
   If tr and ts have the same value on each of
    the attributes in R ∩ S, add a tuple t to the
    result, where
     t has the same value as tr on r
     t has the same value as ts on s
 Example:
   R = (A, B, C, D), S = (E, B, D)
   Result schema = (A, B, C, D, E)
   r ⋈ s is defined as
        Π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r x s))
Natural Join Operation – Example
 Relations r, s:
        A  B  C  D        B  D  E
        α  1  α  a        1  a  α
        β  2  γ  a        3  a  β
        γ  4  β  b        1  a  γ
        α  1  γ  a        2  b  δ
        δ  2  β  b        3  b  ε
            r                s

 r ⋈ s
        A  B  C  D  E
        α  1  α  a  α
        α  1  α  a  γ
        α  1  γ  a  α
        α  1  γ  a  γ
        δ  2  β  b  δ
Division Operation
 Notation: r ÷ s
 Suited to queries that include the
  phrase "for all".
 Let r and s be relations on
  schemas R and S respectively
  where
   R = (A1, …, Am, B1, …, Bn)
   S = (B1, …, Bn)
  The result of r ÷ s is a relation on
  schema R – S = (A1, …, Am)

        r ÷ s = { t | t ∈ Π_{R-S}(r) ∧ ∀u ∈ s ( tu ∈ r ) }

  where tu denotes the concatenation of tuples t and u
Division Operation – Example
 Relations r, s:
        A  B        B
        α  1        1
        α  2        2
        α  3
        β  1        s
        γ  1
        δ  1
        δ  3
        δ  4
        ε  6
        ε  1
        β  2
          r

 r ÷ s:
        A
        α
        β
Another Division Example
 Relations r, s:
        A  B  C  D  E        D  E
        α  a  α  a  1        a  1
        α  a  γ  a  1        b  1
        α  a  γ  b  1
        β  a  γ  a  1          s
        β  a  γ  b  3
        γ  a  γ  a  1
        γ  a  γ  b  1
        γ  a  β  b  1
              r

 r ÷ s:
        A  B  C
        α  a  γ
        γ  a  γ
Division Operation (Cont.)
 Property
   Let q = r ÷ s
   Then q is the largest relation satisfying
    q x s ⊆ r
 Definition in terms of the basic
  algebra operations
  Let r(R) and s(S) be relations, and let
  S ⊆ R

        r ÷ s = Π_{R-S}(r) – Π_{R-S}((Π_{R-S}(r) x s) – Π_{R-S,S}(r))
Assignment Operation
 The assignment operation (←) provides a convenient way
  to express complex queries.
   Write query as a sequential program
   consisting of
      a series of assignments
      followed by an expression whose value
       is displayed as a result of the query.
   Assignment must always be made to a
   temporary relation variable.
 Example: Write r ÷ s as

        temp1 ← Π_{R-S}(r)
        temp2 ← Π_{R-S}((temp1 x s) – Π_{R-S,S}(r))
        result = temp1 – temp2

   The result to the right of the ← is
   assigned to the relation variable on the
   left of the ←.
Example Queries
 Find all customers who have an
  account from at least the "Downtown"
  and the "Uptown" branches.

   Query 1
        Π_{CN}(σ_{BN="Downtown"}(depositor ⋈ account)) ∩
        Π_{CN}(σ_{BN="Uptown"}(depositor ⋈ account))

      where CN denotes customer-name and BN denotes
      branch-name.

   Query 2
        Π_{customer-name, branch-name}(depositor ⋈ account)
            ÷ ρ_{temp(branch-name)}({("Downtown"), ("Uptown")})
Example Queries
 Find all customers who have an
  account at all branches located in
  Brooklyn city.

        Π_{customer-name, branch-name}(depositor ⋈ account)
            ÷ Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
Extended Relational-Algebra Operations
 Generalized Projection
 Outer Join
 Aggregate Functions
Generalized Projection
 Extends the projection operation by
  allowing arithmetic functions to be
  used in the projection list.

        Π_{F1, F2, …, Fn}(E)

 E is any relational-algebra expression
 Each of F1, F2, …, Fn is an arithmetic
  expression involving constants and
  attributes in the schema of E.
 Given relation credit-info(customer-name,
  limit, credit-balance), find how much more
  each person can spend:
        Π_{customer-name, limit – credit-balance}(credit-info)
Aggregate Functions and Operations
 Aggregation function takes a
  collection of values and returns a
  single value as a result.
          avg: average value
          min: minimum value
          max: maximum value
          sum: sum of values
          count: number of values
 Aggregate operation in relational
  algebra:

        G1, G2, …, Gn g F1(A1), F2(A2), …, Fn(An) (E)

  where E is any relational-algebra expression,
  G1, …, Gn is a list of attributes on which to group
  (can be empty), and each Fi is an aggregate
  function applied to attribute Ai
Aggregate Operation – Example
 Relation r:
        A  B  C
        α  α  7
        α  β  7
        β  β  3
        β  β  10

 g sum(C) (r):
        sum-C
        27
Aggregate Operation – Example
 Relation account grouped by branch-name:

        branch-name   account-number   balance
        Perryridge    A-102            400
        Perryridge    A-201            900
        Brighton      A-217            750
        Brighton      A-215            750
        Redwood       A-222            700


branch-name   g   sum(balance)   (account)
                            branch-name      balance
                           Perryridge         1300
                           Brighton           1500
                           Redwood             700
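The same grouping in SQL (a sketch):

        select branch_name, sum(balance)
        from account
        group by branch_name;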
Aggregate Functions (Cont.)
 Result of aggregation does not have a
  name
   Can use rename operation to give it a name
   For convenience, we permit renaming as
    part of aggregate operation:

        branch-name g sum(balance) as sum-balance (account)
Outer Join

 An extension of the join operation
  that avoids loss of information.
 Computes the join and then adds
  tuples from one relation that do not
  match tuples in the other relation to
  the result of the join.
 Uses null values:
   null signifies that the value is unknown or
    does not exist
   All comparisons involving null are
    (roughly speaking) false by definition.
Outer Join – Example
 Relation loan
           loan-number    branch-name    amount
          L-170          Downtown            3000
          L-230          Redwood             4000
          L-260          Perryridge          1700


 Relation borrower
              customer-name    loan-number
              Jones           L-170
              Smith           L-230
              Hayes           L-155
Outer Join – Example
 Inner Join
  loan ⋈ borrower

     loan-number   branch-name   amount   customer-name
     L-170         Downtown      3000     Jones
     L-230         Redwood       4000     Smith

 Left Outer Join
  loan ⟕ borrower

     loan-number   branch-name   amount   customer-name
     L-170         Downtown      3000     Jones
     L-230         Redwood       4000     Smith
     L-260         Perryridge    1700     null
Outer Join – Example
 Right Outer Join
  loan ⟖ borrower

     loan-number   branch-name   amount   customer-name
     L-170         Downtown      3000     Jones
     L-230         Redwood       4000     Smith
     L-155         null          null     Hayes

 Full Outer Join
  loan ⟗ borrower

     loan-number   branch-name   amount   customer-name
     L-170         Downtown      3000     Jones
     L-230         Redwood       4000     Smith
     L-260         Perryridge    1700     null
     L-155         null          null     Hayes
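The outer joins above in SQL (a sketch; the on clause assumes the banking schema with underscored names):

        -- left outer join: keep all loans, even those with no borrower
        select l.loan_number, l.branch_name, l.amount, b.customer_name
        from loan l left outer join borrower b
             on l.loan_number = b.loan_number;

        -- full outer join: keep unmatched tuples from both sides
        select l.loan_number, l.branch_name, l.amount, b.customer_name
        from loan l full outer join borrower b
             on l.loan_number = b.loan_number;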
Null Values

 It is possible for tuples to have a null
  value, denoted by null, for some of
  their attributes
 null signifies an unknown value or
  that a value does not exist.
 The result of any arithmetic
  expression involving null is null.
 Aggregate functions simply ignore
  null values
   Is an arbitrary decision. Could have
    returned null as result instead.
Null Values
 Comparisons with null values return
 the special truth value unknown
   If false was used instead of unknown, then
   not (A < 5)
             would not be equivalent to
   A >= 5
 Three-valued logic using the truth
 value unknown:
   OR:  (unknown or true)    = true
        (unknown or false)   = unknown
        (unknown or unknown) = unknown
   AND: (true and unknown)    = unknown
        (false and unknown)   = false
        (unknown and unknown) = unknown
   NOT: (not unknown) = unknown
Modification of the Database

 The content of the database may be
  modified using the following
  operations:
   Deletion
   Insertion
   Updating
 All these operations are expressed
 using the assignment operator.
Deletion

 A delete request is expressed similarly
  to a query, except instead of
  displaying tuples to the user, the
  selected tuples are removed from the
  database.
 Can delete only whole tuples; cannot
  delete values on only particular
  attributes
 A deletion is expressed in relational
  algebra by:
        r ← r – E
  where r is a relation and E is a
  relational-algebra query.
Deletion Examples
 Delete all account records in the
  Perryridge branch.

        account ← account – σ_{branch-name = "Perryridge"}(account)

 Delete all loan records with amount in the range of 0 to 50

        loan ← loan – σ_{amount ≥ 0 ∧ amount ≤ 50}(loan)

 Delete all accounts at branches located in Needham.

        r1 ← σ_{branch-city = "Needham"}(account ⋈ branch)
        r2 ← Π_{account-number, branch-name, balance}(r1)
        r3 ← Π_{customer-name, account-number}(r2 ⋈ depositor)
        account ← account – r2
        depositor ← depositor – r3
Insertion

 To insert data into a relation, we
  either:
   specify a tuple to be inserted
   write a query whose result is a set of
   tuples to be inserted
 In relational algebra, an insertion is
  expressed by:
        r ← r ∪ E
  where r is a relation and E is a
  relational-algebra expression.
Insertion Examples
 Insert information in the database
  specifying that Smith has $1200 in
  account A-973 at the Perryridge branch.

        account ← account ∪ {("A-973", "Perryridge", 1200)}
        depositor ← depositor ∪ {("Smith", "A-973")}

 Provide as a gift for all loan customers in the Perryridge
  branch, a $200 savings account. Let the loan number serve
  as the account number for the new savings account.

        r1 ← (σ_{branch-name = "Perryridge"}(borrower ⋈ loan))
        account ← account ∪ Π_{loan-number, branch-name, 200}(r1)
        depositor ← depositor ∪ Π_{customer-name, loan-number}(r1)
Updating
 A mechanism to change a value in a
  tuple without changing all values in
  the tuple
 Use the generalized projection
  operator to do this task

        r ← Π_{F1, F2, …, Fn}(r)

 Each Fi is either
   the ith attribute of r, if the ith attribute is
    not updated, or,
   if the attribute is to be updated, Fi is an
    expression, involving only constants and
    the attributes of r, which gives the new
    value for the attribute
Update Examples
 Make interest payments by increasing all
  balances by 5 percent.

        account ← Π_{AN, BN, BAL * 1.05}(account)

  where AN, BN and BAL stand for account-number, branch-name
  and balance, respectively.

 Pay all accounts with balances over $10,000 6 percent interest
  and pay all others 5 percent

        account ← Π_{AN, BN, BAL * 1.06}(σ_{BAL > 10000}(account))
                   ∪ Π_{AN, BN, BAL * 1.05}(σ_{BAL ≤ 10000}(account))
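The same updates in SQL (a sketch; the second uses a case expression so both rates apply in one statement):

        update account
        set balance = balance * 1.05;

        update account
        set balance = case
                          when balance > 10000 then balance * 1.06
                          else balance * 1.05
                      end;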
Views
 In some cases, it is not desirable for all
  users to see the entire logical model
  (i.e., all the actual relations stored in
  the database.)
 Consider a person who needs to know
  a customer's loan number but has no
  need to see the loan amount. This
  person should see a relation
  described, in the relational algebra, by
        Π_{customer-name, loan-number}(borrower ⋈ loan)
 Any relation that is not of the
  conceptual model but is made visible
  to a user as a "virtual relation" is called a view.
View Definition
 A view is defined using the create view
  statement which has the form
        create view v as <query expression>
  where <query expression> is any
  legal relational algebra query
  expression. The view name is
  represented by v.
 Once a view is defined, the view name
  can be used to refer to the virtual
  relation that the view generates.
View Examples
 Consider the view (named all-customer)
  consisting of branches and their customers.

        create view all-customer as
            Π_{branch-name, customer-name}(depositor ⋈ account)
             ∪ Π_{branch-name, customer-name}(borrower ⋈ loan)

 We can find all customers of the Perryridge branch by writing:

        Π_{customer-name}(σ_{branch-name = "Perryridge"}(all-customer))
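The same view in SQL (a sketch; the joins are spelled out because depositor and borrower do not carry branch_name themselves):

        create view all_customer as
            select a.branch_name, d.customer_name
            from depositor d, account a
            where d.account_number = a.account_number
            union
            select l.branch_name, b.customer_name
            from borrower b, loan l
            where b.loan_number = l.loan_number;

        select customer_name
        from all_customer
        where branch_name = 'Perryridge';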
Updates Through View

 Database modifications expressed as
  views must be translated to
  modifications of the actual relations in
  the database.
 Consider the person who needs to see
  all loan data in the loan relation
  except amount. The view given to the
  person, branch-loan, is defined as:
        create view branch-loan as
            Π_{branch-name, loan-number}(loan)
 Since we allow a view name to appear wherever a
  relation name is allowed, the person may write:
        branch-loan ← branch-loan ∪ {("Perryridge", "L-37")}
Updates Through Views (Cont.)
 The previous insertion must be
  represented by an insertion into the
  actual relation loan from which the view
  branch-loan is constructed.
 An insertion into loan requires a value
  for amount. The insertion can be dealt
  with by either.
   rejecting the insertion and returning an
    error message to the user.
   inserting a tuple ("L-37", "Perryridge", null)
    into the loan relation
 Some updates through views are
  impossible to translate into updates of the
  database relations; others cannot be
  translated uniquely.
Views Defined Using Other
Views
 One view may be used in the
  expression defining another view
 A view relation v1 is said to depend
  directly on a view relation v2 if v2 is
  used in the expression defining v1
 A view relation v1 is said to depend on
  view relation v2 if either v1 depends
  directly on v2 or there is a path of dependencies
  from v1 to v2
 A view relation v is said to be
View Expansion

 A way to define the meaning of views
  defined in terms of other views.
 Let view v1 be defined by an
  expression e1 that may itself contain
  uses of view relations.
 View expansion of an expression
  repeats the following replacement
  step:
    repeat
     Find any view relation vi in e1
     Replace the view relation vi by the
      expression defining vi
    until no more view relations are present in e1
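 As a concrete illustration, here is a minimal SQL
  sketch (the view names perryridge-account and
  perryridge-customer are invented for this example):

     create view perryridge-account as
        select account-number, branch-name
        from account
        where branch-name = ‘Perryridge’

     create view perryridge-customer as
        select customer-name
        from depositor, perryridge-account
        where depositor.account-number =
              perryridge-account.account-number

  One expansion step replaces the use of perryridge-account
  in the second definition by its defining query; since no
  view relations then remain, expansion stops.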
Tuple Relational Calculus
 A nonprocedural query language, where
    each query is of the form
                {t | P (t) }
   It is the set of all tuples t such that
    predicate P is true for t
   t is a tuple variable, t[A] denotes the
    value of tuple t on attribute A
   t ∈ r denotes that tuple t is in relation r
   P is a formula similar to that of the
    predicate calculus
Predicate Calculus Formula

1.   Set of attributes and constants
2.   Set of comparison operators: (e.g., <, ≤, =, ≠, >, ≥)
3. Set of connectives: and (∧), or (∨),
  not (¬)
4. Implication (⇒): x ⇒ y, if x is true,
  then y is true
                x ⇒ y ≡ ¬x ∨ y
5. Set of quantifiers:
    ∃ t ∈ r (Q(t)): “there exists” a tuple t in
     relation r such that predicate Q(t) is true
    ∀ t ∈ r (Q(t)): Q is true “for all” tuples t
     in relation r
Banking Example

 branch (branch-name, branch-city,
    assets)
   customer (customer-name, customer-
    street, customer-city)
   account (account-number, branch-
    name, balance)
   loan (loan-number, branch-name,
    amount)
   depositor (customer-name, account-
    number)
Example Queries
 Find the loan-number, branch-
   name, and amount for loans of over
   $1200
                  {t | t ∈ loan ∧ t[amount] > 1200}

 Find the loan number for each loan of an amount greater than $1200

  {t | ∃ s ∈ loan (t[loan-number] = s[loan-number]
                       ∧ s[amount] > 1200)}

 Notice that a relation on schema [loan-number] is implicitly defined
 by the query
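 For comparison, the first query could be written in SQL
  (Chapter 4) roughly as:

     select *
     from loan
     where amount > 1200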
Example Queries
 Find the names of all customers
   having a loan, an account, or both at
   the bank
     {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
        ∨ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}

 Find the names of all customers who have a loan and an account
  at the bank

     {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
        ∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Example Queries
  Find the names of all customers
    having a loan at the Perryridge branch
      {t | s borrower(t[customer-name] = s[customer-name]
           u loan(u[branch-name] = “Perryridge”
                       u[loan-number] = s[loan-number]))}

 Find the names of all customers who have a loan at the
  Perryridge branch, but no account at any branch of the bank

     {t | s borrower( t[customer-name] = s[customer-name]
            u loan(u[branch-name] = “Perryridge”
                        u[loan-number] = s[loan-number]))
            not v depositor (v[customer-name] =
                                       t[customer-name]) }
Example Queries
 Find the names of all customers having a loan from the Perryridge
  branch, and the cities they live in
 {t | ∃ s ∈ loan (s[branch-name] = “Perryridge”
     ∧ ∃ u ∈ borrower (u[loan-number] = s[loan-number]
           ∧ t[customer-name] = u[customer-name]
        ∧ ∃ v ∈ customer (u[customer-name] = v[customer-name]
                            ∧ t[customer-city] = v[customer-city])))}
Example Queries
 Find the names of all customers who
  have an account at all branches
  located in Brooklyn:
 {t | ∃ c ∈ customer (t[customer-name] = c[customer-name]) ∧
      ∀ s ∈ branch (s[branch-city] = “Brooklyn” ⇒
       ∃ u ∈ account (s[branch-name] = u[branch-name]
        ∧ ∃ w ∈ depositor (t[customer-name] = w[customer-name]
                   ∧ w[account-number] = u[account-number])))}
Safety of Expressions

 It is possible to write tuple calculus
  expressions that generate infinite
  relations.
 For example, {t | ¬ (t ∈ r)} results in an
  infinite relation if the domain of any
  attribute of relation r is infinite
 To guard against the problem, we
  restrict the set of allowable
  expressions to safe expressions.
 An expression {t | P(t)} in the tuple
  relational calculus is safe if every
  component of t appears in one of the
  relations, tuples, or constants that
  appear in P
Domain Relational Calculus

 A nonprocedural query language
  equivalent in power to the tuple
  relational calculus
 Each query is an expression of the
 form:

          { ⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn) }

   x1, x2, …, xn represent domain variables
   P represents a formula similar to that of
    the predicate calculus
Example Queries
 Find the loan-number, branch-name,
   and amount for loans of over $1200

       {⟨l, b, a⟩ | ⟨l, b, a⟩ ∈ loan ∧ a > 1200}

 Find the names of all customers who have a loan of over $1200

     {⟨c⟩ | ∃ l, b, a (⟨c, l⟩ ∈ borrower ∧ ⟨l, b, a⟩ ∈ loan ∧ a > 1200)}

 Find the names of all customers who have a loan from the
   Perryridge branch and the loan amount:

   {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ∃ b (⟨l, b, a⟩ ∈ loan
                                       ∧ b = “Perryridge”))}
or {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ⟨l, “Perryridge”, a⟩ ∈ loan)}
Example Queries
 Find the names of all customers
   having a loan, an account, or both at
   the Perryridge branch:
    {⟨c⟩ | ∃ l (⟨c, l⟩ ∈ borrower
              ∧ ∃ b, a (⟨l, b, a⟩ ∈ loan ∧ b = “Perryridge”))
        ∨ ∃ a (⟨c, a⟩ ∈ depositor
              ∧ ∃ b, n (⟨a, b, n⟩ ∈ account ∧ b = “Perryridge”))}

 Find the names of all customers who have an account at all
  branches located in Brooklyn:

  {⟨c⟩ | ∃ s, n (⟨c, s, n⟩ ∈ customer) ∧
         ∀ x, y, z (¬(⟨x, y, z⟩ ∈ branch ∧ y = “Brooklyn”) ∨
             ∃ a, b (⟨a, x, b⟩ ∈ account ∧ ⟨c, a⟩ ∈ depositor))}
Safety of Expressions

  { ⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn) }

is safe if all of the following hold:
  1. All values that appear in tuples of
  the expression are values from
  dom(P) (that is, the values appear
  either in P or in a tuple of a
  relation mentioned in P).
  2. For every “there exists” subformula
  of the form ∃ x (P1(x)), the
  subformula is true if and only if there
  is a value x in dom(P1) such that
  P1(x) is true.
  3. For every “for all” subformula of the
  form ∀ x (P1(x)), the subformula is
  true if and only if P1(x) is true for all
  values x from dom(P1).
(Figure slides: results of the preceding examples, shown as images in the original deck)
Result of σbranch-name = “Perryridge” (loan)
Loan Number and the Amount of the Loan
Names of All Customers Who Have Either a Loan or an Account
Customers With An Account But No Loan
Result of borrower ⋈ loan
Result of σbranch-name = “Perryridge” (borrower ⋈ loan)
Result of Πcustomer-name
Result of the Subexpression
Largest Account Balance in the Bank
Customers Who Live on the Same Street and In the Same City as Smith
Customers With Both an Account and a Loan at the Bank
Result of Πcustomer-name, loan-number, amount (borrower ⋈ loan)
Result of Πbranch-name (σcustomer-city = “Harrison” (customer ⋈ account ⋈ depositor))
Result of Πbranch-name (σbranch-city = “Brooklyn” (branch))
Result of Πcustomer-name, branch-name (depositor ⋈ account)
The credit-info Relation
Result of Πcustomer-name, (limit – credit-balance) as credit-available (credit-info)
The pt-works Relation
The pt-works Relation After Grouping
Result of branch-name 𝒢 sum(salary) (pt-works)
Result of branch-name 𝒢 sum(salary), max(salary) as max-salary (pt-works)
The employee and ft-works Relations
Results of employee ⋈ ft-works (inner and outer join variants)
Tuples Inserted Into loan and borrower
Names of All Customers Who Have a Loan at the Perryridge Branch
E-R Diagram
The branch Relation
The loan Relation
The borrower Relation
Chapter 4: SQL
     Basic Structure
     Set Operations
     Aggregate Functions
     Null Values
     Nested Subqueries
     Derived Relations
     Views
     Modification of the Database
     Joined Relations
     Data Definition Language
     Embedded SQL, ODBC and JDBC
Schema Used in Examples
Basic Structure
 SQL is based on set and relational operations with
  certain modifications and enhancements
 A typical SQL query has the form:
              select A1, A2, ..., An
              from r1, r2, ..., rm
              where P
   Ais represent attributes
   ris represent relations
   P is a predicate.
 This query is equivalent to the relational algebra
  expression:
              ΠA1, A2, ..., An (σP (r1 × r2 × ... × rm))
 The result of an SQL query is a relation.
The select Clause
 The select clause lists the attributes desired in the result
  of a query
      corresponds to the projection operation of the relational
       algebra
 E.g. find the names of all branches in the loan relation
                      select branch-name
                      from loan
 In the “pure” relational algebra syntax, the query would
  be:
                        Πbranch-name (loan)
 NOTE: SQL does not permit the ‘-’ character in names,
    Use, e.g., branch_name instead of branch-name in a real
     implementation.
    We use ‘-’ since it looks nicer!
 NOTE: SQL names are case insensitive, i.e. you can use
  capital or small letters.
      You may wish to use upper case wherever we use bold font.
The select Clause (Cont.)

 SQL allows duplicates in relations as well as
  in query results.
 To force the elimination of duplicates, insert
  the keyword distinct after select.
 Find the names of all branches in the loan
  relations, and remove duplicates
             select distinct branch-name
             from loan
 The keyword all specifies that duplicates not
  be removed.
             select all branch-name
             from loan
The select Clause (Cont.)

 An asterisk in the select clause denotes “all attributes”
                     select *
                     from loan
 The select clause can contain arithmetic expressions
  involving the operations +, –, ∗, and /, operating
  on constants or attributes of tuples.
 The query:
               select loan-number, branch-name, amount ∗ 100
               from loan
  would return a relation which is the same as the loan
  relation, except that the attribute amount is
  multiplied by 100.
The where Clause

 The where clause specifies conditions that the
  result must satisfy
   corresponds to the selection predicate of the relational
    algebra.
 To find all loan numbers for loans made at the
  Perryridge branch with loan amounts greater than
  $1200:
         select loan-number
         from loan
         where branch-name = ‘Perryridge’ and amount > 1200
 Comparison results can be combined using the
  logical connectives and, or, and not.
 Comparisons can be applied to results of
  arithmetic expressions.
The where Clause (Cont.)
 SQL includes a between comparison operator
 E.g. Find the loan number of those loans with
  loan amounts between $90,000 and $100,000
  (that is, ≥ $90,000 and ≤ $100,000)
    select loan-number
           from loan
           where amount between 90000 and 100000
The from Clause
 The from clause lists the relations involved
   in the query
    corresponds to the Cartesian product operation
     of the relational algebra.
 Find the Cartesian product borrower × loan
                    select *
                    from borrower, loan
 Find the name, loan number and loan amount of all customers
  having a loan at the Perryridge branch.
      select customer-name, borrower.loan-number, amount
              from borrower, loan
              where borrower.loan-number = loan.loan-number and
                 branch-name = ‘Perryridge’
The Rename Operation
 SQL allows renaming relations and
  attributes using the as clause:
          old-name as new-name

 Find the name, loan number and loan
  amount of all customers; rename the
  column name loan-number as loan-id.

  select customer-name, borrower.loan-number as loan-id, amount
  from borrower, loan
  where borrower.loan-number = loan.loan-number
Tuple Variables
 Tuple variables are defined in the from
  clause via the use of the as clause.
 Find the customer names and their
  loan numbers for all customers having
  a loan at some branch.
      select customer-name, T.loan-number, S.amount
            from borrower as T, loan as S
            where T.loan-number = S.loan-number
   Find the names of all branches that have greater assets than
    some branch located in Brooklyn.

     select distinct T.branch-name
       from branch as T, branch as S
       where T.assets > S.assets and S.branch-city = ‘Brooklyn’
String Operations
   SQL includes a string-matching operator for comparisons on
    character strings. Patterns are described using two special
    characters:
     percent (%). The % character matches any
      substring.
     underscore (_). The _ character matches
      any character.
   Find the names of all customers whose street includes the
    substring “Main”.
                 select customer-name
                 from customer
                 where customer-street like ‘%Main%’
   Match the string “Main%”
                     like ‘Main\%’ escape ‘\’
   SQL supports a variety of string operations such as
     concatenation (using “||”)
Ordering the Display of Tuples
 List in alphabetic order the names of
  all customers having a loan in
  Perryridge branch
      select distinct customer-name
      from borrower, loan
      where borrower.loan-number = loan.loan-number
            and branch-name = ‘Perryridge’
      order by customer-name
 We may specify desc for descending
  order or asc for ascending order, for
  each attribute; ascending order is the
  default.
Duplicates

 In relations with duplicates, SQL can
  define how many copies of tuples
  appear in the result.
 Multiset versions of some of the
 relational algebra operators – given
 multiset relations r1 and r2:
  1. σθ(r1): If there are c1 copies of tuple t1 in
    r1, and t1 satisfies selection σθ, then
    there are c1 copies of t1 in σθ(r1).
  2. ΠA(r1): For each copy of tuple t1 in r1,
    there is a copy of tuple ΠA(t1) in ΠA(r1),
    where ΠA(t1) denotes the projection of the
    single tuple t1.
Duplicates (Cont.)
 Example: Suppose multiset
  relations r1 (A, B) and r2 (C) are as
  follows:
          r1 = {(1, a) (2,a)}     r2 = {(2),
  (3), (3)}
 Then ΠB(r1) would be {(a), (a)}, while
  ΠB(r1) × r2 would be

         {(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}

 SQL duplicate semantics:
         select A1, A2, ..., An
         from r1, r2, ..., rm
         where P
  is equivalent to the multiset version of the expression
         ΠA1, A2, ..., An (σP (r1 × r2 × ... × rm))
Set Operations

 The set operations union, intersect,
  and except operate on relations and
  correspond to the relational algebra
  operations
 Each of the above operations
  automatically eliminates duplicates; to
  retain all duplicates use the
  corresponding multiset versions union
  all, intersect all and except all.

 Suppose a tuple occurs m times in r and
  n times in s. Then it occurs:
    m + n times in r union all s
    min(m, n) times in r intersect all s
    max(0, m – n) times in r except all s
Set Operations
 Find all customers who have a loan, an
    account, or both:
       (select customer-name from depositor)
               union
               (select customer-name from borrower)
   Find all customers who have both a loan and an account.
       (select customer-name from depositor)
               intersect
               (select customer-name from borrower)
 Find all customers who have an account but no loan.

      (select customer-name from depositor)
              except
              (select customer-name from borrower)
Aggregate Functions

 These functions operate on the
  multiset of values of a column of a
  relation, and return a value
           avg: average value
           min: minimum value
           max: maximum value
           sum: sum of values
           count: number of values
Aggregate Functions (Cont.)
 Find the average account balance at
   the Perryridge branch.
       select avg (balance)
                  from account
                  where branch-name = ‘Perryridge’

 Find the number of tuples in the customer relation.

                select count (*)
                        from customer

 Find the number of depositors in the bank.

            select count (distinct customer-name)
                    from depositor
Aggregate Functions – Group By
   Find the number of depositors for each
     branch.
    select branch-name, count (distinct customer-name)
            from depositor, account
            where depositor.account-number = account.account-number
            group by branch-name


Note: Attributes in select clause outside of aggregate functions must
      appear in group by list
Aggregate Functions – Having
Clause
 Find the names of all branches where
   the average account balance is more
   than $1,200.
      select branch-name, avg (balance)
                from account
                group by branch-name
                having avg (balance) > 1200

Note: predicates in the having clause are applied after the
       formation of groups whereas predicates in the where
      clause are applied before forming groups
Null Values
 It is possible for tuples to have a null
  value, denoted by null, for some of
  their attributes
 null signifies an unknown value or
  that a value does not exist.
 The predicate is null can be used to
  check for null values.
   E.g. Find all loan numbers which appear in
    the loan relation with null values for
    amount.
     select loan-number
     from loan
     where amount is null
Null Values and Three Valued
Logic
 Any comparison with null returns
 unknown
   E.g. 5 < null   or null <> null   or   null
   = null
 Three-valued logic using the truth
 value unknown:
   OR: (unknown or true) = true, (unknown
    or false) = unknown
         (unknown or unknown) = unknown
   AND: (true and unknown) = unknown,
    (false and unknown) = false,
    (unknown and unknown) = unknown
   NOT: (not unknown) = unknown
   Result of a where clause predicate is
    treated as false if it evaluates to unknown
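 As a small illustration, the following two queries need not
  together cover all of loan, since a tuple with a null
  amount satisfies neither predicate:

     select loan-number from loan where amount > 1200
     select loan-number from loan where not (amount > 1200)

  A null amount makes both predicates evaluate to unknown,
  and the where clause keeps only tuples whose predicate is
  true.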
Null Values and Aggregates

 Total all loan amounts
           select sum (amount)
           from loan
   Above statement ignores null amounts
   result is null if there is no non-null
    amount
   All aggregate operations except count(*)
    ignore tuples with null values on the
    aggregated attributes.
Nested Subqueries

 SQL provides a mechanism for the
  nesting of subqueries.
 A subquery is a select-from-where
  expression that is nested within
  another query.
 A common use of subqueries is to
  perform tests for set membership, set
  comparisons, and set cardinality.
Example Query
  Find all customers who have both an
    account and a loan at the bank.
         select distinct customer-name
                 from borrower
                 where customer-name in (select customer-name
                                        from depositor)

 Find all customers who have a loan at the bank but do not have
  an account at the bank

      select distinct customer-name
              from borrower
              where customer-name not in (select customer-name
                                        from depositor)
Example Query
 Find all customers who have both an
  account and a loan at the Perryridge
  branch
   select distinct customer-name
            from borrower, loan
            where borrower.loan-number = loan.loan-number and
              branch-name = “Perryridge” and
               (branch-name, customer-name) in
                            (select branch-name, customer-name
                            from depositor, account
                            where depositor.account-number =
                          account.account-number)

 Note: Above query can be written in a much simpler manner. The
        formulation above is simply to illustrate SQL features.
(Schema used in this example)
Set Comparison
 Find all branches that have greater
  assets than some branch located in
  Brooklyn.
      select distinct T.branch-name
      from branch as T, branch as S
      where T.assets > S.assets and
            S.branch-city = ‘Brooklyn’

 Same query using > some clause

        select branch-name
                from branch
                where assets > some
                       (select assets
                        from branch
                                where branch-city = ‘Brooklyn’)
Definition of Some Clause
 F <comp> some r ⇔ ∃ t ∈ r such that (F <comp> t)
   where <comp> can be: <, ≤, >, ≥, =, ≠

   (5 < some {0, 5, 6}) = true
                            (read: 5 < some tuple in the relation)
   (5 < some {0, 5}) = false
   (5 = some {0, 5}) = true
   (5 ≠ some {0, 5}) = true (since 0 ≠ 5)
 (= some) ≡ in
 However, (≠ some) ≢ not in
Definition of all Clause
 F <comp> all r ⇔ ∀ t ∈ r (F <comp> t)

         (5 < all {0, 5, 6}) = false
         (5 < all {6, 10}) = true
         (5 = all {4, 5}) = false
         (5 ≠ all {4, 6}) = true (since 5 ≠ 4 and 5 ≠ 6)
  (≠ all) ≡ not in
  However, (= all) ≢ in
Example Query
 Find the names of all branches that
 have greater assets than all branches
 located in Brooklyn.
     select branch-name
          from branch
          where assets > all
                 (select assets
                 from branch
                 where branch-city = ‘Brooklyn’)
Test for Empty Relations

 The exists construct returns the value
  true if the argument subquery is
  nonempty.
 exists r ⇔ r ≠ Ø
 not exists r ⇔ r = Ø
Example Query
 Find all customers who have an
   account at all branches located in
   Brooklyn.
   select distinct S.customer-name
            from depositor as S
            where not exists (
                   (select branch-name
                   from branch
                   where branch-city = ‘Brooklyn’)
          except
                   (select R.branch-name
                   from depositor as T, account as R
                   where T.account-number = R.account-number and
                             S.customer-name = T.customer-name))

 (Schema used in this example)
 Note that X – Y = Ø ⇔ X ⊆ Y
 Note: Cannot write this query using = all and its variants
Test for Absence of Duplicate Tuples
 The unique construct tests whether a
  subquery has any duplicate tuples in
  its result.
 Find all customers who have at most
  one account at the Perryridge branch.
      select T.customer-name
      from depositor as T
      where unique (
       select R.customer-name
       from account, depositor as R
       where T.customer-name = R.customer-name and
             R.account-number = account.account-number and
             account.branch-name = ‘Perryridge’)
Example Query
 Find all customers who have at least
  two accounts at the Perryridge branch.
     select distinct T.customer-name
     from depositor T
     where not unique (
             select R.customer-name
             from account, depositor as R
             where T.customer-name = R.customer-name
     and
             R.account-number = account.account-number
     and
             account.branch-name = ‘Perryridge’)

(Schema used in this example)
Views
 Provide a mechanism to hide certain
 data from the view of certain users.
 To create a view we use the command:
   create view v as <query expression>

  where:
     <query expression> is any legal expression
     The view name is represented by v
Example Queries
 A view consisting of branches and their
   customers
    create view all-customer as
       (select branch-name, customer-name
        from depositor, account
        where depositor.account-number = account.account-number)
         union
      (select branch-name, customer-name
       from borrower, loan
       where borrower.loan-number = loan.loan-number)

 Find all customers of the Perryridge branch

        select customer-name
                from all-customer
                where branch-name = ‘Perryridge’
Derived Relations

 Find the average account balance of
  those branches where the average
  account balance is greater than
  $1200.
     select branch-name, avg-balance
     from (select branch-name, avg (balance)
            from account
            group by branch-name)
            as result (branch-name, avg-balance)
     where avg-balance > 1200
With Clause

 With clause allows views to be defined
  locally to a query, rather than globally.
  Analogous to procedures in a
  programming language.
 Find all accounts with the maximum
 balance

     with max-balance(value) as
        select max (balance)
        from account
     select account-number
     from account, max-balance
     where account.balance = max-balance.value
Complex Query using With Clause
 Find all branches where the total
  account deposit is greater than the
  average of the total account deposits
  at all branches.
  with branch-total (branch-name, value) as
       select branch-name, sum (balance)
       from account
       group by branch-name
  with branch-total-avg(value) as
       select avg (value)
       from branch-total
  select branch-name
  from branch-total, branch-total-avg
  where branch-total.value >= branch-total-avg.value
Modification of the Database – Deletion
 Delete all account records at the
  Perryridge branch
         delete from account
         where branch-name = ‘Perryridge’
 Delete all accounts at every branch
  located in Needham city.
  delete from account
  where branch-name in (select branch-name
                 from branch
                 where branch-city = ‘Needham’)
Example Query
 Delete the record of all accounts with
  balances below the average at the
  bank.
  delete from account
        where balance < (select avg (balance)
           from account)

  Problem: as we delete tuples from account, the average balance
      changes
  Solution used in SQL:
 1.    First, compute avg balance and find all tuples to delete
 2.    Next, delete all tuples found above (without recomputing avg or
       retesting the tuples)
Modification of the Database –
Insertion
 Add a new tuple to account
       insert into account
           values (‘A-9732’, ‘Perryridge’, 1200)
  or equivalently

  insert into account (branch-name, balance, account-number)
     values (‘Perryridge’, 1200, ‘A-9732’)
Modification of the Database – Insertion
 Provide as a gift for all loan customers
  of the Perryridge branch, a $200
  savings account. Let the loan number
  serve as the account number for the
  new savings account
    insert into account
    select loan-number, branch-name, 200
    from loan
    where branch-name = ‘Perryridge’
    insert into depositor
    select customer-name, loan-number
    from borrower, loan
    where borrower.loan-number = loan.loan-number
          and branch-name = ‘Perryridge’
Modification of the Database –
Updates
 Increase all accounts with balances
  over $10,000 by 6%, all other
  accounts receive 5%.
   Write two update statements:
             update account
             set balance = balance ∗ 1.06
             where balance > 10000

             update account
             set balance = balance ∗ 1.05
             where balance ≤ 10000
Case Statement for Conditional
Updates
 Same query as before: Increase all
  accounts with balances over $10,000
  by 6%, all other accounts receive 5%.

      update account
      set balance = case
                       when balance <= 10000 then balance ∗ 1.05
                       else balance ∗ 1.06
                    end
Update of a View
 Create a view of all loan data in the loan
  relation, hiding the amount attribute
          create view branch-loan as
                 select branch-name, loan-number
                 from loan
   Add a new tuple to branch-loan
          insert into branch-loan
                  values (‘Perryridge’, ‘L-307’)
    This insertion must be represented by the insertion of the
    tuple
                 (‘L-307’, ‘Perryridge’, null)
    into the loan relation
   Updates on more complex views are difficult or impossible to
    translate, and hence are disallowed.
   Most SQL implementations allow updates only on simple views
    (without aggregates) defined on a single relation
Transactions
   A transaction is a sequence of queries and update statements
    executed as a single unit
     Transactions are started implicitly and
      terminated by one of
        commit work: makes all updates of the
         transaction permanent in the database
        rollback work: undoes all updates performed
         by the transaction.
   Motivating example
     Transfer of money from one account to
      another involves two steps:
           deduct from one account and credit to
           another
     If one step succeeds and the other fails,
      the database is left in an inconsistent
      state; either both steps should complete
      or neither should.
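 A minimal sketch of the transfer as a single transaction
  (the account numbers are illustrative):

     update account
     set balance = balance - 100
     where account-number = ‘A-101’

     update account
     set balance = balance + 100
     where account-number = ‘A-215’

     commit work
     -- if either update fails, issue rollback work instead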
Transactions (Cont.)

 In most database systems, each SQL
  statement that executes successfully
  is automatically committed.
   Each transaction would then consist of
    only a single statement
   Automatic commit can usually be turned
    off, allowing multi-statement
    transactions, but how to do so depends
    on the database system
    Another option in SQL:1999: enclose
     statements within
            begin atomic … end
Joined Relations
 Join operations take two relations and
  return as a result another relation.
 These additional operations are
  typically used as subquery expressions
  in the from clause
 Join condition – defines which tuples in
  the two relations match, and what
  attributes are present in the result of
  the join.
 Join type – defines how tuples in each
  relation that do not match any tuple in
  the other relation (based on the join
  condition) are treated.

     Join Types              Join Conditions
     inner join              natural
     left outer join         on <predicate>
     right outer join        using (A1, A2, ..., An)
     full outer join
Joined Relations – Datasets for
Examples
  Relation loan
           loan-number    branch-name     amount
             L-170       Downtown          3000
              L-230      Redwood           4000
              L-260      Perryridge        1700

 Relation borrower
         customer-name   loan-number
         Jones              L-170
         Smith              L-230
         Hayes              L-155
 Note: borrower information missing for L-260 and loan
  information missing for L-155
Joined Relations – Examples
     loan inner join borrower on
        loan.loan-number =
        borrower.loan-number
loan-number branch-name amount customer-name          loan-number

  L-170       Downtown       3000     Jones           L-170
  L-230       Redwood        4000     Smith           L-230

    loan left outer join borrower on
     loan.loan-number = borrower.loan-number

loan-number    branch-name   amount   customer-name    loan-number
   L-170      Downtown       3000     Jones               L-170
   L-230      Redwood        4000     Smith               L-230
   L-260      Perryridge     1700     null                    null
Joined Relations – Examples
 loan natural inner join borrower
     loan-number    branch-name     amount   customer-name

        L-170      Downtown          3000    Jones
        L-230      Redwood           4000    Smith


 loan natural right outer join borrower

     loan-number    branch-name     amount   customer-name
        L-170      Downtown          3000    Jones
        L-230      Redwood           4000    Smith
        L-155      null               null   Hayes
Joined Relations – Examples
  loan full outer join borrower
    using (loan-number)
    loan-number branch-name amount            customer-name

       L-170      Downtown          3000      Jones
       L-230      Redwood           4000      Smith
       L-260      Perryridge        1700      null
       L-155      null               null     Hayes

 Find all customers who have either an account or a loan (but
  not both) at the bank.

      select customer-name
              from (depositor natural full outer join borrower)
              where account-number is null or loan-number is null
Data Definition Language (DDL)
Allows the specification of not only a set of relations but also
information about each relation, including:
    The schema for each relation.
    The domain of values associated
       with each attribute.
      Integrity constraints
      The set of indices to be
       maintained for each relations.
      Security and authorization
       information for each relation.
       The physical storage structure of
        each relation on disk.
Domain Types in SQL
   char(n). Fixed length character string, with user-specified length n.
   varchar(n). Variable length character strings, with user-specified
    maximum length n.
   int. Integer (a finite subset of the integers that is machine-
    dependent).
   smallint. Small integer (a machine-dependent subset of the integer
    domain type).
   numeric(p,d). Fixed point number, with user-specified precision of
    p digits, with d digits to the right of the decimal point.
   real, double precision. Floating point and double-precision floating
    point numbers, with machine-dependent precision.
   float(n). Floating point number, with user-specified precision of at
    least n digits.
   Null values are allowed in all the domain types. Declaring an
    attribute to be not null prohibits null values for that attribute.
   create domain construct in SQL-92 creates user-defined domain
    types
       create domain person-name char(20) not
      null
Date/Time Types in SQL (Cont.)

   date. Dates, containing a (4 digit) year, month and date
     E.g.     date ‘2001-7-27’
   time. Time of day, in hours, minutes and seconds.
     E.g. time ‘09:00:30’          time ‘09:00:30.75’
   timestamp: date plus time of day
     E.g. timestamp ‘2001-7-27 09:00:30.75’
   interval: period of time
     E.g. interval ‘1’ day
     Subtracting a date/time/timestamp value
      from another gives an interval value
Create Table Construct
 An SQL relation is defined using
  the create table command:
         create table r (A1 D1, A2 D2, ..., An Dn,
                         (integrity-constraint1),
                         ...,
                         (integrity-constraintk))
   r is the name of the relation
   each Ai is an attribute name in the
   schema of relation r
Integrity Constraints in Create Table
  not null
  primary key (A1, ..., An)
  check (P), where P is a predicate
Example: Declare branch-name as the primary key for
branch and ensure that the values of assets are non-
negative.
            create table branch
              (branch-name   char(15),
               branch-city   char(30),
               assets        integer,
               primary key (branch-name),
               check (assets >= 0))

primary key declaration on an attribute automatically
   ensures not null in SQL-92 onwards, needs to be
   explicitly stated in SQL-89
Drop and Alter Table Constructs
 The drop table command deletes all
  information about the dropped
  relation from the database.
 The alter table command is used to
  add attributes to an existing
  relation.
       alter table r add A D
    where A is the name of the
  attribute to be added to relation r
  and D is the domain of A.
   All tuples in the relation are assigned
   null as the value for the new attribute.
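 For instance (the attribute name is illustrative):

     alter table branch add phone-number char(10)

  Existing branch tuples are then assigned null as the value
  of phone-number, while
     drop table branch
  would remove the relation and its contents entirely.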
Embedded SQL

 The SQL standard defines embeddings
  of SQL in a variety of programming
  languages such as Pascal, PL/I,
  Fortran, C, and Cobol.
 A language to which SQL queries are
  embedded is referred to as a host
  language, and the SQL structures
  permitted in the host language
  comprise embedded SQL.
 The basic form of these languages
  follows that of the System R embedding
  of SQL into PL/I; an embedded SQL
  request is identified to the preprocessor
  by
       EXEC SQL <embedded SQL statement> END-EXEC
Example Query
From within a host language, find the names and cities of
customers with more than the variable amount dollars in some
account.

  Specify the query in SQL and declare a
    cursor for it
 EXEC SQL
   declare c cursor for
   select customer-name, customer-city
   from depositor, customer, account
    where depositor.customer-name = customer.customer-name
          and depositor.account-number = account.account-number
          and account.balance > :amount
  END-EXEC
Embedded SQL (Cont.)

 The open statement causes the query
  to be evaluated
   EXEC SQL open c END-EXEC
 The fetch statement causes the values
  of one tuple in the query result to be
  placed in host language variables.
  EXEC SQL fetch c into :cn, :cc END-
  EXEC
  Repeated calls to fetch get successive
  tuples in the query result
Updates Through Cursors
 Can update tuples fetched by cursor by declaring that the cursor
   is for update
     declare c cursor for
       select *
       from account
        where branch-name = ‘Perryridge’
     for update
 To update tuple at the current location of cursor
     update account
     set balance = balance + 100
     where current of c
Dynamic SQL
 Allows programs to construct and
  submit SQL queries at run time.
 Example of the use of dynamic SQL
  from within a C program.

 char * sqlprog = “update account
                   set balance = balance * 1.05
                   where account-number = ?”;
 EXEC SQL prepare dynprog from :sqlprog;
 char account[10] = “A-101”;
 EXEC SQL execute dynprog using :account;
ODBC
 Open DataBase Connectivity(ODBC)
 standard
  standard for application program to
   communicate with a database server.
  application program interface (API) to
    open a connection with a database,
    send queries and updates,
    get back results.
 Applications such as GUIs,
  spreadsheets, etc. can use ODBC to
  communicate with database servers
ODBC (Cont.)
   Each database system supporting ODBC provides a "driver"
    library that must be linked with the client program.
   When client program makes an ODBC API call, the code in the
    library communicates with the server to carry out the requested
    action, and fetch results.
   ODBC program first allocates an SQL environment, then a
    database connection handle.
   Opens database connection using SQLConnect(). Parameters for
    SQLConnect:
     connection handle,
     the server to which to connect
     the user identifier,
     password
   Must also specify types of arguments:
     SQL_NTS denotes previous argument is a
      null-terminated string.
ODBC Code
    int ODBCexample()
     {
       RETCODE error;
       HENV env;     /* environment */
       HDBC conn;    /* database connection */
       SQLAllocEnv(&env);
       SQLAllocConnect(env, &conn);
       SQLConnect(conn, "aura.bell-labs.com", SQL_NTS,
                  "avi", SQL_NTS, "avipasswd", SQL_NTS);
       { …. Do actual work … }

       SQLDisconnect(conn);
       SQLFreeConnect(conn);
       SQLFreeEnv(env);
     }
ODBC Code (Cont.)
   Program sends SQL commands to the database by using
    SQLExecDirect
   Result tuples are fetched using SQLFetch()
   SQLBindCol() binds C language variables to attributes of the query
    result
        When a tuple is fetched, its attribute values
         are automatically stored in corresponding C
         variables.
        Arguments to SQLBindCol()
          ODBC stmt variable, attribute position in query
           result
          The type conversion from SQL to C.
          The address of the variable.
          For variable-length types like character arrays,
            The maximum length of the variable
            Location to store actual length when a tuple is
             fetched.
ODBC Code (Cont.)
 Main body of program
     char branchname[80];
   float balance;
   int lenOut1, lenOut2;
   HSTMT stmt;
     SQLAllocStmt(conn, &stmt);
   char * sqlquery = "select branch_name, sum (balance)
                       from account
                       group by branch_name";
     error = SQLExecDirect(stmt, sqlquery, SQL_NTS);
     if (error == SQL_SUCCESS) {
         SQLBindCol(stmt, 1, SQL_C_CHAR, branchname, 80, &lenOut1);
         SQLBindCol(stmt, 2, SQL_C_FLOAT, &balance, 0, &lenOut2);
         while (SQLFetch(stmt) == SQL_SUCCESS) {
             printf(" %s  %g\n", branchname, balance);
         }
     }
More ODBC Features
 Prepared Statement
   SQL statement prepared: compiled at the
    database
   Can have placeholders: E.g. insert into
    account values(?,?,?)
   Repeatedly executed with actual values for the
    placeholders
 Metadata features
   finding all the relations in the database and
   finding the names and types of columns of a
   query result or a relation in the database.
 By default, each SQL statement is treated
  as a separate transaction that is
  committed automatically
ODBC Conformance Levels
 Conformance levels specify subsets of
 the functionality defined by the
 standard.
   Core
   Level 1 requires support for metadata
    querying
   Level 2 requires ability to send and
    retrieve arrays of parameter values and
    more detailed catalog information.
 SQL Call Level Interface (CLI) standard
 similar to ODBC interface, but with
 some minor differences.
JDBC

 JDBC is a Java API for communicating
  with database systems supporting SQL
 JDBC supports a variety of features for
  querying and updating data, and for
  retrieving query results
 JDBC also supports metadata retrieval,
  such as querying about relations
  present in the database and the
  names and types of relation attributes
 Model for communicating with the
  database: open a connection, create a
  “statement” object, execute queries using
  the statement object and fetch the
  results; an exception mechanism handles
  errors
JDBC Code
 public static void JDBCexample(String dbid,
  String userid, String passwd)
  {
      try {
      Class.forName ("oracle.jdbc.driver.OracleDriver");
      Connection conn =
      DriverManager.getConnection(
      "jdbc:oracle:thin:@aura.bell-
      labs.com:2000:bankdb", userid, passwd);
        Statement stmt = conn.createStatement();
           … Do Actual Work ….
        stmt.close();
        conn.close();
      } catch (SQLException sqle) {
        System.out.println("SQLException : " + sqle);
      }
  }
JDBC Code (Cont.)
 Update to database
  try {
      stmt.executeUpdate( "insert into account
    values
                               ('A-
    9732', 'Perryridge', 1200)");
  } catch (SQLException sqle) {
      System.out.println("Could not insert tuple.
    " + sqle);
  }
 Execute query and fetch and print
 results
  ResultSet rset = stmt.executeQuery( "select branch_name, avg (balance)
                                       from account
                                       group by branch_name");
  while (rset.next()) {
      System.out.println(rset.getString("branch_name") + "  " +
                         rset.getFloat(2));
  }
JDBC Code Details

 Getting result fields:
   rs.getString(“branchname”) and
   rs.getString(1) equivalent if branchname is
   the first argument of select result.
 Dealing with Null values
  int a = rs.getInt(“a”);
  if (rs.wasNull()) System.out.println(“Got null value”);
Prepared Statement
 Prepared statement allows queries to be
 compiled and executed multiple times
 with different arguments
   PreparedStatement pStmt = conn.prepareStatement(
                    “insert into account values(?,?,?)”);
   pStmt.setString(1, "A-9732");
    pStmt.setString(2, "Perryridge");
    pStmt.setInt(3, 1200);
    pStmt.executeUpdate();

    pStmt.setString(1, "A-9733");
    pStmt.executeUpdate();
Other SQL Features
 SQL sessions
   client connects to an SQL server,
    establishing a session
   executes a series of statements
   disconnects the session
   can commit or rollback the work
    carried out in the session
 An SQL environment contains
  several components, including a
  user identifier, and a schema,
  which identifies which of several
  schemas a session is using.
Schemas, Catalogs, and Environments

 Three-level hierarchy for naming
  relations.
     Database contains multiple catalogs
     each catalog can contain multiple schemas
     SQL objects such as relations and views
     are contained within a schema
 e.g. catalog5.bank-schema.account
 Each user has a default catalog and
  schema, and the combination is
  unique to the user.
 Default catalog and schema are set up
  for a connection
Procedural Extensions and Stored
   Procedures
 SQL provides a module language
   permits definition of procedures in SQL,
    with if-then-else statements, for and while
    loops, etc.
   more in Chapter 9
 Stored Procedures
   Can store procedures in the database
   then execute them using the call statement
   permit external applications to operate on
   the database without knowing about
   internal details
 These features are covered in Chapter 9
Transactions in JDBC

 As with ODBC, each statement gets
  committed automatically in JDBC
 To turn off auto commit use
      conn.setAutoCommit(false);
 To commit or abort transactions use
       conn.commit() or conn.rollback()
 To turn auto commit on again, use
      conn.setAutoCommit(true);
Procedure and Function Calls in JDBC
 JDBC provides a class CallableStatement
  which allows SQL stored
  procedures/functions to be invoked.
     CallableStatement cs1 =
  conn.prepareCall( “{call proc (?,?)}” );
     CallableStatement cs2 =
  conn.prepareCall( “{? = call func (?,?)}” );
Result Set MetaData
 The class ResultSetMetaData provides
  information about all the columns of
  the ResultSet.
 Instance of this class is obtained by
  getMetaData( ) function of ResultSet.
 Provides Functions for getting number
  of columns, column name, type,
  precision, scale, table from which the
  column is derived etc.
     ResultSetMetaData rsmd =
  rs.getMetaData ( );
Database Meta Data
   The class DatabaseMetaData provides information about database
    relations
   Has functions for getting all tables, all columns of the table, primary
    keys etc.
   E.g. to print column names and types of a relation
        DatabaseMetaData dbmd = conn.getMetaData( );
        ResultSet rs = dbmd.getColumns( null, "BANK-DB", "account", "%" );
             // Arguments: catalog, schema-pattern, table-pattern,
             //            column-pattern
             // Returns: 1 row for each column, with several attributes
             //          such as COLUMN_NAME, TYPE_NAME, etc.
        while ( rs.next( ) ) {
            System.out.println( rs.getString("COLUMN_NAME") + ", " +
                                rs.getString("TYPE_NAME") );
        }
   There are also functions for getting information such as
    primary and foreign keys of relations
Application Architectures

 Applications can be built using one of
  two architectures
   Two tier model
    Application program running at user site
     directly uses JDBC/ODBC to communicate
     with the database
   Three tier model
    Users/programs running at user sites
     communicate with an application server.
     The application server in turn
     communicates with the database
Two-tier Model
 E.g. Java code runs at client site and
  uses JDBC to communicate with the
  backend server
 Benefits:
   flexible, need not be restricted to
   predefined queries
 Problems:
   Security: passwords available at client site,
    all database operation possible
   More code shipped to client
   Not appropriate across organizations, or in
    large ones like universities
Three Tier Model

     Client      Client      Client
         \          |          /
          Network (HTTP/application-specific protocol)
                    |
     Application/HTTP Server (CGI programs, servlets)
                    |
                  JDBC
                    |
             Database Server
Three-tier Model (Cont.)
 E.g. Web client + Java Servlet using
    JDBC to talk with database server
   Client sends request over http or
    application-specific protocol
   Application or Web server receives
    request
   Request handled by CGI program or
    servlets
   Security handled by application at
    server
     Better security
(Figure slides: join results shown as images in the original deck)
The loan and borrower Relations
The Result of loan inner join borrower on loan.loan-number = borrower.loan-number
The Result of loan left outer join borrower on loan-number
The Result of loan natural inner join borrower
Join Types and Join Conditions
The Result of loan natural right outer join borrower
The Result of loan full outer join borrower using (loan-number)
SQL Data Definition for Part of the Bank Database
Chapter 5: Other Relational
Languages
  Query-by-Example (QBE)
  Datalog
Query-by-Example (QBE)
 Basic Structure
 Queries on One Relation
 Queries on Several Relations
 The Condition Box
 The Result Relation
 Ordering the Display of Tuples
 Aggregate Operations
 Modification of the Database
QBE — Basic Structure

 A graphical query language which is
  based (roughly) on the domain
  relational calculus
 Two dimensional syntax – system
  creates templates of relations that are
  requested by users
 Queries are expressed “by example”
QBE Skeleton Tables for the Bank
Example
QBE Skeleton Tables (Cont.)
Queries on One Relation
  Find all loan numbers at the Perryridge branch.



 •   _x is a variable (optional; can be omitted in above query)
 •   P. means print (display)
 •   duplicates are removed by default
 •   To retain duplicates use P.ALL
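 For reference, the corresponding SQL (Chapter 4), with
   distinct mirroring QBE’s default duplicate elimination,
   would be roughly:

      select distinct loan-number
      from loan
      where branch-name = ‘Perryridge’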
Queries on One Relation (Cont.)
 Display full details of all loans
   Method 1:




                       P._x        P._y   P._z


   Method 2: Shorthand notation
Queries on One Relation (Cont.)
   Find the loan number of all loans with a loan amount of more than $700




     Find names of all branches that are
       not located in Brooklyn
Queries on One Relation (Cont.)
 Find the loan numbers of all loans
   made jointly to Smith and Jones.



 Find all customers who live in the same city as Jones
Queries on Several Relations
 Find the names of all customers
 who have a loan from the
 Perryridge branch.
Queries on Several Relations (Cont.)
 Find the names of all customers who
  have both an account and a loan at the
  bank.
Negation in QBE
 Find the names of all customers who
  have an account at the bank, but do
  not have a loan from the bank.




  ¬ means “there does not exist”
Negation in QBE (Cont.)
 Find all customers who have at
  least two accounts.




 ¬ means “not equal to”
The Condition Box
 Allows the expression of constraints
  on domain variables that are either
  inconvenient or impossible to
  express within the skeleton tables.
 Complex conditions can be used in
  condition boxes
 E.g. Find the loan numbers of all
  loans made to Smith, to Jones, or to
  both jointly
Condition Box (Cont.)

 QBE supports an interesting syntax for
  expressing alternative values

Condition Box (Cont.)
    Find all account numbers with a balance
    between $1,300 and $1,500




Find all account numbers with a balance between $1,300 and
 $2,000 but not exactly $1,500.
Condition Box (Cont.)

 Find all branches that have assets
  greater than those of at least one
  branch located in Brooklyn
The Result Relation

 Find the customer-name, account-
  number, and balance for all
  customers who have an account at the
  Perryridge branch.
  We need to:
    Join depositor and account.
    Project customer-name, account-number
     and balance.
  To accomplish this we:
     Create a skeleton table, called result, with
      attributes customer-name, account-number,
      and balance.
     Write the query.
The Result Relation (Cont.)
 The resulting query is:
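 In SQL (Chapter 4), the same query is roughly:

     select customer-name, account.account-number, balance
     from depositor, account
     where depositor.account-number = account.account-number
           and branch-name = ‘Perryridge’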
Ordering the Display of Tuples
 AO = ascending order; DO = descending
  order.
 E.g. list in ascending alphabetical order
  all customers who have an account at
  the bank




 When sorting on multiple attributes, the
  sorting order is specified by including
  with each sort operator (AO or DO) an
  integer surrounded by parentheses,
  e.g. AO(1), DO(2).
Aggregate Operations
 The aggregate operators are
  AVG, MAX, MIN, SUM, and CNT
 The above operators must be
  postfixed with “ALL”
  (e.g., SUM.ALL or AVG.ALL._x) to
  ensure that duplicates are not
  eliminated.
 E.g. Find the total balance of all the
  accounts maintained at the
  Perryridge branch.
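 In SQL (Chapter 4), this query is roughly:

     select sum (balance)
     from account
     where branch-name = ‘Perryridge’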
Aggregate Operations (Cont.)
 UNQ is used to specify that we want to
  eliminate duplicates
 Find the total number of customers
  having an account at the bank.
Query Examples
 Find the average balance at each
   branch.

 The “G” in “P.G” is analogous to SQL’s group by construct
 The “ALL” in the “P.AVG.ALL” entry in the balance column
  ensures that all balances are considered
 To find the average account balance at only those branches
  where the average account balance is more than $1,200, we
  simply add the condition box:
Query Example
 Find all customers who have an
    account at all branches located in
    Brooklyn.
      Approach: for each customer, find the
        number of branches in Brooklyn at which
        they have accounts, and compare with
        total number of branches in Brooklyn
      QBE does not provide subquery
       functionality, so both of the above tasks
       have to be combined in a single query. It
       can be done for this query, but there are
       queries that require subqueries and
       cannot be expressed in QBE.
   In the query on the next page
      CNT.UNQ.ALL._w specifies the number of distinct branches in Brooklyn.
        Note: The variable _w is not connected to other variables in the query.
      CNT.UNQ.ALL._z specifies the number of distinct branches in Brooklyn at
        which customer x has an account.
Query Example (Cont.)
Modification of the Database –
Deletion
 Deletion of tuples from a relation is
  expressed by use of a D. command. In
  the case where we delete information
  in only some of the columns, null
  values, specified by –, are inserted.
 Delete customer Smith




 Delete the branch-city value of the
  branch whose name is “Perryridge”.
Deletion Query Examples
 Delete all loans with a loan amount
 between $1300 and $1500.
   For consistency, we have to delete
   information from loan and borrower tables
Deletion Query Examples
 (Cont.)
 Delete all accounts at branches
 located in Brooklyn.
Modification of the Database –
Insertion
 Insertion is done by placing the I.
  operator in the query expression.
 Insert the fact that account A-
  9732 at the Perryridge branch has
  a balance of $700.
Modification of the Database –
Insertion (Cont.)
  Provide as a gift for all loan customers of the Perryridge branch, a
    new $200 savings account for every loan account they have, with
    the loan number serving as the account number for the new
    savings account.
Modification of the Database –
Updates
 Use the U. operator to change a value
  in a tuple without changing all values
  in the tuple. QBE does not allow users
  to update the primary key fields.
 Update the asset value of the
  Perryridge branch to $10,000,000.




 Increase all balances by 5 percent.
Microsoft Access QBE
 Microsoft Access supports a variant of
  QBE called Graphical Query By Example
  (GQBE)
 GQBE differs from QBE in the following
  ways
   Attributes of relations are listed vertically,
    one below the other, instead of
    horizontally
   Instead of using variables, lines (links)
    between attributes are used to specify that
    their values should be the same.
      Links are added automatically on the basis of
       attribute name, and the user can then add or
       delete links
An Example Query in Microsoft Access QBE

  Example query: Find the customer-name, account-number and
   balance for all accounts at the Perryridge branch
An Aggregation Query in Access QBE
 Find the name, street and city of all customers who have more
  than one account at the bank
Aggregation in Access QBE

 The row labeled Total specifies
   which attributes are group by attributes
   which attributes are to be aggregated
    upon (and the aggregate function).
   For attributes that are neither group by
    nor aggregated, we can still specify
    conditions by selecting where in the Total
    row and listing the conditions below
 As in SQL, if group by is used, only
 group by attributes and aggregate
 results can be output
Datalog

 Basic Structure
 Syntax of Datalog Rules
 Semantics of Nonrecursive Datalog
 Safety
 Relational Operations in Datalog
 Recursion in Datalog
 The Power of Recursion
Basic Structure
 Prolog-like logic-based language that
  allows recursive queries; based on
  first-order logic.
 A Datalog program consists of a set of
  rules that define views.
 Example: define a view relation v1
  containing account numbers and
  balances for accounts at the Perryridge
  branch with a balance of over $700.
       v1(A, B) :– account(A, “Perryridge”, B), B > 700.
 Retrieve the balance of account
  number “A-217” in the view relation v1:
       ? v1(“A-217”, B).
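 For comparison, v1 in SQL (keeping the slides’
  hyphenated names) would be roughly:

     create view v1 as
        select account-number, balance
        from account
        where branch-name = ‘Perryridge’
              and balance > 700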
Example Queries
 Each rule defines a set of tuples that a
 view relation must contain.
    E.g.   v1(A, B) :– account(A, “Perryridge”, B), B > 700
     is read as
         for all A, B
         if (A, “Perryridge”, B) ∈ account and B > 700
         then (A, B) ∈ v1
 The set of tuples in a view relation is
  then defined as the union of all the
  sets of tuples defined by the rules for
  the view relation.
 Example:
     interest-rate(A, 5) :– account(A, N, B), B < 10000.
     interest-rate(A, 6) :– account(A, N, B), B >= 10000.
Negation in Datalog

 Define a view relation c that contains
  the names of all customers who have
  a deposit but no loan at the bank:
    c(N) :– depositor(N, A), not is-
  borrower(N).
    is-borrower(N) :–borrower (N,L).
 NOTE: using not borrower (N, L) in
 the first rule results in a different
 meaning, namely there is some loan L
 for which N is not a borrower.
Named Attribute Notation
 Datalog rules use a positional notation,
  which is convenient for relations with a
  small number of attributes
 It is easy to extend Datalog to support
  named attributes.
   E.g., v1 can be defined using named
   attributes as
   v1(account-number A, balance B) :–
       account(account-number A, branch-name “Perryridge”,
               balance B),
       B > 700.
Formal Syntax and Semantics of
Datalog
 We formally define the syntax and
  semantics (meaning) of Datalog
  programs, in the following steps
  1. We define the syntax of predicates, and
     then the syntax of rules
  2. We define the semantics of individual
     rules
  3. We define the semantics of non-recursive
     programs, based on a layering of rules
  4. It is possible to write rules that can
     generate an infinite number of tuples in
     the view relation; to prevent this, rules
     are required to satisfy safety conditions
Syntax of Datalog Rules
 A positive literal has the form
                p(t1, t2 ..., tn)
   p is the name of a relation with n
    attributes
   each ti is either a constant or variable
 A negative literal has the form
              not p(t1, t2 ..., tn)
 Comparison operations are treated as
  positive predicates
   E.g. X > Y is treated as a predicate >(X,Y)
    ">" is conceptually an (infinite) relation
    that contains all pairs of values such that
    the first value is greater than the second
Syntax of Datalog Rules (Cont.)

 Rules are built out of literals and have
  the form:
     p(t1, t2, ..., tn) :– L1, L2, ..., Lm.
    each of the Li's is a literal
    head – the literal p(t1, t2, ..., tn)
    body – the rest of the literals
 A fact is a rule with an empty
  body, written in the form:
       p(v1, v2, ..., vn).
Semantics of a Rule
 A ground instantiation of a rule (or
  simply instantiation) is the result of
  replacing each variable in the rule by
 some constant.
    E.g. the rule defining v1:
      v1(A, B) :– account(A, "Perryridge", B), B > 700.
    An instantiation of the above rule:
            v1("A-217", 750) :– account("A-217", "Perryridge", 750),
                            750 > 700.
 The body of a rule instantiation R' is
  satisfied in a set of facts (database
  instance) l if
    for each positive literal qi(vi,1, ..., vi,ni) in the body of R',
     l contains the fact qi(vi,1, ..., vi,ni), and
    for each negative literal not qj(vj,1, ..., vj,nj) in the body of R',
     l does not contain the fact qj(vj,1, ..., vj,nj)
Semantics of a Rule (Cont.)
 We define the set of facts that can be
  inferred from a given set of facts l using
  rule R as:
    infer(R, l) = {p(t1, ..., tn) | there is a
     ground instantiation R' of R
     where p(t1, ..., tn) is the head of R', and
     the body of R' is satisfied in l }
 Given a set of rules ℛ = {R1, R2, ..., Rn},
   we define
     infer(ℛ, l) = infer(R1, l) ∪ infer(R2, l) ∪ ... ∪ infer(Rn, l)
Layering of Rules
 Define the interest on each
  account in Perryridge
    interest(A, I) :– perryridge-account(A, B),
                      interest-rate(A, R), I = B * R / 100.
    perryridge-account(A, B) :– account(A, "Perryridge", B).
    interest-rate(A, 5) :– account(A, N, B), B < 10000.
    interest-rate(A, 6) :– account(A, N, B), B >= 10000.
 Layering of the view relations: perryridge-account and
  interest-rate are in layer 1, and interest is in layer 2,
  as formalized next.
Layering Rules (Cont.)
Formally:

   A relation is in layer 1 if all relations
    used in the bodies of rules defining it
    are stored in the database.
   A relation is in layer 2 if all relations
    used in the bodies of rules defining it
    are either stored in the database, or
    are in layer 1.
   A relation p is in layer i + 1 if
      it is not in layers 1, 2, ..., i
      all relations used in the bodies of rules
       defining p are either stored in the
       database, or are in layers 1, 2, ..., i
Semantics of a Program
Let the layers in a given program be 1, 2, ..., n. Let ℛi denote the
set of all rules defining view relations in layer i.
       Define I0 = set of facts stored in the database.
       Recursively define Ii+1 = Ii ∪ infer(ℛi+1, Ii)
       The set of facts in the view relations defined by the
        program (also called the semantics of the program) is
        given by the set of facts In corresponding to the highest
        layer n.

Note: Can instead define semantics using view expansion like
in relational algebra, but above definition is better for handling
extensions such as recursion.
Safety

 It is possible to write rules that
  generate an infinite number of
  answers.
             gt(X, Y) :– X > Y.
             not-in-loan(B, L) :– not loan(B, L).
   To avoid this possibility Datalog rules
   must satisfy the following conditions.
    Every variable that appears in the head of
        the rule also appears in a non-arithmetic
        positive literal in the body of the rule.
    Every variable appearing in a negative literal
        in the body of the rule also appears in some
        positive literal in the body of the rule.
Relational Operations in Datalog

 Project out attribute account-number
   from account.
           query(A) :– account(A, N, B).
 Cartesian product of relations r1 and
   r2.
        query(X1, X2, ..., Xn, Y1, Y2, ..., Ym) :–
           r1(X1, X2, ..., Xn), r2(Y1, Y2, ..., Ym).
 Union of relations r1 and r2.
        query(X1, X2, ..., Xn) :– r1(X1, X2, ..., Xn).
        query(X1, X2, ..., Xn) :– r2(X1, X2, ..., Xn).
Updates in Datalog
 Some Datalog extensions support
  database modification using + or – in
  the rule head to indicate insertion and
  deletion.
 E.g. to transfer all accounts at the
  Perryridge branch to the Johnstown
  branch, we can write
       + account(A, "Johnstown", B) :– account(A, "Perryridge", B).
       – account(A, "Perryridge", B) :– account(A, "Perryridge", B).
Recursion in Datalog

 Suppose we are given a relation
    manager(X, Y)
  containing pairs of names X, Y such
  that Y is a manager of X (or
  equivalently, X is a direct employee of
  Y).
 Each manager may have direct
  employees, as well as indirect
  employees
    Indirect employees of a manager, say
    Jones, are employees of people who are
    direct employees of Jones, and so on,
    to any number of levels.
Semantics of Recursion in
Datalog
 Assumption (for now): program
  contains no negative literals
 The view relations of a recursive
  program containing a set of rules
 are defined to contain exactly the set
 of facts l
 computed by the iterative procedure
 Datalog-Fixpoint
          procedure Datalog-Fixpoint
             l = set of facts in the database
             repeat
                Old_l = l
                l = l ∪ infer(ℛ, l)
             until l = Old_l
Example of Datalog-FixPoint
Iteration
A More General View
  Create a view relation empl that
  contains every tuple (X, Y) such that
  X is directly or indirectly managed
  by Y.
        empl(X, Y) :– manager(X, Y).
        empl(X, Y) :– manager(X, Z), empl(Z, Y).
   Find the direct and indirect
   employees of Jones.
                ? empl(X, "Jones").
   Can define the view empl in another
   way too:
        empl(X, Y) :– manager(X, Y).
        empl(X, Y) :– empl(X, Z), manager(Z, Y).
The Power of Recursion
 Recursive views make it possible to
 write queries, such as transitive
 closure queries, that cannot be written
 without recursion or iteration.
   Intuition: Without recursion, a non-
   recursive non-iterative program can
   perform only a fixed number of joins of
   manager with itself
    This can give only a fixed number of levels
     of managers
    Given a program we can construct a
     database with a greater number of levels of
     managers on which the program will not
     work
Recursion in SQL

 SQL:1999 permits recursive view
  definition
 E.g. query to find all employee-
 manager pairs

   with recursive empl (emp, mgr) as (
          select emp, mgr
          from manager
      union
          select manager.emp, empl.mgr
          from manager, empl
          where manager.mgr = empl.emp
   )
   select * from empl
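As a usage sketch, the recursive view can be filtered like any other
 relation; for instance, to list the direct and indirect employees of
 Jones (mirroring the earlier Datalog query):
     with recursive empl (emp, mgr) as (
            select emp, mgr from manager
        union
            select manager.emp, empl.mgr
            from manager, empl
            where manager.mgr = empl.emp
     )
     select emp from empl where mgr = 'Jones'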
Monotonicity

 A view V is said to be monotonic if
  given any two sets of facts
   I1 and I2 such that I1 ⊆ I2, then EV(I1) ⊆
   EV(I2), where EV is the expression used
   to define V.
  A set of rules R is said to be
  monotonic if
          I1 ⊆ I2 implies infer(R, I1) ⊆ infer(R, I2).
  Relational algebra views defined using
   only selection, projection, cartesian product,
   join, union and intersection (but not set
   difference) are monotonic.
Non-Monotonicity

 Procedure Datalog-Fixpoint is sound
  provided the rules in the program are
  monotonic.
   Otherwise, it may make some inferences
   in an iteration that cannot be made in a
   later iteration. E.g. given the rules
       a :- not b.
       b :- c.
       c.
     Then a can be inferred initially, before b
   is inferred, but not later.
Stratified Negation
 A Datalog program is said to be
  stratified if its predicates can be given
  layer numbers such that
  1. For all positive literals, say q, in the body of
     any rule with head, say, p
              p(..) :- …., q(..), …
      then the layer number of p is greater than
     or equal to the layer number of q
  2. Given any rule with a negative literal
              p(..) :- …, not q(..), …
     then the layer number of p is strictly greater
     than the layer number of q
 Stratified programs do not have
  recursion mixed with negation
Non-Monotonicity (Cont.)

 There are useful queries that cannot
  be expressed by a stratified program
   E.g., given information about the number
    of each subpart in each part, in a part-
    subpart hierarchy, find the total number
    of subparts of each part.
   A program to compute the above query
    would have to mix aggregation with
    recursion
    However, so long as the underlying data
     (part-subpart) has no cycles, it is possible
     to write a program that mixes aggregation
     with recursion, yet has an intuitive,
     well-defined meaning.
Forms and Graphical User
Interfaces
 Most naive users interact with
  databases using form interfaces with
  graphical interaction facilities
   Web interfaces are the most common
    kind, but there are many others
   Forms interfaces usually provide
    mechanisms to check for correctness of
    user input, and automatically fill in fields
    given key values
   Most database vendors provide
    convenient mechanisms to create forms
Report Generators

 Report generators are tools to
  generate human-readable summary
  reports from a database
  They integrate database querying with
   creation of formatted text and graphical
   charts
  Reports can be defined once and executed
   periodically to get current information
   from the database.
  Example of report (next page)
  Microsoft's Object Linking and Embedding
   (OLE), for example, allows such reports,
   charts and tables to be embedded within
   documents.
(Figure slides: A Formatted Report; QBE Skeleton Tables for the Bank
Example; An Example Query in Microsoft Access QBE; An Aggregation Query
in Microsoft Access QBE; The account Relation; The v1 Relation;
Result of infer(R, I).)
Chapter 6: Integrity and Security
    Domain Constraints
   Referential Integrity
   Assertions
   Triggers
   Security
   Authorization
   Authorization in SQL
Domain Constraints
 Integrity constraints guard against accidental damage to
  the database, by ensuring that authorized changes to the
  database do not result in a loss of data consistency.
 Domain constraints are the most elementary form of
  integrity constraint.
 They test values inserted in the database, and test
  queries to ensure that the comparisons make sense.
 New domains can be created from existing data types
     E.g. create domain Dollars numeric(12, 2)
           create domain Pounds numeric(12,2)
 We cannot assign or compare a value of type Dollars to a
  value of type Pounds.
     However, we can convert type as below
            (cast r.A as Pounds)
      (Should also multiply by the dollar-to-pound conversion-rate)
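A minimal sketch of using such a domain in a table declaration (the
 redeclaration of assets over the Dollars domain is illustrative, not
 from the original slides):
     create table branch
       (branch-name char(15),
        branch-city char(30),
        assets      Dollars)
     select cast(assets as Pounds) from branch
 A real query would also multiply by the conversion rate, as noted above.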
Domain Constraints (Cont.)
 The check clause in SQL-92 permits domains to be
  restricted:
   Use check clause to ensure that an hourly-wage
    domain allows only values greater than a specified
    value.
       create domain hourly-wage numeric(5,2)
              constraint value-test check(value >= 4.00)
   The domain has a constraint that ensures that the
    hourly-wage is greater than 4.00
   The clause constraint value-test is optional; useful to
    indicate which constraint an update violated.
 Can have complex conditions in domain check
   create domain AccountType char(10)
       constraint account-type-test
          check (value in ('Checking', 'Saving'))
   check (branch-name in (select branch-name from
    branch))
Referential Integrity
 Ensures that a value that appears in one relation for
  a given set of attributes also appears for a certain
  set of attributes in another relation.
     Example: If "Perryridge" is a branch name appearing in
      one of the tuples in the account relation, then there exists
      a tuple in the branch relation for branch "Perryridge".
  Formal Definition
    Let r1(R1) and r2(R2) be relations with primary keys K1 and
     K2 respectively.
    The subset α of R2 is a foreign key referencing K1 in
     relation r1, if for every t2 in r2 there must be a tuple t1 in
     r1 such that t1[K1] = t2[α].
     Referential integrity constraint is also called a subset
      dependency since it can be written as
           Πα(r2) ⊆ ΠK1(r1)
Referential Integrity in the E-R
Model
 Consider relationship set R between entity
  sets E1 and E2. The relational schema for R
  includes the primary keys K1 of E1 and K2 of
  E2.
  Then K1 and K2 form foreign keys on the
   relational schemas for E1 and E2 respectively.
   (Figure: entity sets E1 and E2 linked by relationship set R.)
 Weak entity sets are also a source of
  referential integrity constraints.
    The relation schema for a weak entity set must
     include the primary key attributes of the entity set
     on which it depends
Checking Referential Integrity on
Database Modification
 The following tests must be made in order to preserve
  the following referential integrity constraint:
                     Πα(r2) ⊆ ΠK(r1)
 Insert. If a tuple t2 is inserted into r2, the system must
  ensure that there is a tuple t1 in r1 such that t1[K] =
  t2[α]. That is
                  t2[α] ∈ ΠK(r1)
 Delete. If a tuple t1 is deleted from r1, the system
  must compute the set of tuples in r2 that reference t1:
                        σα = t1[K](r2)

  If this set is not empty
      either the delete command is rejected as an error, or
      the tuples that reference t1 must themselves be deleted
       (cascading deletions are possible).

    Database Modification (Cont.)
    Update. There are two cases:
       If a tuple t2 is updated in relation r2 and the update
        modifies values for the foreign key α, then a test similar to
        the insert case is made:
          Let t2' denote the new value of tuple t2. The system must
            ensure that
                       t2'[α] ∈ ΠK(r1)
     If a tuple t1 is updated in r1, and the update modifies
      values for the primary key (K), then a test similar to the
      delete case is made:
         1. The system must compute
                   σα = t1[K](r2)
             using the old value of t1 (the value before the update is
            applied).
         2. If this set is not empty
             1. the update may be rejected as an error, or
             2. the update may be cascaded to the tuples in the set, or
             3. the tuples in the set may be deleted.
Referential Integrity in SQL
   Primary and candidate keys and foreign keys can be specified as part of the
    SQL create table statement:
     The primary key clause lists attributes that
      comprise the primary key.
     The unique key clause lists attributes that
      comprise a candidate key.
     The foreign key clause lists the attributes that
      comprise the foreign key and the name of the
      relation referenced by the foreign key.
   By default, a foreign key references the primary key attributes of the
    referenced table
         foreign key (account-number) references account
   Short form for specifying a single column as foreign key
       account-number char (10) references account
   Reference columns in the referenced table can be explicitly specified
     but must be declared as primary/candidate keys
       foreign key (account-number) references
       account(account-number)
Referential Integrity in SQL –
Example
  create table customer
    (customer-name char(20),
    customer-street char(30),
    customer-city     char(30),
    primary key (customer-name))
  create table branch
   (branch-name     char(15),
   branch-city char(30),
   assets    integer,
   primary key (branch-name))
Referential Integrity in SQL – Example
(Cont.)
      create table account
        (account-number char(10),
        branch-name       char(15),
        balance integer,
        primary key (account-number),
        foreign key (branch-name)
        references branch)
      create table depositor
        (customer-name char(20),
        account-number char(10),
        primary key (customer-name, account-number),
        foreign key (account-number) references account,
        foreign key (customer-name) references customer)
Cascading Actions in SQL

create table account
  (...
  foreign key (branch-name) references branch
                  on delete cascade
                  on update cascade
  ...)
 Due to the on delete cascade clause,
  if a delete of a tuple in branch results
  in a referential-integrity constraint
  violation, the delete "cascades" to the
  account relation, deleting the tuple that
  refers to the branch that was deleted.
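As a usage sketch, with the declaration above the single statement
     delete from branch
     where branch-name = 'Perryridge'
 would also delete every account tuple referring to the Perryridge
 branch, instead of being rejected.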
Cascading Actions in SQL (Cont.)
   If there is a chain of foreign-key
    dependencies across multiple relations,
    with on delete cascade specified for
    each dependency, a deletion or update
    at one end of the chain can propagate
    across the entire chain.
 If a cascading update or delete causes a
  constraint violation that cannot be
  handled by a further cascading
  operation, the system aborts the
  transaction.
      As a result, all the changes caused by the
       transaction and its cascading actions are undone.
Referential Integrity in SQL
(Cont.)
 Alternative to cascading:
   on delete set null
   on delete set default
 Null values in foreign key attributes
 complicate SQL referential integrity
 semantics, and are best prevented
 using not null
   if any attribute of a foreign key is null, the
   tuple is defined to satisfy the foreign key
   constraint!
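A minimal sketch of the set null alternative (same account table as
 before; the choice of actions is illustrative):
     create table account
       (...
       foreign key (branch-name) references branch
                       on delete set null
                       on update cascade
       ...)
 Deleting a branch then leaves its accounts in place, with branch-name
 set to null rather than the account tuples being deleted.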
Assertions

 An assertion is a predicate expressing
  a condition that we wish the database
  always to satisfy.
 An assertion in SQL takes the form
   create assertion <assertion-name>
  check <predicate>
 When an assertion is made, the
  system tests it for validity, and tests it
  again on every update that may violate
  the assertion
Assertion Example

 The sum of all loan amounts for each
  branch must be less than the sum of
  all account balances at the branch.
  create assertion sum-constraint check
    (not exists (select * from branch
                 where (select sum(amount) from loan
                        where loan.branch-name =
                              branch.branch-name)
                       >= (select sum(balance) from account
                           where account.branch-name =
                                 branch.branch-name)))
Assertion Example
 Every loan has at least one borrower who
 maintains an account with a minimum
 balance of $1000.00
  create assertion balance-constraint check
    (not exists (
       select * from loan
       where not exists (
           select *
           from borrower, depositor, account
           where loan.loan-number = borrower.loan-number
             and borrower.customer-name = depositor.customer-name
             and depositor.account-number = account.account-number
             and account.balance >= 1000)))
Triggers

 A trigger is a statement that is
  executed automatically by the system
  as a side effect of a modification to
  the database.
 To design a trigger mechanism, we
 must:
   Specify the conditions under which the
    trigger is to be executed.
   Specify the actions to be taken when the
    trigger executes.
 Triggers were introduced into the SQL
  standard in SQL:1999, but were supported
  even earlier by most databases using
  non-standard syntax.
Trigger Example

 Suppose that instead of allowing
  negative account balances, the bank
  deals with overdrafts by
   setting the account balance to zero
   creating a loan in the amount of the
    overdraft
   giving this loan a loan number identical to
    the account number of the overdrawn
    account
 The condition for executing the
  trigger is an update to the account
  relation that results in a negative
  balance value.
Trigger Example in SQL:1999
 create trigger overdraft-trigger after update on account
 referencing new row as nrow
 for each row
 when nrow.balance < 0
 begin atomic
    insert into borrower
      (select customer-name, account-number
       from depositor
       where nrow.account-number = depositor.account-number);
    insert into loan values
      (nrow.account-number, nrow.branch-name, - nrow.balance);
    update account set balance = 0
      where account.account-number = nrow.account-number
 end
Triggering Events and Actions in SQL
 Triggering event can be insert, delete or
  update
 Triggers on update can be restricted to
  specific attributes
   E.g. create trigger overdraft-trigger after
    update of balance on account
 Values of attributes before and after an
  update can be referenced
   referencing old row as   : for deletes and
    updates
   referencing new row as : for inserts and
    updates
Statement Level Triggers

 Instead of executing a separate action
  for each affected row, a single action
  can be executed for all rows affected
  by a transaction
   Use    for each statement    instead of
    for each row
   Use    referencing old table or
    referencing new table to refer to
    temporary tables (called transition tables)
    containing the affected rows
    Can be more efficient when dealing with
     SQL statements that update a large
     number of rows
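A minimal sketch of a statement-level trigger (the log table
 account-update-log and its columns are hypothetical, not from the
 original slides):
     create trigger account-stats-trigger after update on account
     referencing new table as ntable
     for each statement
        insert into account-update-log
          select count(*), current_timestamp from ntable
 Here ntable is the transition table holding every row affected by the
 triggering statement.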
External World Actions
 We sometimes require external world
 actions to be triggered on a database
 update
    E.g. re-ordering an item whose quantity in a
    warehouse has become small, or turning on
    an alarm light.
 Triggers cannot be used to directly
 implement external-world actions, BUT
   Triggers can be used to record actions-to-
    be-taken in a separate table
    Have an external process that repeatedly
     scans the table, carries out external-world
     actions, and deletes the action from the table
 E.g. Suppose a warehouse has the
  following tables:
    inventory(item, level): how much of each
     item is in the warehouse
    minlevel(item, level): the minimum desired
     level of each item
    reorder(item, amount): the quantity to
     re-order at a time
    orders(item, amount): orders to be placed
     (read by an external process)
External World Actions (Cont.)
create trigger reorder-trigger after update of level on inventory
referencing old row as orow, new row as nrow
for each row
       when nrow.level <= (select level
                      from minlevel
                      where minlevel.item = orow.item)
              and orow.level > (select level
                         from minlevel
                         where minlevel.item = orow.item)
  begin
     insert into orders
       (select item, amount
        from reorder
        where reorder.item = orow.item)
  end
Triggers in MS-SQLServer Syntax
create trigger overdraft-trigger on account
for update
as
if inserted.balance < 0
begin
   insert into borrower
     (select customer-name, account-number
      from depositor, inserted
      where inserted.account-number =
            depositor.account-number)
   insert into loan values
     (inserted.account-number, inserted.branch-name,
      - inserted.balance)
   update account set balance = 0
   from account, inserted
   where account.account-number = inserted.account-number
end
When Not To Use Triggers

Triggers were used earlier for tasks
  such as
 maintaining summary data (e.g. total salary of
  each department)
 Replicating databases by recording changes
  to special relations (called change or delta
  relations) and having a separate process that
  applies the changes over to a replica
There are better ways of doing these
 now:
  Databases today provide built-in materialized
   view facilities to maintain summary data
  Databases provide built-in support for
   replication
Security
 Security - protection from malicious
 attempts to steal or modify data.
   Database system level
    Authentication and authorization mechanisms
     to allow specific users access only to required
     data
    We concentrate on authorization in the rest of
     this chapter
   Operating system level
    Operating system super-users can do anything
     they want to the database! Good operating
     system level security is required.
   Network level: must use encryption to
     prevent eavesdropping (unauthorized reading
     of messages) and masquerading (pretending
     to be an authorized user)
Security (Cont.)

   Physical level
     Physical access to computers allows
      destruction of data by intruders; traditional
      lock-and-key security is needed
     Computers must also be protected from
      floods, fire, etc.
      More in Chapter 17 (Recovery)
   Human level
      Users must be screened to ensure that
       authorized users do not give access to
       intruders
      Users should be trained on password
       selection and secrecy
Forms of authorization on parts of the
  database:
 Read authorization - allows reading, but

  not modification of data.
 Insert authorization - allows insertion of
  new data, but not modification of
  existing data.
 Update authorization - allows
  modification, but not deletion of data.
 Delete authorization - allows deletion of
  data.
Authorization (Cont.)
Forms of authorization to modify the
  database schema:
 Index authorization - allows creation
  and deletion of indices.
 Resources authorization - allows
  creation of new relations.
 Alteration authorization - allows
  addition or deletion of attributes in a
  relation.
 Drop authorization - allows deletion of
  relations.
Authorization and Views
 Users can be given authorization on
  views, without being given any
  authorization on the relations used in
  the view definition
 Ability of views to hide data serves
  both to simplify usage of the system
  and to enhance security by allowing
  users access only to data they need for
  their job
 A combination of relational-level
  security and view-level security can be
  used to limit a user's access to precisely
  the data that user needs.
View Example
 Suppose a bank clerk needs to know
 the names of the customers of each
 branch, but is not authorized to see
 specific loan information.
  Approach: Deny direct access to the loan
   relation, but grant access to the view cust-
   loan, which consists only of the names of
   customers and the branches at which they
   have a loan.
  The cust-loan view is defined in SQL as
   follows:
     create view cust-loan as
        select branch-name, customer-name
        from borrower, loan
        where borrower.loan-number = loan.loan-number
View Example (Cont.)
 The clerk is authorized to see the
 result of the query:
       select *
       from cust-loan
 When the query processor translates
  the result into a query on the actual
  relations in the database, we obtain a
  query on borrower and loan.
 Authorization must be checked on the
  clerk's query before query processing
  replaces a view by the definition of the
  view.
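The authorization itself is then granted on the view rather than on
 the underlying relations; a sketch (the user id clerk1 is illustrative):
     grant select on cust-loan to clerk1
 The clerk can now query cust-loan, but still has no privileges on
 borrower or loan.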
Authorization on Views
 Creation of view does not require
 resources authorization since no real
 relation is being created
 The creator of a view gets only those
 privileges that provide no additional
 authorization beyond that he already
 had.
 E.g. if creator of view cust-loan had
  only read authorization on borrower
  and loan, he gets only read
 authorization on cust-loan
Granting of Privileges
 The passage of authorization from one
    user to another may be represented by
    an authorization graph.
   The nodes of this graph are the users.
   The root of the graph is the database
    administrator.
   Consider a graph for update
    authorization on loan.
   An edge Ui → Uj indicates that user Ui
    has granted update authorization on
    loan to Uj.
   (Figure: DBA grants to U1, U2 and U3;
    U1 grants to U4 and U5; U2 grants to U5.)
Authorization Grant Graph
 Requirement: All edges in an
  authorization graph must be part of
  some path originating with the
  database administrator
 If DBA revokes grant from U1:
   Grant must be revoked from U4 since U1
    no longer has authorization
   Grant must not be revoked from U5 since
    U5 has another authorization path from
    DBA through U2
 Must prevent cycles of grants with no
  path from the root:
    e.g., DBA grants to U7, U7 grants to U8,
     U8 grants to U7, and DBA then revokes
     from U7: the grants between U7 and U8
     must also be revoked, since they no longer
     lie on any path from the DBA.
Security Specification in SQL
 The grant statement is used to confer
  authorization
     grant <privilege list>
     on <relation name or view name>
  to <user list>
 <user list> is:
   a user-id
   public, which allows all valid users the
    privilege granted
   A role (more on this later)
 Granting a privilege on a view does not
  imply granting any privileges on the
  underlying relations.
Privileges in SQL
   select: allows read access to relation,or the ability to query
    using the view
     Example: grant users U1, U2, and U3
       select authorization on the branch
       relation:
                  grant select on branch to U1, U2, U3
   insert: the ability to insert tuples
   update: the ability to update using the SQL update statement
   delete: the ability to delete tuples.
   references: ability to declare foreign keys when creating
    relations.
   usage: in SQL-92, authorizes a user to use a specified
    domain
   all privileges: used as a short form for all the allowable
    privileges
Privilege To Grant Privileges
 with grant option: allows a user who is
 granted a privilege to pass the
 privilege on to other users.
   Example:
     grant select on branch to U1 with grant option
   gives U1 the select privileges on branch and
    allows U1 to grant this
   privilege to others
Roles
 Roles permit common privileges for a
  class of users to be specified just once
  by creating a corresponding "role"
 Privileges can be granted to or revoked
  from roles, just like users
 Roles can be assigned to users, and
  even to other roles
 SQL:1999 supports roles
       create role teller
      create role manager

       grant select on branch to teller
      grant update (balance) on account to teller
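To illustrate the last two bullets, roles can be granted to users and
 to other roles; a sketch in SQL:1999 syntax (the user ids are
 illustrative):
     grant teller to john, mary
     grant teller to manager
 The second statement gives the manager role every privilege of the
 teller role.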
Revoking Authorization in SQL
 The revoke statement is used to
 revoke authorization.
  revoke <privilege list>
  on <relation name or view name> from
    <user list> [restrict|cascade]
 Example:
  revoke select on branch from U1, U2, U3
    cascade
 Revocation of a privilege from a user
  may cause other users also to lose that
  privilege; referred to as cascading of
 the revoke.
Revoking Authorization in SQL
     (Cont.)
    <privilege-list> may be all to revoke
  all privileges the revokee may hold.
 If <revokee-list> includes public all
  users lose the privilege except those
  granted it explicitly.
 If the same privilege was granted twice
  to the same user by different grantees,
  the user may retain the privilege after
  the revocation.
 All privileges that depend on the
  privilege being revoked are also
  revoked.
Limitations of SQL
    Authorization
   SQL does not support authorization at a tuple level
     E.g. we cannot restrict students to see only
      (the tuples storing) their own grades
   With the growth in Web access to databases, database accesses
    come primarily from application servers.
     End users don't have database user ids,
      they are all mapped to the same database
      user id
   All end-users of an application (such as a web application) may
    be mapped to a single database user
   The task of authorization in above cases falls on the application
    program, with no support from SQL
     Benefit: fine grained authorizations, such
      as to individual tuples, can be
      implemented by the application.
      Drawback: Authorization must be done in
       application code, and may be dispersed
       all over the application; checking for
       absence of authorization loopholes
       becomes very difficult.
Audit Trails
 An audit trail is a log of all changes
  (inserts/deletes/updates) to the
  database along with information such
  as which user performed the change,
  and when the change was performed.
 Used to track erroneous/fraudulent
  updates.
 Can be implemented using triggers, but
  many database systems provide direct
  support.
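A minimal sketch of a trigger-based audit trail for balance updates
 (the account-audit table and its columns are hypothetical, not from
 the original slides):
     create trigger audit-trigger after update of balance on account
     referencing old row as orow, new row as nrow
     for each row
        insert into account-audit values
          (current_user, current_timestamp,
           nrow.account-number, orow.balance, nrow.balance)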
Encryption
 Data may be encrypted when database
  authorization provisions do not offer
  sufficient protection.
 Properties of good encryption
  technique:
   Relatively simple for authorized users to
    encrypt and decrypt data.
   Encryption scheme depends not on the
    secrecy of the algorithm but on the
    secrecy of a parameter of the algorithm
    called the encryption key.
   Extremely difficult for an intruder to
    determine the encryption key.
Encryption (Cont.)
   Data Encryption Standard (DES) substitutes characters and
    rearranges their order on the basis of an encryption key which is
    provided to authorized users via a secure mechanism. Scheme is
    no more secure than the key transmission mechanism since the
    key has to be shared.
   Advanced Encryption Standard (AES) is a new standard replacing
    DES, and is based on the Rijndael algorithm, but is also
    dependent on shared secret keys
    Public-key encryption is based on each user having two keys:
     public key – publicly published key used
      to encrypt data, but cannot be used to
      decrypt data
     private key -- key known only to
      individual user, and used to decrypt data.
      Need not be transmitted to the site doing
      encryption.
    Encryption scheme is such that it is impossible or extremely
    hard to decrypt data given only the public key.
Authentication
 Password based authentication is
  widely used, but is susceptible to
  sniffing on a network
 Challenge-response systems avoid
  transmission of passwords
   DB sends a (randomly generated) challenge
    string to user
   User encrypts string and returns result.
   DB verifies identity by decrypting result
   Can use public-key encryption system by
    DB sending a message encrypted using
     user's public key, and user decrypting and
    sending the message back
Digital Certificates
 Digital certificates are used to verify
  authenticity of public keys.
 Problem: when you communicate with a
  web site, how do you know if you are
  talking with the genuine web site or an
  imposter?
   Solution: use the public key of the web site
   Problem: how to verify if the public key itself
   is genuine?
 Solution:
   Every client (e.g. browser) has public keys of
    a few root-level certification authorities
    A site can get its name/URL and public key
     signed by a certification authority: the signed
     document is called a certificate
    Clients can verify the certificate using the
     certification authority's public key, and thus
     trust the site's public key
Statistical Databases
 Problem: how to ensure privacy of
  individuals while allowing use of data
  for statistical purposes (e.g., finding
  median income, average bank balance
  etc.)
 Solutions:
   System rejects any query that involves
   fewer than some predetermined number of
   individuals.
       Still possible to use results of multiple
        overlapping queries to deduce data
        about an individual
(Figure slides: An n-ary Relationship Set; Authorization-Grant Graph;
Attempt to Defeat Authorization Revocation; Authorization Graph.)
Physical Level Security
 Protection of equipment from floods, power
  failure, etc.
 Protection of disks from theft, erasure, physical
  damage, etc.
 Protection of network and terminal cables from
  wiretaps, non-invasive electronic eavesdropping,
  physical damage, etc.
Solutions:
 Replicated hardware:
   mirrored disks, dual busses, etc.
    multiple access paths between every pair of devices
 Physical security: locks, police, etc.
 Software techniques to detect physical security
  breaches.
Human Level Security

 Protection from stolen passwords, sabotage,
  etc.
 Primarily a management problem:
     Frequent change of passwords
     Use of "non-guessable" passwords
     Log all invalid access attempts
     Data audits
     Careful hiring practices
Operating System Level Security
  Protection from invalid logins
  File-level access protection (often not very helpful for
   database security)
  Protection from improper use of "superuser" authority.
  Protection from improper use of privileged machine
   instructions.
Network-Level Security

  Each site must ensure that it communicates
   only with trusted sites (not intruders).
  Links must be protected from theft or
   modification of messages
  Mechanisms:
    Identification protocol (password-based),
    Cryptography.
Database-Level Security
 Assume security at network, operating
  system, human, and physical levels.
 Database specific issues:
   each user may have authority to read only part
    of the data and to write only part of the data.
   User authority may correspond to entire files
    or relations, but it may also correspond only
    to parts of files or relations.
 Local autonomy suggests site-level
  authorization control in a distributed
  database.
 Global control suggests centralized
  control.
Chapter 7: Relational Database
Design
 First Normal Form
 Pitfalls in Relational Database Design
 Functional Dependencies
 Decomposition
 Boyce-Codd Normal Form
 Third Normal Form
 Multivalued Dependencies and Fourth
  Normal Form
 Overall Database Design Process
First Normal Form

 Domain is atomic if its elements are considered to
  be indivisible units
   Examples of non-atomic domains:
      Set of names, composite attributes
      Identification numbers like CS101 that can be broken
       up into parts
 A relational schema R is in first normal form if the
  domains of all attributes of R are atomic
 Non-atomic values complicate storage and
  encourage redundant (repeated) storage of data
   E.g. Set of accounts stored with each customer, and
    set of owners stored with each account
   We assume all relations are in first normal form (revisit
    this in Chapter 9 on Object Relational Databases)
First Normal Form (Contd.)
 Atomicity is actually a property of how the
  elements of the domain are used.
   E.g. Strings would normally be considered indivisible
   Suppose that students are given roll numbers which are
    strings of the form CS0012 or EE1127
   If the first two characters are extracted to find the
    department, the domain of roll numbers is not atomic.
   Doing so is a bad idea: leads to encoding of
    information in application program rather than in the
    database.
Pitfalls in Relational Database Design
  Relational database design requires
  that we find a "good" collection of
  relation schemas. A bad design may
  lead to
   Repetition of Information.
   Inability to represent certain information.
 Design Goals:
   Avoid redundant data
   Ensure that relationships among attributes
    are represented
   Facilitate the checking of updates for
    violation of database integrity constraints.
   Example
    Consider the relation schema:
       Lending-schema = (branch-name, branch-city, assets,
                                customer-name, loan-number, amount)
   Redundancy:
     Data for branch-name, branch-city, assets are repeated for each loan that a
       branch makes
     Wastes space
     Complicates updating, introducing possibility of inconsistency of assets value
   Null values
     Cannot store information about a branch if no loans exist
     Can use null values, but they are difficult to handle.
Decomposition
    Decompose the relation schema Lending-schema
     into:
 Branch-schema = (branch-name, branch-city, assets)
 Loan-info-schema = (customer-name, loan-number,
                     branch-name, amount)
   All attributes of an original schema (R) must appear
     in the decomposition (R1, R2):
                      R = R1 ∪ R2
     Lossless-join decomposition.
      For all possible relations r on schema R
                       r = ΠR1(r) ⋈ ΠR2(r)
Example of Non Lossless-Join
Decomposition
   Decomposition of R = (A, B)
                      R1 = (A)   R2 = (B)

        r:   A   B        ΠA(r):  A        ΠB(r):  B
             α   1                α                1
             α   2                β                2
             β   1

        ΠA(r) ⋈ ΠB(r):   A   B
                         α   1
                         α   2
                         β   1
                         β   2
Goal — Devise a Theory for the
Following
 Decide whether a particular relation R
  is in "good" form.
 In the case that a relation R is not in
  "good" form, decompose it into a set
  of relations {R1, R2, ..., Rn} such that
   each relation is in good form
   the decomposition is a lossless-join
   decomposition
 Our theory is based on:
   functional dependencies
Functional Dependencies

 Constraints on the set of legal
  relations.
 Require that the value for a certain set
  of attributes determines uniquely the
  value for another set of attributes.
 A functional dependency is a
  generalization of the notion of a key.
Functional Dependencies
   Let R be a relation schema

(Cont.)
   The functional dependency
              α → β    where α ⊆ R and β ⊆ R
    holds on R if and only if for any legal relations r(R), whenever any two tuples
    t1 and t2 of r agree on the attributes α, they also agree on the attributes β.
    That is,
                  t1[α] = t2[α]  ⟹  t1[β] = t2[β]
   Example: Consider r(A, B) with the following instance of r:
                       A   B
                       1   4
                       1   5
                       3   7
   On this instance, A → B does NOT hold, but B → A does hold.
Functional Dependencies
(Cont.)
K is a superkey for relation schema R if
  and only if K → R
K is a candidate key for R if and only if
  K → R, and
  for no α ⊂ K, α → R
Functional dependencies allow us to
  express constraints that cannot be
  expressed using superkeys. Consider
  the schema:
       Loan-info-schema = (customer-name, loan-number,
                           branch-name, amount).
  We expect loan-number → amount and
  loan-number → branch-name to hold, but we
  would not expect loan-number → customer-name
  to hold (a loan may have several borrowers).
Use of Functional Dependencies
 We use functional dependencies to:
   test relations to see if they are legal under a
    given set of functional dependencies.
     If a relation r is legal under a set F of
      functional dependencies, we say that r
      satisfies F.
   specify constraints on the set of legal
    relations
     We say that F holds on R if all legal relations
      on R satisfy the set of functional
      dependencies F.
 Note: A specific instance of a relation
  schema may satisfy a functional dependency
  even if the functional dependency does not
  hold on all legal instances.
Functional Dependencies
(Cont.)
 A functional dependency is trivial if it
  is satisfied by all instances of a
  relation
    E.g.
      customer-name, loan-number → customer-name
      customer-name → customer-name
    In general, α → β is trivial if β ⊆ α
Closure of a Set of Functional Dependencies
 Given a set F of functional
 dependencies, there are certain other
 functional dependencies that are
 logically implied by F.
   E.g. If A → B and B → C, then we can infer
    that A → C
 The set of all functional dependencies
  logically implied by F is the closure of F.
 We denote the closure of F by F+.
 We can find all of F+ by applying
  Armstrong's Axioms:
   if β ⊆ α, then α → β                 (reflexivity)
   if α → β, then γα → γβ               (augmentation)
   if α → β and β → γ, then α → γ       (transitivity)
Example
 R = (A, B, C, G, H, I)
  F = { A → B
        A → C
       CG → H
       CG → I
        B → H }
 some members of F+
   A → H
     by transitivity from A → B and B → H
   AG → I
     by augmenting A → C with G, to get AG → CG,
                 and then transitivity with CG → I
   CG → HI
     from CG → H and CG → I: the "union rule" can be
      derived using augmentation and transitivity
Procedure for Computing F+
 To compute the closure of a set of
 functional dependencies F:

   F+ = F
 repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
 until F+ does not change any further
Closure of Functional
Dependencies (Cont.)
 We can further simplify manual
 computation of F+ by using the
 following additional rules.
    If α → β holds and α → γ holds, then
     α → βγ holds (union)
    If α → βγ holds, then α → β holds and
     α → γ holds (decomposition)
    If α → β holds and γβ → δ holds, then
     αγ → δ holds (pseudotransitivity)
  The above rules can be inferred from
     Armstrong's axioms.
Closure of Attribute Sets

 Given a set of attributes α, define the
  closure of α under F (denoted by α+)
  as the set of attributes that are
  functionally determined by α under F:
       α → β is in F+  ⟺  β ⊆ α+
 Algorithm to compute α+, the closure
 of α under F
      result := α;
      while (changes to result) do
         for each β → γ in F do
          begin
             if β ⊆ result then result := result ∪ γ
          end

    Example of Attribute Set Closure
    R = (A, B, C, G, H, I)
    F = {A → B
         A → C
         CG → H
         CG → I
         B → H}
   (AG)+
     1. result = AG
     2. result = ABCG        (A → C and A → B)
     3. result = ABCGH       (CG → H and CG ⊆ AGBC)
     4. result = ABCGHI      (CG → I and CG ⊆ AGBCH)
   Is AG a candidate key?
     1. Is AG a super key?
          1. Does AG → R? == Is (AG)+ ⊇ R?
     2. Is any subset of AG a superkey?
          1. Does A → R? == Is (A)+ ⊇ R?
          2. Does G → R? == Is (G)+ ⊇ R?
Uses of Attribute Closure
There are several uses of the attribute
  closure algorithm:
 Testing for superkey:
    To test if α is a superkey, we compute α+,
    and check if α+ contains all attributes of R.
  Testing functional dependencies
    To check if a functional dependency α → β
     holds (or, in other words, is in F+), just
     check if β ⊆ α+.
    That is, we compute α+ by using attribute
     closure, and then check if it contains β.
    Is a simple and cheap test, and very useful
Canonical Cover

 Sets of functional dependencies may
  have redundant dependencies that can
  be inferred from the others
   E.g.: A → C is redundant in: {A → B, B → C, A → C}
   Parts of a functional dependency may be
    redundant
      E.g. on RHS: {A → B, B → C, A → CD}
       can be simplified to
                   {A → B, B → C, A → D}
      E.g. on LHS: {A → B, B → C, AC → D}
       can be simplified to
                   {A → B, B → C, A → D}
Extraneous Attributes
 Consider a set F of functional
 dependencies and the functional
 dependency       in F.
    Attribute A is extraneous in α if A ∈ α
     and F logically implies (F – {α → β})
     ∪ {(α – A) → β}.
    Attribute A is extraneous in β if A ∈ β
     and the set of functional dependencies
     (F – {α → β}) ∪ {α → (β – A)} logically
     implies F.
 Note: implication in the opposite
 direction is trivial in each of the cases
 above, since a ―stronger‖ functional
 dependency always implies a weaker
 one
Testing if an Attribute is
Extraneous
 Consider a set F of functional
  dependencies and the functional
  dependency α → β in F.
 To test if attribute A ∈ α is extraneous
  in α
  1. compute ({α} – A)+ using the dependencies
     in F
  2. check that ({α} – A)+ contains β; if it does,
     A is extraneous
 To test if attribute A ∈ β is
  extraneous in β
  1. compute α+ using only the dependencies in
     F' = (F – {α → β}) ∪ {α → (β – A)}
  2. check that α+ contains A; if it does,
     A is extraneous
Canonical Cover
 A canonical cover for F is a set of
 dependencies Fc such that
   F logically implies all dependencies in Fc, and
   Fc logically implies all dependencies in F, and
   No functional dependency in Fc contains an
    extraneous attribute, and
   Each left side of functional dependency in Fc
    is unique.
 To compute a canonical cover for F:
 repeat
    Use the union rule to replace any dependencies in F
          α1 → β1 and α1 → β2 with α1 → β1β2
    Find a functional dependency α → β with an
          extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
 until F does not change
Example of Computing a
   Canonical Cover
    R = (A, B, C)
    F = {A → BC
         B → C
         A → B
        AB → C}
   Combine A → BC and A → B into A → BC
     Set is now {A → BC, B → C, AB → C}
   A is extraneous in AB → C
       Check if the result of deleting A from AB → C is implied by the other dependencies
         Yes: in fact, B → C is already present!
       Set is now {A → BC, B → C}
   C is extraneous in A → BC
     Check if A → C is logically implied by A → B and the other dependencies
          Yes: using transitivity on A → B and B → C.
             Can use attribute closure of A in more complex cases
   The canonical cover is:       A → B
                                 B → C
Goals of Normalization
 Decide whether a particular relation R
  is in "good" form.
 In the case that a relation R is not in
  "good" form, decompose it into a set
  of relations {R1, R2, ..., Rn} such that
   each relation is in good form
   the decomposition is a lossless-join
   decomposition
 Our theory is based on:
   functional dependencies
   multivalued dependencies
Decomposition
    Decompose the relation schema Lending-schema into:
 Branch-schema = (branch-name, branch-city, assets)
 Loan-info-schema = (customer-name, loan-number,
                     branch-name, amount)
    All attributes of an original schema (R) must appear in the decomposition
     (R1, R2):
                           R = R1 ∪ R2
    Lossless-join decomposition.
     For all possible relations r on schema R
                           r = ΠR1(r) ⋈ ΠR2(r)
    A decomposition of R into R1 and R2 is lossless join if and only if at least one
     of the following dependencies is in F+:
        R1 ∩ R2 → R1
        R1 ∩ R2 → R2
Example of Lossy-Join
Decomposition
   Lossy-join decompositions
    result in information loss.
   Example: Decomposition of R = (A, B)
                      R1 = (A)   R2 = (B)

        r:   A   B        ΠA(r):  A        ΠB(r):  B
             α   1                α                1
             α   2                β                2
             β   1

        ΠA(r) ⋈ ΠB(r):   A   B
                         α   1
                         α   2
                         β   1
                         β   2
Normalization Using Functional Dependencies
  When we decompose a relation schema R with a set of functional
    dependencies F into R1, R2, ..., Rn we want
     Lossless-join decomposition: Otherwise
      decomposition would result in information
      loss.
     No redundancy: The relations Ri preferably
      should be in either Boyce-Codd Normal Form
      or Third Normal Form.
     Dependency preservation: Let Fi be the set of
      dependencies in F+ that include only attributes
      in Ri.
        Preferably the decomposition should be
         dependency preserving, that is,
         (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
Example
  R = (A, B, C)
   F = {A → B, B → C}
    Can be decomposed in two different
     ways
  R1 = (A, B),    R2 = (B, C)
    Lossless-join decomposition:
             R1 ∩ R2 = {B} and B → BC
    Dependency preserving
  R1 = (A, B),    R2 = (A, C)
    Lossless-join decomposition:
             R1 ∩ R2 = {A} and A → AB
    Not dependency preserving
     (cannot check B → C without computing R1 ⋈ R2)
Testing for Dependency Preservation
   To check if a dependency α → β is
    preserved in a decomposition of R into
    R1, R2, …, Rn we apply the following
    simplified test (with attribute closure
    done w.r.t. F)
     result = α
     while (changes to result) do
      for each Ri in the decomposition
            t = (result ∩ Ri)+ ∩ Ri
            result = result ∪ t
     If result contains all attributes in β, then the
     functional dependency α → β is preserved.
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional
dependencies if for all functional dependencies in F+ of the form
α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
   α → β is trivial (i.e., β ⊆ α)
   α is a superkey for R
Example
 R = (A, B, C)
  F = {A → B
       B → C}
  Key = {A}
 R is not in BCNF
 Decomposition R1 = (A, B), R2 = (B, C)
   R1 and R2 in BCNF
   Lossless-join decomposition
   Dependency preserving
Testing for BCNF
 To check if a non-trivial dependency
  α → β causes a violation of BCNF
  1. compute α+ (the attribute closure of α), and
  2. verify that it includes all attributes of R,
     that is, that α is a superkey of R.
 Simplified test: To check if a relation
 schema R is in BCNF, it suffices to
 check only the dependencies in the
 given set F for violation of BCNF, rather
 than checking all dependencies in F+.
   If none of the dependencies in F causes a
   violation of BCNF, then none of the
   dependencies in F+ will cause a violation of
   BCNF either.
BCNF Decomposition Algorithm
   result := {R};
   done := false;
   compute F+;
   while (not done) do
     if (there is a schema Ri in result that is not in BCNF)
        then begin
                let α → β be a nontrivial functional
                   dependency that holds on Ri
                   such that α → Ri is not in F+,
                   and α ∩ β = ∅;
                result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
        end
        else done := true;
Note: each Ri is in BCNF, and the decomposition is lossless-join.
Example of BCNF
Decomposition branch-city,
  R = (branch-name,
  assets,
    customer-name, loan-number, amount)
    F = {branch-name    assets branch-city
    loan-number    amount branch-name}
    Key = {loan-number, customer-name}
 Decomposition
   R1 = (branch-name, branch-city, assets)
   R2 = (branch-name, customer-name,
    loan-number, amount)
   R3 = (branch-name, loan-number,
    amount)
Testing Decomposition for
 BCNF
  To check if a relation Ri in a
 decomposition of R is in BCNF,
  Either test Ri for BCNF with respect to the
   restriction of F to Ri (that is, all FDs in F+
   that contain only attributes from Ri)
  or use the original set of dependencies F
   that hold on R, but with the following test:
      for every set of attributes α ⊆ Ri, check that α+
       (the attribute closure of α) either includes no
       attribute of Ri – α, or includes all attributes of
       Ri.
     If the condition is violated by some α → β in
      F, the dependency
             α → (α+ – α) ∩ Ri
      can be shown to hold on Ri, and Ri violates
      BCNF.
BCNF and Dependency Preservation
 It is not always possible to get a BCNF
 decomposition that is dependency preserving
   R = (J, K, L)
     F = {JK → L
           L → K}
    Two candidate keys = JK and JL
   R is not in BCNF
   Any decomposition of R will fail
    to preserve
                         JK → L
Third Normal Form: Motivation
 There are some situations where
   BCNF is not dependency preserving, and
   efficient checking for FD violation on
   updates is important
 Solution: define a weaker normal form,
 called Third Normal Form.
   Allows some redundancy (with resultant
    problems; we will see examples later)
   But FDs can be checked on individual
    relations without computing a join.
   There is always a lossless-join, dependency-
    preserving decomposition into 3NF.
Third Normal Form

 A relation schema R is in third normal
  form (3NF) if for all:
       α → β in F+
 at least one of the following holds:
    α → β is trivial (i.e., β ⊆ α)
    α is a superkey for R
   Each attribute A in β – α is contained in a
    candidate key for R.
    (NOTE: each attribute may be in a
    different candidate key)
3NF (Cont.)
    Example
        R = (J, K, L)
         F = {JK → L, L → K}
      Two candidate keys: JK and JL
      R is in 3NF
           JK → L       JK is a superkey
           L → K        K is contained in a candidate key
      BCNF decomposition has (JL) and (LK)
        Testing for JK → L requires a join
    There is some redundancy in this schema
    Equivalent to example in book:
        Banker-schema = (branch-name, customer-name, banker-name)
        banker-name → branch-name
        branch-name customer-name → banker-name
Testing for 3NF

 Optimization: Need to check only FDs
  in F, need not check all FDs in F+.
 Use attribute closure to check, for each
  dependency α → β, if α is a superkey.
 If α is not a superkey, we have to
  verify if each attribute in β is
  contained in a candidate key of R
   this test is rather more expensive, since it
    involves finding candidate keys
   testing for 3NF has been shown to be NP-
    hard
3NF Decomposition Algorithm
 Let Fc be a canonical cover for F;
 i := 0;
 for each functional dependency α → β in Fc do
   if none of the schemas Rj, 1 ≤ j ≤ i contains αβ
     then begin
         i := i + 1;
         Ri := αβ
       end
 if none of the schemas Rj, 1 ≤ j ≤ i
 contains a candidate key for R
 then begin
         i := i + 1;
         Ri := any candidate key for R
       end
 return (R1, R2, ..., Ri)
3NF Decomposition Algorithm
(Cont.)
   Above algorithm ensures:

       each relation schema Ri is in 3NF
       decomposition is dependency preserving and lossless-join
       Proof of correctness is given at the end of the original slide set
Example
 Relation schema:
   Banker-info-schema = (branch-name, customer-name,
                         banker-name, office-number)
 The functional dependencies for this
 relation schema are:
     banker-name → branch-name office-number
     customer-name branch-name → banker-name
 The key is: {customer-name, branch-name}
Applying 3NF to Banker-info-
schema
  The for loop in the algorithm
  causes us to include the following
  schemas in our decomposition:
      Banker-office-schema = (banker-name, branch-name,
                              office-number)
      Banker-schema = (customer-name, branch-name,
                       banker-name)
   Since Banker-schema contains a
   candidate key for Banker-info-schema,
   we are done with the decomposition
   process.
Comparison of BCNF and 3NF

 It is always possible to decompose a
  relation into relations in 3NF and
   the decomposition is lossless
   the dependencies are preserved
 It is always possible to decompose a
 relation into relations in BCNF and
   the decomposition is lossless
   it may not be possible to preserve
   dependencies.
Comparison of BCNF and 3NF
 (Cont.)
   Example of problems due to
    redundancy in 3NF
     R = (J, K, L)
     F = {JK → L, L → K}

              J     L    K
              j1    l1   k1
              j2    l1   k1
              j3    l1   k1
              null  l2   k2

A schema that is in 3NF but not in BCNF has the problems of
 repetition of information (e.g., the relationship l1, k1)
 need to use null values (e.g., to represent the relationship
    l2, k2 where there is no corresponding value for J).
Design Goals

 Goal for a relational database design
  is:
   BCNF.
   Lossless join.
   Dependency preservation.
 If we cannot achieve this, we accept
 one of
   Lack of dependency preservation
   Redundancy due to use of 3NF
 Interestingly, SQL does not provide a
  direct way of specifying functional
  dependencies other than superkeys
   Can specify FDs using assertions, but
    they are expensive to test
Testing for FDs Across Relations
   If decomposition is not dependency preserving, we can have an extra materialized
    view for each dependency α → β in Fc that is not preserved in the decomposition
   The materialized view is defined as a projection on αβ of the join of the relations in
    the decomposition
   Many newer database systems support materialized views and the database system
    maintains the view when the relations are updated.
     No extra coding effort for programmer.
   The functional dependency α → β is expressed by declaring α as a candidate key on
    the materialized view.
   Checking for a candidate key is cheaper than checking α → β
   BUT:
     Space overhead: for storing the materialized view
     Time overhead: Need to keep materialized view up to date when
       relations are updated
     Database system may not support key declarations on
       materialized views
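A sketch of the idea using the earlier R = (J, K, L) example, where R
 is decomposed into R1 = (J, L) and R2 = (L, K) and JK → L is not
 preserved (materialized-view and constraint syntax varies by system,
 so treat this as illustrative only):
     create materialized view mv-jkl as
       select R1.J, R2.K, R1.L
       from R1, R2
       where R1.L = R2.L
     alter table mv-jkl add constraint jk-key unique (J, K)
 The unique declaration on (J, K) then enforces JK → L whenever the
 view is maintained.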
Multivalued Dependencies


 There are database schemas in BCNF
  that do not seem to be sufficiently
  normalized
 Consider a database
 classes(course, teacher, book)
 such that (c, t, b) ∈ classes means that t
 is qualified to teach c, and b is a
 required textbook for c
 The database is supposed to list for
  each course the set of teachers any
  one of which can be the course's
  instructor, and the set of books, all of
  which are required for the course (no
  matter who teaches it).
Multivalued Dependencies (Cont.)
    course              teacher         book
    database            Avi             DB Concepts
    database            Avi             Ullman
    database            Hank            DB Concepts
    database            Hank            Ullman
    database            Sudarshan       DB Concepts
    database            Sudarshan       Ullman
    operating systems   Avi             OS Concepts
    operating systems   Avi             Shaw
    operating systems   Jim             OS Concepts
    operating systems   Jim             Shaw

                              classes

 There are no non-trivial functional
  dependencies and therefore the
  relation is in BCNF
 Insertion anomalies – i.e., if Sara is a
  new teacher that can teach database,
  two tuples need to be inserted:
     (database, Sara, DB Concepts)
     (database, Sara, Ullman)
Multivalued Dependencies
   (Cont.)
    Therefore, it is better to
    decompose classes into:
         course    teacher

      database                 Avi
      database                 Hank
      database                 Sudarshan
      operating systems        Avi
      operating systems        Jim

                          teaches
              course                    book

      database                     DB Concepts
      database                     Ullman
      operating systems            OS Concepts
      operating systems            Shaw
                            text

We shall see that these two relations are in Fourth Normal
Form (4NF)
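For concreteness, a minimal SQL sketch of the decomposed schemas
(column types assumed):

   create table teaches
      (course  varchar(30),
       teacher varchar(20),
       primary key (course, teacher));

   create table text
      (course  varchar(30),
       book    varchar(30),
       primary key (course, book));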
Multivalued Dependencies (MVDs)
  Let R be a relation schema and let α ⊆ R and β ⊆ R.
  The multivalued dependency
              α →→ β
  holds on R if in any legal relation r(R), for all pairs
  of tuples t1 and t2 in r such that t1[α] = t2[α],
  there exist tuples t3 and t4 in r such that:
        t1[α] = t2[α] = t3[α] = t4[α]
        t3[β]     = t1[β]
        t3[R – β] = t2[R – β]
        t4[β]     = t2[β]
        t4[R – β] = t1[R – β]
MVD (Cont.)
 Tabular representation of α →→ β:

          α            β              R – α – β
    t1    a1 … ai      ai+1 … aj      aj+1 … an
    t2    a1 … ai      bi+1 … bj      bj+1 … bn
    t3    a1 … ai      ai+1 … aj      bj+1 … bn
    t4    a1 … ai      bi+1 … bj      aj+1 … an
Example

 Let R be a relation schema with a set
  of attributes that are partitioned into
  3 nonempty subsets.
              Y, Z, W
 We say that Y →→ Z (Y multidetermines Z)
  if and only if for all possible relations r(R)
        < y1, z1, w1 > ∈ r and < y2, z2, w2 > ∈ r
  implies
        < y1, z1, w2 > ∈ r and < y2, z2, w1 > ∈ r
Example (Cont.)
 In our example:
             course →→ teacher
             course →→ book
 The above formal definition is
  supposed to formalize the notion
  that given a particular value of Y
  (course) it has associated with it a
  set of values of Z (teacher) and a
  set of values of W (book), and
  these two sets are in some sense
  independent of each other.
 Note: If Y → Z then Y →→ Z; indeed, we then
  have z1 = z2 in the definition above, and
  the claim follows.
Use of Multivalued
Dependencies
 We use multivalued dependencies
  in two ways:
  1. To test relations to determine
    whether they are legal under a given
    set of functional and multivalued
    dependencies
  2. To specify constraints on the set of
    legal relations. We shall thus concern
    ourselves only with relations that
    satisfy a given set of functional and
    multivalued dependencies.
 If a relation r fails to satisfy a given
  multivalued dependency, we can construct a
  relation r′ that does satisfy it by adding
  tuples to r.
Theory of MVDs

 From the definition of multivalued
  dependency, we can derive the
  following rule:
   If α → β, then α →→ β
  That is, every functional dependency
  is also a multivalued dependency
 The closure D+ of D is the set of all
  functional and multivalued
  dependencies logically implied by D.
   We can compute D+ from D, using the formal
    definitions of functional dependencies and
    multivalued dependencies.
Fourth Normal Form
 A relation schema R is in 4NF with
  respect to a set D of functional and
  multivalued dependencies if for all
  multivalued dependencies in D+ of the
  form α →→ β, where α ⊆ R and β ⊆ R, at
  least one of the following hold:
     α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
     α is a superkey for schema R
 If a relation is in 4NF it is in BCNF
Restriction of Multivalued
Dependencies
 The restriction of D to Ri is the set Di
  consisting of
   All functional dependencies in D+ that
    include only attributes of Ri
   All multivalued dependencies of the form
          α →→ (β ∩ Ri)
     where α ⊆ Ri and α →→ β is in D+
4NF Decomposition Algorithm
  result := {R};
  done := false;
  compute D+;
  Let Di denote the restriction of D+ to Ri

  while (not done)
     if (there is a schema Ri in result that is not in 4NF) then
        begin
          let α →→ β be a nontrivial multivalued dependency
            that holds on Ri such that α → Ri is not in Di,
            and α ∩ β = ∅;
          result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
        end
     else done := true;
   Example
    R =(A, B, C, G, H, I)

    F = { A →→ B
          B →→ HI
          CG →→ H }
   R is not in 4NF since A →→ B and A is not a superkey for R
   Decomposition
    a) R1 = (A, B)                             (R1 is in 4NF)
    b) R2 = (A, C, G, H, I)                    (R2 is not in 4NF)
    c) R3 = (C, G, H)             (R3 is in 4NF)
    d) R4 = (A, C, G, I)                       (R4 is not in 4NF)
   Since A →→ B and B →→ HI, we have A →→ HI and A →→ I
    e) R5 = (A, I)                             (R5 is in 4NF)
    f)R6 = (A, C, G)              (R6 is in 4NF)
Further Normal Forms
 Join dependencies generalize
 multivalued dependencies
   lead to project-join normal form (PJNF)
   (also called fifth normal form)
 A class of even more general
  constraints, leads to a normal form
  called domain-key normal form.
 Problem with these generalized
  constraints: they are hard to reason
  with, and no sound and complete set of
  inference rules exists.
Overall Database Design
 Process
  We have assumed schema R is given
  R could have been generated when
   converting E-R diagram to a set of tables.
  R could have been a single relation
   containing all attributes that are of interest
   (called universal relation).
  Normalization breaks R into smaller
   relations.
  R could have been the result of some ad hoc
   design of relations, which we then
   test/convert to normal form.
ER Model and Normalization
 When an E-R diagram is carefully
  designed, identifying all entities correctly,
  the tables generated from the E-R
  diagram should not need further
  normalization.
 However, in a real (imperfect) design
  there can be FDs from non-key attributes
  of an entity to other attributes of the
  entity
 E.g. employee entity with attributes
  department-number and department-
  address, and an FD department-number →
  department-address
 Good design would have made department
  an entity in its own right.
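A minimal SQL sketch of the redesign this suggests (names and types
assumed): making department an entity lets the key of its table
enforce the dependency:

   create table department
      (department-number  varchar(10) primary key,
       department-address varchar(50));

   create table employee
      (employee-id        varchar(10) primary key,
       department-number  varchar(10) references department);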
Universal Relation Approach
 Dangling tuples – Tuples that
  "disappear" in computing a join.
    Let r1 (R1), r2 (R2), …, rn (Rn) be a set of
     relations
    A tuple r of the relation ri is a dangling
     tuple if r is not in the relation:
              ΠRi (r1 ⋈ r2 ⋈ … ⋈ rn)
 The relation r1 ⋈ r2 ⋈ … ⋈ rn is called a
  universal relation since it involves all
  the attributes in the "universe" defined by
      R1 ∪ R2 ∪ … ∪ Rn
Universal Relation Approach

 Dangling tuples may occur in practical
  database applications.
 They represent incomplete
  information
 E.g. may want to break up information
  about loans into:
  (branch-name, loan-number)
  (loan-number, amount)
  (loan-number, customer-name)
 Universal relation would require null
  values for the missing attributes
Universal Relation Approach (Contd.)
 A particular decomposition defines a
  restricted form of incomplete
  information that is acceptable in our
  database.
   Above decomposition requires at least one
    of customer-name,        branch-name or
    amount in order to enter a loan number
    without using null values
   Rules out storing of customer-name,
    amount without an appropriate loan-
    number (since it is a key, it can't be null
    either!)
 Universal relation requires unique attribute
  names (the unique role assumption): every
  attribute name must have a unique meaning
  in the database.
Denormalization for Performance
 May want to use a non-normalized
  schema for performance
 E.g. displaying customer-name along
  with account-number and balance
  requires join of account with depositor
 Alternative 1: Use denormalized
  relation containing attributes of
  account as well as depositor with all
  above attributes
   faster lookup
   Extra space and extra execution time for
   updates
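A minimal sketch of the two alternatives, reusing the bank schema
from earlier chapters (column types assumed):

   -- Normalized design: the display needs a join at query time
   select depositor.customer-name, account.account-number, account.balance
   from depositor, account
   where depositor.account-number = account.account-number;

   -- Denormalized alternative: one wide relation; faster lookup, but
   -- extra space, and updates must keep the duplicated data consistent
   create table account-depositor
      (customer-name  varchar(20),
       account-number varchar(10),
       branch-name    varchar(15),
       balance        numeric(12,2));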
Other Design Issues

 Some aspects of database design are
  not caught by normalization
 Examples of bad database design, to
  be avoided:
 Instead of earnings(company-id, year,
 amount), use
  earnings-2000, earnings-2001, earnings-
   2002, etc., all on the schema (company-
   id, earnings).
    Above are in BCNF, but make querying
     across years difficult and needs new table
     each year
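By contrast, a minimal sketch of the preferred design (types and the
sample company id are assumed): keying one relation on (company-id,
year) keeps cross-year queries in a single table:

   create table earnings
      (company-id varchar(10),
       year       integer,
       amount     numeric(14,2),
       primary key (company-id, year));

   -- One query covers all years; no per-year tables needed:
   select year, amount
   from earnings
   where company-id = 'C-101';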
Correctness of 3NF
Decomposition Algorithm
  3NF decomposition algorithm is
   dependency preserving (since there is
   a relation for every FD in Fc)
  Decomposition is lossless join
    A candidate key (C) is in one of the
     relations Ri in decomposition
    Closure of candidate key under Fc must
     contain all attributes in R.
    Follow the steps of attribute closure
     algorithm to show there is only one tuple
     in the join result for each tuple in Ri
Correctness of 3NF
Decomposition Algorithm
(Contd.) a relation Ri is in the
 Claim: if
   decomposition generated by the
 above algorithm, then Ri satisfies 3NF.
  Let Ri be generated from the
   dependency
  Let     B be any non-trivial functional
   dependency on Ri. (We need only
   consider FDs whose right-hand side is
   a single attribute.)
  Now, B can be in either or but not
Correctness of 3NF
Decomposition (Contd.)
  Case 1: If B in :
    If  is a superkey, the 2nd condition of 3NF
     is satisfied
    Otherwise must contain some attribute
     not in
    Since      B is in F+ it must be derivable
     from Fc, by using attribute closure on .
    Attribute closure not have used         - if it
     had been used, must be contained in the
     attribute closure of , which is not
     possible, since we assumed is not a
     superkey.
Correctness of 3NF
Decomposition (Contd.)
  Case 2: B is in .
    Since     is a candidate key, the third
     alternative in the definition of 3NF is
     trivially satisfied.
    In fact, we cannot show that is a
     superkey.
    This shows exactly why the third
     alternative is present in the definition of
     3NF.
 Q.E.D.
Sample lending Relation
Sample Relation r
The customer Relation
The loan Relation
The branch Relation
The Relation branch-customer
The Relation customer-loan
The Relation branch-customer   customer-loan
An Instance of Banker-schema
Tabular Representation of
Relation bc: An Example of Redundancy in a BCNF Relation
An Illegal bc Relation
Decomposition of loan-info
Relation of Exercise 7.4
Chapter 8: Object-Oriented Databases
     Need for Complex Data Types
    The Object-Oriented Data Model
    Object-Oriented Languages
    Persistent Programming
     Languages
    Persistent C++ Systems
Need for Complex Data Types
 Traditional database applications in data
  processing had conceptually simple data types
   Relatively few data types, first normal form holds
 Complex data types have grown more important
  in recent years
   E.g. Addresses can be viewed as a
     Single string, or
     Separate attributes for each part, or
     Composite attributes (which are not in first normal
      form)
   E.g. it is often convenient to store multivalued
    attributes as-is, without creating a separate relation to
    store the values in first normal form
 Applications
   computer-aided design, computer-aided software
    engineering
   multimedia and image databases, and
    document/hypertext databases.
Object-Oriented Data Model
 Loosely speaking, an object
  corresponds to an entity in the E-R
  model.
 The object-oriented paradigm is based
  on encapsulating code and data related
  to an object into single unit.
 The object-oriented data model is a
  logical data model (like the E-R model).
 Adaptation of the object-oriented
  programming paradigm (e.g., Smalltalk,
  C++) to database systems.
Object Structure
 An object has associated with it:
   A set of variables that contain the data for the object.
    The value of each variable is itself an object.
   A set of messages to which the object responds; each
    message may have zero, one, or more parameters.
   A set of methods, each of which is a body of code to
    implement a message; a method returns a value as the
    response to the message
 The physical representation of data is visible only to
  the implementor of the object
 Messages and responses provide the only external
  interface to an object.
 The term message does not necessarily imply
  physical message passing. Messages can be
  implemented as procedure invocations.
Messages and Methods
 Methods are programs written in general-purpose
  language with the following features
      only variables in the object itself may be referenced directly
      data in other objects are referenced only by sending
       messages.
 Methods can be read-only or update methods
      Read-only methods do not change the value of the object
 Strictly speaking, every attribute of an entity must be
  represented by a variable and two methods, one to
  read and the other to update the attribute
    e.g., the attribute address is represented by a variable
     address and two messages get-address and set-address.
    For convenience, many object-oriented data models permit
     direct access to variables of other objects.
Object Classes

 Similar objects are grouped into a class;
  each such object is called an instance of its
  class
 All objects in a class have the same
   Variables, with the same types
   message interface
   methods
   They may differ in the values assigned to variables
 Example: Group objects for people into a
  person class
 Classes are analogous to entity sets in the
  E-R model
Class Definition Example
 class employee {
        /*Variables */
          string name;
          string address;
          date     start-date;
          int      salary;
        /* Messages */
          int      annual-salary();
          string get-name();
          string get-address();
          int      set-address(string new-address);
          int      employment-length();
  };
 Methods to read and set the other
  variables are also needed with strict
  encapsulation
 Methods are defined separately
    E.g. int employment-length() { return today() - start-date; }
         int set-address(string new-address) { address = new-address; }
Inheritance
 E.g., class of bank customers is similar to class of
  bank employees, although there are differences
   both share some variables and messages, e.g., name
    and address.
   But there are variables and messages specific to each
    class e.g., salary for employees and credit-rating for
    customers.
 Every employee is a person; thus employee is a
  specialization of person
 Similarly, customer is a specialization of person.
 Create classes person, employee and customer
   variables/messages applicable to all persons associated
    with class person.
   variables/messages specific to employees associated
    with class employee; similarly for customer
Inheritance (Cont.)
     Place classes into a
        specialization/IS-A hierarchy
         variables/messages belonging to
           class person are inherited by class
           employee as well as customer
     Result is a class hierarchy



Note analogy with ISA Hierarchy in the E-R model
Class Hierarchy Definition
       class person{
         string name;
string address;
         };
       class customer isa person {
         int credit-rating;
         };
       class employee isa person {
         date start-date;
         int salary;
   .
   .
   .     };
       class officer isa employee {
         int office-number;
         int expense-account-number;
         };
Class Hierarchy Example (Cont.)
 Full variable list for objects in the class officer:

     office-number, expense-account-number:
       defined locally
     start-date, salary: inherited from
      employee
     name, address: inherited from person
   Methods inherited similar to variables.
   Substitutability — any method of a class, say person, can be
    invoked equally well with any object belonging to any subclass,
    such as subclass officer of person.
   Class extent: set of all objects in the class. Two options:

    1.Class extent of employee includes all
      officer, teller and secretary objects.
    2. Class extent of employee includes only
       employee objects that are not in a subclass
       (i.e., not officer, teller, or secretary objects)
Example of Multiple Inheritance




 Class DAG for banking example.

Multiple Inheritance
 With multiple inheritance a class may have
  more than one superclass.
     The class/subclass relationship is
      represented by a directed acyclic graph
      (DAG)
     Particularly useful when objects can be
      classified in more than one way, which are
      independent of each other
        E.g. temporary/permanent is independent of
         Officer/secretary/teller
        Create a subclass for each combination of
         subclasses
          Need not create subclasses for combinations
           that are not possible in the database being
           modeled
   A class inherits variables and methods from all its superclasses
   There is potential for ambiguity when a variable/message N with
    the same name is inherited from more than one superclass
More Examples of Multiple Inheritance
 Conceptually, an object can belong to
  each of several subclasses
   A person can play the roles of student, a
   teacher or footballPlayer, or any
   combination of the three
     E.g., a student teaching assistant who also
      plays football
 Can use multiple inheritance to model
 ―roles‖ of an object
   That is, allow an object to take on any one
   or more of a set of types
 But many systems insist an object
 should have a most-specific class
Object Identity
 An object retains its identity even if
  some or all of the values of
  variables or definitions of methods
  change over time.
 Object identity is a stronger notion
  of identity than in programming
  languages or data models not
  based on object orientation.
   Value – data value; e.g. primary key
    value used in relational systems.
   Name – supplied by user; used for
    variables in procedures.
Object Identifiers
 Object identifiers used to uniquely
 identify objects
   Object identifiers are unique:
     no two objects have the same identifier
     each object has only one object identifier
   E.g., the spouse field of a person object
    may be an identifier of another person
    object.
   can be stored as a field of an object, to
    refer to another object.
   Can be
     system generated (created by database) or
     external (such as social-security number)
Object Containment




 Each component in a design may
  contain other components
 Can be modeled as containment of
  objects. Objects containing other
  objects are called composite objects.
 Multiple levels of containment create a
  containment hierarchy
Object-Oriented Languages
 Object-oriented concepts can be
 used in different ways
   Object-orientation can be used as a
    design tool, and be encoded into, for
    example, a relational database
      (analogous to modeling data with an E-R
       diagram and then converting to a set
       of relations)
   The concepts of object orientation can
    be incorporated into a programming
    language that is used to manipulate
    the database.
     Object-relational systems – add
      complex types and object-orientation
Persistent Programming Languages
 Persistent programming languages
  allow objects to be created and stored
  in a database, and used directly from a
  programming language
  allow data to be manipulated directly from
   the programming language
    No need to go through SQL.
  No need for explicit format (type) changes
    format changes are carried out transparently
     by system
    Without a persistent programming language,
     format changes becomes a burden on the
     programmer
     More code to be written
     More chance of bugs
Persistent Prog. Languages
(Cont.)
 Drawbacks of persistent programming
  languages
  Due to power of most programming
   languages, it is easy to make
   programming errors that damage the
   database.
  Complexity of languages makes automatic
   high-level optimization more difficult.
  Do not support declarative querying as
   well as relational databases
Persistence of Objects

 Approaches to make transient objects
  persistent include establishing
  Persistence by Class – declare all objects
   of a class to be persistent; simple but
   inflexible.
  Persistence by Creation – extend the
   syntax for creating objects to specify that
   that an object is persistent.
  Persistence by Marking – an object that is
   to persist beyond program execution is
   marked as persistent before program
   termination.
Object Identity and Pointers
 A persistent object is assigned a
  persistent object identifier.
 Degrees of permanence of identity:
   Intraprocedure – identity persists only
    during the executions of a single
    procedure
   Intraprogram – identity persists only
    during execution of a single program or
    query.
   Interprogram – identity persists from
    one program execution to another, but
    may change if the storage organization
    is changed
Object Identity and Pointers
(Cont.) languages such as C++, an
   In O-O
    object identifier is actually an in-
    memory pointer.
   Persistent pointer – persists
    beyond program execution
     can be thought of as a pointer into
     the database
       E.g. specify file identifier and offset
        into the file
     Problems due to database
     reorganization have to be dealt with
     by keeping forwarding pointers
Storage and Access of Persistent
  Objects
How to find objects in the database:
  Name objects (as you would name
   files)
    Cannot scale to large number of objects.
    Typically given only to class extents and
     other collections of objects, but not
     objects.
  Expose object identifiers or
   persistent pointers to the objects
    Can be stored externally.
    All objects have object identifiers.
  Store collections of objects, and
    allow programs to iterate over the
     collections to find required objects
Persistent C++ Systems
 C++ language allows support for
 persistence to be added without
 changing the language
   Declare a class called Persistent_Object with
    attributes and methods to support
    persistence
   Overloading – ability to redefine standard
    function names and operators (i.e., +, –,
    the pointer deference operator –>) when
    applied to new types
   Template classes help to build a type-safe
    type system supporting collections and
    persistent types.
 Providing persistence without
ODMG C++ Object Definition Language
 The Object Database Management
  Group is an industry consortium
  aimed at standardizing object-
  oriented databases
   in particular persistent programming
    languages
   Includes standards for C++, Smalltalk and
    Java
   ODMG-93
   ODMG-2.0 and 3.0 (which is 2.0 plus
    extensions to Java)
    Our description based on ODMG-2.0
 ODMG C++ standard avoids changes
  to the C++ language itself, providing
  functionality through template classes
  and class libraries
ODMG Types

 Template class d_Ref<class> used to
  specify references (persistent
  pointers)
 Template class d_Set<class> used to
 define sets of objects.
   Methods include insert_element(e) and
   delete_element(e)
 Other collection classes such as d_Bag
  (set with duplicates allowed), d_List
  and d_Varray (variable length array)
ODMG C++ ODL: Example
class Branch   : public d_Object {
 ….
}
class Person   : public d_Object {
   public:
    d_String   name;      // should not use String!
    d_String   address;
};
class Account : public d_Object {
  private:
    d_Long     balance;
  public:
    d_Long     number;
    d_Set <d_Ref<Customer>> owners;
     int       find_balance();
     int       update_balance(int delta);
};
ODMG C++ ODL: Example
  (Cont.)
class Customer : public Person {
  public:
   d_Date         member_from;
   d_Long         customer_id;
   d_Ref<Branch> home_branch;
   d_Set <d_Ref<Account>> accounts; };
Implementing Relationships

 Relationships between classes
  implemented by references
 Special reference types enforces
  integrity by adding/removing inverse
 links.
  Type d_Rel_Ref<Class, InvRef> is a
   reference to Class, where attribute InvRef
   of Class is the inverse reference.
  Similarly, d_Rel_Set<Class, InvRef> is used
   for a set of references
 Assignment method (=) of class
 d_Rel_Ref is overloaded
Implementing Relationships
 E.g.
  extern const char _owners[ ], _accounts[ ];
  class Account : public d_Object {
         ….
      d_Rel_Set <Customer, _accounts> owners;
  }
    // .. Since strings can't be used in templates …
  const char _owners[ ] = "owners";
  const char _accounts[ ] = "accounts";
ODMG C++ Object Manipulation
Language
  Uses persistent versions of C++
  operators such as new(db)
  d_Ref<Account> account = new(bank_db,
  "Account") Account;
    new allocates the object in the specified
     database, rather than in memory.
    The second argument ("Account") gives the
     typename used in the database.
  Dereference operator -> when applied
   on a d_Ref<Account> reference loads
   the referenced object in memory (if not
   already present) before continuing with
   usual C++ dereference.
  Constructor for a class – a special
   method used to initialize objects when
   they are created
ODMG C++OML: Database and
Object Functions
  Class d_Database provides methods
  to
    open a database:
       open(databasename)
      give names to objects:
       set_object_name(object, name)
      look up objects by name:
       lookup_object(name)
      rename objects:
       rename_object(oldname, newname)
      close a database (close());
  Class d_Object is inherited by all
ODMG C++ OML: Example
int create_account_owner(String name, String
  Address){
  Database bank_db_obj;
  Database * bank_db = &bank_db_obj;
  bank_db->open("Bank-DB");
  d_Transaction Trans;
  Trans.begin();

 d_Ref<Account> account = new(bank_db)
 Account;
 d_Ref<Customer> cust = new(bank_db)
 Customer;
ODMG C++ OML: Example
 (Cont.)
  Class extents maintained automatically
  in the database.
 To access a class extent:
     d_Extent<Customer>
  customerExtent(bank_db);
 Class d_Extent provides method
         d_Iterator<T> create_iterator()
  to create an iterator on the class extent
 Also provides select(pred) method to
  return iterator on objects that satisfy
  selection predicate pred.
ODMG C++ OML: Example of
 Iterators
int print_customers() {
 Database bank_db_obj;
 Database * bank_db = &bank_db_obj;
 bank_db->open (“Bank-DB”);
 d_Transaction Trans; Trans.begin ();

 d_Extent<Customer>
 all_customers(bank_db);
 d_Iterator<d_Ref<Customer>> iter;
 iter = all_customers.create_iterator();
 d_Ref <Customer> p;
 while (iter.next (p))
    print_cust (p);  // function assumed to print customer info
 Trans.commit ();
}
ODMG C++ Binding: Other
Features
  Declarative query language OQL, looks
   like SQL
    Form query as a string, and execute it to
     get a set of results (actually a bag, since
     duplicates may be present)
   d_Set<d_Ref<Account>> result;
   d_OQL_Query q1("select a
                    from Customer c,
   c.accounts a
                    where c.name=„Jones‟
                             and
   a.find_balance() > 100");
   d_oql_execute(q1, result);
Making Pointer Persistence
Transparent
 Drawback of the ODMG C++
  approach:
   Two types of pointers
   Programmer has to ensure
    mark_modified() is called, else database
    can become corrupted
 ObjectStore approach
   Uses exactly the same pointer type for in-
    memory and database objects
    Persistence is transparent to applications
      Except when creating objects
    Same functions can be used on in-memory
     and persistent objects
Persistent Java Systems
 ODMG-3.0 defines extensions to Java
 for persistence
   Java does not support templates, so
   language extensions are required
 Model for persistence: persistence by
 reachability
   Matches Java‘s garbage collection model
   Garbage collection needed on the database
    also
   Only one pointer type for transient and
    persistent pointers
 Class is made persistence capable by
  running a post-processor on the code
  generated by the Java compiler
ODMG Java

 Transaction must start accessing
  database from one of the root object
  (looked up by name)
   finds other objects by following pointers from
   the root objects
 Objects referred to from a fetched object
  are allocated space in memory, but not
  necessarily fetched
   Fetching can be done lazily
   An object with space allocated but not yet
    fetched is called a hollow object
   When a hollow object is accessed, its data is
    fetched from disk.
Specialization Hierarchy for the
Bank Example
Class Hierarchy Corresponding
to Figure 8.2
Class DAG for the Bank Example
Containment Hierarchy for Bicycle-Design Database
Chapter 9: Object-Relational
Databases
 Nested Relations
 Complex Types and Object
  Orientation
 Querying with Complex Types
 Creation of Complex Values and
  Objects
 Comparison of Object-Oriented and
  Object-Relational Databases
Object-Relational Data Models

 Extend the relational data model by
  including object orientation and
  constructs to deal with added data types.
 Allow attributes of tuples to have
  complex types, including non-atomic
  values such as nested relations.
 Preserve relational foundations, in
  particular the declarative access to data,
  while extending modeling power.
 Upward compatibility with existing
  relational languages.
Nested Relations
   Motivation:
     Permit non-atomic domains (atomic
      indivisible)
      Example of non-atomic domain: set of
       integers, or set of tuples
     Allows more intuitive modeling for applications
      with complex data
   Intuitive definition:
     allow relations whenever we allow atomic
      (scalar) values — relations within relations
     Retains mathematical foundation of relational
      model
     Violates first normal form.
Example of a Nested Relation
  Example: library information system
  Each book has
      title,
      a set of authors,
      Publisher, and
      a set of keywords
  Non-1NF relation books
1NF Version of Nested Relation
     1NF version of books




                    flat-books
4NF Decomposition of Nested Relation
 Remove awkwardness of flat-books by assuming
  that the following multivalued dependencies hold:
    title →→ author
    title →→ keyword
    title →  pub-name, pub-branch
 Decompose flat-books into 4NF using the schemas:
    (title, author)
    (title, keyword)
    (title, pub-name, pub-branch)
4NF Decomposition of flat–
books
Problems with 4NF Schema

 4NF design requires users to include
  joins in their queries.
 1NF relational view flat-books defined
 by join of 4NF relations:
   eliminates the need for users to perform
    joins,
   but loses the one-to-one correspondence
    between tuples and documents.
   And has a large amount of redundancy
 Nested relations representation is
 much more natural here.
Complex Types and SQL:1999
 Extensions to SQL to support
  complex types include:
   Collection and large object types
     Nested relations are an example of
      collection types
   Structured types
     Nested record structures like composite
      attributes
   Inheritance
   Object orientation
     Including object identifiers and
      references
 Our description is mainly based on
  the SQL:1999 standard
Collection Types
  Set type (not in SQL:1999)
   create table books (
        …..
        keyword-set setof(varchar(20))
        ……
   )

  Sets are an instance of collection types.
   Other instances include
    Arrays (are supported in SQL:1999)
      E.g. author-array varchar(20) array[10]
      Can access elements of array in usual fashion:
        E.g. author-array[1]
    Multisets (not supported in SQL:1999)
      I.e., unordered collections, where an element may
       occur multiple times
    Nested relations are sets of tuples
Large Object Types
 Large object types
   clob: Character large objects

              book-review clob(10KB)
   blob: binary large objects

                image     blob(10MB)
                movie     blob (2GB)
 JDBC/ODBC provide special methods to
  access large objects in small pieces
   Similar to accessing operating system files
   Application retrieves a locator for the large
    object and then manipulates the large object
    from the host language
Structured and Collection Types
   Structured types can be declared and used in SQL
      create type Publisher as
        (name         varchar(20),
         branch        varchar(20))
      create type Book as
        (title         varchar(20),
         author-array varchar(20) array [10],
         pub-date        date,
         publisher      Publisher,
         keyword-set setof(varchar(20)))

     Note: setof declaration of keyword-set is
      not supported by SQL:1999
     Using an array to store authors lets us
      record the order of the authors
   Structured types can be used to create tables
        create table books of Book
Structured and Collection Types
(Cont.)
  Structured types allow composite
   attributes of E-R diagrams to be
   represented directly.
  Unnamed row types can also be used
   in SQL:1999 to define composite
   attributes
    E.g. we can omit the declaration of type
    Publisher and instead use the following in
    declaring the type Book
        publisher row (name varchar(20),
                           branch
    varchar(20))
  Similarly, collection types allow multivalued
   attributes of E-R diagrams to be represented directly.
Structured Types (Cont.)
    We can create tables without creating an intermediate type
      For example, the table books could also be defined
       as follows:
        create table books
          (title varchar(20),
           author-array varchar(20) array[10],
           pub-date date,
           publisher Publisher
          keyword-list setof(varchar(20)))
   Methods can be part of the type definition of a structured type:
       create type Employee as (
          name varchar(20),
          salary integer)
         method giveraise (percent integer)
   We create the method body separately
      create method giveraise (percent integer) for Employee
      begin
         set self.salary = self.salary + (self.salary * percent) / 100;
      end
Creation of Values of Complex Types
 Values of structured types are created using
  constructor functions
    E.g. Publisher('McGraw-Hill', 'New York')
   Note: a value is not an object

 SQL:1999 constructor functions
   E.g.
    create function Publisher (n varchar(20), b varchar(20))
    returns Publisher
    begin
     set name=n;
     set branch=b;
    end
   Every structured type has a default constructor with no arguments,
    others can be defined as required
 Values of row type can be constructed by
  listing values in parantheses
   E.g. given row type  row (name varchar(20),
                             branch varchar(20)),
    we can assign ('McGraw-Hill', 'New York')
    to an attribute of this type
Creation of Values of Complex Types (Cont.)
 Array construction
     array['Silberschatz', 'Korth', 'Sudarshan']
 Set-valued attributes (not supported in
  SQL:1999)
     set(v1, v2, …, vn)

 To create a tuple of the books relation:
     ('Compilers',
      array['Smith', 'Jones'],
      Publisher('McGraw-Hill', 'New York'),
      set('parsing', 'analysis'))
 To insert the preceding tuple into the
  relation books, we can write:
     insert into books
     values ('Compilers', array['Smith', 'Jones'],
             Publisher('McGraw-Hill', 'New York'),
             set('parsing', 'analysis'))
Inheritance
 Suppose that we have the following type
  definition for people:
     create type Person
         (name varchar(20),
           address varchar(20))
 Using inheritance to define the student and
  teacher types
      create type Student
        under Person
        (degree      varchar(20),
         department varchar(20))
      create type Teacher
        under Person
        (salary      integer,
         department varchar(20))
Multiple Inheritance
  SQL:1999 does not support multiple
   inheritance
  If our type system supports multiple
   inheritance, we can define a type for
   teaching assistant as follows:
      create type Teaching Assistant
            under Student, Teacher
  To avoid a conflict between the two
   occurrences of department we can rename
   them
             create type Teaching Assistant
               under Student with (department as student-dept),
                     Teacher with (department as teacher-dept)
Table Inheritance
 Table inheritance allows an object to
  have multiple types by allowing an
  entity to exist in more than one table
  at once.
 E.g. people table: create table people of Person
 We can then define the students and
  teachers tables as subtables of people
           create table students of Student
              under people
           create table teachers of Teacher
               under people

 Each tuple in a subtable (e.g. students
  and teachers) is implicitly present in its
  supertables (e.g. people)
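A small sketch of the effect (the only construct is SQL:1999; exact
syntax varies by system):

   -- Sees all persons, including tuples inserted via the
   -- students and teachers subtables:
   select name
   from people;

   -- Sees only persons that appear in no subtable:
   select name
   from only (people);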
Table Inheritance: Roles
 Table inheritance is useful for
  modeling roles
 permits a value to have multiple types,
  without having a
  most-specific type (unlike type
  inheritance).
   e.g., an object can be in the students and
   teachers subtables simultaneously, without
   having to be in a subtable student-
   teachers that is under both students and
   teachers
    object can gain/lose roles: corresponds to
     inserting/deleting the tuple in the
     corresponding subtable
Table Inheritance: Consistency
 Requirements
 Consistency requirements on
 subtables and supertables.
  Each tuple of the supertable (e.g. people)
  can correspond to at most one tuple in
  each of the subtables (e.g. students and
  teachers)
  Additional constraint in SQL:1999:
  All tuples corresponding to each other
  (that is, with the same values for inherited
  attributes) must be derived from one tuple
  (inserted into one table).
    That is, each entity must have a most
     specific type
Table Inheritance: Storage
Alternatives
 Storage alternatives
  1.Store only local attributes and the primary
   key of the supertable in subtable
     Inherited attributes derived by means of a
      join with the supertable
  2.Each table stores all inherited and locally
   defined attributes
     Supertables implicitly contain (inherited
      attributes of) all tuples in their subtables
     Access to all attributes of a tuple is faster:
      no join required
      If entities must have a most specific type,
       each tuple is stored only in one table,
       where it was created
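A sketch of alternative 1, with assumed table and column names: the
subtable stores only its local attributes plus the key of the
supertable, and inherited attributes are recovered by a join:

   -- people(name, address) is the supertable
   create table students_local
      (name       varchar(20) primary key,   -- key of people
       degree     varchar(20),
       department varchar(20));

   -- Reassembling a full student tuple requires a join:
   select p.name, p.address, s.degree, s.department
   from people p, students_local s
   where p.name = s.name;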
Reference Types
 Object-oriented languages provide the
  ability to create and refer to objects.
 In SQL:1999
   References are to tuples, and
   References must be scoped,
    I.e., can only point to tuples in one specified
     table
 We will study how to define references
 first, and later see how to use references
Reference Declaration in SQL:1999
 E.g. define a type Department with a
  field name and a field head which is a
  reference to the type Person, with
  table people as scope
     create type Department(
          name varchar(20),
          head ref(Person) scope people)
 We can then create a table
  departments as follows
         create table departments of
 Department
 We can omit the declaration scope people
  from the type declaration and instead make
  an addition to the create table statement:
     create table departments of Department
        (head with options scope people)
Initializing Reference Typed Values
 In Oracle, to create a tuple with a
  reference value, we can first create the
  tuple with a null reference and then
  set the reference separately by using
  the function ref(p) applied to a tuple
  variable
 E.g. to create a department with name
  CS and head being the person named
  John, we use
   insert into departments
        values ('CS', null)
   update departments
      set head = (select ref(p)
                  from people as p
                  where name = 'John')
      where name = 'CS'
Initializing Reference Typed Values (Cont.)
 SQL:1999 does not support the ref()
  function, and instead requires a
  special attribute to be declared to
  store the object identifier
 The self-referential attribute is
  declared by adding a ref is clause to
  the create table statement:
     create table people of Person
     ref is oid system generated
     Here, oid is an attribute name, not a
     keyword.
 To get the reference to a tuple, the
  subquery shown earlier would use
  select p.oid instead of select ref(p)
User Generated Identifiers
 SQL:1999 allows object identifiers to be
 user-generated
   The type of the object-identifier must be
    specified as part of the type definition of the
    referenced table, and
   The table definition must specify that the
    reference is user generated
   E.g.
        create type Person
           (name varchar(20)
            address varchar(20))
          ref using varchar(20)
        create table people of Person
         ref is oid user generated
User Generated Identifiers (Cont.)
 We can then use the identifier value
  when inserting a tuple into
  departments
     Avoids need for a separate query to
     retrieve the identifier:
      E.g. insert into departments
             values(`CS‘, `02184567‘)
   It is even possible to use an existing
    primary key value as the identifier, by
    including the ref from clause, and
    declaring the reference to be derived
    create type Person
         (name varchar(20) primary key,
          address varchar(20))
         ref from (name)
    create table people of Person
         ref is oid derived
Path Expressions
  Find the names and addresses of the
   heads of all departments:
      select head–>name, head–>address
      from departments
  An expression such as "head–>name"
   is called a path expression
  Path expressions help avoid explicit
   joins
    If department head were not a reference, a
     join of departments with people would be
     required to get at the address
    Makes expressing the query much easier
     for the user
Querying with Structured Types
  Find the title and the name of the
   publisher of each book.
       select title, publisher.name
       from books
  Note the use of the dot notation to access fields of the composite
  attribute (structured type) publisher
Collection-Value Attributes
 Collection-valued attributes can be
 treated much like relations, using the
 keyword unnest
   The books relation has array-valued
   attribute author-array and set-valued
   attribute keyword-set
 To find all books that have the word
  "database" as one of their keywords,

     select title
     from books
     where 'database' in (unnest(keyword-set))
   Note: the above syntax is valid in SQL:1999, but
    the only collection type supported by SQL:1999
    is the array type
Collection Valued Attributes
 We can access individual elements of
 (Cont.) by using indices
  an array
   E.g. If we know that a particular book has
   three authors, we could write:
    select author-array[1], author-array[2], author-array[3]
    from books
    where title = 'Database System Concepts'
Unnesting
 The transformation of a nested relation
  into a form with fewer (or no) relation-
  valued attributes is called unnesting.
 E.g.
    select title, A as author, publisher.name
  as pub_name,
           publisher.branch as pub_branch,
  K as keyword
    from books as B, unnest(B.author-
  array) as A, unnest (B.keyword-list) as K

    Nesting
    Nesting is the opposite of unnesting, creating a collection-valued
    attribute
   NOTE: SQL:1999 does not support nesting
   Nesting can be done in a manner similar to aggregation, but using
    the function set() in place of an aggregation operation, to create a
    set
   To nest the flat-books relation on the attribute keyword:
    select title, author, Publisher(pub_name, pub_branch) as
    publisher,
            set(keyword) as keyword-list
    from flat-books
    group by title, author, publisher
   To nest on both authors and keywords:
     select title, set(author) as author-list,
             Publisher(pub_name, pub_branch) as publisher,
             set(keyword) as keyword-list
    from flat-books
    group by title, publisher
Nesting (Cont.)
 Another approach to creating nested
  relations is to use subqueries in the
  select clause.
 select title,
     ( select author
       from flat-books as M
       where M.title=O.title) as author-
 set,
     Publisher(pub-name, pub-branch)
 as publisher,
     (select keyword
       from flat-books as N
Functions and Procedures
 SQL:1999 supports functions and
 procedures
  Functions/procedures can be written in
   SQL itself, or in an external programming
   language
  Functions are particularly useful with
   specialized data types such as images and
   geometric objects
    E.g. functions to check if polygons overlap,
     or to compare images for similarity
  Some databases support table-valued
   functions, which can return a relation as a
   result
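A minimal sketch of a table-valued function, in SQL:2003-style syntax
(support varies across systems; authors(title, author) is the 4NF
relation used earlier):

   create function authors-of (btitle varchar(20))
      returns table (author varchar(20))
      return table
         (select author
          from authors
          where authors.title = btitle);

   -- The result is used like a relation:
   select *
   from table (authors-of ('Compilers'));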
SQL Functions

 Define a function that, given a book
  title, returns the count of the number
  of authors (on the 4NF schema with
  relations books4 and authors).
     create function author-count(name varchar(20))
        returns integer
        begin
           declare a-count integer;
           select count(author) into a-count
           from authors
           where authors.title = name;
           return a-count;
        end
  Find the titles of all books that have more than one author:
     select title
     from books4
     where author-count(title) > 1
SQL Methods
 Methods can be viewed as functions
 associated with structured types
  They have an implicit first parameter
   called self which is set to the structured-
   type value on which the method is invoked
  The method code can refer to attributes of
   the structured-type value using the self
   variable
    E.g.   self.a
SQL Functions and Procedures (Cont.)
  The author-count function could instead be written as a procedure:
   create procedure author-count-proc (in title varchar(20),
                                         out a-count integer)
        begin
          select count(author) into a-count
           from authors
           where authors.title = title
       end
    Procedures can be invoked either from an SQL procedure or from
     embedded SQL, using the call statement.
      E.g. from an SQL procedure
         declare a-count integer;
         call author-count-proc('Database System Concepts', a-count);
    SQL:1999 allows more than one function/procedure of the same
     name (called name overloading), as long as the number of
     arguments differ, or at least the types of the arguments differ
External Language Functions/Procedures
 SQL:1999 permits the use of
  functions and procedures written in
  other languages such as C or C++
 Declaring external language
  procedures and functions

 create procedure author-count-proc(in title varchar(20),
                                    out count integer)
 language C
 external name '/usr/avi/bin/author-count-proc'
External Language Routines
 Benefits of external language
     (Cont.)
  functions/procedures:
   more efficient for many operations, and more
    expressive power
 Drawbacks
   Code to implement function may need to be loaded into
    database system and executed in the database system‘s
    address space
     risk of accidental corruption of database structures
     security risk, allowing users access to unauthorized
      data
   There are alternatives, which give good security at the
    cost of potentially worse performance
   Direct execution in the database system‘s space is used
    when efficiency is more important than security
Security with External Language Routines
 To deal with security problems
    Use sandbox techniques
     that is use a safe language like Java, which cannot be
      used to access/damage other parts of the database
      code
   Or, run external language functions/procedures in a
    separate process, with no access to the database
    process‘ memory
     Parameters and results communicated via inter-
      process communication
 Both have performance overheads
 Many database systems support both above
  approaches as well as direct executing in
  database system address space
Procedural Constructs
 SQL:1999 supports a rich variety of procedural constructs
 Compound statement
   is of the form begin … end,
   may contain multiple SQL statements between begin and end.
   Local variables can be declared within a compound statements
 While and repeat statements
     declare n integer default 0;
     while n < 10 do
         set n = n+1
     end while

     repeat
           set n = n – 1
     until n = 0
     end repeat
Procedural Constructs (Cont.)

 For loop
   Permits iteration over all results of a query
   E.g. find total of all balances at the Perryridge branch

      declare n integer default 0;
      for r as
          select balance from account
             where branch-name = ‗Perryridge‘
       do
          set n = n + r.balance
       end for

        Procedural Constructs (cont.)
    Conditional statements (if-then-else)
    E.g. To find sum of balances for each of three categories of
    accounts (with balance <1000, >=1000 and <5000, >= 5000)
         if r.balance < 1000
              then set l = l + r.balance
         elseif r.balance < 5000
              then set m = m + r.balance
         else set h = h + r.balance
         end if
   SQL:1999 also supports a case statement similar to C case
    statement
   Signaling of exception conditions, and declaring handlers for
    exceptions
         declare out_of_stock condition
         declare exit handler for out_of_stock
         begin
         …
           .. signal out_of_stock
         end
     The handler here is exit -- causes
      enclosing begin..end to be exited
Comparison of O-O and O-R Databases
 Summary of strengths of various
  database systems:
 Relational systems
   simple data types, powerful query
   languages, high protection.
 Persistent-programming-language-
 based OODBs
   complex data types, integration with
   programming language, high
   performance.
 Object-relational systems
   complex data types, powerful query
    languages, high protection.
Finding all employees of a manager
 Procedure to find all employees who work
  directly or indirectly for mgr
 Relation manager(empname, mgrname) specifies
  who directly works for whom
 Result is stored in empl(name)
    create procedure findEmp(in mgr char(10))
    begin
        create temporary table newemp(name char(10));
        create temporary table temp(name char(10));
        insert into newemp    -- store all direct employees of mgr in newemp
             select empname
             from manager
             where mgrname = mgr;
Finding all employees of a manager (cont.)
      repeat
        insert into empl        -- add all newly found employees to empl
           select name
           from newemp;
        insert into temp        -- find all employees of people already found
           (select manager.empname
            from newemp, manager
            where newemp.name = manager.mgrname)
           except (             -- but remove those who were found earlier
              select name
              from empl);
        delete from newemp;     -- replace contents of newemp by temp
        insert into newemp
           select * from temp;
        delete from temp;
      until not exists (select * from newemp)
      end repeat;
    end
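A hypothetical invocation sketch (the empl result table must already
exist; the manager name is made up):

   create table empl (name char(10));
   call findEmp('Jones');
   select * from empl;   -- all direct and indirect employees of Jones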
A Partially Nested Version of the flat-books Relation
Introduction
 XML: Extensible Markup Language
 Defined by the WWW Consortium (W3C)
 Originally intended as a document
 markup language not a database
 language
   Documents have tags giving extra
   information about sections of the document
    E.g. <title> XML </title> <slide>
     Introduction …</slide>
   Derived from SGML (Standard Generalized
   Markup Language), but simpler to use than
   SGML
XML Introduction (Cont.)
 The ability to specify new tags, and to
 create nested tag structures made XML a
 great way to exchange data, not just
 documents.
   Much of the use of XML has been in data
   exchange applications, not as a replacement
   for HTML
 Tags make data (relatively) self-
 documenting
    E.g.
        <bank>
            <account>
               <account-number> A-101 </account-number>
               <branch-name>    Downtown </branch-name>
               <balance>        500 </balance>
            </account>
            …
        </bank>
XML: Motivation
 Data interchange is critical in today‘s
 networked world
   Examples:
    Banking: funds transfer
    Order processing (especially inter-company
     orders)
    Scientific data
      Chemistry: ChemML, …
      Genetics: BSML (Bio-Sequence Markup
       Language), …
   Paper flow of information between
   organizations is being replaced by
   electronic flow of information
XML Motivation (Cont.)
 Earlier generation formats were based on
  plain text with line headers indicating
  the meaning of fields
  Similar in concept to email headers
   Does not allow for nested structures, no
    standard "type" language
  Tied too closely to low level document
   structure (lines, spaces, etc)
 Each XML based standard defines what
 are valid elements, using
  XML type specification languages to specify
   the syntax
    DTD (Document Type Descriptors)
Structure of XML Data

 Tag: label for a section of data
 Element: section of data beginning
  with <tagname> and ending with
  matching </tagname>
 Elements must be properly nested
   Proper nesting
    <account> … <balance> …. </balance>
     </account>
   Improper nesting
    <account> … <balance> …. </account>
     </balance>
Example of Nested Elements
<bank-1>
  <customer>
   <customer-name> Hayes </customer-name>
  <customer-street> Main </customer-street>
  <customer-city>   Harrison </customer-city>
  <account>
   <account-number> A-102 </account-number>
   <branch-name>     Perryridge </branch-name>
   <balance>         400 </balance>
  </account>
  <account>
     …
  </account>
  </customer>
   .
   .
 </bank-1>
Motivation for Nesting
 Nesting of data is useful in data
 transfer
   Example: elements representing
   customer-id, customer name, and address
   nested within an order element
 Nesting is not supported, or
  discouraged, in relational databases
   With multiple orders, customer name and
    address are stored redundantly
   normalization replaces nested structures in
    each order by foreign key into table storing
    customer name and address information
    Nesting is supported in object-relational
     databases
Structure of XML Data (Cont.)
 Mixture of text with sub-elements is
 legal in XML.
   Example:
     <account>
        This account is seldom used any more.
         <account-number> A-
    102</account-number>
         <branch-name> Perryridge</branch-
    name>
         <balance>400 </balance>
    </account>
   Useful for document markup, but
    discouraged for data representation
Attributes
  Elements can have attributes
            <account acct-type = "checking" >
              <account-number> A-102
       </account-number>
              <branch-name> Perryridge
       </branch-name>
              <balance> 400 </balance>
              </account>

  Attributes are specified by
   name=value pairs inside the starting
   tag of an element
   An element may have several
    attributes, but each attribute name can
    only occur once
Attributes Vs. Subelements

 Distinction between subelement and
  attribute
  In the context of documents, attributes
   are part of markup, while subelement
   contents are part of the basic document
   contents
  In the context of data representation, the
   difference is unclear and may be
   confusing
    Same information can be represented in two
     ways
      <account account-number = "A-101"> ….
More on XML Syntax
 Elements without subelements or text
 content can be abbreviated by ending
 the start tag with a /> and deleting
 the end tag
    <account number="A-101"
     branch="Perryridge" balance="200" />
 To store string data that may contain
 tags, without the tags being interpreted
 as subelements, use CDATA as below
   <![CDATA[<account> … </account>]]>
     Here, <account> and </account> are treated as just
      strings
Namespaces
 XML data has to be exchanged between
    organizations
   Same tag name may have different
    meaning in different organizations,
    causing confusion on exchanged
    documents
   Specifying a unique string as an
    element name avoids confusion
   Better solution: use unique-
    name:element-name
    Avoid using long unique names all over
     the document by using XML Namespaces,
     which are declared in the root element
XML Document Schema

 Database schemas constrain what
  information can be stored, and the
  data types of stored values
 XML documents are not required to
  have an associated schema
 However, schemas are very important
  for XML data exchange
   Otherwise, a site cannot automatically
   interpret data received from another site
 Two mechanisms for specifying XML
  schema:
    Document Type Definition (DTD)
    XML Schema
Document Type Definition
(DTD)
 The type of an XML document can be
  specified using a DTD
 DTD constraints structure of XML data
   What elements can occur
   What attributes can/must an element have
   What subelements can/must occur inside
   each element, and how many times.
 DTD does not constrain data types
   All values represented as strings in XML
 DTD syntax
    <!ELEMENT element (subelements-specification)>
    <!ATTLIST element (attributes)>
Element Specification in DTD

 Subelements can be specified as
   names of elements, or
   #PCDATA (parsed character data), i.e.,
    character strings
   EMPTY (no subelements) or ANY (anything
    can be a subelement)
 Example
   <! ELEMENT depositor (customer-name account-
   number)>
   <! ELEMENT customer-name (#PCDATA)>
   <! ELEMENT account-number (#PCDATA)>
 Subelement specification may have
  regular expressions
    <!ELEMENT bank ( ( account | customer | depositor)+)>
      Notation:
        "|" – alternatives
        "+" – 1 or more occurrences
        "*" – 0 or more occurrences
Bank DTD

<!DOCTYPE bank [
 <!ELEMENT bank ( ( account | customer | depositor)+)>
 <!ELEMENT account (account-number branch-name
 balance)>
 <! ELEMENT customer(customer-name customer-street
                                                 customer-
 city)>
 <! ELEMENT depositor (customer-name account-number)>
 <! ELEMENT account-number (#PCDATA)>
 <! ELEMENT branch-name (#PCDATA)>
 <! ELEMENT balance(#PCDATA)>
 <! ELEMENT customer-name(#PCDATA)>
 <! ELEMENT customer-street(#PCDATA)>
 <! ELEMENT customer-city(#PCDATA)>
]>
Attribute Specification in DTD
 Attribute specification : for each attribute
   Name
   Type of attribute
     CDATA
     ID (identifier) or IDREF (ID reference) or IDREFS
      (multiple IDREFs)
         more on this later
   Whether
     mandatory (#REQUIRED)
     has a default value (value),
     or neither (#IMPLIED)
 Examples
    <!ATTLIST account acct-type CDATA
     "checking">
IDs and IDREFs

 An element can have at most one
  attribute of type ID
 The ID attribute value of each element
 in an XML document must be distinct
   Thus the ID attribute value is an object
   identifier
 An attribute of type IDREF must
  contain the ID value of an element in
  the same document
 An attribute of type IDREFS contains a set of (zero or more) ID values; each must match the ID value of some element in the document
Bank DTD with Attributes

 Bank DTD with ID and IDREF attribute
  types.
      <!DOCTYPE bank-2 [
       <!ELEMENT account (branch, balance)>
       <!ATTLIST account
             account-number ID     #REQUIRED
             owners         IDREFS #REQUIRED>
       <!ELEMENT customer (customer-name, customer-street, customer-city)>
       <!ATTLIST customer
             customer-id ID     #REQUIRED
             accounts    IDREFS #REQUIRED>
       … declarations for branch, balance, customer-name,
         customer-street and customer-city
      ]>
XML data with ID and IDREF
attributes
  <bank-2>
    <account account-number="A-401" owners="C100 C102">
       <branch-name> Downtown </branch-name>
       <balance> 500 </balance>
    </account>
    <customer customer-id="C100" accounts="A-401">
       <customer-name> Joe </customer-name>
       <customer-street> Monroe </customer-street>
       <customer-city> Madison </customer-city>
    </customer>
    <customer customer-id="C102" accounts="A-401 A-402">
       <customer-name> Mary </customer-name>
       <customer-street> Erin </customer-street>
       <customer-city> Newark </customer-city>
    </customer>
  </bank-2>
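To a parser, ID and IDREFS values are just strings; an application must resolve the references itself. A minimal sketch with the standard-library ElementTree, abbreviating the data above:

# Sketch: resolving IDREFS by hand with the standard library.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch-name>Downtown</branch-name><balance>500</balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401 A-402">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>""")

# Index every customer by its ID attribute (IDs are unique per document).
by_id = {c.get("customer-id"): c for c in doc.iter("customer")}

# Dereference the IDREFS attribute: a whitespace-separated list of IDs.
for acct in doc.iter("account"):
    owners = [by_id[ref] for ref in acct.get("owners").split()]
    print(acct.get("account-number"), "owned by",
          [o.findtext("customer-name") for o in owners])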
Limitations of DTDs

 No typing of text elements and
  attributes
   All values are strings, no integers, reals,
   etc.
 Difficult to specify unordered sets of
 subelements
   Order is usually irrelevant in databases
   (A | B)* allows specification of an
   unordered set, but
     Cannot ensure that each of A and B occurs
      only once
XML Schema

 XML Schema is a more sophisticated
  schema language which addresses the
  drawbacks of DTDs. Supports
  Typing of values
    E.g. integer, string, etc
    Also, constraints on min/max values
  User defined types
  Is itself specified in XML syntax, unlike
   DTDs
    More standard representation, but verbose
  Is integrated with namespaces
XML Schema Version of Bank DTD
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="bank" type="BankType"/>
<xsd:element name="account">
   <xsd:complexType>
       <xsd:sequence>
            <xsd:element name="account-number" type="xsd:string"/>
            <xsd:element name="branch-name"    type="xsd:string"/>
            <xsd:element name="balance"        type="xsd:decimal"/>
       </xsd:sequence>
   </xsd:complexType>
</xsd:element>
….. definitions of customer and depositor ….
<xsd:complexType name="BankType">
   <xsd:sequence>
     <xsd:element ref="account"   minOccurs="0" maxOccurs="unbounded"/>
     <xsd:element ref="customer"  minOccurs="0" maxOccurs="unbounded"/>
     <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
</xsd:complexType>
</xsd:schema>
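The schema can drive mechanical validation just as a DTD can. A minimal sketch, assuming the third-party lxml library; only the account element declaration is reproduced:

# Sketch: validating a document against an XML Schema, assuming lxml.
from lxml import etree

schema = etree.XMLSchema(etree.fromstring(b"""
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name"    type="xsd:string"/>
        <xsd:element name="balance"        type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""))

good = etree.fromstring(b"""
<account>
  <account-number>A-101</account-number>
  <branch-name>Downtown</branch-name>
  <balance>500.00</balance>
</account>""")

bad = etree.fromstring(b"<account><balance>lots</balance></account>")

print(schema.validate(good))  # True
print(schema.validate(bad))   # False: wrong structure, non-decimal balance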
Querying and Transforming
XML Data
 Translation of information from one
  XML schema to another
 Querying on XML data
 Above two are closely related, and
  handled by the same tools
 Standard XML querying/translation
  languages
   XPath
     Simple language consisting of path
      expressions
   XSLT
Tree Model of XML Data
 Query and transformation languages are
  based on a tree model of XML data
 An XML document is modeled as a tree,
  with nodes corresponding to elements
  and attributes
  Element nodes have children nodes, which
   can be attributes or subelements
  Text in an element is modeled as a text node
   child of the element
  Children of a node are ordered according to
   their order in the XML document
  Element and attribute nodes (except for the
   root node) have a single parent, which is an
   element node
XPath

 XPath is used to address (select) parts
  of documents using
   path expressions
 A path expression is a sequence of steps separated by "/"
   Think of file names in a directory hierarchy
 Result of path expression: set of values that, along with their containing elements/attributes, match the specified path
XPath (Cont.)
 The initial "/" denotes root of the document (above the top-level tag)
 Path expressions are evaluated left to
  right
   Each step operates on the set of instances
   produced by the previous step
 Selection predicates may follow any step
 in a path, in [ ]
    E.g.   /bank-2/account[balance > 400]
      returns account elements with a balance value greater than 400
      /bank-2/account[balance] returns account elements that contain a balance subelement
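Both predicates can be evaluated directly with any XPath 1.0 processor. A minimal sketch, assuming the third-party lxml library; the two-account document is made up:

# Sketch: evaluating the XPath examples above, assuming the lxml library.
from lxml import etree

doc = etree.fromstring(b"""
<bank-2>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
</bank-2>""")

# Selection predicate: the balance content is compared as a number.
for acct in doc.xpath("/bank-2/account[balance > 400]"):
    print(acct.findtext("account-number"))          # A-101

# Existence predicate: accounts that have a balance subelement at all.
print(len(doc.xpath("/bank-2/account[balance]")))   # 2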
Functions in XPath

 XPath provides several functions
   The function count() at the end of a path
   counts the number of elements in the set
   generated by the path
      E.g. /bank-2/account[count(customer) > 2]
       Returns accounts with > 2 customers
   Also function for testing position (1, 2, ..)
   of node w.r.t. siblings
 The Boolean connectives and and or, and the function not( ), can be used in predicates
More XPath Features
 Operator "|" used to implement union
   E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)
     gives customers with either accounts or loans
     However, "|" cannot be nested inside other operators.
 "//" can be used to skip multiple levels of nodes
   E.g. /bank-2//customer-name
     finds any customer-name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
 A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to their children
XSLT

 A stylesheet stores formatting options
  for a document, usually separately
  from document
   E.g. HTML style sheet may specify font
   colors and sizes for headings, etc.
 The XML Stylesheet Language (XSL)
  was originally designed for generating
  HTML from XML
 XSLT is a general-purpose
  transformation language
XSLT Templates
 Example of XSLT template with                match
 and select part
    <xsl:template match="/bank-2/customer">
      <xsl:value-of select="customer-name"/>
    </xsl:template>
    <xsl:template match="*"/>
 The match attribute of xsl:template
  specifies a pattern in XPath
 Elements in the XML document matching
  the pattern are processed by the actions
  within the xsl:template element
   xsl:value-of selects (outputs) specified values
   (here, customer-name)
 For elements that do not match any template explicitly, the empty match="*" template applies and outputs nothing
XSLT Templates (Cont.)

 If an element matches several
  templates, only one is used
   Which one depends on a complex priority
    scheme/user-defined priorities
   We assume only one template matches
    any element
Creating XML Output

 Any text or tag in the XSL stylesheet
  that is not in the xsl namespace is
  output as is
 E.g. to wrap results in new XML
 elements.
     <xsl:template match="/bank-2/customer">
        <customer>
           <xsl:value-of select="customer-name"/>
        </customer>
     </xsl:template>
     <xsl:template match="*"/>
   Example output:
      <customer> Joe </customer>
Creating XML Output (Cont.)
 Note: Cannot directly insert a xsl:value-
 of tag inside another tag
   E.g. cannot create an attribute for
    <customer> in the previous example by
    directly using xsl:value-of
   XSLT provides a construct xsl:attribute to
    handle this situation
     xsl:attribute adds attribute to the preceding
      element
      E.g. <customer>
               <xsl:attribute name="customer-id">
                  <xsl:value-of select="customer-id"/>
               </xsl:attribute>
            </customer>
       results in output of the form
           <customer customer-id="C100"> … </customer>
Structural Recursion
 Action of a template can be to recursively apply templates to the
  contents of a matched element
 E.g.
     <xsl:template match="/bank">
        <customers>
            <xsl:apply-templates/>
        </customers>
     </xsl:template>
     <xsl:template match="customer">
        <customer>
            <xsl:value-of select="customer-name"/>
        </customer>
     </xsl:template>
     <xsl:template match="*"/>
 Example output:
     <customers>
       <customer> John </customer>
       <customer> Mary </customer>
     </customers>
Joins in XSLT
 XSLT keys allow elements to be looked up (indexed) by values of subelements or attributes
   Keys must be declared (with a name), and the key() function can then be used for lookup. E.g.
       <xsl:key name="acctno" match="account" use="account-number"/>
       <xsl:value-of select="key('acctno', 'A-101')"/>
 Keys permit (some) joins to be expressed in XSLT
  <xsl:key name="acctno" match="account" use="account-number"/>
  <xsl:key name="custno" match="customer" use="customer-name"/>
  <xsl:template match="depositor">
     <cust-acct>
        <xsl:value-of select="key('custno', customer-name)"/>
        <xsl:value-of select="key('acctno', account-number)"/>
     </cust-acct>
  </xsl:template>
  <xsl:template match="*"/>
Sorting in XSLT
 Using an xsl:sort directive inside a
 template causes all elements matching
 the template to be sorted
   Sorting is done before applying other
   templates
 E.g.
  <xsl:template match="/bank">
     <xsl:apply-templates select="customer">
        <xsl:sort select="customer-name"/>
     </xsl:apply-templates>
  </xsl:template>
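Stylesheets like the ones above can be exercised with any XSLT 1.0 processor. A minimal sketch, assuming the third-party lxml library; the stylesheet wraps the results in a single <customers> root:

# Sketch: applying an XSLT stylesheet, assuming the lxml library.
from lxml import etree

stylesheet = etree.fromstring(b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <customers><xsl:apply-templates select="/bank-2/customer"/></customers>
  </xsl:template>
  <xsl:template match="customer">
    <customer><xsl:value-of select="customer-name"/></customer>
  </xsl:template>
</xsl:stylesheet>""")

doc = etree.fromstring(b"""
<bank-2>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
</bank-2>""")

transform = etree.XSLT(stylesheet)
print(str(transform(doc)))
# output: <customers><customer>Joe</customer><customer>Mary</customer></customers>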
XQuery

   XQuery is a general purpose query language for XML data
   Currently being standardized by the World Wide Web
    Consortium (W3C)
     The textbook description is based on a
      March 2001 draft of the standard. The
      final version may differ, but major
      features likely to stay unchanged.
   Alpha version of XQuery engine available free from Microsoft
   XQuery is derived from the Quilt query language, which itself
    borrows from SQL, XQL and XML-QL
 XQuery uses a
      for … let … where … result …
  syntax
     for    ↔ SQL from
     where  ↔ SQL where
     result ↔ SQL select
     let allows temporary variables, and has no equivalent in SQL
FLWR Syntax in XQuery
 For clause uses XPath expressions, and
  variable in for clause ranges over values
  in the set returned by XPath
 Simple FLWR expression in XQuery
   find all accounts with balance > 400, with each result
    enclosed in an <account-number> .. </account-number>
    tag
        for    $x in /bank-2/account
        let    $acctno := $x/@account-number
        where  $x/balance > 400
        return <account-number> $acctno </account-number>
 The let clause is not really needed in this query; the selection can be done in the XPath expression:
        for $x in /bank-2/account[balance > 400]
        return <account-number> $x/@account-number </account-number>
Path Expressions and Functions

 Path expressions are used to bind
  variables in the for clause, but can
  also be used in other places
   E.g. path expressions can be used in let
   clause, to bind variables to results of path
   expressions
 The function distinct( ) can be used to remove duplicates in path expression results
 The function document(name) returns
 root of named document
Joins
 Joins are specified in a manner very similar to SQL
   for $a in /bank/account,
       $c in /bank/customer,
       $d in /bank/depositor
   where $a/account-number = $d/account-number
     and $c/customer-name = $d/customer-name
   return <cust-acct> $c $a </cust-acct>
 The same query can be expressed with the selections specified as XPath selections:
   for $a in /bank/account,
       $c in /bank/customer,
       $d in /bank/depositor[account-number = $a/account-number and
                             customer-name = $c/customer-name]
   return <cust-acct> $c $a </cust-acct>
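For comparison, the same three-way join can be written out procedurally over a parsed document: index the join attributes, then probe, much as key() does in XSLT. A minimal sketch with the standard-library ElementTree; the data is made up:

# Sketch: the depositor-customer-account join, written procedurally.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <customer><customer-name>Joe</customer-name></customer>
  <depositor><customer-name>Joe</customer-name>
             <account-number>A-101</account-number></depositor>
</bank>""")

# Build hash indices on the join attributes.
accounts  = {a.findtext("account-number"): a for a in doc.iter("account")}
customers = {c.findtext("customer-name"): c for c in doc.iter("customer")}

# Probe the indices once per depositor tuple.
for d in doc.iter("depositor"):
    c = customers[d.findtext("customer-name")]
    a = accounts[d.findtext("account-number")]
    print("cust-acct:", c.findtext("customer-name"),
          a.findtext("account-number"), a.findtext("balance"))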
Changing Nesting Structure
 The following query converts data from the
 flat structure for bank information into the
 nested structure used in bank-1
    <bank-1>
   for $c in /bank/customer
   return
    <customer>
      $c/*
      for $d in /bank/depositor[customer-name = $c/customer-
     name],
           $a in /bank/account[account-number=$d/account-
     number]
      return $a
     </customer>
   </bank-1>
 $c/* denotes all the children of the node to
  which $c is bound, without the enclosing
  top-level tag
XQuery Path Expressions

 $c/text() gives text content of an
  element without any
  subelements/tags
 XQuery path expressions support the "->" operator for dereferencing IDREFs
   Equivalent to the id( ) function of XPath, but simpler to use
   Can be applied to a set of IDREFs to get a set of results
   June 2001 version of standard has changed "->" to "=>"
Sorting in XQuery
 Sortby clause can be used at the end of
  any expression. E.g. to return customers
  sorted by name
     for $c in /bank/customer
     return <customer> $c/* </customer>
  sortby(name)
 Can sort at multiple levels of nesting (sort
  by customer-name, and by account-
  number within each customer)
      <bank-1>
     for $c in /bank/customer
     return
       <customer>
          $c/*
          for $d in /bank/depositor[customer-name = $c/customer-name],
              $a in /bank/account[account-number = $d/account-number]
          return <account> $a/* </account> sortby(account-number)
       </customer> sortby(customer-name)
     </bank-1>
Functions and Other XQuery Features
 User-defined functions with the type system of XMLSchema
    function balances(xsd:string $c) returns list(xsd:numeric) {
        for $d in /bank/depositor[customer-name = $c],
            $a in /bank/account[account-number = $d/account-number]
        return $a/balance
    }
 Types are optional for function parameters and results
Application Program Interface
 There are two standard application
 program interfaces to XML data:
   SAX (Simple API for XML)
     Based on parser model, user provides event handlers
      for parsing events
       E.g. start of element, end of element
       Not suitable for database applications
   DOM (Document Object Model)
     XML data is parsed into a tree representation
     Variety of functions provided for traversing the DOM
      tree
     E.g.: Java DOM API provides Node class with methods
             getParentNode( ), getFirstChild( ),
      getNextSibling( )
             getAttribute( ), getData( ) (for text node)
             getElementsByTagName( ), …
     Also provides functions for updating DOM tree
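Python's standard library ships bindings for both interfaces, which makes the contrast easy to see. A minimal sketch; element names are made up:

# Sketch: SAX (event handlers) versus DOM (tree traversal).
import xml.sax
from xml.dom import minidom

data = b"<bank><account><balance>500</balance></account></bank>"

# SAX: the parser drives; we supply handlers for parsing events.
class BalanceHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        self.in_balance = (name == "balance")
    def characters(self, text):
        if getattr(self, "in_balance", False) and text.strip():
            print("SAX saw balance:", text.strip())

xml.sax.parseString(data, BalanceHandler())

# DOM: the whole document is parsed into a tree we can traverse and update.
dom = minidom.parseString(data)
for bal in dom.getElementsByTagName("balance"):
    print("DOM saw balance:", bal.firstChild.data)   # text node child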
Storage of XML Data
 XML data can be stored in
   Non-relational data stores
     Flat files
       Natural for storing XML
       But has all problems discussed in Chapter 1 (no
        concurrency, no recovery, …)
     XML database
       Database built specifically for storing XML data,
        supporting DOM model and declarative
        querying
       Currently no commercial-grade systems
   Relational databases
     Data must be translated into relational form
     Advantage: mature database systems
Storage of XML in Relational
Databases
 Alternatives:
   String Representation
   Tree Representation
   Map to relations
String Representation
 Store each top level element as a
 string field of a tuple in a relational
 database
   Use a single relation to store all elements,
    or
   Use a separate relation for each top-level
    element type
     E.g. account, customer, depositor relations
      Each with a string-valued attribute to store the
       element
 Indexing:
   Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
String Representation (Cont.)

 Benefits:
   Can store any XML data even without DTD
   As long as there are many top-level
   elements in a document, strings are small
   compared to full document
     Allows fast access to individual elements.
 Drawback: Need to parse strings to
 access values inside the elements
   Parsing is slow.
Tree Representation
 Tree representation: model XML data as tree and store using relations
       nodes(id, type, label, value)
       child(child-id, parent-id)
   [Figure: example tree for bank data: bank (id:1) has children customer (id:2) and account (id:5); customer-name (id:3) is a child of customer, account-number (id:7) a child of account]
 Each element/attribute is given a unique identifier
 Type indicates whether the node is an element or an attribute; label specifies the element tag or attribute name; value holds the node's text content
 The child relation records the parent of each element/attribute
Tree Representation (Cont.)

 Benefit: Can store any XML data, even
  without DTD
 Drawbacks:
   Data is broken up into too many pieces,
    increasing space overheads
   Even simple queries require a large
    number of joins, which can be slow
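The nodes/child representation is easy to prototype, and doing so makes the join overhead visible: even a one-step path query already needs joins. A minimal sketch with the standard-library ElementTree and sqlite3; schema and data are made up:

# Sketch: shredding XML into nodes/child relations, then querying with SQL.
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes(id INTEGER, type TEXT, label TEXT, value TEXT)")
db.execute("CREATE TABLE child(child_id INTEGER, parent_id INTEGER)")

counter = 0
def shred(elem, parent_id=None):
    """Give each element/attribute a unique id and record its parent."""
    global counter
    counter += 1
    my_id = counter
    db.execute("INSERT INTO nodes VALUES (?,?,?,?)",
               (my_id, "element", elem.tag, (elem.text or "").strip() or None))
    if parent_id is not None:
        db.execute("INSERT INTO child VALUES (?,?)", (my_id, parent_id))
    for name, val in elem.attrib.items():        # attributes become nodes too
        counter += 1
        db.execute("INSERT INTO nodes VALUES (?,?,?,?)",
                   (counter, "attribute", name, val))
        db.execute("INSERT INTO child VALUES (?,?)", (counter, my_id))
    for sub in elem:
        shred(sub, my_id)

shred(ET.fromstring("<bank><customer customer-id='C100'>"
                    "<customer-name>Joe</customer-name></customer></bank>"))

# The path customer/customer-name needs one join per step:
for (name,) in db.execute("""SELECT v.value FROM nodes v
                             JOIN child c ON v.id = c.child_id
                             JOIN nodes p ON p.id = c.parent_id
                             WHERE p.label = 'customer'
                               AND v.label = 'customer-name'"""):
    print(name)   # Joe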
Mapping XML Data to Relations
 Map to relations
   If DTD of document is known, can map
    data to relations
   A relation is created for each element type
     Elements (of type #PCDATA), and attributes
      are mapped to attributes of relations
     More details on next slide …
 Benefits:
   Efficient storage
   Can translate XML queries into SQL,
   execute efficiently, and then translate SQL
   results back to XML
 Drawbacks: need to know the DTD, and translation of queries is more complicated
Mapping XML Data to Relations (Cont.)
   Relation created for each element type contains
     An id attribute to store a unique id for each
      element
     A relation attribute corresponding to each
      element attribute
     A parent-id attribute to keep track of
      parent element
       As in the tree representation
       Position information (ith child) can be stored too
 All subelements that occur only once
    can become relation attributes
Mapping XML Data to Relations (Cont.)
 E.g. for the bank-1 DTD, with account elements nested within customer elements, create relations
   customer(id, parent-id, customer-name, customer-street, customer-city)
     parent-id can be dropped here since parent is the
      sole root element
     All other attributes were subelements of type
      #PCDATA, and occur only once
   account (id, parent-id, account-number, branch-
   name, balance)
     parent-id keeps track of which customer an
      account occurs under
     Same account may be represented many times with
      different parents
Chapter 11: Storage and File Structure
  Overview of Physical Storage Media
  Magnetic Disks
  RAID
  Tertiary Storage
  Storage Access
  File Organization
  Organization of Records in Files
  Data-Dictionary Storage
  Storage Structures for Object-Oriented Databases
Classification of Physical Storage Media
   Speed with which data can be accessed
   Cost per unit of data
   Reliability
     data loss on power failure or system
      crash
     physical failure of the storage device
   Can differentiate storage into:
     volatile storage: loses contents when
      power is switched off
     non-volatile storage:
       Contents persist even when power is switched off.
       Includes secondary and tertiary storage, as well as battery-backed-up main memory.
Physical Storage Media

 Cache – fastest and most costly form
  of storage; volatile; managed by the
  computer system hardware.
 Main memory:
   fast access (10s to 100s of nanoseconds;
    1 nanosecond = 10–9 seconds)
   generally too small (or too expensive) to
    store the entire database
    capacities of up to a few Gigabytes widely used currently
    Capacities have gone up and per-byte costs have decreased steadily and rapidly
Physical Storage Media (Cont.)

 Flash memory
  Data survives power failure
  Data can be written at a location only
   once, but location can be erased and
   written to again
    Can support only a limited number of
     write/erase cycles.
    Erasing of memory has to be done to an
     entire bank of memory
  Reads are roughly as fast as main memory
   But writes are slow (a few microseconds), and erase is much slower
Physical Storage Media (Cont.)
 Magnetic-disk
  Data is stored on spinning disk, and
   read/written magnetically
  Primary medium for the long-term storage
   of data; typically stores entire database.
  Data must be moved from disk to main
   memory for access, and written back for
   storage
    Much slower access than main memory (more
     on this later)
  direct-access – possible to read data on
   disk in any order, unlike magnetic tape
  Hard disks vs floppy disks
  Capacities range up to roughly 100 GB
   currently
Physical Storage Media (Cont.)
 Optical storage
   non-volatile, data is read optically from a
      spinning disk using a laser
     CD-ROM (640 MB) and DVD (4.7 to 17 GB)
      most popular forms
     Write-once, read-many (WORM) optical disks used for archival storage (CD-R and DVD-R)
     Multiple write versions also available (CD-
      RW, DVD-RW, and DVD-RAM)
     Reads and writes are slower than with
      magnetic disk
     Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading of disks

 Tape storage
  non-volatile, used primarily for backup (to
   recover from disk failure), and for archival
   data
  sequential-access – much slower than
   disk
  very high capacity (40 to 300 GB tapes
   available)
  tape can be removed from drive
   storage costs much cheaper than disk, but
   drives are expensive
Storage Hierarchy
Storage Hierarchy (Cont.)

 primary storage: Fastest media but
  volatile (cache, main memory).
 secondary storage: next level in
 hierarchy, non-volatile, moderately
 fast access time
   also called on-line storage
   E.g. flash memory, magnetic disks
 tertiary storage: lowest level in hierarchy, non-volatile, slow access time
   also called off-line storage
   E.g. magnetic tape, optical storage
Magnetic Hard Disk Mechanism
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
Magnetic Disks
 Read-write head
     Positioned very close to the platter surface
      (almost touching it)
     Reads or writes magnetically encoded
      information.
   Surface of platter divided into circular tracks
     Over 16,000 tracks per platter on typical
       hard disks
   Each track is divided into sectors.
     A sector is the smallest unit of data that
      can be read or written.
     Sector size typically 512 bytes
     Typical sectors per track: 200 (on inner
      tracks) to 400 (on outer tracks)
 To read/write a sector: the disk arm swings to position the head on the right track; the platter spins continually, and data is read/written as the sector passes under the head
Magnetic Disks (Cont.)
 Earlier generation disks were
 susceptible to head-crashes
   Surface of earlier generation disks had
    metal-oxide coatings which would
    disintegrate on head crash and damage all
    data on disk
   Current generation disks are less
    susceptible to such disastrous failures,
    although individual sectors may get
    corrupted
 Disk controller – interfaces between the
  computer system and the disk drive
  hardware.
    accepts high-level commands to read or write a sector, and initiates actions such as moving the disk arm to the right track and actually reading or writing the data
Disk Subsystem
 Multiple disks connected to a computer
 system through a controller
    Controller functionality (checksum, bad-sector remapping) is often carried out by individual disks; reduces load on controller
 Disk interface standards families
   ATA (AT adaptor) range of standards
   SCSI (Small Computer System Interconnect) range of standards
   Several variants of each standard
Performance Measures of Disks
 Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
     Seek time – time it takes to reposition the
       arm over the correct track.
        Average seek time is 1/2 the worst case seek
         time.
          Would be 1/3 if all tracks had the same number
           of sectors, and we ignore the time to start and
           stop arm movement
        4 to 10 milliseconds on typical disks
     Rotational latency – time it takes for the
       sector to be accessed to appear under the
       head.
        Average latency is 1/2 of the worst case
         latency.
        4 to 11 milliseconds on typical disks (5400 to
         15000 r.p.m.)
Performance Measures (Cont.)

 Mean time to failure (MTTF) – the
  average time the disk is expected to
  run continuously without any failure.
   Typically 3 to 5 years
   Probability of failure of new disks is quite low, corresponding to a
   "theoretical MTTF" of 30,000 to 1,200,000 hours for a new disk
     E.g., an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on average one will fail every 1200 hours
Optimization of Disk-Block Access
 Block – a contiguous sequence of sectors from a single track
  data is transferred between disk and
   main memory in blocks
  sizes range from 512 bytes to several
   kilobytes
    Smaller blocks: more transfers from disk
    Larger blocks: more space wasted due to
     partially filled blocks
    Typical block sizes today range from 4 to
     16 kilobytes
 Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized
   elevator algorithm: move the disk arm in one direction, processing requests along the way, until there are no more requests in that direction, then reverse direction and repeat
Optimization of Disk Block
Access (Cont.)
 File organization – optimize block
  access time by organizing the blocks
  to correspond to how data will be
  accessed
   E.g. Store related information on the
    same or nearby cylinders.
   Files may get fragmented over time
    E.g. if data is inserted to/deleted from the
     file
    Or free blocks on disk are scattered, and
     newly created file has its blocks scattered
Optimization of Disk Block Access (Cont.)
 Nonvolatile write buffers speed up disk writes by writing blocks to a non-volatile RAM buffer immediately
   Non-volatile RAM: battery backed up RAM or flash memory
     Even if power fails, the data is safe and will be written to disk when power returns
     Controller then writes to disk whenever the
      disk has no other requests or request has
      been pending for some time
     Database operations that require data to be
      safely stored before continuing can continue
      without waiting for data to be written to disk
     Writes can be reordered to minimize disk
       arm movement
   Log disk – a disk devoted to writing a sequential log of block updates; used exactly like nonvolatile RAM, and writes are fast since no seeks are required
RAID
   RAID: Redundant Arrays of Independent Disks
     disk organization techniques that manage a
        large numbers of disks, providing a view of
        a single disk of
         high capacity and high speed by using
          multiple disks in parallel, and
         high reliability by storing data redundantly, so
          that data can be recovered even if a disk fails
   The chance that some disk out of a set of N disks will fail is much
    higher than the chance that a specific single disk will fail.
         E.g., a system with 100 disks, each with
        MTTF of 100,000 hours (approx. 11 years),
        will have a system MTTF of 1000 hours
        (approx. 41 days)
Improvement of Reliability via
Redundancy
 Redundancy – store extra information
  that can be used to rebuild
  information lost in a disk failure
 E.g., Mirroring (or shadowing)
   Duplicate every disk. Logical disk
    consists of two physical disks.
   Every write is carried out on both disks
     Reads can take place from either disk
   If one disk in a pair fails, data still
    available in the other
     Data loss would occur only if a disk fails,
      and its mirror disk also fails before the
      system is repaired
       Probability of combined event is very small
        Except for dependent failure modes such as fire
Improvement in Performance via Parallelism
   Two main goals of parallelism in a disk system:
   1. Load balance multiple small accesses to
     increase throughput
   2. Parallelize large accesses to reduce
     response time.
  Improve transfer rate by striping data
   across multiple disks.
  Bit-level striping – split the bits of
   each byte across multiple disks
    In an array of eight disks, write bit i of
     each byte to disk i.
     Each access can read data at eight times the rate of a single disk, but seek/access time is no better than for a single disk
   Block-level striping – with n disks, block i of a file goes to disk (i mod n) + 1
RAID Levels
   Schemes to provide redundancy at lower cost by using disk striping combined with parity bits
   Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics
 RAID Level 0: Block striping; non-redundant.
      Used in high-performance applications where data loss is not critical.
 RAID Level 1: Mirrored disks with block striping
      Offers best write performance.
      Popular for applications such as storing log files in a database system.
RAID Levels (Cont.)
 RAID Level 2: Memory-Style Error-
  Correcting-Codes (ECC) with bit striping.
 RAID Level 3: Bit-Interleaved Parity
   a single parity bit is enough for error
   correction, not just detection, since we know
   which disk has failed
     When writing data, corresponding parity bits
      must also be computed and written to a parity
      bit disk
     To recover data in a damaged disk, compute
      XOR of bits from other disks (including parity
      bit disk)
RAID Levels (Cont.)
 RAID Level 3 (Cont.)
   Faster data transfer than with a single
    disk, but fewer I/Os per second since
    every disk has to participate in every I/O.
   Subsumes Level 2 (provides all its benefits,
    at lower cost).
 RAID Level 4: Block-Interleaved Parity;
 uses block-level striping, and keeps a
 parity block on a separate disk for
 corresponding blocks from N other
 disks.
    When writing a data block, the corresponding block of parity bits must also be computed and written to the parity disk
RAID Levels (Cont.)

 RAID Level 4 (Cont.)
   Provides higher I/O rates for independent
   block reads than Level 3
     block read goes to a single disk, so blocks
      stored on different disks can be read in
      parallel
    Provides higher transfer rates for reads of multiple blocks than no-striping
    Before writing a block, parity data must be computed
      Can be done by using old parity block, old value of current block and new value of current block (2 block reads + 2 block writes)
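The parity arithmetic behind levels 3 through 5 is plain bytewise XOR. A minimal sketch; block contents are made up:

# Sketch: parity computation, recovery and update for RAID levels 3-5.
from functools import reduce

def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # blocks on 3 data disks
parity = xor_blocks(*data)                       # block on the parity disk

# Recover a failed disk: XOR the surviving blocks with the parity block.
recovered = xor_blocks(data[0], data[2], parity)
assert recovered == data[1]

# Update a block: new parity = old parity XOR old block XOR new block,
# i.e. 2 block reads + 2 block writes, as stated above.
new_block = b"\xaa\xbb"
new_parity = xor_blocks(parity, data[1], new_block)
assert new_parity == xor_blocks(data[0], new_block, data[2])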
RAID Levels (Cont.)
 RAID Level 5: Block-Interleaved
  Distributed Parity; partitions data
  and parity among all N + 1 disks,
  rather than storing data in N disks
  and parity in 1 disk.
   E.g., with 5 disks, parity block for nth
   set of blocks is stored on disk (n mod 5)
   + 1, with the data blocks stored on the
   other 4 disks.
RAID Levels (Cont.)

 RAID Level 5 (Cont.)
   Higher I/O rates than Level 4.
    Block writes occur in parallel if the blocks
     and their parity blocks are on different
     disks.
   Subsumes Level 4: provides same
   benefits, but avoids bottleneck of parity
   disk.
 RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against failure of multiple disks
Choice of RAID Level
   Factors in choosing RAID level
     Monetary cost
     Performance: Number of I/O operations
      per second, and bandwidth during normal
      operation
     Performance during failure
     Performance during rebuild of failed disk
        Including time taken to rebuild failed disk
   RAID 0 is used only when data safety is not important
     E.g. data can be recovered quickly from
      other sources
   Level 2 and 4 never used since they are subsumed by 3 and 5
   Level 3 is not used anymore since bit-striping forces single-block reads to access all disks, wasting disk arm movement; Level 5 provides the same benefits without this cost
Choice of RAID Level (Cont.)
 Level 1 provides much better write
 performance than level 5
   Level 5 requires at least 2 block reads and 2
    block writes to write a single block, whereas
    Level 1 only requires 2 block writes
   Level 1 preferred for high update
    environments such as log disks
 Level 1 has higher storage cost than level 5
   disk drive capacities increasing rapidly
    (50%/year) whereas disk access times have
    decreased much less (x 3 in 10 years)
   I/O requirements have increased greatly, e.g. for Web servers; when enough disks have been bought to satisfy the required I/O rate, they often have spare storage capacity, so Level 1 may have no extra monetary cost
Hardware Issues

 Software RAID: RAID implementations
  done entirely in software, with no
  special hardware support
 Hardware RAID: RAID
 implementations with special
 hardware
  Use non-volatile RAM to record writes that
   are being executed
   Beware: power failure during write can result in corrupted disk
     E.g. failure after writing one block but before writing the second block in a mirrored pair; such corrupted data must be detected when power is restored
Hardware Issues (Cont.)

 Hot swapping: replacement of disk
  while system is running, without
  power down
   Supported by some hardware RAID
    systems,
   reduces time to recovery, and improves
    availability greatly
 Many systems maintain spare disks
  which are kept online, and used as
  replacements for failed disks
 immediately on detection of failure
Optical Disks
 Compact disk-read only memory (CD-
 ROM)
   Disks can be loaded into or removed from a
    drive
   High storage capacity (640 MB per disk)
    High seek times of about 100 msec (optical read head is heavier and slower)
   Higher latency (3000 RPM) and lower data-
    transfer rates (3-6 MB/s) compared to
    magnetic disks
 Digital Video Disk (DVD)
   DVD-5 holds 4.7 GB , and DVD-9 holds 8.5
    GB
    DVD-10 and DVD-18 are double-sided formats with capacities of 9.4 GB and 17 GB
Magnetic Tapes
   Hold large volumes of data and provide high transfer rates
     Few GB for DAT (Digital Audio Tape)
      format, 10-40 GB with DLT (Digital Linear
      Tape) format, 100 GB+ with Ultrium
      format, and 330 GB with Ampex helical
      scan format
     Transfer rates from few to 10s of MB/s
   Currently the cheapest storage medium
     Tapes are cheap, but cost of drives is very
      high
   Very slow access time in comparison to magnetic disks and
    optical disks
     limited to sequential access.
     Some formats (Accelis) provide faster seek, at the cost of lower capacity
Storage Access

 A database file is partitioned into
  fixed-length storage units called
  blocks. Blocks are units of both
  storage allocation and data transfer.
 Database system seeks to minimize
 the number of block transfers
 between the disk and memory. We
 can reduce the number of disk
 accesses by keeping as many blocks
 as possible in main memory.
Buffer Manager
 Programs call on the buffer manager
 when they need a block from disk.
  1. If the block is already in the buffer, the
     requesting program is given the address
     of the block in main memory
  2. If the block is not in the buffer,
    1. the buffer manager allocates space in the
       buffer for the block, replacing (throwing
       out) some other block, if required, to make
       space for the new block.
    2. The block that is thrown out is written back
       to disk only if it was modified since the
       most recent time that it was written
       to/fetched from the disk.
Buffer-Replacement Policies
 Most operating systems replace the
  block least recently used (LRU strategy)
 Idea behind LRU – use past pattern of
  block references as a predictor of future
  references
 Queries have well-defined access
  patterns (such as sequential scans), and
  a database system can use the
  information in a user‘s query to predict
  future references
   LRU can be a bad strategy for certain access
   patterns involving repeated scans of data
Buffer-Replacement Policies (Cont.)
 Pinned block – memory block that is not allowed to be written back to disk.
 Toss-immediate strategy – frees the
  space occupied by a block as soon
  as the final tuple of that block has
  been processed
 Most recently used (MRU) strategy –
  system must pin the block currently
  being processed. After the final
  tuple of that block has been
  processed, the block is unpinned,
  and it becomes the most recently
  used block.
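The mechanics above (LRU ordering, pinning, write-back only of modified blocks) fit in a short sketch. Everything here, including the in-memory stand-in for the disk, is made up for illustration:

# Sketch of a buffer manager: LRU replacement with pinned blocks.
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, capacity):
        self.disk = disk                # dict block_id -> bytes (stand-in)
        self.capacity = capacity
        self.buffer = OrderedDict()     # block_id -> [data, pin_count, dirty]

    def pin(self, block_id):
        """Return the block, reading it from 'disk' if necessary."""
        if block_id in self.buffer:
            self.buffer.move_to_end(block_id)     # now most recently used
        else:
            if len(self.buffer) >= self.capacity:
                self._evict()
            self.buffer[block_id] = [self.disk[block_id], 0, False]
        self.buffer[block_id][1] += 1             # pinned: cannot be evicted
        return self.buffer[block_id][0]

    def unpin(self, block_id, modified=False):
        entry = self.buffer[block_id]
        entry[1] -= 1
        entry[2] = entry[2] or modified

    def _evict(self):
        # Throw out the least recently used unpinned block; write it back
        # only if it was modified since it was fetched.
        for block_id, (data, pins, dirty) in self.buffer.items():
            if pins == 0:
                if dirty:
                    self.disk[block_id] = data
                del self.buffer[block_id]
                return
        raise RuntimeError("all buffer blocks are pinned")

disk = {1: b"block-1", 2: b"block-2", 3: b"block-3"}
bm = BufferManager(disk, capacity=2)
bm.pin(1); bm.unpin(1)
bm.pin(2); bm.unpin(2, modified=True)
bm.pin(3)    # buffer full: block 1 (LRU, unpinned) is thrown out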
File Organization

 The database is stored as a collection
  of files. Each file is a sequence of
  records. A record is a sequence of
  fields.
 One approach:
   assume record size is fixed
   each file has records of one particular
    type only
   different files are used for different
    relations
  This case is easiest to implement; will consider variable-length records later.
Fixed-Length Records
    Simple approach:
     Store record i starting from byte n × (i – 1), where n is the size of each record.
     Record access is simple but records may
      cross blocks
       Modification: do not allow records to cross
        block boundaries
 Deletion of record i: alternatives:
   move records i + 1, . . ., n to i, . . . , n – 1
   move record n to i
   do not move records, but link all free records on a free list
Free Lists
 Store the address of the first deleted record in the file header.
    Use this first record to store the address of the second deleted
     record, and so on
    Can think of these stored addresses as pointers since they "point" to the location of a record.
    More space efficient representation: reuse space for normal
     attributes of free records to store pointers. (No pointers stored
     in in-use records.)
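The free-list idea is easy to make concrete. A minimal sketch; record size, sentinel value and all names are made up, and a real system would keep the head pointer in the file header block:

# Sketch: fixed-length records with deleted slots chained on a free list.
RECORD_SIZE = 16
filedata = bytearray()   # stand-in for the file contents
free_head = None         # address of first deleted record ("file header")
NIL = 2**64 - 1          # sentinel marking the end of the free list

def insert(record: bytes) -> int:
    """Reuse a free slot if one exists, else append at the end."""
    global free_head
    assert len(record) == RECORD_SIZE
    if free_head is not None:
        slot = free_head
        # A free slot's bytes hold the pointer to the next free slot.
        nxt = int.from_bytes(filedata[slot:slot + 8], "big")
        free_head = None if nxt == NIL else nxt
        filedata[slot:slot + RECORD_SIZE] = record
    else:
        slot = len(filedata)
        filedata.extend(record)
    return slot

def delete(slot: int):
    """Do not move records; link the slot onto the free list instead."""
    global free_head
    nxt = NIL if free_head is None else free_head
    filedata[slot:slot + RECORD_SIZE] = nxt.to_bytes(8, "big") + bytes(8)
    free_head = slot

a = insert(b"account-A-101..."); b = insert(b"account-A-102...")
delete(a)
c = insert(b"account-A-305...")   # reuses the slot freed by delete(a)
assert c == a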
Variable-Length Records
 Variable-length records arise in
  database systems in several ways:
   Storage of multiple record types in a
    file.
   Record types that allow variable
    lengths for one or more fields.
   Record types that allow repeating fields
    (used in some older data models).
 Byte string representation
   Attach an end-of-record (⊥) control character to the end of each record
   Difficulty with deletion
   Difficulty with growth
Variable-Length Records: Slotted Page Structure
     Slotted page header contains:
       number of record entries
       end of free space in the block
       location and size of each record
     Records can be moved around within
      a page to keep them contiguous with
      no empty space between them; entry
      in the header must be updated.
Variable-Length Records (Cont.)
  Fixed-length representation:
   reserved space
   pointers
 Reserved space – can use fixed-
  length records of a known
  maximum length; unused space in
  shorter records filled with a null or
  end-of-record symbol.
Pointer Method
 Pointer method
  A variable-length record is represented
   by a list of fixed-length records,
   chained together via pointers.
  Can be used even if the maximum
   record length is not known
Pointer Method (Cont.)
   Disadvantage to pointer structure: space is wasted in all records except the first in a chain.
   Solution is to allow two kinds of block in file:
     Anchor block – contains the first records of chains
     Overflow block – contains records other than those that are the first records of chains.
Organization of Records in Files

 Heap – a record can be placed
  anywhere in the file where there is
  space
 Sequential – store records in
  sequential order, based on the value
  of the search key of each record
 Hashing – a hash function computed
  on some attribute of each record; the
  result specifies in which block of the
  file the record should be placed
Sequential File Organization
  Suitable for applications that
   require sequential processing of
   the entire file
  The records in the file are
   ordered by a search-key
Sequential File Organization (Cont.)
   Deletion – use pointer chains
   Insertion – locate the position where the record is to be inserted
     if there is free space insert there
     if no free space, insert the record in an overflow block
     In either case, pointer chain must be updated
 Need to reorganize the file
    from time to time to restore
    sequential order
Clustering File Organization
  Simple file structure stores each
   relation in a separate file
  Can instead store several relations
   in one file using a clustering file
   organization
  E.g., clustering organization of
   customer and depositor:

   good for queries involving depositor ⨝ customer, and for queries involving one single customer and his accounts
   bad for queries involving only customer
   results in variable size records
Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata:
that is, data about data, such as
     Information about relations
       names of relations
       names and types of attributes of each
        relation
       names and definitions of views
       integrity constraints
     User and accounting information, including passwords
     Statistical and descriptive data
       number of tuples in each relation
     Physical file organization information
       How relation is stored
        (sequential/hash/…)
       Physical location of relation
Data Dictionary Storage (Cont.)
 Catalog structure: can use either
   specialized data structures designed for efficient access
   a set of relations, with existing system features used to ensure efficient access
 The latter alternative is usually preferred
 A possible catalog representation:
   Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)
   Attribute-metadata = (attribute-name, relation-name, domain-type, position, length)
   User-metadata = (user-name, encrypted-password, group)
   Index-metadata = (index-name, relation-name, index-type, index-attributes)
   View-metadata = (view-name, definition)
Mapping of Objects to Files
 Mapping objects to files is similar to
  mapping tuples to files in a relational
  system; object data can be stored
  using file structures.
 Objects in O-O databases may lack uniformity and may be very large; such objects have to be managed differently from records in a relational system.
   Set fields with a small number of elements may be implemented using data structures such as linked lists.
   Set fields with a larger number of elements may be implemented using B-trees, or by separate relations for each set field.
Mapping of Objects to Files (Cont.)
 Objects are identified by an object
 identifier (OID); the storage system
 needs a mechanism to locate an object
 given its OID (this action is called
 dereferencing).
   logical identifiers do not directly specify
    an object‘s physical location; must
    maintain an index that maps an OID to the
    object‘s actual location.
   physical identifiers encode the location of
    the object so the object can be found
    directly. Physical OIDs typically have the
    following parts:
    1. a volume or file identifier
    2. a page identifier within the volume or file
    3. an offset within the page
Management of Persistent Pointers
 Physical OIDs may have a unique identifier. This identifier is stored in the object also, and is used to detect references via dangling pointers.
Management of Persistent Pointers (Cont.)
 Implement persistent pointers using OIDs; persistent pointers are substantially longer than in-memory pointers
 Pointer swizzling cuts down on cost
  of locating persistent objects already
  in-memory.
 Software swizzling (swizzling on pointer dereference)
   When a persistent pointer is first
   dereferenced, the pointer is swizzled
   (replaced by an in-memory pointer) after
   the object is located in memory.
Hardware Swizzling
 With hardware swizzling,
  persistent pointers in objects need
  the same amount of space as in-
  memory pointers — extra storage
  external to the object is used to
  store rest of pointer information.
 Uses virtual memory translation
  mechanism to efficiently and
  transparently convert between
  persistent pointers and in-memory
  pointers.
 All persistent pointers in a page are swizzled when the page is first read in
Hardware Swizzling

 Persistent pointer is conceptually split
  into two parts: a page identifier, and
  an offset within the page.
   The page identifier in a pointer is a short
    indirect pointer: Each page has a
    translation table that provides a mapping
    from the short page identifiers to full
    database page identifiers.
   Translation table for a page is small (at
    most 1024 pointers in a 4096 byte page
    with 4 byte pointer)
Hardware Swizzling (Cont.)
 Page image before swizzling
  (page located on disk)
Hardware Swizzling (Cont.)
 When system loads a page into memory
 the persistent pointers in the page are
 swizzled as described below
  1. Persistent pointers in each object in the page
     are located using object type information
  2. For each persistent pointer (pi, oi) find its
     full page ID Pi
    1. If Pi does not already have a virtual memory
       page allocated to it, allocate a virtual memory
       page to Pi and read-protect the page
        Note: there need not be any physical space
         (whether in memory or on disk swap-space)
         allocated for the virtual memory page at this
         point. Space can be allocated later if (and when) the page is actually accessed.
Hardware Swizzling (Cont.)
 When an in-memory pointer is
  dereferenced, if the operating
  system detects the page it points to
  has not yet been allocated storage,
  or is read-protected, a
  segmentation violation occurs.
 The mmap() call in Unix is used to
  specify a function to be invoked on
  segmentation violation
 The function does the following when it is invoked
   1. Allocate storage (swap-space) for the page containing the referenced address, if storage has not been allocated earlier, and turn off read-protection
   2. Read in the page from the database, and swizzle its pointers as described earlier
Hardware Swizzling (Cont.)
          Page image after swizzling
 Page with short page identifier
 2395 was allocated address 5001.
 Observe change in pointers and
 translation table.
Hardware Swizzling (Cont.)
 After swizzling, all short page
 identifiers point to virtual memory
 addresses allocated for the
 corresponding pages
   functions accessing the objects are not even
    aware that it has persistent pointers, and do
    not need to be changed in any way!
   can reuse existing code and libraries that
    use in-memory pointers
 After this, the pointer dereference that
  triggered the swizzling can continue
 Optimizations:
   If all pages are allocated the same address as in the short page identifier, no changes need to be made to the page when it is read in; swizzling is then essentially free
Disk versus Memory Structure of
Objects
 The format in which objects are stored in memory may be different from the format in which they are stored on disk in the database. Reasons are:
   software swizzling – structure of persistent and in-memory pointers are different
   database accessible from different machines, with different data representations
   Make the physical representation of objects in the database independent of the machine and the compiler.
   Can transparently convert from disk representation to the form required on the specific machine, compiler, etc.
Large Objects

 Large objects : binary large objects
  (blobs) and character large objects
  (clobs)
   Examples include:
    text documents
    graphical data such as images and
     computer aided designs audio and video
     data
 Large objects may need to be stored
 in a contiguous sequence of bytes
 when brought into memory.
Modifying Large Objects
 If the application requires insert/delete
 of bytes from specified regions of an
 object:
   B+-tree file organization (described later in
    Chapter 12) can be modified to represent
    large objects
   Each leaf page of the tree stores between
    half and 1 page worth of data from the
    object
 Special-purpose application programs outside the database are used to manipulate large objects (e.g., text data treated as a byte string manipulated by editors and formatters)
[Figure slides: File Containing account Records; File of Figure 11.6, with Record 2 Deleted and All Records Moved; File of Figure 11.6, with Record 2 Deleted and Final Record Moved; Byte-String Representation of Variable-Length Records; Clustering File Structure; Clustering File Structure with Pointer Chains; The depositor Relation; The customer Relation]
Chapter 12: Indexing and Hashing
  Basic Concepts
  Ordered Indices
  B+-Tree Index Files
  B-Tree Index Files
  Static Hashing
  Dynamic Hashing
  Comparison of Ordered Indexing and Hashing
  Index Definition in SQL
Basic Concepts
 Indexing mechanisms used to speed
  up access to desired data.
   E.g., author catalog in library
 Search Key – attribute or set of attributes used to look up records in a file.
 An index file consists of records (called index entries) of the form
      search-key | pointer
 Index files are typically much smaller
  than the original file
Index Evaluation Metrics

 Access types supported efficiently.
  E.g.,
   records with a specified value in the
    attribute
   or records with an attribute value falling
    in a specified range of values.
 Access time
 Insertion time
 Deletion time
 Space overhead
Indexing techniques are evaluated on the basis of the metrics above.
Ordered Indices
 In an ordered index, index entries are
  stored sorted on the search key value.
  E.g., author catalog in library.
 Primary index: in a sequentially
  ordered file, the index whose search
  key specifies the sequential order of
  the file.
   Also called clustering index
   The search key of a primary index is
     usually but not necessarily the primary
     key.
 Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.
Dense Index Files
 Dense index — Index record appears
 for every search-key value in the file.
Sparse Index Files

 Sparse Index: contains index records
  for only some search-key values.
   Applicable when records are sequentially
   ordered on search-key
 To locate a record with search-key
 value K we:
   Find index record with largest search-key value ≤ K
   Search file sequentially starting at the record to which the index record points
 Less space and less maintenance overhead for insertions and deletions, but generally slower than a dense index for locating records
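The two-step lookup can be sketched directly: binary-search the index entries, then scan the pointed-to block. All data here is made up:

# Sketch: sparse-index lookup (one index entry per block).
import bisect

index_keys   = ["Brighton", "Downtown", "Perryridge"]  # first key per block
index_blocks = [0, 1, 2]
blocks = [["Brighton", "Clearview"],      # block 0 (file is key-ordered)
          ["Downtown", "Mianus"],         # block 1
          ["Perryridge", "Round Hill"]]   # block 2

def lookup(key):
    # Index record with the largest search-key value <= key ...
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None                       # key precedes every indexed value
    # ... then search sequentially from the block it points to.
    for record in blocks[index_blocks[i]]:
        if record == key:
            return record
    return None

print(lookup("Mianus"))      # found via the "Downtown" index entry
print(lookup("Allandale"))   # None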
Example of Sparse Index Files
Multilevel Index

 If primary index does not fit in
  memory, access becomes expensive.
 To reduce number of disk accesses to
  index records, treat primary index
  kept on disk as a sequential file and
  construct a sparse index on it.
   outer index – a sparse index of primary
    index
   inner index – the primary index file
 If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on; indices at all levels must be updated on insertion or deletion from the file
Multilevel Index (Cont.)
Index Update: Deletion

 If deleted record was the only record
  in the file with its particular search-
  key value, the search-key is deleted
  from the index also.
 Single-level index deletion:
   Dense indices – deletion of search-key is
    similar to file record deletion.
   Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.
Index Update: Insertion

 Single-level index insertion:
   Perform a lookup using the search-key
    value appearing in the record to be
    inserted.
   Dense indices – if the search-key value
    does not appear in the index, insert it.
   Sparse indices – if index stores an entry
    for each block of the file, no change
    needs to be made to the index unless a
    new block is created. In this case, the
    first search-key value appearing in the
    new block is inserted into the index.
Secondary Indices
  Frequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.
    Example 1: In the account database
     stored sequentially by account
     number, we may want to find all
     accounts in a particular branch
    Example 2: as above, but where we
     want to find all accounts with a
     specified balance or range of
     balances
Secondary Index on balance
field of account
Primary and Secondary Indices

 Secondary indices have to be dense.
 Indices offer substantial benefits when
  searching for records.
 When a file is modified, every index on the file must be updated; updating indices imposes overhead on database modification.
 Sequential scan using primary index is
  efficient, but a sequential scan using a
  secondary index is expensive
B+-Tree Index Files
  B+-tree indices are an alternative to indexed-sequential files.


 Disadvantage of indexed-sequential
  files: performance degrades as file
  grows, since many overflow blocks
  get created. Periodic reorganization
  of entire file is required.
 Advantage of B+-tree index files:
  automatically reorganizes itself with
  small, local, changes, in the face of
  insertions and deletions.
  Reorganization of entire file is not
  required to maintain performance.
B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:

 All paths from root to leaf are of
  the same length
 Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
 A leaf node has between ⌈(n–1)/2⌉ and n–1 values
 Special cases:
   If the root is not a leaf, it has at least 2 children.
   If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values.
B+-Tree Node Structure

 Typical node:  P1 | K1 | P2 | K2 | … | Pn–1 | Kn–1 | Pn
   Ki are the search-key values
   Pi are pointers to children (for non-leaf
   nodes) or pointers to records or buckets
   of records (for leaf nodes).
 The search-keys in a node are ordered
          K1 < K2 < K3 < . . . < Kn–1
Leaf Nodes in B+-Trees
Properties of a leaf node:

 For i = 1, 2, . . ., n–1, pointer Pi either
  points to a file record with search-key
  value Ki, or to a bucket of pointers to
  file records, each record having
  search-key value Ki. Only need bucket
  structure if search-key does not form
  a primary key.
 If Li, Lj are leaf nodes and i < j, Li's search-key values are less than Lj's search-key values
 Pn points to next leaf node in search-key order
Non-Leaf Nodes in B+-Trees
 Non leaf nodes form a multi-level
 sparse index on the leaf nodes. For a
 non-leaf node with m pointers:
   All the search-keys in the subtree to which
    P1 points are less than K1
   For 2 ≤ i ≤ n – 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki–1 and less than Ki
   All the search-keys in the subtree to which Pm points are greater than or equal to Km–1
Example of a B+-tree
        B+-tree for account file (n = 3)
Example of B+-tree
         B+-tree for account file (n = 5)

 Leaf nodes must have between 2 and 4 values (⌈(n–1)/2⌉ and n–1, with n = 5).
 Non-leaf nodes other than root must have between 3 and 5 children (⌈n/2⌉ and n, with n = 5).
 Root must have at least 2 children.
Observations about B+-trees

 Since the inter-node connections are
  done by pointers, ―logically‖ close
  blocks need not be ―physically‖ close.
 The non-leaf levels of the B+-tree
  form a hierarchy of sparse indices.
 The B+-tree contains a relatively small
  number of levels (logarithmic in the
  size of the main file), thus searches
  can be conducted efficiently.
 Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time.
Queries on B+-Trees
 Find all records with a search-key
  value of k.
  1. Start with the root node
    1. Examine the node for the smallest search-key value > k.
    2. If such a value exists, assume it is Ki. Then follow Pi to the child node
    3. Otherwise k ≥ Km–1, where there are m pointers in the node. Then follow Pm to the child node.
  2. If the node reached by following the pointer above is not a leaf node, repeat the above procedure on the node, and follow the corresponding pointer.
  3. Eventually reach a leaf node. If for some i, key Ki = k, follow pointer Pi to the desired record or bucket. Else no record with search-key value k exists.
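The procedure maps onto a few lines of code. A minimal sketch; the node layout (plain tuples) and the data are made up, not the textbook's representation:

# Sketch: B+-tree lookup following the procedure above.
# Internal node: ("internal", [K1..Km-1], [P1..Pm]); leaf: ("leaf", keys, recs).
def find(node, k):
    kind, keys, ptrs = node
    while kind == "internal":
        # Smallest search-key value > k: follow the pointer to its left.
        # If there is none, k >= K(m-1), so follow the last pointer Pm.
        i = next((j for j, key in enumerate(keys) if key > k), len(keys))
        kind, keys, ptrs = ptrs[i]
    # At a leaf: return the record if some Ki = k; else no record exists.
    for key, rec in zip(keys, ptrs):
        if key == k:
            return rec
    return None

leaf1 = ("leaf", ["Brighton", "Downtown"], ["rec-B", "rec-D"])
leaf2 = ("leaf", ["Mianus", "Perryridge"], ["rec-M", "rec-P"])
root  = ("internal", ["Mianus"], [leaf1, leaf2])

print(find(root, "Downtown"))   # rec-D
print(find(root, "Redwood"))    # None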
Queries on B+-Trees (Cont.)
  In processing a query, a path is
   traversed in the tree from the root
   to some leaf node.
  If there are K search-key values in
   the file, the path is no longer than
     log n/2 (K) .
  A node is generally the same size
   as a disk block, typically 4
   kilobytes, and n is typically
   around 100 (40 bytes per index
   entry).
  With 1 million search key values
Updates on B+-Trees: Insertion

 Find the leaf node in which the
  search-key value would appear
 If the search-key value is already
  there in the leaf node, record is added
  to file and if necessary a pointer is
  inserted into the bucket.
 If the search-key value is not there, then add the record to the main file and create a bucket if necessary. Then:
   If the leaf node has room, insert the (key-value, pointer) pair in the leaf node
   Otherwise, split the node (along with the new (key-value, pointer) entry) as discussed in the next slide.
Updates on B+-Trees: Insertion
(Cont.)
 Splitting a node:
   take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.
   let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.
 The splitting of nodes proceeds upwards till a node that is not full is found; in the worst case the root node is split, increasing the height of the tree by 1.
 Result of splitting the node containing Brighton and Downtown on inserting Clearview is shown on the next slide.
Updates on B+-Trees: Insertion
(Cont.)
     B+-Tree before and after insertion of "Clearview"
Updates on B+-Trees: Deletion
  Find the record to be deleted,
   and remove it from the main file
   and from the bucket (if present)
  Remove (search-key value,
   pointer) from the leaf node if
   there is no bucket or if the
   bucket has become empty
  If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then
    Insert all the search-key values in the two nodes into a single node (the one on the left), and delete the other node.
    Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted node, from the parent, recursively using the above procedure.
Updates on B+-Trees: Deletion

 Otherwise, if the node has too few
  entries due to the removal, and the
  entries in the node and a sibling fit
  into a single node, then
   Redistribute the pointers between the
    node and a sibling such that both have
    more than the minimum number of
    entries.
   Update the corresponding search-key
    value in the parent of the node.
 The node deletions may cascade upwards till a node which has ⌈n/2⌉ or more pointers is found. If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root.
Examples of B+-Tree Deletion
              Before and after deleting "Downtown"
The removal of the leaf node containing "Downtown" did not result in its parent having too few pointers. So the cascaded deletions stopped with the deleted leaf node's parent.
Examples of B+-Tree Deletion
 (Cont.)
      Deletion of "Perryridge" from result of previous example

 Node with "Perryridge" becomes underfull (actually empty, in this special case) and is merged with its sibling.
Example of B+-tree Deletion
    (Cont.)
      Before and after deletion of "Perryridge" from earlier example
   Parent of leaf containing Perryridge became underfull, and borrowed a pointer from its left sibling
   Search-key value in the parent's parent changes as a result
B+-Tree File Organization
 Index file degradation problem is
  solved by using B+-Tree indices.
  Data file degradation problem is
  solved by using B+-Tree File
  Organization.
 The leaf nodes in a B+-tree file
  organization store records, instead
  of pointers.
 Since records are larger than
  pointers, the maximum number of
  records that can be stored in a leaf
  node is less than the number of
  pointers in a nonleaf node.
B+-Tree File Organization
 (Cont.)




                  Example of B+-tree File Organization

Good space utilization important since records use more space than pointers.
To improve space utilization, involve more sibling nodes in redistribution during
    splits and merges
 Involving 2 siblings in redistribution (to avoid split /
 merge where possible) results in each node having at
 least ⌊2n/3⌋ entries
B-Tree Index Files
 Similar to B+-tree, but B-tree allows search-key values to appear only
  once; eliminates redundant storage of search keys.
 Search keys in nonleaf nodes appear nowhere else in the B-tree; an
  additional pointer field for each search key in a nonleaf node must be
  included.
 Generalized B-tree leaf node




  Nonleaf node – pointers Bi are
   the bucket or file record
   pointers.
B-Tree Index File Example




B-tree (above) and B+-tree (below) on
  same data
B-Tree Index Files (Cont.)
 Advantages of B-Tree indices:
   May use less tree nodes than a
    corresponding B+-Tree.
   Sometimes possible to find search-key value
    before reaching leaf node.
 Disadvantages of B-Tree indices:
   Only small fraction of all search-key values
    are found early
   Non-leaf nodes are larger, so fan-out is
    reduced. Thus, B-Trees typically have
    greater depth than corresponding B+-Tree
   Insertion and deletion more complicated
    than in B+-Trees
Static Hashing
 A bucket is a unit of storage containing
  one or more records (a bucket is typically
  a disk block).
 In a hash file organization we obtain the
  bucket of a record directly from its
  search-key value using a hash function.
 Hash function h is a function from the set
  of all search-key values K to the set of all
  bucket addresses B.
 Hash function is used to locate records
  for access, insertion as well as deletion.
Example of Hash File Organization
(Cont.)
Hash file organization of account file, using branch-name as key
(See figure in next slide.)



       There are 10 buckets,
       The binary representation of the ith
        character is assumed to be the integer i.
       The hash function returns the sum of the
        binary representations of the characters
        modulo 10
           E.g. h(Perryridge) = 5, h(Round Hill) = 3,
            h(Brighton) = 3
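The example is easy to check mechanically. A small sketch, assuming each
letter's value is its position in the alphabet (a = 1, ..., z = 26) and
that non-letter characters such as the space in "Round Hill" are ignored:

  def h(branch_name, n_buckets=10):
      # sum of the letters' values, modulo the number of buckets
      total = sum(ord(c) - ord('a') + 1
                  for c in branch_name.lower() if c.isalpha())
      return total % n_buckets

  assert h("Perryridge") == 5
  assert h("Round Hill") == 3
  assert h("Brighton") == 3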
Example of Hash File Organization
Hash file organization of account file, using branch-name as key
                 (see previous slide for details).
Hash Functions
 Worst hash function maps all search-key
  values to the same bucket; this makes
  access time proportional to the number of
  search-key values in the file.
 An ideal hash function is uniform, i.e.,
  each bucket is assigned the same number
  of search-key values from the set of all
  possible values.
 Ideal hash function is random, so each
  bucket will have the same number of
  records assigned to it irrespective of the
  actual distribution of search-key values in
  the file.
Handling of Bucket Overflows
 Bucket overflow can occur because
   Insufficient buckets
   Skew in distribution of records. This can
    occur due to two reasons:
     multiple records have same search-key value
     chosen hash function produces non-uniform
      distribution of key values
 Although the probability of bucket
  overflow can be reduced, it cannot be
  eliminated; it is handled by using overflow
  buckets.
Handling of Bucket Overflows (Cont.)
 Overflow chaining – the overflow buckets of
  a given bucket are chained together in a
  linked list.
 Above scheme is called closed hashing.
   An alternative, called open hashing, which
   does not use overflow buckets, is not suitable
   for database applications.
Hash Indices

 Hashing can be used not only for file
  organization, but also for index-structure
  creation.
 A hash index organizes the search
  keys, with their associated record
  pointers, into a hash file structure.
 Strictly speaking, hash indices are always
  secondary indices
   if the file itself is organized using hashing, a
    separate primary hash index on it using the
    same search-key is unnecessary.
Example of Hash Index
Deficiencies of Static Hashing
 In static hashing, function h maps search-
  key values to a fixed set B of bucket
  addresses.
   Databases grow with time. If initial number of
    buckets is too small, performance will degrade
    due to too many overflows.
   If file size at some point in the future is
    anticipated and number of buckets allocated
    accordingly, significant amount of space will be
    wasted initially.
   If database shrinks, again space will be
    wasted.
   One option is periodic re-organization of the
    file with a new hash function, but this is
    very expensive.
Dynamic Hashing
 Good for database that grows and shrinks
  in size
 Allows the hash function to be modified
  dynamically
 Extendable hashing – one form of dynamic
  hashing
   Hash function generates values over a large
      range — typically b-bit integers, with b = 32.
     At any time use only a prefix of the hash
      function to index into a table of bucket
      addresses.
     Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
     Bucket address table size = 2^i. Initially i = 0.
     Value of i grows and shrinks as the size of the
      database grows and shrinks.
General Extendable Hash
Structure




  In this structure, i2 = i3 = i, whereas i1 = i – 1 (see next slide for
                                   details)
Use of Extendable Hash Structure
 Each bucket j stores a value ij; all the
  entries that point to the same bucket have
  the same values on the first ij bits.
 To locate the bucket containing search-key
  Kj:
  1.Compute h(Kj) = X
  2.Use the first i high order bits of X as a
    displacement into bucket address table, and
    follow the pointer to appropriate bucket
 To insert a record with search-key value Kj
   follow same procedure as look-up and locate
    the bucket, say j.
   If there is room in the bucket j, insert the
    record in the bucket; else the bucket must be
    split (see below).
Updates in Extendable Hash Structure
 To split a bucket j when inserting record with search-key value Kj:
 If i > ij (more than one pointer to bucket j)
   allocate a new bucket z, and set ij and iz to the
    old ij + 1.
   make the second half of the bucket address
    table entries pointing to j to point to z
   remove and reinsert each record in bucket j.
   recompute new bucket for Kj and insert record
    in the bucket (further splitting is required if the
    bucket is still full)
 If i = ij (only one pointer to bucket j)
   increment i and double the size of the bucket
    address table.
Updates in Extendable Hash
Structure (Cont.)
  When inserting a value, if the bucket is full
   after several splits (that is, i reaches some
   limit b) create an overflow bucket instead
   of splitting bucket entry table further.
  To delete a key value,
    locate it in its bucket and remove it.
    The bucket itself can be removed if it becomes
     empty (with appropriate updates to the bucket
     address table).
    Coalescing of buckets can be done (can
     coalesce only with a "buddy" bucket having
      same value of ij and same ij –1 prefix, if it is
     present)
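The lookup and split logic above is compact enough to sketch directly.
A minimal sketch, assuming b = 32, bucket size 2 (as in the examples
that follow), and no overflow buckets, so a pathological key set could
split forever; class and variable names are illustrative:

  B = 32          # bits produced by the hash function
  CAPACITY = 2    # bucket size

  class Bucket:
      def __init__(self, ij):
          self.ij = ij          # entries share the first ij bits of h(K)
          self.entries = []     # (key, record) pairs

  class ExtendableHash:
      def __init__(self):
          self.i = 0
          self.table = [Bucket(0)]               # bucket address table, 2^i slots

      def _bucket(self, key):
          x = hash(key) & ((1 << B) - 1)         # h(K) as a b-bit integer
          return self.table[x >> (B - self.i)]   # first i high-order bits

      def insert(self, key, record):
          b = self._bucket(key)
          if len(b.entries) < CAPACITY:          # room in bucket: done
              b.entries.append((key, record))
              return
          if b.ij == self.i:                     # only one pointer to bucket:
              self.table = [p for p in self.table for _ in (0, 1)]
              self.i += 1                        # double the address table
          z = Bucket(b.ij + 1)                   # i > ij: allocate bucket z
          slots = [k for k, p in enumerate(self.table) if p is b]
          for k in slots[len(slots) // 2:]:      # second half of pointers -> z
              self.table[k] = z
          b.ij += 1
          old, b.entries = b.entries, []
          for kk, rr in old + [(key, record)]:   # remove and reinsert; may
              self.insert(kk, rr)                # split further if still full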
Use of Extendable Hash Structure:
Example




         Initial Hash structure, bucket size = 2
Example (Cont.)
 Hash structure after insertion of one
 Brighton and two Downtown records
Example (Cont.)
    Hash structure after insertion of Mianus record
Example (Cont.)




   Hash structure after insertion of three Perryridge records
Example (Cont.)

 Hash structure after insertion of
  Redwood and Round Hill records
Extendable Hashing vs. Other
Schemes
 Benefits of extendable hashing:
   Hash performance does not degrade with
    growth of file
   Minimal space overhead
 Disadvantages of extendable hashing
   Extra level of indirection to find desired record
   Bucket address table may itself become very
    big (larger than memory)
     Need a tree structure to locate desired record in
      the structure!
   Changing size of bucket address table is an
    expensive operation
Comparison of Ordered Indexing and
Hashing

   Cost of periodic re-organization
   Relative frequency of insertions and
    deletions
   Is it desirable to optimize average access
    time at the expense of worst-case access
    time?
   Expected type of queries:
     Hashing is generally better at retrieving
      records having a specified value of the key.
     If range queries are common, ordered indices
      are to be preferred
Index Definition in SQL

 Create an index
        create index <index-name> on <relation-
    name>
                              (<attribute-list>)
  E.g.: create index b-index on branch(branch-
    name)
 Use create unique index to indirectly
  specify and enforce the condition that the
  search key is a candidate key.
   Not really required if the SQL unique integrity
    constraint is supported
Multiple-Key Access
 Use multiple indices for certain types
  of queries.
 Example:
  select account-number
  from account
  where branch-name = "Perryridge" and
    balance = 1000
 Possible strategies for processing
  query using indices on single
  attributes:
  1. Use index on branch-name to find
    accounts with branch-name = "Perryridge";
    test balance = 1000.
Indices on Multiple Attributes
Suppose we have an index on combined search-key
                      (branch-name, balance).

       With the where clause
        where branch-name = "Perryridge"
        and balance = 1000
        the index on the combined search-
        key will fetch only records that
        satisfy both conditions.
        Using separate indices is less
        efficient — we may fetch many
        records (or pointers) that satisfy only
        one of the conditions.
       Can also efficiently handle
        where branch-name = "Perryridge"
        and balance < 1000
Grid Files
 Structure used to speed the
  processing of general multiple
  search-key queries involving one or
  more comparison operators.
 The grid file has a single grid array
  and one linear scale for each search-
  key attribute. The grid array has
  number of dimensions equal to
  number of search-key attributes.
 Multiple cells of grid array can point
  to same bucket
 To find the bucket for a search-key value,
  locate the row and column of its cell using
  the linear scales, and follow the pointer.
Example Grid File for account
Queries on a Grid File

 A grid file on two attributes A and B
  can handle queries of all following
  forms with reasonable efficiency
   (a1 ≤ A ≤ a2)
   (b1 ≤ B ≤ b2)
   (a1 ≤ A ≤ a2 ∧ b1 ≤ B ≤ b2)
 E.g., to answer (a1 ≤ A ≤ a2 ∧ b1 ≤ B ≤ b2),
  use linear scales to find
  corresponding candidate grid array
  cells, and look up all the buckets
  pointed to from those cells.
Grid Files (Cont.)
 During insertion, if a bucket
 becomes full, new bucket can be
 created if more than one cell points
 to it.
   Idea similar to extendable hashing, but
    on multiple dimensions
   If only one cell points to it, either an
    overflow bucket must be created or the
    grid size must be increased
 Linear scales must be chosen to
 uniformly distribute records across
 cells.
   Otherwise there will be too many
    overflow buckets.
Bitmap Indices

 Bitmap indices are a special type of
  index designed for efficient querying
  on multiple keys
 Records in a relation are assumed to
 be numbered sequentially from, say, 0
   Given a number n it must be easy to
   retrieve record n
    Particularly easy if records are of fixed size
 Applicable on attributes that take on a
 relatively small number of distinct
 values
Bitmap Indices (Cont.)

 In its simplest form a bitmap index on
  an attribute has a bitmap for each
  value of the attribute
   Bitmap has as many bits as records
   In a bitmap for value v, the bit for a record
   is 1 if the record has the value v for the
   attribute, and is 0 otherwise
Bitmap Indices (Cont.)
 Bitmap indices are useful for queries
  on multiple attributes
   not particularly useful for single attribute
    queries
 Queries are answered using bitmap
  operations
   Intersection (and)
   Union (or)
   Complementation (not)
 Each operation takes two bitmaps of
  the same size and applies the
  operation on corresponding bits to get
  the result.
Bitmap Indices (Cont.)
 Bitmap indices generally very small
 compared with relation size
   E.g. if record is 100 bytes, space for a single
   bitmap is 1/800 of space used by relation.
     If number of distinct attribute values is 8,
      bitmap is only 1% of relation size
 Deletion needs to be handled properly
   Existence bitmap to note if there is a valid
    record at a record location
   Needed for complementation
     not(A=v):    (NOT bitmap-A-v) AND
      ExistenceBitmap
 Should keep bitmaps for all values, even
  null values.
Efficient Implementation of
Bitmap Operations
 Bitmaps are packed into words; a
  single word and (a basic CPU
  instruction) computes and of 32 or 64
  bits at once
   E.g. 1-million-bit maps can be anded with
   just 31,250 instructions
 Counting number of 1s can be done
 fast by a trick:
   Use each byte to index into a
   precomputed array of 256 elements, each
   storing the count of 1s in the binary
   representation of that byte value; add up
   the retrieved counts.
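A small sketch of both tricks, assuming bitmaps are Python lists of
32-bit words (names are illustrative):

  POPCOUNT = [bin(b).count("1") for b in range(256)]  # precomputed array

  def bitmap_and(a, b):
      return [x & y for x, y in zip(a, b)]  # one AND per word, not per bit

  def count_ones(bitmap):
      total = 0
      for word in bitmap:
          for _ in range(4):                # 4 bytes per 32-bit word
              total += POPCOUNT[word & 0xFF]
              word >>= 8
      return total

  # e.g. records 0 and 2 satisfy one condition, records 0 and 1 the other;
  # the AND selects record 0 only
  assert count_ones(bitmap_and([0b101], [0b011])) == 1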
Partitioned Hashing
  Hash values are split into
   segments that depend on each
   attribute of the search-key:
           (A1, A2, . . . , An) for n
   attribute search-key
  Example: n = 2, for
   customer, search-key being
   (customer-street, customer-city)
          search-key value      hash value
          (Main, Harrison)      101 111
Sequential File For account
Records
Deletion of "Perryridge" From the
B+-Tree of Figure 12.12
Sample account File
Chapter 13: Query Processing
 Overview
 Measures of Query Cost
 Selection Operation
 Sorting
 Join Operation
 Other Operations
 Evaluation of Expressions
Basic Steps in Query Processing
 1.   Parsing and translation
 2.   Optimization
 3.   Evaluation
Basic Steps in Query
Processing (Cont.)
  Parsing and translation
    translate the query into its internal form. This is then
     translated into relational algebra.
    Parser checks syntax, verifies relations
 Evaluation
    The query-execution engine takes a query-evaluation
     plan, executes that plan, and returns the answers to the
     query.
Basic Steps in Query Processing:
Optimization
 A relational algebra expression may have
  many equivalent expressions
     E.g., σbalance<2500(Πbalance(account)) is equivalent
      to
                Πbalance(σbalance<2500(account))
 Each relational algebra operation can be
    evaluated using one of several different
    algorithms
     Correspondingly, a relational-algebra
      expression can be evaluated in many ways.
 Annotated expression specifying detailed
    evaluation strategy is called an
Basic Steps: Optimization
(Cont.)
Query Optimization: Amongst all
 equivalent evaluation plans choose
 the one with lowest cost.
 Cost is estimated using statistical
 information from the
  database catalog
  e.g. number of tuples in each relation, size of
   tuples, etc.
In this chapter we study
 How to measure query costs
 Algorithms for evaluating relational algebra
 operations
Measures of Query Cost
 Cost is generally measured as
 total elapsed time for answering
 query
   Many factors contribute to time cost
    disk accesses, CPU, or even network
     communication
 Typically disk access is the
  predominant cost, and is also
  relatively easy to estimate.
  Measured by taking into account
   Number of seeks          * average-seek-cost
   Number of blocks read    * average-block-read-cost
   Number of blocks written * average-block-write-cost
Measures of Query Cost (Cont.)
 For simplicity we just use number of
 block transfers from disk as the cost
 measure
   We ignore the difference in cost between
    sequential and random I/O for simplicity
   We also ignore CPU costs for simplicity
 Costs depends on the size of the buffer
 in main memory
   Having more memory reduces need for disk
    access
   Amount of real memory available to buffer
    depends on other concurrent OS processes,
    and is hard to determine ahead of actual
    execution
Selection Operation

 File scan – search algorithms that
  locate and retrieve records that fulfill
  a selection condition.
 Algorithm A1 (linear search). Scan
 each file block and test all records to
 see whether they satisfy the selection
 condition.
   Cost estimate (number of disk blocks
   scanned) = br
     br denotes number of blocks containing
     records from relation r
Selection Operation (Cont.)
 A2 (binary search). Applicable if
  selection is an equality comparison
  on the attribute on which file is
 ordered.
   Assume that the blocks of a relation are
    stored contiguously
   Cost estimate (number of disk blocks to
    be scanned):
     ⌈log2(br)⌉ — cost of locating the first tuple
      by a binary search on the blocks
     Plus number of blocks containing records
      that satisfy selection condition
Selections Using Indices
 Index scan – search algorithms that use an index
     selection condition must be on search-key
      of index.
   A3 (primary index on candidate key, equality). Retrieve a single
    record that satisfies the corresponding equality condition
     Cost = HTi + 1
   A4 (primary index on nonkey, equality) Retrieve multiple
    records.
     Records will be on consecutive blocks
     Cost = HTi + number of blocks containing
      retrieved records
   A5 (equality on search-key of secondary index).
     Retrieve a single record if the search-key
      is a candidate key
       Cost = HTi + 1
Selections Involving Comparisons
   Can implement selections of the form
    σA≤V(r) or σA≥V(r) by using
     a linear file scan or binary search,
     or by using indices in the following ways:
 A6 (primary index, comparison).
    (Relation is sorted on A)
         For σA≥V(r) use index to find first tuple ≥ v
          and scan relation sequentially from there
         For σA≤V(r) just scan relation sequentially till
          first tuple > v; do not use index
 A7 (secondary index, comparison).
         For σA≥V(r) use index to find first index entry
          ≥ v and scan index sequentially from there, to
          find pointers to records.
Implementation of Complex Selections
 Conjunction: σθ1∧θ2∧. . .∧θn(r)

 A8 (conjunctive selection using one
  index).
   Select a combination of θi and algorithms
    A1 through A7 that results in the least
    cost for σθi(r).
     Test other conditions on tuple after
      fetching it into memory buffer.
 A9 (conjunctive selection using
 multiple-key index).
   Use appropriate composite (multiple-key)
      index if available.
 A10 (conjunctive selection by
  intersection of identifiers).
   Requires indices with record pointers. Use
    corresponding index for each condition, take
    the intersection of all the obtained sets of
    record pointers, then fetch records from file.
Algorithms for Complex Selections
 Disjunction: σθ1∨θ2∨. . .∨θn(r)


 A11 (disjunctive selection by union of
 identifiers).
   Applicable if all conditions have available
   indices.
     Otherwise use linear scan.
   Use corresponding index for each condition,
    and take union of all the obtained sets of
    record pointers.
   Then fetch records from file
 Negation: σ¬θ(r)
   Use linear scan on file
Sorting

 We may build an index on the relation,
  and then use the index to read the
  relation in sorted order. May lead to
  one disk block access for each tuple.
 For relations that fit in memory,
 techniques like quicksort can be used.
 For relations that don‘t fit in memory,
 external
 sort-merge is a good choice.
External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs.               Let i be 0 initially.
   Repeatedly do the following till the end of the relation:
     (a) Read M blocks of relation into memory
     (b) Sort the in-memory blocks
     (c) Write sorted data to run Ri; increment i.
  Let the final value of i be N
2. Merge the runs (N-way merge).                             We assume
  (for now) that N < M .
  1. Use N blocks of memory to buffer input
     runs, and 1 block to buffer output. Read the
     first block of each run into its buffer page
  2. repeat
      1. Select the first record (in sort order) among
         all buffer pages
      2. Write the record to the output buffer. If the
         output buffer is full, write it to disk; delete the
         record from its input buffer page, and if that
         page becomes empty, read the next block (if
         any) of the run into the buffer.
External Sort-Merge (Cont.)
 If N ≥ M, several merge passes
  are required.
   In each pass, contiguous groups of
    M - 1 runs are merged.
   A pass reduces the number of runs
    by a factor of M -1, and creates
    runs longer by the same factor.
      E.g. If M=11, and there are 90 runs,
       one pass reduces the number of runs
       to 9, each 10 times the size of the
       initial runs
   Repeated passes are performed till
    all runs have been merged into one.
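A minimal in-memory sketch of the whole algorithm, assuming "blocks" are
lists of records, M >= 3 blocks fit in memory, and runs are kept in
memory (a real implementation reads and writes runs on disk):

  import heapq

  def external_sort(blocks, M):
      # 1. create sorted runs of up to M blocks each
      runs = [sorted(rec for blk in blocks[i:i + M] for rec in blk)
              for i in range(0, len(blocks), M)]
      # 2. merge passes: heapq.merge repeatedly picks the smallest record
      #    among the fronts of up to M - 1 runs
      while len(runs) > 1:
          runs = [list(heapq.merge(*runs[j:j + M - 1]))
                  for j in range(0, len(runs), M - 1)]
      return runs[0] if runs else []

  print(external_sort([[24, 19], [31, 3], [14, 7], [5, 2]], M=3))
  # [2, 3, 5, 7, 14, 19, 24, 31]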
Example: External Sorting Using
Sort-Merge
External Merge Sort (Cont.)

 Cost analysis:
   Total number of merge passes required:
     ⌈logM–1(br/M)⌉.
   Disk accesses for initial run creation as
    well as in each pass is 2br
     for final pass, we don‘t count write cost
      we ignore final write cost for all operations
       since the output of an operation may be sent
       to the parent operation without being written
       to disk
  Thus total number of disk accesses for
   external sorting:
     br (2 ⌈logM–1(br/M)⌉ + 1)
Join Operation

 Several different algorithms to
  implement joins
   Nested-loop join
   Block nested-loop join
   Indexed nested-loop join
   Merge-join
   Hash-join
 Choice based on cost estimate
 Examples use the following
 information
Nested-Loop Join
 To compute the theta join        r ⋈θ s
  for each tuple tr in r do begin
  for each tuple ts in s do begin
    test pair (tr,ts) to see if they satisfy
  the join condition
    if they do, add tr • ts to the result.
  end
  end
 r is called the outer relation and s the
  inner relation of the join.
 Requires no indices and can be used
  with any kind of join condition.
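A direct transcription of the algorithm into Python, assuming relations
are lists of dicts and theta is an arbitrary predicate (the names are
illustrative):

  def nested_loop_join(r, s, theta):
      result = []
      for tr in r:                              # outer relation
          for ts in s:                          # inner relation
              if theta(tr, ts):
                  result.append({**tr, **ts})   # tr . ts
      return result

  depositor = [{"customer_name": "Johnson", "account_number": "A-101"}]
  account = [{"account_number": "A-101", "balance": 500}]
  print(nested_loop_join(depositor, account,
        lambda tr, ts: tr["account_number"] == ts["account_number"]))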
Nested-Loop Join (Cont.)
 In the worst case, if there is enough
  memory only to hold one block of
  each relation, the estimated cost is
             nr ∗ bs + br
  disk accesses.
 If the smaller relation fits entirely in
  memory, use that as the inner
  relation. Reduces cost to br + bs
  disk accesses.
 Assuming worst case memory
  availability, cost estimate is
   5000 ∗ 400 + 100 = 2,000,100 disk
    accesses with depositor as outer
    relation, and
   10000 ∗ 100 + 400 = 1,000,400 disk
    accesses with customer as the outer
    relation.
Block Nested-Loop Join
 Variant of nested-loop join in
  which every block of inner relation
  is paired with every block of outer
  relation.
  for each block Br of r do begin
       for each block Bs of s do begin
         for each tuple tr in Br do begin
           for each tuple ts in Bs do begin
                Check if (tr,ts) satisfy
                the join condition;
                if they do, add tr • ts to the result.
           end
         end
       end
  end
Block Nested-Loop Join (Cont.)
 Worst case estimate: br ∗ bs + br
  block accesses.
   Each block in the inner relation s is read
   once for each block in the outer relation
   (instead of once for each tuple in the
   outer relation).
 Best case: br + bs block accesses.
 Improvements to nested loop and
 block nested loop algorithms:
   In block nested-loop, use M — 2 disk
   blocks as blocking unit for outer
   relations, where M = memory size in
   blocks; use remaining two blocks to
   buffer inner relation and output
Indexed Nested-Loop Join
 Index lookups can replace file scans if
   join is an equi-join or natural join and
   an index is available on the inner relation‘s
    join attribute
         Can construct an index just to compute a
          join.
 For each tuple tr in the outer relation r,
  use the index to look up tuples in s
  that satisfy the join condition with tuple
  tr.
 Worst case: buffer has space for only
  one page of r, and, for each tuple in r,
  we perform an index lookup on s.
  Cost of the join: br + nr ∗ c, where c is
  the cost of a single index lookup on s.
Example of Nested-Loop Join
Costs
 Compute depositor      customer, with
  depositor as the outer relation.
 Let customer have a primary B+-tree
  index on the join attribute customer-
  name, which contains 20 entries in
  each index node.
 Since customer has 10,000 tuples, the
  height of the tree is 4, and one more
  access is needed to find the actual
  data.
 Cost of the indexed nested-loops join:
  100 + 5000 ∗ 5 = 25,100 disk accesses.
Merge-Join
1. Sort both relations on their join attribute
   (if not already sorted on the join
   attributes).
2. Merge the sorted relations to join them
  1. Join step is similar to the merge stage of the
     sort-merge algorithm.
  2. Main difference is handling of duplicate
     values in join attribute — every pair with
     same value on join attribute must be
     matched (see the sketch below).
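A minimal sketch of the merge step on a single join attribute, assuming
both inputs are lists of dicts already sorted on attr; duplicates are
handled by matching every pair inside a group of equal values:

  def merge_join(r, s, attr):
      result, i, j = [], 0, 0
      while i < len(r) and j < len(s):
          a, b = r[i][attr], s[j][attr]
          if a < b:
              i += 1
          elif a > b:
              j += 1
          else:
              j_end = j                 # find the group of equal s tuples
              while j_end < len(s) and s[j_end][attr] == a:
                  j_end += 1
              while i < len(r) and r[i][attr] == a:
                  result.extend({**r[i], **ts} for ts in s[j:j_end])
                  i += 1                # match every pair in the group
              j = j_end
      return result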
Merge-Join (Cont.)
 Can be used only for equi-joins and
  natural joins
 Each block needs to be read only once
  (assuming all tuples for any given value
  of the join attributes fit in memory).
 Thus number of block accesses for
  merge-join is
         br + bs + the cost of sorting
  if relations are unsorted.
 hybrid merge-join: If one relation is
  sorted, and the other has a secondary
  B+-tree index on the join attribute
Hash-Join
 Applicable for equi-joins and natural
  joins.
 A hash function h is used to partition
  tuples of both relations
 h maps JoinAttrs values to {0, 1, ..., n},
  where JoinAttrs denotes the common
  attributes of r and s used in the
  natural join.
   r0, r1, . . ., rn denote partitions of r tuples
     Each tuple tr ∈ r is put in partition ri where i
      = h(tr [JoinAttrs]).
   s0, s1, . . ., sn denote partitions of s tuples
Hash-Join (Cont.)

 r tuples in ri need only to be
  compared with s tuples in si; they need
  not be compared with s tuples in
  any other partition, since:
   an r tuple and an s tuple that satisfy the
    join condition will have the same value
    for the join attributes.
   If that value is hashed to some value i,
    the r tuple has to be in ri and the s
    tuple in si.
Hash-Join Algorithm
The hash-join of r and s is computed as follows.
      1. Partition the relation s using
        hashing function h. When
        partitioning a relation, one block of
        memory is reserved as the output
        buffer for each partition.
      2. Partition r similarly.
      3. For each i:
         (a) Load si into memory and build an in-
            memory hash index on it using the join
            attribute. This hash index uses a
            different hash function than the earlier
            hash function h.
         (b) Read the tuples in ri from disk one by
            one; for each tuple tr, probe the in-memory
            hash index to find matching tuples ts in si,
            and output tr • ts for each match.
      Relation s is called the build input and
               r is called the probe input.
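An in-memory sketch of the three steps, assuming relations are lists of
dicts joined on a common attribute attr and n partitions (names are
illustrative):

  def hash_join(r, s, attr, n=4):
      h = lambda v: hash(v) % n                 # partitioning function h
      r_parts = [[] for _ in range(n)]
      s_parts = [[] for _ in range(n)]
      for ts in s:                              # 1. partition build input s
          s_parts[h(ts[attr])].append(ts)
      for tr in r:                              # 2. partition probe input r
          r_parts[h(tr[attr])].append(tr)
      result = []
      for ri, si in zip(r_parts, s_parts):      # 3. for each partition i:
          index = {}                            # (a) in-memory index on si,
          for ts in si:                         #     built with a different
              index.setdefault(ts[attr], []).append(ts)   # hash function
          for tr in ri:                         # (b) probe with ri tuples
              for ts in index.get(tr[attr], []):
                  result.append({**tr, **ts})
      return result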
Hash-Join algorithm (Cont.)
 The value n and the hash function h
  are chosen such that each si should fit
  in memory.
   Typically n is chosen as ⌈bs/M⌉ ∗ f, where
    f is a "fudge factor", typically around 1.2
   The probe relation partitions si need not
    fit in memory
 Recursive partitioning required if
 number of partitions n is greater
 than number of pages M of memory.
   instead of partitioning n ways, use M –
   1 partitions for s
Handling of Overflows
 Hash-table overflow occurs in partition
 si if si does not fit in memory. Reasons
 could be
   Many tuples in s with same value for join
    attributes
   Bad hash function
 Partitioning is said to be skewed if
  some partitions have significantly more
  tuples than some others
 Overflow resolution can be done in
  build phase
   Partition si is further partitioned using
   different hash function.
Cost of Hash-Join
 If recursive partitioning is not
  required: cost of hash join is
         3(br + bs) +2 nh
 If recursive partitioning required,
  number of passes required for
  partitioning s is logM–1(bs) – 1 . This
  is because each final partition of s
  should fit in memory.
 The number of partitions of probe
  relation r is the same as that for build
  relation s; the number of passes for
  partitioning of r is also the same as
  for s.
Example of Cost of Hash-Join
             customer   depositor

 Assume that memory size is 20 blocks
 bdepositor= 100 and bcustomer = 400.
 depositor is to be used as build input.
  Partition it into five partitions, each of
  size 20 blocks. This partitioning can be
  done in one pass.
 Similarly, partition customer into five
  partitions,each of size 80. This is also
  done in one pass.
 Therefore total cost: 3(100 + 400) =
  1500 block transfers
Hybrid Hash–Join
 Useful when memory sizes are relatively
  large, and the build input is bigger than
  memory.
 Main feature of hybrid hash join:
    Keep the first partition of the build
  relation in memory.
 E.g. With memory size of 25 blocks,
  depositor can be partitioned into five
  partitions, each of size 20 blocks.
 Division of memory:
   The first partition occupies 20 blocks of
    memory
   1 block is used for input, and 1 block each
    for buffering the remaining 4 partitions.
Complex Joins
 Join with a conjunctive condition:
           r ⋈θ1∧θ2∧. . .∧θn s
   Either use nested loops/block nested
    loops, or
   Compute the result of one of the
    simpler joins r ⋈θi s
    final result comprises those tuples in the
     intermediate result that satisfy the
     remaining conditions
      θ1 ∧ . . . ∧ θi–1 ∧ θi+1 ∧ . . . ∧ θn


 Join with a disjunctive condition
           r ⋈θ1∨θ2∨. . .∨θn s
   Either use nested loops/block nested
    loops, or compute as the union of the
    records in individual joins r ⋈θi s.
Other Operations
 Duplicate elimination can be
  implemented via hashing or
  sorting.
   On sorting duplicates will come
    adjacent to each other, and all but
    one set of duplicates can be
    deleted. Optimization: duplicates
    can be deleted during run
    generation as well as at
    intermediate merge steps in
    external sort-merge.
   Hashing is similar – duplicates will
    come into the same bucket.
Other Operations : Aggregation

 Aggregation can be implemented in a
  manner similar to duplicate
  elimination.
  Sorting or hashing can be used to bring
   tuples in the same group together, and
   then the aggregate functions can be
   applied on each group.
  Optimization: combine tuples in the same
   group during run generation and
   intermediate merges, by computing
   partial aggregate values
Other Operations : Set Operations
 Set operations (∪, ∩ and –) can
  either use variant of merge-join after
  sorting, or variant of hash-join.
 E.g., Set operations using hashing:
  1. Partition both relations using the same
    hash function, thereby creating r0, r1, .., rn
    and s0, s1, .., sn
  2. Process each partition i as follows.
    Using a different hashing function, build
    an in-memory hash index on ri after it is
    brought into memory.
  3. r ∪ s: Add tuples in si to the hash
    index if they are not already in it; at the end,
    add the tuples in the hash index to the result.
Other Operations : Outer Join
 Outer join can be computed either as
   A join followed by addition of null-padded
    non-participating tuples.
   by modifying the join algorithms.
 Modifying merge join to compute r ⟕ s
   In r ⟕ s, non-participating tuples are
    those in r – ΠR(r ⋈ s)
   Modify merge-join to compute r ⟕ s:
    during merging, for every tuple tr from r
    that does not match any tuple in s, output tr
    padded with nulls.
Evaluation of Expressions

 So far: we have seen algorithms for
  individual operations
 Alternatives for evaluating an entire
 expression tree
   Materialization: generate results of an
    expression whose inputs are relations or
    are already computed, materialize (store)
    it on disk. Repeat.
   Pipelining: pass on tuples to parent
    operations even as an operation is being
    executed
Materialization
 Materialized evaluation: evaluate
  one operation at a time, starting
  at the lowest-level. Use
  intermediate results materialized
  into temporary relations to
  evaluate next-level operations.
 E.g., in figure below, compute and
  store
       σbalance<2500(account)
  then compute and store its join
  with customer, and finally
  compute the projection on
  customer-name.
Materialization (Cont.)

 Materialized evaluation is always
  applicable
 Cost of writing results to disk and
 reading them back can be quite high
   Our cost formulas for operations ignore
   cost of writing results to disk, so
    Overall cost = Sum of costs of individual
     operations +
                     cost of writing intermediate
     results to disk
 Double buffering: use two output
  buffers for each operation; when one is
  full, write it to disk while the other is
  being filled, so that disk writes
  overlap with computation.
Pipelining
 Pipelined evaluation : evaluate several
  operations simultaneously, passing the
  results of one operation on to the next.
 E.g., in previous expression tree, don't
  store result of
       σbalance<2500(account)
   instead, pass tuples directly to the join.
   Similarly, don't store result of join, pass
   tuples directly to projection.
 Much cheaper than materialization: no
  need to store a temporary relation to
  disk.
      Pipelining (Cont.)
    In demand-driven or lazy evaluation
     system repeatedly requests next tuple from
      top level operation
     Each operation requests next tuple from
      children operations as required, in order to
      output its next tuple
     In between calls, operation has to maintain
      "state" so it knows what to return next
     Each operation is implemented as an iterator
      implementing the following operations
       open()
          E.g. file scan: initialize file scan, store pointer to
           beginning of file as state
          E.g. merge join: sort relations, and store pointers to
           beginning of sorted relations as state
Pipelining (Cont.)

 In producer-driven or eager pipelining
   Operators produce tuples eagerly and
   pass them up to their parents
    Buffer maintained between operators, child
     puts tuples in buffer, parent removes tuples
     from buffer
    if buffer is full, child waits till there is space
     in the buffer, and then generates more
     tuples
   System schedules operations that have
   space in output buffer and can process
   more input tuples
Evaluation Algorithms for Pipelining
 Some algorithms are not able to output
  results even as they get input tuples
   E.g. merge join, or hash join
   These result in intermediate results being
   written to disk and then read back always
 Algorithm variants are possible to
  generate (at least some) results on the
  fly, as input tuples are read in
   E.g. hybrid hash join generates output
    tuples even as probe relation tuples in the
    in-memory partition (partition 0) are read
    in
   Pipelined join technique: Hybrid hash join,
    modified to buffer partition 0 tuples of both
    relations in memory, reading them as they
    become available.
Complex Joins
 Join involving three relations: loan ⋈
  depositor ⋈ customer
 Strategy 1. Compute depositor ⋈
  customer; use result to compute
  loan ⋈ (depositor ⋈ customer)
 Strategy 2. Compute loan ⋈
  depositor first, and then join the
  result with customer.
 Strategy 3. Perform the pair of joins
  at once. Build an index on loan for
  loan-number, and on customer for
  customer-name.
Chapter 14: Query
Optimization
  Introduction
 Catalog Information for Cost
  Estimation
 Estimation of Statistics
 Transformation of Relational
  Expressions
 Dynamic Programming for
  Choosing Evaluation Plans
Introduction

Alternative ways of evaluating a given
  query
 Equivalent expressions
 Different algorithms for each operation
 (Chapter 13)
Cost difference between a good and a
 bad way of evaluating a query can be
 enormous
  Example: performing r × s followed by a
  selection r.A = s.B is much slower than
  performing a join on the same condition
Need to estimate the cost of operations
Introduction (Cont.)
Relations generated by two equivalent
expressions have the same set of
attributes and contain the same set of
tuples, although their attributes may
be ordered differently.
Introduction (Cont.)

  Generation of query-evaluation plans
   for an expression involves several
   steps:
   1. Generating logically equivalent
     expressions
      Use equivalence rules to transform an
       expression into an equivalent one.
   2. Annotating resultant expressions to get
      alternative query plans
   3. Choosing the cheapest plan based on
      estimated cost
Overview of chapter

 Statistical information for cost
  estimation
 Equivalence rules
 Cost-based optimization algorithm
 Optimizing nested subqueries
 Materialized views and view
 maintenance
Statistical Information for Cost
Estimation of tuples in a relation
  nr: number
   r.
  br: number of blocks containing
   tuples of r.
  sr: size of a tuple of r.
  fr: blocking factor of r — i.e., the
   number of tuples of r that fit into
   one block.
  V(A, r): number of distinct values
   that appear in r for attribute A;
   same as the size of ΠA(r).
  If tuples of r are stored together
   physically in a file, then:
       br = ⌈nr / fr⌉
Catalog Information about
Indices
 fi: average fan-out of internal
  nodes of index i, for
  tree-structured indices such as
  B+-trees.
 HTi: number of levels in index i
  — i.e., the height of i.
   For a balanced tree index (such as
    B+-tree) on attribute A of relation r,
     HTi = ⌈logfi(V(A,r))⌉.
   For a hash index, HTi is 1.
   LBi: number of lowest-level index
    blocks in i — i.e., the number of
    blocks at the leaf level of the index.
Measures of Query Cost
 Recall that
   Typically disk access is the
    predominant cost, and is also
    relatively easy to estimate.
   The number of block transfers from
    disk is used as a measure of the
    actual cost of evaluation.
   It is assumed that all transfers of
    blocks have the same cost.
     Real life optimizers do not make this
      assumption, and distinguish between
      sequential and random disk access
 We do not include cost of writing
  output to disk in our cost formulae.
Selection Size Estimation
 Equality selection σA=v(r)
     SC(A, r): number of records that will
      satisfy the selection
     ⌈SC(A, r)/fr⌉ — number of blocks that
      these records will occupy
     E.g. binary search cost estimate becomes
       Ea2 = ⌈log2(br)⌉ + ⌈SC(A, r)/fr⌉ – 1

   Equality condition on a key attribute:
    SC(A,r) = 1
Statistical Information for
Examples
 faccount= 20 (20 tuples of account fit
  in one block)
 V(branch-name, account) = 50 (50
 branches)
 V(balance, account) = 500    (500
  different balance values)
 naccount = 10000 (account has 10,000
  tuples)
 Assume the following indices exist on
  account:
   A primary, B+-tree index for attribute
    branch-name
   A secondary, B+-tree index for attribute
    balance
Selections Involving
Comparisons
 Selections of the form σA≤V(r) (case of
  σA≥V(r) is symmetric)
 Let c denote the estimated number
  of tuples satisfying the condition.
   If min(A,r) and max(A,r) are available in
    catalog
     c = 0 if v < min(A,r)
     c = nr ∗ (v – min(A,r)) / (max(A,r) – min(A,r))
      otherwise
   In absence of statistical information c is
    assumed to be nr / 2.
Implementation of Complex
Selections
 The selectivity of a condition θi is the
  probability that a tuple in the relation
  r satisfies θi. If si is the number of
  satisfying tuples in r, the selectivity of
  θi is given by si /nr.
 Conjunction: σθ1∧θ2∧. . .∧θn(r). The
  estimate for the number of tuples in
  the result is:
       nr ∗ (s1 ∗ s2 ∗ . . . ∗ sn) / nr^n
 Disjunction: σθ1∨θ2∨. . .∨θn(r).
  Estimated number of tuples:
       nr ∗ (1 – (1 – s1/nr) ∗ (1 – s2/nr) ∗ . . . ∗ (1 – sn/nr))
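The two formulae are easy to apply directly. A small sketch, assuming
per-condition counts s[i] of satisfying tuples for a relation of nr
tuples (function names are illustrative):

  from math import prod

  def conjunction_estimate(n_r, s):
      # n_r * (s1 * s2 * ... * sn) / n_r^n
      return n_r * prod(s) / n_r ** len(s)

  def disjunction_estimate(n_r, s):
      # n_r * (1 - (1 - s1/n_r) * ... * (1 - sn/n_r))
      return n_r * (1 - prod(1 - si / n_r for si in s))

  # two independent conditions, each satisfied by half the tuples:
  print(conjunction_estimate(10000, [5000, 5000]))  # 2500.0
  print(disjunction_estimate(10000, [5000, 5000]))  # 7500.0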
Join Operation: Running
Example
 Running example:
   depositor   customer
Catalog information for join examples:
 ncustomer = 10,000.
 fcustomer = 25, which implies that
    bcustomer =10000/25 = 400.
 ndepositor = 5000.
 fdepositor = 50, which implies that
    bdepositor = 5000/50 = 100.
 V(customer-name, depositor) =
  2500, which implies that, on
  average, each customer has two
  accounts.
Estimation of the Size of Joins
  The Cartesian product r x s contains
   nr .ns tuples; each tuple occupies sr +
   ss bytes.
  If R ∩ S = ∅, then r ⋈ s is the same
   as r × s.
  If R ∩ S is a key for R, then a tuple of
   s will join with at most one tuple from
   r
    therefore, the number of tuples in r ⋈ s
     is no greater than the number of tuples
     in s.
  If R ∩ S is a foreign key in S
   referencing R, then the number of
   tuples in r ⋈ s is exactly the same as
   the number of tuples in s.
Estimation of the Size of Joins
(Cont.)
  If R ∩ S = {A} is not a key for R or S.
   If we assume that every tuple t in R
   produces tuples in R ⋈ S, the number
   of tuples in R ⋈ S is estimated to be:
       (nr ∗ ns) / V(A, s)
  If the reverse is true, the estimate
   obtained will be:
       (nr ∗ ns) / V(A, r)
  The lower of these two estimates is
   probably the more accurate one.
Estimation of the Size of Joins
(Cont.)
 Compute the size estimates for
  depositor customer without using
  information about foreign keys:
   V(customer-name, depositor) = 2500,
   and
   V(customer-name, customer) = 10000
   The two estimates are 5000 ∗
    10000/2500 = 20,000 and 5000 ∗
    10000/10000 = 5000
   We choose the lower estimate, which in
    this case is the same as our earlier
    computation using foreign keys.
Size Estimation for Other
Operations
 Projection: estimated size of         A(r)     =
  V(A,r)
 Aggregation : estimated size of AgF(r)
  = V(A,r)
 Set operations
   For unions/intersections of selections on
    the same relation: rewrite and use size
    estimate for selections
     E.g. σθ1(r) ∪ σθ2(r) can be rewritten as
      σθ1∨θ2(r)
   For operations on different relations:
     estimated size of r ∪ s = size of r + size of s
     estimated size of r ∩ s = min(size of r, size of s)
     estimated size of r – s = size of r
     All three estimates may be quite inaccurate,
      but provide upper bounds on the sizes.
Size Estimation (Cont.)

 Outer join:
   Estimated size of r ⟕ s = size of r ⋈ s
    + size of r
     Case of right outer join is symmetric
   Estimated size of r ⟗ s = size of r ⋈ s
    + size of r + size of s
Estimation of Number of Distinct
Values
Selections: σθ(r)
 If θ forces A to take a specified
  value: V(A, σθ(r)) = 1.
        e.g., A = 3
 If θ forces A to take on one of a
  specified set of values:
       V(A, σθ(r)) = number of
  specified values.
        (e.g., (A = 1 ∨ A = 3 ∨ A = 4)),
 If the selection condition θ is of the
  form A op v:
       estimated V(A, σθ(r)) = V(A,r) ∗ s,
  where s is the selectivity of the selection.
Estimation of Distinct Values
(Cont.)
Joins: r ⋈ s
 If all attributes in A are from r
      estimated V(A, r ⋈ s) = min
  (V(A,r), nr ⋈ s)
 If A contains attributes A1 from r and
  A2 from s, then estimated V(A, r ⋈ s)
  =
      min(V(A1,r) ∗ V(A2 – A1,s), V(A1 –
  A2,r) ∗ V(A2,s), nr ⋈ s)
   More accurate estimates can be got using
    probability theory, but the above
    approximation generally works fine.
Estimation of Distinct Values
 (Cont.)
 Estimation of distinct values is
  straightforward for projections.
   They are the same in ΠA(r) as in r.
 The same holds for grouping attributes
  of aggregation.
 For aggregated values
   For min(A) and max(A), the number of
    distinct values can be estimated as
    min(V(A,r), V(G,r)) where G denotes
    grouping attributes
   For other aggregates, assume all values are
    distinct, and use V(G,r)
Transformation of Relational
Expressions
  Two relational algebra expressions are
   said to be equivalent if on every legal
   database instance the two expressions
   generate the same set of tuples
    Note: order of tuples is irrelevant
  In SQL, inputs and outputs are
   multisets of tuples
    Two expressions in the multiset version of
    the relational algebra are said to be
    equivalent if on every legal database
    instance the two expressions generate the
Equivalence Rules
1.Conjunctive selection operations can
  be deconstructed into a sequence of
  individual selections.
       σθ1∧θ2(E) = σθ1(σθ2(E))
2.Selection operations are
  commutative.
       σθ1(σθ2(E)) = σθ2(σθ1(E))
3.Only the last in a sequence of
  projection operations is needed, the
  others can be omitted.
       ΠL1(ΠL2(. . .(ΠLn(E)). . .)) = ΠL1(E)
Pictorial Depiction of Equivalence
Rules
Equivalence Rules (Cont.)

5.Theta-join operations (and natural
  joins) are commutative.
          E1    E2 = E2   E1
6. (a) Natural join operations are
  associative:
    (E1 E2) E3 = E1 (E2        E3)

 (b) Theta joins are associative in the
 following manner:
      (E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
 where θ2 involves attributes from only E2 and E3.
Equivalence Rules (Cont.)

7. The selection operation distributes
  over the theta join operation under
  the following two conditions:
  (a) When all the attributes in θ0
  involve only the attributes of one
       of the expressions (E1) being
  joined.

             σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
Equivalence Rules (Cont.)
8.The projection operation distributes
  over the theta join operation as
  follows:
  (a) if θ involves only attributes from L1 ∪ L2:
       ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))
  (b) Consider a join E1 ⋈θ E2.
    Let L1 and L2 be sets of attributes from E1
     and E2, respectively.
    Let L3 be attributes of E1 that are involved
     in join condition θ, but are not in L1 ∪ L2, and
     let L4 be attributes of E2 that are involved in
     θ, but are not in L1 ∪ L2. Then:
       ΠL1∪L2(E1 ⋈θ E2) =
       ΠL1∪L2((ΠL1∪L3(E1)) ⋈θ (ΠL2∪L4(E2)))
Equivalence Rules (Cont.)
9. The set operations union and
  intersection are commutative
             E1 E2 = E2 E1
             E1 E2 = E2 E1
    (set difference is not commutative).
10.Set union and intersection are
  associative.
             (E1 E2) E3 = E1 (E2            E3)
         (E1 E2) E3 = E1 (E2 E3)
11.The selection operation distributes
  over ∪, ∩ and –.
        σθ(E1 – E2) = σθ(E1) – σθ(E2)
        and similarly for ∪ and ∩
Transformation Example

 Query: Find the names of all
  customers who have an account at
  some branch located in Brooklyn.
   Πcustomer-name(σbranch-city = "Brooklyn"
             (branch ⋈ (account ⋈ depositor)))
 Transformation using rule 7a.
    Πcustomer-name
           ((σbranch-city = "Brooklyn" (branch))
            ⋈ (account ⋈ depositor))
Example with Multiple
Transformations
 Query: Find the names of all
 customers with an account at a
 Brooklyn branch whose account
 balance is over $1000.
   Πcustomer-name((σbranch-city = "Brooklyn" ∧ balance > 1000
            (branch ⋈ (account ⋈ depositor)))
 Transformation using join
  associativity (Rule 6a):
   Πcustomer-name((σbranch-city = "Brooklyn" ∧ balance > 1000
                (branch ⋈ account)) ⋈ depositor)
Multiple Transformations
(Cont.)
Projection Operation Example
Πcustomer-name((σbranch-city = "Brooklyn" (branch) ⋈ account) ⋈ depositor)


 When we compute
       (σbranch-city = "Brooklyn" (branch) ⋈ account)
  we obtain a relation whose schema is:
  (branch-name, branch-city, assets,
  account-number, balance)
 Push projections using equivalence rules
  8a and 8b; eliminate unneeded attributes
  from intermediate results to get:
       Πcustomer-name((
        Πaccount-number(σbranch-city = "Brooklyn" (branch) ⋈ account))
        ⋈ depositor)
Join Ordering Example

 For all relations r1, r2, and r3,
          (r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
 If r2 ⋈ r3 is quite large and r1 ⋈ r2 is
  small, we choose

           (r1 ⋈ r2) ⋈ r3
  so that we compute and store a
  smaller temporary relation.
Join Ordering Example (Cont.)
 Consider the expression
          Πcustomer-name((σbranch-city =
  "Brooklyn" (branch))
  ⋈ account ⋈ depositor)
 Could compute account ⋈
  depositor first, and join result
  with
        σbranch-city = "Brooklyn" (branch)
  but account ⋈ depositor is likely
  to be a large relation.
Enumeration of Equivalent
  Query optimizers use equivalence rules
 Expressions
  to systematically generate expressions
  equivalent to the given expression
 Conceptually, generate all equivalent
  expressions by repeatedly executing
  the following step until no more
  expressions can be found:
   for each expression found so far, use all
   applicable equivalence rules, and add newly
   generated expressions to the set of
   expressions found so far
 The above approach is very expensive
 in space and time
Evaluation Plan
 An evaluation plan defines exactly what
  algorithm is used for each operation,
  and how the execution of the
 operations is coordinated.
Choice of Evaluation Plans
 Must consider the interaction of
 evaluation techniques when choosing
 evaluation plans: choosing the
 cheapest algorithm for each operation
 independently may not yield best
 overall algorithm. E.g.
   merge-join may be costlier than hash-join,
    but may provide a sorted output which
    reduces the cost for an outer level
    aggregation.
   nested-loop join may provide opportunity
    for pipelining
Cost-Based Optimization

 Consider finding the best join-order
  for r1 ⋈ r2 ⋈ . . . ⋈ rn.
 There are (2(n – 1))!/(n – 1)! different
  join orders for above expression.
  With n = 7, the number is 665280,
  with n = 10, the number is greater
  than 176 billion!
 No need to generate all the join
  orders. Using dynamic programming,
  the least-cost join order for any
  subset of {r1, r2, . . . rn} is computed
  only once and stored for future use.
Dynamic Programming in
Optimization
  To find best join tree for a set of n
  relations:
    To find best plan for a set S of n
     relations, consider all possible plans of
     the form: S1     (S – S1) where S1 is any
     non-empty subset of S.
    Recursively compute costs for joining
     subsets of S to find the cost of each
     plan. Choose the cheapest of the 2^n – 1
     alternatives.
    When plan for any subset is computed,
     store it and reuse it when it is required
Join Order Optimization
 Algorithm
procedure findbestplan(S)
 if (bestplan[S].cost ≠ ∞)
     return bestplan[S]
 // else bestplan[S] has not been
 computed earlier, compute it now
 for each non-empty subset S1 of S such
 that S1 ≠ S
     P1 = findbestplan(S1)
     P2 = findbestplan(S – S1)
     A = best algorithm for joining results
 of P1 and P2
     cost = P1.cost + P2.cost + cost of A
     if cost < bestplan[S].cost
         bestplan[S].cost = cost
         bestplan[S].plan = "execute P1.plan;
           execute P2.plan; join results of P1
           and P2 using A"
 return bestplan[S]
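A runnable sketch of the memoized search, assuming relations are named
strings and an illustrative cost model (the sizes and the product-based
join cost below are stand-ins, not the book's):

  from functools import lru_cache

  SIZES = {"loan": 100, "depositor": 50, "customer": 200}

  @lru_cache(maxsize=None)                      # plays the role of bestplan[S]
  def findbestplan(S):
      if len(S) == 1:
          (rel,) = S
          return SIZES[rel], rel                # (cost proxy, plan)
      members = sorted(S)
      best = None
      # every non-empty proper subset S1 of S, enumerated via bit masks
      for mask in range(1, 2 ** len(members) - 1):
          S1 = frozenset(m for i, m in enumerate(members) if mask >> i & 1)
          c1, p1 = findbestplan(S1)
          c2, p2 = findbestplan(S - S1)
          cost = c1 + c2 + c1 * c2              # + cost of joining P1, P2
          if best is None or cost < best[0]:
              best = (cost, f"({p1} JOIN {p2})")
      return best

  print(findbestplan(frozenset(SIZES)))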
Left Deep Join Trees
 In left-deep join trees, the right-
 hand-side input for each join is a
 relation, not the result of an
 intermediate join.
Cost of Optimization

 With dynamic programming time
  complexity of optimization with bushy
  trees is O(3n).
   With n = 10, this number is 59000
   instead of 176 billion!
 Space complexity is O(2n)
 To find best left-deep join tree for a
 set of n relations:
   Consider n alternatives with one relation
    as right-hand side input and the other
    relations as left-hand side input.
   Using (recursively computed and stored)
    least-cost join orders for the left-hand-side
    alternatives, choose the cheapest of the n
    alternatives.
Interesting Orders in Cost-Based Optimization

     Consider the expression (r1 ⋈ r2 ⋈ r3)
      ⋈ r4 ⋈ r5
     An interesting sort order is a
      particular sort order of tuples that
      could be useful for a later operation.
       Generating the result of r1 ⋈ r2 ⋈ r3
        sorted on the attributes common with r4
        or r5 may be useful, but generating it
        sorted on the attributes common to only
        r1 and r2 is not useful.
       Using merge-join to compute r1 ⋈ r2 ⋈ r3
        may be costlier, but may provide an
        output sorted in an interesting order.
Heuristic Optimization
 Cost-based optimization is
  expensive, even with dynamic
  programming.
 Systems may use heuristics to
  reduce the number of choices
  that must be made in a cost-
  based fashion.
 Heuristic optimization
  transforms the query-tree by
  using a set of rules that typically
  (but not in all cases) improve
  execution performance:
Steps in Typical Heuristic Optimization
1. Deconstruct conjunctive
  selections into a sequence of single
  selection operations (Equiv. rule 1.).
2. Move selection operations down
  the query tree for the earliest
  possible execution (Equiv. rules 2,
  7a, 7b, 11).
3. Execute first those selection and
  join operations that will produce the
  smallest relations (Equiv. rule 6).
4. Replace Cartesian product
  operations that are followed by a
  selection condition by join
  operations.
Structure of Query Optimizers

 The System R/Starburst optimizer
  considers only left-deep join orders.
  This reduces optimization complexity
  and generates plans amenable to
  pipelined evaluation.
  System R/Starburst also uses
  heuristics to push selections and
  projections down the query tree.
 Heuristic optimization used in some
  versions of Oracle:
   Repeatedly pick "best" relation to join
    next, starting from each of n starting
    points, and pick the best among these.
Structure of Query Optimizers
(Cont.)
 Some query optimizers integrate
 heuristic selection and the
 generation of alternative access
 plans.
   System R and Starburst use a hierarchical
   procedure based on the nested-block
   concept of SQL: heuristic rewriting
   followed by cost-based join-order
   optimization.
 Even with the use of heuristics, cost-
 based query optimization imposes a
 substantial overhead.
Optimizing Nested Subqueries**
   SQL conceptually treats nested
    subqueries in the where clause as
    functions that take parameters and
    return a single value or set of values
     Parameters are variables from outer level
     query that are used in the nested subquery;
     such variables are called correlation
     variables
 E.g.
    select customer-name
    from borrower
    where exists (select *
                 from depositor
                 where depositor.customer-name =
                       borrower.customer-name)
Optimizing Nested Subqueries (Cont.)
 Correlated evaluation may be quite
  inefficient since
 inefficient since
   a large number of calls may be made to the
    nested query
   there may be unnecessary random I/O as a
    result
 SQL optimizers attempt to transform
  nested subqueries to joins where
  possible, enabling use of efficient join
  techniques
 E.g.: earlier nested query can be
  rewritten as
Optimizing Nested Subqueries (Cont.)
In general, SQL queries of the form below
  can be rewritten as shown
 Rewrite: select …
           from L1
            where P1 and exists (select *
                            from L2
                        where P2)
 To:      create table t1 as
              select distinct V
              from L2
              where P2^1
          select …
            from L1, t1
            where P1 and P2^2
Optimizing Nested Subqueries (Cont.)
 In our example, the original nested
  query would be transformed to
 query would be transformed to
   create table t1 as
      select distinct customer-name
      from depositor
      select customer-name
   from borrower, t1
    where t1.customer-name =
 borrower.customer-name
The process of replacing a nested query
by a query with a join (possibly with a
temporary relation) is called decorrelation.
Materialized Views**

 A materialized view is a view whose
  contents are computed and stored.
 Consider the view
  create view branch-total-loan(branch-
  name, total-loan) as
  select branch-name, sum(amount)
  from loan
  group by branch-name
 Materializing the above view would be
  very useful if the total loan amount is
  required frequently, since it saves the
  effort of finding multiple tuples and
  adding up their amounts.
Materialized View Maintenance

 The task of keeping a materialized
  view up-to-date with the underlying
  data is known as materialized view
  maintenance
 Materialized views can be maintained
  by recomputation on every update
 A better option is to use incremental
  view maintenance
   Changes to database relations are used to
   compute changes to materialized view,
   which is then updated
Incremental View Maintenance

 The changes (inserts and deletes) to a
  relation or expressions are referred to
  as its differential
   Set of tuples inserted to and deleted from
   r are denoted ir and dr
 To simplify our description, we only
 consider inserts and deletes
   We replace updates to a tuple by deletion
   of the tuple followed by insertion of the
   update tuple
 We describe how to compute the
  change to the result of each relational
  algebra operation, given changes to
  its inputs.
Join Operation

 Consider the materialized view v = r ⋈
  s and an update to r
 Let rold and rnew denote the old and
  new states of relation r
 Consider the case of an insert to r:
   We can write rnew ⋈ s as (rold ∪ ir) ⋈ s
   And rewrite the above to (rold ⋈ s) ∪ (ir
    ⋈ s)
   But (rold ⋈ s) is simply the old value of the
    materialized view, so the incremental
    change to the view is just ir ⋈ s
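A tiny sketch of this, assuming the view and the relations are lists of
dicts and join() is any join implementation on attribute attr
(illustrative names):

  def join(r, s, attr):
      return [{**tr, **ts} for tr in r for ts in s if tr[attr] == ts[attr]]

  s = [{"account_number": "A-101", "balance": 500}]
  r_old = [{"customer_name": "Johnson", "account_number": "A-101"}]
  v = join(r_old, s, "account_number")   # materialized view v = r join s

  # insert i_r into r: the view grows by exactly (i_r join s), with no
  # need to recompute (r_new join s) from scratch
  i_r = [{"customer_name": "Smith", "account_number": "A-101"}]
  v += join(i_r, s, "account_number")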
Selection and Projection Operations
 Selection: Consider a view v = σθ(r).
   vnew = vold ∪ σθ(ir)
   vnew = vold – σθ(dr)
 Projection is a more difficult operation
   R = (A,B), and r(R) = { (a,2), (a,3)}
      ΠA(r) has a single tuple (a).
   If we delete the tuple (a,2) from r, we should
    not delete the tuple (a) from ΠA(r), but if we
    then delete (a,3) as well, we should delete
    the tuple (a).
 For each tuple in a projection ΠA(r), we
  will keep a count of how many times it
  was derived.
Aggregation Operations
 count: v = AGcount(B)(r)

   When a set of tuples ir is inserted
     For each tuple r in ir, if the corresponding group
      is already present in v, we increment its count,
      else we add a new tuple with count = 1
   When a set of tuples dr is deleted
     for each tuple t in ir.we look for the group t.A in
      v, and subtract 1 from the count for the group.
       If the count becomes 0, we delete from v the tuple
        for the group t.A
 sum: v = Agsum (B)(r)

   We maintain the sum in a manner similar to
    count, except we add/subtract the B value
    instead of adding/subtracting 1 for the count
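The count and sum cases can share one structure. A small Python sketch (the group key and values are hypothetical, and the pair-per-group layout is an assumption):

    class SumCountView:
        # Maintain v = Aγ count(B), sum(B) (r) incrementally.
        def __init__(self):
            self.groups = {}                       # group key -> [count, sum]
        def insert(self, a, b):
            cnt_sum = self.groups.setdefault(a, [0, 0])
            cnt_sum[0] += 1                        # count maintenance: +1
            cnt_sum[1] += b                        # sum maintenance: +B value
        def delete(self, a, b):
            cnt_sum = self.groups[a]
            cnt_sum[0] -= 1
            cnt_sum[1] -= b
            if cnt_sum[0] == 0:                    # empty group: drop its tuple
                del self.groups[a]

    v = SumCountView()
    v.insert("Perryridge", 1000)    # ir = {(Perryridge, 1000)}
    v.insert("Perryridge", 500)
    v.delete("Perryridge", 1000)    # dr = {(Perryridge, 1000)}
    assert v.groups == {"Perryridge": [1, 500]}

Keeping the count alongside the sum is what lets a deletion detect an empty group, mirroring the count-maintenance rule above.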
Aggregate Operations (Cont.)
 min, max: v = Aγmin(B)(r)
   Handling insertions on r is straightforward.
   Maintaining the aggregate values min and max on deletions may be more expensive. We have to look at the other tuples of r that are in the same group to find the new minimum.
Other Operations
 Set intersection: v = r ∩ s
   When a tuple is inserted in r we check if it is present in s, and if so we add it to v.
   If a tuple is deleted from r, we delete it from the intersection if it is present.
   Updates to s are symmetric.
   The other set operations, union and set difference, are handled in a similar fashion.
 Outer joins are handled in much the same way as joins, but with some extra work.
Handling Expressions
 To handle an entire expression, we derive expressions for computing the incremental change to the result of each sub-expression, starting from the smallest sub-expressions.
 E.g., consider E1 ⋈ E2 where each of E1 and E2 may be a complex expression
   Suppose the set of tuples to be inserted into E1 is given by D1
     Computed earlier, since smaller sub-expressions are handled first
   Then the set of tuples to be inserted into E1 ⋈ E2 is given by D1 ⋈ E2
Query Optimization and Materialized Views
 Rewriting queries to use materialized views:
   A materialized view v = r ⋈ s is available
   A user submits a query r ⋈ s ⋈ t
   We can rewrite the query as v ⋈ t
     Whether to do so depends on cost estimates for the two alternatives
 Replacing a use of a materialized view by the view definition:
   A materialized view v = r ⋈ s is available, but without any index on it
   User submits a query σA=10(v). It may be cheaper to replace v by its definition and evaluate σA=10(r) ⋈ s, e.g., if r has an index on attribute A.
Materialized View Selection
 Materialized view selection: "What is the best set of views to materialize?"
   This decision must be made on the basis of the system workload
 Indices are just like materialized views; the problem of index selection is closely related to that of materialized view selection, although it is simpler.
 Some database systems provide tools to help the database administrator with index and materialized view selection.
Selection Cost Estimate Example
 σbranch-name = "Perryridge"(account)
 Number of blocks is baccount = 500: 10,000 tuples in the relation; each block holds 20 tuples.
 Assume account is sorted on branch-name.
   V(branch-name, account) is 50
   10000/50 = 200 tuples of the account relation pertain to the Perryridge branch
Selections Using Indices
 Index scan – search algorithms that use an index; condition is on search-key of index.
 A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition.
      Cost: EA3 = HTi + 1
 A4 (primary index on nonkey, equality). Retrieve multiple records. Let the search-key attribute be A.
      Cost: EA4 = HTi + SC(A, r) / fr
Cost Estimate Example (Indices)
 Consider the query σbranch-name = "Perryridge"(account), with the primary index on branch-name.
 Since V(branch-name, account) = 50, we expect that 10000/50 = 200 tuples of the account relation pertain to the Perryridge branch.
 Since the index is a clustering index, 200/20 = 10 block reads are required to read the account tuples.
 Several index blocks must also be read; with 20 pointers per index node the B+-tree has a depth of 2, so 2 index blocks are read, giving a cost estimate of 12 block reads.
Selections Involving Comparisons
 Implement selections of the form σA≤V(r) or σA≥V(r) by using a linear file scan or binary search, or by using indices in the following ways:
 A6 (primary index, comparison). The cost estimate is:
      EA6 = HTi + c / fr
   where c is the estimated number of tuples satisfying the condition. In the absence of statistical information, c is assumed to be nr / 2.
 A7 (secondary index, comparison). The cost estimate is:
      EA7 = HTi + LBi · c / nr + c
   where LBi is the number of leaf blocks of the index.
Example of Cost Estimate for Complex Selection
 Consider a selection on account with the following condition: branch-name = "Perryridge" and balance = 1200
 Consider using algorithm A8:
   The branch-name index is clustering, and if we use it the cost estimate is 12 block reads (as we saw before).
   The balance index is non-clustering, and V(balance, account) = 500, so the selection would retrieve 10,000/500 = 20 records.
Example (Cont.)
 Consider using algorithm A10:
   Use the index on balance to retrieve set S1 of pointers to records with balance = 1200.
   Use the index on branch-name to retrieve set S2 of pointers to records with branch-name = "Perryridge".
   S1 ∩ S2 = set of pointers to records with branch-name = "Perryridge" and balance = 1200.
   The numbers of pointers retrieved (20 and 200) fit into a single leaf page each; we read four index blocks to retrieve the two sets of pointers.
Chapter 15: Transactions
 Transaction Concept
 Transaction State
 Implementation of Atomicity and
    Durability
   Concurrent Executions
   Serializability
   Recoverability
   Implementation of Isolation
   Transaction Definition in SQL
   Testing for Serializability.
Transaction Concept
  A transaction is a unit of program
     execution that accesses and possibly
     updates various data items.
    A transaction must see a consistent
     database.
    During transaction execution the database
     may be inconsistent.
    When the transaction is committed, the
     database must be consistent.
    Two main issues to deal with:
      Failures of various kinds, such as hardware
       failures and system crashes
      Concurrent execution of multiple transactions
ACID Properties
To preserve the integrity of data, the database system must ensure:
  Atomicity. Either all operations of the transaction
   are properly reflected in the database or none are.
  Consistency. Execution of a transaction in
   isolation preserves the consistency of the
   database.
  Isolation. Although multiple transactions may
   execute concurrently, each transaction must be
   unaware of other concurrently executing
   transactions. Intermediate transaction results
   must be hidden from other concurrently executed
   transactions.
       That is, for every pair of transactions Ti and Tj, it
        appears to Ti that either Tj, finished execution before Ti
        started, or Tj started execution after Ti finished.
  Durability. After a transaction completes
   successfully, the changes it has made to the
   database persist, even if there are system failures.
Example of Fund Transfer
 Transaction to transfer $50 from account A to account B:
     1.   read(A)
     2.   A := A – 50
     3.   write(A)
     4.   read(B)
     5.   B := B + 50
     6.   write(B)
    Consistency requirement – the sum of A and B is
     unchanged by the execution of the transaction.
    Atomicity requirement — if the transaction fails after step 3
     and before step 6, the system should ensure that its
     updates are not reflected in the database, else an
     inconsistency will result.
Example of Fund Transfer (Cont.)
 Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.
  Isolation requirement — if between steps 3 and 6,
   another transaction is allowed to access the
   partially updated database, it will see an
   inconsistent database
   (the sum A + B will be less than it should be).
   Can be ensured trivially by running transactions
   serially, that is one after the other. However,
   executing multiple transactions concurrently has
   significant benefits, as we will see.
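As a concrete illustration (a sketch, not part of the slides), the six steps map onto one transaction in Python's built-in sqlite3 module; the account table and balances are hypothetical. Commit makes the transfer durable; any failure before commit triggers a rollback, preserving atomicity.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("create table account(name text primary key, balance integer)")
    conn.executemany("insert into account values (?, ?)",
                     [("A", 100), ("B", 100)])
    conn.commit()

    def transfer(conn, src, dst, amount):
        # Run the debit and credit as one transaction: all-or-nothing.
        try:
            conn.execute("update account set balance = balance - ? where name = ?",
                         (amount, src))
            conn.execute("update account set balance = balance + ? where name = ?",
                         (amount, dst))
            conn.commit()                 # durability: changes persist after this
        except Exception:
            conn.rollback()               # atomicity: partial updates discarded
            raise

    transfer(conn, "A", "B", 50)
    # consistency: the sum of A and B is unchanged by the transaction
    assert sum(b for (b,) in conn.execute("select balance from account")) == 200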
Transaction State
    Active, the initial state; the transaction stays in this state
     while it is executing
    Partially committed, after the final statement has been
     executed.
    Failed, after the discovery that normal execution can no
     longer proceed.
    Aborted, after the transaction has been rolled back and
     the database restored to its state prior to the start of the
     transaction. Two options after it has been aborted:
      restart the transaction – only if no
       internal logical error
      kill the transaction
    Committed, after successful completion.
Transaction State (Cont.)
Implementation of Atomicity and Durability
 The recovery-management component of a database system implements the support for atomicity and durability.
  The shadow-database scheme:
    assume that only one transaction is
     active at a time.
    a pointer called db_pointer always
     points to the current consistent copy
     of the database.
all updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk.
Implementation of Atomicity and Durability (Cont.)
 The shadow-database scheme:
 [Figure: db_pointer switched from the old copy of the database to the updated copy]
 Assumes disks do not fail
 Useful for text editors, but extremely inefficient for large databases: executing a single transaction requires copying the entire database.
Concurrent Executions
 Multiple transactions are allowed to
  run concurrently in the system.
  Advantages are:
   increased processor and disk
    utilization, leading to better
    transaction throughput: one
    transaction can be using the CPU while
    another is reading from or writing to
    the disk
   reduced average response time for
    transactions: short transactions need
    not wait behind long ones.
Concurrency control schemes – mechanisms to achieve isolation, i.e., to control the interaction among concurrent transactions in order to prevent them from destroying the consistency of the database.
Schedules
 Schedules – sequences that indicate
  the chronological order in which
  instructions of concurrent transactions
 are executed
   a schedule for a set of transactions must
    consist of all instructions of those
    transactions
   must preserve the order in which the
    instructions appear in each individual
    transaction.
Example Schedules
 Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.
Example Schedule (Cont.)
 Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1.
 In both Schedule 1 and 3, the sum A + B is preserved.
Example Schedules (Cont.)
 The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the sum A + B.
Serializability
 Basic Assumption – Each transaction
  preserves database consistency.
 Thus serial execution of a set of
  transactions preserves database
  consistency.
 A (possibly concurrent) schedule is
  serializable if it is equivalent to a
  serial schedule. Different forms of
  schedule equivalence give rise to the
  notions of:
  1. conflict serializability
  2. view serializability
Conflict Serializability
 Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
 1. li = read(Q), lj = read(Q). li and lj don't conflict.
 2. li = read(Q), lj = write(Q). They conflict.
 3. li = write(Q), lj = read(Q). They conflict.
 4. li = write(Q), lj = write(Q). They conflict.
Conflict Serializability (Cont.)
  If a schedule S can be transformed
   into a schedule S´ by a series of
   swaps of non-conflicting
   instructions, we say that S and S´
   are conflict equivalent.
  We say that a schedule S is conflict
   serializable if it is conflict
   equivalent to a serial schedule
Example of a schedule that is not conflict serializable:
                T3        T4
              read(Q)
                        write(Q)
              write(Q)
    We are unable to swap instructions in the above schedule to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>.
Conflict Serializability (Cont.)
 Schedule 3 below can be
  transformed into Schedule 1, a
  serial schedule where T2 follows T1,
  by series of swaps of non-
  conflicting instructions. Therefore
  Schedule 3 is conflict serializable.
View Serializability
 Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:
 1. For each data item Q, if transaction Ti
   reads the initial value of Q in schedule S,
   then transaction Ti must, in schedule S´,
   also read the initial value of Q.
 2. For each data item Q if transaction Ti
   executes read(Q) in schedule S, and that
   value was produced by transaction Tj (if
   any), then transaction Ti must in schedule
   S´ also read the value of Q that was
   produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.
View Serializability (Cont.)
A schedule S is view serializable if it is view equivalent to a serial schedule.
 Every conflict serializable schedule is
  also view serializable.
 Schedule 9 (from text) — a schedule
  which is view-serializable but not
  conflict serializable.
Other Notions of Serializability
  Schedule 8 (from text) given
  below produces same outcome
  as the serial schedule < T1, T5 >,
  yet is not conflict equivalent or
  view equivalent to it.
Recoverability
Need to address the effect of transaction failures on concurrently
running transactions.
Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti must appear before the commit operation of Tj.
    The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read(A) operation.
Recoverability (Cont.)
 Cascading rollback – a single
  transaction failure leads to a series
  of transaction rollbacks. Consider
  the following schedule where none
  of the transactions has yet
  committed (so the schedule is
  recoverable)
Recoverability (Cont.)
 Cascadeless schedules — cascading
  rollbacks cannot occur; for each pair
  of transactions Ti and Tj such that Tj
  reads a data item previously written
  by Ti, the commit operation of Ti
  appears before the read operation of
  Tj .
 Every cascadeless schedule is also
  recoverable
It is desirable to restrict the schedules to those that are cascadeless.
Implementation of Isolation
 Schedules must be conflict or view
  serializable, and recoverable, for
  the sake of database consistency,
  and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur; we study such schemes in Chapter 16.
Transaction Definition in SQL
 Data manipulation language must
  include a construct for specifying
  the set of actions that comprise a
  transaction.
 In SQL, a transaction begins
  implicitly.
 A transaction in SQL ends by:
   Commit work commits current
    transaction and begins a new one.
   Rollback work causes current
    transaction to abort.
 Levels of consistency specified by
  SQL-92:
Levels of Consistency in SQL-92
  Serializable — default
  Repeatable read — only committed
   records to be read, repeated reads of
   same record must return same value.
   However, a transaction may not be
   serializable – it may find some
   records inserted by a transaction but
   not find others.
Read committed — only committed records can be read, but successive reads of a record may return different (but committed) values.
 Read uncommitted — even uncommitted records may be read.
Lower degrees of consistency are useful for gathering approximate information about the database, e.g., statistics for the query optimizer.
Testing for Serializability
 Consider some schedule of a set of transactions T1, T2, ..., Tn
 Precedence graph — a directed graph where the vertices are the transactions (names).
 We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item on which the conflict arose earlier.
 We may label the arc by the item that was accessed.
 Example 1 [figure: a precedence graph with arcs labeled x and y]
Example Schedule (Schedule A)
      T1        T2        T3        T4        T5
              read(X)
   read(Y)
   read(Z)
                                            read(V)
                                            read(W)
                                            read(W)
              read(Y)
              write(Y)
                        write(Z)
   read(U)
                                  read(Y)
Precedence Graph for Schedule A
 [Figure: precedence graph with vertices T1, T2, T3, T4 and the arcs among them]
Test for Conflict Serializability
 A schedule is conflict serializable if and only if its precedence graph is acyclic.
 Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph. (Better algorithms take order n + e where e is the number of edges.)
 If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph; a sketch of the acyclicity test follows.
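A minimal Python sketch of the test (the schedule encoding as (transaction, operation, item) triples is an assumption, not the book's notation): build the precedence graph from conflicting pairs, then detect cycles with a depth-first search, which is O(n + e).

    def conflict_serializable(schedule):
        # schedule: list of (txn, op, item) with op in {"r", "w"}.
        # Edge Ti -> Tj whenever Ti's earlier access conflicts with Tj's.
        edges = {}
        for i, (ti, op_i, x) in enumerate(schedule):
            for tj, op_j, y in schedule[i + 1:]:
                if ti != tj and x == y and "w" in (op_i, op_j):
                    edges.setdefault(ti, set()).add(tj)
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {t: WHITE for t, _, _ in schedule}
        def has_cycle(t):
            color[t] = GRAY
            for u in edges.get(t, ()):
                if color[u] == GRAY or (color[u] == WHITE and has_cycle(u)):
                    return True
            color[t] = BLACK
            return False
        return not any(color[t] == WHITE and has_cycle(t) for t in color)

    # A lost-update interleaving is not conflict serializable:
    assert not conflict_serializable(
        [("T1", "r", "A"), ("T2", "r", "A"), ("T1", "w", "A"), ("T2", "w", "A")])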
Test for View Serializability
  The precedence graph test for conflict
   serializability must be modified to
   apply to a test for view serializability.
  The problem of checking if a schedule
   is view serializable falls in the class of
   NP-complete problems. Thus
   existence of an efficient algorithm is
   unlikely.
   However practical algorithms that just
   check some sufficient conditions for
   view serializability can still be used.
Concurrency Control vs.
Serializability Tests
 Testing a schedule for serializability
  after it has executed is a little too late!
 Goal – to develop concurrency control
  protocols that will assure
  serializability. They will generally not
  examine the precedence graph as it is
  being created; instead a protocol will
  impose a discipline that avoids
nonserializable schedules.
 We will study such protocols in Chapter 16.
[Figure-only slides: Schedule 2 — a serial schedule in which T2 is followed by T1; Schedule 5 — Schedule 3 after swapping a pair of instructions; Schedule 6 — a serial schedule that is equivalent to Schedule 3; Schedule 7; precedence graphs for (a) Schedule 1 and (b) Schedule 2; illustration of topological sorting; precedence graph, fig. 15.21]
Chapter 16: Concurrency Control
 Lock-Based Protocols
 Timestamp-Based Protocols
 Validation-Based Protocols
 Multiple Granularity
 Multiversion Schemes
 Deadlock Handling
 Insert and Delete Operations
 Concurrency in Index Structures
Lock-Based Protocols
 A lock is a mechanism to control
  concurrent access to a data item
 Data items can be locked in two modes :
   1. exclusive (X) mode. Data item can
  be both read as well as
      written. X-lock is requested using
  lock-X instruction.
   2. shared (S) mode. Data item can only
  be read. S-lock is
      requested using lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.
Lock-Based Protocols (Cont.)
 Lock-compatibility matrix:
               S       X
        S    true    false
        X    false   false
 A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
 Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
Lock-Based Protocols (Cont.)
   Example of a transaction performing locking:
                T2: lock-S(A);
                    read (A);
                    unlock(A);
                    lock-S(B);
                    read (B);
                    unlock(B);
                    display(A+B)
   Locking as above is not sufficient to guarantee serializability — if A
    and B get updated in-between the read of A and B, the displayed
    sum would be wrong.
   A locking protocol is a set of rules followed by all transactions
    while requesting and releasing locks. Locking protocols restrict the
    set of possible schedules.
Pitfalls of Lock-Based Protocols
 Consider the partial schedule
 [Figure: T3 holds lock-X(B) and requests lock-X(A); T4 holds lock-S(A) and requests lock-S(B)]
   Neither T3 nor T4 can make progress — executing lock-S(B)
    causes T4 to wait for T3 to release its lock on B, while executing
    lock-X(A) causes T3 to wait for T4 to release its lock on A.
   Such a situation is called a deadlock.
     To handle a deadlock one of T3 or T4 must
       be rolled back
       and its locks released.
Pitfalls of Lock-Based Protocols
   (Cont.)
 The potential for deadlock exists in
  most locking protocols. Deadlocks are
  a necessary evil.
 Starvation is also possible if
  concurrency control manager is badly
  designed. For example:
   A transaction may be waiting for an X-lock
    on an item, while a sequence of other
    transactions request and are granted an S-
    lock on the same item.
The same transaction is repeatedly rolled back due to deadlocks.
 A well-designed concurrency-control manager can prevent starvation.
The Two-Phase Locking
    Protocol
 This is a protocol which ensures
  conflict-serializable schedules.
 Phase 1: Growing Phase
   transaction may obtain locks
   transaction may not release locks
 Phase 2: Shrinking Phase
   transaction may release locks
   transaction may not obtain locks
 The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquired its final lock).
The Two-Phase Locking Protocol (Cont.)
 Two-phase locking does not ensure freedom from deadlocks.
 Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
 Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
The Two-Phase Locking Protocol (Cont.)
 There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
 However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
   Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable. A minimal sketch of the protocol follows.
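A minimal Python sketch of the two-phase discipline (using threading.Lock as the lock primitive; an illustration, not a full lock manager): once the first lock is released, acquiring another lock is an error. Releasing everything at commit is the rigorous variant.

    import threading

    class TwoPhaseTxn:
        # Growing phase: acquire only. Shrinking phase: release only.
        def __init__(self):
            self.held = []            # locks acquired in the growing phase
            self.shrinking = False
        def lock(self, lk):
            assert not self.shrinking, "2PL violated: lock after first unlock"
            lk.acquire()
            self.held.append(lk)
        def commit(self):
            self.shrinking = True     # shrinking phase begins
            while self.held:
                self.held.pop().release()

    a_lock, b_lock = threading.Lock(), threading.Lock()
    txn = TwoPhaseTxn()
    txn.lock(a_lock)                  # growing phase
    txn.lock(b_lock)
    txn.commit()                      # all locks released together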
Lock Conversions
 Two-phase locking with lock conversions:
 – First Phase:
   can acquire a lock-S on an item
   can acquire a lock-X on an item
   can convert a lock-S to a lock-X (upgrade)
 – Second Phase:
   can release a lock-S
   can release a lock-X
   can convert a lock-X to a lock-S (downgrade)
 This protocol assures serializability, but still relies on the programmer to insert the various locking instructions.
Automatic Acquisition of Locks
 A transaction Ti issues the standard
  read/write instruction, without explicit
  locking calls.
The operation read(D) is processed as:
               if Ti has a lock on D
                  then
                       read(D)
                  else
                       begin
                         if necessary wait until no other transaction has a lock-X on D,
                         grant Ti a lock-S on D;
                         read(D)
                       end
Automatic Acquisition of Locks (Cont.)
    write(D) is processed as:
     if Ti has a lock-X on D
      then
        write(D)
     else
       begin
          if necessary wait until no other trans. has any lock on D,
          if Ti has a lock-S on D
              then
                 upgrade lock on D to lock-X
             else
                 grant Ti a lock-X on D
             write(D)
       end;
   All locks are released after commit or abort
Implementation of Locking
 A lock manager can be implemented as a separate process to which transactions send lock and unlock requests.
 The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock).
 The requesting transaction waits until its request is answered.
 The lock manager maintains a data structure called a lock table to record granted locks and pending requests.
Lock Table      Black rectangles indicate
                 granted locks, white ones
                 indicate waiting requests
                Lock table also records the type
                 of lock granted or requested
                New request is added to the
                 end of the queue of requests
                 for the data item, and granted if
                 it is compatible with all earlier
                 locks
                Unlock requests result in the
                 request being deleted, and later
                 requests are checked to see if
                 they can now be granted
If a transaction aborts, all waiting or granted requests of the transaction are deleted.
                   The lock manager may keep a list of locks held by each transaction, to implement this efficiently. A sketch of such a lock table follows.
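A sketch of such a lock table in Python (one FIFO queue of requests per data item; the record shape and granting policy are assumptions for illustration):

    from collections import defaultdict, namedtuple

    Request = namedtuple("Request", "txn mode granted")   # mode: "S" or "X"

    class LockTable:
        def __init__(self):
            self.queues = defaultdict(list)
        def _compatible(self, item, mode):
            # Compatible iff every *granted* lock is S and we request S.
            granted = [r for r in self.queues[item] if r.granted]
            return all(r.mode == "S" and mode == "S" for r in granted)
        def request(self, txn, item, mode):
            # Grant only if compatible AND no earlier request is waiting
            # (FIFO: new requests may not overtake queued ones).
            ok = self._compatible(item, mode) and \
                 all(r.granted for r in self.queues[item])
            self.queues[item].append(Request(txn, mode, ok))
            return ok                                     # False => txn waits
        def release(self, txn, item):
            self.queues[item] = [r for r in self.queues[item] if r.txn != txn]
            # Re-examine waiters in arrival order, granting compatible ones.
            for i, r in enumerate(self.queues[item]):
                if not r.granted and self._compatible(item, r.mode):
                    self.queues[item][i] = r._replace(granted=True)
                else:
                    break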
Graph-Based Protocols
 Graph-based protocols are an alternative to two-phase locking.
 Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items.
   If di → dj then any transaction accessing both di and dj must access di before accessing dj.
   Implies that the set D may now be viewed as a directed acyclic graph, called a database graph.
 The tree-protocol is a simple kind of graph protocol.
Tree Protocol
 [Figure: example tree of data items]
 Only exclusive locks are allowed.
 The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.
 Data items may be unlocked at any time.
Graph-Based Protocols (Cont.)
 The tree protocol ensures conflict
  serializability as well as freedom from
  deadlock.
 Unlocking may occur earlier in the
  tree-locking protocol than in the two-
  phase locking protocol.
   shorter waiting times, and increase in
    concurrency
   protocol is deadlock-free, no rollbacks are
    required
the abort of a transaction can still lead to cascading rollbacks. (This correction has to be made in the book also.)
Timestamp-Based Protocols
 Each transaction is issued a timestamp
 when it enters the system. If an old
 transaction Ti has time-stamp TS(Ti), a
 new transaction Tj is assigned time-
 stamp TS(Tj) such that TS(Ti) <TS(Tj).
 The protocol manages concurrent
 execution such that the time-stamps
 determine the serializability order.
In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
   W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.
   R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
Timestamp-Based Protocols
  (Cont.)
 The timestamp ordering protocol
  ensures that any conflicting read and
  write operations are executed in
  timestamp order.
Suppose a transaction Ti issues a read(Q)
 1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
 2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
Timestamp-Based Protocols (Cont.)
 Suppose that transaction Ti issues write(Q).
 If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
 If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is also rejected, and Ti is rolled back.
 Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti). A sketch of these tests follows.
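The read and write tests condense into a few lines. A Python sketch for a single data item Q (timestamps as plain integers; an illustration of the rules above, not production code):

    class TimestampOrdering:
        def __init__(self):
            self.r_ts = 0      # largest TS that successfully read Q
            self.w_ts = 0      # largest TS that successfully wrote Q
        def read(self, ts):
            if ts < self.w_ts:                 # needed value already overwritten
                raise Exception("rollback")    # reject: Ti is rolled back
            self.r_ts = max(self.r_ts, ts)
        def write(self, ts):
            if ts < self.r_ts or ts < self.w_ts:
                raise Exception("rollback")    # late write: reject, roll back
            self.w_ts = ts

Under Thomas' write rule (covered shortly), the case ts < w_ts with ts ≥ r_ts would simply be ignored instead of causing a rollback.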
Example Use of the Protocol
 A partial schedule for several data items for transactions with timestamps 1, 2, 3, 4, 5:
         T1        T2        T3        T4        T5
                                    read(X)
               read(Y)
     read(Y)
                         write(Y)
                         write(Z)
                                    read(Z)
               read(X)
               abort
    read(X)
                         write(Z)
                          abort
                                    write(Y)
                                    write(Z)
Correctness of Timestamp-Ordering Protocol
 The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form:
      (transaction with smaller timestamp) → (transaction with larger timestamp)
 Thus, there will be no cycles in the precedence graph.
Recoverability and Cascade
    Freedom
 Problem with timestamp-ordering
 protocol:
  Suppose Ti aborts, but Tj has read a data
   item written by Ti
  Then Tj must abort; if Tj had been allowed
   to commit earlier, the schedule is not
   recoverable.
  Further, any transaction that has read a
   data item written by Tj must abort
  This can lead to cascading rollback ---
   that is, a chain of rollbacks
Thomas' Write Rule
 Modified version of the timestamp-ordering protocol in which obsolete write operations may be ignored under certain circumstances.
 When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Rather than rolling back Ti as the timestamp-ordering protocol would have done, this write operation can be ignored.
Validation-Based Protocol
 Execution of transaction Ti is done in
  three phases.
 1. Read and execution phase:
  Transaction Ti writes only to
     temporary local variables
 2. Validation phase: Transaction Ti
  performs a ``validation test''
      to determine if local variables can
  be written without violating
      serializability.
 3. Write phase: If Ti is validated, the
  updates are applied to the
   database; otherwise, Ti is rolled back.
The three phases of concurrently executing transactions can be interleaved, but each transaction must go through the three phases in that order.
Validation-Based Protocol
    (Cont.)
 Each transaction Ti has 3 timestamps
  Start(Ti) : the time when Ti started its
  execution
  Validation(Ti): the time when Ti
  entered its validation phase
   Finish(Ti) : the time when Ti finished
  its write phase
Serializability order is determined by the timestamp given at validation time, to increase concurrency. Thus TS(Ti) is given the value of Validation(Ti).
Validation Test for Transaction
     Tj
 If for all Ti with TS (Ti) < TS (Tj) either
  one of the following condition holds:
   finish(Ti) < start(Tj)
   start(Tj) < finish(Ti) < validation(Tj) and the
    set of data items written by Ti does not
    intersect with the set of data items read by
    Tj .
    then validation succeeds and Tj can
  be committed. Otherwise, validation
  fails and Tj is aborted.
Justification: Either the first condition is satisfied and there is no overlapped execution, or the second condition is satisfied and the writes of Ti do not affect reads of Tj, since Tj does not read any item written by Ti. A sketch of the test follows.
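A Python sketch of the validation test (transactions as dicts with start/validation/finish times plus read and write sets is an assumed encoding, not the book's notation):

    def validate(tj, others):
        # Tj passes validation iff every Ti with TS(Ti) < TS(Tj) satisfies
        # one of the two conditions on the slide above.
        for ti in others:
            if ti["validation"] >= tj["validation"]:   # TS given at validation
                continue
            serial = ti["finish"] < tj["start"]
            no_rw_overlap = (tj["start"] < ti["finish"] < tj["validation"]
                             and not (ti["write_set"] & tj["read_set"]))
            if not (serial or no_rw_overlap):
                return False                           # abort Tj
        return True                                    # Tj may commit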
Schedule Produced by
    Validation
 Example of schedule produced using
 validation
            T14              T15
          read(B)
                           read(B)
                           B := B - 50
                           read(A)
                           A := A + 50
          read(A)
          (validate)
          display(A+B)
                           (validate)
                           write(B)
                           write(A)
Multiple Granularity
 Allow data items to be of various
  sizes and define a hierarchy of data
  granularities, where the small
  granularities are nested within larger
  ones
 Can be represented graphically as a
  tree (but don't confuse with tree-
  locking protocol)
 When a transaction locks a node in the
  tree explicitly, it implicitly locks all the
  node's descendents in the same mode.
Example of Granularity Hierarchy
 [Figure: granularity hierarchy with the database at the root]
 The highest level in the example hierarchy is the entire database; the levels below it are of type area, file and record, in that order.
Intention Lock Modes
 In addition to S and X lock modes, there are three additional lock modes with multiple granularity:
   intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks.
   intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
   shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
 Intention locks allow a higher-level node to be locked in S or X mode without having to check all descendent nodes.
Compatibility Matrix with Intention Lock Modes
 The compatibility matrix for all lock modes is (true = compatible):
             IS     IX      S     SIX     X
     IS     true   true   true   true   false
     IX     true   true   false  false  false
     S      true   false  true   false  false
     SIX    true   false  false  false  false
     X      false  false  false  false  false
Multiple Granularity Locking Scheme
 Transaction Ti can lock a node Q, using the following rules:
 1. The lock compatibility matrix must be observed.
 2. The root of the tree must be locked first, and may be locked in any mode.
 3. A node Q can be locked by Ti in S or IS mode only if the parent
    of Q is currently locked by Ti in either IX or IS
    mode.
 4. A node Q can be locked by Ti in X, SIX, or IX mode only if the
    parent of Q is currently locked by Ti in either IX
    or SIX mode.
 5. Ti can lock a node only if it has not previously unlocked any node
    (that is, Ti is two-phase).
 6. Ti can unlock a node Q only if none of the children of Q are
    currently locked by Ti.
Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order. A sketch of the compatibility and parent-mode checks follows.
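A Python sketch of rules 1, 3 and 4 (the matrix is transcribed from the compatibility slide above; the helper names are hypothetical):

    # Compatibility matrix for IS, IX, S, SIX, X (True = compatible).
    COMPAT = {
        ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,
        ("IS", "SIX"): True, ("IS", "X"): False,
        ("IX", "IX"): True,  ("IX", "S"): False,  ("IX", "SIX"): False,
        ("IX", "X"): False,
        ("S", "S"): True,    ("S", "SIX"): False, ("S", "X"): False,
        ("SIX", "SIX"): False, ("SIX", "X"): False,
        ("X", "X"): False,
    }

    def compatible(m1, m2):
        # Symmetric lookup: the matrix stores each unordered pair once.
        return COMPAT.get((m1, m2), COMPAT.get((m2, m1)))

    def can_lock(parent_mode, requested):
        # Rule 3: S/IS need the parent locked in IS or IX mode.
        # Rule 4: X/SIX/IX need the parent locked in IX or SIX mode.
        if requested in ("S", "IS"):
            return parent_mode in ("IS", "IX")
        return parent_mode in ("IX", "SIX")

    assert compatible("IS", "X") is False and compatible("S", "IS") is True
    assert can_lock("IX", "X") and not can_lock("IS", "X")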
Multiversion Schemes
 Multiversion schemes keep old versions of data items to increase concurrency:
   Multiversion Timestamp Ordering
   Multiversion Two-Phase Locking
 Each successful write results in the creation of a new version of the data item written.
 Use timestamps to label versions.
 When a read(Q) operation is issued, select an appropriate version of Q based on the timestamp of the transaction, and return the value of the selected version.
Multiversion Timestamp
    Ordering
 Each data item Q has a sequence of
 versions <Q1, Q2,...., Qm>. Each
 version Qk contains three data fields:
   Content -- the value of version Qk.
   W-timestamp(Qk) -- timestamp of the
    transaction that created (wrote) version Qk
   R-timestamp(Qk) -- largest timestamp of a
    transaction that successfully read version
    Qk
When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti). The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk).
Multiversion Timestamp Ordering (Cont.)
 The multiversion timestamp scheme presented here ensures serializability.
 Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
 1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
 2. If transaction Ti issues a write(Q), and if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; otherwise a new version of Q is created.
Multiversion Two-Phase Locking
 Differentiates between read-only transactions and update transactions.
  transactions and update transactions
 Update transactions acquire read and
  write locks, and hold all locks up to
  the end of the transaction. That is,
  update transactions follow rigorous
  two-phase locking.
     Each successful write results in the
      creation of a new version of the data item
      written.
each version of a data item has a single timestamp, whose value is obtained from a counter ts-counter that is incremented during commit processing.
Multiversion Two-Phase Locking (Cont.)
 When an update transaction wants to read a data item, it obtains a shared lock on it, and reads the latest version.
 When it wants to write an item, it obtains an X lock on it; it then creates a new version of the item and sets this version's timestamp to ∞.
 When update transaction Ti completes, commit processing occurs:
   Ti sets the timestamp on the versions it has created to ts-counter + 1
   Ti increments ts-counter by 1
 Read-only transactions that start after Ti increments ts-counter will see the values updated by Ti; those that start before will see the values before Ti's updates.
Deadlock Handling
 Consider the following two transactions:
      T1: write(X)         T2: write(Y)
          write(Y)             write(X)
 Schedule with deadlock:
         T1                    T2
   lock-X on X
   write(X)
                         lock-X on Y
                         write(Y)
                         wait for lock-X on X
   wait for lock-X on Y
Deadlock Handling
 System is deadlocked if there is a set
  of transactions such that every
  transaction in the set is waiting for
  another transaction in the set.
 Deadlock prevention protocols ensure
  that the system will never enter into a
  deadlock state. Some prevention
  strategies :
Require that each transaction locks all its data items before it begins execution (predeclaration).
    Impose a partial ordering of all data items, and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).
More Deadlock Prevention
      Strategies
    Following schemes use transaction
  timestamps for the sake of deadlock
  prevention alone.
 wait-die scheme — non-preemptive
     older transaction may wait for younger
      one to release data item. Younger
      transactions never wait for older ones;
      they are rolled back instead.
     a transaction may die several times before
      acquiring needed data item
wound-wait scheme — preemptive
    older transaction wounds (forces rollback of) younger transaction instead of waiting for it. Younger transactions may wait for older ones.
Deadlock Prevention (Cont.)
 Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided. (A sketch of both policies follows the timeout scheme below.)
 Timeout-Based Schemes :
   a transaction waits for a lock only for a
   specified amount of time. After that, the
   wait times out and the transaction is rolled
   back.
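The two timestamp-based policies differ only in who yields. A Python sketch (smaller timestamp means older; the function name and return strings are illustrative):

    def on_lock_conflict(requester_ts, holder_ts, scheme):
        # Decide what happens when `requester` needs a lock `holder` has.
        if scheme == "wait-die":                 # non-preemptive
            if requester_ts < holder_ts:
                return "requester waits"         # older may wait for younger
            return "requester dies"              # younger is rolled back
        if scheme == "wound-wait":               # preemptive
            if requester_ts < holder_ts:
                return "holder wounded"          # older forces younger's rollback
            return "requester waits"             # younger waits for older
        raise ValueError(scheme)

    assert on_lock_conflict(1, 2, "wait-die") == "requester waits"
    assert on_lock_conflict(2, 1, "wait-die") == "requester dies"
    assert on_lock_conflict(1, 2, "wound-wait") == "holder wounded"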
Deadlock Detection
 Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E),
   V is a set of vertices (all the transactions in the system)
   E is a set of edges; each element is an ordered pair Ti → Tj.
 If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
 The system is in a deadlock state if and only if the wait-for graph has a cycle; a deadlock-detection algorithm is invoked periodically to look for cycles.
Deadlock Detection (Cont.)
 [Figure: a wait-for graph without a cycle, and a wait-for graph with a cycle]
Deadlock Recovery
 When deadlock is detected:
   Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur minimum cost.
   Rollback — determine how far to roll back the transaction
     Total rollback: abort the transaction and then restart it.
     More effective to roll back the transaction only as far as necessary to break the deadlock.
   Starvation happens if the same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation.
Insert and Delete Operations
 If two-phase locking is used:
   A delete operation may be performed only if the transaction deleting the tuple has an exclusive lock on the tuple to be deleted.
   A transaction that inserts a new tuple into the database is given an X-mode lock on the tuple.
 Insertions and deletions can lead to the phantom phenomenon.
   A transaction that scans a relation (e.g., find all accounts in Perryridge) and a transaction that inserts a tuple into the relation (e.g., insert a new account at Perryridge) may conflict in spite of not accessing any tuple in common.
Insert and Delete Operations (Cont.)
 The transaction scanning the relation is reading information that indicates what tuples the relation contains, while a transaction inserting a tuple updates the same information.
   The information should be locked.
   The information should be locked.
 One solution:
   Associate a data item with the relation, to
    represent the information about what tuples
    the relation contains.
Transactions scanning the relation acquire a shared lock on the data item.
   Transactions inserting or deleting a tuple
    acquire an exclusive lock on the data item.
Index Locking Protocol
 Every relation must have at least one index. Access to a relation must be made only through one of the indices on the relation.
 A transaction Ti that performs a lookup must lock all the index buckets that it accesses, in S-mode.
 A transaction Ti may not insert a tuple ti into a relation r without updating all indices on r; it must obtain X-mode locks on all index buckets that it modifies.
Weak Levels of Consistency
 Degree-two consistency: differs from
  two-phase locking in that S-locks may
  be released at any time, and locks may
 be acquired at any time
   X-locks must be held till end of
    transaction
Serializability is not guaranteed; the programmer must ensure that no erroneous database state will occur.
 Cursor stability:
   For reads, each tuple is locked, read, and the lock is immediately released.
   X-locks are held till end of transaction.
   Special case of degree-two consistency.
Weak Levels of Consistency in SQL
 SQL allows non-serializable executions.
   Serializable: is the default.
  Repeatable read: allows only committed
   records to be read, and repeating a read
   should return the same value (so read
   locks should be retained)
    However, the phantom phenomenon need
     not be prevented
     T1 may see some records inserted by T2, but
      may not see others inserted by T2
  Read committed: same as degree two
   consistency, but most systems implement
   it as cursor-stability
Concurrency in Index Structures
 Indices are unlike other database
  items in that their only job is to help in
  accessing data.
 Index-structures are typically accessed
  very often, much more than other
  database items.
 Treating index-structures like other
  database items leads to low
  concurrency. Two-phase locking on
  an index may result in transactions
  executing practically one-at-a-time.
It is acceptable to have nonserializable concurrent access to an index, as long as the accuracy of the index is maintained.
   In particular, the exact values read in an internal node of a B+-tree are irrelevant so long as we land up in the correct leaf node.
     (Cont.)
    Example of index concurrency
  protocol:
 Use crabbing instead of two-phase
  locking on the nodes of the B+-tree, as
  follows. During
  search/insertion/deletion:
     First lock the root node in shared mode.
     After locking all required children of a
      node in shared mode, release the lock on
      the node.
     During insertion/deletion, upgrade leaf
      node locks to exclusive mode.
[Figure-only slides: partial schedule under two-phase locking; incomplete schedule with a lock conversion; lock table; tree-structured database graph; serializable schedule under the tree protocol; Schedule 3; Schedule 4; Schedule 5 — a schedule produced by using validation; granularity hierarchy; compatibility matrix; wait-for graph with no cycle; wait-for graph with a cycle; nonserializable schedule with degree-two consistency; B+-tree for account file with n = 3; insertion of "Clearview" into the B+-tree of Figure 16.21; lock-compatibility matrix]
Chapter 17: Recovery System
 Failure Classification
 Storage Structure
 Recovery and Atomicity
 Log-Based Recovery
 Shadow Paging
 Recovery With Concurrent Transactions
 Buffer Management
 Failure with Loss of Nonvolatile
  Storage
 Advanced Recovery Techniques
 ARIES Recovery Algorithm
Failure Classification
 Transaction failure :
   Logical errors: transaction cannot
    complete due to some internal error
    condition
   System errors: the database system must
    terminate an active transaction due to an
    error condition (e.g., deadlock)
 System crash: a power failure or other
 hardware or software failure causes
 the system to crash.
Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by a system crash.
     Database systems have numerous integrity checks to prevent corruption of disk data.
 Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
   Destruction is assumed to be detectable: disk drives use checksums to detect failures.
Recovery Algorithms
 Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures.
   Focus of this chapter.
 Recovery algorithms have two parts:
  1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures.
  2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability.
Storage Structure
 Volatile storage:
   does not survive system crashes
   examples: main memory, cache memory
 Nonvolatile storage:
   survives system crashes
   examples: disk, tape, flash memory,
               non-volatile (battery backed
   up) RAM
Stable storage:
   a mythical form of storage that survives all failures
   approximated by maintaining multiple copies on distinct nonvolatile media
Stable-Storage Implementation
   Maintain multiple copies of each block on separate disks
     copies can be at remote sites to protect
       against disasters such as fire or flooding.
   Failure during data transfer can still result in inconsistent copies:
    Block transfer can result in
     Successful completion
     Partial failure: destination block has
      incorrect information
     Total failure: destination block was never
      updated
Protecting storage media from failure during data transfer (one solution):
      Execute the output operation as follows (assuming two copies of each block):
        1. Write the information onto the first physical block.
        2. When the first write successfully completes, write the same information onto the second physical block.
        3. The output is completed only after the second write successfully completes.
Stable-Storage Implementation
    (Cont.)
 Protecting storage media from failure
  during data transfer (cont.):
 Copies of a block may differ due to failure
  during output operation. To recover from
  failure:
  1. First find inconsistent blocks:
    1. Expensive solution: Compare the two copies of
       every disk block.
    2. Better solution:
       Record in-progress disk writes on non-volatile
        storage (Non-volatile RAM or special area of
        disk).
Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these blocks.
Data Access
 Physical blocks are those blocks
  residing on the disk.
 Buffer blocks are the blocks residing
  temporarily in main memory.
 Block movements between disk and
  main memory are initiated through the
  following two operations:
   input(B) transfers the physical block B to
    main memory.
   output(B) transfers the buffer block B to
    the disk, and replaces the appropriate
    physical block there.
Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept; Ti's local copy of a data item X is called xi.
Data Access (Cont.)
 Transaction transfers data items
 between system buffer blocks and its
 private work-area using the following
 operations :
   read(X) assigns the value of data item X to
    the local variable xi.
write(X) assigns the value of local variable xi to data item X in the buffer block.
   both these commands may necessitate the
    issue of an input(BX) instruction before the
    assignment, if the block BX in which X
    resides is not already in memory.
Transactions
   Perform read(X) while accessing X for the first time; all subsequent accesses are to the local copy.
   After the last access, the transaction executes write(X).
Example of Data Access
 [Figure: buffer blocks A and B in memory, with input(A) from disk and output(B) to disk; work areas of T1 (x1, y1) and T2 (x2), connected to the buffer by read(X) and write(Y)]
Recovery and Atomicity
 Modifying the database without
  ensuring that the transaction will
  commit may leave the database in an
  inconsistent state.
 Consider transaction Ti that transfers
  $50 from account A to account B; goal
  is either to perform all database
  modifications made by Ti or none at
  all.
Several output operations may be required for Ti (to output A and B). A failure may occur after one of these modifications has been made, but before all of them are made.
Recovery and Atomicity (Cont.)
 To ensure atomicity despite failures,
  we first output information describing
  the modifications to stable storage
  without modifying the database itself.
 We study two approaches:
   log-based recovery, and
   shadow-paging
 We assume (initially) that transactions
 run serially, that is, one after the
 other.
Log-Based Recovery
 A log is kept on stable storage.
   The log is a sequence of log records, and maintains a record of update activities on the database.
 When transaction Ti starts, it registers itself by writing a <Ti start> log record.
 Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write, and V2 is the value to be written to X.
   The log record notes that Ti has performed a write on data item X; X had value V1 before the write, and will have value V2 after the write.
 When Ti finishes its last statement, the log record <Ti commit> is written.
 We assume for now that log records are written directly to stable storage (that is, they are not buffered).
Deferred Database Modification
 The deferred database modification
  scheme records all modifications to
  the log, but defers all the writes to
  after partial commit.
 Assume that transactions execute
  serially
 Transaction starts by writing <Ti
  start> record to log.
 A write(X) operation results in a log
  record <Ti, X, V> being written, where
  V is the new value for X
Note: the old value is not needed for this scheme.
Deferred Database Modification (Cont.)
 During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are there in the log.
 Redoing a transaction Ti ( redoTi) sets
  the value of all data items updated by
  the transaction to the new values.
 Crashes can occur while
     the transaction is executing the original
      updates, or
     while recovery action is being taken
Example transactions T0 and T1 (T0 executes before T1):
      T0: read(A)          T1: read(C)
          A := A − 50          C := C − 100
          write(A)             write(C)
          read(B)
          B := B + 50
          write(B)
Deferred Database Modification (Cont.)
 Below we show the log as it appears at three instances of time.
 [Figure: the log in cases (a), (b) and (c)]
 If the log on stable storage at the time of crash is as in case:
   (a) No redo actions need to be taken.
   (b) redo(T0) must be performed, since <T0 commit> is present.
   (c) redo(T0) must be performed followed by redo(T1), since <T0 commit> and <T1 commit> are present.
Immediate Database
     Modification
    The immediate database modification
    scheme allows database updates of an
    uncommitted transaction to be made
    as the writes are issued
     since undoing may be needed, update logs
     must have both old value and new value
 Update log record must be written
    before database item is written
     We assume that the log record is output
      directly to stable storage
Can be extended to postpone log record output, so long as prior to execution of an output(B) operation for a data block B, all log records corresponding to items in B are flushed to stable storage.
Immediate Database Modification Example

Log                       Write                 Output

<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                          A = 950
                          B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                          C = 600
                                                BB, BC
<T1 commit>
                                                BA
   Note: BX denotes the block containing X.
Immediate Database Modification (Cont.)
 The recovery procedure has two operations instead of one:
     undo(Ti) restores the value of all data
      items updated by Ti to their old values,
      going backwards from the last log record
      for Ti
     redo(Ti) sets the value of all data items
      updated by Ti to the new values, going
      forward from the first log record for Ti
 Both operations must be idempotent
     That is, even if the operation is executed
     multiple times the effect is the same as if
     it is executed once
Needed since operations may get re-executed during recovery.
Immediate DB Modification Recovery Example
 Below we show the log as it appears at three instances of time.
 [Figure: the log in cases (a), (b) and (c)]
 Recovery actions in each case above are:
 (a) undo (T0): B is restored to 2000 and A to 1000.
 (b) undo (T1) and redo (T0): C is restored to 700, and then A and B are
      set to 950 and 2050 respectively.
 (c) redo (T0) and redo (T1): A and B are set to 950 and 2050
      respectively. Then C is set to 600
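The recovery actions above can be sketched in a few lines of Python (the log encoding as tuples is an assumption; serial transactions, immediate modification). Case (b) from the example is used as a test:

    def recover(log, db):
        # log: list of ("start", T), ("write", T, X, old, new), ("commit", T).
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        # undo incomplete transactions, scanning backwards (restore old values)
        for rec in reversed(log):
            if rec[0] == "write" and rec[1] not in committed:
                db[rec[2]] = rec[3]
        # redo committed transactions, scanning forwards (install new values)
        for rec in log:
            if rec[0] == "write" and rec[1] in committed:
                db[rec[2]] = rec[4]
        return db

    # Case (b): T0 committed, T1 incomplete at the time of the crash.
    log = [("start", "T0"), ("write", "T0", "A", 1000, 950),
           ("write", "T0", "B", 2000, 2050), ("commit", "T0"),
           ("start", "T1"), ("write", "T1", "C", 700, 600)]
    assert recover(log, {"A": 950, "B": 2050, "C": 600}) == \
           {"A": 950, "B": 2050, "C": 700}

Both passes are idempotent: running recover twice leaves the database in the same state, which is exactly why undo and redo must be idempotent.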
Checkpoints
 Problems in recovery procedure as
 discussed earlier :
1. Searching the entire log is time-consuming.
   2. We might unnecessarily redo transactions which have already output their updates to the database.
 Streamline the recovery procedure by periodically performing checkpointing:
   1. Output all log records currently residing in main memory onto stable storage.
   2. Output all modified buffer blocks to the disk.
   3. Write a log record <checkpoint> onto stable storage.
Checkpoints (Cont.)
 During recovery we need to consider
 only the most recent transaction Ti
 that started before the checkpoint, and
 transactions that started after Ti.
  1. Scan backwards from end of log to find
     the most recent <checkpoint> record
  2. Continue scanning backwards till a record
     <Ti start> is found.
  3. Need only consider the part of log
     following above start record. Earlier part
     of log can be ignored during recovery, and
     can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti commit>, execute undo(Ti). (Done only in case of immediate modification.)
   5. Scanning forward in the log, for all transactions starting from Ti or later with a <Ti commit>, execute redo(Ti).
Example of Checkpoints
             Tc                    Tf
     T1
              T2
                       T3
                              T4


          checkpoint        system failure




T1 can be ignored (updates already output to disk due to checkpoint).
 T2 and T3 are redone.
 T4 is undone.
Shadow Paging
 Shadow paging is an alternative to log-
  based recovery; this scheme is useful if
  transactions execute serially
 Idea: maintain two page tables during
  the lifetime of a transaction –the current
  page table, and the shadow page table
 Store the shadow page table in
  nonvolatile storage, such that state of
  the database prior to transaction
  execution may be recovered.
   Shadow page table is never modified during
   execution
To start with, both the page tables are identical. Only the current page table is used for data item accesses during execution of the transaction. Whenever any page is about to be written for the first time, a copy of the page is made onto an unused page, the current page table is made to point to the copy, and the update is performed on the copy.
Sample Page Table
 [Figure: a sample page table]
Example of Shadow Paging
 [Figure: shadow and current page tables after a write to page 4]
Shadow Paging (Cont.)
   To commit a transaction
      1. Flush all modified pages in main memory to disk
      2. Output current page table to disk
      3. Make the current page table the new shadow page table, as
        follows:
         keep a pointer to the shadow page table at
          a fixed (known) location on disk.
         to make the current page table the new
          shadow page table, simply update the
          pointer to point to current page table on
          disk
      Once pointer to shadow page table has been written, transaction
       is committed.
      No recovery is needed after a crash — new transactions can start
       right away, using the shadow page table.
Pages not pointed to from the current/shadow page table should be garbage collected, and put into the list of unused pages.
Shadow Paging (Cont.)
 Advantages of shadow-paging over
 log-based schemes
   no overhead of writing log records
   recovery is trivial
 Disadvantages :
   Copying the entire page table is very
   expensive
    Can be reduced by using a page table
     structured like a B+-tree
      No need to copy entire tree, only need to copy
       paths in the tree that lead to updated leaf nodes
   Commit overhead is high even with above
   extension
    Need to flush every updated page, and page
     table
Recovery With Concurrent
        Transactions
   We modify the log-based recovery schemes to allow multiple
    transactions to execute concurrently.
     All transactions share a single disk buffer
      and a single log
     A buffer block can have data items updated
      by one or more transactions
   We assume concurrency control using strict two-phase locking;
     i.e. the updates of uncommitted
      transactions should not be visible to other
      transactions
        Otherwise how to perform undo if T1
         updates A, then T2 updates A and commits,
         and finally T1 has to abort?
   Logging is done as described earlier.
Recovery With Concurrent
Transactions (Cont.)
 Checkpoints are performed as before,
  except that the checkpoint log record is
  now of the form
     < checkpoint L>
  where L is the list of transactions active
  at the time of the checkpoint
   We assume no updates are in progress while
    the checkpoint is carried out (will relax this
    later)
 When the system recovers from a crash, it
  first does the following:
   1. Initialize undo-list and redo-list to empty
2. Scan the log backwards from the end, stopping when the first <checkpoint L> record is found. For each record found during the backward scan:
     if the record is <Ti commit>, add Ti to redo-list;
     if the record is <Ti start>, then if Ti is not in redo-list, add Ti to undo-list.
   3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list.
Recovery With Concurrent
   Transactions (Cont.)
 At this point undo-list consists of
  incomplete transactions which must be
  undone, and redo-list consists of
  finished transactions that must be
  redone.
 Recovery now continues as follows:
  1. Scan log backwards from most recent
    record, stopping when
    <Ti start> records have been encountered
    for every Ti in undo-list.
During the scan, perform undo for each log record that belongs to a transaction in undo-list.
   2. Locate the most recent <checkpoint L> record.
   3. Scan the log forwards from the <checkpoint L> record till the end of the log, performing redo for each log record that belongs to a transaction on redo-list.
Example of Recovery
 Go over the steps of the recovery
 algorithm on the following log:
      <T0 start>
      <T0, A, 0, 10>
      <T0 commit>
      <T1 start>
      <T1, B, 0, 10>
      <T2 start>            /* Scan in Step 4 stops here
       */
      <T2, C, 0, 10>
      <T2, C, 10, 20>
      <checkpoint {T1, T2}>
      <T3 start>
      <T3, A, 10, 20>
      <T3, D, 0, 10>
      <T3 commit>
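A Python sketch of building the two lists on a log shaped like the one above (value fields dropped, since only record types matter here; this encoding is an assumption):

    def build_lists(log):
        # Backward scan to the first <checkpoint L> record:
        #   commit => redo-list; start (if not committed) => undo-list;
        # finally, uncommitted transactions in L also go on undo-list.
        redo, undo = [], []
        for rec in reversed(log):
            kind = rec[0]
            if kind == "commit" and rec[1] not in redo:
                redo.append(rec[1])
            elif kind == "start" and rec[1] not in redo:
                undo.append(rec[1])
            elif kind == "checkpoint":
                for t in rec[1]:                    # L: active at checkpoint
                    if t not in redo and t not in undo:
                        undo.append(t)
                break
        return undo, redo

    log = [("start", "T0"), ("commit", "T0"), ("start", "T1"),
           ("start", "T2"), ("checkpoint", ["T1", "T2"]),
           ("start", "T3"), ("commit", "T3")]
    assert build_lists(log) == (["T1", "T2"], ["T3"])

This matches the slide's example: T1 and T2 (active at the checkpoint, never committed) must be undone, while T3 must be redone; T0 is ignored because the scan stops at the checkpoint.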
Log Record Buffering
 Log record buffering: log records are buffered in main memory, instead of being output directly to stable storage.
   Log records are output to stable storage when a block of log records in the buffer is full, or a log force operation is executed.
 Log force is performed to commit a transaction by forcing all its log records (including the commit record) to stable storage.
 Several log records can thus be output using a single output operation, reducing the I/O cost.
Log Record Buffering (Cont.)

 The rules below must be followed if
  log records are buffered:
   Log records are output to stable storage
    in the order in which they are created.
   Transaction Ti enters the commit state
    only when the log record
    <Ti commit> has been output to stable
    storage.
Before a block of data in main memory is output to the database, all log records pertaining to data in that block must have been output to stable storage. (This rule is called the write-ahead logging, or WAL, rule.)
Database Buffering
 The database maintains an in-memory buffer of data blocks.
     When a new block is needed, if buffer is full
      an existing block needs to be removed from
      buffer
     If the block chosen for removal has been
      updated, it must be output to disk
   As a result of the write-ahead logging rule, if a block with
    uncommitted updates is output to disk, log records with undo
    information for the updates are output to the log on stable storage
    first.
   No updates should be in progress on a block when it is output to
    disk. Can be ensured as follows.
     Before writing a data item, transaction
      acquires exclusive lock on block containing
      the data item
Buffer Management (Cont.)
 Database buffer can be implemented
 either
  in an area of real main-memory reserved
   for the database, or
  in virtual memory
 Implementing buffer in reserved main-
 memory has drawbacks:
  Memory is partitioned before-hand
   between database buffer and applications,
   limiting flexibility.
   Needs may change, and although
    operating system knows best how memory
    should be divided up at any time, it cannot
    change the partitioning of memory.
Buffer Management (Cont.)
 Database buffers are generally
 implemented in virtual memory in
 spite of some drawbacks:
   When operating system needs to evict a
    page that has been modified, to make
    space for another page, the page is written
    to swap space on disk.
   When database decides to write buffer
    page to disk, buffer page may be in swap
    space, and may have to be read from
    swap space on disk and output to the
    database on disk, resulting in extra I/O!
    Known as dual paging problem.
Failure with Loss of Nonvolatile Storage
    So far we assumed no loss of non-volatile storage
    Technique similar to checkpointing used to deal with loss of
    non-volatile storage
     Periodically dump the entire content of the
      database to stable storage
     No transaction may be active during the
      dump procedure; a procedure similar to
      checkpointing must take place
        Output all log records currently residing in
         main memory onto stable storage.
        Output all buffer blocks onto the disk.
        Copy the contents of the database to stable
         storage.
        Output a record <dump> to log on stable
         storage.
Advanced Recovery Techniques
 Support high-concurrency locking
  techniques, such as those used for B+-
  tree concurrency control
 Operations like B+-tree insertions and
  deletions release locks early.
   They cannot be undone by restoring old
    values (physical undo), since once a lock is
    released, other transactions may have
    updated the B+-tree.
   Instead, insertions (resp. deletions) are
    undone by executing a deletion (resp.
    insertion) operation (known as logical
    undo).
 For such operations, undo log records
  must record the undo operation to be
  executed (logical undo logging)
Advanced Recovery Techniques (Cont.)
   Operation logging is done as follows:
    1. When operation starts, log <Ti, Oj,
      operation-begin>. Here Oj is a unique
      identifier of the operation instance.
   2. While operation is executing, normal log
      records with physical redo and physical
      undo information are logged.
    3. When operation completes, <Ti, Oj,
       operation-end, U> is logged, where U
       contains information needed to perform
       a logical undo.
   If crash/rollback occurs before
    operation completes:
     the operation-end log record is not found,
      and the physical undo information is used
      to undo the operation.
Advanced Recovery Techniques
     (Cont.)
Rollback of transaction Ti is done as
  follows:
 Scan the log backwards
  1. If a log record <Ti, X, V1, V2> is found,
     perform the undo and log a special redo-
     only log record <Ti, X, V1>.
  2. If a <Ti, Oj, operation-end, U> record is
     found
     Rollback the operation logically using the
      undo information U.
       Updates performed during roll back are
        logged just like during normal operation
        execution.
Advanced Recovery Techniques
      (Cont.)
    Scan the log backwards (cont.):
    3. If a redo-only record is found ignore it
    4. If a <Ti, Oj, operation-abort> record is
      found:
       skip all preceding log records for Ti until
        the record
        <Ti, Oj, operation-begin> is found.
    5. Stop the scan when the record <Ti, start>
       is found
    6. Add a <Ti, abort> record to the log
Some points to note:
 Cases 3 and 4 above can occur only if the
  database crashes while a transaction is
  being rolled back.
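 The backward scan can be sketched in Python as follows (illustrative
  only; the record format and the logical_undo callback are
  assumptions):

    def rollback(ti, log, db, logical_undo):
        i = len(log) - 1
        while i >= 0:
            rec = log[i]
            if rec.get('txn') == ti:
                kind = rec['kind']
                if kind == 'update':                 # case 1: <Ti, X, V1, V2>
                    db[rec['item']] = rec['old']     # physical undo
                    log.append({'txn': ti, 'kind': 'redo-only',
                                'item': rec['item'], 'value': rec['old']})
                elif kind == 'operation-end':        # case 2: logical undo
                    logical_undo(rec['undo_info'])   # updates logged as usual
                    log.append({'txn': ti, 'kind': 'operation-abort',
                                'op': rec['op']})
                    while not (log[i].get('txn') == ti
                               and log[i]['kind'] == 'operation-begin'
                               and log[i]['op'] == rec['op']):
                        i -= 1                       # skip operation's records
                elif kind == 'redo-only':            # case 3: ignore it
                    pass
                elif kind == 'operation-abort':      # case 4: skip back to
                    while not (log[i].get('txn') == ti
                               and log[i]['kind'] == 'operation-begin'
                               and log[i]['op'] == rec['op']):
                        i -= 1                       # the operation-begin
                elif kind == 'start':                # case 5: stop the scan
                    break
            i -= 1
        log.append({'txn': ti, 'kind': 'abort'})     # case 6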
Advanced Recovery Techniques (Cont.)
The following actions are taken when
   recovering from system crash
1. Scan log forward from last <
   checkpoint L> record
  1. Repeat history by physically redoing all
     updates of all transactions,
  2. Create an undo-list during the scan as
     follows
      undo-list is set to L initially
      Whenever <Ti start> is found Ti is added to
        undo-list
      Whenever <Ti commit> or <Ti abort> is
        found, Ti is deleted from undo-list
Advanced Recovery Techniques
    (Cont.)
Recovery from system crash (cont.)
2. Scan log backwards, performing undo
   on log records of transactions found in
   undo-list.
   Transactions are rolled back as described
    earlier.
   When <Ti start> is found for a
    transaction Ti in undo-list, write a <Ti
    abort> log record.
   Stop scan when <Ti start> records have
    been found for all Ti in undo-list
 This undoes the effects of incomplete
  transactions.
Advanced Recovery Techniques
     (Cont.)
    Checkpointing is done as follows:
    1. Output all log records in memory to stable
       storage
    2. Output to disk all modified buffer blocks
    3. Output to log on stable storage a <
       checkpoint L> record.
  Transactions are not allowed to
  perform any actions while
  checkpointing is in progress.
 Fuzzy checkpointing allows
  transactions to progress while the
  most time consuming parts of
  checkpointing are in progress.
Advanced Recovery Techniques (Cont.)
 Fuzzy checkpointing is done as follows:
    1. Temporarily stop all updates by
       transactions
    2. Write a <checkpoint L> log record and
       force log to stable storage
    3. Note list M of modified buffer blocks
    4. Now permit transactions to proceed with
       their actions
    5. Output to disk all modified buffer blocks
       in list M
         blocks should not be updated while being
          output
          Follow WAL: all log records pertaining to a
           block must be output before the block is
           output to disk
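 A minimal sketch of the fuzzy checkpoint steps (pause_all_updates,
  resume_all_updates and the buffer-block methods are assumed helpers,
  not real APIs):

    def fuzzy_checkpoint(log_mgr, buffer_pool, active_txns):
        pause_all_updates()                         # 1. brief pause only
        log_mgr.append(('checkpoint', list(active_txns)))
        log_mgr.flush()                             # 2. force log to stable storage
        m = [b for b in buffer_pool if b.dirty]     # 3. note modified blocks M
        resume_all_updates()                        # 4. transactions proceed
        for block in m:                             # 5. output the blocks in M
            block.latch()                           # not updated while output
            log_mgr.flush(upto_lsn=block.page_lsn)  # WAL: log records go first
            block.write_to_disk()
            block.unlatch()

  In such schemes the checkpoint is typically treated as valid only
  after all blocks in M have reached disk.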
ARIES
 ARIES is a state of the art recovery
  method
   Incorporates numerous optimizations to
    reduce overheads during normal
    processing and to speed up recovery
    The “advanced recovery algorithm” we
     studied earlier is modeled after ARIES, but
     greatly simplified by removing
     optimizations
 Unlike the advanced recovery
  algorithm, ARIES
   1. Uses log sequence numbers (LSNs) to
      identify log records, and stores LSNs in
      database pages to identify which updates
      have already been applied to a page
ARIES Optimizations
 Physiological redo
   Affected page is physically identified,
    action within page can be logical
     Used to reduce logging overheads
          e.g. when a record is deleted and all other
          records have to be moved to fill hole
           Physiological redo can log just the record
            deletion
           Physical redo would require logging of old and
            new values for much of the page
     Requires page to be output to disk
      atomically
       Easy to achieve with hardware RAID, also
        supported by some disk systems
       Incomplete page output can be detected by
         checksum techniques.
ARIES Data Structures
 Log sequence number (LSN) identifies
 each log record
   Must be sequentially increasing
   Typically an offset from beginning of log file
   to allow fast access
     Easily extended to handle multiple log files
 Each page contains a PageLSN which is
 the LSN of the last log record whose
 effects are reflected on the page
   To update a page:
      X-latch the page, and write the log record
     Update the page
     Record the LSN of the log record in PageLSN
     Unlock page
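 In code, the update protocol is just a few steps (a sketch; the page
  and log-manager interfaces are assumed):

    def update_page(page, log_mgr, txn, redo_info, undo_info):
        page.x_latch()                    # X-latch the page
        lsn = log_mgr.append((txn, redo_info, undo_info))  # write log record
        page.apply(redo_info)             # update the page
        page.page_lsn = lsn               # record the LSN in PageLSN
        page.unlatch()                    # release the latch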
ARIES Data Structures (Cont.)
 Each log record contains LSN of
 previous log record of the same
 transaction
        LSN TransId PrevLSN RedoInfo UndoInfo
   LSN in log record may be implicit
 Special redo-only log record called
 compensation log record (CLR) used to
 log actions taken during recovery that
 never need to be undone
           LSN TransID UndoNextLSN RedoInfo
   Also serve the role of operation-abort log
   records used in advanced recovery
ARIES Data Structures (Cont.)
 DirtyPageTable
  List of pages in the buffer that have been
   updated
  Contains, for each such page
    PageLSN of the page
    RecLSN is an LSN such that log records
     before this LSN have already been applied to
     the page version on disk
     Set to current end of log when a page is
      inserted into dirty page table (just before being
      updated)
     Recorded in checkpoints, helps to minimize
      redo work
 Checkpoint log record
   Contains DirtyPageTable and a list of
    active transactions, with the LSN of the
    last log record written by each transaction
ARIES Recovery Algorithm
ARIES recovery involves three passes
 Analysis pass: Determines
   Which transactions to undo
   Which pages were dirty (disk version not up
    to date) at time of crash
   RedoLSN: LSN from which redo should start
 Redo pass:
   Repeats history, redoing all actions from
   RedoLSN
    RecLSN and PageLSNs are used to avoid
     redoing actions already reflected on page
 Undo pass:
   Rolls back all incomplete transactions
ARIES Recovery: Analysis
Analysis pass
 Starts from last complete checkpoint
  log record
   Reads in DirtyPageTable from log record
   Sets RedoLSN = min of RecLSNs of all
   pages in DirtyPageTable
     In case no pages are dirty, RedoLSN =
       checkpoint record’s LSN
   Sets undo-list = list of transactions in
    checkpoint log record
    Reads LSN of last log record for each
     transaction in undo-list from the
     checkpoint log record
ARIES Recovery: Analysis (Cont.)
Analysis pass (cont.)
 Scans forward from checkpoint
  If any log record found for transaction not
   in undo-list, adds transaction to undo-list
  Whenever an update log record is found
    If page is not in DirtyPageTable, it is added
     with RecLSN set to LSN of the update log
     record
  If transaction end log record found, delete
   transaction from undo-list
  Keeps track of last log record for each
   transaction in undo-list
    May be needed for later undo
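 The analysis pass can be sketched as follows (a minimal Python
  illustration; log records are assumed to be dicts with 'kind', 'txn'
  and 'page' fields, and the checkpoint record is assumed to carry the
  last-LSN table):

    def analysis_pass(log, ckpt_lsn):
        ckpt = log[ckpt_lsn]
        dirty_pages = dict(ckpt['dirty_page_table'])    # page -> RecLSN
        undo_list = set(ckpt['active_txns'])
        last_lsn = dict(ckpt['last_lsns'])              # txn -> last record
        for lsn in range(ckpt_lsn + 1, len(log)):
            rec = log[lsn]
            txn = rec.get('txn')
            if txn is not None:
                undo_list.add(txn)          # any record adds txn to undo-list
                last_lsn[txn] = lsn         # may be needed for later undo
            if rec['kind'] == 'update':
                dirty_pages.setdefault(rec['page'], lsn)  # RecLSN = this LSN
            elif rec['kind'] == 'end':
                undo_list.discard(txn)      # transaction finished
        redo_lsn = min(dirty_pages.values(), default=ckpt_lsn)
        return redo_lsn, dirty_pages, undo_list, last_lsn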
ARIES Redo Pass
Redo Pass: Repeats history by replaying
  every action not already reflected in
  the page on disk, as follows:
 Scans forward from RedoLSN.
  Whenever an update log record is
  found:
  1. If the page is not in DirtyPageTable or the
     LSN of the log record is less than the
     RecLSN of the page in DirtyPageTable,
     then skip the log record
   2. Otherwise fetch the page from disk. If the
      PageLSN of the page fetched from disk is
      less than the LSN of the log record, redo
      the log record and set the PageLSN to the
      LSN of the log record
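 Putting the redo rules together (a sketch continuing the structures
  above; fetch_page stands in for the buffer manager):

    def redo_pass(log, redo_lsn, dirty_pages, fetch_page):
        for lsn in range(redo_lsn, len(log)):
            rec = log[lsn]
            if rec['kind'] != 'update':
                continue
            pid = rec['page']
            if pid not in dirty_pages or lsn < dirty_pages[pid]:
                continue                      # already on disk: skip record
            page = fetch_page(pid)
            if page.page_lsn < lsn:           # not yet reflected on the page
                page.apply(rec['redo_info'])  # repeat history
                page.page_lsn = lsn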
ARIES Undo Actions
         When an undo is performed for an update
          log record
         Generate a CLR containing the undo action
          performed (actions performed during undo
           are logged physically or physiologically).
            CLR for record n noted as n’ in figure below
         Set UndoNextLSN of the CLR to the PrevLSN
          value of the update log record
             Arrows indicate UndoNextLSN value
       ARIES supports partial rollback
          Used e.g. to handle deadlocks by rolling back
           just enough to release reqd. locks
          Figure indicates forward actions after partial
           rollbacks
              records 3 and 4 initially, later 5 and 6, then full
               rollback

       1   2   3   4   4'   3'   5   6   6'   5'   2'   1'
ARIES: Undo Pass
Undo pass
 Performs backward scan on log undoing
  all transactions in undo-list
  Backward scan optimized by skipping
   unneeded log records as follows:
    Next LSN to be undone for each transaction
     set to LSN of last log record for transaction
     found by analysis pass.
    At each step pick largest of these LSNs to
     undo, skip back to it and undo it
     After undoing a log record
      For ordinary log records, set next LSN to be
       undone for transaction to PrevLSN noted in the
       log record
      For compensation log records (CLRs), set next
       LSN to be undone to UndoNextLSN noted in the
       log record, skipping already-undone records
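 A sketch of this optimized undo pass (illustrative, continuing the
  record format used above):

    def undo_pass(log, undo_list, last_lsn, db):
        todo = {t: last_lsn[t] for t in undo_list}  # next LSN to undo per txn
        while todo:
            txn, lsn = max(todo.items(), key=lambda kv: kv[1])  # largest LSN
            rec = log[lsn]
            if rec['kind'] == 'update':
                db[rec['item']] = rec['old']        # undo the update
                log.append({'txn': txn, 'kind': 'clr',       # log a CLR
                            'undo_next': rec['prev_lsn']})
                nxt = rec['prev_lsn']               # ordinary record: PrevLSN
            elif rec['kind'] == 'clr':
                nxt = rec['undo_next']              # skip already-undone part
            else:
                nxt = rec.get('prev_lsn')
            if nxt is None:                         # reached <start>: done
                log.append({'txn': txn, 'kind': 'end'})
                del todo[txn]
            else:
                todo[txn] = nxt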
Other ARIES Features
 Recovery Independence
   Pages can be recovered independently of
   others
    E.g. if some disk pages fail they can be
     recovered from a backup while other pages are
     being used
 Savepoints:
   Transactions can record savepoints and roll
   back to a savepoint
    Useful for complex transactions
    Also used to rollback just enough to release
     locks on deadlock
Other ARIES Features (Cont.)
 Fine-grained locking:
   Index concurrency algorithms that permit
   tuple level locking on indices can be used
    These require logical undo, rather than
     physical undo, as in advanced recovery
     algorithm
 Recovery optimizations: For example:
   Dirty page table can be used to prefetch
    pages during redo
    Out of order redo is possible:
     redo can be postponed on a page being
      fetched from disk, and performed when
      the page is fetched
Remote Backup Systems
 Remote backup systems provide high
 availability by allowing transaction
 processing to continue even if the primary
 site is destroyed.
Remote Backup Systems (Cont.)
 Detection of failure: Backup site must
 detect when primary site has failed
    to distinguish primary site failure from link
    failure, maintain several communication links
    between the primary and the remote backup.
 Transfer of control:
   To take over control backup site first perform
   recovery using its copy of the database and all
    the log records it has received from the
   primary.
     Thus, completed transactions are redone and
      incomplete transactions are rolled back.
    When the backup site takes over processing
     it becomes the new primary
Remote Backup Systems (Cont.)
 Time to recover: To reduce delay in
  takeover, backup site periodically
   processes the redo log records (in
  effect, performing recovery from
  previous database state), performs a
  checkpoint, and can then delete earlier
  parts of the log.
 Hot-Spare configuration permits very
  fast takeover:
   Backup continually processes redo log
   record as they arrive, applying the updates
   locally.
Remote Backup Systems (Cont.)
     Ensure durability of updates by delaying
      transaction commit until update is
      logged at backup; avoid this delay by
      permitting lower degrees of durability.
     One-safe: commit as soon as
       transaction’s commit log record is
      written at primary
       Problem: updates may not arrive at backup
       before it takes over.
     Two-very-safe: commit when
      transaction’s commit log record is
     written at primary and backup
        Reduces availability since transactions
         cannot commit if either site fails
Block Storage Operations
Portion of the Database Log
Corresponding to T0 and T1
State of the Log and Database
Corresponding to T0 and T1
Portion of the System Log
Corresponding to T0 and T1
State of System Log and
Database Corresponding to T0
and T1
Chapter 18: Database System
Architectures
  Centralized Systems
  Client--Server Systems
  Parallel Systems
  Distributed Systems
  Network Types
Centralized Systems
 Run on a single computer system and do not
  interact with other computer systems.
 General-purpose computer system: one to a few
  CPUs and a number of device controllers that are
  connected through a common bus that provides
  access to shared memory.
 Single-user system (e.g., personal computer or
  workstation): desk-top unit, single user, usually
  has only one CPU and one or two hard disks; the
  OS may support only one user.
 Multi-user system: more disks, more memory,
  multiple CPUs, and a multi-user OS. Serve a large
  number of users who are connected to the system
   via terminals. Often called server systems.
A Centralized Computer System
Client-Server Systems
 Server systems satisfy requests generated
 at m client systems, whose general
 structure is shown below:
Client-Server Systems (Cont.)
 Database functionality can be divided
  into:
   Back-end: manages access structures, query
    evaluation and optimization, concurrency
    control and recovery.
   Front-end: consists of tools such as forms,
    report-writers, and graphical user interface
    facilities.
 The interface between the front-end and
  the back-end is through SQL or through
  an application program interface.
Client-Server Systems (Cont.)
 Advantages of replacing mainframes
 with networks of workstations or
 personal computers connected to
 back-end server machines:
   better functionality for the cost
   flexibility in locating resources and
    expanding facilities
   better user interfaces
   easier maintenance
 Server systems can be broadly
  categorized into two kinds:
    transaction servers and data servers
Transaction Servers
 Also called query server systems or
  SQL server systems; clients send
  requests to the server system where
  the transactions are executed, and
  results are shipped back to the client.
 Requests specified in SQL, and
  communicated to the server through a
 remote procedure call (RPC)
  mechanism.
 Transactional RPC allows many RPC
  calls to collectively form a transaction.
Transaction Server Process
Structure
  A typical transaction server consists of
   multiple processes accessing data in
   shared memory.
  Server processes
    These receive user queries (transactions),
     execute them and send results back
    Processes may be multithreaded, allowing
     a single process to execute several user
     queries concurrently
    Typically multiple multithreaded server
     processes
Transaction Server Processes
 (Cont.)
 Log writer process
   Server processes simply add log records to
    log record buffer
   Log writer process outputs log records to
    stable storage.
 Checkpoint process
   Performs periodic checkpoints
 Process monitor process
   Monitors other processes, and takes
   recovery actions if any of the other
    processes fail
     E.g. aborting any transactions being executed
      by a failed process and restarting the process
Transaction System Processes
(Cont.)
 Shared memory contains shared data
   Buffer pool
   Lock table
   Log buffer
   Cached query plans (reused if same query
   submitted again)
 All database processes can access
  shared memory
 To ensure that no two processes are
  accessing the same data structure at
  the same time, mutual exclusion is
  implemented using semaphores or
  atomic instructions such as test-and-set
Transaction System Processes (Cont.)
 To avoid overhead of interprocess communication for lock
  request/grant, each database process operates directly on the lock
  table data structure (Section 16.1.4) instead of sending requests to
  lock manager process
     Mutual exclusion ensured on the lock table
      using semaphores, or more commonly,
      atomic instructions
     If a lock can be obtained, the lock table is
      updated directly in shared memory
     If a lock cannot be immediately obtained, a
      lock request is noted in the lock table and the
      process (or thread) then waits for lock to be
      granted
      When a lock is released, releasing process
       updates lock table to record release of lock,
       as well as grant of the lock to waiting
       requests (if any)
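 A minimal Python sketch of such a shared lock table, with a mutex
  standing in for the semaphore or atomic instruction (exclusive locks
  only, for brevity; all names are illustrative):

    import threading

    class SharedLockTable:
        def __init__(self):
            self.mutex = threading.Lock()   # mutual exclusion on the table
            self.table = {}                 # item -> [holder, waiter queue]

        def lock(self, item, txn):
            with self.mutex:
                entry = self.table.setdefault(item, [None, []])
                if entry[0] is None:
                    entry[0] = txn          # obtainable: update table directly
                    return
                grant = threading.Event()   # otherwise note the request
                entry[1].append((txn, grant))
            grant.wait()                    # wait for the lock to be granted

        def unlock(self, item, txn):
            with self.mutex:
                entry = self.table[item]
                if entry[1]:                # record release and grant the
                    nxt, grant = entry[1].pop(0)   # lock to a waiter
                    entry[0] = nxt
                    grant.set()             # signal the waiting process/thread
                else:
                    entry[0] = None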
Data Servers
 Used in LANs, where there is a very
  high speed connection between the
  clients and the server, the client
  machines are comparable in
  processing power to the server
  machine, and the tasks to be executed
  are compute intensive.
 Ship data to client machines where
  processing is performed, and then
  ship results back to the server
  machine.
Data Servers (Cont.)
 Page-Shipping versus Item-Shipping
   Smaller unit of shipping → more messages
  Worth prefetching related items along with
   requested item
  Page shipping can be thought of as a form
   of prefetching
 Locking
  Overhead of requesting and getting locks
   from server is high due to message delays
  Can grant locks on requested and
   prefetched items; with page shipping,
   transaction is granted lock on whole page.
   Locks on a prefetched item can be called
    back by the server, and returned by the
    client if the item has not been used
Data Servers (Cont.)
 Data Caching
   Data can be cached at client even in
    between transactions
   But check that data is up-to-date before it
    is used (cache coherency)
   Check can be done when requesting lock on
    data item
 Lock Caching
   Locks can be retained by client system even
    in between transactions
   Transactions can acquire cached locks
    locally, without contacting server
Parallel Systems
 Parallel database systems consist of
  multiple processors and multiple disks
  connected by a fast interconnection
  network.
 A coarse-grain parallel machine
  consists of a small number of powerful
  processors
 A massively parallel or fine grain
  parallel machine utilizes thousands of
  smaller processors.
 Two main performance measures:
   throughput and response time
Speed-Up and Scale-Up
 Speedup: a fixed-sized problem
 executing on a small system is given
 to a system which is N-times larger.
    Measured by:
        speedup = (small system elapsed time) /
                  (large system elapsed time)
    Speedup is linear if equation equals N.
 Scaleup: increase the size of both the
 problem and the system
   N-times larger system used to perform N-
   times larger job
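 For example (a worked illustration, not from the slides): if a fixed
  job takes 100 seconds on the small system and 12.5 seconds on a
  system 8 times larger, speedup = 100/12.5 = 8, i.e. linear. For
  scaleup, if the 8-times-larger system completes an 8-times-larger
  job in the same 100 seconds, scaleup = 100/100 = 1, i.e. linear; if
  it needs 160 seconds, scaleup is sublinear.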
Speedup
           (figure)
Scaleup
           (figure)
Batch and Transaction Scaleup
 Batch scaleup:
   A single large job; typical of most
    database queries and scientific simulation.
   Use an N-times larger computer on N-
    times larger problem.
 Transaction scaleup:
   Numerous small queries submitted by
    independent users to a shared database;
    typical transaction processing and
    timesharing systems.
    N-times as many users submitting
     requests (hence, N-times as many
     requests) to an N-times larger database,
     on an N-times larger computer
Factors Limiting Speedup and
    Scaleup
Speedup and scaleup are often sublinear
  due to:
 Startup costs: Cost of starting up
  multiple processes may dominate
  computation time, if the degree of
  parallelism is high.
 Interference: Processes accessing
   shared resources (e.g., system bus,
  disks, or locks) compete with each
   other, thus spending time waiting on
   other processes, rather than performing
   useful work.
Interconnection Network
     Architectures
    Bus. System components send data on
    and receive data from a single
    communication bus;
     Does not scale well with increasing
     parallelism.
 Mesh. Components are arranged as
    nodes in a grid, and each component
    is connected to all adjacent
    components
     Communication links grow with growing
      number of components, and so scales
      better.
      But may require 2√n hops to send a
       message to any node (or √n with
       wraparound connections at edge of grid)
Interconnection Architectures
Parallel Database Architectures
 Shared memory -- processors share a
  common memory
 Shared disk -- processors share a
  common disk
 Shared nothing -- processors share
  neither a common memory nor
  common disk
 Hierarchical -- hybrid of the above
  architectures
Parallel Database Architectures
Shared Memory
 Processors and disks have access to a
  common memory, typically via a bus
  or through an interconnection
  network.
 Extremely efficient communication
  between processors — data in shared
  memory can be accessed by any
  processor without having to move it
  using software.
 Downside – architecture is not scalable
  beyond 32 or 64 processors since the
  bus or interconnection network becomes
  a bottleneck
Shared Disk
 All processors can directly access all
 disks via an interconnection network,
 but the processors have private
 memories.
   The memory bus is not a bottleneck
   Architecture provides a degree of fault-
   tolerance — if a processor fails, the other
   processors can take over its tasks since
   the database is resident on disks that are
   accessible from all processors.
  Examples: IBM Sysplex and DEC
  clusters (now part of Compaq) running
  Rdb (now Oracle Rdb) were early
  commercial users
Shared Nothing
 Node consists of a processor, memory,
  and one or more disks. Processors at
  one node communicate with another
  processor at another node using an
  interconnection network. A node
  functions as the server for the data on
  the disk or disks the node owns.
 Examples: Teradata, Tandem, Oracle
  nCUBE
 Data accessed from local disks (and
  local memory accesses) do not pass
  through the interconnection network,
  thereby minimizing the interference of
  resource sharing
Hierarchical
 Combines characteristics of shared-
  memory, shared-disk, and shared-
  nothing architectures.
 Top level is a shared-nothing
  architecture – nodes connected by an
  interconnection network, and do not
  share disks or memory with each
  other.
 Each node of the system could be a
  shared-memory system with a few
  processors.
Distributed Systems
 Data spread over multiple machines
  (also referred to as sites or nodes)
 Network interconnects the machines
 Data shared by users on multiple
  machines
Distributed Databases
 Homogeneous distributed databases
   Same software/schema on all sites, data
    may be partitioned among sites
   Goal: provide a view of a single database,
    hiding details of distribution
 Heterogeneous distributed databases
   Different software/schema on different
    sites
   Goal: integrate existing databases to
    provide useful functionality
 Differentiate between local transactions
  (which access data only at the initiating
  site) and global transactions (which
  access data at several sites)
Trade-offs in Distributed Systems
 Sharing data – users at one site able to
  access the data residing at some other
  sites.
 Autonomy – each site is able to retain
  a degree of control over data stored
  locally.
 Higher system availability through
  redundancy — data can be replicated
  at remote sites, and system can
  function even if a site fails.
 Disadvantage: added complexity
Implementation Issues for
Distributed Databases
 Atomicity needed even for transactions
  that update data at multiple sites
  Transaction cannot be committed at one site
   and aborted at another
 The two-phase commit protocol (2PC)
 used to ensure atomicity
  Basic idea: each site executes transaction
    till just before commit, and then leaves final
   decision to a coordinator
  Each site must follow decision of
   coordinator: even if there is a failure while
    waiting for coordinator’s decision
     To do so, updates of transaction are logged to
      stable storage and transaction is recorded as
      “waiting”
Network Types
 Local-area networks (LANs) –
  composed of processors that are
  distributed over small geographical
  areas, such as a single building or a
  few adjacent buildings.
 Wide-area networks (WANs) –
  composed of processors distributed
  over a large geographical area.
 Discontinuous connection – WANs,
  such as those based on periodic dial-
   up (using, e.g., UUCP), that are
   connected only part of the time
Network Types (Cont.)
 WANs with continuous connection are
  needed for implementing distributed
  database systems
 Groupware applications such as Lotus
  notes can work on WANs with
  discontinuous connection:
  Data is replicated.
  Updates are propagated to replicas
   periodically.
  No global locking is possible, and copies
   of data may be independently updated.
   Non-serializable executions can thus
    result
Interconnection Networks
A Distributed System
Local-Area Network
Chapter 19: Distributed Databases
    Heterogeneous and Homogeneous
       Databases
      Distributed Data Storage
      Distributed Transactions
      Commit Protocols
      Concurrency Control in Distributed
       Databases
      Availability
      Distributed Query Processing
      Heterogeneous Distributed Databases
      Directory Systems
Distributed Database System
 A distributed database system consists
  of loosely coupled sites that share no
  physical component
 Database systems that run on each site
  are independent of each other
 Transactions may access data at one or
  more sites
Homogeneous Distributed
 Databases
  In a homogeneous distributed
 database
   All sites have identical software
   Are aware of each other and agree to
    cooperate in processing user requests.
   Each site surrenders part of its autonomy
    in terms of right to change schemas or
    software
   Appears to user as a single system
 In a heterogeneous distributed
 database
   Different sites may use different schemas
    and software
Distributed Data Storage
 Assume relational data model
 Replication
   System maintains multiple copies of data,
   stored in different sites, for faster
   retrieval and fault tolerance.
 Fragmentation
   Relation is partitioned into several
   fragments stored in distinct sites
 Replication and fragmentation can be
  combined
    Relation is partitioned into several
     fragments; system maintains several
     identical replicas of each fragment
Data Replication
 A relation or fragment of a relation is
  replicated if it is stored redundantly in
  two or more sites.
 Full replication of a relation is the case
  where the relation is stored at all sites.
 Fully redundant databases are those in
  which every site contains a copy of the
  entire database.
Data Replication (Cont.)
 Advantages of Replication
    Availability: failure of site containing relation
     r does not result in unavailability of r if
     replicas exist.
   Parallelism: queries on r may be processed by
    several nodes in parallel.
   Reduced data transfer: relation r is available
    locally at each site containing a replica of r.
 Disadvantages of Replication
   Increased cost of updates: each replica of
    relation r must be updated.
   Increased complexity of concurrency control:
     concurrent updates to distinct replicas may
     lead to inconsistent data unless special
     concurrency control mechanisms are used
Data Fragmentation
 Division of relation r into fragments r1,
 r2, …, rn which contain sufficient
  information to reconstruct relation r.
 Horizontal fragmentation: each tuple
  of r is assigned to one or more
  fragments
 Vertical fragmentation: the schema for
  relation r is split into several smaller
  schemas
    All schemas must contain a common
     candidate key (or superkey) to ensure a
     lossless join; a tuple-id attribute may be
     added to serve as a candidate key
Horizontal Fragmentation of account
Relation
     branch-name           account-number            balance

    Hillside                  A-305                     500
    Hillside                  A-226                     336
    Hillside                  A-155                      62

               account1 = σ branch-name=“Hillside” (account)

     branch-name           account-number            balance

   Valleyview                 A-177                      205
   Valleyview                 A-402                    10000
   Valleyview                 A-408                     1123
   Valleyview                 A-639                      750

              account2 = σ branch-name=“Valleyview” (account)
Vertical Fragmentation of employee-info Relation
          branch-name         customer-name             tuple-id

       Hillside                    Lowman                      1
       Hillside                    Camp                        2
       Valleyview                  Camp                        3
       Valleyview                  Kahn                        4
       Hillside                    Kahn                        5
       Valleyview                  Kahn                        6
       Valleyview                  Green                       7

 deposit1 = Π branch-name, customer-name, tuple-id (employee-info)

         account-number           balance               tuple-id

           A-305                  500                          1
           A-226                  336                          2
           A-177                  205                          3
           A-402                  10000                        4
           A-155                  62                           5
           A-408                  1123                         6
           A-639                  750                          7

 deposit2 = Π account-number, balance, tuple-id (employee-info)
Advantages of Fragmentation
 Horizontal:
   allows parallel processing on fragments of
    a relation
   allows a relation to be split so that tuples
    are located where they are most frequently
    accessed
 Vertical:
   allows tuples to be split so that each part of
    the tuple is stored where it is most
    frequently accessed
   tuple-id attribute allows efficient joining of
     vertical fragments
Data Transparency
 Data transparency: Degree to which
  system user may remain unaware of
  the details of how and where the data
  items are stored in a distributed system
 Consider transparency issues in
  relation to:
   Fragmentation transparency
   Replication transparency
   Location transparency
Naming of Data Items - Criteria
1. Every data item must have a system-
  wide unique name.
2. It should be possible to find the
  location of data items efficiently.
3. It should be possible to change the
  location of data items transparently.
4. Each site should be able to create
  new data items autonomously.
Centralized Scheme - Name Server
    Structure:
     name server assigns all names
     each site maintains a record of local data
      items
     sites ask name server to locate non-local
      data items
 Advantages:
     satisfies naming criteria 1-3
 Disadvantages:
     does not satisfy naming criterion 4
      name server is a potential performance
       bottleneck and a single point of failure
Use of Aliases
 Alternative to centralized scheme:
 each site prefixes its own site
 identifier to any name that it generates
  e.g., site17.account.
   Fulfills having a unique identifier, and
    avoids problems associated with central
    control.
   However, fails to achieve network
    transparency.
 Solution: Create a set of aliases for
 data items; Store the mapping of
 aliases to the real names at each site.
Distributed Transactions
 Transaction may access data at several
  sites.
 Each site has a local transaction
  manager responsible for:
   Maintaining a log for recovery purposes
   Participating in coordinating the
   concurrent execution of the transactions
   executing at that site.
 Each site has a transaction
  coordinator, which is responsible for:
    Starting the execution of transactions that
     originate at the site, distributing
     subtransactions to appropriate sites, and
     coordinating their termination
Transaction System Architecture
System Failure Modes
 Failures unique to distributed systems:
   Failure of a site.
    Loss of messages
     Handled by network transmission control
       protocols such as TCP/IP
   Failure of a communication link
     Handled by network protocols, by routing
      messages via alternative links
   Network partition
     A network is said to be partitioned when it
      has been split into two or more subsystems
      that lack any connection between them
       Note: a subsystem may consist of a single
        node
Commit Protocols
 Commit protocols are used to ensure
 atomicity across sites
  a transaction which executes at multiple
   sites must either be committed at all the
   sites, or aborted at all the sites.
  not acceptable to have a transaction
   committed at one site and aborted at
   another
 The two-phase commit (2 PC) protocol
  is widely used
 The three-phase commit (3 PC)
  protocol is more complicated and more
  expensive, but avoids some drawbacks
  of 2PC
Two Phase Commit Protocol (2PC)
 Assumes fail-stop model – failed sites
  simply stop working, and do not cause
  any other harm, such as sending
  incorrect messages to other sites.
 Execution of the protocol is initiated
  by the coordinator after the last step
  of the transaction has been reached.
 The protocol involves all the local sites
  at which the transaction executed
 Let T be a transaction initiated at site
  Si, and let the transaction coordinator
  at Si be Ci
Phase 1: Obtaining a Decision
 Coordinator asks all participants to
 prepare to commit transaction Ti.
   Ci adds the records <prepare T> to the
    log and forces log to stable storage
   sends prepare T messages to all sites at
    which T executed
 Upon receiving message, transaction
 manager at site determines if it can
 commit the transaction
   if not, add a record <no T> to the log and
     send abort T message to Ci
    if the transaction can be committed, then:
      add the record <ready T> to the log, force
       all records to stable storage, and send a
       ready T message to Ci
Phase 2: Recording the Decision
 T can be committed if Ci received a
  ready T message from all the
  participating sites: otherwise T must
  be aborted.
 Coordinator adds a decision record,
  <commit T> or <abort T>, to the log
  and forces record onto stable storage.
   Once the record reaches stable storage it is
  irrevocable (even if failures occur)
 Coordinator sends a message to each
  participant informing it of the decision   119
                                             7
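 The two phases can be sketched as follows (a minimal Python
  illustration; write_and_force, can_commit and the messaging helpers
  are assumptions, not a real API):

    def two_phase_commit(coord_log, participants, T):
        coord_log.write_and_force(('prepare', T))      # Phase 1
        votes = [p.prepare(T) for p in participants]   # "prepare T" messages
        decision = 'commit' if all(v == 'ready' for v in votes) else 'abort'
        coord_log.write_and_force((decision, T))       # Phase 2: irrevocable
        for p in participants:                         # once on stable storage
            p.inform(T, decision)

    class Participant:
        def __init__(self, log):
            self.log = log

        def prepare(self, T):
            if self.can_commit(T):                     # local check (assumed)
                self.log.write_and_force(('ready', T)) # force <ready T> first
                return 'ready'
            self.log.write(('no', T))                  # vote to abort
            return 'abort'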
Handling of Failures - Site Failure
When site Si recovers, it examines its log
  to determine the fate of transactions
  active at the time of the failure.
 Log contain <commit T> record: site
  executes redo (T)
 Log contains <abort T> record: site
  executes undo (T)
 Log contains <ready T> record: site
  must consult Ci to determine the fate
   of T.
Handling of Failures - Coordinator Failure
   If coordinator fails while the commit
   protocol for T is executing then
   participating sites must decide on T’s
   fate:
  1. If an active site contains a <commit T>
     record in its log, then T must be
     committed.
  2. If an active site contains an <abort T>
     record in its log, then T must be aborted.
  3. If some active participating site does not
     contain a <ready T> record in its log, then
     the failed coordinator Ci cannot have
      decided to commit T. Can therefore abort T.
Handling of Failures - Network
    Partition
 If the coordinator and all its
  participants remain in one partition,
  the failure has no effect on the commit
  protocol.
 If the coordinator and its participants
  belong to several partitions:
   Sites that are not in the partition
   containing the coordinator think the
   coordinator has failed, and execute the
   protocol to deal with failure of the
   coordinator.
      No harm results, but sites may still have to
       wait for decision from coordinator
Recovery and Concurrency Control
 In-doubt transactions have a <ready T>,
  but neither a <commit T> nor an
  <abort T> log record.
 The recovering site must determine
  the commit-abort status of such
  transactions by contacting other sites;
   this can be slow and can potentially block
  recovery.
 Recovery algorithms can note lock
   information in the log.
Three Phase Commit (3PC)
   Assumptions:
     No network partitioning
     At any point, at least one site must be up.
     At most K sites (participants as well as
      coordinator) can fail
   Phase 1: Obtaining Preliminary Decision: Identical to 2PC Phase 1.
     Every site is ready to commit if instructed to
      do so
   Phase 2 of 2PC is split into 2 phases, Phase 2 and Phase 3 of 3PC
     In phase 2 coordinator makes a decision as
      in 2PC (called the pre-commit decision) and
      records it in multiple (at least K) sites
     In phase 3, coordinator sends commit/abort
       message to all participating sites.
    Under 3PC, knowledge of pre-commit decision can be used to
     commit despite coordinator failure
Alternative Models of Transaction
Processing
  Notion of a single transaction
   spanning multiple sites is
   inappropriate for many applications
    E.g. transaction crossing an
     organizational boundary
    No organization would like to permit an
     externally initiated transaction to block
     local transactions for an indeterminate
     period
  Alternative models carry out
  transactions by sending messages
     Code to handle messages must be
      carefully designed to ensure atomicity
      and durability
Alternative Models (Cont.)
 Motivating example: funds transfer
 between two banks
   Two phase commit would have the potential
    to block updates on the accounts involved
    in funds transfer
   Alternative solution:
    Debit money from source account and send a
     message to other site
    Site receives message and credits destination
     account
   Messaging has long been used for
   distributed transactions (even before
   computers were invented!)
 Atomicity issue: once the sending
  transaction commits, the message must
  be guaranteed to be delivered; this is
  achieved via persistent messages
Error Conditions with Persistent
Messaging
 Code to handle messages has to take
 care of variety of failure situations
 (even assuming guaranteed message
 delivery)
  E.g. if destination account does not exist,
   failure message must be sent back to
   source site
  When failure message is received from
   destination site, or destination site itself
   does not exist, money must be deposited
   back in source account
     Problem if source account has been closed
      – get humans to take care of the problem
Persistent Messaging and
Workflows
 Workflows provide a general model of
  transactional processing involving
  multiple sites and possibly human
  processing of certain steps
   E.g. when a bank receives a loan
   application, it may need to
    Contact external credit-checking agencies
    Get approvals of one or more managers
     and then respond to the loan application
   We study workflows in Chapter 24
     (Section 24.2)
Implementation of Persistent
   Messaging
    Sending site protocol
    1. Sending transaction writes message to a
      special relation messages-to-send. The
      message is also given a unique identifier.
       Writing to this relation is treated as any other
        update, and is undone if the transaction
        aborts.
       The message remains locked until the sending
        transaction commits
    2. A message delivery process monitors the
      messages-to-send relation
       When a new message is found, the message is
        sent to its destination
        When an acknowledgment is received from a
         destination, the message is deleted from
         messages-to-send
Implementation of Persistent Messaging (Cont.)
    Receiving site protocol
      When a message is received
      1. it is written to a received-messages relation
         if it is not already present (the message id is
         used for this check). The transaction
         performing the write is committed
      2. An acknowledgement (with message id) is
         then sent to the sending site.
     There may be very long delays in message
      delivery coupled with repeated messages
         Could result in processing of duplicate
          messages if we are not careful!
       Option 1: messages are never deleted from
        received-messages
        Option 2: messages are given timestamps
          Messages older than some cut-off are deleted
           from received-messages; incoming messages
           are rejected if older than the cut-off
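 A minimal sketch of the receiving-site steps (process and send_ack
  are assumed helpers):

    def receive_message(msg, received_messages, send_ack):
        # Step 1: write to received-messages only if not already present,
        # using the message id for the duplicate check, then commit.
        if msg.id not in received_messages:
            received_messages[msg.id] = msg.body
            process(msg.body)            # application handling (assumed)
        # Step 2: acknowledge in either case, so the sender can delete the
        # message from messages-to-send; repeated sends stay harmless.
        send_ack(msg.id)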
Concurrency Control
 Modify concurrency control schemes
  for use in distributed environment.
 We assume that each site participates
  in the execution of a commit protocol
  to ensure global transaction
   atomicity.
 We assume all replicas of any item are
  updated
   Will see how to relax this in case of site
    failures later
Single-Lock-Manager Approach
 System maintains a single lock
  manager that resides in a single
  chosen site, say Si
 When a transaction needs to lock a
  data item, it sends a lock request to Si
  and lock manager determines whether
  the lock can be granted immediately
   If yes, lock manager sends a message to
    the site which initiated the request
   If no, request is delayed until it can be
     granted, at which time a message is sent
     to the initiating site
Single-Lock-Manager Approach (Cont.)
 The transaction can read the data item
  from any one of the sites at which a
  replica of the data item resides.
 Writes must be performed on all
  replicas of a data item
 Advantages of scheme:
     Simple implementation
     Simple deadlock handling
 Disadvantages of scheme are:
      Bottleneck: lock manager site becomes a
       bottleneck
      Vulnerability: system is vulnerable to
       lock manager site failure
Distributed Lock Manager
 In this approach, functionality of
  locking is implemented by lock
  managers at each site
   Lock managers control access to local
   data items
    But special protocols may be used for
     replicas
 Advantage: work is distributed and
  can be made robust to failures
 Disadvantage: deadlock detection is
  more complicated
    Lock managers cooperate for deadlock
     detection
Primary Copy
 Choose one replica of data item to be
 the primary copy.
   Site containing the replica is called the
    primary site for that data item
   Different data items can have different
    primary sites
 When a transaction needs to lock a
 data item Q, it requests a lock at the
 primary site of Q.
    Implicitly gets lock on all replicas of the
     data item
Majority Protocol
 Local lock manager at each site
  administers lock and unlock requests
  for data items stored at that site.
 When a transaction wishes to lock an
  unreplicated data item Q residing at
  site Si, a message is sent to Si ‗s lock
  manager.
   If Q is locked in an incompatible mode,
    then the request is delayed until it can be
    granted.
    When the lock request can be granted, the
     lock manager sends a message back to the
     initiator indicating the lock has been granted
Majority Protocol (Cont.)
 In case of replicated data
   If Q is replicated at n sites, then a lock
    request message must be sent to more
    than half of the n sites in which Q is
    stored.
   The transaction does not operate on Q
    until it has obtained a lock on a majority
    of the replicas of Q.
   When writing the data item, transaction
    performs writes on all replicas.
 Benefit
   Can be used even when some sites are
     unavailable
Biased Protocol
 Local lock manager at each site as in
  majority protocol, however, requests
  for shared locks are handled
  differently than requests for exclusive
  locks.
 Shared locks. When a transaction
  needs to lock data item Q, it simply
  requests a lock on Q from the lock
  manager at one site containing a
  replica of Q.
 Exclusive locks. When a transaction
  needs to lock data item Q, it requests a
  lock on Q from the lock managers at all
  sites containing a replica of Q.
Quorum Consensus Protocol
 A generalization of both majority and
  biased protocols
 Each site is assigned a weight.
   Let S be the total of all site weights
 Choose two values read quorum Qr
 and write quorum Qw
    Such that Qr + Qw > S and 2 * Qw > S
   Quorums can be chosen (and S computed)
     separately for each item
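    For example (a worked illustration): with 10 sites of weight 1,
     S = 10; choosing Qr = 5 and Qw = 6 satisfies both conditions
     (5 + 6 > 10 and 2 × 6 > 10). Setting Qr = 1 forces Qw = 10,
     which behaves like read-one-write-all; setting Qr = Qw = 6
     behaves like the majority protocol.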
Deadlock Handling
Consider the following two transactions and
  history, with item X and transaction T1 at
  site 1, and item Y and transaction T2 at site 2:

      T1:  write (X)          T2:   write (Y)
           write (Y)                write (X)

       X-lock on X
       write (X)                      X-lock on Y
                                      write (Y)
                                      wait for X-lock on X
       wait for X-lock on Y
   Result: deadlock which cannot be detected locally at either site
Centralized Approach
 A global wait-for graph is constructed
 and maintained in a single site; the
 deadlock-detection coordinator
   Real graph: Real, but unknown, state of
   the system.
    Constructed graph: Approximation
   generated by the controller during the
    execution of its algorithm.
 the global wait-for graph can be
 constructed when:
   a new edge is inserted in or removed from
   one of the local wait-for graphs.
Local and Global Wait-For
Graphs
   (figure: local wait-for graphs at each site, and
    the global wait-for graph combining them)
Example Wait-For Graph for
False Cycles
Initial state: (figure)
False Cycles (Cont.)
 Suppose that starting from the state
 shown in figure,
  1. T2 releases resources at S1
     resulting in a remove T1 → T2 message
      from the Transaction Manager at
      site S1 to the coordinator
 2. And then T2 requests a resource
 held by T3 at site S2
     resulting in an insert T2 → T3 message from
     S2 to the coordinator
 Suppose further that the insert
  message reaches the coordinator before
  the delete message; the coordinator then
  finds a false cycle T1 → T2 → T3 → T1
  that never actually existed
Unnecessary Rollbacks
 Unnecessary rollbacks may result
  when deadlock has indeed occurred
  and a victim has been picked, and
  meanwhile one of the transactions was
  aborted for reasons unrelated to the
  deadlock.
 Unnecessary rollbacks can result from
  false cycles in the global wait-for
  graph; however, likelihood of false
  cycles is low.
Timestamping
 Timestamp based concurrency-control
  protocols can be used in distributed
  systems
 Each transaction must be given a
  unique timestamp
 Main problem: how to generate a
  timestamp in a distributed fashion
   Each site generates a unique local
    timestamp using either a logical counter
    or the local clock.
   Global unique timestamp is obtained by
     concatenating the unique local timestamp
     with the unique site identifier.
Timestamping (Cont.)
 A site with a slow clock will assign
 smaller timestamps
   Still logically correct: serializability not
    affected
    But: “disadvantages” transactions
 To fix this problem
   Define within each site Si a logical clock
    (LCi), which generates the unique local
    timestamp
   Require that Si advance its logical clock
    whenever a request is received from a
     transaction Ti with timestamp <x,y> and
     x greater than the current value of LCi;
     in that case, Si advances its logical
     clock to x + 1
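 A sketch of per-site timestamp generation with this fix (an
  illustrative class, not from the slides):

    class SiteTimestamps:
        def __init__(self, site_id):
            self.site_id = site_id
            self.lc = 0                     # logical clock LCi

        def next_timestamp(self):
            self.lc += 1                    # unique local timestamp
            return (self.lc, self.site_id)  # global = (local, site id)

        def on_request(self, ts):
            x, y = ts                       # timestamp <x, y> in a request
            if x > self.lc:                 # advance the clock so the next
                self.lc = x                 # local timestamp is x + 1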
Replication with Weak
Consistency
 Many commercial databases support
  replication of data with weak degrees
   of consistency (i.e., without a
   guarantee of serializability)
 E.g.: master-slave replication:
  updates are performed at a single
   “master” site, and propagated to
   “slave” sites.
   Propagation is not part of the update
    transaction: it is decoupled
    May be immediately after transaction
      commits
Replication with Weak
Consistency (Cont.)
 Replicas should see a transaction-
  consistent snapshot of the database
   That is, a state of the database reflecting
   all effects of all transactions up to some
   point in the serialization order, and no
   effects of any later transactions.
 E.g. Oracle provides a create snapshot
  statement to create a snapshot of a
  relation or a set of relations at a
  remote site
    snapshot refresh either by recomputation
     or by incremental update
Multimaster Replication
 With multimaster replication (also
  called update-anywhere replication)
  updates are permitted at any replica,
  and are automatically propagated to
  all replicas
   Basic model in distributed databases,
   where transactions are unaware of the
   details of replication, and database
   system propagates updates as part of the
   same transaction
     Coupled with 2 phase commit
Lazy Propagation (Cont.)
    Two approaches to lazy propagation
     Updates at any replica translated into update
      at primary site, and then propagated back to
      all replicas
        Updates to an item are ordered serially
        But transactions may read an old value of an
          item and use it to perform an update, resulting in
         non-serializability
     Updates are performed at any replica and
      propagated to all other replicas
        Causes even more serialization problems:
          Same data item may be updated concurrently at
           multiple sites!
    Conflict detection is a problem
Availability
 High availability: time for which
  system is not fully usable should be
  extremely low (e.g. 99.99%
  availability)
 Robustness: ability of system to
   function in spite of failures of
  components
 Failures are more likely in large
  distributed systems
 To be robust, a distributed system
  must detect failures, reconfigure so
  that computation may continue, and
  recover when a site or link is repaired
Reconfiguration
 Reconfiguration:
   Abort all transactions that were active at a
   failed site
     Making them wait could interfere with other
      transactions since they may hold locks on
      other sites
     However, in case only some replicas of a
      data item failed, it may be possible to
      continue transactions that had accessed
      data at a failed site (more on this later)
   If replicated data items were at failed site,
   update system catalog to remove them
    from the list of replicas.
Reconfiguration (Cont.)
 Since network partition may not be
  distinguishable from site failure, the
  following situations must be avoided
    Two or more central servers elected in
    distinct partitions
   More than one partition updates a
    replicated data item
 Updates must be able to continue
  even if some sites are down
 Solution: majority based approach
    Alternative of “read one write all available”
Majority-Based Approach
 The majority protocol for distributed
  concurrency control can be modified
  to work even if some sites are
  unavailable
   Each replica of each item has a version
    number which is updated when the replica
    is updated, as outlined below
   A lock request is sent to at least ½ the
    sites at which item replicas are stored and
    operation continues only when a lock is
     obtained on a majority of the sites
      Read operations look at all locked replicas,
       and read the value from the replica with
       the highest version number
Majority-Based Approach (Cont.)
 Majority protocol (cont.)
   Write operations
     find highest version number like reads, and
      set new version number to old highest
      version + 1
     Writes are then performed on all locked
      replicas and version number on these replicas
      is set to new version number
   Failures (network and site) cause no
   problems as long as
     Sites at commit contain a majority of replicas
      of any updated data items
     During reads a majority of replicas are
      available to find version numbers
      Subject to above, 2 phase commit can be used
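A minimal Java sketch of the version-number bookkeeping above; the Replica type, and the assumption that the caller already holds locks at a majority of the sites, are illustrative simplifications rather than part of the protocol text.

import java.util.List;

// Sketch of majority-based replica access (hypothetical Replica type).
class MajorityProtocol {
    // A replica holds a value and a version number.
    static class Replica {
        long version;
        String value;
    }

    // Read: among the majority of locked replicas, take the value
    // with the highest version number.
    static String majorityRead(List<Replica> locked) {
        Replica newest = locked.get(0);
        for (Replica r : locked)
            if (r.version > newest.version) newest = r;
        return newest.value;
    }

    // Write: find highest version among locked replicas, then install
    // the new value with version = highest + 1 on all of them.
    static void majorityWrite(List<Replica> locked, String newValue) {
        long highest = 0;
        for (Replica r : locked)
            highest = Math.max(highest, r.version);
        for (Replica r : locked) {
            r.version = highest + 1;
            r.value = newValue;
        }
    }
}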
Read One Write All (Available)

 Biased protocol is a special case of
  quorum consensus
   Allows reads to read any one replica but
   updates require all replicas to be available
   at commit time (called read one write all)
 Read one write all available (ignoring
 failed sites) is attractive, but incorrect
   If failed link may come back up, without a
    disconnected site ever being aware that it
    was disconnected
    The site then has old values, and a read
     from the site would return an incorrect (old) value
Site Reintegration

 When failed site recovers, it must
  catch up with all updates that it
  missed while it was down
   Problem: updates may be happening to
    items whose replica is stored at the site
    while the site is recovering
   Solution 1: halt all updates on system
    while reintegrating a site
     Unacceptable disruption
    Solution 2: lock all replicas of all data
     items at the site, update to latest version,
     then release the locks
Comparison with Remote Backup
 Remote backup (hot spare) systems
  (Section 17.10) are also designed to
  provide high availability
 Remote backup systems are simpler
 and have lower overhead
   All actions performed at a single site, and
    only log records shipped
   No need for distributed concurrency
    control, or 2 phase commit
 Using distributed databases with
  replicas of data items can provide
  higher availability, e.g. by keeping
  multiple replicas and using the
  majority protocol
Coordinator Selection
 Backup coordinators
  site which maintains enough information
   locally to assume the role of coordinator if
   the actual coordinator fails
  executes the same algorithms and
   maintains the same internal state
   information as the actual coordinator
  allows fast recovery from coordinator
   failure but involves overhead during
   normal processing.
Bully Algorithm
 If site Si sends a request that is not
  answered by the coordinator within a
  time interval T, assume that the
  coordinator has failed, and Si tries to elect
  itself as the new coordinator.
 Si sends an election message to every
  site with a higher identification
  number, Si then waits for any of these
  processes to answer within T.
 If no response within T, assume that
  all sites with number greater than i
  have failed, and Si elects itself the
  new coordinator.
Bully Algorithm (Cont.)
 If no message is sent within T′,
  assume the site with a higher number
  has failed; Si restarts the algorithm.
 After a failed site recovers, it
  immediately begins execution of the
  same algorithm.
 If there are no active sites with higher
  numbers, the recovered site forces all
  processes with lower numbers to let it
  become the coordinator site, even if
  there is a currently active coordinator
Distributed Query Processing
 For centralized systems, the primary
  criterion for measuring the cost of a
  particular strategy is the number of
  disk accesses.
 In a distributed system, other issues
  must be taken into account:
   The cost of a data transmission over the
    network.
   The potential gain in performance from
    having several sites process parts of the
    query in parallel.
Query Transformation
 Translating algebraic queries on
 fragments.
   It must be possible to construct relation r
    from its fragments
   Replace relation r by the expression to
    construct relation r from its fragments
 Consider the horizontal fragmentation of
 the account relation into
  account1 = σbranch-name = "Hillside" (account)
  account2 = σbranch-name = "Valleyview" (account)

 The query σbranch-name = "Hillside" (account)
  becomes
     σbranch-name = "Hillside" (account1 ∪ account2)
Example Query (Cont.)

  Since account1 has only tuples
   pertaining to the Hillside branch, we
   can eliminate the selection operation.
  Apply the definition of account2 to
   obtain
     σbranch-name = "Hillside" (σbranch-name = "Valleyview" (account))
  This expression is the empty set
   regardless of the contents of the
   account relation.
  Final strategy is for the Hillside site to
   return account1 as the result of the query.
Simple Join Processing
 Consider the following relational
 algebra expression in which the three
 relations are neither replicated nor
 fragmented
   account ⋈ depositor ⋈ branch
 account is stored at site S1
 depositor at S2
 branch at S3
 For a query issued at site SI, the
 system needs to produce the result at
  site SI.
Possible Query Processing Strategies

 Ship copies of all three relations to site
  SI and choose a strategy for processing
  the entire query locally at site SI.
 Ship a copy of the account relation to
  site S2 and compute temp1 = account
  ⋈ depositor at S2. Ship temp1 from S2
  to S3, and compute temp2 = temp1 ⋈
  branch at S3. Ship the result temp2 to
  SI.
 Devise similar strategies, exchanging
  the roles of S1, S2, S3
Semijoin Strategy
 Let r1 be a relation with schema R1
  stored at site S1
  Let r2 be a relation with schema R2
  stored at site S2
 Evaluate the expression r1 ⋈ r2 and
  obtain the result at S1.
 1. Compute temp1 = ΠR1 ∩ R2 (r1) at S1.
 2. Ship temp1 from S1 to S2.
 3. Compute temp2 = r2 ⋈ temp1 at S2.
 4. Ship temp2 from S2 to S1.
 5. Compute r1 ⋈ temp2 at S1; this is the result.
Formal Definition

 The semijoin of r1 with r2 is denoted by
               r1 ⋉ r2
  and is defined by
               ΠR1 (r1 ⋈ r2)
 Thus, r1 ⋉ r2 selects those tuples of r1 that
  contributed to r1 ⋈ r2.
 In step 3 above, temp2 = r2 ⋉ r1.
 For joins of several relations, the above
  strategy can be extended to a series of
  semijoin steps.
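A small Java sketch of steps 1 and 3 of the semijoin strategy on in-memory relations; representing tuples as maps and using a single join key are simplifying assumptions.

import java.util.*;

// Sketch of semijoin reduction, with tuples as attribute->value maps.
class Semijoin {
    // Step 1 at S1: temp1 = projection of r1 onto the join key.
    static Set<String> projectKeys(List<Map<String,String>> r1, String key) {
        Set<String> temp1 = new HashSet<>();
        for (Map<String,String> t : r1) temp1.add(t.get(key));
        return temp1;                      // shipped to S2 (step 2)
    }

    // Step 3 at S2: temp2 = tuples of r2 that join with temp1 (r2 semijoin r1).
    static List<Map<String,String>> reduce(List<Map<String,String>> r2,
                                           Set<String> temp1, String key) {
        List<Map<String,String>> temp2 = new ArrayList<>();
        for (Map<String,String> t : r2)
            if (temp1.contains(t.get(key))) temp2.add(t);
        return temp2;                      // shipped back to S1 (step 4)
    }
}

Step 5 then joins r1 with the shipped-back temp2 locally at S1.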
Join Strategies that Exploit Parallelism

 Consider r1 ⋈ r2 ⋈ r3 ⋈ r4 where relation
  ri is stored at site Si. The result must be
  presented at site S1.
 r1 is shipped to S2 and r1 ⋈ r2 is computed
  at S2; simultaneously r3 is shipped to S4 and
  r3 ⋈ r4 is computed at S4
 S2 ships tuples of (r1 ⋈ r2) to S1 as they
  are produced;
  S4 ships tuples of (r3 ⋈ r4) to S1
Heterogeneous Distributed
Databases
 Many database applications require
  data from a variety of preexisting
  databases located in a heterogeneous
  collection of hardware and software
  platforms
 Data models may differ
  (hierarchical, relational, etc.)
 Transaction commit protocols may be
  incompatible
 Concurrency control may be based on
  different techniques (locking,
  timestamping, etc.)
Advantages
 Preservation of investment in existing
   hardware
   system software
   Applications
 Local autonomy and administrative
  control
 Allows use of special-purpose DBMSs
 Step towards a unified homogeneous
  DBMS
    Full integration into a homogeneous DBMS
    faces technical difficulties and cost of
    conversion, as well as organizational and
    political difficulties
Unified View of Data
 Agreement on a common data model
  Typically the relational model
 Agreement on a common conceptual
 schema
  Different names for same
   relation/attribute
  Same relation/attribute name means
   different things
 Agreement on a single representation
 of shared data
   E.g. data types, precision,
    character sets
Query Processing
 Several issues in query processing in a
  heterogeneous database
 Schema translation
   Write a wrapper for each data source to
    translate data to a global schema
   Wrappers must also translate updates on
    global schema to updates on local schema
 Limited query capabilities
   Some data sources allow only restricted
   forms of selections
    E.g. web forms, flat file data sources
    Queries have to be broken up and processed
     partly at the data source and partly at a
     different site
Mediator Systems

 Mediator systems are systems that
  integrate multiple heterogeneous data
  sources by providing an integrated
  global view, and providing query
  facilities on global view
   Unlike full fledged multidatabase systems,
    mediators generally do not bother about
    transaction processing
   But the terms mediator and multidatabase
    are sometimes used interchangeably
    The term virtual database is also used to
     refer to mediator systems
Directory Systems
 Typical kinds of directory information
   Employee information such as name, id,
     email, phone, office address, …
   Even personal information to be accessed
    from multiple places
    e.g. Web browser bookmarks
 White pages
   Entries organized by name or identifier
    Meant for forward lookup to find more about
     an entry
 Yellow pages
    Entries organized by properties, meant for
     reverse lookup to find entries matching
     specific requirements
Directory Access Protocols

 Most commonly used directory access
  protocol:
  LDAP (Lightweight Directory Access
   Protocol)
  Simplified from earlier X.500 protocol
 Question: Why not use database
  protocols like ODBC/JDBC?
 Answer:
  Simplified protocols for a limited type of
   data access, evolved parallel to
   ODBC/JDBC
   Provide a nice hierarchical naming
    mechanism similar to file system directories
LDAP:Lightweight Directory
Access Protocol
 LDAP Data Model
 Data Manipulation
 Distributed Directory Trees
LDAP Data Model

 LDAP directories store entries
   Entries are similar to objects
 Each entry must have unique
  distinguished name (DN)
 DN made up of a sequence of relative
  distinguished names (RDNs)
 E.g. of a DN
    cn=Silberschatz, ou=Bell
     Labs, o=Lucent, c=USA
    Standard RDNs (can be specified as part of
     the schema): cn (common name), ou (organizational
     unit), o (organization), c (country)
LDAP Data Model (Cont.)

 Entries can have attributes
   Attributes are multi-valued by default
   LDAP has several built-in types
    Binary, string, time types
    Tel: telephone number        PostalAddress:
     postal address
 LDAP allows definition of object
 classes
   Object classes specify attribute names and
    types
    Can use inheritance to define object
     classes
LDAP Data Model (cont.)

 Entries organized into a directory
  information tree according to their
  DNs
    Leaf level entries usually represent specific
     objects
   Internal node entries represent objects
    such as organizational units,
    organizations or countries
   Children of a node inherit the DN of the
    parent, and add on RDNs
      E.g. internal node with DN c=USA
LDAP Data Manipulation

 Unlike SQL, LDAP does not define DDL
  or DML
 Instead, it defines a network protocol
 for DDL and DML
   Users use an API or vendor specific front
    ends
   LDAP also defines a file format
    LDAP Data Interchange Format (LDIF)
 Querying mechanism is very simple:
  only selection & projection
LDAP Queries

 LDAP query must specify
  Base: a node in the DIT from where search
   is to start
  A search condition
    Boolean combination of conditions on
     attributes of entries
     Equality, wild-cards and approximate equality
      supported
  A scope
    Just the base, the base and its children, or
     the entire subtree from the base
   Attributes to be returned
LDAP URLs
 First part of URL specifies server and DN
  of base
    ldap://aura.research.bell-
     labs.com/o=Lucent,c=USA
 Optional further parts separated by ?
  symbol
    ldap://aura.research.bell-
     labs.com/o=Lucent,c=USA??sub?cn=Korth
   Optional parts specify
    1. attributes to return (empty means all)
     2. Scope (sub indicates entire subtree)
     3. Search condition (cn=Korth)
C Code using LDAP API
#include <stdio.h>
#include <ldap.h>
main( ) {
    LDAP *ld;
    LDAPMessage *res, *entry;
    char *dn, *attr, *attrList [ ] = {"telephoneNumber", NULL};
    BerElement *ptr;
    int vals, i;
    // Open a connection to server (LDAP_PORT is the standard port)
    ld = ldap_open("aura.research.bell-labs.com", LDAP_PORT);
C Code using LDAP API (Cont.)
    ldap_search_s(ld, "o=Lucent, c=USA", LDAP_SCOPE_SUBTREE,
            "cn=Korth", attrList, /* attrsonly */ 0, &res);
    /* attrsonly = 1 => return only schema, not actual results */
    printf("found %d entries", ldap_count_entries(ld, res));
    for (entry = ldap_first_entry(ld, res); entry != NULL;
            entry = ldap_next_entry(ld, entry)) {
        dn = ldap_get_dn(ld, entry);
        printf("dn: %s", dn); /* dn: DN of matching entry */
        ldap_memfree(dn);
        /* (remainder of loop, ldap_msgfree and ldap_unbind
           elided on the slides) */
LDAP API (Cont.)

 LDAP API also has functions to create,
  update and delete entries
 Each function call behaves as a
 separate transaction
   LDAP does not support atomicity of
   updates
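For comparison with the C fragment above, a hedged Java sketch of the same subtree search using the standard JNDI LDAP provider; the host and base DN repeat the slides' example, and error handling is omitted.

import java.util.Hashtable;
import javax.naming.NamingEnumeration;
import javax.naming.directory.*;

// Sketch: subtree search for cn=Korth, returning telephoneNumber.
public class LdapQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String,String> env = new Hashtable<>();
        env.put(javax.naming.Context.INITIAL_CONTEXT_FACTORY,
                "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(javax.naming.Context.PROVIDER_URL,
                "ldap://aura.research.bell-labs.com");
        DirContext ctx = new InitialDirContext(env);

        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);   // scope: subtree
        sc.setReturningAttributes(new String[]{"telephoneNumber"});

        NamingEnumeration<SearchResult> res =
            ctx.search("o=Lucent,c=USA", "(cn=Korth)", sc);
        while (res.hasMore()) {
            SearchResult e = res.next();
            System.out.println("dn: " + e.getNameInNamespace());
            System.out.println(e.getAttributes());
        }
        ctx.close();
    }
}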
Distributed Directory Trees
   Organizational information may be split into multiple directory
    information trees
     Suffix of a DIT gives RDNs to be tagged
      onto all entries to get an overall DN
        E.g. two DITs, one with suffix o=Lucent,
         c=USA
         and another with suffix o=Lucent,
         c=India
     Organizations often split up DITs based on
      geographical location or by organizational
      structure
     Many LDAP implementations support
      replication (master-slave or multi-master
      replication) of DITs (not part of the LDAP 3
      standard)
END OF CHAPTER
EXTRA SLIDES (MATERIAL NOT
IN BOOK)
Three Phase Commit (3PC)

 Assumptions:
   No network partitioning
   At any point, at least one site must be up.
   At most K sites (participants as well as
   coordinator) can fail
 Phase 1: Obtaining Preliminary Decision:
 Identical to 2PC Phase 1.
   Every site is ready to commit if instructed to
    do so
    Under 2PC each site is obligated to wait for
     decision from coordinator
Phase 2. Recording the
     Preliminary Decision
 Coordinator adds a decision record
    (<abort T> or
    < precommit T>) in its log and forces
    record to stable storage.
   Coordinator sends a message to each
    participant informing it of the decision
   Participant records decision in its log
   If abort decision reached then
    participant aborts locally
    If pre-commit decision reached then
     participant sends an <acknowledge T>
     message to the coordinator
Phase 3. Recording Decision in
    the Database
Executed only if decision in phase 2 was
  to precommit

 Coordinator collects
  acknowledgements. It sends <commit
  T> message to the participants as
  soon as it receives K
  acknowledgements.
 Coordinator adds the record <commit
  T> in its log and forces record to
   stable storage.
Handling Site Failure
 Site Failure. Upon recovery, a
 participating site examines its log and
 does the following:
   Log contains <commit T> record: site
    executes redo (T)
   Log contains <abort T> record: site
    executes undo (T)
   Log contains <ready T> record, but no
    <abort T> or <precommit T> record: site
    consults Ci to determine the fate of T.
      if Ci says T aborted, site executes undo (T)
       (and writes <abort T> record)
      if Ci says T committed, site executes redo (T)
       (and writes <commit T> record)
Handling Site Failure (Cont.)
 Log contains <precommit T> record,
 but no <abort T> or <commit T>: site
 consults Ci to determine the fate of T.
  if Ci says T aborted, site executes undo (T)
  if Ci says T committed, site executes redo
   (T)
  if Ci says T still in precommit state, site
   resumes protocol at this point
 Log contains no <ready T> record for
  a transaction T: site executes undo (T)
  and writes <abort T> record.
Coordinator – Failure Protocol
1. The active participating sites select a new coordinator, Cnew
2. Cnew requests local status of T from each participating site
3. Each participating site including Cnew determines the local
   status of T:
 Committed. The log contains a < commit
   T> record
 Aborted. The log contains an <abort T>
  record.
 Ready. The log contains a <ready T>
  record but no <abort T> or <precommit
  T> record
 Precommitted. The log contains a
  <precommit T> record but no <abort T>
  or <commit T> record.
 Not ready. The log contains neither a
  <ready T> nor an <abort T> record.
Coordinator Failure Protocol
   (Cont.)
5. Cnew decides either to commit or
   abort T, or to restart the
   three-phase commit protocol:
 Commit state for any one participant ⇒
  commit
 Abort state for any one participant ⇒
  abort.
 Precommit state for any one participant
  and above 2 cases do not hold ⇒
  A precommit message is sent to those
  participants in the uncertain state.
  Protocol is resumed from that point.
Fully Distributed Deadlock
   Detection Scheme
 Each site has a local wait-for graph;
  system combines information in these
  graphs to detect deadlock
 Local Wait-for Graphs
      Site 1: T1 → T2 → T3
      Site 2: T3 → T4 → T5
      Site 3: T5 → T1
 Global Wait-for Graph
      T1 → T2 → T3 → T4 → T5 → T1
Fully Distributed Approach (Cont.)

 System model: a transaction runs at a
  single site, and makes requests to
  other sites for accessing non-local
  data.
 Each site maintains its own local wait-
  for graph in the normal fashion: there
  is an edge Ti → Tj if Ti is waiting on a
  lock held by Tj (note: Ti and Tj may be
  non-local).
 Additionally, arc Ti → Tex exists in the
  graph at site Sk if Ti is waiting for a data
  item at another site; symmetrically, arc
  Tex → Ti exists if a transaction at another
  site is waiting for Ti.
Fully Distributed Approach (Cont.)

 Centralized Deadlock Detection - all
  graph edges sent to central deadlock
  detector
 Distributed Deadlock Detection - "path
  pushing" algorithm
 Path pushing initiated when a site
  detects a local cycle involving Tex,
  which indicates possibility of a
  deadlock.
 Suppose cycle at site Si is
       Tex → Ti → Tj → ... → Tn → Tex
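A minimal Java sketch of the local test that triggers path pushing: detecting whether the local wait-for graph contains a cycle through Tex. The graph representation and names here are illustrative assumptions.

import java.util.*;

// Sketch: detect a cycle through Tex (modeled as "EX") in a local
// wait-for graph; each transaction maps to those it waits for.
class WaitForGraph {
    static boolean cycleThrough(String start, Map<String,List<String>> edges) {
        Deque<String> stack = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String t = stack.pop();
            for (String next : edges.getOrDefault(t, List.of())) {
                if (next.equals(start)) return true;   // back to Tex: cycle
                if (seen.add(next)) stack.push(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String,List<String>> g = Map.of(
            "EX", List.of("T1"), "T1", List.of("T2"),
            "T2", List.of("T3"), "T3", List.of("EX"));
        System.out.println(cycleThrough("EX", g));     // true: push the path
    }
}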
Fully Distributed Approach: Example

              Site 1
      EX(3) → T1 → T2 → T3 → EX(2)

              Site 2
      EX(1) → T3 → T4 → T5 → EX(3)

              Site 3
      EX(2) → T5 → T1 → EX(1)

EX (i): Indicates Tex, plus wait is on/by a transaction at Site i
Fully Distributed Approach: Example (Cont.)

 Site passes wait-for information along
  path in graph:
   Let EX(j) → Ti → ... → Tn → EX(k) be a path in
    local wait-for graph at Site m
   Site m "pushes" the path information to
    site k if i > n
 Example:
   Site 1 does not pass information: 1 > 3 is false
   Site 2 does not pass information: 3 > 5 is false
   Site 3 passes (T5, T1) to Site 1 because
    5 > 1
Fully Distributed Approach (Cont.)

 After the path EX(2) → T5 → T1 → EX(1) has been pushed to Site
  1 we have:

              Site 1
      EX(2) → T5 → T1 → T2 → T3 → EX(2)

              Site 2
      EX(1) → T3 → T4 → T5 → EX(3)

              Site 3
      EX(2) → T5 → T1 → EX(1)
Fully Distributed Approach (Cont.)

 After the push, only Site 1 has new
  edges. Site 1 passes (T5, T1, T2, T3) to
  site 2 since 5 > 3 and T3 is waiting for
  a data item at site 2
 The new state of the local wait-for
  graphs:

              Site 1
      EX(2) → T5 → T1 → T2 → T3 → EX(2)

              Site 2
      T5 → T1 → T2 → T3 → T4 → T5

              Deadlock Detected

              Site 3
      EX(2) → T5 → T1 → EX(1)
Naming of Replicas and Fragments

 Each replica and each fragment of a
  data item must have a unique name.
   Use of postscripts to determine those
    replicas that are replicas of the same data
    item, and those fragments that are
    fragments of the same data item.
   fragments of same data item: ".f1", ".f2", …, ".fn"
   replicas of same data item: ".r1", ".r2", …, ".rn"
           site17.account.f3.r2
  refers to replica 2 of fragment 3 of account,
  generated at site 17
Name-Translation Algorithm

if name appears in the alias table
    then expression := map (name)
    else expression := name;
function map (n)
if n appears in the replica table
    then result := name of replica of n;
if n appears in the fragment table
    then begin
         result := expression to construct fragment;
         for each n′ in result do begin
                  replace n′ in result with map (n′);
         end
      end
return result;
Example of Name-Translation Scheme

 A user at the Hillside branch (site S1)
  uses the alias local-account for the
  local fragment account.f1 of the
  account relation.
 When this user references local-
  account, the query-processing
  subsystem looks up local-account in
  the alias table, and replaces local-
  account with S1.account.f1.
 If S1.account.f1 is replicated, the
  system must also consult the replica
  table to choose a replica
Transparency and Updates
 Must ensure that all replicas of a data
  item are updated and that all affected
  fragments are updated.
 Consider the account relation and the
  insertion of the tuple:
               ("Valleyview", A-733, 600)
 Horizontal fragmentation of account
  account1 = σbranch-name = "Hillside" (account)
  account2 = σbranch-name = "Valleyview" (account)
 The insertion must be performed on account2,
  since the branch-name value of the tuple is
  "Valleyview"
Transparency and Updates
(Cont.)
 Vertical fragmentation of deposit into
 deposit1 and deposit2
 The tuple ("Valleyview", A-733, "Jones",
  600) must be split into two fragments:
   one to be inserted into deposit1
   one to be inserted into deposit2
 If deposit is replicated, the tuple
  ("Valleyview", A-733, "Jones", 600) must
  be inserted in all replicas
 Problem: If deposit is accessed
  concurrently it is possible that one
  replica will be updated earlier than
  another, leaving the replicas temporarily
  inconsistent
Network Topologies

[Figures: example network topologies, e.g. tree-structured, ring, and star networks]
Network Topology (Cont.)
 A partitioned system is split into two
  (or more) subsystems (partitions) that
  lack any connection.
 Tree-structured: low installation and
  communication costs; the failure of a
  single link can partition network
 Ring: At least two links must fail for
  partition to occur; communication cost
  is high.
 Star:
   the failure of a single link results in a
    network partition, but one of the partitions
    contains only a single site; failure of the
    central site disconnects every site
Robustness
 A robust system must:
   Detect site or link failures
   Reconfigure the system so that
    computation may continue.
   Recover when a processor or link is
    repaired
 Handling failure types:
   Retransmit lost messages
   Unacknowledged retransmits indicate link
    failure; find alternative route for message.
    Failure to find alternative route is a
     symptom of network partition.
Procedure to Reconfigure System

 If replicated data is stored at the failed
  site, update the catalog so that queries
  do not reference the copy at the failed
  site.
 Transactions active at the failed site
  should be aborted.
 If the failed site is a central server for
  some subsystem, an election must be
  held to determine the new server.
 Reconfiguration scheme must work
  correctly in case of network partitions
  (avoiding multiple central servers, or
  conflicting updates, in distinct partitions)
Figure 19.7

Figure 19.13

Figure 19.14
Chapter 20: Parallel Databases
 Introduction
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism
 Design of Parallel Systems
Introduction
 Parallel machines are becoming quite
 common and affordable
   Prices of microprocessors, memory and
   disks have dropped sharply
 Databases are growing increasingly
 large
   large volumes of transaction data are
    collected and stored for later analysis.
   multimedia objects like images are
    increasingly stored in databases
 Large-scale parallel database systems
  increasingly used for:
    storing large volumes of data
    processing time-consuming decision-support queries
    providing high throughput for transaction processing
Parallelism in Databases
 Data can be partitioned across
  multiple disks for parallel I/O.
 Individual relational operations (e.g.,
  sort, join, aggregation) can be
  executed in parallel
   data can be partitioned and each
   processor can work independently on its
   own partition.
 Queries are expressed in high level
  language (SQL, translated to relational
  algebra), which makes parallelization easier.
I/O Parallelism
 Reduce the time required to retrieve
  relations from disk by partitioning
 the relations on multiple disks.
 Horizontal partitioning – tuples of a
  relation are divided among many disks
  such that each tuple resides on one disk.
 Partitioning techniques (number of disks
  = n):
  Round-robin:
    Send the ith tuple inserted in the relation to disk i
     mod n.
I/O Parallelism (Cont.)
 Partitioning techniques (cont.):
 Range partitioning:
   Choose an attribute as the partitioning
    attribute.
   A partitioning vector [v0, v1, ..., vn-2] is
    chosen.
   Let v be the partitioning attribute value of
    a tuple. Tuples such that vi ≤ v < vi+1 go to
    disk i + 1. Tuples with v < v0 go to disk 0
    and tuples with v ≥ vn-2 go to disk n-1.
   E.g., with a partitioning vector [5,11], a
    tuple with partitioning attribute value of 2
    will go to disk 0, a tuple with value 8 will go
    to disk 1, while a tuple with value 20 will go
    to disk 2.
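A small Java sketch of the three partitioning functions described above; tuples are reduced to an integer partitioning-attribute value for brevity.

// Sketch of the three partitioning functions for n disks.
class Partitioning {
    // Round-robin: ith tuple inserted goes to disk i mod n.
    static int roundRobin(int i, int n) { return i % n; }

    // Hash partitioning: hash the partitioning attribute into 0..n-1.
    static int hash(int v, int n) {
        return Math.floorMod(Integer.hashCode(v), n);
    }

    // Range partitioning with vector [v0, ..., v_{n-2}], e.g. [5, 11]:
    // v = 2 -> disk 0, v = 8 -> disk 1, v = 20 -> disk 2.
    static int range(int v, int[] vector) {
        int d = 0;
        while (d < vector.length && v >= vector[d]) d++;
        return d;
    }
}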
Comparison of Partitioning
     Techniques
    Evaluate how well partitioning
    techniques support the following types
    of data access:
     1.Scanning the entire relation.
     2.Locating a tuple associatively –
    point queries.
     E.g., r.A = 25.
     3.Locating all tuples such that the
    value of a given attribute lies within a
    specified range – range queries.
      E.g., 10 ≤ r.A < 25.
Comparison of Partitioning
Techniques (Cont.)
    Round robin:
     Advantages
       Best suited for sequential scan of entire
        relation on each query.
       All disks have almost an equal number of
        tuples; retrieval work is thus well balanced
        between disks.
     Range queries are difficult to process
       No clustering -- tuples are scattered
       across all disks
Comparison of Partitioning
  Techniques(Cont.)
Hash partitioning:
 Good for sequential access
   Assuming hash function is good, and
    partitioning attributes form a key, tuples
    will be equally distributed between disks
   Retrieval work is then well balanced
    between disks.
 Good for point queries on partitioning
 attribute
   Can lookup single disk, leaving others
   available for answering other queries.
Comparison of Partitioning
  Techniques (Cont.)
Range partitioning:
 Provides data clustering by
  partitioning attribute value.
 Good for sequential access
 Good for point queries on partitioning
  attribute: only one disk needs to be
  accessed.
 For range queries on partitioning
  attribute, one to a few disks may need
  to be accessed
  Remaining disks are available for other
  queries.
Partitioning a Relation across
   Disks
 If a relation contains only a few tuples
  which will fit into a single disk block,
  then assign the relation to a single
  disk.
 Large relations are preferably
  partitioned across all the available
  disks.
 If a relation consists of m disk blocks
  and there are n disks available in the
  system, then the relation should be
  allocated min(m, n) disks.
Handling of Skew
 The distribution of tuples to disks may
  be skewed — that is, some disks have
  many tuples, while others may have
  fewer tuples.
 Types of skew:
   Attribute-value skew.
     Some values appear in the partitioning
      attributes of many tuples; all the tuples with
      the same value for the partitioning attribute
      end up in the same partition.
     Can occur with range-partitioning and
      hash-partitioning.
    Partition skew.
      With range-partitioning, a badly chosen
       partition vector may assign too many tuples
       to some partitions and too few to others.
Handling Skew in Range-
Partitioning
  To create a balanced partitioning vector
   (assuming partitioning attribute forms a
   key of the relation):
    Sort the relation on the partitioning
     attribute.
    Construct the partition vector by scanning
     the relation in sorted order as follows.
      After every 1/nth of the relation has been read,
      the value of the partitioning attribute of the
      next tuple is added to the partition vector.
    n denotes the number of partitions to be
    constructed.
Handling Skew Using Histograms

 Balanced partitioning vector can be constructed from histogram in a
  relatively straightforward fashion
      Assume uniform distribution within each range of the histogram
 Histogram can be constructed by scanning relation, or sampling
    (blocks containing) tuples of the relation
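A hedged Java sketch of constructing a balanced partition vector from a sorted sample of the relation (the histogram variant additionally assumes a uniform distribution within each bucket); names are illustrative.

import java.util.Arrays;

// Sketch: build a balanced range-partition vector from a sorted sample.
class PartitionVector {
    static int[] fromSortedSample(int[] sorted, int n) {
        int[] vector = new int[n - 1];
        for (int i = 1; i < n; i++)
            // after every 1/n-th of the data, record the next value
            vector[i - 1] = sorted[i * sorted.length / n];
        return vector;
    }

    public static void main(String[] args) {
        int[] sample = {1, 2, 3, 5, 5, 6, 8, 9, 11, 14, 17, 20};
        // prints [5, 11]: the vector used in the earlier example
        System.out.println(Arrays.toString(fromSortedSample(sample, 3)));
    }
}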
Handling Skew Using Virtual
Processor Partitioning
  Skew in range partitioning can be
   handled elegantly using virtual
   processor partitioning:
    create a large number of partitions (say
     10 to 20 times the number of processors)
    Assign virtual processors to partitions
     either in round-robin fashion or based on
     estimated cost of processing each virtual
     partition
  Basic idea:
    If any normal partition would have been
     skewed, the skew is very likely spread over
     a number of virtual partitions; skewed virtual
     partitions get spread across several processors,
     so work is distributed more evenly
Interquery Parallelism
 Queries/transactions execute in
  parallel with one another.
 Increases transaction throughput;
  used primarily to scale up a
  transaction processing system to
  support a larger number of
  transactions per second.
 Easiest form of parallelism to support,
  particularly in a shared-memory
  parallel database, because even
  sequential database systems support
  concurrent processing.
Cache Coherency Protocol
 Example of a cache coherency protocol
 for shared disk systems:
  Before reading/writing to a page, the page
   must be locked in shared/exclusive mode.
  On locking a page, the page must be read
   from disk
  Before unlocking a page, the page must be
   written to disk if it was modified.
 More complex protocols with fewer
  disk reads/writes exist.
 Cache coherency protocols for shared-
  nothing systems are similar; each
  database page has a home processor, and
  requests to fetch or write back the page
  are sent to the home processor.
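A minimal Java sketch of the three coherency rules above for a single page, using a read-write lock; the Disk interface is a made-up stand-in for the shared disk.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the shared-disk coherency rules for one page:
// lock before access, read from disk on locking, flush before unlock.
class CoherentPage {
    interface Disk {                         // hypothetical disk interface
        byte[] read(long pageNo);
        void write(long pageNo, byte[] data);
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    byte[] readShared(Disk disk, long pageNo) {
        lock.readLock().lock();              // shared lock before reading
        try {
            return disk.read(pageNo);        // page must be read on locking
        } finally {
            lock.readLock().unlock();
        }
    }

    void writeExclusive(Disk disk, long pageNo, byte[] newContents) {
        lock.writeLock().lock();             // exclusive lock before writing
        try {
            disk.read(pageNo);               // read current page on locking
            disk.write(pageNo, newContents); // modified page flushed
        } finally {                          // before the lock is released
            lock.writeLock().unlock();
        }
    }
}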
Intraquery Parallelism
  Execution of a single query in parallel
   on multiple processors/disks;
   important for speeding up long-
   running queries.
  Two complementary forms of
   intraquery parallelism :
    Intraoperation Parallelism – parallelize the
     execution of each individual operation in
     the query.
    Interoperation Parallelism – execute the
     different operations in a query expression
     in parallel.
Parallel Processing of Relational
Operations
 Our discussion of parallel algorithms
 assumes:
   read-only queries
   shared-nothing architecture
   n processors, P0, ..., Pn-1, and n disks D0,
   ..., Dn-1, where disk Di is associated with
   processor Pi.
 If a processor has multiple disks they
  can simply simulate a single disk Di.
 Shared-nothing architectures can be
  efficiently simulated on shared-
  memory and shared-disk systems.
Parallel Sort
Range-Partitioning Sort
    Choose processors P0, ..., Pm, where m ≤ n - 1, to do sorting.
   Create range-partition vector with m entries, on the sorting
    attributes
   Redistribute the relation using range partitioning
      all tuples that lie in the ith range are sent
       to processor Pi
     Pi stores the tuples it received temporarily
      on disk Di.
     This step requires I/O and communication
      overhead.
   Each processor Pi sorts its partition of the relation locally.
   Each processors executes same operation (sort) in parallel with
    other processors, without any interaction with the others (data
    parallelism).
    Final merge operation is trivial: range-partitioning ensures that,
     for 1 ≤ i < j ≤ m, the key values in processor Pi are all less
     than the key values in Pj.
Parallel Sort (Cont.)
Parallel External Sort-Merge
 Assume the relation has already been
  partitioned among disks D0, ..., Dn-1
  (in whatever manner).
 Each processor Pi locally sorts the data
  on disk Di.
 The sorted runs on each processor are
  then merged to get the final sorted
  output.
 Parallelize the merging of sorted runs
  as follows: range-partition the sorted runs
  across the processors, have each processor
  merge the streams it receives, and
  concatenate the sorted outputs.
Parallel Join
 The join operation requires pairs of
  tuples to be tested to see if they
  satisfy the join condition, and if they
  do, the pair is added to the join
  output.
 Parallel join algorithms attempt to split
  the pairs to be tested over several
  processors. Each processor then
  computes part of the join locally.
 In a final step, the results from each
  processor can be collected together to
  produce the final result.
Partitioned Join
 For equi-joins and natural joins, it is
  possible to partition the two input
  relations across the processors, and
  compute the join locally at each processor.
 Let r and s be the input relations, and we
  want to compute r ⋈r.A=s.B s.
 r and s each are partitioned into n
  partitions, denoted r0, r1, ..., rn-1 and s0,
  s1, ..., sn-1.
 Can use either range partitioning or hash
  partitioning, but r and s must be partitioned
  on their join attributes (r.A and s.B) using the
  same partitioning function; partitions ri and si
  are then sent to processor Pi.
Partitioned Join (Cont.)
Fragment-and-Replicate Join
 Partitioning not possible for some join
  conditions
   e.g., non-equijoin conditions, such as r.A
    > s.B.
 For joins where partitioning is not
  applicable, parallelization can be
  accomplished by the fragment-and-
  replicate technique
   Depicted on next slide
 Special case – asymmetric fragment-
  and-replicate:
   One of the relations, say r, is partitioned;
    any partitioning technique can be used.
   The other relation, s, is replicated across
    all the processors.
Depiction of Fragment-and-
Replicate Joins
 a. Asymmetric Fragment and Replicate          b. Fragment and Replicate
Fragment-and-Replicate Join
     (Cont.)
 General case: reduces the sizes of the
  relations at each processor.
   r is partitioned into n partitions, r0, r1, ..., rn-1;
    s is partitioned into m partitions, s0, s1, ..., sm-1.
       Any partitioning technique may be used.
       There must be at least m * n processors.
       Label the processors as
      P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1,m-1.
      Pi,j computes the join of ri with sj. In order
       to do so, ri is replicated to Pi,0, Pi,1, ..., Pi,m-1,
       while si is replicated to P0,i, P1,i, ..., Pn-1,i
Fragment-and-Replicate Join
(Cont.)
  Both versions of fragment-and-replicate
  work with any join condition, since every
  tuple in r can be tested with every tuple
  in s.
 Usually has a higher cost than
  partitioning, since one of the relations
  (for asymmetric fragment-and-replicate)
  or both relations (for general fragment-
  and-replicate) have to be replicated.
 Sometimes asymmetric fragment-and-
  replicate is preferable even though
  partitioning could be used
   E.g., if s is small and r is large and already
    partitioned, it may be cheaper to replicate s
    across all processors than to repartition r
    and s on the join attributes.
Partitioned Parallel Hash-Join
Parallelizing partitioned hash join:
 Assume s is smaller than r and
  therefore s is chosen as the build
  relation.
 A hash function h1 takes the join
  attribute value of each tuple in s and
  maps this tuple to one of the n
  processors.
 Each processor Pi reads the tuples of s
  that are on its disk Di, and sends each
  tuple to the appropriate processor based
  on hash function h1. Let si denote the
  tuples of s sent to processor Pi.
Partitioned Parallel Hash-Join
     (Cont.)
    Once the tuples of s have been
    distributed, the larger relation r is
     redistributed across the n processors
    using the hash function h1
        Let ri denote the tuples of relation r that
        are sent to processor Pi.
 As the r tuples are received at the
    destination processors, they are
    repartitioned using the function h2
     (just as the probe relation is partitioned in
        the sequential hash-join algorithm).
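A hedged Java sketch of a partitioned equijoin: both inputs are split with the same hash function and each partition pair is joined independently, with parallel threads standing in for processors. The tuple layout and names are assumptions for illustration.

import java.util.*;
import java.util.stream.IntStream;

// Sketch: partition both relations with the same hash function,
// then build-and-probe each partition pair independently.
class PartitionedJoin {
    record Tuple(int key, String payload) {}

    static List<List<Tuple>> partition(List<Tuple> rel, int n) {
        List<List<Tuple>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
        for (Tuple t : rel) parts.get(Math.floorMod(t.key, n)).add(t);
        return parts;
    }

    static List<String> join(List<Tuple> r, List<Tuple> s, int n) {
        var rp = partition(r, n);        // same hash function h1 for both
        var sp = partition(s, n);
        return IntStream.range(0, n).parallel().boxed()
            .flatMap(i -> {
                // local build (on the smaller input s) and probe, per "processor"
                Map<Integer,List<Tuple>> build = new HashMap<>();
                for (Tuple t : sp.get(i))
                    build.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
                List<String> out = new ArrayList<>();
                for (Tuple t : rp.get(i))
                    for (Tuple m : build.getOrDefault(t.key, List.of()))
                        out.add(t.payload + "-" + m.payload);
                return out.stream();
            }).toList();
    }
}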
Parallel Nested-Loop Join
 Assume that
   relation s is much smaller than relation r
    and that r is stored by partitioning.
   there is an index on a join attribute of
    relation r at each of the partitions of
    relation r.
 Use asymmetric fragment-and-
  replicate, with relation s being
  replicated, and using the existing
  partitioning of relation r.
 Each processor Pj where a partition of
  relation s is stored reads the tuples of
  relation s stored in Dj, and replicates the
  tuples to every other processor Pi; at the
  end of this phase, relation s is replicated
  at all sites that store tuples of r.
Other Relational Operations

Selection σθ(r)
 If θ is of the form ai = v, where ai is an
  attribute and v a value.
   If r is partitioned on ai the selection is
       performed at a single processor.
 If θ is of the form l ≤ ai ≤ u (i.e., θ
  is a range selection) and the relation
  has been range-partitioned on ai
   Selection is performed at each processor
       whose partition overlaps with the
       specified range of values.
Other Relational Operations
(Cont.)
 Duplicate elimination
   Perform by using either of the parallel sort
   techniques
     eliminate duplicates as soon as they are
      found during sorting.
   Can also partition the tuples (using either
   range- or hash- partitioning) and perform
   duplicate elimination locally at each
   processor.

 Projection
   Projection without duplicate elimination can
    be performed as tuples are read from disk,
    in parallel.
Grouping/Aggregation

 Partition the relation on the grouping
  attributes and then compute the
  aggregate values locally at each
  processor.
 Can reduce cost of transferring tuples
  during partitioning by partly
  computing aggregate values before
  partitioning.
 Consider the sum aggregation
  operation: perform the operation at each
  processor Pi on the tuples on disk Di,
  yielding partial sums; repartition the
  partial results on the grouping attributes,
  and sum the partial sums at each processor
  to get the final result.
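A small Java sketch of that optimization: local partial sums per processor, then a merge of the (group, partial sum) pairs after repartitioning; the data layout is an illustrative assumption.

import java.util.*;
import java.util.stream.Collectors;

// Sketch of the sum-aggregation optimization: each processor computes
// partial sums locally, and only (group, partialSum) pairs are
// repartitioned on the grouping attribute and summed again.
class ParallelSum {
    static Map<String, Long> localPartialSums(
            List<Map.Entry<String, Long>> localTuples) {
        return localTuples.stream().collect(Collectors.groupingBy(
            Map.Entry::getKey, Collectors.summingLong(Map.Entry::getValue)));
    }

    static Map<String, Long> finalSums(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> p : partials)
            p.forEach((group, sum) -> result.merge(group, sum, Long::sum));
        return result;
    }
}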
Cost of Parallel Evaluation of
Operations
 If there is no skew in the partitioning,
  and there is no overhead due to the
  parallel evaluation, the operation is
  expected to take 1/n of the sequential
  time (a speed-up of n)
 If skew and overheads are also to be
 taken into account, the time taken by
 a parallel operation can be estimated
 as
            Tpart + Tasm + max (T0, T1, …, Tn-1)
   Tpart is the time for partitioning the
    relations, Tasm the time for assembling the
    results, and Ti the time taken at processor Pi
Interoperator Parallelism

 Pipelined parallelism
   Consider a join of four relations
      r1 ⋈ r2 ⋈ r3 ⋈ r4
   Set up a pipeline that computes the three
   joins in parallel
      Let P1 be assigned the computation of
             temp1 = r1 ⋈ r2
      And P2 be assigned the computation of
       temp2 = temp1 ⋈ r3
      And P3 be assigned the computation of
       temp2 ⋈ r4
    Each of these operations can execute in
     parallel, sending result tuples to the next
     operation even as earlier results are computed
Factors Limiting Utility of Pipeline
Parallelism
 Pipeline parallelism is useful since it
  avoids writing intermediate results to
  disk
 Useful with small number of
  processors, but does not scale up well
  with more processors. One reason is
  that pipeline chains do not attain
  sufficient length.
 Cannot pipeline operators which do not
  produce output until all inputs have
  been accessed (e.g. aggregate and sort)
Independent Parallelism

Independent parallelism
 Consider a join of four relations
       r1 ⋈ r2 ⋈ r3 ⋈ r4
    Let P1 be assigned the computation of
         temp1 = r1 ⋈ r2
    And P2 be assigned the computation of temp2
     = r3 ⋈ r4
    And P3 be assigned the computation of temp1
     ⋈ temp2
   P1 and P2 can work independently in parallel
   P3 has to wait for input from P1 and P2
    Can pipeline output of P1 and P2 to P3, combining
     independent parallelism and pipelined parallelism
 Does not provide a high degree of parallelism
Query Optimization
Query optimization in parallel databases is significantly more complex than
   query optimization in sequential databases.
Cost models are more complicated, since we must take into account partitioning
   costs and issues such as skew and resource contention.
When scheduling an execution tree in a parallel system, must decide:
 How to parallelize each operation and how many
  processors to use for it.
 What operations to pipeline, what operations to
  execute independently in parallel, and what
  operations to execute sequentially, one after the
  other.
Determining the amount of resources to allocate for each operation is a
   problem.
 E.g., allocating more processors than optimal can
 result in high communication overhead.
Long pipelines should be avoided as the final operation may wait a lot for inputs,
   while holding precious resources
Query Optimization (Cont.)
   The number of parallel evaluation plans from which to choose
    from is much larger than the number of sequential evaluation
    plans.
      Therefore heuristics are needed during
       optimization
   Two alternative heuristics for choosing parallel plans:
      No pipelining and no inter-operation
       parallelism; just parallelize every operation
       across all processors.
        Finding best plan is now much easier ---
         use standard optimization technique, but
         with new cost model
         The Volcano parallel database popularized
          the exchange-operator model
          exchange operator is introduced into query
           plans to partition and distribute tuples
Design of Parallel Systems

Some issues in the design of parallel
  systems:
 Parallel loading of data from external
  sources is needed in order to handle
  large volumes of incoming data.
 Resilience to failure of some
  processors or disks.
   Probability of some disk or processor
    failing is higher in a parallel system.
    Operation (perhaps with degraded
     performance) should be possible in spite
     of failure of some processors or disks
Design of Parallel Systems
(Cont.)
 On-line reorganization of data and
  schema changes must be supported.
  For example, index construction on
   terabyte databases can take hours or days
   even on a parallel system.
    Need to allow other processing
     (insertions/deletions/updates) to be
     performed on relation even as index is
     being constructed.
   Basic idea: index construction tracks
    changes and "catches up" on changes at
    the end.
Overview

 Web Interfaces to Databases
 Performance Tuning
 Performance Benchmarks
 Standardization
 E-Commerce
 Legacy Systems
The World Wide Web

   The Web is a distributed information
    system based on hypertext.
   Most Web documents are hypertext
    documents formatted via the
    HyperText Markup Language (HTML)
   HTML documents contain
     text along with font specifications, and
      other formatting instructions
     hypertext links to other documents, which
      can be associated with regions of the text.
     forms, enabling users to enter data which
       can then be sent back to the Web server
Web Interfaces to Databases
  Why interface databases to the Web?
  1. Web browsers have become the de-
     facto standard user interface to
     databases
     Enable large numbers of users to access
      databases from anywhere
     Avoid the need for
      downloading/installing specialized code,
      while providing a good graphical user
      interface
     E.g.: Banks, Airline/Car reservations,
      University course registration/grading, …
Web Interfaces to Database (Cont.)
      2. Dynamic generation of documents
         Limitations of static HTML documents
           Cannot customize fixed Web documents
            for individual users.
           Problematic to update Web documents,
            especially if multiple Web documents
            replicate data.
         Solution: Generate Web documents
          dynamically from data stored in a
          database.
           Can tailor the display based on user
            information stored in the database.
             E.g. tailored ads, tailored weather and
              local news, …
            Displayed information is up-to-date,
             unlike the static Web pages
Uniform Resources Locators
 In the Web, functionality of pointers is
  provided by Uniform Resource
  Locators (URLs).
 URL example:
            http://www.bell-labs.com/topics/book/db-book
   The first part indicates how the document
   is to be accessed
      "http" indicates that the document is to be
       accessed using the HyperText Transfer
       Protocol.
   The second part gives the unique name of
    a machine on the Internet.
    The rest of the URL identifies the
     document within the machine.
HTML and HTTP
    HTML provides formatting, hypertext
     link, and image display features.
    HTML also provides input features
        Select from a set of options
         Pop-up menus, radio buttons, check lists
        Enter values
         Text boxes
      Filled in input sent back to the server, to
      be acted upon by an executable at the
      server
    HyperText Transfer Protocol (HTTP)
     used for communication with the
      Web server
Sample HTML Source Text
 <html> <body>
<table border cols = 3>
     <tr> <td> A-101 </td> <td>
Downtown </td> <td> 500 </td> </tr>
     …
</table>
<center> The <i>account</i> relation
</center>

 <form action="BankQuery" method=get>
  Select account/loan and enter number
<br>
  <select name="type">
Display of Sample HTML Source

[Figure: browser rendering of the sample HTML page]
Client Side Scripting and Applets

 Browsers can fetch certain scripts
  (client-side scripts) or programs along
  with documents, and execute them in
  "safe mode" at the client site
   Javascript
   Macromedia Flash and Shockwave for
    animation/games
   VRML
   Applets
 Client-side scripts/programs allow
 documents to be active
    E.g., animation by executing programs at
     the local site
Client Side Scripting and Security
    Security mechanisms needed to
    ensure that malicious scripts do not
    cause damage to the client machine
     Easy for limited capability scripting
     languages, harder for general purpose
     programming languages like Java
 E.g. Java's security system ensures
    that the Java applet code does not
    make any system calls directly
     Disallows dangerous actions such as file
      writes
      Notifies the user about potentially
       dangerous actions, and allows the option
       to abort the program or continue execution
Web Servers
 A Web server can easily serve as a
  front end to a variety of information
  services.
 The document name in a URL may
  identify an executable program,
  that, when run, generates a HTML
  document.
   When a HTTP server receives a request
    for such a document, it executes the
    program, and sends back the HTML
    document that is generated.
    The Web client can pass extra
     arguments with the name of the
     document.
Three-Tier Web Architecture

[Figure: web server, application server, and database server tiers]
Two-Tier Web Architecture

 Multiple levels of indirection have overheads
  Alternative: two-tier architecture
HTTP and Sessions

 The HTTP protocol is connectionless
   That is, once the server replies to a
    request, the server closes the connection
    with the client, and forgets all about the
    request
   In contrast, Unix logins, and JDBC/ODBC
    connections stay connected until the
    client disconnects
     retaining user authentication and other
      information
    Motivation: reduces load on server
      operating systems have tight limits on the
       number of open connections on a machine
Sessions and Cookies

 A cookie is a small piece of text
  containing identifying information
   Sent by server to browser on first
    interaction
   Sent by browser to the server that created
    the cookie on further interactions
    part of the HTTP protocol
   Server saves information about cookies it
   issued, and can use it when serving a
   request
      E.g., authentication information, and user
       preferences
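A minimal servlet sketch of the cookie exchange described above, using the standard javax.servlet.http API; the cookie name and session-id scheme are made up for illustration.

import java.io.IOException;
import javax.servlet.http.*;

// Sketch: recognize a returning client across connectionless HTTP
// requests by issuing and re-reading a cookie.
public class SessionServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String id = null;
        Cookie[] cookies = req.getCookies();       // cookies sent by browser
        if (cookies != null)
            for (Cookie c : cookies)
                if (c.getName().equals("sessionId")) id = c.getValue();
        if (id == null) {                          // first interaction:
            id = java.util.UUID.randomUUID().toString();
            resp.addCookie(new Cookie("sessionId", id)); // server issues cookie
        }
        resp.getWriter().println("session: " + id);
    }
}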
Servlets
 Java Servlet specification defines an
 API for communication between the
 Web server and application program
   E.g. methods to get parameter values and
   to send HTML text back to client
 Application program (also called a
  servlet) is loaded into the Web server
   Two-tier model
   Each request spawns a new thread in the
   Web server
     thread is closed once the request is
      serviced
Example Servlet Code

public class BankQueryServlet extends
 HttpServlet {
 public void doGet(HttpServletRequest
 request, HttpServletResponse result)
    throws ServletException, IOException {
    String type =
 request.getParameter("type");
    String number =
 request.getParameter("number");
     …code to find the loan
 amount/account balance …
     …using JDBC to communicate with the
 database..
     …we assume the value is stored in the
 variable balance…
Server-Side Scripting

 Server-side scripting simplifies the
  task of connecting a database to the
  Web
   Define a HTML document with embedded
    executable code/SQL queries.
   Input values from HTML forms can be
    used directly in the embedded code/SQL
    queries.
   When the document is requested, the Web
    server executes the embedded code/SQL
    queries to generate the actual HTML       136
                                              2
Improving Web Server
Performance
 Performance is an issue for popular
  Web sites
   May be accessed by millions of users
   every day, thousands of requests per
   second at peak time
 Caching techniques used to reduce
 cost of serving pages by exploiting
 commonalities between requests
   At the server site:
      Caching of JDBC connections between
       servlet requests
Performance Tuning
 Adjusting various parameters and
  design choices to improve system
  performance for a specific application.
 Tuning is best done by
  1. identifying bottlenecks, and
  2. eliminating them.

 Can tune a database system at 3
  levels:
   Hardware -- e.g., add disks to speed up
    I/O, add memory to increase buffer hits,
    move to a faster processor.
    Database system parameters -- e.g., set
     buffer size to avoid paging of buffer, set
     checkpointing intervals to limit recovery time
    Higher level database design -- e.g., the
     schema, indices, and transactions
Bottlenecks

 Performance of most systems (at least
  before they are tuned) usually limited
  by performance of one or a few
  components: these are called
  bottlenecks
   E.g. 80% of the code may take up 20% of
   time and 20% of code takes up 80% of
   time
    Worth spending most time on 20% of code
     that take 80% of time
 Bottlenecks may be in hardware (e.g.
  disks are very busy, CPU is idle), or in
  software
Identifying Bottlenecks
 Transactions request a sequence of
 services
   e.g. CPU, Disk I/O, locks
 With concurrent transactions,
  transactions may have to wait for a
  requested service while other
  transactions are being served
 Can model database as a queueing
  system with a queue for each service
   transactions repeatedly do the following
     request a service, wait in queue for the service,
      and get serviced
 Bottlenecks in a database system typically
  show up as very high utilizations (and
  correspondingly, very long queues) of a
  particular service
Queues In A Database System

[Figure: transactions queueing for services such as CPU, disk I/O, and locks]
Tunable Parameters

 Tuning of hardware
 Tuning of schema
 Tuning of indices
 Tuning of materialized views
 Tuning of transactions
Tuning of Hardware

 Even well-tuned transactions typically
  require a few I/O operations
   Typical disk supports about 100 random
    I/O operations per second
   Suppose each transaction requires just 2
    random I/O operations. Then to support
    n transactions per second, we need to
    stripe data across n/50 disks (ignoring
    skew)
 Number of I/O operations per
  transaction can be reduced by keeping
  more data in memory; keeping frequently
  used data in memory reduces disk
  accesses, at a memory cost
Hardware Tuning: Five-Minute Rule

 Question: which data to keep in memory:
   If a page is accessed n times per
    second, keeping it in memory saves
          n * price-per-disk-drive
              accesses-per-second-per-disk
   Cost of keeping page in memory
          price-per-MB-of-memory
          pages-per-MB-of-memory
   Break-even point: value of n for which above
    costs are equal
      If accesses are more then saving is greater than
       cost
   Solving above equation with current disk and
    memory prices leads to the 5-minute rule: if a
    page that is randomly accessed is used more
    frequently than once in 5 minutes, it should be
    kept in memory
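A small Java sketch of the break-even computation; all prices here are made-up illustrative numbers, so the resulting threshold will not be exactly the canonical five minutes.

// Sketch: break-even access frequency n for keeping a page in memory.
class FiveMinuteRule {
    public static void main(String[] args) {
        double pricePerDisk = 500.0;          // assumed $/disk drive
        double accessesPerSecPerDisk = 100.0; // random I/Os per second
        double pricePerMBMemory = 1.0;        // assumed $/MB (illustrative)
        double pagesPerMB = 256;              // assuming 4KB pages

        double savingPerAccessPerSec = pricePerDisk / accessesPerSecPerDisk;
        double memoryCostPerPage = pricePerMBMemory / pagesPerMB;
        double breakEvenN = memoryCostPerPage / savingPerAccessPerSec;
        System.out.printf("keep page in memory if accessed > %.6f times/sec%n",
                          breakEvenN);
        System.out.printf("i.e., more than once every %.0f seconds%n",
                          1 / breakEvenN);
    }
}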
Hardware Tuning: One-Minute
Rule
 For sequentially accessed data, more
  pages can be read per second.
  Assuming sequential reads of 1MB of
  data at a time:
  1-minute rule: sequentially accessed
  data that is accessed
  once or more in a minute should be
  kept in memory
 Prices of disk and memory have
  changed greatly over the years, but
  the ratios have not changed much
Hardware Tuning: Choice of RAID
 Level
 To use RAID 1 or RAID 5?
   Depends on ratio of reads and writes
     RAID 5 requires 2 block reads and 2 block
      writes to write out one data block
 If an application requires r reads and w
 writes per second
   RAID 1 requires r + 2w     I/O operations
    per second
   RAID 5 requires: r + 4w I/O operations
    per second
 For reasonably large r and w, this
  requires lots of disks to handle the
  workload
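A quick back-of-the-envelope comparison (the workload figures are illustrative assumptions):

   # I/O operations per second generated under RAID 1 vs. RAID 5
   # for a workload of r reads and w writes per second.
   def raid1_ios(r, w): return r + 2 * w   # each write goes to both mirrors
   def raid5_ios(r, w): return r + 4 * w   # 2 reads + 2 writes per write

   r, w = 400, 200                         # assumed workload
   ios_per_disk = 100                      # random I/Os per disk per second
   print(raid1_ios(r, w) / ios_per_disk)   # 8.0 disks' worth of I/O
   print(raid5_ios(r, w) / ios_per_disk)   # 12.0 disks' worth of I/O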
Tuning the Database Design
 Schema tuning
   Vertically partition relations to isolate the
   data that is accessed most often -- only
   fetch needed information.
   • E.g., split account into two, (account-number,
     branch-name) and (account-number,
     balance).
     • Branch-name need not be fetched unless
      required
   Improve performance by storing a
   denormalized relation
   • E.g., store join of account and depositor;
     branch-name and balance information is
     repeated for each holder of an account, but
      join need not be computed repeatedly.
Tuning the Database Design (Cont.)
  Index tuning
   Create appropriate indices to speed up slow
    queries/updates
   Speed up slow updates by removing excess
    indices (tradeoff between queries and
    updates)
   Choose type of index (B-tree/hash)
    appropriate for most frequent types of
    queries.
   Choose which index to make clustered
 Index tuning wizards look at the past history
  of queries and updates (the workload) and
  recommend indices accordingly
Tuning the Database Design (Cont.)
Materialized Views
 Materialized views can help speed up
 certain queries
   Particularly aggregate queries
 Overheads
   Space
   Time for view maintenance
      Immediate view maintenance: done as part of
       the update transaction
        time overhead paid by the update transaction
      Deferred view maintenance: done only when
       required
        update transaction is not affected, but system
         time is spent on view maintenance
Tuning the Database Design
(Cont.)
 How to choose set of materialized
  views
   Helping one transaction type by
    introducing a materialized view may hurt
    others
   Choice of materialized views depends on
    costs
    Users often have no idea of actual cost of
     operations
    Overall, manual selection of materialized
    views is tedious
Tuning of Transactions
 Basic approaches to tuning of
 transactions
   Improve set orientation
   Reduce lock contention
 Rewriting of queries to improve
  performance was important in the past,
  but smart optimizers have made this
  less important
 Communication overhead and query
  handling overhead are a significant part
  of the cost of each call
    Combine multiple embedded
    SQL/ODBC/JDBC queries into a single
    set-oriented query
Tuning of Transactions (Cont.)

 Reducing lock contention
 Long transactions (typically read-only)
 that examine large parts of a relation
 result in lock contention with update
 transactions
   E.g. large query to compute bank
   statistics and regular bank transactions
 To reduce contention
    Use multi-version concurrency control
     E.g. Oracle "snapshots" which support
      multi-version 2PL
Tuning of Transactions (Cont.)
 Long update transactions cause several
  problems
   Exhaust lock space
   Exhaust log space
     and also greatly increase recovery time after
      a crash, and may even exhaust log space
      during recovery if recovery algorithm is badly
      designed!
 Use mini-batch transactions to limit
  number of updates that a single
  transaction can carry out. E.g., if a
  single large transaction updates every
  record of a very large relation, log may
  grow too big.
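For concreteness, a minimal sketch of the mini-batch idea using Python's sqlite3 module; the bank.db file, the account table, its updated flag, and the 5% interest update are all illustrative assumptions:

   import sqlite3

   conn = sqlite3.connect("bank.db")
   BATCH = 1000                 # updates per mini-batch transaction
   while True:
       cur = conn.execute(
           "UPDATE account SET balance = balance * 1.05, updated = 1 "
           "WHERE rowid IN (SELECT rowid FROM account "
           "WHERE updated = 0 LIMIT ?)", (BATCH,))
       conn.commit()            # each mini-batch commits separately,
                                # bounding lock and log space per transaction
       if cur.rowcount == 0:    # nothing left to update
           break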
Performance Simulation

 Performance simulation using queuing
  model useful to predict bottlenecks as
  well as the effects of tuning changes,
  even without access to real system
 Queuing model as we saw earlier
   Models activities that go on in parallel
 Simulation model is quite detailed, but
 usually omits some low level details
   Model service time, but disregard details
    of service
    E.g. approximate disk read time by using
     an average disk read time
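As a tiny illustration of what such a queueing model predicts, an M/M/1 estimate for a single service; the arrival and service rates are assumed figures:

   # M/M/1 estimates for one service, e.g. a disk.
   arrival_rate = 80.0    # requests/second offered to the service (assumed)
   service_rate = 100.0   # requests/second the service can handle (assumed)

   utilization = arrival_rate / service_rate
   avg_response = 1.0 / (service_rate - arrival_rate)    # seconds
   print(f"utilization {utilization:.0%}, response {avg_response * 1000:.1f} ms")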
Performance Benchmarks
 Suites of tasks used to quantify the
  performance of software systems
 Important in comparing database
  systems, especially as systems become
  more standards compliant.
 Commonly used performance
  measures:
   Throughput (transactions per second, or
    tps)
   Response time (delay from submission of
    transaction to return of result)
    Availability or mean time to failure
Performance Benchmarks
      (Cont.)
    Suites of tasks used to characterize
    performance
     single task not enough for complex systems
 Beware when computing average
    throughput of different transaction types
     E.g., suppose a system runs transaction type
      A at 99 tps and transaction type B at 1 tps.
     Given an equal mixture of types A and B,
      throughput is not (99+1)/2 = 50 tps; it is the
      harmonic mean n / (1/t1 + 1/t2 + … + 1/tn) of
      the per-type throughputs.
     Running one transaction of each type takes
      time 1+.01 seconds, giving a throughput of
      1.98 tps.
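A one-line check of this computation:

   rates = [99.0, 1.0]   # tps of transaction types A and B
   mixture_tps = len(rates) / sum(1.0 / t for t in rates)
   print(mixture_tps)    # 1.98, not (99 + 1) / 2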
Database Application Classes

 Online transaction processing (OLTP)
   requires high concurrency and clever
   techniques to speed up commit
   processing, to support a high rate of
   update transactions.
 Decision support applications
   including online analytical processing, or
    OLAP applications
   require good query evaluation algorithms
    and query optimization.
 Architecture of some database systems
  is tuned to one of these two classes of
  applications
Benchmark Suites
 The Transaction Processing Council
 (TPC) benchmark suites are widely
 used.
   TPC-A and TPC-B: simple OLTP application
   modeling a bank teller application with
   and without communication
    Not used anymore
   TPC-C: complex OLTP application
   modeling an inventory system
     Current standard for OLTP benchmarking
Benchmark Suites (Cont.)

 TPC benchmarks (cont.)
  TPC-D: complex decision support
   application
     Superseded by TPC-H and TPC-R
  TPC-H: (H for ad hoc) based on TPC-D
   with some extra queries
    Models ad hoc queries which are not known
     beforehand
     Total of 22 queries with emphasis on
      aggregation
     prohibits materialized views
     permits indices only on primary and foreign keys
TPC Performance Measures
 TPC performance measures
   transactions-per-second with specified
    constraints on response time
   transactions-per-second-per-dollar
    accounts for cost of owning system
 TPC benchmark requires database
  sizes to be scaled up with increasing
  transactions-per-second
   reflects real world applications where
   more customers means more database
   size and more transactions-per-second
 External audit of TPC performance
  numbers is mandatory
TPC Performance Measures

 Two types of tests for TPC-H and
  TPC-R
   Power test: runs queries and updates
    sequentially, then takes mean to find
    queries per hour
   Throughput test: runs queries and
    updates concurrently
    multiple streams running in parallel each
     generates queries, with one parallel update
     stream
    Composite query per hour metric: square
     root of product of power and throughput metrics
Other Benchmarks

 OODB transactions require a different
  set of benchmarks.
   OO7 benchmark has several different
    operations, and provides a separate
    benchmark number for each kind of
    operation
   Reason: hard to define what is a typical
    OODB application
 Benchmarks for XML being discussed
Standardization
 The complexity of contemporary
 database systems and the need for
 their interoperation require a variety of
 standards.
  syntax and semantics of programming
   languages
  functions in application program
   interfaces
  data models (e.g. object oriented/object
   relational databases)
 Formal standards are standards
  developed by a standards organization
  or industry group through a formal,
  public process
Standardization (Cont.)
 Anticipatory standards lead the
  marketplace, defining features that
  vendors then implement
   Ensure compatibility of future products
   But at times become very large and
   unwieldy since standards bodies may not
   pay enough attention to ease of
    implementation (e.g., SQL-92 or
   SQL:1999)
 Reactionary standards attempt to
 standardize features that vendors have
  already implemented, possibly in
  different ways.
SQL Standards History
 SQL developed by IBM in late 70s/early
  80s
 SQL-86 first formal standard
 IBM SAA standard for SQL in 1987
 SQL-89 added features to SQL-86
  that were already implemented in
  many systems
   Was a reactionary standard
 SQL-92 added many new features to
 SQL-89 (anticipatory standard)
    Defines levels of compliance (entry,
     intermediate, and full)
SQL Standards History (Cont.)

 SQL:1999
  Adds variety of new features --- extended
   data types, object orientation, procedures,
   triggers, etc.
  Broken into several parts
    SQL/Framework (Part 1): overview
     SQL/Foundation (Part 2): types, schemas,
      tables, query/update statements, security,
      etc.
     SQL/CLI (Call Level Interface) (Part 3): API
      interface
     SQL/PSM (Persistent Stored Modules) (Part 4):
      procedural extensions
SQL Standards History (Cont.)

 More parts undergoing
  standardization process
   Part 7: SQL/Temporal: temporal data
   Part 9: SQL/MED (Management of External
   Data)
    Interfacing of database to external data
     sources
       Allows other databases, and even files, to be
        viewed as part of the database
   Part 10 SQL/OLB (Object Language
    Bindings): embedding SQL in Java
Database Connectivity Standards
  Open DataBase Connectivity (ODBC)
   standard for database interconnectivity
    based on Call Level Interface (CLI)
     developed by X/Open consortium
    defines application programming interface,
     and SQL features that must be supported at
     different levels of compliance
  JDBC standard used for Java
  X/Open XA standards define
   transaction management standards for
   supporting distributed 2-phase commit
  OLE-DB: API like ODBC, but intended
   to support non-database sources of
   data, such as flat files
Object Oriented Database Standards
    Object Database Management Group
    (ODMG) standard for object-oriented
    databases
     version 1 in 1993 and version 2 in
      1997, version 3 in 2000
     provides language independent Object
      Definition Language (ODL) as well as
      several language specific bindings
 Object Management Group (OMG)
    standard for distributed software
    based on objects
      Object Request Broker (ORB) provides the
       basic infrastructure for communication
       between distributed objects
XML-Based Standards

 Several XML based Standards for E-
  commerce
   E.g. RosettaNet (supply chain), BizTalk
   Define catalogs, service descriptions,
    invoices, purchase orders, etc.
   XML wrappers are used to export
    information from relational databases to
    XML
 Simple Object Access Protocol (SOAP):
 XML based remote procedure call
  standard
E-Commerce

 E-commerce is the process of carrying
  out various activities related to
  commerce through electronic means
 Activities include:
   Presale activities:
     catalogs, advertisements, etc.
   Sale process: negotiations on
    price/quality of service
   Marketplace: e.g. stock
    exchange, auctions, reverse auctions
   Payment for sale
E-Catalogs

 Product catalogs must provide
  searching and browsing facilities
   Organize products into intuitive hierarchy
   Keyword search
   Help customer with comparison of
   products
 Customization of catalog
   Negotiated pricing for specific
    organizations
   Special discounts for customers based on
    past history
Marketplaces
 Marketplaces help in negotiating the
  price of a product when there are
  multiple sellers and buyers
 Several types of marketplaces
   Reverse auction
   Auction
   Exchange
 Real world marketplaces can be quite
  complicated due to product
  differentiation
 Database issues:
    Authenticate bidders
    Record buy/sell bids securely
Types of Marketplace

 Reverse auction system: single buyer,
  multiple sellers.
   Buyer states requirements, sellers bid for
    supplying items. Lowest bidder wins.
    (also known as tender system)
   Open bidding vs. closed bidding
 Auction: Multiple buyers, single seller
   Simplest case: only one instance of each
    item is being sold
   Highest bidder for an item wins
    More complicated with multiple copies,
     and buyers bid for a specific number of copies
Order Settlement

 Order settlement: payment for goods
  and delivery
 Insecure means for electronic
 payment: send credit card number
    Buyers may present someone else's credit
     card number
   Seller has to be trusted to bill only for
    agreed-on item
   Seller has to be trusted not to pass on the
    credit card number to unauthorized
     people
Secure Payment Systems
 All information must be encrypted to
 prevent eavesdropping
   Public/private key encryption widely used
 Must prevent person-in-the-middle
 attacks
   E.g. someone impersonates seller or
   bank/credit card company and fools buyer
   into revealing information
     Encrypting messages alone doesn't solve this
      problem
    More on this in next slide
 Three-way communication between the
  buyer, the seller, and the credit-card
  company is used to make a payment
Secure Payment Systems (Cont.)

 Digital certificates are used to prevent
  impersonation/man-in-the-middle
  attacks
   Certification agency creates digital
    certificate by encrypting, e.g., the seller's
    public key using its own private key
      Verifies the seller's identity by external means
       first!
   Seller sends certificate to buyer
    Customer uses public key of certification
    agency to decrypt certificate and find the
    seller's public key
Digital Cash
 Credit-card payment does not provide
 anonymity
    The SET protocol hides the buyer's identity
     from the seller
    But even with SET, the buyer can be traced
     with the help of the credit card company
 Digital cash systems provide anonymity
 similar to that provided by physical
 cash
   E.g. DigiCash
    Based on encryption techniques that make
    it impossible to find out who purchased the item
Legacy Systems
 Legacy systems are older-generation
 systems that are incompatible with
 current generation standards and
 systems but still in production use
   E.g. applications written in Cobol that run on
   mainframes
      Today's hot new system is tomorrow's legacy
       system!
 Porting legacy system applications to a
 more modern environment is
 problematic
   Very expensive, since legacy system may
   involve millions of lines of code, written over
   decades
Legacy Systems (Cont.)

 Rewriting legacy application requires a
  first phase of understanding what it
  does
   Often legacy code has no documentation
    or outdated documentation
   reverse engineering: process of going
    over legacy code to
    Come up with schema designs in ER or OO
     model
    Find out what procedures and processes are
      implemented, to get a high level view of the
      system
Legacy Systems (Cont.)

 Switching over from old to new system
  is a major problem
    Production systems are in use every day,
     generating new data
    Stopping the system may bring all of a
     company's activities to a halt, causing
     enormous losses
 Big-bang approach:
  1. Implement complete new system
  2. Populate it with data from old system
      No transactions can run while this step is executed
Legacy Systems (Cont.)
 Chicken-little approach:
   Replace legacy system one piece at a time
   Use wrappers to interoperate between
   legacy and new code
    E.g. replace front end first, with wrappers on
     legacy backend
      Old front end can continue working in this
       phase in case of problems with new front end
    Replace back end, one functional unit at a
     time
      All parts that share a database may have to be
       replaced together, or wrapper is needed on
       database also
    Drawback: significant extra development
     effort to build the wrappers
Chapter 22: Advanced Querying and Information Retrieval
       Decision-Support Systems
        Data Analysis
         OLAP
         Extended aggregation features in SQL
           Windowing and ranking
        Data Mining
        Data Warehousing
      Information-Retrieval Systems
         Including Web search
Decision Support Systems

    Decision-support systems are used
     to make business decisions often
     based on data collected by on-line
     transaction-processing systems.
    Examples of business decisions:
      what items to stock?
       What insurance premium to charge?
      Who to send advertisements to?
    Examples of data used for making
     decisions
         Retail sales transaction details
Decision-Support Systems: Overview
   Data analysis tasks are simplified by specialized tools and SQL
    extensions
     Example tasks
       For each product category and each region,
        what were the total sales in the last quarter
        and how do they compare with the same
        quarter last year
       As above, for each product category and
        each customer category
    Statistical analysis packages (e.g., S++) can be interfaced with
     databases
      Statistical analysis is a large field; we will not
        study it here
    Data mining seeks to discover knowledge automatically in the
     form of statistical rules and patterns from large databases.
    A data warehouse archives information gathered from multiple
     sources, and stores it under a unified schema, at a single site.
Data Analysis and OLAP
 Aggregate functions summarize large
  volumes of data
 Online Analytical Processing (OLAP)
   Interactive analysis of data, allowing data
   to be summarized and viewed in different
   ways in an online fashion (with negligible
   delay)
 Data that can be modeled as
 dimension attributes and measure
 attributes are called multidimensional
 data.
    Given a relation used for data analysis, we can
     identify some of its attributes as measure
     attributes and others as dimension attributes.
Cross Tabulation of sales by item-name and color
   (cross-tab table not reproduced)
 The table above is an example of a
  cross-tabulation (cross-tab), also
  referred to as a pivot-table.
 A cross-tab is a table where
    values for one of the dimension attributes
    form the row headers, values for another
    dimension attribute form the column headers,
    and each cell contains an aggregate value
Relational Representation of Crosstabs
 Crosstabs can be represented as relations
   The value all is used to represent
     aggregates
   The SQL:1999 standard actually uses
     null values in place of all
        More on this later….
Three-Dimensional Data Cube
A data cube is a multidimensional generalization of a crosstab
   Cannot view a three-dimensional object in its entirety
       but crosstabs can be used as views on a data cube
Online Analytical Processing
 The operation of changing the
  dimensions used in a cross-tab is
  called pivoting
 Suppose an analyst wishes to see a
  cross-tab on item-name and color for
  a fixed value of size, for
  example, large, instead of the sum
  across all sizes.
   Such an operation is referred to as slicing.
     The operation is sometimes called
      dicing, particularly when values for multiple
      dimensions are fixed.
Hierarchies on Dimensions
 Hierarchy on dimension attributes: lets dimensions be viewed at
   different levels of detail
     E.g. the dimension DateTime can be used to aggregate by hour of day, date,
      day of week, month, quarter or year
Cross Tabulation With Hierarchy
 Crosstabs can be easily extended to deal with hierarchies
     Can drill down or roll up on a hierarchy
OLAP Implementation
  The earliest OLAP systems used
   multidimensional arrays in memory to
   store data cubes, and are referred to as
   multidimensional OLAP (MOLAP)
   systems.
  OLAP implementations using only
   relational database features are called
   relational OLAP (ROLAP) systems
 Hybrid systems, which store some
  summaries in memory and store the
  base data and other summaries in a
  relational database, are called hybrid
  OLAP (HOLAP) systems.
OLAP Implementation (Cont.)
 Early OLAP systems precomputed all
 possible aggregates in order to provide
 online response
   Space and time requirements for doing so
   can be very high
     2^n combinations of group by
   It suffices to precompute some aggregates,
   and compute others on demand from one of
   the precomputed aggregates
    Can compute aggregate on (item-name, color)
     from an aggregate on (item-name, color, size)
       For all but a few "non-decomposable"
        aggregates such as median
       this is cheaper than computing it from scratch
 Several optimizations are available for
  computing multiple aggregates together
Extended Aggregation

 SQL-92 aggregation quite limited
  Many useful aggregates are either very
   hard or impossible to specify
    Data cube
    Complex aggregates (median, variance)
    binary aggregates (correlation, regression
     curves)
     ranking queries ("assign each student a
      rank based on the total marks")
 SQL:1999 OLAP extensions provide a
  variety of aggregation functions to
  address these limitations
Extended Aggregation in SQL:1999
 The cube operation computes the union of
   group by's on every subset of the
   specified attributes
 E.g. consider the query
      select item-name, color, size, sum(number)
      from sales
      group by cube(item-name, color, size)
   This computes the union of eight
   different groupings of the sales relation:
     { (item-name, color, size), (item-name, color),
       (item-name, size), (color, size),
       (item-name), (color), (size), ( ) }
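The grouping sets can also be enumerated mechanically; a small Python sketch using the attribute names from the query above:

   # Enumerate the 2^n grouping sets generated by cube(item-name, color, size).
   from itertools import combinations

   attrs = ("item-name", "color", "size")
   groupings = [g for r in range(len(attrs), -1, -1)
                for g in combinations(attrs, r)]
   print(len(groupings))   # 8 = 2^3
   print(groupings)        # from (item-name, color, size) down to ()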

    Extended Aggregation (Cont.)
    Relational representation of crosstab that we saw earlier, but with
    null in place of all, can be computed by
         select item-name, color, sum(number)
         from sales
         group by cube(item-name, color)
   The function grouping() can be applied on an attribute
     Returns 1 if the value is a null value
       representing all, and returns 0 in all other
       cases.
    select item-name, color, size, sum(number),
         grouping(item-name) as item-name-flag,
         grouping(color) as color-flag,
          grouping(size) as size-flag
     from sales
    group by cube(item-name, color, size)
   Can use the function decode() in the select clause to replace
    such nulls by a value such as all

      E.g. replace item-name in first query by
          decode( grouping(item-name), 1, 'all', item-name)

        Extended Aggregation (Cont.)
    The rollup construct generates union on every prefix of specified
    list of attributes
   E.g.
          select item-name, color, size, sum(number)
          from sales
          group by rollup(item-name, color, size)
     Generates union of four groupings:
        { (item-name, color, size), (item-name, color), (item-name), ( ) }
   Rollup can be used to generate aggregates at multiple levels of a
    hierarchy.
   E.g., suppose table itemcategory(item-name, category) gives the
    category of each item. Then
            select category, item-name, sum(number)
            from sales, itemcategory
            where sales.item-name = itemcategory.item-name
            group by rollup(category, item-name)
    would give a hierarchical summary by item-name and by
     category.
Extended Aggregation (Cont.)
 Multiple rollups and cubes can be used in
  a single group by clause
   Each generates set of group by lists, cross
    product of sets gives overall set of group by
    lists
 E.g.,
       select item-name, color, size,
  sum(number)
       from sales
       group by rollup(item-name),
  rollup(color, size)
    generates the groupings
      { (item-name, color, size), (item-name, color),
        (item-name), (color, size), (color), ( ) }
Ranking
 Ranking is done in conjunction with an
  order by specification.
 Given a relation student-marks(student-
  id, marks) find the rank of each student.
  select student-id, rank( ) over (order by
  marks desc) as s-rank
  from student-marks
 An extra order by clause is needed to get
  them in sorted order
   select student-id, rank ( ) over (order by
   marks desc) as s-rank
   from student-marks
   order by s-rank
Ranking (Cont.)
   Ranking can be done within partition of the data.
    "Find the rank of students within each section."
    select student-id, section,
         rank ( ) over (partition by section order by marks desc)
             as sec-rank
    from student-marks, student-section
    where student-marks.student-id = student-section.student-id
    order by section, sec-rank
   Multiple rank clauses can occur in a single select clause
   Ranking is done after applying group by clause/aggregation
   Exercises:
     Find students with top n ranks
        Many systems provide special (non-standard)
          syntax for "top-n" queries
     Rank students by sum of their marks in
        different courses
Ranking (Cont.)

 Other ranking functions:
   percent_rank (within partition, if
    partitioning is done)
   cume_dist (cumulative distribution)
     fraction of tuples with preceding values
   row_number (non-deterministic in
   presence of duplicates)
 SQL:1999 permits the user to specify
  nulls first or nulls last
    select student-id,
           rank ( ) over (order by marks desc nulls last) as s-rank
    from student-marks
Ranking (Cont.)
 For a given constant n, the ranking
  function ntile(n) takes the tuples in
  each partition in the specified order,
  and divides them into n buckets with
  equal numbers of tuples. For instance,
  we can sort employees by salary, and
  use ntile(3) to find which range
  (bottom third, middle third, or top
  third) each employee is in, and
  compute the total salary earned by
  employees in each range:
  select threetile, sum(salary)
  from (select salary, ntile(3) over (order by salary) as threetile
        from employee) as s
  group by threetile
Windowing
    E.g.: "Given sales values for each date, calculate for each date the
     average of the sales on that day, the previous day, and the next day"
   Such moving average queries are used to smooth out random
    variations.
   In contrast to group by, the same tuple can exist in multiple
    windows
   Window specification in SQL:
     Ordering of tuples, size of window for each
      tuple, aggregate function
     E.g. given relation sales(date, value)
        select date, sum(value) over
            (order by date rows between 1 preceding and 1 following)
        from sales
   Examples of other window specifications:
      rows between unbounded preceding and
       current row
      rows unbounded preceding
Windowing (Cont.)

 Can do windowing within partitions
 E.g. Given a relation
 transaction(account-number, date-
 time, value), where value is positive
 for a deposit and negative for a
 withdrawal
    "Find total balance of each account after
     each transaction on the account"
    select account-number, date-time,
       sum(value) over
         (partition by account-number
          order by date-time
          rows unbounded preceding)
    from transaction
Data Mining
 Broadly speaking, data mining is the
  process of semi-automatically
  analyzing large databases to find
  useful patterns
 Like knowledge discovery in artificial
  intelligence, data mining discovers
  statistical rules and patterns
 Differs from machine learning in that
  it deals with large volumes of data
  stored primarily on disk.
 Some types of knowledge discovered
  from a database can be represented
  by a set of rules
Applications of Data Mining
 Prediction based on past history
   Predict if a credit card applicant poses a
    good credit risk, based on some attributes
    (income, job type, age, ..) and past history
   Predict if a customer is likely to switch
    brand loyalty
   Predict if a customer is likely to respond to
    ―junk mail‖
   Predict if a pattern of phone calling card
    usage is likely to be fraudulent
 Some examples of prediction
 mechanisms:
    Classification
      Given a training set consisting of items belonging
       to different classes, and a new item whose class
       is unknown, predict which class it belongs to
Applications of Data Mining
(Cont.)
 Descriptive Patterns
   Associations
    Find books that are often bought by the
     same customers. If a new customer buys
     one such book, suggest that he buys the
     others too.
    Other similar applications: camera
     accessories, clothes, etc.
   Associations may also be used as a first
   step in detecting causation
     E.g. association between exposure to
      chemical X and cancer, or between a new
      medicine and cardiac problems
Classification Rules
 Classification rules help assign new
  objects to a set of classes. E.g., given a
  new automobile insurance applicant,
  should he or she be classified as low
  risk, medium risk or high risk?
 Classification rules for above example
 could use a variety of knowledge, such
 as educational level of applicant,
 salary of applicant, age of applicant,
 etc.
        ∀ person P, P.degree = masters and
         P.income > 75,000
            ⇒ P.credit = excellent
Decision Tree
   (example decision-tree figure not reproduced)
Construction of Decision Trees
     Training set: a data sample in which the grouping for each tuple is
      already known.
     Consider credit risk example: Suppose degree is chosen to
      partition the data at the root.
       Since degree has a small number of
         possible values, one child is created for
         each value.
     At each child node of the root, further classification is done if
      required. Here, partitions are defined by income.
       Since income is a continuous attribute,
         some number of intervals are chosen, and
         one child created for each interval.
     Different classification algorithms use different ways of choosing
      which attribute to partition on at each node, and what the
      intervals, if any, are.
      In general, different branches of the tree can
       grow to different depths, and different nodes at
       the same level may use different partitioning
       attributes.
Construction of Decision Trees
      (Cont.)
    Greedy top down generation of
    decision trees.
     Each internal node of the tree partitions
     the data into groups based on a
     partitioning attribute, and a partitioning
     condition for the node
        More on choosing partitioning
         attribute/condition shortly
       Algorithm is greedy: the choice is made
        once and not revisited as more of the tree
        is constructed
      The data at a node is not partitioned
      further if either
        all (or most) of the items at the node belong
         to the same class, or
        all attributes have been considered and no
         further partitioning is possible
Best Splits
 Idea: evaluate different attributes and partitioning conditions and pick
  the one that best improves the "purity" of the training set examples
   The initial training set has a mixture of
    instances from different classes and is thus
    relatively impure
   E.g. if degree exactly predicts credit risk,
    partitioning on degree would result in each
    child having instances of only one class
     I.e., the child nodes would be pure
 The purity of a set S of training instances can be measured
  quantitatively in several ways.
 Notation: number of classes = k, number of instances = |S|,
  fraction of instances in class i = pi.
 The Gini measure of purity is defined as
        Gini(S) = 1 - Σ (i = 1..k) pi^2
 When all instances are in a single class, the Gini value is 0;
  it reaches its maximum of 1 - 1/k when each class has the
  same number of instances.
Best Splits (Cont.)
 Another measure of purity is the entropy measure, which is
  defined as
        entropy(S) = - Σ (i = 1..k) pi log2 pi
 When a set S is split into multiple sets Si, i = 1, 2, …, r, we can
  measure the purity of the resultant set of sets as:
        purity(S1, S2, ….., Sr) = Σ (i = 1..r) (|Si| / |S|) purity(Si)
 The information gain due to a particular split of S into Si, i = 1, 2, …., r:
        Information-gain (S, {S1, S2, …., Sr}) =
                 purity(S) – purity(S1, S2, …, Sr)
Best Splits (Cont.)
 Measure of "cost" of a split:
        Information-content (S, {S1, S2, ….., Sr}) =
                 - Σ (i = 1..r) (|Si| / |S|) log2 (|Si| / |S|)
 Information-gain ratio =
        Information-gain (S, {S1, S2, ……, Sr})
         / Information-content (S, {S1, S2, ….., Sr})
 The best split for an attribute is the one that gives the maximum
  information gain ratio
 Continuous valued attributes
   Can be ordered in a fashion meaningful to
    classification
   e.g. integer and real values
 Categorical attributes
   Cannot be meaningfully ordered (e.g.
    country, school/university, item-color, …)
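A small Python sketch of the measures defined above; note that Gini and entropy are really impurity measures, so a good split yields a positive gain:

   import math
   from collections import Counter

   def gini(labels):
       n = len(labels)
       return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

   def entropy(labels):
       n = len(labels)
       return -sum((c / n) * math.log2(c / n)
                   for c in Counter(labels).values())

   def weighted(measure, subsets):            # purity(S1, ..., Sr)
       total = sum(len(s) for s in subsets)
       return sum(len(s) / total * measure(s) for s in subsets)

   def information_gain(parent, subsets, measure=entropy):
       return measure(parent) - weighted(measure, subsets)

   parent = ["good"] * 4 + ["bad"] * 4
   print(information_gain(parent, [["good"] * 4, ["bad"] * 4]))   # 1.0 bit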
Finding Best Splits

 Categorical attributes:
   Multi-way split, one child for each value
     may have too many children in some cases
   Binary split: try all possible breakup of
   values into two sets, and pick the best
 Continuous valued attribute
   Binary split:
     Sort values in the instances, try each as a
      split point
        E.g. if values are 1, 10, 15, 25, split at 1,
         10, 15
Decision-Tree Construction Algorithm
Procedure Grow.Tree(S)
   Partition(S);

Procedure Partition (S)
   if (purity(S) > δp or |S| < δs) then
      return;
   for each attribute A
      evaluate splits on attribute A;
   Use best split found (across all attributes) to partition
      S into S1, S2, …., Sr;
   for i = 1, 2, ….., r
      Partition(Si);
Decision Tree Construction Algorithms (Cont.)
 Variety of algorithms have been
  developed to
   Reduce CPU cost and/or
   Reduce IO cost when handling datasets
    larger than memory
   Improve accuracy of classification
 Decision tree may be overfitted, i.e.,
  overly tuned to given training set
   Pruning of decision tree may be done on
   branches that have too few training
   instances
     When a subtree is pruned, an internal node
      becomes a leaf, and its class is set to the
      majority class of the instances that map to it
Other Types of Classifiers
 Further types of classifiers
   Neural net classifiers
   Bayesian classifiers
 Neural net classifiers use the training
 data to train artificial neural nets
    Widely studied in AI, won't cover here
 Bayesian classifiers use Bayes theorem,
  which says
         p(cj | d) = p(d | cj) p(cj) / p(d)
  where
         p(cj | d) = probability of instance d being in class cj
         p(d | cj) = probability of generating instance d given class cj
         p(cj) = probability of occurrence of class cj
         p(d) = probability of instance d occurring
Naïve Bayesian Classifiers

 Bayesian classifiers require
   computation of p(d | cj)
   precomputation of p(cj)
   p(d) can be ignored since it is the same
    for all classes
 To simplify the task, naïve Bayesian
  classifiers assume attributes have
  independent distributions, and
  thereby estimate
      p(d | cj) = p(d1 | cj) * p(d2 | cj) * …. * p(dn | cj)
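A minimal runnable sketch of this estimate for categorical attributes; the tuple-based representation of instances is an assumption for illustration:

   from collections import Counter, defaultdict

   def train(instances):             # instances: list of (attrs-tuple, class)
       class_counts = Counter(c for _, c in instances)
       attr_counts = defaultdict(Counter)
       for attrs, c in instances:
           for i, v in enumerate(attrs):
               attr_counts[(i, c)][v] += 1   # occurrences of attr i = v in c
       return class_counts, attr_counts

   def classify(attrs, class_counts, attr_counts):
       n = sum(class_counts.values())
       def score(c):                 # p(cj) * product of estimated p(di | cj)
           p = class_counts[c] / n
           for i, v in enumerate(attrs):
               p *= attr_counts[(i, c)][v] / class_counts[c]
           return p
       return max(class_counts, key=score)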
Regression
 Regression deals with the prediction of a
  value, rather than a class.
   Given values for a set of variables, X1, X2, …,
   Xn, we wish to predict the value of a variable
   Y.
 One way is to infer coefficients a0, a1, a2,
  …, an such that
     Y = a0 + a1 * X1 + a2 * X2 + … + an * Xn
 Finding such a linear polynomial is
  called linear regression.
    In general, the process of finding a curve that
     fits the data is also called curve fitting.
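For the single-variable case the least-squares coefficients have a closed form; a minimal sketch:

   # Fit Y = a0 + a1 * X by least squares.
   def linear_fit(xs, ys):
       n = len(xs)
       mx, my = sum(xs) / n, sum(ys) / n
       a1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
       a0 = my - a1 * mx
       return a0, a1

   print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))   # (1.0, 2.0)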
Association Rules
 Retail shops are often interested in
 associations between different items
 that people buy.
   Someone who buys bread is quite likely
    also to buy milk
   A person who bought the book Database
    System Concepts is quite likely also to buy
    the book Operating System Concepts.
 Associations information can be used
 in several ways.
   E.g. when a customer buys a particular
   book, an online shop may suggest
   associated books.
 Association rules:
    e.g., bread ⇒ milk; the left-hand side is
    called the antecedent, the right-hand side
    the consequent
Association Rules (Cont.)
 Rules have an associated support, as
  well as an associated confidence.
 Support is a measure of what fraction
  of the population satisfies both the
  antecedent and the consequent of the
  rule.
    E.g. suppose only 0.001 percent of all
     purchases include milk and screwdrivers.
     The support for the rule milk ⇒ screwdrivers
     is low.
   We usually want rules with a reasonably
    high support
    Rules with low support are usually not very
      useful
Finding Association Rules

      We are generally only interested in
       association rules with reasonably high
       support (e.g. support of 2% or greater)
      Naïve algorithm
       1. Consider all possible sets of relevant
          items.
       2. For each set find its support (i.e. count
          how many transactions purchase all
          items in the set).
          Large itemsets: sets with sufficiently high
           support
        3. Use large itemsets to generate
           association rules
Finding Support
   Few itemsets: determine support of all itemsets via a single pass on
    set of transactions
     A count is maintained for each itemset,
      initially set to 0.
     When a transaction is fetched, the count is
      incremented for each set of items that is
      contained in the transaction.
     Large itemsets: sets with a high count at the
      end of the pass
   Many itemsets: If memory not enough to hold all counts for all
    itemsets use multiple passes, considering only some itemsets in each
    pass.
    Optimization: Once an itemset is eliminated because its count
     (support) is too small, none of its supersets needs to be considered.
    The a priori technique to find large itemsets: first find large
     itemsets with a single item, then on each pass consider only
     candidate itemsets all of whose subsets were found large in the
     previous pass.
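A sketch of the single-pass support count (steps 1 and 2 of the naïve algorithm) on a toy data set; the transactions and the threshold are illustrative:

   from itertools import combinations

   transactions = [{"bread", "milk"}, {"bread", "butter"},
                   {"bread", "milk", "butter"}, {"milk", "screwdriver"}]
   items = set().union(*transactions)
   candidates = [frozenset(c) for c in combinations(sorted(items), 2)]

   counts = {c: 0 for c in candidates}
   for t in transactions:               # one pass over the transactions
       for c in candidates:
           if c <= t:                   # itemset contained in transaction
               counts[c] += 1

   min_support = 2                      # absolute count threshold
   large = [set(c) for c, n in counts.items() if n >= min_support]
   print(large)                         # {bread, milk} and {bread, butter}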
Other Types of Associations
 Basic association rules have several
  limitations
 Deviations from the expected probability
  are more interesting
   E.g. if many people purchase bread, and
    many people purchase cereal, quite a few
    would be expected to purchase both (prob1 *
    prob2)
   We are interested in positive as well as
    negative correlations between sets of items
    Positive correlation: co-occurrence is higher
     than predicted
    Negative correlation: co-occurrence is lower
      than predicted
Clustering
 Clustering: Intuitively, finding clusters
  of points in the given data such that
  similar points lie in the same cluster
 Can be formalized using distance
  metrics in several ways
 E.g. Group points into k sets (for a
  given k) such that the average distance
  of points from the centroid of their
  assigned group is minimized
   Centroid: point defined by taking average of
    coordinates in each dimension.
    Another metric: minimize the average distance
     between every pair of points in a cluster
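A minimal sketch of one such formalization: repeated nearest-centroid assignment and centroid recomputation (k-means style), with one-dimensional points for brevity:

   def assign(points, centroids):          # nearest-centroid assignment
       clusters = [[] for _ in centroids]
       for p in points:
           i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
           clusters[i].append(p)
       return clusters

   def recompute(clusters):                # centroid = average of members
       return [sum(c) / len(c) for c in clusters if c]

   pts = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
   cents = [0.0, 5.0]
   for _ in range(10):                     # iterate toward a fixpoint
       cents = recompute(assign(pts, cents))
   print(cents)                            # roughly [1.0, 9.5]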
Hierarchical Clustering
 Example from biological classification
   (the word classification here does not mean a
   prediction mechanism)
                        chordata
           mammalia                 reptilia
     leopards  humans          snakes  crocodiles
 Other examples: Internet directory
  systems (e.g. Yahoo, more on this later)
 Agglomerative clustering algorithms build
  small clusters, then cluster small clusters
  into bigger clusters, and so on
Clustering Algorithms

 Clustering algorithms have been
  designed to handle very large datasets
 E.g. the Birch algorithm
   Main idea: use an in-memory R-tree to
    store points that are being clustered
    Insert points one at a time into the R-tree,
     merging a new point with an existing
     cluster if it is less than some distance
     away
    If there are more leaf nodes than fit in
     memory, merge existing clusters that are
     close to each other
Collaborative Filtering

 Goal: predict what movies/books/… a
  person may be interested in, on the
  basis of
   Past preferences of the person
   Other people with similar past preferences
   The preferences of such people for a new
   movie/book/…
 One approach based on repeated
 clustering
   Cluster people on the basis of preferences
    for movies
    Then cluster movies on the basis of being
     liked by the same clusters of people, and
     repeat until the clustering stabilizes
Other Types of Mining

 Text mining: application of data
  mining to textual documents
   E.g. cluster Web pages to find related
    pages
   E.g. cluster pages a user has visited to
    organize their visit history
   E.g. classify Web pages automatically into
    a Web directory
 Data visualization systems help users
 examine large volumes of data and
  detect patterns visually
Data Warehousing

 Large organizations have complex
  internal organizations, and have data
  stored at different locations, on
  different operational (transaction
  processing) systems, under different
  schemas
 Data sources often store only current
  data, not historical data
 Corporate decision making requires a
   unified view of all organizational data.
Components of Data Warehouse
 When and how to gather data
  Source driven architecture: data sources
   transmit new information to warehouse,
   either continuously or periodically (e.g. at
   night)
  Destination driven architecture: warehouse
   periodically requests new information
   from data sources
  Keeping warehouse exactly synchronized
   with data sources (e.g. using two-phase
   commit) is too expensive
    Usually OK to have slightly out-of-date data
      at warehouse
Components of Data Warehouse
(Cont.)
 Data cleansing
   E.g. correct mistakes in addresses
     E.g. misspellings, zip code errors
   Merge address lists from different sources
   and purge duplicates
    Keep only one address record per
      household ("householding")
 How to propagate updates
   Warehouse schema may be a
    (materialized) view of schema from data
    sources
    Efficient techniques for update of
     materialized views can be used
Data Warehouse Schemas
   (schema figure not reproduced)
Warehouse Schemas

 Typically warehouse data is
  multidimensional, with very large fact
  tables
   Examples of dimensions: item-id,
    date/time of sale, store where sale was
    made, customer identifier
   Examples of measures: number of items
    sold, price of items
 Dimension values are usually encoded
 using small integers and mapped to
  full values via dimension tables
Information Retrieval Systems
 Information retrieval (IR) systems use
 a simpler data model than database
 systems
   Information organized as a collection of
    documents
   Documents are unstructured, no schema
 Information retrieval locates relevant
  documents, on the basis of user input
  such as keywords or example
 documents
   e.g., find documents containing the words
    "database systems"
Information Retrieval Systems
(Cont.)
 Differences from database systems
    IR systems don't deal with transactional
    updates (including concurrency control
    and recovery)
   Database systems deal with structured
    data, with schemas that define the data
    organization
   IR systems deal with some querying issues
    not generally addressed by database
    systems
     Approximate searching by keywords
Keyword Search
   In full text retrieval, all the words in each document are
    considered to be keywords.
     We use the word term to refer to the words
       in a document
   Information-retrieval systems typically allow query expressions
    formed using keywords and the logical connectives and, or, and
    not
     Ands are implicit, even if not explicitly
       specified
   Ranking of documents on the basis of estimated relevance to a
    query is critical
     Relevance ranking is based on factors such
       as
         Term frequency
            Frequency of occurrence of query keyword in
             document
         Inverse document frequency
Relevance Ranking Using Terms
 TF-IDF (Term frequency/Inverse
  Document frequency) ranking:
   Let n(d) = number of terms in the
    document d
   n(d, t) = number of occurrences of term t
    in the document d.
   Then the relevance of a document d to a term t is
              r(d, t) = log( 1 + n(d, t) / n(d) )
   The relevance of d to a query Q is
              r(d, Q) = Σ (t ∈ Q) r(d, t) / n(t)
 The log factor is to avoid excessive weight
  being given to frequently occurring terms
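A direct transcription of these formulas into Python (a sketch; the n_t table of document frequencies is assumed to come from the index):

   import math

   def r_dt(doc_terms, t):          # relevance of document d to term t
       n_d = len(doc_terms)         # n(d): number of terms in d
       n_dt = doc_terms.count(t)    # n(d, t): occurrences of t in d
       return math.log(1 + n_dt / n_d)

   def r_dq(doc_terms, query_terms, n_t):
       # n_t[t]: number of documents containing t, taken from the index
       return sum(r_dt(doc_terms, t) / n_t[t] for t in query_terms)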
Relevance Ranking Using Terms
(Cont.)
 Most systems add to the above model
  Words that occur in title, author list,
   section headings, etc. are given greater
   importance
  Words whose first occurrence is late in the
   document are given lower importance
   Very common words such as "a", "an",
    "the", "it" etc. are eliminated
     Called stop words
   Proximity: if keywords in query occur
    close together in the document, the
    document is ranked higher
Relevance Using Hyperlinks
 When using keyword queries on the
  Web, the number of documents is
  enormous (many billions)
   Number of documents relevant to a query
   can be enormous if only term frequencies
   are taken into account
 Using term frequencies makes
  "spamming" easy
     E.g. a travel agent can add many occurrences
      of the words "travel agent" to his page to
      make its rank very high
 Most of the time people are looking for
  pages from popular sites
Relevance Using Hyperlinks (Cont.)
 Solution: use number of hyperlinks to a
  site as a measure of the popularity or
  prestige of the site
     Count only one hyperlink from each site
      (why?)
     Popularity measure is for site, not for
      individual page
       Most hyperlinks are to root of site
       Site-popularity computation is cheaper than
        page popularity computation
 Refinements
      When computing prestige based on links to a
       site, give more weight to links from sites that
       themselves have higher prestige
Relevance Using Hyperlinks (Cont.)
 Connections to social-networking
  theories that rank the prestige of people
  E.g. the president of the US has a high
   prestige since many people know him
  Someone known by multiple prestigious
   people has high prestige
 Hub and authority based ranking
  A hub is a page that stores links to many
   pages (on a topic)
  An authority is a page that contains actual
   information on a topic
   Each page gets a hub prestige based on the
    authority prestige of the pages it points to, and
    an authority prestige based on the hub prestige
    of the pages that point to it
Similarity Based Retrieval

 Similarity based retrieval - retrieve
  documents similar to a given
  document
   Similarity may be defined on the basis of
   common words
    E.g. find k terms in A with highest r(d, t)
     and use these terms to find relevance of
     other documents; each of the terms carries
     a weight of r (d,t)
 Similarity can be used to refine answer
  set to keyword query
Synonyms and Homonyms
 Synonyms
   E.g. document: "motorcycle repair", query:
    "motorcycle maintenance"
     need to realize that "maintenance" and "repair"
      are synonyms
   System can extend query as "motorcycle and
    (repair or maintenance)"
 Homonyms
   E.g. "object" has different meanings as
   noun/verb
  Can disambiguate meanings (to some extent)
   from the context
Indexing of Documents
 An inverted index maps each keyword
 Ki to a set of documents Si that contain
 the keyword
   Documents identified by identifiers
 Inverted index may record
   Keyword locations within document to
    allow proximity based ranking
   Counts of number of occurrences of
    keyword to compute TF
 and operation: Finds documents that
  contain all of K1, K2, ..., Kn.
    Intersection S1 ∩ S2 ∩ ….. ∩ Sn
 or operation: Finds documents that contain
  at least one of K1, K2, ..., Kn
    Union S1 ∪ S2 ∪ ….. ∪ Sn
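A minimal in-memory inverted index supporting both operations (a sketch; document ids and terms are whatever the caller supplies):

   from functools import reduce

   index = {}                        # keyword -> set of document ids

   def add_document(doc_id, terms):
       for t in terms:
           index.setdefault(t, set()).add(doc_id)

   def and_query(*keywords):         # documents containing all keywords
       return reduce(set.intersection,
                     (index.get(k, set()) for k in keywords))

   def or_query(*keywords):          # documents containing any keyword
       return reduce(set.union,
                     (index.get(k, set()) for k in keywords), set())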
Measuring Retrieval Effectiveness
 IR systems save space by using index
    structures that support only
    approximate retrieval. May result in:
     false negative (false drop) - some
      relevant documents may not be
      retrieved.
     false positive - some irrelevant
      documents may be retrieved.
     For many applications a good index
      should not permit any false drops, but
      may permit a few false positives.
 Relevant performance metrics:
     Precision - what percentage of the
      retrieved documents are relevant to the query
     Recall - what percentage of the documents
      relevant to the query were retrieved
Measuring Retrieval Effectiveness (Cont.)
 Ranking order can also result in false
  positives/false negatives
     Recall vs. precision tradeoff:
       Can increase recall by retrieving many
        documents (down to a low level of relevance
        ranking), but many irrelevant documents would
        be fetched, reducing precision
     Measures of retrieval effectiveness:
       Recall as a function of number of documents
        fetched, or
       Precision as a function of recall
        Equivalently, as a function of number of
         documents fetched
        E.g. "precision of 75% at recall of 50%, and 60%
         at a recall of 75%"
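Both metrics are straightforward to compute once the set of relevant documents is known; a sketch over document-id sets:

   def precision_recall(retrieved, relevant):
       retrieved, relevant = set(retrieved), set(relevant)
       hits = retrieved & relevant
       precision = len(hits) / len(retrieved) if retrieved else 0.0
       recall = len(hits) / len(relevant) if relevant else 0.0
       return precision, recall

   print(precision_recall({1, 2, 3, 4}, {2, 4, 6}))   # (0.5, 0.666...)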
Web Crawling

 Web crawlers are programs that locate
  and gather information on the Web
   Recursively follow hyperlinks present in
    known documents, to find other
    documents
     Starting from a seed set of documents
   Fetched documents
     Handed over to an indexing system
     Can be discarded after indexing, or store as
      a cached copy
 Crawling the entire Web would take a
  very large amount of time
Web Crawling (Cont.)

 Crawling is done by multiple
  processes on multiple machines,
  running in parallel
   Set of links to be crawled stored in a
    database
   New links found in crawled pages added
    to this set, to be crawled later
 Indexing process also runs on
  multiple machines
   Creates a new copy of index instead of
   modifying old index
Browsing

    Storing related documents together
    in a library facilitates browsing
    users can see not only the requested
    document but also related ones.
    Browsing is facilitated by
     classification system that organizes
     logically related documents together.
    Organization is hierarchical:
     classification hierarchy
A Classification Hierarchy For A
Library System
Classification DAG

    Documents can reside in multiple
     places in a hierarchy in an information
     retrieval system, since physical
     location is not important.
    Classification hierarchy is thus
     Directed Acyclic Graph (DAG)
A Classification DAG For A
Library Information Retrieval
System
Web Directories

 A Web directory is just a classification
  hierarchy on Web pages
   E.g. Yahoo! Directory, Open Directory
    project
   Issues:
    What should the directory hierarchy be?
     Given a document, which nodes of the
      directory are categories relevant to the
      document?
   Often done manually
     Classification of documents into a hierarchy
Overview

  Temporal Data
  Spatial and Geographic Databases
  Multimedia Databases
  Mobility and Personal Databases
Time In Databases
 While most databases tend to model
  reality at a point in time (at the
  "current" time), temporal databases
  model the states of the real world
  across time.
 Facts in temporal relations have
  associated times when they are valid,
  which can be represented as a union of
  intervals.
 The transaction time for a fact is the
  time interval during which the fact is
  current within the database system.
Time In Databases (Cont.)
 Example of a temporal relation:
 Temporal query languages have been
  proposed to simplify modeling of time as
  well as time-related queries.
Time Specification in SQL-92
 date: four digits for the year (1--
  9999), two digits for the month (1--
  12), and two digits for the date (1--
  31).
 time: two digits for the hour, two
  digits for the minute, and two digits
  for the second, plus optional fractional
  digits.
 timestamp: the fields of date and
  time, with six fractional digits for the
  seconds field.
 Times are specified in the Universal
  Coordinated Time (UTC) standard; a
  time zone can also be associated with
  time and timestamp values.
Temporal Query Languages
 Predicates precedes, overlaps, and
  contains on time intervals.
 Intersect can be applied on two
  intervals, to give a single (possibly
  empty) interval; the union of two
  intervals may or may not be a single
  interval.
 A snapshot of a temporal relation at
  time t consists of the tuples that are
  valid at time t, with the time-interval
  attributes projected out.
 Temporal selection: a selection whose
  predicate involves the time attributes
  of tuples.
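For instance, a snapshot can be computed by filtering on the interval attributes and then projecting them out. The relation layout below (value fields followed by a half-open [from, to) interval) and the sample data are assumptions made for the example:

    employees = [
        # (name, dept, valid_from, valid_to); interval is [from, to)
        ("Smith", "Toys",  1995, 1999),
        ("Smith", "Books", 1999, 2004),
        ("Jones", "Toys",  1997, 2001),
    ]

    def snapshot(relation, t):
        # tuples valid at time t, with the time-interval
        # attributes projected out
        return [row[:-2] for row in relation if row[-2] <= t < row[-1]]

    print(snapshot(employees, 1998))
    # -> [('Smith', 'Toys'), ('Jones', 'Toys')]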
Temporal Query Languages (Cont.)
 Functional dependencies must be used
  with care: adding a time field may
  invalidate a functional dependency
 A temporal functional dependency X →τ Y
  holds on a relation schema R if, for all
  legal instances r of R, all snapshots of r
  satisfy the functional dependency X → Y.
 SQL:1999 Part 7 (SQL/Temporal) is a
  proposed extension to SQL:1999 to
  improve support of temporal data.
Spatial and Geographic Databases
 Spatial databases store information
  related to spatial locations, and
  support efficient storage, indexing
  and querying of spatial data.
 Special purpose index structures are
  important for accessing spatial data,
  and for processing spatial join
  queries.
 Computer Aided Design (CAD)
  databases store design information
  about how objects are constructed
   E.g.: designs of buildings, aircraft,
    layouts of integrated circuits
Representation of Geometric
Information
         Various geometric constructs can be represented in a database
          in a normalized fashion.
         Represent a line segment by the coordinates of its endpoints.
         Approximate a curve by partitioning it into a sequence of
          segments
           Create a list of vertices in order, or
           Represent each segment as a separate
            tuple that also carries with it the identifier
            of the curve (2D features such as roads).
          Closed polygons
            List of vertices in order, starting vertex is
             the same as the ending vertex, or
            Represent boundary edges as separate
             tuples, with each containing the identifier
             of the polygon
Representation of Geometric
Constructs
Representation of Geometric
Information (Cont.)
     Representation of points and line
      segment in 3-D similar to 2-D, except
      that points have an extra z component
     Represent arbitrary polyhedra by
      dividing them into tetrahedrons, like
      triangulating polygons.
     Alternative: List their faces, each of
      which is a polygon, along with an
      indication of which side of the face is
      inside the polyhedron.
Design Databases
 Represent design components as
  objects (generally geometric
  objects); the connections between
  the objects indicate how the
  design is structured.
 Simple two-dimensional objects:
  points, lines, triangles, rectangles,
  polygons.
 Complex two-dimensional objects:
  formed from simple objects via
  union, intersection, and difference
   operations.
Representation of Geometric
Constructs
  (a) Difference of cylinders   (b) Union of cylinders

 Design databases also store non-
  spatial information about objects (e.g.,
  construction material, color, etc.)
 Spatial integrity constraints are
  important.
Geographic Data

    Raster data consist of bit maps or
     pixel maps, in two or more
     dimensions.
      Example 2-D raster image: satellite
       image of cloud cover, where each pixel
       stores the cloud visibility in a particular
       area.
      Additional dimensions might include the
       temperature at different altitudes at
       different regions, or measurements
       taken at different points in time.
 Design databases generally do not
  store raster data.
Geographic Data (Cont.)

       Vector data are constructed from
        basic geometric objects: points, line
        segments, triangles, and other
        polygons in two dimensions, and
         cylinders, spheres, cuboids, and
        other polyhedrons in three
        dimensions.
       Vector format often used to
        represent map data.
         Roads can be considered as two-
         dimensional and represented by lines
          and curves.
Applications of Geographic Data
         Examples of geographic data
           map data for vehicle navigation
           distribution network information for
           power, telephones, water supply, and
           sewage
         Vehicle navigation systems store
         information about roads and
         services for the use of drivers:
            Spatial data: e.g., road/restaurant/gas-
             station coordinates
            Non-spatial data: e.g., one-way
             streets, speed limits, traffic congestion
          Global Positioning System (GPS) units
           use data broadcast by satellites to
           determine the vehicle's current location
Spatial Queries
      Nearness queries request objects
       that lie near a specified location.
      Nearest neighbor queries, given a
       point or an object, find the nearest
       object that satisfies given
       conditions.
      Region queries deal with spatial
       regions. e.g., ask for objects that lie
       partially or fully inside a specified
       region.
      Queries that compute intersections
        or unions of regions.
Spatial Queries (Cont.)
 Spatial data is typically queried
  using a graphical query language;
  results are also displayed in a
  graphical manner.
 Graphical interface constitutes the
  front-end
 Extensions of SQL with abstract
  data types, such as lines, polygons
  and bit maps, have been proposed
  to interface with back-end.
   allows relational databases to store and
    retrieve spatial information
Indexing of Spatial Data

   k-d tree - early structure used for
    indexing in multiple dimensions.
   Each level of a k-d tree partitions the
    space into two.
     choose one dimension for partitioning at
      the root level of the tree.
      choose another dimension for partitioning
      in nodes at the next level and so on,
      cycling through the dimensions.
   In each node, approximately half of the
    points stored in the sub-tree fall on
     one side and half on the other.
Division of Space by a k-d Tree
 Each line in the figure (other than
 the outside box) corresponds to a
 node in the k-d tree
   the maximum number of points in a
   leaf node has been set to 1.
 The numbering of the lines in the
  figure indicates the level of the tree at
  which the corresponding node appears.
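A sketch of building such a tree over 2-D points, cycling through the dimensions and splitting at the median so roughly half the points fall on each side. The dictionary-based node layout and the sample points are arbitrary choices for illustration:

    def build_kd(points, depth=0, leaf_size=1):
        if len(points) <= leaf_size:
            return {"leaf": points}
        axis = depth % 2                       # cycle: 0 = x, 1 = y
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2                 # median point splits the space
        return {
            "axis": axis,
            "split": points[mid][axis],
            "left": build_kd(points[:mid], depth + 1, leaf_size),
            "right": build_kd(points[mid:], depth + 1, leaf_size),
        }

    tree = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])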
Quadtrees
 Each node of a quadtree is associated with a rectangular region of
  space; the top node is associated with the entire target space.
 Each non-leaf node divides its region into four equal-sized
  quadrants
     correspondingly each such node has four
      child nodes corresponding to the four
      quadrants and so on
   Leaf nodes have between zero and some fixed maximum number of
    points (set to 1 in example).
(Figure: Division of Space by Quadtrees)
Quadtrees (Cont.)
 PR quadtree: stores points; space is
  divided based on regions, rather than
  on the actual set of points stored.
 Region quadtrees store array (raster)
  information.
   A node is a leaf node if all the array
    values in the region that it covers are the
    same. Otherwise, it is subdivided further
    into four children of equal area, and is
    therefore an internal node.
   Each node corresponds to a sub-array of
    values.
   The sub-arrays corresponding to leaves
    either contain just a single array element
    or have multiple elements, all with the
    same value.
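A recursive sketch of a region quadtree over a 2^k x 2^k raster. The nested-list node representation and the sample raster are made up for the example:

    def region_quadtree(grid, x=0, y=0, size=None):
        if size is None:
            size = len(grid)
        first = grid[y][x]
        # leaf node: all array values in the region are the same
        if all(grid[y + dy][x + dx] == first
               for dy in range(size) for dx in range(size)):
            return first
        # otherwise subdivide into four children of equal area
        h = size // 2
        return [region_quadtree(grid, x,     y,     h),   # NW
                region_quadtree(grid, x + h, y,     h),   # NE
                region_quadtree(grid, x,     y + h, h),   # SW
                region_quadtree(grid, x + h, y + h, h)]   # SE

    raster = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1]]
    print(region_quadtree(raster))   # -> [0, 1, [0, 1, 1, 1], 1]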
R-Trees

      R-trees are an N-dimensional
       extension of B+-trees, useful for
      indexing sets of rectangles and
      other polygons.
     Supported in many modern database
      systems, along with variants like R+ -
      trees and R*-trees.
     Basic idea: generalize the notion of a
      one-dimensional interval associated
      with each B+ -tree node to an
      N-dimensional interval, that is, an
      N-dimensional rectangle.
R Trees (Cont.)
   A rectangular bounding box is
    associated with each tree node.
     Bounding box of a leaf node is a
      minimum sized rectangle that contains
      all the rectangles/polygons associated
      with the leaf node.
     The bounding box associated with a non-
      leaf node contains the bounding box
      associated with all its children.
     Bounding box of a node serves as its key
      in its parent node (if any)
     Bounding boxes of children of a node are
      allowed to overlap
Example R-Tree
 A set of rectangles (solid line) and the
 bounding boxes (dashed line) of the nodes
 of an R-tree for the rectangles. The R-tree is
 shown on the right.
Search in R-Trees
        To find data items
        (rectangles/polygons) intersecting
        (overlaps) a given query
        point/region, do the following,
        starting from the root node:
         If the node is a leaf node, output the
          data items whose keys intersect the
          given query point/region.
         Else, for each child of the current node
          whose bounding box overlaps the query
          point/region, recursively search the
          child
 Can be very inefficient in worst case,
  but works acceptably in practice
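The search procedure translates almost directly into code. In the sketch below a node is a dictionary with a leaf flag and a list of (bounding box, child-or-item) entries; that layout is an assumption made for illustration:

    def overlaps(a, b):
        # rectangles are ((xmin, ymin), (xmax, ymax))
        (ax1, ay1), (ax2, ay2) = a
        (bx1, by1), (bx2, by2) = b
        return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

    def search(node, query, results):
        for box, child in node["entries"]:
            if overlaps(box, query):
                if node["leaf"]:
                    results.append(child)          # data item intersecting query
                else:
                    search(child, query, results)  # recurse into child node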
Insertion in R-Trees

 To insert a data item:
   Find a leaf to store it, and add it to the
   leaf
     To find leaf, follow a child (if any) whose
      bounding box contains bounding box of
      data item, else child whose overlap with
      data item bounding box is maximum
   Handle overflows by splits (as in B+ -
   trees)
     Split procedure is different though (see
      below)
   Adjust bounding boxes starting from the
    leaf upwards
Splitting an R-Tree Node
 Quadratic split divides the entries in a
  node into two new nodes as follows
  1. Find the pair of entries with "maximum
    separation"
        that is, the pair such that the bounding
         box of the two would have the maximum
         wasted space (area of bounding box – sum
         of areas of the two entries)
  2. Place these entries in two new nodes
  3. Repeatedly find the entry with "maximum
    preference" for one of the two new nodes,
    and assign the entry to that node
      Preference of an entry for a node is the
       increase in area of the bounding box if the
       entry is added to the other node
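A sketch of the quadratic split for axis-aligned rectangles. Real implementations also enforce a minimum node occupancy, which this illustration omits:

    def area(box):
        (x1, y1), (x2, y2) = box
        return (x2 - x1) * (y2 - y1)

    def cover(a, b):
        # smallest rectangle containing both a and b
        (ax1, ay1), (ax2, ay2) = a
        (bx1, by1), (bx2, by2) = b
        return ((min(ax1, bx1), min(ay1, by1)),
                (max(ax2, bx2), max(ay2, by2)))

    def quadratic_split(boxes):
        # 1. seeds: the pair of entries with maximum wasted space
        i, j = max(((i, j) for i in range(len(boxes))
                           for j in range(i + 1, len(boxes))),
                   key=lambda p: area(cover(boxes[p[0]], boxes[p[1]]))
                              - area(boxes[p[0]]) - area(boxes[p[1]]))
        g1, g2 = [boxes[i]], [boxes[j]]
        rest = [b for k, b in enumerate(boxes) if k not in (i, j)]

        def growth(b, group):
            # enlargement of the group's bounding box if b joins it
            box = group[0]
            for g in group[1:]:
                box = cover(box, g)
            return area(cover(box, b)) - area(box)

        # 3. repeatedly assign the entry with the strongest preference
        while rest:
            b = max(rest, key=lambda b: abs(growth(b, g1) - growth(b, g2)))
            rest.remove(b)
            (g1 if growth(b, g1) <= growth(b, g2) else g2).append(b)
        return g1, g2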
Deleting in R-Trees

 Deletion of an entry in an R-tree done
  much like a B+-tree deletion.
   In case of an underfull node, borrow entries
    from a sibling if possible, else merge
    sibling nodes
   Alternative approach removes all entries
    from the underfull node, deletes the node,
    then reinserts all entries
Multimedia Databases

      To provide such database
      functions as indexing and
      consistency, it is desirable to store
      multimedia data in a database
        rather than storing them outside the
        database, in a file system
      The database must handle large
       object representation.
      Similarity-based retrieval must be
       provided by special index
       structures.
 Must provide guaranteed steady
  retrieval rates for continuous-media
  data.
Multimedia Data Formats
      Store and transmit multimedia data
      in compressed form
        JPEG and GIF the most widely used
         formats for image data.
        The MPEG standard for video data uses
         commonalities among a sequence of
         frames to achieve a greater degree of
         compression.
      MPEG-1 quality comparable to VHS
      video tape.
        stores a minute of 30-frame-per-
        second video and audio in
        approximately 12.5 MB
Continuous-Media Data

   Most important types are video and
    audio data.
   Characterized by high data volumes
    and real-time information-delivery
    requirements.
     Data must be delivered sufficiently fast
      that there are no gaps in the audio or
      video.
     Data must be delivered at a rate that does
      not cause overflow of system buffers.
     Synchronization among distinct data
      streams must be maintained
Video Servers
    Video-on-demand systems deliver
    video from central video servers,
    across a network, to terminals
      Must guarantee end-to-end delivery rates
    Current video-on-demand servers are
     based on file systems; existing
     database systems do not meet real-
     time response requirements.
    Multimedia data are stored on several
     disks (RAID configuration), or on
     tertiary storage for less frequently
     accessed data.
Similarity-Based Retrieval

  Examples of similarity based retrieval
   Pictorial data: Two pictures or images
    that are slightly different as
    represented in the database may be
    considered the same by a user.
     E.g., identify similar designs for registering
     a new trademark.
   Audio data: Speech-based user
   interfaces allow the user to give a
   command or identify a data item by
   speaking.
      E.g., test user input against stored
       commands.
Mobile Computing Environments
        A mobile computing environment
         consists of mobile computers,
         referred to as mobile hosts, and a
         wired network of computers.
        Mobile host may be able to
         communicate with wired network
         through a wireless digital
        communication network
           Wireless local-area networks (within a
            building)
             E.g. Avaya's Orinoco wireless LAN
           Wide-area wireless networks
Mobile Computing Environments
(Cont.)
 A model for mobile communication
   Mobile hosts communicate to the wired
    network via computers referred to as
    mobile support (or base) stations.
   Each mobile support station manages
    those mobile hosts within its cell.
   When mobile hosts move between cells,
    there is a handoff of control from one
    mobile support station to another.
 Direct communication, without going
  through a mobile support station, is
  also possible between nearby mobile
  hosts.
Database Issues in Mobile Computing
         New issues for query optimization.
           Connection time charges and number of
            bytes transmitted
           Energy (battery power) is a scarce
            resource and its usage must be
            minimized
          A mobile user's location may be a parameter of the query
           GIS queries
           Techniques to track locations of large
            numbers of mobile hosts
         Broadcast data can enable any number of clients to receive
          the same data at no extra cost
           leads to interesting querying and data
            caching issues.
          Users may need to be able to perform database updates
           even while the mobile computer is disconnected.
Routing and Query Processing
         Must consider these competing
          costs:
           User time.
           Communication cost
            Connection time - used to assign
             monetary charges in some cellular
             systems.
            Number of bytes, or packets,
             transferred - used to compute
             charges in digital cellular systems
            Time-of-day based charges - vary
             based on peak or off-peak periods
            Energy - optimize use of battery
             power by minimizing reception and
             transmission of data
Broadcast Data
       Mobile support stations can broadcast frequently-requested
        data
         Allows mobile hosts to wait for needed
          data, rather than having to consume
          energy transmitting a request
         Supports mobile hosts without
          transmission capability
       A mobile host may optimize energy costs by determining if a
        query can be answered using only cached data
          If not, it must either:
            Wait for the data to be broadcast
            Transmit a request for data and must
             know when the relevant data will be
             broadcast.
        Broadcast data may be transmitted according to a fixed
         schedule or a changeable schedule.
Disconnectivity and Consistency
 A mobile host may remain in
  operation during periods of
  disconnection.
 Problems created if the user of the
  mobile host issues queries and
  updates on data that resides or is
  cached locally:
   Recoverability: Updates entered on a
   disconnected machine may be lost if
   the mobile host fails. Since the mobile
   host represents a single point of
   failure, stable storage cannot be
    simulated well.
Mobile Updates
      Partitioning via disconnection is the normal mode of
       operation in mobile computing.
      For data updated by only one mobile host, simple to
       propagate update when mobile host reconnects
        in other cases data may become invalid
         and updates may conflict.
      When data are updated by other computers, invalidation
       reports inform a reconnected mobile host of out-of-date
       cache entries
        however, mobile host may miss a
         report.
      Version-numbering-based schemes guarantee only that if
       two hosts independently update the same version of a
       document, the clash will be detected eventually, when the
       hosts exchange information either directly or through a
       common host.
         More on this shortly
Detecting Inconsistent Updates
 Version vector scheme used to detect
  inconsistent updates to documents at
  different hosts (sites).
 Copies of document d at hosts i and j
  are inconsistent if
  1. the copy of document d at i contains
     updates performed by host k that have not
     been propagated to host j (k may be the
     same as i), and
  2. the copy of d at j contains updates
     performed by host l that have not been
     propagated to host i (l may be the same as
     j)
Detecting Inconsistent Updates
(Cont.)
    When two hosts i and j connect to each
    other they check if the copies of all
    documents d that they share are
    consistent:
     1. If the version vectors are the same on both
       hosts (that is, for each k, Vd,i [k] = Vd,j [k])
       then the copies of d are identical.
      2. If, for each k, Vd,i[k] ≤ Vd,j[k], and the
        version vectors are not identical, then the
        copy of document d at host i is older than
        the one at host j
            That is, the copy of document d at host j
             is newer, and the copy at host i can safely
             be replaced by it
      3. Otherwise, each host holds some update the
        other has not seen: the copies of d are
        inconsistent
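The comparison itself is a pointwise test on the two vectors. In the sketch below a version vector is a dict mapping host names to update counts, with missing hosts defaulting to zero; the host names and counts are invented for the example:

    def compare(vi, vj):
        # vi[k] = number of updates to document d made at host k
        # that host i knows about (similarly for vj)
        hosts = set(vi) | set(vj)
        i_le_j = all(vi.get(k, 0) <= vj.get(k, 0) for k in hosts)
        j_le_i = all(vj.get(k, 0) <= vi.get(k, 0) for k in hosts)
        if i_le_j and j_le_i:
            return "identical"
        if i_le_j:
            return "i older"       # copy at i can be replaced by copy at j
        if j_le_i:
            return "j older"
        return "inconsistent"      # independent updates must be reconciled

    print(compare({"a": 2, "b": 1}, {"a": 2, "b": 3}))   # i older
    print(compare({"a": 3, "b": 1}, {"a": 2, "b": 3}))   # inconsistent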
Handling Inconsistent Updates
      Dealing with inconsistent updates is
       hard in general. Manual intervention
       often required to merge the updates.
      Version vector schemes
        were developed to deal with failures in a
         distributed file system, where
         inconsistencies are rare.
        are used to maintain a unified file system
         between a fixed host and a mobile
         computer, where updates at the two hosts
         have to be merged periodically.
         Also used for similar purposes in
           groupware systems.
END OF CHAPTER
Chapter 24: Advanced Transaction
Processing
 Transaction-Processing Monitors
 Transactional Workflows
 High-Performance Transaction
 Systems
  Main memory databases
  Real-Time Transaction Systems
 Long-Duration Transactions
 Transaction management in
 multidatabase systems
Transaction Processing Monitors
 TP monitors initially developed as
  multithreaded servers to support large
  numbers of terminals from a single
  process.
 Provide infrastructure for building and
  administering complex transaction
  processing systems with a large
  number of clients and multiple
  servers.
 Provide services such as:
   Presentation facilities to simplify creating
    user interfaces
TP Monitor Architectures
TP Monitor Architectures (Cont.)
 Process per client model - instead of
 individual login session per terminal,
 server process communicates with
 the terminal, handles authentication,
 and executes actions.
   Memory requirements are high
    Multitasking - high CPU overhead for
     context switching between processes
 Single process model - all remote
 terminals connect to a single server
 process.
    Used in client-server environments
TP Monitor Architectures (Cont.)
 Many-server single-router model
 - multiple application server
 processes access a common
 database; clients communicate
 with the application through a
 single communication process that
 routes requests.
  Independent server processes for
   multiple applications
  Multithread server process
  Run on parallel or distributed
   database
 Many-server many-router model -
  multiple communication processes
  interact with the clients
Detailed Structure of a TP Monitor
Detailed Structure of a TP Monitor
 Queue manager handles incoming
  messages
 Some queue managers provide
  persistent or durable message
  queueing: contents of the queue are
  safe even if the system fails.
 Durable queueing of outgoing
  messages is important
   application server writes message to
    durable queue as part of a transaction
   once the transaction commits, the TP
    monitor guarantees the message is
    eventually delivered, regardless of
    system failures
Application Coordination Using TP Monitors
 A TP monitor treats each subsystem
  as a resource manager that provides
  transactional access to some set of
  resources.
 The interface between the TP monitor
  and the resource manager is defined
  by a set of transaction primitives
 The resource manager interface is
  defined by the X/Open Distributed
  Transaction Processing standard.
 TP monitor systems provide a
  transactional remote procedure call
  interface to their services.
Transactional Workflows
   Workflows are activities that involve the
    coordinated execution of multiple
    tasks performed by different
    processing entities.
   With the growth of networks, and the
    existence of multiple autonomous
    database systems, workflows provide a
    convenient way of carrying out tasks
    that involve multiple systems.
   Example of a workflow: delivery of an
    email message, which goes through
    several mail systems to reach its
    destination.
Examples of Workflows
Loan Processing Workflow
 In the past, workflows were handled
  by creating and forwarding paper
  forms
 Computerized workflows aim to
  automate many of the tasks, but
  humans still play a role, e.g., in
  approving loan applications.
Transactional Workflows
 Must address following issues to
 computerize a workflow.
   Specification of workflows - detailing the
    tasks that must be carried out and defining
    the execution requirements.
   Execution of workflows - execute
    transactions specified in the workflow while
    also providing traditional database
    safeguards related to the correctness of
    computations, data integrity, and durability.
   E.g.: Loan application should not get lost
    even if system fails.
 Extend transaction concepts to the
  context of workflows.
Workflow Specification
  Static specification of task coordination:
    Tasks and dependencies among them are
     defined before the execution of the workflow
     starts.
    Can establish preconditions for execution of
     each task: tasks are executed only when
     their preconditions are satisfied.
     Preconditions are defined through
      dependencies:
       Execution states of other tasks.
          "task ti cannot start until task tj has ended"
       Output values of other tasks.
          "task ti can start if task tj returns a value
           greater than 25"
Workflow Specification (Cont.)
 Dynamic task coordination
   E.g. an electronic mail routing system, in
    which the next task to be scheduled for a
    given mail message depends on the
    destination address and on which
    intermediate routers are functioning.
Failure-Atomicity Requirements
       Usual ACID transactional
        requirements are too
        strong/unimplementable for
        workflow applications.
       However, workflows must satisfy
        some limited transactional
        properties that guarantee a process
        is not left in an inconsistent state.
       Acceptable termination states -
        every execution of a workflow should
        terminate in a state that satisfies the
        failure-atomicity requirements defined
        by the designer.
Execution of Workflows

   Workflow management systems include:
    Scheduler - program that processes
     workflows by submitting various tasks
     for execution, monitoring various
     events, and evaluating conditions
     related to intertask dependencies
    Task agents - control the execution of
     a task by a processing entity.
    Mechanism to query the state of the
     workflow system.
Workflow Management System
Architectures
     Centralized - a single scheduler
     schedules the tasks for all concurrently
     executing workflows.
       used in workflow systems where the data is
        stored in a central database.
       easier to keep track of the state of a
        workflow.
     Partially distributed - has one (instance
      of a ) scheduler for each workflow.
     Fully distributed - has no scheduler;
      the task agents coordinate their
      execution by communicating with each
      other.
Workflow Scheduler
 Ideally scheduler should execute a
  workflow only after ensuring that it
  will terminate in an acceptable state.
 Consider a workflow consisting of
  two tasks S1 and S2. Let the failure-
  atomicity requirement be that either
  both or neither of the
  subtransactions should be
  committed.
   Suppose the systems executing S1 and S2
    do not provide prepared-to-commit states,
    and S1 or S2 has no compensating
    transaction: one subtransaction could then
    commit while the other aborts, so the
    scheduler cannot guarantee an acceptable
    termination state.
Recovery of a Workflow
 Ensure that if a failure occurs in
  any of the workflow-processing
  components, the workflow
  eventually reaches an acceptable
  termination state.
 Failure-recovery routines need to
  restore the state information of the
  scheduler at the time of failure,
  including the information about
  the execution states of each task.
  Log status information on stable
   storage.
Recovery of a Workflow (Cont.)
 Persistent messages: messages are
 stored in permanent message queue
 and therefore not lost in case of failure.
   Described in detail in Chapter 19
   (Distributed Databases)
 Before an agent commits, it writes to
  the persistent message queue whatever
  messages need to be sent out.
 The persistent message system must
  make sure the messages get delivered
  eventually if and only if the transaction
  commits.
 The message system needs to resend a
  message if it is not acknowledged by
  the recipient.
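A sketch of this pattern (sometimes called a transactional outbox). Every name here (db, outbox, send) is a hypothetical stand-in; the point is only that messages are enqueued inside the same transaction as the updates, and that delivery is retried until acknowledged:

    def commit_with_messages(db, outbox, updates, messages):
        # messages become durable if and only if the transaction commits
        with db.transaction():           # hypothetical transaction context
            for u in updates:
                db.apply(u)
            for m in messages:
                outbox.enqueue(m)        # write to persistent message queue

    def delivery_loop(outbox, send):
        # runs after commit; resends until the recipient acknowledges
        for m in outbox.pending():
            if send(m):                  # hypothetical: returns True on ack
                outbox.mark_delivered(m)
            # otherwise the message stays queued and is resent later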
HIGH PERFORMANCE
TRANSACTION
SYSTEMS
High-Performance Transaction
Systems
 High-performance hardware and
 parallelism help improve the rate of
 transaction processing, but are
 insufficient to obtain high
 performance:
   Disk I/O is a bottleneck — I/O time (10
    milliseconds) has not decreased at a
    rate comparable to the increase in
    processor speeds.
  Parallel transactions may attempt to
   read or write the same data item,
   resulting in data conflicts that reduce
    effective parallelism
Main-Memory Database
 Commercial 64-bit systems can
  support main memories of tens of
  gigabytes.
 Memory resident data allows faster
  processing of transactions.
 Disk-related limitations:
   Logging is a bottleneck when
    transaction rate is high.
   Use group-commit to reduce number of
    output operations (Will study two slides
    ahead.)
   If the update rate for modified buffer
    blocks is high, the disk data-transfer
    rate can become a bottleneck.
Main-Memory Database
  Optimizations
 To reduce space overheads, main-
  memory databases can use
  structures with pointers crossing
  multiple pages. In disk databases,
  the I/O cost to traverse multiple
  pages would be excessively high.
 No need to pin buffer pages in
  memory before data are accessed,
  since buffer pages will never be
  replaced.
 Design query-processing
  techniques to minimize space
  overhead.
Group Commit
 Idea: Instead of performing output of
 log records to stable storage as soon
 as a transaction is ready to
 commit, wait until
   log buffer block is full, or
   a transaction has been waiting sufficiently
   long after being ready to commit
 Results in fewer output operations
  per committed transaction, and
  correspondingly a higher throughput.
 However, commits are delayed until a
  sufficiently large group of transactions
  is ready to commit, which slightly
  increases response time.
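A single-threaded sketch of the bookkeeping: commit records accumulate in a buffer, and one output operation to stable storage commits the whole group. write_block stands in for the actual disk write, and real systems also flush on a timeout, which is omitted here:

    class GroupCommitLog:
        def __init__(self, write_block, block_size=64):
            self.write_block = write_block   # output to stable storage
            self.block_size = block_size
            self.buffer = []                 # log records awaiting output
            self.waiting = []                # transactions ready to commit

        def add_commit_record(self, txn_id):
            self.buffer.append(("commit", txn_id))
            self.waiting.append(txn_id)
            if len(self.buffer) >= self.block_size:
                return self.flush()          # full block: one write, many commits
            return []                        # caller waits (or flushes on timeout)

        def flush(self):
            self.write_block(self.buffer)    # the single output operation
            committed, self.waiting, self.buffer = self.waiting, [], []
            return committed                 # these transactions are now durable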
Real-Time Transaction Systems

   In systems with real-time constraints,
    correctness of execution involves both
    database consistency and the
    satisfaction of deadlines.
     Hard deadline – Serious problems may occur
      if task is not completed within deadline
     Firm deadline - The task has zero value if it
      is completed after the deadline.
     Soft deadline - The task has diminishing
      value if it is completed after the deadline.
   The wide variance of execution times for
    read and write operations on disks
    complicates the transaction management
    problem for time-constrained systems.
Long-Duration Transactions
 Traditional concurrency control
  techniques do not work well when
  user interaction is required:
   Long duration: design edit
    sessions are very long
   Exposure of uncommitted data:
    e.g., partial updates to a design
   Subtasks: support partial rollback
   Recoverability: on a crash, state
    should be restored even for yet-to-be-
    committed data, so that the user's
    work is not lost.
Long-Duration Transactions
 Represent as a nested transaction, with
  atomic database operations (read/write)
  at the lowest level.
 If a transaction fails, only active short-
  duration transactions abort.
 Active long-duration transactions
  resume once any short-duration
  transactions have recovered.
 Key problems: the efficient management
  of long-duration waits, and the
  possibility of aborts.
 Need alternatives to waits and aborts.
Concurrency Control
      Correctness without serializability:
        Correctness depends on the specific
         consistency constraints for the
         databases.
        Correctness depends on the
         properties of operations performed by
         each transaction.
 Use database consistency
  constraints to split the database
  into subdatabases on which
  concurrency can be managed
  separately.
 Treat some operations besides read
  and write as fundamental low-level
  operations, and extend concurrency
  control to deal with them.
Concurrency Control (Cont.)
A non-conflict-
  serializable schedule
  that preserves the
  sum of A + B
Nested and Multilevel Transactions
 A nested or multilevel transaction T is
  represented by a set
  T = {t1, t2, ..., tn} of subtransactions
  and a partial order P on T.
 A subtransaction ti in T may abort
  without forcing T to abort.
 Instead, T may either restart ti, or
  simply choose not to run ti.
 If ti commits, this action does not
  make ti permanent (unlike the
  situation in Chapter 15). Instead, ti
  commits to T, and may still abort (or
  require compensation) if T aborts.
Nested and Multilevel Transactions
(Cont.)
 Subtransactions can themselves be
 nested/multilevel transactions.
   Lowest level of nesting: standard read and
    write operations.
 Nesting can create higher-level
  operations that may enhance
  concurrency.
 Types of nested/ multilevel
  transactions:
   Multilevel transaction: a subtransaction of T is permitted
    to release locks on completion.
   Saga: a multilevel long-duration transaction.
   Nested transaction: locks held by a subtransaction ti of T
    are inherited by T when ti completes.
Example of Nesting
 Rewrite transaction T1 using
 subtransactions Ta and Tb that
 perform increment or decrement
 operations:
   T1 consists of
     T1,1, which subtracts 50 from A
     T1,2, which adds 50 to B
 Rewrite transaction T2 using
 subtransactions Tc and Td that
 perform increment or decrement
 operations:
   T2 consists of
     T2,1, which subtracts 10 from B
     T2,2, which adds 10 to A
Compensating Transactions
 Alternative to undo operation;
  compensating transactions deal with
  the problem of cascading rollbacks.
 Instead of undoing all changes made
  by the failed transaction, action is
  taken to ―compensate‖ for the failure.
 Consider a long-duration transaction
  Ti representing a travel reservation,
  with subtransactions Ti,1, which
  makes airline reservations, Ti,2 which
  reserves rental cars, and Ti,3 which
  reserves a hotel room.
   Suppose the hotel cancels the reservation:
    instead of undoing all of Ti, the failure can
    be compensated for by reserving a room at
    a different hotel.
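The control flow generalizes to any chain of subtransactions, as in this sketch; the (action, compensation) pairs, e.g. reserve_flight/cancel_flight, are hypothetical callables invented for the example:

    def run_with_compensation(steps):
        # steps: list of (action, compensation) pairs, e.g.
        # [(reserve_flight, cancel_flight), (reserve_car, cancel_car),
        #  (reserve_hotel, cancel_hotel)]
        done = []
        for action, compensation in steps:
            try:
                action()
                done.append(compensation)
            except Exception:
                # instead of undoing everything, run compensating
                # transactions for the steps that already committed
                for comp in reversed(done):
                    comp()
                raise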
Implementation Issues
   For long-duration transactions to
    survive system crashes, we must log
    not only changes to the database, but
    also changes to internal system data
    pertaining to these transactions.
   Logging of updates is made more
    complex by physically large data items
    (CAD design, document text);
    undesirable to store both old and new
    values.
 Two approaches to reducing the
  overhead of ensuring the recoverability
  of large data items: logging only the
  operations performed rather than the
  values, and shadow copying.
Transaction Management in
Multidatabase Systems
  Transaction management is
   complicated in multidatabase systems
   because of the assumption of
   autonomy
    Global 2PL -each local site uses a strict
    2PL (locks are released at the end); locks
    set as a result of a global transaction are
    released only when that transaction
    reaches the end.
      Guarantees global serializability
     Due to autonomy requirements, sites
      cannot cooperate to execute a common
      concurrency-control protocol.
Transaction Management
 Local transactions are executed by
  each local DBMS, outside of the MDBS
  system control.
 Global transactions are executed
  under multidatabase control.
 Local autonomy - local DBMSs cannot
  communicate directly to synchronize
  global transaction execution and the
  multidatabase has no control over
  local transaction execution.
   a local concurrency control scheme is needed
    to ensure that each DBMS's schedule is
    serializable
Two-Level Serializability
 DBMS ensures local serializability
  among its local transactions, including
  those that are part of a global
  transaction.
 The multidatabase ensures
  serializability among global
  transactions alone - ignoring the
  orderings induced by local
  transactions.
 2LSR does not ensure global
  serializability; however, it can fulfill
  requirements for strong correctness.
Two-Level Serializability (Cont.)
 Local-read protocol : Local
  transactions have read access to
  global data; disallows all access to
  local data by global transactions.
 A transaction has a value dependency
  if the value that it writes to a data item
  at one site depends on a value that it
  read for a data item on another site.
 For strong correctness: No transaction
  may have a value dependency.
 Global-read-write/local-read protocol:
  Local transactions have read access to
  global data but may not write it; global
  transactions may read and write all data.
Global Serializability
 Even if no information is available
  concerning the structure of the various
  concurrency control schemes, a very
  restrictive protocol that ensures
  serializability is available.
 Transaction-graph : a graph with
  vertices being global transaction
  names and site names.
 An undirected edge (Ti, Sk) exists if Ti
  is active at site Sk.
 Global serializability is assured if the
  transaction graph contains no cycle.
Ensuring Global Serializability
 Each site Si has a special data item,
  called ticket
 Every global transaction Tj that runs at
  site Sk writes to the ticket at site Sk
 Ensures global transactions are
  serialized at each site, regardless of
  local concurrency control method, so
  long as the method guarantees local
  serializability
 Global transaction manager decides
  serial ordering of global transactions
  by controlling the order in which the
  tickets are accessed.
Weak Levels of Consistency
 Use alternative notions of consistency
  that do not ensure serializability, to
  improve performance.
 Degree-two consistency avoids
  cascading aborts without necessarily
  ensuring serializability.
   Unlike two-phase locking, S-locks may be
    released at any time, and locks may be
    acquired at any time.
   X-locks cannot be released until the
    transaction either commits or aborts.
Example Schedule with Degree-Two
Consistency
Nonserializable schedule with degree-two
consistency (Figure 20.5) where T3 reads the
value of Q before and after that value is
written by T4:

         T3              T4
      lock-S(Q)
      read(Q)
      unlock(Q)
                      lock-X(Q)
                      read(Q)
                      write(Q)
                      unlock(Q)
      lock-S(Q)
      read(Q)
      unlock(Q)
Cursor Stability
 Form of degree-two consistency
  designed for programs written in
  general-purpose, record-oriented
  languages (e.g., Pascal, C, Cobol,
  PL/I, Fortran).
 Rather than locking the entire
  relation, cursor stability ensures
  that
   The tuple that is currently being
    processed by the iteration is locked in
    shared mode.
   Any modified tuples are locked in
    exclusive mode until the transaction
    commits.

More Related Content

PPTX
Introduction to DBMS(For College Seminars)
Naman Joshi
 
PPTX
Kskv kutch university DBMS unit 1 basic concepts, data,information,database,...
Dipen Parmar
 
PPTX
Database Models, Client-Server Architecture, Distributed Database and Classif...
Rubal Sagwal
 
PPTX
Database Management System, Lecture-1
Sonia Mim
 
PDF
Bab9
donasiilmu
 
PPT
Unit01 dbms
arnold 7490
 
DOCX
Dbms Concepts
adukkas
 
Introduction to DBMS(For College Seminars)
Naman Joshi
 
Kskv kutch university DBMS unit 1 basic concepts, data,information,database,...
Dipen Parmar
 
Database Models, Client-Server Architecture, Distributed Database and Classif...
Rubal Sagwal
 
Database Management System, Lecture-1
Sonia Mim
 
Unit01 dbms
arnold 7490
 
Dbms Concepts
adukkas
 

What's hot (20)

PPT
TID Chapter 10 Introduction To Database
WanBK Leo
 
PPTX
introduction to database
Akif shexi
 
PDF
Database Systems - Introduction to Database Design (Chapter 4/1)
Vidyasagar Mundroy
 
PPT
INTRODUCTION TO DATABASE
Muhammad Bilal Tariq
 
PPT
Database Presentation
a9oolq8
 
PPTX
Introduction to Database
Siti Ismail
 
PPTX
Basic Concept of Database
Marlon Jamera
 
PPTX
RDBMS.ppt
Ketan Chaoji
 
PDF
2 database system concepts and architecture
Kumar
 
PPTX
Types of databases
Md Showrov Ahmed
 
PPT
Database concepts
Harry Potter
 
PDF
Introduction: Databases and Database Users
sontumax
 
PPT
Unit 01 dbms
anuragmbst
 
PDF
Introduction to Database Management System
Hitesh Mohapatra
 
PDF
Introduction To Database Management System
CHANDERPRABHU JAIN COLLEGE OF HIGHER STUDIES & SCHOOL OF LAW
 
PPT
Database Management System Introduction
Smriti Jain
 
PPTX
Database management systems
Joel Briza
 
PPTX
Database lecture 1
Awinash Goswami
 
TID Chapter 10 Introduction To Database
WanBK Leo
 
introduction to database
Akif shexi
 
Database Systems - Introduction to Database Design (Chapter 4/1)
Vidyasagar Mundroy
 
INTRODUCTION TO DATABASE
Muhammad Bilal Tariq
 
Database Presentation
a9oolq8
 
Introduction to Database
Siti Ismail
 
Basic Concept of Database
Marlon Jamera
 
RDBMS.ppt
Ketan Chaoji
 
2 database system concepts and architecture
Kumar
 
Types of databases
Md Showrov Ahmed
 
Database concepts
Harry Potter
 
Introduction: Databases and Database Users
sontumax
 
Unit 01 dbms
anuragmbst
 
Introduction to Database Management System
Hitesh Mohapatra
 
Introduction To Database Management System
CHANDERPRABHU JAIN COLLEGE OF HIGHER STUDIES & SCHOOL OF LAW
 
Database Management System Introduction
Smriti Jain
 
Database management systems
Joel Briza
 
Database lecture 1
Awinash Goswami
 
Ad

Similar to Dbms (20)

PPT
Ch1 2
Bibin Devadas
 
PPT
Ch1
CAG
 
PPT
Ch1
guest5c197d5
 
PPT
1. Introduction to DBMS
koolkampus
 
PDF
(Dbms) class 1 & 2 (Presentation)
Dr. Mazin Mohamed alkathiri
 
PPT
Database Systems Concepts, 5th Ed
Daniel Francisco Tamayo
 
PPTX
INTRODUCTION OF DATA BASE
AMUTHAG2
 
PPTX
UNIT-1.pptx discusses about introduction to dbms
DrRBullibabu
 
PPT
Assignment on dbms
Mohd Arif
 
PPT
SQL- Introduction to SQL database
Vibrant Technologies & Computers
 
PPTX
dbms-introduction and the explanation of
josereena1
 
PPTX
Introduction to Database, Purpose of Data, Data models, Components of Database
kasthurimukila
 
PPT
21UCAC 41 Database Management System.ppt
ssuser7f90ae
 
PDF
DBMS 1.pdf from computer application for business
sudeshnachand
 
PPTX
System Analysis And Design
Lijo Stalin
 
PDF
DBMS Unit 1 nice content please download it
kelpwadwise
 
PPTX
Introduction to Database (101) to Akashvani (202) const shortestPath = metroS...
airteltoairtelinfo
 
PPT
dbms intro
Pooja Pathak
 
Ch1
CAG
 
1. Introduction to DBMS
koolkampus
 
(Dbms) class 1 & 2 (Presentation)
Dr. Mazin Mohamed alkathiri
 
Database Systems Concepts, 5th Ed
Daniel Francisco Tamayo
 
INTRODUCTION OF DATA BASE
AMUTHAG2
 
UNIT-1.pptx discusses about introduction to dbms
DrRBullibabu
 
Assignment on dbms
Mohd Arif
 
SQL- Introduction to SQL database
Vibrant Technologies & Computers
 
dbms-introduction and the explanation of
josereena1
 
Introduction to Database, Purpose of Data, Data models, Components of Database
kasthurimukila
 
21UCAC 41 Database Management System.ppt
ssuser7f90ae
 
DBMS 1.pdf from computer application for business
sudeshnachand
 
System Analysis And Design
Lijo Stalin
 
DBMS Unit 1 nice content please download it
kelpwadwise
 
Introduction to Database (101) to Akashvani (202) const shortestPath = metroS...
airteltoairtelinfo
 
dbms intro
Pooja Pathak
 
Ad

Recently uploaded (20)

PDF
What is CFA?? Complete Guide to the Chartered Financial Analyst Program
sp4989653
 
PPTX
Sonnet 130_ My Mistress’ Eyes Are Nothing Like the Sun By William Shakespear...
DhatriParmar
 
PPTX
Basics and rules of probability with real-life uses
ravatkaran694
 
DOCX
pgdei-UNIT -V Neurological Disorders & developmental disabilities
JELLA VISHNU DURGA PRASAD
 
PPTX
family health care settings home visit - unit 6 - chn 1 - gnm 1st year.pptx
Priyanshu Anand
 
PDF
Antianginal agents, Definition, Classification, MOA.pdf
Prerana Jadhav
 
PPTX
How to Manage Leads in Odoo 18 CRM - Odoo Slides
Celine George
 
PPTX
HISTORY COLLECTION FOR PSYCHIATRIC PATIENTS.pptx
PoojaSen20
 
DOCX
SAROCES Action-Plan FOR ARAL PROGRAM IN DEPED
Levenmartlacuna1
 
PPTX
Python-Application-in-Drug-Design by R D Jawarkar.pptx
Rahul Jawarkar
 
PPTX
Gupta Art & Architecture Temple and Sculptures.pptx
Virag Sontakke
 
PPTX
Introduction to pediatric nursing in 5th Sem..pptx
AneetaSharma15
 
DOCX
Unit 5: Speech-language and swallowing disorders
JELLA VISHNU DURGA PRASAD
 
PDF
The-Invisible-Living-World-Beyond-Our-Naked-Eye chapter 2.pdf/8th science cur...
Sandeep Swamy
 
PPTX
Applications of matrices In Real Life_20250724_091307_0000.pptx
gehlotkrish03
 
PDF
Biological Classification Class 11th NCERT CBSE NEET.pdf
NehaRohtagi1
 
PPTX
Dakar Framework Education For All- 2000(Act)
santoshmohalik1
 
PDF
Virat Kohli- the Pride of Indian cricket
kushpar147
 
PPTX
A Smarter Way to Think About Choosing a College
Cyndy McDonald
 
PPTX
Continental Accounting in Odoo 18 - Odoo Slides
Celine George
 
What is CFA?? Complete Guide to the Chartered Financial Analyst Program
sp4989653
 
Sonnet 130_ My Mistress’ Eyes Are Nothing Like the Sun By William Shakespear...
DhatriParmar
 
Basics and rules of probability with real-life uses
ravatkaran694
 
pgdei-UNIT -V Neurological Disorders & developmental disabilities
JELLA VISHNU DURGA PRASAD
 
family health care settings home visit - unit 6 - chn 1 - gnm 1st year.pptx
Priyanshu Anand
 
Antianginal agents, Definition, Classification, MOA.pdf
Prerana Jadhav
 
How to Manage Leads in Odoo 18 CRM - Odoo Slides
Celine George
 
HISTORY COLLECTION FOR PSYCHIATRIC PATIENTS.pptx
PoojaSen20
 
SAROCES Action-Plan FOR ARAL PROGRAM IN DEPED
Levenmartlacuna1
 
Python-Application-in-Drug-Design by R D Jawarkar.pptx
Rahul Jawarkar
 
Gupta Art & Architecture Temple and Sculptures.pptx
Virag Sontakke
 
Introduction to pediatric nursing in 5th Sem..pptx
AneetaSharma15
 
Unit 5: Speech-language and swallowing disorders
JELLA VISHNU DURGA PRASAD
 
The-Invisible-Living-World-Beyond-Our-Naked-Eye chapter 2.pdf/8th science cur...
Sandeep Swamy
 
Applications of matrices In Real Life_20250724_091307_0000.pptx
gehlotkrish03
 
Biological Classification Class 11th NCERT CBSE NEET.pdf
NehaRohtagi1
 
Dakar Framework Education For All- 2000(Act)
santoshmohalik1
 
Virat Kohli- the Pride of Indian cricket
kushpar147
 
A Smarter Way to Think About Choosing a College
Cyndy McDonald
 
Continental Accounting in Odoo 18 - Odoo Slides
Celine George
 

Dbms

  • 1. Chapter 1: Introduction Purpose of Database Systems View of Data Data Models Data Definition Language Data Manipulation Language Transaction Management Storage Management Database Administrator Database Users Overall System Structure
  • 2. Database Management System (DBMS)interrelated data Collection of Set of programs to access the data DBMS contains information about a particular enterprise DBMS provides an environment that is both convenient and efficient to use. Database Applications:  Banking: all transactions  Airlines: reservations, schedules  Universities: registration, grades  Sales: customers, products, purchases  Manufacturing: production, inventory, orders, supply chain  Human resources: employee records, salaries, tax deductions Databases touch all aspects of our lives
  • 3. Purpose of Database System In the early days, database applications were built on top of file systems Drawbacks of using file systems to store data:  Data redundancy and inconsistency  Multiple file formats, duplication of information in different files  Difficulty in accessing data  Need to write a new program to carry out each new task  Data isolation — multiple files and formats  Integrity problems  Integrity constraints (e.g. account balance > 0) become part of program code  Hard to add new constraints or change existing ones
  • 4. Purpose of Database Systems (Cont.) Drawbacks of using file systems (cont.)  Atomicity of updates  Failures may leave database in an inconsistent state with partial updates carried out  E.g. transfer of funds from one account to another should either complete or not happen at all  Concurrent access by multiple users  Concurrent accessed needed for performance  Uncontrolled concurrent accesses can lead to inconsistencies  E.g. two people reading a balance and updating it at the same time  Security problems Database systems offer solutions to all the above problems
  • 5. Levels of Abstraction Physical level describes how a record (e.g., customer) is stored. Logical level: describes data stored in database, and the relationships among the data. type customer = record name : string; street : string; city : integer; end; View level: application programs hide details of data types. Views can also hide information (e.g., salary) for security purposes.
  • 6. View ofaData system An architecture for database
  • 7. Instances and Schemas Similar to types and variables in programming languages Schema – the logical structure of the database  e.g., the database consists of information about a set of customers and accounts and the relationship between them)  Analogous to type information of a variable in a program  Physical schema: database design at the physical level  Logical schema: database design at the logical level Instance – the actual content of the database at a particular point in time  Analogous to the value of a variable Physical Data Independence – the ability to modify the physical schema without changing the logical schema  Applications depend on the logical schema  In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
  • 8. Data Models A collection of tools for describing  data  data relationships  data semantics  data constraints Entity-Relationship model Relational model Other models:  object-oriented model  semi-structured data models  Older models: network model and hierarchical model
  • 9. Entity-Relationship Model Example of schema in the entity-relationship model
  • 10. Entity Relationship Model (Cont.) E-R model of real world  Entities (objects)  E.g. customers, accounts, bank branch  Relationships between entities  E.g. Account A-101 is held by customer Johnson  Relationship set depositor associates customers with accounts Widely used for database design  Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing
  • 11. Relational Model  Example of tabular data in the relational model (attributes: customer-id, customer-name, customer-street, customer-city, account-number)

customer-id   customer-name  customer-street  customer-city  account-number
192-83-7465   Johnson        Alma             Palo Alto      A-101
019-28-3746   Smith          North            Rye            A-215
192-83-7465   Johnson        Alma             Palo Alto      A-201
321-12-3123   Jones          Main             Harrison       A-217
019-28-3746   Smith          North            Rye            A-201
  • 13. Data Definition Language (DDL) Specification notation for defining the database schema  E.g. create table account ( account-number char(10), balance integer) DDL compiler generates a set of tables stored in a data dictionary Data dictionary contains metadata (i.e., data about data)  database schema  Data storage and definition language  language in which the storage structure and access methods used by the database system are specified  Usually an extension of the data definition language
  • 14. Data Manipulation Language (DML) Language for accessing and manipulating the data organized by the appropriate data model  DML also known as query language Two classes of languages  Procedural – user specifies what data is required and how to get those data  Nonprocedural – user specifies what data is required without specifying how to get those data SQL is the most widely used query language
  • 15. SQL SQL: widely used non-procedural language  E.g. find the name of the customer with customer-id 192-83-7465 select customer.customer-name from customer where customer.customer-id = '192-83-7465'  E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465 select account.balance from depositor, account where depositor.customer-id = '192-83-7465' and depositor.account-number = account.account-number Application programs generally access databases through one of  Language extensions to allow embedded SQL  Application program interface (e.g. ODBC/JDBC) which allow SQL queries to be sent to a database
  • 16. Database Users Users are differentiated by the way they expect to interact with the system Application programmers – interact with system through DML calls Sophisticated users – form requests in a database query language Specialized users – write specialized database applications that do not fit into the traditional data processing framework Naïve users – invoke one of the permanent application programs that have been written previously  E.g. people accessing database over the web, bank tellers, clerical staff
  • 17. Database Administrator Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise's information resources and needs. Database administrator's duties include:  Schema definition  Storage structure and access method definition  Schema and physical organization modification  Granting user authority to access the database  Specifying integrity constraints  Acting as liaison with users  Monitoring performance and responding to changes in requirements
  • 18. Transaction Management A transaction is a collection of operations that performs a single logical function in a database application Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures. Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
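These guarantees are what SQL transactions expose to applications. A minimal sketch of an atomic funds transfer, assuming an account table like the one used throughout these slides (with '-' replaced by '_' in names, since SQL does not permit '-'):

  start transaction;
  -- both updates become visible together at commit; a failure before
  -- commit rolls both back, so no partial transfer is ever stored
  update account set balance = balance - 100 where account_number = 'A-101';
  update account set balance = balance + 100 where account_number = 'A-215';
  commit;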
  • 19. Storage Management Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the following tasks:  interaction with the file manager  efficient storing, retrieving and updating of data
  • 21. Application Architectures Two-tier architecture: E.g. client programs using ODBC/JDBC to communicate with a database Three-tier architecture: E.g. web-based applications, and applications built using “middleware”
  • 22. Chapter 2: Entity-Relationship Model  Entity Sets  Relationship Sets  Design Issues  Mapping Constraints  Keys  E-R Diagram  Extended E-R Features  Design of an E-R Database Schema  Reduction of an E-R Schema to Tables
  • 23. Entity Sets  A database can be modeled as:  a collection of entities,  relationship among entities.  An entity is an object that exists and is distinguishable from other objects.  Example: specific person, company, event, plant  Entities have attributes  Example: people have names and addresses  An entity set is a set of entities of the same type that share the same properties.  Example: set of all persons, companies, trees, holidays
  • 24. Entity Sets customer and loan  customer attributes: customer-id, customer-name, customer-street, customer-city  loan attributes: loan-number, amount
  • 25. Attributes  An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set. Example: customer = (customer-id, customer-name, customer-street, customer-city) loan = (loan-number, amount)  Domain – the set of permitted values for each attribute  Attribute types:  Simple and composite attributes.  Single-valued and multi-valued attributes  E.g. multivalued attribute: phone-numbers  Derived attributes  Can be computed from other attributes  E.g. age, given date of birth
  • 27. Relationship Sets  A relationship is an association among several entities Example: Hayes (customer entity) — depositor (relationship set) — A-102 (account entity)  A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En} where (e1, e2, …, en) is a relationship  Example: (Hayes, A-102) ∈ depositor
  • 29. Relationship Sets (Cont.)  An attribute can also be property of a relationship set.  For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date
  • 30. Degree of a Relationship Set  Refers to number of entity sets that participate in a relationship set.  Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.  Relationship sets may involve more than two entity sets. E.g. Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch
  • 31. Mapping Cardinalities  Express the number of entities to which another entity can be associated via a relationship set.  Most useful in describing binary relationship sets.  For a binary relationship set the mapping cardinality must be one of the following types:  One to one  One to many  Many to one  Many to many
  • 32. Mapping Cardinalities One to one One to many Note: Some elements in A and B may not be mapped to any elements in the other set
  • 33. Mapping Cardinalities Many to one Many to many Note: Some elements in A and B may not be mapped to any elements in the other set
  • 34. Mapping Cardinalities affect ER Design  Can make access-date an attribute of account, instead of a relationship attribute, if each account can have only one customer  I.e., the relationship from account to customer is many to one, or equivalently, customer to account is one to many
  • 35. E-R Diagrams  Rectangles represent entity sets.  Diamonds represent relationship sets.  Lines link attributes to entity sets and entity sets to relationship sets.  Ellipses represent attributes  Double ellipses represent multivalued attributes.  Dashed ellipses denote derived attributes.  Underline indicates primary key attributes (will study later)
  • 36. E-R Diagram With Composite, Multivalued, and Derived Attributes
  • 38. Roles  Entity sets of a relationship need not be distinct  The labels "manager" and "worker" are called roles; they specify how employee entities interact via the works-for relationship set.  Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.  Role labels are optional, and are used to clarify semantics of the relationship
  • 39. Cardinality Constraints  We express cardinality constraints by drawing either a directed line (→), signifying "one," or an undirected line (—), signifying "many," between the relationship set and the entity set.  E.g.: One-to-one relationship:  A customer is associated with at most one loan via the relationship borrower  A loan is associated with at most one customer via borrower
  • 40. One-To-Many Relationship  In the one-to-many relationship a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower
  • 41. Many-To-One Relationships  In a many-to-one relationship a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower
  • 42. Many-To-Many Relationship  A customer is associated with several (possibly 0) loans via borrower  A loan is associated with several (possibly 0) customers via borrower
  • 43. Participation of an Entity Set in a Relationship Set  Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set  E.g. participation of loan in borrower is total  every loan must have a customer associated to it via borrower  Partial participation: some entities may not participate in any relationship in the relationship set  E.g. participation of customer in borrower is partial
  • 44. Alternative Notation for Cardinality Limits  Cardinality limits can also express participation constraints
  • 45. Keys  A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.  A candidate key of an entity set is a minimal super key  Customer-id is candidate key of customer  account-number is candidate key of account  Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
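In SQL DDL the chosen candidate key is declared as the primary key; remaining candidate keys can be declared unique. A sketch, assuming underscored versions of the slide's attribute names:

  create table customer (
    customer_id     char(11) primary key,  -- the candidate key chosen as primary key
    customer_name   varchar(20),
    customer_street varchar(30),
    customer_city   varchar(20)
  );
  -- a second candidate key, if one existed, would be declared with unique (...)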
  • 46. Keys for Relationship Sets  The combination of primary keys of the participating entity sets forms a super key of a relationship set.  (customer-id, account-number) is the super key of depositor  NOTE: this means a pair of entity sets can have at most one relationship in a particular relationship set.  E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute though
  • 47. E-R Diagram with a Ternary Relationship
  • 48. Cardinality Constraints on Ternary Relationship  We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint  E.g. an arrow from works-on to job indicates each employee works on at most one job at any branch.  If there is more than one arrow, there are two ways of defining the meaning.  E.g. a ternary relationship R between A, B and C with arrows to B and C could mean  1. each A entity is associated with a unique entity from B and C, or  2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B  To avoid confusion we outlaw more than one arrow
  • 49. Binary Vs. Non-Binary Relationships  Some relationships that appear to be non-binary may be better represented using binary relationships  E.g. A ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother  Using two binary relationships allows partial information (e.g. only mother being known)  But there are some relationships that are naturally non-binary  E.g. works-on
  • 50. Converting Non-Binary Relationships to Binary Form  In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.  Replace R between entity sets A, B and C by an entity set E, and three relationship sets: 1. RA, relating E and A 2. RB, relating E and B 3. RC, relating E and C  Create a special identifying attribute for E  Add any attributes of R to E  For each relationship (ai, bi, ci) in R, create 1. a new entity ei in the entity set E 2. add (ei, ai) to RA 3. add (ei, bi) to RB 4. add (ei, ci) to RC  A sketch of the resulting tables appears below.
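A sketch of the resulting schema in SQL, assuming the entity sets reduce to single-attribute keys (the names a_id, b_id, c_id, e_id are hypothetical):

  create table A (a_id integer primary key);
  create table B (b_id integer primary key);
  create table C (c_id integer primary key);

  -- artificial entity set E replacing the ternary relationship R;
  -- any attributes of R would become columns of E
  create table E (e_id integer primary key);

  -- the three binary relationship sets
  create table RA (e_id integer references E, a_id integer references A);
  create table RB (e_id integer references E, b_id integer references B);
  create table RC (e_id integer references E, c_id integer references C);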
  • 51. Converting Non-Binary Relationships (Cont.)  Also need to translate constraints  Translating all constraints may not be possible  There may be instances in the translated schema that cannot correspond to any instance of R  Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C  We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets
  • 52. Design Issues  Use of entity sets vs. attributes Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question.  Use of entity sets vs. relationship sets Possible guideline is to designate a relationship set to describe an action that occurs between entities  Binary versus n-ary relationship sets Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, a n-ary relationship set shows more clearly that several entities participate in a single relationship.
  • 53. HOW ABOUT DOING AN ER DESIGN INTERACTIVELY ON THE BOARD? SUGGEST AN APPLICATION TO BE MODELED.
  • 54. Weak Entity Sets  An entity set that does not have a primary key is referred to as a weak entity set.  The existence of a weak entity set depends on the existence of an identifying entity set  it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set  Identifying relationship depicted using a double diamond
  • 55. Weak Entity Sets (Cont.)  We depict a weak entity set by double rectangles.  We underline the discriminator of a weak entity set with a dashed line.  payment-number – discriminator of the payment entity set  Primary key for payment – (loan-number, payment-number)
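In table form (anticipating slide 81), the weak entity set borrows the primary key of its identifying strong entity set. A sketch with underscored names; the payment_date and payment_amount columns are illustrative:

  create table loan (
    loan_number char(10) primary key,
    amount      integer
  );
  create table payment (
    loan_number    char(10) references loan,  -- key of the identifying entity set
    payment_number integer,                   -- discriminator: unique only per loan
    payment_date   date,
    payment_amount integer,
    primary key (loan_number, payment_number)
  );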
  • 56. Weak Entity Sets (Cont.)  Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.  If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loan-number
  • 57. More Weak Entity Set Examples  In a university, a course is a strong entity and a course-offering can be modeled as a weak entity  The discriminator of course-offering would be semester (including year) and section-number (if there is more than one section)  If we model course-offering as a strong entity we would model course-number as an attribute. Then the relationship with course would be implicit in the course-number
  • 58. Specialization  Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set.  These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.  Depicted by a triangle component labeled ISA (E.g. customer "is a" person).  Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked
  • 60. Generalization  A bottom-up design process – combine a number of entity sets that share the same features into a higher- level entity set.  Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.  The terms specialization and generalization are used interchangeably.
  • 61. Specialization and Generalization (Contd.)  Can have multiple specializations of an entity set based on different features.  E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller  Each particular employee would be  a member of one of permanent-employee or temporary-employee,  and also a member of one of officer, secretary, or teller  The ISA relationship also referred to as superclass-subclass relationship
  • 62. Design Constraints on a Specialization/Generalization  Constraint on which entities can be members of a given lower-level entity set.  condition-defined  E.g. all customers over 65 years are members of senior-citizen entity set; senior-citizen ISA person.  user-defined  Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.  Disjoint  an entity can belong to only one lower-level entity set  Overlapping  an entity can belong to more than one lower-level entity set
  • 63. Design Constraints on a Specialization/Generalization (Contd.) Completeness constraint -- specifies whether or not an entity in the higher- level entity set must belong to at least one of the lower-level entity sets within a generalization.  total : an entity must belong to one of the lower-level entity sets  partial: an entity need not belong to one of the lower-level entity sets
  • 64. Aggregation  Consider the ternary relationship works-on, which we saw earlier  Suppose we want to record managers for tasks performed by an employee at a branch
  • 65. Aggregation (Cont.)  Relationship sets works-on and manages represent overlapping information  Every manages relationship corresponds to a works-on relationship  However, some works-on relationships may not correspond to any manages relationships  So we can't discard the works-on relationship  Eliminate this redundancy via aggregation  Treat relationship as an abstract entity  Allows relationships between relationships  Abstraction of relationship into new entity  Without introducing redundancy, the resulting diagram represents that an employee works on a particular job at a particular branch, and that an employee, branch, job combination may have an associated manager
  • 66. E-R Diagram With Aggregation
  • 67. E-R Design Decisions  The use of an attribute or entity set to represent an object.  Whether a real-world concept is best expressed by an entity set or a relationship set.  The use of a ternary relationship versus a pair of binary relationships.  The use of a strong or weak entity set.  The use of specialization/generalization –
  • 68. E-R Diagram for a Banking Enterprise
  • 69. HOW ABOUT DOING ANOTHER ER DESIGN INTERACTIVELY ON THE BOARD?
  • 70. Summary of Symbols Used in E-R Notation
  • 73. UML  UML: Unified Modeling Language  UML has many components to graphically model different aspects of an entire software system  UML Class Diagrams correspond to E-R Diagrams, but with several differences.
  • 74. Summary of UML Class Diagram Notation
  • 75. UML Class Diagrams (Contd.)  Entity sets are shown as boxes, and attributes are shown within the box, rather than as separate ellipses in E-R diagrams.  Binary relationship sets are represented in UML by just drawing a line connecting the entity sets. The relationship set name is written adjacent to the line.  The role played by an entity set in a relationship set may also be specified by writing the role name on the line, adjacent to the entity set.
  • 76. UML Class Diagram Notation (Cont.) overlapping disjoint *Note reversal of position in cardinality constraint depiction *Generalization can use merged or separate arrows independent of disjoint/overlapping
  • 77. UML Class Diagrams (Contd.)  Cardinality constraints are specified in the form l..h, where l denotes the minimum and h the maximum number of relationships an entity can participate in.  Beware: the positioning of the constraints is exactly the reverse of the positioning of constraints in E-R diagrams.  The constraint 0..* on the E2 side and 0..1 on the E1 side means that each E2 entity can participate in at most one relationship, whereas each E1 entity can participate in many relationships; in other words, the relationship is many to one from E2 to E1.
  • 78. Reduction of an E-R Schema to Tables  Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database.  A database which conforms to an E-R diagram can be represented by a collection of tables.  For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set.
  • 79. Representing Entity Sets as Tables  A strong entity set reduces to a table with the same attributes.
  • 80. Composite and Multivalued Attributes  Composite attributes are flattened out by creating a separate attribute for each component attribute  E.g. given entity set customer with composite attribute name with component attributes first-name and last-name the table corresponding to the entity set has two attributes name.first-name and name.last-name  A multivalued attribute M of an entity E is represented by a separate table EM  Table EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M
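A sketch of both rules in SQL, assuming customer-id is the primary key and taking phone-numbers as the multivalued attribute M (underscored names, illustrative column widths):

  create table customer (
    customer_id     char(11) primary key,
    first_name      varchar(20),  -- components of the composite attribute name
    last_name       varchar(20),
    customer_street varchar(30),
    customer_city   varchar(20)
  );
  -- separate table EM for the multivalued attribute phone-numbers
  create table customer_phone (
    customer_id  char(11) references customer,  -- primary key of E
    phone_number char(15),                      -- one value of M per row
    primary key (customer_id, phone_number)
  );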
  • 81. Representing Weak Entity Sets  A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
  • 82. Representing Relationship Sets as Tables  A many-to-many relationship set is represented as a table with columns for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.  E.g.: table for relationship set borrower
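A sketch of the borrower table in SQL, keyed by the combined primary keys of the participating entity sets (underscored names):

  create table borrower (
    customer_name varchar(20),  -- primary key of customer
    loan_number   char(10),     -- primary key of loan
    primary key (customer_name, loan_number)
  );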
  • 83. Redundancy of Tables  Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the many side, containing the primary key of the one side  E.g.: Instead of creating a table for relationship account- branch, add an attribute branch to the entity set account
  • 84. Redundancy of Tables (Cont.)  For one-to-one relationship sets, either side can be chosen to act as the "many" side  That is, extra attribute can be added to either of the tables corresponding to the two entity sets  If participation is partial on the many side, replacing a table by an extra attribute in the relation corresponding to the "many" side could result in null values  The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.
  • 85. Representing Specialization as Tables  Method 1:  Form a table for the higher level entity  Form a table for each lower level entity set, include primary key of higher level entity set and local attributes

table      table attributes
person     name, street, city
customer   name, credit-rating
employee   name, salary

 Drawback: getting information about, e.g., employee requires accessing two tables
  • 86. Representing Specialization as Tables (Cont.)  Method 2:  Form a table for each entity set with all local and inherited attributes

table      table attributes
person     name, street, city
customer   name, street, city, credit-rating
employee   name, street, city, salary

 If specialization is total, table for generalized entity (person) not required to store information  Can be defined as a "view" relation containing union of specialization tables  But explicit table may still be needed for foreign key constraints  Drawback: street and city may be stored redundantly for persons who are both customers and employees
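A sketch of Method 2 in SQL, with person defined as a view over the lower-level tables when the specialization is total (names follow the slide, underscored):

  create table customer (
    name          varchar(20),
    street        varchar(30),
    city          varchar(20),
    credit_rating integer
  );
  create table employee (
    name   varchar(20),
    street varchar(30),
    city   varchar(20),
    salary integer
  );
  -- person need not be stored: it is the union of the specialization tables
  create view person as
    select name, street, city from customer
    union
    select name, street, city from employee;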
  • 87. Relations Corresponding to Aggregation  To represent aggregation, create a table containing  primary key of the aggregated relationship,  the primary key of the associated entity set  Any descriptive attributes
  • 88. Relations Corresponding to Aggregation (Cont.)  E.g. to represent aggregation manages between relationship works-on and entity set manager, create a table manages(employee-id, branch-name, title, manager-name)  Table works-on is redundant provided we are willing to store null values for attribute manager-name in table manages
  • 90. E-R Diagram for Exercise 2.10
  • 91. E-R Diagram for Exercise 2.15
  • 92. E-R Diagram for Exercise 2.22
  • 93. E-R Diagram for Exercise 2.15
  • 94. Existence Dependencies  If the existence of entity x depends on the existence of entity y, then x is said to be existence dependent on y.  y is a dominant entity (in example below, loan)  x is a subordinate entity (in example below, payment) loan loan-payment payment If a loan entity is deleted, then all its associated payment entities must be deleted also.
  • 95. Chapter 3: Relational Model  Structure of Relational Databases  Relational Algebra  Tuple Relational Calculus  Domain Relational Calculus  Extended Relational- Algebra-Operations  Modification of the Database  Views
  • 96. Example of a Relation
  • 97. Basic Structure  Formally, given sets D1, D2, …, Dn a relation r is a subset of D1 x D2 x … x Dn Thus a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di  Example: if customer-name = {Jones, Smith, Curry, Lindsay} customer-street = {Main, North, Park} customer-city = {Harrison, Rye, Pittsfield} Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} is a relation over customer-name x customer-street x customer-city
  • 98. Attribute Types  Each attribute of a relation has a name  The set of allowed values for each attribute is called the domain of the attribute  Attribute values are (normally) required to be atomic, that is, indivisible  E.g. multivalued attribute values are not atomic  E.g. composite attribute values are not atomic  The special value null is a member of every domain  The null value causes complications in the definition of many operations  we shall ignore the effect of null values in our main presentation and consider their effect later
  • 99. Relation Schema  A1, A2, …, An are attributes  R = (A1, A2, …, An ) is a relation schema E.g. Customer-schema = (customer-name, customer-street, customer-city)  r(R) is a relation on the relation schema R E.g. customer (Customer-schema)
  • 100. Relation Instance  The current values (relation instance) of a relation are specified by a table  An element t of r is a tuple, represented by a row in a table; the columns are the attributes. The customer relation:

customer-name  customer-street  customer-city
Jones          Main             Harrison
Smith          North            Rye
Curry          North            Rye
Lindsay        Park             Pittsfield
  • 101. Relations are Unordered  Order of tuples is irrelevant (tuples may be stored in an arbitrary order)  E.g. account relation with unordered tuples
  • 102. Database  A database consists of multiple relations  Information about an enterprise is broken up into parts, with each relation storing one part of the information E.g.: account : stores information about accounts depositor : stores information about which customer owns which account customer : stores information about customers  Storing all information as a single relation such as bank(account-number, balance, customer-name, ..) results in  repetition of information (e.g. two customers own an account)  the need for null values (e.g. represent a customer without an account)  Normalization theory (Chapter 7) deals with how to design relational schemas
  • 105. E-R Diagram for the Banking Enterprise
  • 106. Keys  Let K ⊆ R  K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)  by "possible r" we mean a relation r that could exist in the enterprise we are modeling.  Example: {customer-name, customer-street} and {customer-name} are both superkeys of Customer, if no two customers can possibly have the same name.
  • 107. Determining Keys from E-R Sets  Strong entity set. The primary key of the entity set becomes the primary key of the relation.  Weak entity set. The primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set.  Relationship set. The union of the primary keys of the related entity sets becomes a super key of the relation.
  • 108. Schema Diagram for the Banking Enterprise
  • 109. Query Languages  Language in which user requests information from the database.  Categories of languages  procedural  non-procedural  "Pure" languages:  Relational Algebra  Tuple Relational Calculus  Domain Relational Calculus  Pure languages form underlying basis
  • 110. Relational Algebra  Procedural language  Six basic operators  select  project  union  set difference  Cartesian product  rename  The operators take one or more relations as inputs and give a new relation as a result
  • 111. Select Operation – Example  Relation r:

A  B  C   D
α  α  1   7
α  β  5   7
β  β  12  3
β  β  23  10

 σ_{A=B ∧ D>5}(r):

A  B  C   D
α  α  1   7
β  β  23  10
  • 112. Select Operation  Notation: σ_p(r)  p is called the selection predicate  Defined as: σ_p(r) = {t | t ∈ r and p(t)} Where p is a formula in propositional calculus consisting of terms connected by: ∧ (and), ∨ (or), ¬ (not) Each term is one of: <attribute> op <attribute> or <constant> where op is one of: =, ≠, >, ≥, <, ≤
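For comparison with the SQL of Chapter 4, the selection σ_{A=B ∧ D>5}(r) over a table r with columns A, B, C, D would be written as:

  select * from r where A = B and D > 5;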
  • 113. Project Operation – Example  Relation r:

A  B   C
α  10  1
α  20  1
β  30  1
β  40  2

 Π_{A,C}(r): projecting and then removing duplicate rows gives

A  C
α  1
β  1
β  2
  • 114. Project Operation  Notation: Π_{A1, A2, …, Ak}(r) where A1, A2, … are attribute names and r is a relation name.  The result is defined as the relation of k columns obtained by erasing the columns that are not listed  Duplicate rows removed from result, since relations are sets
  • 115. Union Operation – Example  Relations r, s:

r: A B      s: A B
   α 1         α 2
   α 2         β 3
   β 1

 r ∪ s:

A B
α 1
α 2
β 1
β 3
  • 116. Union Operation  Notation: r ∪ s  Defined as: r ∪ s = {t | t ∈ r or t ∈ s}  For r ∪ s to be valid: 1. r, s must have the same arity (same number of attributes) 2. The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s)
  • 117. Set Difference Operation – Example  Relations r, s:

r: A B      s: A B
   α 1         α 2
   α 2         β 3
   β 1

 r – s:

A B
α 1
β 1
  • 118. Set Difference Operation  Notation r – s  Defined as: r – s = {t | t ∈ r and t ∉ s}  Set differences must be taken between compatible relations.  r and s must have the same arity  attribute domains of r and s must be compatible
  • 119. Cartesian-Product Operation – Example  Relations r, s:

r: A B      s: C D  E
   α 1         α 10 a
   β 2         β 10 a
               β 20 b
               γ 10 b

 r x s:

A B C D  E
α 1 α 10 a
α 1 β 10 a
α 1 β 20 b
α 1 γ 10 b
β 2 α 10 a
β 2 β 10 a
β 2 β 20 b
β 2 γ 10 b
  • 120. Cartesian-Product Operation  Notation r x s  Defined as: r x s = {t q | t ∈ r and q ∈ s}  Assume that attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅).  If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
  • 121. Composition of Operations  Can build expressions using multiple operations  Example: σ_{A=C}(r x s), where r x s is the relation from the previous slide:

A B C D  E
α 1 α 10 a
β 2 β 10 a
β 2 β 20 b
  • 122. Rename Operation  Allows us to name, and therefore to refer to, the results of relational-algebra expressions.  Allows us to refer to a relation by more than one name. Example: ρ_x(E) returns the expression E under the name X  If a relational-algebra expression E has arity n, then ρ_{x(A1, A2, …, An)}(E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …, An
  • 123. Banking Example branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-city) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)
  • 124. Example Queries  Find all loans of over $1200: σ_{amount > 1200}(loan)  Find the loan number for each loan of an amount greater than $1200: Π_{loan-number}(σ_{amount > 1200}(loan))
  • 125. Example Queries  Find the names of all customers who have a loan, an account, or both, from the bank: Π_{customer-name}(borrower) ∪ Π_{customer-name}(depositor)  Find the names of all customers who have a loan and an account at bank: Π_{customer-name}(borrower) ∩ Π_{customer-name}(depositor)
  • 126. Example Queries  Find the names of all customers who have a loan at the Perryridge branch: Π_{customer-name}(σ_{branch-name = "Perryridge"}(σ_{borrower.loan-number = loan.loan-number}(borrower x loan)))  Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank: Π_{customer-name}(σ_{branch-name = "Perryridge"}(σ_{borrower.loan-number = loan.loan-number}(borrower x loan))) – Π_{customer-name}(depositor)
  • 127. Example Queries  Find the names of all customers who have a loan at the Perryridge branch.  Query 1: Π_{customer-name}(σ_{branch-name = "Perryridge"}(σ_{borrower.loan-number = loan.loan-number}(borrower x loan)))  Query 2: Π_{customer-name}(σ_{loan.loan-number = borrower.loan-number}((σ_{branch-name = "Perryridge"}(loan)) x borrower))
  • 128. Example Queries  Find the largest account balance  Rename account relation as d  The query is: Π_{balance}(account) – Π_{account.balance}(σ_{account.balance < d.balance}(account x ρ_d(account)))
  • 129. Formal Definition  A basic expression in the relational algebra consists of either one of the following:  A relation in the database  A constant relation  Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:  E1 ∪ E2  E1 – E2  E1 x E2  σ_p(E1), P a predicate on attributes in E1  Π_s(E1), S a list consisting of some of the attributes in E1  ρ_x(E1), x is the new name for the result of E1
  • 130. Additional Operations We define additional operations that do not add any power to the relational algebra, but that simplify common queries.  Set intersection  Natural join  Division  Assignment
  • 131. Set-Intersection Operation  Notation: r ∩ s  Defined as:  r ∩ s = { t | t ∈ r and t ∈ s}  Assume:  r, s have the same arity  attributes of r and s are compatible  Note: r ∩ s = r – (r – s)
  • 132. Set-Intersection Operation – Example  Relations r, s:

r: A B      s: A B
   α 1         α 2
   α 2         β 3
   β 1

 r ∩ s:

A B
α 2
  • 133. Natural-Join Operation  Notation: r ⋈ s  Let r and s be relations on schemas R and S respectively. Then, r ⋈ s is a relation on schema R ∪ S obtained as follows:  Consider each pair of tuples tr from r and ts from s.  If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where  t has the same value as tr on r  t has the same value as ts on s  Example: R = (A, B, C, D), S = (E, B, D)  Result schema = (A, B, C, D, E)  r ⋈ s is defined as Π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r x s))
  • 134. Natural Join Operation – Example  Relations r, s:

r: A B C D      s: B D E
   α 1 α a         1 a α
   β 2 γ a         3 a β
   γ 4 β b         1 a γ
   α 1 γ a         2 b δ
   δ 2 β b         3 b ε

 r ⋈ s:

A B C D E
α 1 α a α
α 1 α a γ
α 1 γ a α
α 1 γ a γ
δ 2 β b δ
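SQL provides natural join directly (Chapter 4 returns to joined relations). A sketch over tables r(A, B, C, D) and s(B, D, E):

  select * from r natural join s;

  -- equivalent formulation that makes the matching on the shared
  -- attributes B and D explicit:
  select r.A, r.B, r.C, r.D, s.E
  from r, s
  where r.B = s.B and r.D = s.D;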
  • 135. Division Operation  Notation: r ÷ s  Suited to queries that include the phrase "for all".  Let r and s be relations on schemas R and S respectively where  R = (A1, …, Am, B1, …, Bn)  S = (B1, …, Bn) The result of r ÷ s is a relation on schema R – S = (A1, …, Am)
  • 136. Division Operation – Example  Relations r, s:

r: A B      s: B
   α 1         1
   α 2         2
   α 3
   β 1
   γ 1
   δ 1
   δ 3
   δ 4
   ε 6
   ε 1
   β 2

 r ÷ s:

A
α
β
  • 137. Another Division Example  Relations r, s:

r: A B C D E      s: D E
   α a α a 1         a 1
   α a γ a 1         b 1
   α a γ b 1
   β a γ a 1
   β a γ b 3
   γ a γ a 1
   γ a γ b 1
   γ a β b 1

 r ÷ s:

A B C
α a γ
γ a γ
  • 138. Division Operation (Cont.)  Property  Let q = r ÷ s  Then q is the largest relation satisfying q x s ⊆ r  Definition in terms of the basic algebra operation Let r(R) and s(S) be relations, and let S ⊆ R r ÷ s = Π_{R-S}(r) – Π_{R-S}((Π_{R-S}(r) x s) – Π_{R-S,S}(r))
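SQL has no division operator; the usual encoding of the "for all" reading uses a doubly nested not exists. A sketch of the Brooklyn query of slide 141 against the banking schema (underscored names): a customer qualifies if no Brooklyn branch exists at which the customer lacks an account.

  select distinct d.customer_name
  from depositor d
  where not exists (
    select * from branch b
    where b.branch_city = 'Brooklyn'
      and not exists (
        select * from depositor d2, account a
        where d2.customer_name  = d.customer_name
          and d2.account_number = a.account_number
          and a.branch_name     = b.branch_name));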
  • 139. Assignment Operation  The assignment operation (←) provides a convenient way to express complex queries.  Write query as a sequential program consisting of  a series of assignments  followed by an expression whose value is displayed as a result of the query.  Assignment must always be made to a temporary relation variable.  Example: Write r ÷ s as temp1 ← Π_{R-S}(r) temp2 ← Π_{R-S}((temp1 x s) – Π_{R-S,S}(r)) result = temp1 – temp2  The result to the right of the ← is assigned to the relation variable on the left of the ←.
  • 140. Example Queries  Find all customers who have an account from at least the "Downtown" and the "Uptown" branches.  Query 1: Π_CN(σ_{BN="Downtown"}(depositor ⋈ account)) ∩ Π_CN(σ_{BN="Uptown"}(depositor ⋈ account)) where CN denotes customer-name and BN denotes branch-name.  Query 2: Π_{customer-name, branch-name}(depositor ⋈ account) ÷ ρ_{temp(branch-name)}({("Downtown"), ("Uptown")})
  • 141. Example Queries  Find all customers who have an account at all branches located in Brooklyn city: Π_{customer-name, branch-name}(depositor ⋈ account) ÷ Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
  • 142. Extended Relational-Algebra- Operations  Generalized Projection  Outer Join  Aggregate Functions
  • 143. Generalized Projection  Extends the projection operation by allowing arithmetic functions to be used in the projection list. Π_{F1, F2, …, Fn}(E)  E is any relational-algebra expression  Each of F1, F2, …, Fn are arithmetic expressions involving constants and attributes in the schema of E.  Given relation credit-info(customer-name, limit, credit-balance), find how much more each person can spend: Π_{customer-name, limit – credit-balance}(credit-info)
  • 144. Aggregate Functions and Operations  Aggregation function takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values  Aggregate operation in relational algebra: G1, G2, …, Gn g F1(A1), F2(A2), …, Fn(An) (E), where E is any relational-algebra expression, G1, G2, …, Gn is a list of attributes on which to group (can be empty), each Fi is an aggregate function, and each Ai is an attribute name
  • 145. Aggregate Operation – Example  Relation r:

A B C
α α 7
α β 7
β β 3
β β 10

 g sum(C) (r):

sum-C
27
  • 146. Aggregate Operation – Example  Relation account grouped by branch-name:

account-number  branch-name  balance
A-102           Perryridge   400
A-201           Perryridge   900
A-217           Brighton     750
A-215           Brighton     750
A-222           Redwood      700

 branch-name g sum(balance) (account):

branch-name  balance
Perryridge   1300
Brighton     1500
Redwood      700
  • 147. Aggregate Functions (Cont.)  Result of aggregation does not have a name  Can use rename operation to give it a name: branch-name g sum(balance) as sum-balance (account)  For convenience, we permit renaming as part of aggregate operation
  • 148. Outer Join  An extension of the join operation that avoids loss of information.  Computes the join and then adds tuples form one relation that do not match tuples in the other relation to the result of the join.  Uses null values:  null signifies that the value is unknown or does not exist  All comparisons involving null are (roughly speaking) false by definition.
  • 149. Outer Join – Example  Relation loan:

loan-number  branch-name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700

 Relation borrower:

customer-name  loan-number
Jones          L-170
Smith          L-230
Hayes          L-155
  • 150. Outer Join – Example  Inner Join: loan ⋈ borrower

loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith

 Left Outer Join: loan ⟕ borrower

loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
  • 151. Outer Join – Example  Right Outer Join: loan ⟖ borrower

loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes

 Full Outer Join: loan ⟗ borrower

loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
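The same joins in SQL syntax (anticipating Chapter 4's joined relations), assuming loan and borrower tables sharing the column loan_number:

  select * from loan natural inner join borrower;        -- drops L-260 and L-155
  select * from loan natural left outer join borrower;   -- keeps L-260, null customer_name
  select * from loan natural right outer join borrower;  -- keeps L-155, null branch/amount
  select * from loan natural full outer join borrower;   -- keeps both unmatched tuples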
  • 152. Null Values  It is possible for tuples to have a null value, denoted by null, for some of their attributes  null signifies an unknown value or that a value does not exist.  The result of any arithmetic expression involving null is null.  Aggregate functions simply ignore null values  Is an arbitrary decision. Could have returned null as result instead.
  • 153. Null Values  Comparisons with null values return the special truth value unknown  If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5  Three-valued logic using the truth value unknown:  OR: (unknown or true) = true, (unknown or false) = unknown (unknown or unknown) = unknown  AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown  NOT: (not unknown) = unknown
  • 154. Modification of the Database  The content of the database may be modified using the following operations:  Deletion  Insertion  Updating  All these operations are expressed using the assignment operator.
  • 155. Deletion  A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.  Can delete only whole tuples; cannot delete values on only particular attributes  A deletion is expressed in relational algebra by: r ← r – E where r is a relation and E is a relational algebra query.
  • 156. Deletion Examples  Delete all account records in the Perryridge branch: account ← account – σ_{branch-name = "Perryridge"}(account)  Delete all loan records with amount in the range of 0 to 50: loan ← loan – σ_{amount ≥ 0 ∧ amount ≤ 50}(loan)  Delete all accounts at branches located in Needham: r1 ← σ_{branch-city = "Needham"}(account ⋈ branch) r2 ← Π_{branch-name, account-number, balance}(r1) r3 ← Π_{customer-name, account-number}(r2 ⋈ depositor) account ← account – r2 depositor ← depositor – r3
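The first two deletions written in SQL, a sketch with underscored names:

  delete from account where branch_name = 'Perryridge';
  delete from loan where amount >= 0 and amount <= 50;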
  • 157. Insertion  To insert data into a relation, we either:  specify a tuple to be inserted  write a query whose result is a set of tuples to be inserted  in relational algebra, an insertion is expressed by: r ← r ∪ E where r is a relation and E is a relational algebra expression.
  • 158. Insertion Examples  Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch: account ← account ∪ {("Perryridge", A-973, 1200)} depositor ← depositor ∪ {("Smith", A-973)}  Provide as a gift for all loan customers in the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account: r1 ← (σ_{branch-name = "Perryridge"}(borrower ⋈ loan)) account ← account ∪ Π_{branch-name, loan-number, 200}(r1) depositor ← depositor ∪ Π_{customer-name, loan-number}(r1)
  • 159. Updating  A mechanism to change a value in a tuple without changing all values in the tuple  Use the generalized projection operator to do this task r ← Π_{F1, F2, …, Fn}(r)  Each Fi is either  the i th attribute of r, if the i th attribute is not updated, or,  if the attribute is to be updated, Fi is an expression, involving only constants and the attributes of r, which gives the new value for the attribute
  • 160. Update Examples  Make interest payments by increasing all balances by 5 percent: account ← Π_{AN, BN, BAL * 1.05}(account) where AN, BN and BAL stand for account-number, branch-name and balance, respectively.  Pay all accounts with balances over $10,000 6 percent interest and pay all others 5 percent: account ← Π_{AN, BN, BAL * 1.06}(σ_{BAL > 10000}(account)) ∪ Π_{AN, BN, BAL * 1.05}(σ_{BAL ≤ 10000}(account))
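In SQL the two-rate update is most safely written as a single statement with a case expression, so no account is read twice at different rates; a sketch with underscored names:

  update account
  set balance = balance * case when balance > 10000 then 1.06
                               else 1.05 end;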
  • 161. Views  In some cases, it is not desirable for all users to see the entire logical model (i.e., all the actual relations stored in the database.)  Consider a person who needs to know a customer's loan number but has no need to see the loan amount. This person should see a relation described, in the relational algebra, by Π_{customer-name, loan-number}(borrower ⋈ loan)  Any relation that is not of the conceptual model but is made visible to a user as a "virtual relation" is called a view.
  • 162. View Definition  A view is defined using the create view statement which has the form create view v as <query expression> where <query expression> is any legal relational algebra query expression. The view name is represented by v.  Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
  • 163. View Examples  Consider the view (named all-customer) consisting of branches and their customers: create view all-customer as Π_{branch-name, customer-name}(depositor ⋈ account) ∪ Π_{branch-name, customer-name}(borrower ⋈ loan)  We can find all customers of the Perryridge branch by writing: Π_{customer-name}(σ_{branch-name = "Perryridge"}(all-customer))
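The same view in SQL syntax (views return in Chapter 4), a sketch with underscored names:

  create view all_customer as
    select branch_name, customer_name from depositor natural join account
    union
    select branch_name, customer_name from borrower natural join loan;

  select customer_name from all_customer where branch_name = 'Perryridge';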
  • 164. Updates Through Views  Database modifications expressed as views must be translated to modifications of the actual relations in the database.  Consider the person who needs to see all loan data in the loan relation except amount. The view given to the person, branch-loan, is defined as: create view branch-loan as Π_{branch-name, loan-number}(loan)
  • 165. Updates Through Views (Cont.)  The previous insertion must be represented by an insertion into the actual relation loan from which the view branch-loan is constructed.  An insertion into loan requires a value for amount. The insertion can be dealt with by either:  rejecting the insertion and returning an error message to the user.  inserting a tuple ("L-37", "Perryridge", null) into the loan relation  Some updates through views are impossible to translate into updates of the database relations
  • 166. Views Defined Using Other Views  One view may be used in the expression defining another view  A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1  A view relation v1 is said to depend on view relation v2 if either v1 depends directly to v2 or there is a path of dependencies from v1 to v2  A view relation v is said to be recursive if it depends on itself.
  • 167. View Expansion  A way to define the meaning of views defined in terms of other views.  Let view v1 be defined by an expression e1 that may itself contain uses of view relations.  View expansion of an expression repeats the following replacement step: repeat Find any view relation vi in e1; replace the view relation vi by the expression defining vi until no more view relations are present in e1
  • 168. Tuple Relational Calculus  A nonprocedural query language, where each query is of the form {t | P (t) }  It is the set of all tuples t such that predicate P is true for t  t is a tuple variable, t[A] denotes the value of tuple t on attribute A  t ∈ r denotes that tuple t is in relation r  P is a formula similar to that of the predicate calculus
  • 169. Predicate Calculus Formula 1. Set of attributes and constants 2. Set of comparison operators: (e.g., <, ≤, =, ≠, >, ≥) 3. Set of connectives: and (∧), or (∨), not (¬) 4. Implication (⇒): x ⇒ y, if x is true, then y is true; x ⇒ y ≡ ¬x ∨ y 5. Set of quantifiers:  ∃ t ∈ r (Q(t)) — "there exists" a tuple t in relation r such that predicate Q(t) is true  ∀ t ∈ r (Q(t)) — Q is true "for all" tuples t in relation r
  • 170. Banking Example  branch (branch-name, branch-city, assets)  customer (customer-name, customer- street, customer-city)  account (account-number, branch- name, balance)  loan (loan-number, branch-name, amount)  depositor (customer-name, account- number)
  • 171. Example Queries  Find the loan-number, branch-name, and amount for loans of over $1200: {t | t ∈ loan ∧ t[amount] > 1200}  Find the loan number for each loan of an amount greater than $1200: {t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)} Notice that a relation on schema [loan-number] is implicitly defined by the query
  • 172. Example Queries  Find the names of all customers having a loan, an account, or both at the bank: {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∨ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}  Find the names of all customers who have a loan and an account at the bank: {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
  • 173. Example Queries  Find the names of all customers having a loan at the Perryridge branch: {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧ ∃ u ∈ loan (u[branch-name] = "Perryridge" ∧ u[loan-number] = s[loan-number]))}  Find the names of all customers who have a loan at the Perryridge branch, but no account at any branch of the bank: {t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧ ∃ u ∈ loan (u[branch-name] = "Perryridge" ∧ u[loan-number] = s[loan-number])) ∧ ¬ ∃ v ∈ depositor (v[customer-name] = t[customer-name])}
  • 174. Example Queries  Find the names of all customers having a loan from the Perryridge branch, and the cities they live in: {t | ∃ s ∈ loan (s[branch-name] = "Perryridge" ∧ ∃ u ∈ borrower (u[loan-number] = s[loan-number] ∧ t[customer-name] = u[customer-name] ∧ ∃ v ∈ customer (u[customer-name] = v[customer-name] ∧ t[customer-city] = v[customer-city])))}
  • 175. Example Queries  Find the names of all customers who have an account at all branches located in Brooklyn: {t | ∃ c ∈ customer (t[customer-name] = c[customer-name]) ∧ ∀ s ∈ branch (s[branch-city] = "Brooklyn" ⇒ ∃ u ∈ account (s[branch-name] = u[branch-name] ∧ ∃ d ∈ depositor (t[customer-name] = d[customer-name] ∧ d[account-number] = u[account-number])))}
  • 176. Safety of Expressions  It is possible to write tuple calculus expressions that generate infinite relations.  For example, {t | ¬ (t ∈ r)} results in an infinite relation if the domain of any attribute of relation r is infinite  To guard against the problem, we restrict the set of allowable expressions to safe expressions.  An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P
  • 177. Domain Relational Calculus  A nonprocedural query language equivalent in power to the tuple relational calculus  Each query is an expression of the form: {⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn)}  x1, x2, …, xn represent domain variables  P represents a formula similar to that of the predicate calculus
  • 178. Example Queries  Find the loan-number, branch-name, and amount for loans of over $1200: {⟨l, b, a⟩ | ⟨l, b, a⟩ ∈ loan ∧ a > 1200}  Find the names of all customers who have a loan of over $1200: {⟨c⟩ | ∃ l, b, a (⟨c, l⟩ ∈ borrower ∧ ⟨l, b, a⟩ ∈ loan ∧ a > 1200)}  Find the names of all customers who have a loan from the Perryridge branch and the loan amount: {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ∃ b (⟨l, b, a⟩ ∈ loan ∧ b = "Perryridge"))} or {⟨c, a⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ⟨l, "Perryridge", a⟩ ∈ loan)}
  • 179. Example Queries  Find the names of all customers having a loan, an account, or both at the Perryridge branch: {⟨c⟩ | ∃ l (⟨c, l⟩ ∈ borrower ∧ ∃ b, a (⟨l, b, a⟩ ∈ loan ∧ b = "Perryridge")) ∨ ∃ a (⟨c, a⟩ ∈ depositor ∧ ∃ b, n (⟨a, b, n⟩ ∈ account ∧ b = "Perryridge"))}  Find the names of all customers who have an account at all branches located in Brooklyn: {⟨c⟩ | ∃ s, n (⟨c, s, n⟩ ∈ customer) ∧ ∀ x, y, z (⟨x, y, z⟩ ∈ branch ∧ y = "Brooklyn" ⇒ ∃ a, b (⟨a, x, b⟩ ∈ account ∧ ⟨c, a⟩ ∈ depositor))}
  • 180. Safety of Expressions {⟨x1, x2, …, xn⟩ | P(x1, x2, …, xn)} is safe if all of the following hold: 1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear either in P or in a tuple of a relation mentioned in P). 2. For every "there exists" subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value of x in dom(P1) such that P1(x) is true. 3. For every "for all" subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
  • 182. Result of σ_{branch-name = "Perryridge"}(loan)
  • 183. Loan Number and the Amount of the Loan
  • 184. Names of All Customers Who Have Either a Loan or an Account
  • 185. Customers With An Account But No Loan
  • 187. Result of σ_{branch-name = "Perryridge"}(borrower ⋈ loan)
  • 188. Result of customer-name
  • 189. Result of the Subexpression
  • 190. Largest Account Balance in the Bank
  • 191. Customers Who Live on the Same Street and In the Same City as Smith
  • 192. Customers With Both an Account and a Loan at the Bank
  • 193. Result of Π_{customer-name, loan-number, amount}(borrower ⋈ loan)
  • 194. Result of Π_{branch-name}(σ_{customer-city = "Harrison"}(customer ⋈ account ⋈ depositor))
  • 195. Result of Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
  • 196. Result of Π_{customer-name, branch-name}(depositor ⋈ account)
  • 198. Result of Π_{customer-name, (limit – credit-balance) as credit-available}(credit-info).
  • 200. The pt-works Relation After Grouping
  • 201. Result of branch-name g sum(salary) (pt-works)
  • 202. Result of branch-name g sum(salary) as sum-salary, max(salary) as max-salary (pt-works)
  • 203. The employee and ft-works Relations
  • 204. The Result of employee ⋈ ft-works
  • 205. The Result of employee ⟕ ft-works
  • 206. Result of employee ⟖ ft-works
  • 207. Result of employee ⟗ ft-works
  • 208. Tuples Inserted Into loan and borrower
  • 209. Names of All Customers Who Have a Loan at the Perryridge Branch
  • 214. Chapter 4: SQL  Basic Structure  Set Operations  Aggregate Functions  Null Values  Nested Subqueries  Derived Relations  Views  Modification of the Database  Joined Relations  Data Definition Language  Embedded SQL, ODBC and JDBC
  • 215. Schema Used in Examples
  • 216. Basic Structure  SQL is based on set and relational operations with certain modifications and enhancements  A typical SQL query has the form: select A1, A2, ..., An from r1, r2, ..., rm where P  Ais represent attributes  ris represent relations  P is a predicate.  This query is equivalent to the relational algebra expression Π_{A1, A2, ..., An}(σ_P(r1 x r2 x ... x rm))  The result of an SQL query is a relation.
  • 217. The select Clause  The select clause lists the attributes desired in the result of a query  corresponds to the projection operation of the relational algebra  E.g. find the names of all branches in the loan relation select branch-name from loan  In the "pure" relational algebra syntax, the query would be: Π_{branch-name}(loan)  NOTE: SQL does not permit the '-' character in names,  Use, e.g., branch_name instead of branch-name in a real implementation.  We use '-' since it looks nicer!  NOTE: SQL names are case insensitive, i.e. you can use capital or small letters.  You may wish to use upper case wherever we use bold font.
  • 218. The select Clause (Cont.)  SQL allows duplicates in relations as well as in query results.  To force the elimination of duplicates, insert the keyword distinct after select.  Find the names of all branches in the loan relations, and remove duplicates select distinct branch-name from loan  The keyword all specifies that duplicates not be removed. select all branch-name from loan
  • 219. The select Clause (Cont.)  An asterisk in the select clause denotes "all attributes" select * from loan  The select clause can contain arithmetic expressions involving the operations +, –, *, and /, and operating on constants or attributes of tuples.  The query: select loan-number, branch-name, amount * 100 from loan would return a relation which is the same as the loan relation, except that the attribute amount is multiplied by 100.
  • 220. The where Clause  The where clause specifies conditions that the result must satisfy  corresponds to the selection predicate of the relational algebra.  To find all loan number for loans made at the Perryridge branch with loan amounts greater than $1200. select loan-number from loan where branch-name = 'Perryridge' and amount > 1200  Comparison results can be combined using the logical connectives and, or, and not.  Comparisons can be applied to results of arithmetic expressions.
  • 221. The where Clause (Cont.)  SQL includes a between comparison operator  E.g. Find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000) select loan-number from loan where amount between 90000 and 100000
  • 222. The from Clause  The from clause lists the relations involved in the query  corresponds to the Cartesian product operation of the relational algebra.  Find the Cartesian product borrower x loan: select * from borrower, loan  Find the name, loan number and loan amount of all customers having a loan at the Perryridge branch: select customer-name, borrower.loan-number, amount from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge'
  • 223. The Rename Operation  SQL allows renaming relations and attributes using the as clause: old-name as new-name  Find the name, loan number and loan amount of all customers; rename the column name loan-number as loan-id: select customer-name, borrower.loan-number as loan-id, amount from borrower, loan where borrower.loan-number = loan.loan-number
  • 224. Tuple Variables  Tuple variables are defined in the from clause via the use of the as clause.  Find the customer names and their loan numbers for all customers having a loan at some branch: select customer-name, T.loan-number, S.amount from borrower as T, loan as S where T.loan-number = S.loan-number  Find the names of all branches that have greater assets than some branch located in Brooklyn: select distinct T.branch-name from branch as T, branch as S where T.assets > S.assets and S.branch-city = 'Brooklyn'
  • 225. String Operations  SQL includes a string-matching operator for comparisons on character strings. Patterns are described using two special characters:  percent (%). The % character matches any substring.  underscore (_). The _ character matches any character.  Find the names of all customers whose street includes the substring "Main": select customer-name from customer where customer-street like '%Main%'  Match the name "Main%": like 'Main\%' escape '\'  SQL supports a variety of string operations such as  concatenation (using "||")  converting from upper to lower case (and vice versa), finding string length, extracting substrings, etc.
  • 226. Ordering the Display of Tuples  List in alphabetic order the names of all customers having a loan in Perryridge branch select distinct customer-name from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge' order by customer-name  We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
  • 227. Duplicates  In relations with duplicates, SQL can define how many copies of tuples appear in the result.  Multiset versions of some of the relational algebra operators – given multiset relations r1 and r2: 1. σ_θ(r1): If there are c1 copies of tuple t1 in r1, and t1 satisfies selection σ_θ, then there are c1 copies of t1 in σ_θ(r1). 2. Π_A(r1): For each copy of tuple t1 in r1, there is a copy of tuple Π_A(t1) in Π_A(r1), where Π_A(t1) denotes the projection of the single tuple t1.
  • 228. Duplicates (Cont.)  Example: Suppose multiset relations r1 (A, B) and r2 (C) are as follows: r1 = {(1, a), (2, a)} r2 = {(2), (3), (3)}  Then Π_B(r1) would be {(a), (a)}, while Π_B(r1) x r2 would be {(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}  SQL duplicate semantics: select A1, A2, ..., An from r1, r2, ..., rm where P is equivalent to the multiset version of the expression Π_{A1, A2, ..., An}(σ_P(r1 x r2 x ... x rm))
• 229. Set Operations  The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations  Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all. Suppose a tuple occurs m times in r and n times in s. Then it occurs: m + n times in r union all s; min(m, n) times in r intersect all s; max(0, m – n) times in r except all s
• 230. Set Operations  Find all customers who have a loan, an account, or both: (select customer-name from depositor) union (select customer-name from borrower)  Find all customers who have both a loan and an account: (select customer-name from depositor) intersect (select customer-name from borrower)  Find all customers who have an account but no loan: (select customer-name from depositor) except (select customer-name from borrower)
  • 231. Aggregate Functions  These functions operate on the multiset of values of a column of a relation, and return a value avg: average value min: minimum value max: maximum value sum: sum of values count: number of values
• 232. Aggregate Functions (Cont.)  Find the average account balance at the Perryridge branch. select avg (balance) from account where branch-name = 'Perryridge'  Find the number of tuples in the customer relation. select count (*) from customer  Find the number of depositors in the bank. select count (distinct customer-name) from depositor
  • 233. Aggregate Functions – Group By  Find the number of depositors for each branch. select branch-name, count (distinct customer-name) from depositor, account where depositor.account-number = account.account-number group by branch-name Note: Attributes in select clause outside of aggregate functions must appear in group by list
  • 234. Aggregate Functions – Having Clause  Find the names of all branches where the average account balance is more than $1,200. select branch-name, avg (balance) from account group by branch-name having avg (balance) > 1200 Note: predicates in the having clause are applied after the formation of groups whereas predicates in the where clause are applied before forming groups
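A sketch combining both clauses on the account relation to make the ordering concrete (the balance > 0 restriction is only illustrative, not from the original slides): the where predicate filters tuples before grouping, and the having predicate then filters whole groups.
    select branch-name, avg (balance)
    from account
    where balance > 0
    group by branch-name
    having avg (balance) > 1200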
• 235. Null Values  It is possible for tuples to have a null value, denoted by null, for some of their attributes  null signifies an unknown value or that a value does not exist.  The predicate is null can be used to check for null values.  E.g. Find all loan numbers which appear in the loan relation with null values for amount: select loan-number from loan where amount is null
• 236. Null Values and Three Valued Logic  Any comparison with null returns unknown  E.g. 5 < null, null <> null, and null = null all evaluate to unknown  Three-valued logic using the truth value unknown:  OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown  AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown  NOT: (not unknown) = unknown  Result of a where clause predicate is treated as false if it evaluates to unknown
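A small illustration of why is null is needed, sketched on the loan relation: the first query's predicate evaluates to unknown for every tuple and therefore returns nothing; the second returns exactly the tuples with a null amount.
    select loan-number from loan where amount = null   -- unknown for every tuple: empty result
    select loan-number from loan where amount is null  -- the correct test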
  • 237. Null Values and Aggregates  Total all loan amounts select sum (amount) from loan  Above statement ignores null amounts  result is null if there is no non-null amount  All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
  • 238. Nested Subqueries  SQL provides a mechanism for the nesting of subqueries.  A subquery is a select-from-where expression that is nested within another query.  A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.
  • 239. Example Query  Find all customers who have both an account and a loan at the bank. select distinct customer-name from borrower where customer-name in (select customer-name from depositor)  Find all customers who have a loan at the bank but do not have an account at the bank select distinct customer-name from borrower where customer-name not in (select customer-name from depositor)
• 240. Example Query  Find all customers who have both an account and a loan at the Perryridge branch select distinct customer-name from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge' and (branch-name, customer-name) in (select branch-name, customer-name from depositor, account where depositor.account-number = account.account-number)  Note: Above query can be written in a much simpler manner. The formulation above is simply to illustrate SQL features. (Schema used in this example)
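As the note says, the query can be written more simply. One plausible flattened formulation on the same bank schema (a sketch, not from the original slides), joining the four relations directly:
    select distinct borrower.customer-name
    from borrower, loan, depositor, account
    where borrower.loan-number = loan.loan-number
      and depositor.account-number = account.account-number
      and borrower.customer-name = depositor.customer-name
      and loan.branch-name = 'Perryridge'
      and account.branch-name = 'Perryridge'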
• 241. Set Comparison  Find all branches that have greater assets than some branch located in Brooklyn: select distinct T.branch-name from branch as T, branch as S where T.assets > S.assets and S.branch-city = 'Brooklyn'  Same query using > some clause: select branch-name from branch where assets > some (select assets from branch where branch-city = 'Brooklyn')
• 242. Definition of Some Clause  F <comp> some r ⇔ ∃ t ∈ r such that (F <comp> t), where <comp> can be <, ≤, >, ≥, =, ≠  (5 < some {0, 5, 6}) = true (read: 5 < some tuple in the relation)  (5 < some {0, 5}) = false  (5 = some {0, 5}) = true  (5 ≠ some {0, 5}) = true (since 0 ≠ 5)  (= some) ≡ in; however, (≠ some) ≢ not in
• 243. Definition of all Clause  F <comp> all r ⇔ ∀ t ∈ r (F <comp> t)  (5 < all {0, 5, 6}) = false  (5 < all {6, 10}) = true  (5 = all {4, 5}) = false  (5 ≠ all {4, 6}) = true (since 5 ≠ 4 and 5 ≠ 6)  (≠ all) ≡ not in; however, (= all) ≢ in
• 244. Example Query  Find the names of all branches that have greater assets than all branches located in Brooklyn. select branch-name from branch where assets > all (select assets from branch where branch-city = 'Brooklyn')
• 245. Test for Empty Relations  The exists construct returns the value true if the argument subquery is nonempty.  exists r ⇔ r ≠ Ø  not exists r ⇔ r = Ø
• 246. Example Query  Find all customers who have an account at all branches located in Brooklyn. select distinct S.customer-name from depositor as S where not exists ( (select branch-name from branch where branch-city = 'Brooklyn') except (select R.branch-name from depositor as T, account as R where T.account-number = R.account-number and S.customer-name = T.customer-name))  (Schema used in this example)  Note that X – Y = Ø ⇔ X ⊆ Y  Note: Cannot write this query using = all and its variants
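An alternative, hedged formulation of the same division-style query using counting rather than except (a sketch on the same schema): a customer qualifies if the number of distinct Brooklyn branches at which they hold accounts equals the total number of Brooklyn branches.
    select depositor.customer-name
    from depositor, account, branch
    where depositor.account-number = account.account-number
      and account.branch-name = branch.branch-name
      and branch.branch-city = 'Brooklyn'
    group by depositor.customer-name
    having count (distinct account.branch-name) =
           (select count (*) from branch where branch-city = 'Brooklyn')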
• 247. Test for Absence of Duplicate Tuples  The unique construct tests whether a subquery has any duplicate tuples in its result.  Find all customers who have at most one account at the Perryridge branch. select T.customer-name from depositor as T where unique ( select R.customer-name from account, depositor as R where T.customer-name = R.customer-name and R.account-number = account.account-number and account.branch-name = 'Perryridge')
• 248. Example Query  Find all customers who have at least two accounts at the Perryridge branch. select distinct T.customer-name from depositor as T where not unique ( select R.customer-name from account, depositor as R where T.customer-name = R.customer-name and R.account-number = account.account-number and account.branch-name = 'Perryridge') (Schema used in this example)
• 249. Views  Provide a mechanism to hide certain data from the view of certain users. To create a view we use the command: create view v as <query expression> where <query expression> is any legal query expression and v is the name given to the view
• 250. Example Queries  A view consisting of branches and their customers create view all-customer as (select branch-name, customer-name from depositor, account where depositor.account-number = account.account-number) union (select branch-name, customer-name from borrower, loan where borrower.loan-number = loan.loan-number)  Find all customers of the Perryridge branch select customer-name from all-customer where branch-name = 'Perryridge'
• 251. Derived Relations  Find the average account balance of those branches where the average account balance is greater than $1200. select branch-name, avg-balance from (select branch-name, avg (balance) from account group by branch-name) as result (branch-name, avg-balance)
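The same result can arguably be obtained without a derived relation, using having (a sketch):
    select branch-name, avg (balance) as avg-balance
    from account
    group by branch-name
    having avg (balance) > 1200
The derived-relation form is still worth knowing: it generalizes to cases where the outer query must treat the aggregated result as an ordinary relation.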
• 252. With Clause  With clause allows views to be defined locally to a query, rather than globally. Analogous to procedures in a programming language.  Find all accounts with the maximum balance with max-balance(value) as select max (balance) from account select account-number from account, max-balance where account.balance = max-balance.value
• 253. Complex Query using With Clause  Find all branches where the total account deposit is greater than the average of the total account deposits at all branches. with branch-total (branch-name, value) as select branch-name, sum (balance) from account group by branch-name with branch-total-avg(value) as select avg (value) from branch-total select branch-name from branch-total, branch-total-avg where branch-total.value >= branch-total-avg.value
• 254. Modification of the Database – Deletion  Delete all account records at the Perryridge branch delete from account where branch-name = 'Perryridge'  Delete all accounts at every branch located in Needham city. delete from account where branch-name in (select branch-name from branch where branch-city = 'Needham')
• 255. Example Query  Delete the record of all accounts with balances below the average at the bank. delete from account where balance < (select avg (balance) from account)  Problem: as we delete tuples from account, the average balance changes  Solution used in SQL: 1. First, compute avg balance and find all tuples to delete 2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)
• 256. Modification of the Database – Insertion  Add a new tuple to account insert into account values ('A-9732', 'Perryridge', 1200) or equivalently insert into account (branch-name, balance, account-number) values ('Perryridge', 1200, 'A-9732')
• 257. Modification of the Database – Insertion  Provide as a gift for all loan customers of the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account insert into account select loan-number, branch-name, 200 from loan where branch-name = 'Perryridge' insert into depositor select customer-name, loan-number from borrower, loan where borrower.loan-number = loan.loan-number and branch-name = 'Perryridge'
• 258. Modification of the Database – Updates  Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.  Write two update statements: update account set balance = balance * 1.06 where balance > 10000 update account set balance = balance * 1.05 where balance <= 10000  The order is important: if the 5% update ran first, a balance could cross $10,000 and then be increased a second time by the 6% update
  • 259. Case Statement for Conditional Updates  Same query as before: Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%. update account set balance = case when balance <= 10000 then balance *1.05 else balance * 1.06 end
• 260. Update of a View  Create a view of all loan data in the loan relation, hiding the amount attribute create view branch-loan as select branch-name, loan-number from loan  Add a new tuple to branch-loan insert into branch-loan values ('Perryridge', 'L-307') This insertion must be represented by the insertion of the tuple ('L-307', 'Perryridge', null) into the loan relation  Updates on more complex views are difficult or impossible to translate, and hence are disallowed.  Most SQL implementations allow updates only on simple views (without aggregates) defined on a single relation
• 261. Transactions  A transaction is a sequence of queries and update statements executed as a single unit  Transactions are started implicitly and terminated by one of  commit work: makes all updates of the transaction permanent in the database  rollback work: undoes all updates performed by the transaction.  Motivating example  Transfer of money from one account to another involves two steps:  deduct from one account and credit to another  If one step succeeds and the other fails, the database is left in an inconsistent state; therefore, either both steps should occur or neither should
• 262. Transactions (Cont.)  In most database systems, each SQL statement that executes successfully is automatically committed.  Each transaction would then consist of only a single statement  Automatic commit can usually be turned off, allowing multi-statement transactions, but how to do so depends on the database system  Another option in SQL:1999: enclose statements within begin atomic ... end
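A minimal sketch of the funds-transfer example as one multi-statement transaction, assuming automatic commit has been turned off (the account numbers A-101 and A-215 are illustrative):
    update account set balance = balance - 100 where account-number = 'A-101';
    update account set balance = balance + 100 where account-number = 'A-215';
    commit work;
    -- if either step fails, undo both instead:
    -- rollback work;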
• 263. Joined Relations  Join operations take two relations and return as a result another relation.  These additional operations are typically used as subquery expressions in the from clause  Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.  Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.  Join types: inner join, left outer join, right outer join, full outer join  Join conditions: natural, on <predicate>, using (A1, A2, ..., An)
• 264. Joined Relations – Datasets for Examples  Relation loan:
    loan-number   branch-name   amount
    L-170         Downtown      3000
    L-230         Redwood       4000
    L-260         Perryridge    1700
  Relation borrower:
    customer-name   loan-number
    Jones           L-170
    Smith           L-230
    Hayes           L-155
  Note: borrower information missing for L-260 and loan information missing for L-155
• 265. Joined Relations – Examples  loan inner join borrower on loan.loan-number = borrower.loan-number:
    loan-number   branch-name   amount   customer-name   loan-number
    L-170         Downtown      3000     Jones           L-170
    L-230         Redwood       4000     Smith           L-230
  loan left outer join borrower on loan.loan-number = borrower.loan-number:
    loan-number   branch-name   amount   customer-name   loan-number
    L-170         Downtown      3000     Jones           L-170
    L-230         Redwood       4000     Smith           L-230
    L-260         Perryridge    1700     null            null
• 266. Joined Relations – Examples  loan natural inner join borrower:
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith
  loan natural right outer join borrower:
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith
    L-155         null          null     Hayes
• 267. Joined Relations – Examples  loan full outer join borrower using (loan-number):
    loan-number   branch-name   amount   customer-name
    L-170         Downtown      3000     Jones
    L-230         Redwood       4000     Smith
    L-260         Perryridge    1700     null
    L-155         null          null     Hayes
  Find all customers who have either an account or a loan (but not both) at the bank: select customer-name from (depositor natural full outer join borrower) where account-number is null or loan-number is null
• 268. Data Definition Language (DDL)  Allows the specification of not only a set of relations but also information about each relation, including:  The schema for each relation.  The domain of values associated with each attribute.  Integrity constraints  The set of indices to be maintained for each relation.  Security and authorization information for each relation.  The physical storage structure of each relation on disk.
• 269. Domain Types in SQL  char(n). Fixed length character string, with user-specified length n.  varchar(n). Variable length character strings, with user-specified maximum length n.  int. Integer (a finite subset of the integers that is machine-dependent).  smallint. Small integer (a machine-dependent subset of the integer domain type).  numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of the decimal point.  real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.  float(n). Floating point number, with user-specified precision of at least n digits.  Null values are allowed in all the domain types. Declaring an attribute to be not null prohibits null values for that attribute.  create domain construct in SQL-92 creates user-defined domain types create domain person-name char(20) not null
• 270. Date/Time Types in SQL (Cont.)  date. Dates, containing a (4 digit) year, month and date  E.g. date '2001-7-27'  time. Time of day, in hours, minutes and seconds.  E.g. time '09:00:30' time '09:00:30.75'  timestamp: date plus time of day  E.g. timestamp '2001-7-27 09:00:30.75'  interval: period of time  E.g. interval '1' day  Subtracting a date/time/timestamp value from another gives an interval value
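A hypothetical table sketching these types together (loan-payment and its attributes are illustrative and not part of the bank schema used elsewhere in these slides):
    create table loan-payment
      (loan-number   char(10),
       payment-date  date,
       payment-time  time,
       recorded-at   timestamp)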
• 271. Create Table Construct  An SQL relation is defined using the create table command: create table r (A1 D1, A2 D2, ..., An Dn, (integrity-constraint1), ..., (integrity-constraintk))  r is the name of the relation  each Ai is an attribute name in the schema of relation r  each Di is the data type of values in the domain of attribute Ai
• 272. Integrity Constraints in Create Table  not null  primary key (A1, ..., An)  check (P), where P is a predicate Example: Declare branch-name as the primary key for branch and ensure that the values of assets are non-negative. create table branch (branch-name char(15), branch-city char(30), assets integer, primary key (branch-name), check (assets >= 0)) primary key declaration on an attribute automatically ensures not null in SQL-92 onwards; it needs to be explicitly stated in SQL-89
  • 273. Drop and Alter Table Constructs  The drop table command deletes all information about the dropped relation from the database.  The alter table command is used to add attributes to an existing relation. alter table r add A D where A is the name of the attribute to be added to relation r and D is the domain of A.  All tuples in the relation are assigned null as the value for the new attribute.
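For instance, a hedged sketch adding an illustrative interest-rate attribute to account (the attribute is hypothetical; dropping attributes is also part of the standard, though not every system supports it):
    alter table account add interest-rate numeric(4,2)
    -- where the system supports dropping attributes:
    alter table account drop interest-rate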
• 274. Embedded SQL  The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal, PL/I, Fortran, C, and Cobol.  A language in which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.  The basic form of these languages follows that of the System R embedding of SQL into PL/I: the EXEC SQL statement is used to identify an embedded SQL request to the preprocessor EXEC SQL <embedded SQL statement> END-EXEC
• 275. Example Query  From within a host language, find the names and cities of customers with more than the variable amount dollars in some account.  Specify the query in SQL and declare a cursor for it EXEC SQL declare c cursor for select customer-name, customer-city from depositor, customer, account where depositor.customer-name = customer.customer-name and depositor.account-number = account.account-number and account.balance > :amount END-EXEC
• 276. Embedded SQL (Cont.)  The open statement causes the query to be evaluated EXEC SQL open c END-EXEC  The fetch statement causes the values of one tuple in the query result to be placed in host language variables. EXEC SQL fetch c into :cn, :cc END-EXEC Repeated calls to fetch get successive tuples in the query result
• 277. Updates Through Cursors  Can update tuples fetched by cursor by declaring that the cursor is for update declare c cursor for select * from account where branch-name = 'Perryridge' for update  To update tuple at the current location of cursor update account set balance = balance + 100 where current of c
• 278. Dynamic SQL  Allows programs to construct and submit SQL queries at run time.  Example of the use of dynamic SQL from within a C program. char * sqlprog = "update account set balance = balance * 1.05 where account-number = ?" EXEC SQL prepare dynprog from :sqlprog; char account[11] = "A-101"; EXEC SQL execute dynprog using :account;  The dynamic SQL program contains a ?, which is a place holder for a value that is provided when the SQL program is executed
  • 279. ODBC  Open DataBase Connectivity(ODBC) standard  standard for application program to communicate with a database server.  application program interface (API) to  open a connection with a database,  send queries and updates,  get back results.  Applications such as GUI, spreadsheets, etc. can use ODBC
  • 280. ODBC (Cont.)  Each database system supporting ODBC provides a "driver" library that must be linked with the client program.  When client program makes an ODBC API call, the code in the library communicates with the server to carry out the requested action, and fetch results.  ODBC program first allocates an SQL environment, then a database connection handle.  Opens database connection using SQLConnect(). Parameters for SQLConnect:  connection handle,  the server to which to connect  the user identifier,  password  Must also specify types of arguments:  SQL_NTS denotes previous argument is a null-terminated string.
• 281. ODBC Code  int ODBCexample() { RETCODE error; HENV env; /* environment */ HDBC conn; /* database connection */ SQLAllocEnv(&env); SQLAllocConnect(env, &conn); SQLConnect(conn, "aura.bell-labs.com", SQL_NTS, "avi", SQL_NTS, "avipasswd", SQL_NTS); { …. Do actual work … } SQLDisconnect(conn); SQLFreeConnect(conn); SQLFreeEnv(env); }
  • 282. ODBC Code (Cont.)  Program sends SQL commands to the database by using SQLExecDirect  Result tuples are fetched using SQLFetch()  SQLBindCol() binds C language variables to attributes of the query result  When a tuple is fetched, its attribute values are automatically stored in corresponding C variables.  Arguments to SQLBindCol()  ODBC stmt variable, attribute position in query result  The type conversion from SQL to C.  The address of the variable.  For variable-length types like character arrays,  The maximum length of the variable  Location to store actual length when a tuple is fetched.
• 283. ODBC Code (Cont.)  Main body of program char branchname[80]; float balance; int lenOut1, lenOut2; HSTMT stmt; SQLAllocStmt(conn, &stmt); char * sqlquery = "select branch_name, sum (balance) from account group by branch_name"; error = SQLExecDirect(stmt, sqlquery, SQL_NTS); if (error == SQL_SUCCESS) { SQLBindCol(stmt, 1, SQL_C_CHAR, branchname, 80, &lenOut1); SQLBindCol(stmt, 2, SQL_C_FLOAT, &balance, 0, &lenOut2); while (SQLFetch(stmt) >= SQL_SUCCESS) { printf("%s %g\n", branchname, balance); } } SQLFreeStmt(stmt, SQL_DROP);
• 284. More ODBC Features  Prepared Statement  SQL statement prepared: compiled at the database  Can have placeholders: E.g. insert into account values(?,?,?)  Repeatedly executed with actual values for the placeholders  Metadata features  finding all the relations in the database and  finding the names and types of columns of a query result or a relation in the database.  By default, each SQL statement is treated as a separate transaction that is committed automatically
  • 285. ODBC Conformance Levels  Conformance levels specify subsets of the functionality defined by the standard.  Core  Level 1 requires support for metadata querying  Level 2 requires ability to send and retrieve arrays of parameter values and more detailed catalog information.  SQL Call Level Interface (CLI) standard similar to ODBC interface, but with some minor differences.
• 286. JDBC  JDBC is a Java API for communicating with database systems supporting SQL  JDBC supports a variety of features for querying and updating data, and for retrieving query results  JDBC also supports metadata retrieval, such as querying about relations present in the database and the names and types of relation attributes  Model for communicating with the database: open a connection, create a "statement" object, execute queries and fetch results using the statement object, and use an exception mechanism to handle errors
• 287. JDBC Code public static void JDBCexample(String dbid, String userid, String passwd) { try { Class.forName("oracle.jdbc.driver.OracleDriver"); Connection conn = DriverManager.getConnection( "jdbc:oracle:thin:@aura.bell-labs.com:2000:bankdb", userid, passwd); Statement stmt = conn.createStatement(); … Do Actual Work …. stmt.close(); conn.close(); } catch (SQLException sqle) { System.out.println("SQLException : " + sqle); } }
• 288. JDBC Code (Cont.)  Update to database try { stmt.executeUpdate( "insert into account values ('A-9732', 'Perryridge', 1200)"); } catch (SQLException sqle) { System.out.println("Could not insert tuple. " + sqle); }  Execute query and fetch and print results ResultSet rset = stmt.executeQuery( "select branch_name, avg(balance) from account group by branch_name"); while (rset.next()) { System.out.println(rset.getString("branch_name") + " " + rset.getFloat(2)); }
• 289. JDBC Code Details  Getting result fields:  rs.getString("branchname") and rs.getString(1) are equivalent if branchname is the first attribute of the select result.  Dealing with Null values int a = rs.getInt("a"); if (rs.wasNull()) System.out.println("Got null value");
• 290. Prepared Statement  Prepared statement allows queries to be compiled and executed multiple times with different arguments PreparedStatement pStmt = conn.prepareStatement( "insert into account values(?,?,?)"); pStmt.setString(1, "A-9732"); pStmt.setString(2, "Perryridge"); pStmt.setInt(3, 1200); pStmt.executeUpdate(); pStmt.setString(1, "A-9733"); pStmt.executeUpdate();
  • 291. Other SQL Features  SQL sessions  client connects to an SQL server, establishing a session  executes a series of statements  disconnects the session  can commit or rollback the work carried out in the session  An SQL environment contains several components, including a user identifier, and a schema, which identifies which of several schemas a session is using.
• 292. Schemas, Catalogs, and Environments  Three-level hierarchy for naming relations.  Database contains multiple catalogs  each catalog can contain multiple schemas  SQL objects such as relations and views are contained within a schema  e.g. catalog5.bank-schema.account  Each user has a default catalog and schema, and the combination is unique to the user.  Default catalog and schema are set up for a connection
• 293. Procedural Extensions and Stored Procedures  SQL provides a module language  permits definition of procedures in SQL, with if-then-else statements, for and while loops, etc.  more in Chapter 9  Stored Procedures  Can store procedures in the database  then execute them using the call statement  permit external applications to operate on the database without knowing about internal details  These features are covered in Chapter 9
  • 295. Transactions in JDBC  As with ODBC, each statement gets committed automatically in JDBC  To turn off auto commit use conn.setAutoCommit(false);  To commit or abort transactions use conn.commit() or conn.rollback()  To turn auto commit on again, use conn.setAutoCommit(true);
• 296. Procedure and Function Calls in JDBC  JDBC provides a class CallableStatement which allows SQL stored procedures/functions to be invoked. CallableStatement cs1 = conn.prepareCall("{call proc (?,?)}"); CallableStatement cs2 = conn.prepareCall("{? = call func (?,?)}");
  • 297. Result Set MetaData  The class ResultSetMetaData provides information about all the columns of the ResultSet.  Instance of this class is obtained by getMetaData( ) function of ResultSet.  Provides Functions for getting number of columns, column name, type, precision, scale, table from which the column is derived etc. ResultSetMetaData rsmd = rs.getMetaData ( );
• 298. Database Meta Data  The class DatabaseMetaData provides information about database relations  Has functions for getting all tables, all columns of the table, primary keys etc.  E.g. to print column names and types of a relation DatabaseMetaData dbmd = conn.getMetaData(); ResultSet rs = dbmd.getColumns(null, "BANK-DB", "account", "%"); // Arguments: catalog, schema-pattern, table-pattern, column-pattern // Returns: 1 row for each column, with several attributes such as // COLUMN_NAME, TYPE_NAME, etc. while (rs.next()) { System.out.println(rs.getString("COLUMN_NAME") + ", " + rs.getString("TYPE_NAME")); }  There are also functions for getting information such as primary and foreign keys
  • 299. Application Architectures  Applications can be built using one of two architectures  Two tier model  Application program running at user site directly uses JDBC/ODBC to communicate with the database  Three tier model  Users/programs running at user sites communicate with an application server. The application server in turn communicates with the database
• 300. Two-tier Model  E.g. Java code runs at client site and uses JDBC to communicate with the backend server  Benefits:  flexible, need not be restricted to predefined queries  Problems:  Security: passwords available at client site, all database operations possible  More code shipped to client  Not appropriate across organizations, or in large ones like universities
• 301. Three Tier Model  (Figure: clients communicate over HTTP or an application-specific protocol, through the network, with an application/HTTP server running CGI programs or servlets; the application server in turn uses JDBC to communicate with the database server.)
  • 302. Three-tier Model (Cont.)  E.g. Web client + Java Servlet using JDBC to talk with database server  Client sends request over http or application-specific protocol  Application or Web server receives request  Request handled by CGI program or servlets  Security handled by application at server  Better security
  • 304. The loan and borrower Relations
  • 305. The Result of loan inner join borrower on loan.loan-number = borrower.loan-number
  • 306. The Result of loan left outer join borrower on loan-number
  • 307. The Result of loan natural inner join borrower
  • 308. Join Types and Join Conditions
  • 309. The Result of loan natural right outer join borrower
  • 310. The Result of loan full outer join borrower using(loan-number)
  • 311. SQL Data Definition for Part of the Bank Database
  • 312. Chapter 5: Other Relational Languages  Query-by-Example (QBE)  Datalog
  • 313. Query-by-Example (QBE)  Basic Structure  Queries on One Relation  Queries on Several Relations  The Condition Box  The Result Relation  Ordering the Display of Tuples  Aggregate Operations  Modification of the Database
  • 314. QBE — Basic Structure  A graphical query language which is based (roughly) on the domain relational calculus  Two dimensional syntax – system creates templates of relations that are requested by users  Queries are expressed ―by example‖
  • 315. QBE Skeleton Tables for the Bank Example
• 317. Queries on One Relation  Find all loan numbers at the Perryridge branch. • _x is a variable (optional; can be omitted in above query) • P. means print (display) • duplicates are removed by default • To retain duplicates use P.ALL
  • 318. Queries on One Relation (Cont.)  Display full details of all loans Method 1: P._x P._y P._z Method 2: Shorthand notation
  • 319. Queries on One Relation (Cont.)  Find the loan number of all loans with a loan amount of more than $700  Find names of all branches that are not located in Brooklyn
  • 320. Queries on One Relation (Cont.)  Find the loan numbers of all loans made jointly to Smith and Jones.  Find all customers who live in the same city as Jones
  • 321. Queries on Several Relations  Find the names of all customers who have a loan from the Perryridge branch.
• 322. Queries on Several Relations (Cont.)  Find the names of all customers who have both an account and a loan at the bank.
  • 323. Negation in QBE  Find the names of all customers who have an account at the bank, but do not have a loan from the bank. ¬ means “there does not exist”
  • 324. Negation in QBE (Cont.)  Find all customers who have at least two accounts. ¬ means “not equal to”
  • 325. The Condition Box  Allows the expression of constraints on domain variables that are either inconvenient or impossible to express within the skeleton tables.  Complex conditions can be used in condition boxes  E.g. Find the loan numbers of all loans made to Smith, to Jones, or to both jointly
  • 326. Condition Box (Cont.)  QBE supports an interesting syntax for expressing alternative values
  • 327. Condition Box (Cont.) a balance Find all account numbers with between $1,300 and $1,500 Find all account numbers with a balance between $1,300 and $2,000 but not exactly $1,500.
  • 328. Condition Box (Cont.)  Find all branches that have assets greater than those of at least one branch located in Brooklyn
• 329. The Result Relation  Find the customer-name, account-number, and balance for all customers who have an account at the Perryridge branch.  We need to:  Join depositor and account.  Project customer-name, account-number and balance.  To accomplish this we:  Create a skeleton table, called result, with attributes customer-name, account-number, and balance.
  • 330. The Result Relation (Cont.)  The resulting query is:
• 331. Ordering the Display of Tuples  AO = ascending order; DO = descending order.  E.g. list in ascending alphabetical order all customers who have an account at the bank  When sorting on multiple attributes, the sorting order is specified by including with each sort operator (AO or DO) an integer surrounded by parentheses.
• 332. Aggregate Operations  The aggregate operators are AVG, MAX, MIN, SUM, and CNT  The above operators must be postfixed with "ALL" (e.g., SUM.ALL or AVG.ALL._x) to ensure that duplicates are not eliminated.  E.g. Find the total balance of all the accounts maintained at the Perryridge branch.
  • 333. Aggregate Operations (Cont.)  UNQ is used to specify that we want to eliminate duplicates  Find the total number of customers having an account at the bank.
• 334. Query Examples  Find the average balance at each branch.  The "G" in "P.G" is analogous to SQL's group by construct  The "ALL" in the "P.AVG.ALL" entry in the balance column ensures that all balances are considered  To find the average account balance at only those branches where the average account balance is more than $1,200, we simply add the condition box:
• 335. Query Example  Find all customers who have an account at all branches located in Brooklyn.  Approach: for each customer, find the number of branches in Brooklyn at which they have accounts, and compare with the total number of branches in Brooklyn  QBE does not provide subquery functionality, so both of the above tasks have to be combined in a single query. This can be done for this query, but there are queries that require subqueries and cannot be expressed in QBE.  In the query on the next page:  CNT.UNQ.ALL._w specifies the number of distinct branches in Brooklyn. Note: the variable _w is not connected to other variables in the query  CNT.UNQ.ALL._z specifies the number of distinct branches in Brooklyn at which customer x has an account.
• 337. Modification of the Database – Deletion  Deletion of tuples from a relation is expressed by use of a D. command. In the case where we delete information in only some of the columns, null values, specified by –, are inserted.  Delete customer Smith  Delete the branch-city value of the branch whose name is Perryridge
  • 338. Deletion Query Examples  Delete all loans with a loan amount between $1300 and $1500.  For consistency, we have to delete information from loan and borrower tables
  • 339. Deletion Query Examples (Cont.)  Delete all accounts at branches located in Brooklyn.
  • 340. Modification of the Database – Insertion  Insertion is done by placing the I. operator in the query expression.  Insert the fact that account A- 9732 at the Perryridge branch has a balance of $700.
  • 341. Modification of the Database – Insertion (Cont.)  Provide as a gift for all loan customers of the Perryridge branch, a new $200 savings account for every loan account they have, with the loan number serving as the account number for the new savings account.
  • 342. Modification of the Database – Updates  Use the U. operator to change a value in a tuple without changing all values in the tuple. QBE does not allow users to update the primary key fields.  Update the asset value of the Perryridge branch to $10,000,000.  Increase all balances by 5 percent.
• 343. Microsoft Access QBE  Microsoft Access supports a variant of QBE called Graphical Query By Example (GQBE)  GQBE differs from QBE in the following ways  Attributes of relations are listed vertically, one below the other, instead of horizontally  Instead of using variables, lines (links) between attributes are used to specify that their values should be the same.  Links are added automatically on the basis of attribute name, and the user can then add or delete links
  • 344. An Example Query in Microsoft Access QBE  Example query: Find the customer-name, account-number and balance for all accounts at the Perryridge branch
  • 345. An Aggregation Query in Access QBE  Find the name, street and city of all customers who have more than one account at the bank
  • 346. Aggregation in Access QBE  The row labeled Total specifies  which attributes are group by attributes  which attributes are to be aggregated upon (and the aggregate function).  For attributes that are neither group by nor aggregated, we can still specify conditions by selecting where in the Total row and listing the conditions below  As in SQL, if group by is used, only group by attributes and aggregate results can be output
  • 347. Datalog  Basic Structure  Syntax of Datalog Rules  Semantics of Nonrecursive Datalog  Safety  Relational Operations in Datalog  Recursion in Datalog  The Power of Recursion
• 348. Basic Structure  Prolog-like logic-based language that allows recursive queries; based on first-order logic.  A Datalog program consists of a set of rules that define views.  Example: define a view relation v1 containing account numbers and balances for accounts at the Perryridge branch with a balance of over $700. v1(A, B) :– account(A, "Perryridge", B), B > 700.  Retrieve the balance of account number A-217 in the view relation v1: ? v1("A-217", B).
• 349. Example Queries  Each rule defines a set of tuples that a view relation must contain.  E.g. v1(A, B) :– account(A, "Perryridge", B), B > 700 is read as: for all A, B, if (A, "Perryridge", B) ∈ account and B > 700 then (A, B) ∈ v1  The set of tuples in a view relation is then defined as the union of all the sets of tuples defined by the rules for the view relation.  Example: interest-rate(A, 5) :– account(A, N, B), B < 10000. interest-rate(A, 6) :– account(A, N, B), B >= 10000.
• 350. Negation in Datalog  Define a view relation c that contains the names of all customers who have a deposit but no loan at the bank: c(N) :– depositor(N, A), not is-borrower(N). is-borrower(N) :– borrower(N, L).  NOTE: using not borrower(N, L) in the first rule results in a different meaning, namely there is some loan L for which N is not a borrower.
• 351. Named Attribute Notation  Datalog rules use a positional notation, which is convenient for relations with a small number of attributes  It is easy to extend Datalog to support named attributes.  E.g., v1 can be defined using named attributes as v1(account-number A, balance B) :– account(account-number A, branch-name "Perryridge", balance B), B > 700.
• 352. Formal Syntax and Semantics of Datalog  We formally define the syntax and semantics (meaning) of Datalog programs, in the following steps 1. We define the syntax of predicates, and then the syntax of rules 2. We define the semantics of individual rules 3. We define the semantics of non-recursive programs, based on a layering of rules 4. It is possible to write rules that can generate an infinite number of tuples in the view relation; to avoid this, we define what it means for a rule to be safe
• 353. Syntax of Datalog Rules  A positive literal has the form p(t1, t2, ..., tn)  p is the name of a relation with n attributes  each ti is either a constant or variable  A negative literal has the form not p(t1, t2, ..., tn)  Comparison operations are treated as positive predicates  E.g. X > Y is treated as a predicate >(X, Y)  ">" is conceptually an (infinite) relation that contains all pairs of values such that the first value is greater than the second
• 354. Syntax of Datalog Rules (Cont.)  Rules are built out of literals and have the form: p(t1, t2, ..., tn) :– L1, L2, ..., Lm.  each of the Li's is a literal  head – the literal p(t1, t2, ..., tn)  body – the rest of the literals  A fact is a rule with an empty body, written in the form: p(v1, v2, ..., vn).
• 355. Semantics of a Rule  A ground instantiation of a rule (or simply instantiation) is the result of replacing each variable in the rule by some constant.  E.g. rule defining v1: v1(A, B) :– account(A, "Perryridge", B), B > 700.  An instantiation of the above rule: v1("A-217", 750) :– account("A-217", "Perryridge", 750), 750 > 700.  The body of rule instantiation R' is satisfied in a set of facts (database instance) l if, for each positive literal in the body, l contains that fact, and for each negative literal not q(...) in the body, l does not contain the fact q(...)
• 356. Semantics of a Rule (Cont.)  We define the set of facts that can be inferred from a given set of facts l using rule R as: infer(R, l) = {p(t1, ..., tn) | there is a ground instantiation R' of R where p(t1, ..., tn) is the head of R', and the body of R' is satisfied in l}  Given a set of rules ℛ = {R1, R2, ..., Rn}, we define infer(ℛ, l) = infer(R1, l) ∪ infer(R2, l) ∪ ... ∪ infer(Rn, l)
• 357. Layering of Rules  Define the interest on each account in Perryridge: interest(A, I) :– perryridge-account(A, B), interest-rate(A, R), I = B * R / 100. perryridge-account(A, B) :– account(A, "Perryridge", B). interest-rate(A, 5) :– account(A, N, B), B < 10000. interest-rate(A, 6) :– account(A, N, B), B >= 10000.  Layering of the view relations
• 358. Layering Rules (Cont.) Formally:  A relation is in layer 1 if all relations used in the bodies of rules defining it are stored in the database.  A relation is in layer 2 if all relations used in the bodies of rules defining it are either stored in the database, or are in layer 1.  A relation p is in layer i + 1 if  it is not in layers 1, 2, ..., i  all relations used in the bodies of rules defining p are either stored in the database, or are in layers 1, 2, ..., i
• 359. Semantics of a Program  Let the layers in a given program be 1, 2, ..., n. Let ℛi denote the set of all rules defining view relations in layer i.  Define l0 = set of facts stored in the database.  Recursively define li+1 = li ∪ infer(ℛi+1, li)  The set of facts in the view relations defined by the program (also called the semantics of the program) is given by the set of facts ln corresponding to the highest layer n. Note: Can instead define semantics using view expansion as in relational algebra, but the above definition is better for handling extensions such as recursion.
• 360. Safety  It is possible to write rules that generate an infinite number of answers. gt(X, Y) :– X > Y not-in-loan(B, L) :– not loan(B, L) To avoid this possibility Datalog rules must satisfy the following conditions:  Every variable that appears in the head of the rule also appears in a non-arithmetic positive literal in the body of the rule.  Every variable appearing in a negative literal in the body of the rule also appears in some positive literal in the body of the rule.
• 361. Relational Operations in Datalog  Project out attribute account-number from account: query(A) :– account(A, N, B).  Cartesian product of relations r1 and r2: query(X1, X2, ..., Xn, Y1, Y2, ..., Ym) :– r1(X1, X2, ..., Xn), r2(Y1, Y2, ..., Ym).  Union of relations r1 and r2: query(X1, X2, ..., Xn) :– r1(X1, X2, ..., Xn). query(X1, X2, ..., Xn) :– r2(X1, X2, ..., Xn).
• 362. Updates in Datalog  Some Datalog extensions support database modification using + or – in the rule head to indicate insertion and deletion.  E.g. to transfer all accounts at the Perryridge branch to the Johnstown branch, we can write + account(A, "Johnstown", B) :– account(A, "Perryridge", B). – account(A, "Perryridge", B) :– account(A, "Perryridge", B)
• 363. Recursion in Datalog  Suppose we are given a relation manager(X, Y) containing pairs of names X, Y such that Y is a manager of X (or equivalently, X is a direct employee of Y).  Each manager may have direct employees, as well as indirect employees  Indirect employees of a manager, say Jones, are employees of people who are direct employees of Jones, or indirect employees of Jones
• 364. Semantics of Recursion in Datalog  Assumption (for now): program contains no negative literals  The view relations of a recursive program containing a set of rules ℛ are defined to contain exactly the set of facts l computed by the iterative procedure Datalog-Fixpoint procedure Datalog-Fixpoint l = set of facts in the database repeat Old_l = l l = l ∪ infer(ℛ, l) until l = Old_l
• 366. A More General View  Create a view relation empl that contains every tuple (X, Y) such that X is directly or indirectly managed by Y. empl(X, Y) :– manager(X, Y). empl(X, Y) :– manager(X, Z), empl(Z, Y).  Find the direct and indirect employees of Jones: ? empl(X, "Jones").  Can define the view empl in another way too: empl(X, Y) :– manager(X, Y). empl(X, Y) :– empl(X, Z), manager(Z, Y).
  • 367. The Power of Recursion  Recursive views make it possible to write queries, such as transitive closure queries, that cannot be written without recursion or iteration.  Intuition: Without recursion, a non- recursive non-iterative program can perform only a fixed number of joins of manager with itself  This can give only a fixed number of levels of managers  Given a program we can construct a database with a greater number of levels of managers on which the program will not work
• 368. Recursion in SQL  SQL:1999 permits recursive view definition  E.g. query to find all employee-manager pairs with recursive empl (emp, mgr) as (select emp, mgr from manager union select manager.emp, empl.mgr from manager, empl where manager.mgr = empl.emp) select * from empl
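A sketch of putting the same recursive view to work, listing Jones's direct and indirect employees (the filter on mgr is an illustrative use, not from the original slides):
    with recursive empl (emp, mgr) as
      (select emp, mgr
       from manager
       union
       select manager.emp, empl.mgr
       from manager, empl
       where manager.mgr = empl.emp)
    select emp
    from empl
    where mgr = 'Jones'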
• 369. Monotonicity  A view V is said to be monotonic if given any two sets of facts I1 and I2 such that I1 ⊆ I2, then EV(I1) ⊆ EV(I2), where EV is the expression used to define V.  A set of rules R is said to be monotonic if I1 ⊆ I2 implies infer(R, I1) ⊆ infer(R, I2)  Relational algebra views defined using only selection, projection, Cartesian product, join, union and intersection (but not set difference) are monotonic
  • 370. Non-Monotonicity  Procedure Datalog-Fixpoint is sound provided the rules in the program are monotonic.  Otherwise, it may make some inferences in an iteration that cannot be made in a later iteration. E.g. given the rules a :- not b. b :- c. c. Then a can be inferred initially, before b is inferred, but not later.
  • 371. Stratified Negation  A Datalog program is said to be stratified if its predicates can be given layer numbers such that 1. For all positive literals, say q, in the body of any rule with head, say, p p(..) :- …., q(..), … then the layer number of p is greater than or equal to the layer number of q 2. Given any rule with a negative literal p(..) :- …, not q(..), … then the layer number of p is strictly greater than the layer number of q  Stratified programs do not have recursion mixed with negation
• 372. Non-Monotonicity (Cont.)  There are useful queries that cannot be expressed by a stratified program  E.g., given information about the number of each subpart in each part, in a part-subpart hierarchy, find the total number of subparts of each part.  A program to compute the above query would have to mix aggregation with recursion  However, so long as the underlying data (part-subpart) has no cycles, it is possible to write a program that mixes aggregation with recursion
  • 373. Forms and Graphical User Interfaces  Most naive users interact with databases using form interfaces with graphical interaction facilities  Web interfaces are the most common kind, but there are many others  Forms interfaces usually provide mechanisms to check for correctness of user input, and automatically fill in fields given key values  Most database vendors provide convenient mechanisms to create forms
• 374. Report Generators  Report generators are tools to generate human-readable summary reports from a database  They integrate database querying with creation of formatted text and graphical charts  Reports can be defined once and executed periodically to get current information from the database.  Example of report (next page)  Microsoft's Object Linking and Embedding (OLE) allows such reports and charts to be embedded within other documents
  • 377. QBE Skeleton Tables for the Bank Example
  • 378. An Example Query in Microsoft Access QBE
  • 379. An Aggregation Query in Microsoft Access QBE
• 384. Chapter 6: Integrity and Security  Domain Constraints  Referential Integrity  Assertions  Triggers  Security  Authorization  Authorization in SQL
  • 385. Domain Constraints  Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency.  Domain constraints are the most elementary form of integrity constraint.  They test values inserted in the database, and test queries to ensure that the comparisons make sense.  New domains can be created from existing data types  E.g. create domain Dollars numeric(12, 2) create domain Pounds numeric(12,2)  We cannot assign or compare a value of type Dollars to a value of type Pounds.  However, we can convert type as below (cast r.A as Pounds) (Should also multiply by the dollar-to-pound conversion-rate)
• 386. Domain Constraints (Cont.)  The check clause in SQL-92 permits domains to be restricted:  Use check clause to ensure that an hourly-wage domain allows only values greater than a specified value. create domain hourly-wage numeric(5,2) constraint value-test check(value >= 4.00)  The domain has a constraint that ensures that the hourly-wage is greater than 4.00  The clause constraint value-test is optional; useful to indicate which constraint an update violated.  Can have complex conditions in domain check  create domain AccountType char(10) constraint account-type-test check (value in ('Checking', 'Saving'))  check (branch-name in (select branch-name from branch))
• 387. Referential Integrity  Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.  Example: If "Perryridge" is a branch name appearing in one of the tuples in the account relation, then there exists a tuple in the branch relation for branch "Perryridge".  Formal Definition  Let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively.  A subset α of R2 is a foreign key referencing K1 in relation r1 if for every t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[α].  Referential integrity constraint is also called a subset dependency since it can be written as Πα(r2) ⊆ ΠK1(r1)
• 388. Referential Integrity in the E-R Model  Consider relationship set R between entity sets E1 and E2. The relational schema for R includes the primary keys K1 of E1 and K2 of E2. Then K1 and K2 form foreign keys on the relational schemas for E1 and E2 respectively.  Weak entity sets are also a source of referential integrity constraints.  The relation schema for a weak entity set must include the primary key attributes of the entity set on which it depends
• 389. Checking Referential Integrity on Database Modification  The following tests must be made in order to preserve the referential integrity constraint Πα(r2) ⊆ ΠK(r1)  Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K] = t2[α]. That is, t2[α] ∈ ΠK(r1)  Delete. If a tuple t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1: σα = t1[K](r2) If this set is not empty  either the delete command is rejected as an error, or  the tuples that reference t1 must themselves be deleted (cascading deletions are possible).
• 390. Database Modification (Cont.) Update. There are two cases:  If a tuple t2 is updated in relation r2 and the update modifies values for foreign key α, then a test similar to the insert case is made:  Let t2' denote the new value of tuple t2. The system must ensure that t2'[α] ∈ ΠK(r1)  If a tuple t1 is updated in r1, and the update modifies values for the primary key (K), then a test similar to the delete case is made: 1. The system must compute σα = t1[K](r2) using the old value of t1 (the value before the update is applied). 2. If this set is not empty 1. the update may be rejected as an error, or 2. the update may be cascaded to the tuples in the set, or 3. the tuples in the set may be deleted.
  • 391. Referential Integrity in SQL  Primary and candidate keys and foreign keys can be specified as part of the SQL create table statement:  The primary key clause lists attributes that comprise the primary key.  The unique key clause lists attributes that comprise a candidate key.  The foreign key clause lists the attributes that comprise the foreign key and the name of the relation referenced by the foreign key.  By default, a foreign key references the primary key attributes of the referenced table foreign key (account-number) references account  Short form for specifying a single column as foreign key account-number char (10) references account  Reference columns in the referenced table can be explicitly specified  but must be declared as primary/candidate keys foreign key (account-number) references account(account-number)
  • 392. Referential Integrity in SQL – Example create table customer (customer-name char(20), customer-street char(30), customer-city char(30), primary key (customer-name)) create table branch (branch-name char(15), branch-city char(30), assets integer, primary key (branch-name))
• 393. Referential Integrity in SQL – Example (Cont.) create table account (account-number char(10), branch-name char(15), balance integer, primary key (account-number), foreign key (branch-name) references branch) create table depositor (customer-name char(20), account-number char(10), primary key (customer-name, account-number), foreign key (account-number) references account, foreign key (customer-name) references customer)
• 394. Cascading Actions in SQL create table account (... foreign key(branch-name) references branch on delete cascade on update cascade ...)  Due to the on delete cascade clause, if a delete of a tuple in branch results in a referential-integrity constraint violation, the delete "cascades" to the account relation, deleting the tuples that refer to the deleted branch
• 395. Cascading Actions in SQL (Cont.)  If there is a chain of foreign-key dependencies across multiple relations, with on delete cascade specified for each dependency, a deletion or update at one end of the chain can propagate across the entire chain.  If a cascading update or delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the transaction.  As a result, all the changes caused by the transaction and its cascading actions are undone.
  • 396. Referential Integrity in SQL (Cont.)  Alternative to cascading:  on delete set null  on delete set default  Null values in foreign key attributes complicate SQL referential integrity semantics, and are best prevented using not null  if any attribute of a foreign key is null, the tuple is defined to satisfy the foreign key constraint!
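A sketch of the alternative action in a declaration, reusing the account example with the on-delete behavior swapped to set null for illustration:
    create table account
      (account-number char(10),
       branch-name    char(15),
       balance        integer,
       primary key (account-number),
       foreign key (branch-name) references branch
          on delete set null
          on update cascade)
With this declaration, deleting a branch leaves its accounts in place but with a null branch-name, which is exactly the complication the slide warns about if branch-name is not declared not null.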
  • 397. Assertions  An assertion is a predicate expressing a condition that we wish the database always to satisfy.  An assertion in SQL takes the form create assertion <assertion-name> check <predicate>  When an assertion is made, the system tests it for validity, and tests it again on every update that may violate the assertion
• 398. Assertion Example  The sum of all loan amounts for each branch must be less than the sum of all account balances at the branch. create assertion sum-constraint check (not exists (select * from branch where (select sum(amount) from loan where loan.branch-name = branch.branch-name) >= (select sum(balance) from account where account.branch-name = branch.branch-name)))
• 399. Assertion Example  Every loan has at least one borrower who maintains an account with a minimum balance of $1000.00 create assertion balance-constraint check (not exists ( select * from loan where not exists ( select * from borrower, depositor, account where loan.loan-number = borrower.loan-number and borrower.customer-name = depositor.customer-name and depositor.account-number = account.account-number and account.balance >= 1000)))
• 400. Triggers  A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.  To design a trigger mechanism, we must:  Specify the conditions under which the trigger is to be executed.  Specify the actions to be taken when the trigger executes.  Triggers were introduced into the SQL standard in SQL:1999, but were supported even earlier by most databases using non-standard syntax
• 401. Trigger Example  Suppose that instead of allowing negative account balances, the bank deals with overdrafts by  setting the account balance to zero  creating a loan in the amount of the overdraft  giving this loan a loan number identical to the account number of the overdrawn account  The condition for executing the trigger is an update to the account relation that results in a negative balance value
• 402. Trigger Example in SQL:1999 create trigger overdraft-trigger after update on account referencing new row as nrow for each row when nrow.balance < 0 begin atomic insert into borrower (select customer-name, account-number from depositor where nrow.account-number = depositor.account-number); insert into loan values (nrow.account-number, nrow.branch-name, – nrow.balance); update account set balance = 0 where account.account-number = nrow.account-number end
• 403. Triggering Events and Actions in SQL  Triggering event can be insert, delete or update  Triggers on update can be restricted to specific attributes  E.g. create trigger overdraft-trigger after update of balance on account  Values of attributes before and after an update can be referenced  referencing old row as: for deletes and updates  referencing new row as: for inserts and updates
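A hedged sketch of referencing old row on a delete trigger (the deleted-account audit relation is hypothetical, invented here for illustration):
    create trigger account-delete-trigger
    after delete on account
    referencing old row as orow
    for each row
    begin atomic
      insert into deleted-account   -- hypothetical audit relation
        values (orow.account-number, orow.branch-name, orow.balance)
    end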
• 404. Statement Level Triggers  Instead of executing a separate action for each affected row, a single action can be executed for all rows affected by a transaction  Use for each statement instead of for each row  Use referencing old table or referencing new table to refer to temporary tables (called transition tables) containing the affected rows  Can be more efficient when dealing with SQL statements that update many rows
• 405. External World Actions  We sometimes require external world actions to be triggered on a database update  E.g. re-ordering an item whose quantity in a warehouse has become small, or turning on an alarm light,  Triggers cannot be used to directly implement external-world actions, BUT  Triggers can be used to record actions-to-be-taken in a separate table  Have an external process that repeatedly scans the table, carries out external-world actions and deletes action from table  E.g. Suppose a warehouse has the following relations:  inventory(item, level): how much of each item is in the warehouse  minlevel(item, level): the minimum desired level of each item  reorder(item, amount): the quantity to re-order at a time  orders(item, amount): orders to be placed (read by an external process)
• 406. External World Actions (Cont.) create trigger reorder-trigger after update of amount on inventory referencing old row as orow, new row as nrow for each row when nrow.level <= (select level from minlevel where minlevel.item = orow.item) and orow.level > (select level from minlevel where minlevel.item = orow.item) begin insert into orders (select item, amount from reorder where reorder.item = orow.item) end
• 407. Triggers in MS-SQLServer Syntax create trigger overdraft-trigger on account for update as if inserted.balance < 0 begin insert into borrower (select customer-name, account-number from depositor, inserted where inserted.account-number = depositor.account-number) end
• 408. When Not To Use Triggers Triggers were used earlier for tasks such as  maintaining summary data (e.g. total salary of each department)  Replicating databases by recording changes to special relations (called change or delta relations) and having a separate process that applies the changes over to a replica There are better ways of doing these now:  Databases today provide built-in materialized view facilities to maintain summary data  Databases today provide built-in support for replication
• 409. Security  Security - protection from malicious attempts to steal or modify data.  Database system level  Authentication and authorization mechanisms to allow specific users access only to required data  We concentrate on authorization in the rest of this chapter  Operating system level  Operating system super-users can do anything they want to the database! Good operating system level security is required.  Network level: must use encryption to prevent eavesdropping (unauthorized reading of messages) and masquerading (pretending to be an authorized user)
• 410. Security (Cont.)  Physical level  Physical access to computers allows destruction of data by intruders; traditional lock-and-key security is needed  Computers must also be protected from floods, fire, etc.  More in Chapter 17 (Recovery)  Human level  Users must be screened to ensure that authorized users do not give access to intruders  Users should be trained on password selection and secrecy
• 411. Authorization  Forms of authorization on parts of the database:  Read authorization - allows reading, but not modification of data.  Insert authorization - allows insertion of new data, but not modification of existing data.  Update authorization - allows modification, but not deletion of data.  Delete authorization - allows deletion of data.
• 412. Authorization (Cont.)  Forms of authorization to modify the database schema:  Index authorization - allows creation and deletion of indices.  Resource authorization - allows creation of new relations.  Alteration authorization - allows addition or deletion of attributes in a relation.  Drop authorization - allows deletion of relations.
• 413. Authorization and Views  Users can be given authorization on views, without being given any authorization on the relations used in the view definition  Ability of views to hide data serves both to simplify usage of the system and to enhance security by allowing users access only to data they need for their job  A combination of relational-level security and view-level security can be used to limit a user's access to precisely the data that user needs
• 414. View Example  Suppose a bank clerk needs to know the names of the customers of each branch, but is not authorized to see specific loan information.  Approach: Deny direct access to the loan relation, but grant access to the view cust-loan, which consists only of the names of customers and the branches at which they have a loan.  The cust-loan view is defined in SQL as follows:
create view cust-loan as
  select branch-name, customer-name
  from borrower, loan
  where borrower.loan-number = loan.loan-number
• 415. View Example (Cont.)  The clerk is authorized to see the result of the query:  select * from cust-loan  When the query processor translates this query into a query on the actual relations in the database, we obtain a query on borrower and loan.  Authorization must be checked on the clerk's query before query processing replaces a view by the definition of the view.
• 416. Authorization on Views  Creation of a view does not require resource authorization since no real relation is being created  The creator of a view gets only those privileges that provide no additional authorization beyond those he already had.  E.g. if the creator of view cust-loan had only read authorization on borrower and loan, he gets only read authorization on cust-loan
• 417. Granting of Privileges  The passage of authorization from one user to another may be represented by an authorization graph.  The nodes of this graph are the users.  The root of the graph is the database administrator.  Consider a graph for update authorization on loan.  An edge Ui → Uj indicates that user Ui has granted update authorization on loan to Uj.  (Figure: DBA has granted to U1, U2 and U3; U1 has granted to U4; U2 has granted to U5.)
• 418. Authorization Grant Graph  Requirement: All edges in an authorization graph must be part of some path originating with the database administrator  If DBA revokes grant from U1:  Grant must be revoked from U4 since U1 no longer has authorization  Grant must not be revoked from U5 since U5 has another authorization path from DBA through U2  Must prevent cycles of grants with no path from the root:  E.g. DBA grants to U7, U7 grants to U8, and U8 grants back to U7; if DBA then revokes the grant to U7, the cycle between U7 and U8 has no path from the root, so both of its grants must be revoked as well
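The reachability requirement is mechanical enough to check in a few lines. Below is a minimal sketch (Python used purely as executable pseudocode; the graph shape and user names come from these slides, the function name is ours) that recomputes the set of authorized users after a revocation:

    from collections import deque

    # grants[u] = set of users to whom u granted the privilege (edges of the graph)
    grants = {"DBA": {"U1", "U2", "U3"}, "U1": {"U4"},
              "U2": {"U5"}, "U3": set(), "U4": set(), "U5": set()}

    def authorized(grants, root="DBA"):
        """Users reachable from the root of the authorization grant graph."""
        seen, queue = {root}, deque([root])
        while queue:
            u = queue.popleft()
            for v in grants.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    # Revoke DBA -> U1: U4's only path ran through U1; U5 keeps its path via U2.
    grants["DBA"].discard("U1")
    print(sorted(authorized(grants)))   # ['DBA', 'U2', 'U3', 'U5']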
• 419. Security Specification in SQL  The grant statement is used to confer authorization:  grant <privilege list> on <relation name or view name> to <user list>  <user list> is:  a user-id  public, which allows all valid users the privilege granted  a role (more on this later)  Granting a privilege on a view does not imply granting any privileges on the underlying relations
• 420. Privileges in SQL  select: allows read access to relation, or the ability to query using the view  Example: grant users U1, U2, and U3 select authorization on the branch relation:  grant select on branch to U1, U2, U3  insert: the ability to insert tuples  update: the ability to update using the SQL update statement  delete: the ability to delete tuples.  references: ability to declare foreign keys when creating relations.  usage: in SQL-92, authorizes a user to use a specified domain  all privileges: used as a short form for all the allowable privileges
• 421. Privilege To Grant Privileges  with grant option: allows a user who is granted a privilege to pass the privilege on to other users.  Example:  grant select on branch to U1 with grant option  gives U1 the select privilege on branch and allows U1 to grant this privilege to others
• 422. Roles  Roles permit common privileges for a class of users to be specified just once, by creating a corresponding "role"  Privileges can be granted to or revoked from roles, just like users  Roles can be assigned to users, and even to other roles  SQL:1999 supports roles  create role teller  create role manager  grant select on branch to teller  grant update (balance) on account to teller
• 423. Revoking Authorization in SQL  The revoke statement is used to revoke authorization:  revoke <privilege list> on <relation name or view name> from <user list> [restrict|cascade]  Example:  revoke select on branch from U1, U2, U3 cascade  Revocation of a privilege from a user may cause other users also to lose that privilege; referred to as cascading of the revoke.
• 424. Revoking Authorization in SQL (Cont.)  <privilege-list> may be all to revoke all privileges the revokee may hold.  If <revokee-list> includes public, all users lose the privilege except those granted it explicitly.  If the same privilege was granted twice to the same user by different grantors, the user may retain the privilege after the revocation.  All privileges that depend on the privilege being revoked are also revoked.
• 425. Limitations of SQL Authorization  SQL does not support authorization at a tuple level  E.g. we cannot restrict students to see only (the tuples storing) their own grades  With the growth in Web access to databases, database accesses come primarily from application servers.  End users don't have database user ids; all end-users of an application (such as a web application) are mapped to a single database user id  The task of authorization in the above cases falls on the application program, with no support from SQL  Benefit: fine grained authorizations, such as to individual tuples, can be implemented by the application.  Drawback: Authorization must be done in application code, and may be dispersed all over the application; checking for the absence of authorization loopholes then becomes very difficult
  • 426. Audit Trails  An audit trail is a log of all changes (inserts/deletes/updates) to the database along with information such as which user performed the change, and when the change was performed.  Used to track erroneous/fraudulent updates.  Can be implemented using triggers, but many database systems provide direct support.
  • 427. Encryption  Data may be encrypted when database authorization provisions do not offer sufficient protection.  Properties of good encryption technique:  Relatively simple for authorized users to encrypt and decrypt data.  Encryption scheme depends not on the secrecy of the algorithm but on the secrecy of a parameter of the algorithm called the encryption key.  Extremely difficult for an intruder to determine the encryption key.
  • 428. Encryption (Cont.)  Data Encryption Standard (DES) substitutes characters and rearranges their order on the basis of an encryption key which is provided to authorized users via a secure mechanism. Scheme is no more secure than the key transmission mechanism since the key has to be shared.  Advanced Encryption Standard (AES) is a new standard replacing DES, and is based on the Rijndael algorithm, but is also dependent on shared secret keys  Public-key encryption is based on each user having two keys:  public key – publicly published key used to encrypt data, but cannot be used to decrypt data  private key -- key known only to individual user, and used to decrypt data. Need not be transmitted to the site doing encryption. Encryption scheme is such that it is impossible or extremely hard to decrypt data given only the public key.
• 429. Authentication  Password based authentication is widely used, but is susceptible to sniffing on a network  Challenge-response systems avoid transmission of passwords  DB sends a (randomly generated) challenge string to the user  User encrypts the string and returns the result  DB verifies the user's identity by decrypting the result  Can use a public-key encryption system: the DB sends a message encrypted using the user's public key, and the user decrypts and sends the message back
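The slide describes the user encrypting the challenge; a common concrete realization of the same idea replaces encryption with a keyed hash (HMAC). A minimal sketch under that substitution (Python as executable pseudocode; the shared key and how it is provisioned are assumptions, not any real system's API):

    import hashlib, hmac, os

    secret = b"per-user shared key"      # established out of band (assumption)

    challenge = os.urandom(16)           # 1. DB sends a random challenge string

    # 2. User returns a MAC of the challenge instead of sending the password
    response = hmac.new(secret, challenge, hashlib.sha256).digest()

    # 3. DB recomputes the MAC with its copy of the key and compares safely
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    print(hmac.compare_digest(response, expected))   # True -> identity verified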
• 430. Digital Certificates  Digital certificates are used to verify authenticity of public keys.  Problem: when you communicate with a web site, how do you know if you are talking with the genuine web site or an imposter?  Solution: use the public key of the web site  Problem: how to verify if the public key itself is genuine?  Solution:  Every client (e.g. browser) has public keys of a few root-level certification authorities  A site can get its name/URL and public key signed by a certification authority; the signed document is called a certificate  The client uses the certification authority's public key to verify the certificate  Multiple levels of certification authorities can be used
• 432. Statistical Databases  Problem: how to ensure privacy of individuals while allowing use of data for statistical purposes (e.g., finding median income, average bank balance etc.)  Solutions:  System rejects any query that involves fewer than some predetermined number of individuals.  Still possible to use results of multiple overlapping queries to deduce data about an individual  Other approaches: track overlapping queries, or randomly perturb the data or the query results, trading some accuracy for privacy
  • 435. Attempt to Defeat Authorization Revocation
• 437. Physical Level Security  Protection of equipment from floods, power failure, etc.  Protection of disks from theft, erasure, physical damage, etc.  Protection of network and terminal cables from wiretaps, non-invasive electronic eavesdropping, physical damage, etc.  Solutions:  Replicated hardware:  mirrored disks, dual busses, etc.  multiple access paths between every pair of devices  Physical security: locks, police, etc.  Software techniques to detect physical security breaches.
• 438. Human Level Security  Protection from stolen passwords, sabotage, etc.  Primarily a management problem:  Frequent change of passwords  Use of "non-guessable" passwords  Log all invalid access attempts  Data audits  Careful hiring practices
• 439. Operating System Level Security  Protection from invalid logins  File-level access protection (often not very helpful for database security)  Protection from improper use of "superuser" authority.  Protection from improper use of privileged machine instructions.
• 440. Network-Level Security  Each site must ensure that it communicates with trusted sites (not intruders).  Links must be protected from theft or modification of messages  Mechanisms:  Identification protocol (password-based),  Cryptography.
  • 441. Database-Level Security  Assume security at network, operating system, human, and physical levels.  Database specific issues:  each user may have authority to read only part of the data and to write only part of the data.  User authority may correspond to entire files or relations, but it may also correspond only to parts of files or relations.  Local autonomy suggests site-level authorization control in a distributed database.  Global control suggests centralized control.
  • 443. Chapter 7: Relational Database Design  First Normal Form  Pitfalls in Relational Database Design  Functional Dependencies  Decomposition  Boyce-Codd Normal Form  Third Normal Form  Multivalued Dependencies and Fourth Normal Form  Overall Database Design Process
  • 444. First Normal Form  Domain is atomic if its elements are considered to be indivisible units  Examples of non-atomic domains:  Set of names, composite attributes  Identification numbers like CS101 that can be broken up into parts  A relational schema R is in first normal form if the domains of all attributes of R are atomic  Non-atomic values complicate storage and encourage redundant (repeated) storage of data  E.g. Set of accounts stored with each customer, and set of owners stored with each account  We assume all relations are in first normal form (revisit this in Chapter 9 on Object Relational Databases)
  • 445. First Normal Form (Contd.)  Atomicity is actually a property of how the elements of the domain are used.  E.g. Strings would normally be considered indivisible  Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127  If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.  Doing so is a bad idea: leads to encoding of information in application program rather than in the database.
• 446. Pitfalls in Relational Database Design  Relational database design requires that we find a "good" collection of relation schemas. A bad design may lead to  Repetition of Information.  Inability to represent certain information.  Design Goals:  Avoid redundant data  Ensure that relationships among attributes are represented  Facilitate the checking of updates for violation of database integrity constraints.
  • 447. Example Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)  Redundancy:  Data for branch-name, branch-city, assets are repeated for each loan that a branch makes  Wastes space  Complicates updating, introducing possibility of inconsistency of assets value  Null values  Cannot store information about a branch if no loans exist  Can use null values, but they are difficult to handle.
• 448. Decomposition  Decompose the relation schema Lending-schema into:  Branch-schema = (branch-name, branch-city, assets)  Loan-info-schema = (customer-name, loan-number, branch-name, amount)  All attributes of an original schema (R) must appear in the decomposition (R1, R2):  R = R1 ∪ R2  Lossless-join decomposition.  For all possible relations r on schema R:  r = ΠR1(r) ⋈ ΠR2(r)
• 449. Example of Non Lossless-Join Decomposition  Decomposition of R = (A, B) into R1 = (A) and R2 = (B)  Joining ΠA(r) and ΠB(r) pairs every A-value with every B-value, so in general ΠA(r) ⋈ ΠB(r) is a strict superset of r (a worked instance appears on slide 470).
  • 450. Goal — Devise a Theory for the Following  Decide whether a particular relation R is in ―good‖ form.  In the case that a relation R is not in ―good‖ form, decompose it into a set of relations {R1, R2, ..., Rn} such that  each relation is in good form  the decomposition is a lossless-join decomposition  Our theory is based on:  functional dependencies
  • 451. Functional Dependencies  Constraints on the set of legal relations.  Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.  A functional dependency is a generalization of the notion of a key.
• 452. Functional Dependencies (Cont.)  Let R be a relation schema, α ⊆ R and β ⊆ R  The functional dependency α → β holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] implies t1[β] = t2[β]  Example: Consider r(A, B) with the instance {(1, 4), (1, 5), (3, 7)}.  On this instance, A → B does NOT hold (the first two tuples agree on A but not on B), but B → A does hold.
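Checking whether a given instance satisfies an FD is a single scan. A small sketch (Python as executable pseudocode; the instance is the one from the slide):

    def satisfies(rows, lhs, rhs):
        """Does the FD lhs -> rhs hold on this instance (a list of dicts)?"""
        seen = {}
        for t in rows:
            key = tuple(t[a] for a in lhs)
            val = tuple(t[a] for a in rhs)
            if seen.setdefault(key, val) != val:   # same lhs-values, different rhs-values
                return False
        return True

    r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
    print(satisfies(r, ["A"], ["B"]))   # False: t1 and t2 agree on A but not on B
    print(satisfies(r, ["B"], ["A"]))   # True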
• 453. Functional Dependencies (Cont.)  K is a superkey for relation schema R if and only if K → R  K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R  Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:  Loan-info-schema = (customer-name, loan-number, branch-name, amount)  We expect loan-number → amount and loan-number → branch-name to hold, but would not expect loan-number → customer-name to hold.
• 454. Use of Functional Dependencies  We use functional dependencies to:  test relations to see if they are legal under a given set of functional dependencies.  If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.  specify constraints on the set of legal relations  We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.  Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances.
• 455. Functional Dependencies (Cont.)  A functional dependency α → β is trivial if it is satisfied by all instances of a relation  E.g.  customer-name, loan-number → customer-name  customer-name → customer-name  In general, α → β is trivial if β ⊆ α
• 456. Closure of a Set of Functional Dependencies  Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.  E.g. If A → B and B → C, then we can infer that A → C  The set of all functional dependencies logically implied by F is the closure of F.  We denote the closure of F by F+.  We can find all of F+ by applying Armstrong's Axioms:  if β ⊆ α, then α → β (reflexivity)  if α → β, then γα → γβ (augmentation)  if α → β and β → γ, then α → γ (transitivity)
• 457. Example  R = (A, B, C, G, H, I)  F = {A → B, A → C, CG → H, CG → I, B → H}  Some members of F+:  A → H, by transitivity from A → B and B → H  AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I  CG → HI, from CG → H and CG → I: the "union rule" can be inferred from (i.e., proved using) Armstrong's axioms
• 458. Procedure for Computing F+  To compute the closure of a set of functional dependencies F:
F+ = F
repeat
  for each functional dependency f in F+
    apply the reflexivity and augmentation rules on f
    add the resulting functional dependencies to F+
  for each pair of functional dependencies f1 and f2 in F+
    if f1 and f2 can be combined using transitivity
      then add the resulting functional dependency to F+
until F+ does not change any further
NOTE: we will see an alternative, more efficient procedure for this task later
• 459. Closure of Functional Dependencies (Cont.)  We can further simplify manual computation of F+ by using the following additional rules.  If α → β holds and α → γ holds, then α → βγ holds (union)  If α → βγ holds, then α → β holds and α → γ holds (decomposition)  If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)  The above rules can be inferred from Armstrong's axioms.
• 460. Closure of Attribute Sets  Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F:  α → β is in F+ if and only if β ⊆ α+  Algorithm to compute α+, the closure of α under F:
result := α;
while (changes to result) do
  for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
• 461. Example of Attribute Set Closure  R = (A, B, C, G, H, I)  F = {A → B, A → C, CG → H, CG → I, B → H}  (AG)+:  1. result = AG  2. result = ABCG (A → C and A → B)  3. result = ABCGH (CG → H and CG ⊆ AGBC)  4. result = ABCGHI (CG → I and CG ⊆ AGBCH)  Is AG a candidate key?  1. Is AG a superkey?  Does AG → R hold? == Is (AG)+ ⊇ R?  2. Is any proper subset of AG a superkey?  Does A → R hold? == Is (A)+ ⊇ R?  Does G → R hold? == Is (G)+ ⊇ R?
• 462. Uses of Attribute Closure  There are several uses of the attribute closure algorithm:  Testing for superkey:  To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R.  Testing functional dependencies:  To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.  That is, we compute α+ by using attribute closure, and then check if it contains β.  This is a simple and cheap test, and very useful
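The attribute-closure algorithm is small enough to write out directly. A minimal sketch (Python as executable pseudocode; the schema and FDs are the running example from slides 457 and 461):

    def closure(attrs, fds):
        """Closure of a set of attributes; fds is a list of (lhs, rhs) frozenset pairs."""
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:   # lhs applies, rhs adds something
                    result |= rhs
                    changed = True
        return result

    # R = (A, B, C, G, H, I), F = {A->B, A->C, CG->H, CG->I, B->H}
    R = set("ABCGHI")
    F = [(frozenset(l), frozenset(r)) for l, r in
         [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]]

    print(sorted(closure("AG", F)))   # all of ABCGHI -> AG is a superkey
    print(closure("A", F) >= R)       # False -> A alone is not a superkey
    print("H" in closure("A", F))     # True  -> A -> H is in F+ (cf. slide 457)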
• 463. Canonical Cover  Sets of functional dependencies may have redundant dependencies that can be inferred from the others  E.g.: A → C is redundant in {A → B, B → C, A → C}  Parts of a functional dependency may be redundant  E.g. on RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}  E.g. on LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
• 464. Extraneous Attributes  Consider a set F of functional dependencies and the functional dependency α → β in F.  Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.  Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.  Note: implication in the opposite direction is trivial in each of the cases above, since a "stronger" functional dependency always implies a weaker one
• 465. Testing if an Attribute is Extraneous  Consider a set F of functional dependencies and the functional dependency α → β in F.  To test if attribute A ∈ α is extraneous in α:  1. compute (α – A)+ using the dependencies in F  2. check that (α – A)+ contains β; if it does, A is extraneous in α  To test if attribute A ∈ β is extraneous in β:  1. compute α+ using only the dependencies in F' = (F – {α → β}) ∪ {α → (β – A)}  2. check that α+ contains A; if it does, A is extraneous in β
• 466. Canonical Cover  A canonical cover for F is a set of dependencies Fc such that  F logically implies all dependencies in Fc, and  Fc logically implies all dependencies in F, and  No functional dependency in Fc contains an extraneous attribute, and  Each left side of a functional dependency in Fc is unique.  To compute a canonical cover for F:
repeat
  use the union rule to replace any dependencies in F of the form
    α1 → β1 and α1 → β2 with α1 → β1β2
  find a functional dependency α → β with an extraneous attribute
    either in α or in β
  if an extraneous attribute is found, delete it from α → β
until F does not change
• 467. Example of Computing a Canonical Cover  R = (A, B, C)  F = {A → BC, B → C, A → B, AB → C}  Combine A → BC and A → B into A → BC  Set is now {A → BC, B → C, AB → C}  A is extraneous in AB → C  Check if the result of deleting A from AB → C is implied by the other dependencies  Yes: in fact, B → C is already present!  Set is now {A → BC, B → C}  C is extraneous in A → BC  Check if A → C is logically implied by A → B and the other dependencies  Yes: using transitivity on A → B and B → C.  Can use attribute closure of A in more complex cases  The canonical cover is:  A → B  B → C
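The procedure of the previous two slides (union rule plus the extraneous-attribute tests) fits in a short sketch. This is an illustrative Python rendering, not a tuned implementation; the function names are ours and it reproduces the example above:

    def closure(attrs, fds):
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def canonical_cover(F):
        F = [(set(l), set(r)) for l, r in F]
        changed = True
        while changed:
            changed = False
            merged = {}                      # union rule: combine equal left sides
            for l, r in F:
                merged.setdefault(frozenset(l), set()).update(r)
            F = [(set(l), r) for l, r in merged.items()]
            for i, (l, r) in enumerate(F):
                for A in sorted(l):          # extraneous in the left side?
                    if r <= closure(l - {A}, F):
                        F[i] = (l - {A}, r); changed = True; break
                else:
                    for A in sorted(r):      # extraneous in the right side?
                        G = F[:i] + F[i + 1:] + [(l, r - {A})]
                        if A in closure(l, G):
                            F[i] = (l, r - {A}); changed = True; break
                if changed:
                    break
        return [(l, r) for l, r in F if r]

    F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A"}, {"B"}), ({"A", "B"}, {"C"})]
    print(canonical_cover(F))   # [({'A'}, {'B'}), ({'B'}, {'C'})], i.e. A->B, B->C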
  • 468. Goals of Normalization  Decide whether a particular relation R is in ―good‖ form.  In the case that a relation R is not in ―good‖ form, decompose it into a set of relations {R1, R2, ..., Rn} such that  each relation is in good form  the decomposition is a lossless-join decomposition  Our theory is based on:  functional dependencies  multivalued dependencies
• 469. Decomposition  Decompose the relation schema Lending-schema into:  Branch-schema = (branch-name, branch-city, assets)  Loan-info-schema = (customer-name, loan-number, branch-name, amount)  All attributes of an original schema (R) must appear in the decomposition (R1, R2):  R = R1 ∪ R2  Lossless-join decomposition.  For all possible relations r on schema R:  r = ΠR1(r) ⋈ ΠR2(r)  A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:  R1 ∩ R2 → R1  R1 ∩ R2 → R2
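The condition at the end of the slide becomes a two-line test on top of attribute closure. A sketch (Python as executable pseudocode; the closure helper is re-inlined so the fragment runs standalone, and the example is the Lending-schema decomposition above):

    def closure(attrs, fds):
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def lossless_binary(R1, R2, F):
        """Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
        c = closure(set(R1) & set(R2), F)
        return set(R1) <= c or set(R2) <= c

    Branch = {"branch-name", "branch-city", "assets"}
    LoanInfo = {"customer-name", "loan-number", "branch-name", "amount"}
    F = [(frozenset({"branch-name"}), frozenset({"branch-city", "assets"}))]
    print(lossless_binary(Branch, LoanInfo, F))   # True: branch-name -> Branch-schema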
• 470. Example of Lossy-Join Decomposition  Lossy-join decompositions result in information loss.  Example: Decomposition of R = (A, B) into R1 = (A) and R2 = (B)  For r = {(1, 2), (2, 1)}, ΠA(r) = {1, 2} and ΠB(r) = {2, 1}, and ΠA(r) ⋈ ΠB(r) contains all four (A, B) combinations, a strict superset of r.
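A concrete run of this example (Python as executable pseudocode; values as in the figure):

    # R = (A, B) with instance r = {(1, 2), (2, 1)}, decomposed into R1 = (A), R2 = (B)
    r = {(1, 2), (2, 1)}
    proj_A = {(a,) for a, b in r}
    proj_B = {(b,) for a, b in r}
    # R1 and R2 share no attributes, so their natural join degenerates
    # into a Cartesian product:
    rejoined = {(a, b) for (a,) in proj_A for (b,) in proj_B}
    print(sorted(rejoined))   # [(1, 1), (1, 2), (2, 1), (2, 2)] -- superset of r
    print(rejoined == r)      # False: the original instance cannot be recovered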
• 471. Normalization Using Functional Dependencies  When we decompose a relation schema R with a set of functional dependencies F into R1, R2, ..., Rn we want  Lossless-join decomposition: otherwise decomposition would result in information loss.  No redundancy: the relations Ri preferably should be in either Boyce-Codd Normal Form or Third Normal Form.  Dependency preservation: let Fi be the set of dependencies in F+ that include only attributes in Ri.  Preferably the decomposition should be dependency preserving, that is, (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
• 472. Example  R = (A, B, C)  F = {A → B, B → C}  Can be decomposed in two different ways  R1 = (A, B), R2 = (B, C)  Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC  Dependency preserving  R1 = (A, B), R2 = (A, C)  Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB  Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
• 473. Testing for Dependency Preservation  To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn we apply the following simplified test (with attribute closure done w.r.t. F):
result = α
while (changes to result) do
  for each Ri in the decomposition
    t = (result ∩ Ri)+ ∩ Ri
    result = result ∪ t
 If result contains all attributes in β, then the functional dependency α → β is preserved.
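This test, too, is a direct transcription. A sketch (Python as executable pseudocode, re-inlining the closure helper; the example is the dependency-preserving decomposition from slide 472):

    def closure(attrs, fds):
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def preserved(alpha, beta, decomposition, F):
        """Simplified dependency-preservation test from the slide."""
        result = set(alpha)
        changed = True
        while changed:
            changed = False
            for Ri in decomposition:
                t = closure(result & set(Ri), F) & set(Ri)
                if not t <= result:
                    result |= t
                    changed = True
        return set(beta) <= result

    F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
    decomp = [set("AB"), set("BC")]
    print(preserved("A", "B", decomp, F))   # True
    print(preserved("B", "C", decomp, F))   # True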
• 474. Boyce-Codd Normal Form  A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:  α → β is trivial (i.e., β ⊆ α)  α is a superkey for R
• 475. Example  R = (A, B, C)  F = {A → B, B → C}  Key = {A}  R is not in BCNF  Decomposition R1 = (A, B), R2 = (B, C)  R1 and R2 in BCNF  Lossless-join decomposition  Dependency preserving
• 476. Testing for BCNF  To check if a non-trivial dependency α → β causes a violation of BCNF  1. compute α+ (the attribute closure of α), and  2. verify that it includes all attributes of R, that is, it is a superkey of R.  Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.  If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
• 477. BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
  if (there is a schema Ri in result that is not in BCNF)
    then begin
      let α → β be a nontrivial functional dependency that holds on Ri
        such that α → Ri is not in F+, and α ∩ β = ∅;
      result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
  else done := true;
Note: each Ri is in BCNF, and the decomposition is lossless-join.
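A compact rendering of the algorithm (Python as executable pseudocode; it applies the simplified test of checking only left sides that appear in F, which, as slide 479 notes, can in rare cases miss a violation on a sub-schema; the example is the Lending-schema of the next slide):

    def closure(attrs, fds):
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def bcnf_violation(Ri, F):
        """Return (alpha, beta) violating BCNF on Ri, or None."""
        for lhs, _ in F:
            if lhs <= Ri:
                cl = closure(lhs, F)
                beta = (cl & Ri) - lhs
                if beta and not Ri <= cl:     # nontrivial, and lhs not a superkey of Ri
                    return set(lhs), beta
        return None

    def bcnf_decompose(R, F):
        result = [set(R)]
        while True:
            for i, Ri in enumerate(result):
                v = bcnf_violation(Ri, F)
                if v:
                    alpha, beta = v
                    result[i:i + 1] = [alpha | beta, Ri - beta]
                    break
            else:
                return result

    R = {"branch-name", "branch-city", "assets", "customer-name", "loan-number", "amount"}
    F = [(frozenset({"branch-name"}), frozenset({"branch-city", "assets"})),
         (frozenset({"loan-number"}), frozenset({"amount", "branch-name"}))]
    for Ri in bcnf_decompose(R, F):
        print(sorted(Ri))
    # yields (branch-name, branch-city, assets), (amount, branch-name, loan-number),
    # and (customer-name, loan-number), matching the final decomposition on slide 478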
• 478. Example of BCNF Decomposition  R = (branch-name, branch-city, assets, customer-name, loan-number, amount)  F = {branch-name → branch-city assets, loan-number → amount branch-name}  Key = {loan-number, customer-name}  Decomposition  R1 = (branch-name, branch-city, assets)  R2 = (branch-name, customer-name, loan-number, amount)  R3 = (branch-name, loan-number, amount)  R4 = (customer-name, loan-number)  Final decomposition: R1, R3, R4
• 479. Testing Decomposition for BCNF  To check if a relation Ri in a decomposition of R is in BCNF,  Either test Ri for BCNF with respect to the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri)  or use the original set of dependencies F that hold on R, but with the following test:  for every set of attributes α ⊆ Ri, check that α+ (the attribute closure of α) either includes no attribute of Ri − α, or includes all attributes of Ri.  If the condition is violated by some dependency α → β in F, the dependency α → (α+ − α) ∩ Ri can be shown to hold on Ri, and Ri violates BCNF.  We use the above dependency to decompose Ri.
• 480. BCNF and Dependency Preservation  It is not always possible to get a BCNF decomposition that is dependency preserving  R = (J, K, L)  F = {JK → L, L → K}  Two candidate keys: JK and JL  R is not in BCNF  Any decomposition of R will fail to preserve JK → L
  • 481. Third Normal Form: Motivation  There are some situations where  BCNF is not dependency preserving, and  efficient checking for FD violation on updates is important  Solution: define a weaker normal form, called Third Normal Form.  Allows some redundancy (with resultant problems; we will see examples later)  But FDs can be checked on individual relations without computing a join.  There is always a lossless-join, dependency- preserving decomposition into 3NF.
• 482. Third Normal Form  A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds:  α → β is trivial (i.e., β ⊆ α)  α is a superkey for R  Each attribute A in β – α is contained in a candidate key for R.  (NOTE: each attribute may be in a different candidate key)
• 483. 3NF (Cont.)  Example  R = (J, K, L)  F = {JK → L, L → K}  Two candidate keys: JK and JL  R is in 3NF  JK → L: JK is a superkey  L → K: K is contained in a candidate key  BCNF decomposition has (JL) and (LK)  Testing for JK → L requires a join  There is some redundancy in this schema  Equivalent to example in book:  Banker-schema = (branch-name, customer-name, banker-name)  banker-name → branch-name  branch-name customer-name → banker-name
• 484. Testing for 3NF  Optimization: need to check only FDs in F, need not check all FDs in F+.  Use attribute closure to check, for each dependency α → β, if α is a superkey.  If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R  this test is rather more expensive, since it involves finding candidate keys  testing for 3NF has been shown to be NP-hard  Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time
• 485. 3NF Decomposition Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
  if none of the schemas Rj, 1 ≤ j ≤ i contains αβ
    then begin
      i := i + 1;
      Ri := αβ
    end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
  then begin
    i := i + 1;
    Ri := any candidate key for R;
  end
return (R1, R2, ..., Ri)
• 486. 3NF Decomposition Algorithm (Cont.)  Above algorithm ensures:  each relation schema Ri is in 3NF  decomposition is dependency preserving and lossless-join  Proof of correctness is at the end of this file
• 487. Example  Relation schema:  Banker-info-schema = (branch-name, customer-name, banker-name, office-number)  The functional dependencies for this relation schema are:  banker-name → branch-name office-number  customer-name branch-name → banker-name  The key is: {customer-name, branch-name}
• 488. Applying 3NF to Banker-info-schema  The for loop in the algorithm causes us to include the following schemas in our decomposition:  Banker-office-schema = (banker-name, branch-name, office-number)  Banker-schema = (customer-name, branch-name, banker-name)  Since Banker-schema contains a candidate key for Banker-info-schema, we are done with the decomposition process.
  • 489. Comparison of BCNF and 3NF  It is always possible to decompose a relation into relations in 3NF and  the decomposition is lossless  the dependencies are preserved  It is always possible to decompose a relation into relations in BCNF and  the decomposition is lossless  it may not be possible to preserve dependencies.
• 490. Comparison of BCNF and 3NF (Cont.)  Example of problems due to redundancy in 3NF  R = (J, K, L), F = {JK → L, L → K}  Instance: (j1, l1, k1), (j2, l1, k1), (j3, l1, k1), (null, l2, k2)  A schema that is in 3NF but not in BCNF has the problems of  repetition of information (e.g., the relationship l1, k1)  need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J).
• 491. Design Goals  Goal for a relational database design is:  BCNF.  Lossless join.  Dependency preservation.  If we cannot achieve this, we accept one of  Lack of dependency preservation  Redundancy due to use of 3NF  Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys; such dependencies can be specified using assertions, but these are expensive to test
• 492. Testing for FDs Across Relations  If decomposition is not dependency preserving, we can have an extra materialized view for each dependency α → β in Fc that is not preserved in the decomposition  The materialized view is defined as a projection on αβ of the join of the relations in the decomposition  Many newer database systems support materialized views, and the database system maintains the view when the relations are updated.  No extra coding effort for programmer.  The functional dependency α → β is expressed by declaring α as a candidate key on the materialized view.  Checking for a candidate key is cheaper than checking α → β  BUT:  Space overhead: for storing the materialized view  Time overhead: need to keep the materialized view up to date when relations are updated  Database system may not support key declarations on materialized views
• 493. Multivalued Dependencies  There are database schemas in BCNF that do not seem to be sufficiently normalized  Consider a database classes(course, teacher, book) such that (c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c  The database is supposed to list for each course the set of teachers, any one of whom can be the course's instructor, and the set of books, all of which are required for the course (no matter who teaches it)
• 494. Multivalued Dependencies (Cont.)  course teacher book  database Avi DB Concepts  database Avi Ullman  database Hank DB Concepts  database Hank Ullman  database Sudarshan DB Concepts  database Sudarshan Ullman  operating systems Avi OS Concepts  operating systems Avi Shaw  operating systems Jim OS Concepts  operating systems Jim Shaw  classes  There are no non-trivial functional dependencies and therefore the relation is in BCNF  Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted:  (database, Sara, DB Concepts)  (database, Sara, Ullman)
  • 495. Multivalued Dependencies  (Cont.) Therefore, it is better to decompose classes into: course teacher database Avi database Hank database Sudarshan operating systems Avi operating systems Jim teaches course book database DB Concepts database Ullman operating systems OS Concepts operating systems Shaw text We shall see that these two relations are in Fourth Normal Form (4NF)
• 496. Multivalued Dependencies (MVDs)  Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:  t1[α] = t2[α] = t3[α] = t4[α]  t3[β] = t1[β] and t3[R – β] = t2[R – β]  t4[β] = t2[β] and t4[R – β] = t1[R – β]
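The definition can be checked directly on a relation instance by trying to construct the required t3 for every pair of tuples (t4 then follows by symmetry). A sketch (Python as executable pseudocode; the instance is a fragment of the classes relation from slide 494):

    def mvd_holds(rows, alpha, beta, attrs):
        """Check alpha ->> beta: for each pair t1, t2 agreeing on alpha, the tuple
        taking its beta-values from t1 and the rest from t2 must also be in r."""
        tuples = {tuple(t[a] for a in attrs) for t in rows}
        for t1 in rows:
            for t2 in rows:
                if all(t1[a] == t2[a] for a in alpha):
                    t3 = {**t2, **{a: t1[a] for a in beta}}
                    if tuple(t3[a] for a in attrs) not in tuples:
                        return False
        return True

    attrs = ["course", "teacher", "book"]
    classes = [dict(zip(attrs, row)) for row in [
        ("database", "Avi", "DB Concepts"), ("database", "Avi", "Ullman"),
        ("database", "Hank", "DB Concepts"), ("database", "Hank", "Ullman")]]
    print(mvd_holds(classes, ["course"], ["teacher"], attrs))        # True
    print(mvd_holds(classes[:-1], ["course"], ["teacher"], attrs))   # False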
• 497. MVD (Cont.)  Tabular representation of α →→ β (figure)
• 498. Example  Let R be a relation schema with a set of attributes that are partitioned into 3 nonempty subsets Y, Z, W  We say that Y →→ Z (Y multidetermines Z) if and only if for all possible relations r(R):  < y1, z1, w1 > ∈ r and < y1, z2, w2 > ∈ r  implies  < y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r
• 499. Example (Cont.)  In our example:  course →→ teacher  course →→ book  The above formal definition is supposed to formalize the notion that given a particular value of Y (course) it has associated with it a set of values of Z (teacher) and a set of values of W (book), and these two sets are in some sense independent of each other.  Note:  If Y → Z then Y →→ Z  Indeed we have (in the above notation) z1 = z2, and the claim follows.
• 500. Use of Multivalued Dependencies  We use multivalued dependencies in two ways:  1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies  2. To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.  If a relation r fails to satisfy a given multivalued dependency, we can construct a relation r′ that does satisfy the multivalued dependency by adding tuples to r.
• 501. Theory of MVDs  From the definition of multivalued dependency, we can derive the following rule:  If α → β, then α →→ β  That is, every functional dependency is also a multivalued dependency  The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D.  We can compute D+ from D, using the formal definitions of functional and multivalued dependencies; a sound and complete set of inference rules also exists
• 502. Fourth Normal Form  A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:  α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)  α is a superkey for schema R  If a relation is in 4NF it is in BCNF
• 503. Restriction of Multivalued Dependencies  The restriction of D to Ri is the set Di consisting of  All functional dependencies in D+ that include only attributes of Ri  All multivalued dependencies of the form α →→ (β ∩ Ri) where α ⊆ Ri and α →→ β is in D+
• 504. 4NF Decomposition Algorithm
result := {R};
done := false;
compute D+;
let Di denote the restriction of D+ to Ri;
while (not done)
  if (there is a schema Ri in result that is not in 4NF) then
    begin
      let α →→ β be a nontrivial multivalued dependency that holds on Ri
        such that α → Ri is not in Di, and α ∩ β = ∅;
      result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
  else done := true;
• 505. Example  R = (A, B, C, G, H, I)  F = {A →→ B, B →→ HI, CG →→ H}  R is not in 4NF since A →→ B and A is not a superkey for R  Decomposition  a) R1 = (A, B) (R1 is in 4NF)  b) R2 = (A, C, G, H, I) (R2 is not in 4NF)  c) R3 = (C, G, H) (R3 is in 4NF)  d) R4 = (A, C, G, I) (R4 is not in 4NF)  Since A →→ B and B →→ HI, we have A →→ HI and hence A →→ I  e) R5 = (A, I) (R5 is in 4NF)  f) R6 = (A, C, G) (R6 is in 4NF)
• 506. Further Normal Forms  Join dependencies generalize multivalued dependencies  lead to project-join normal form (PJNF) (also called fifth normal form)  A class of even more general constraints leads to a normal form called domain-key normal form.  Problem with these generalized constraints: they are hard to reason with, and no sound and complete set of inference rules exists.
• 507. Overall Database Design Process  We have assumed schema R is given  R could have been generated when converting an E-R diagram to a set of tables.  R could have been a single relation containing all attributes that are of interest (called a universal relation).  Normalization breaks R into smaller relations.  R could have been the result of some ad hoc design of relations, which we then test/convert to normal form.
• 508. ER Model and Normalization  When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization.  However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity  E.g. an employee entity with attributes department-number and department-address, and an FD department-number → department-address  Good design would have made department an entity, with the address an attribute of that entity
• 509. Universal Relation Approach  Dangling tuples – tuples that "disappear" in computing a join.  Let r1(R1), r2(R2), …, rn(Rn) be a set of relations  A tuple t of the relation ri is a dangling tuple if t is not in the relation:  ΠRi(r1 ⋈ r2 ⋈ … ⋈ rn)  The relation r1 ⋈ r2 ⋈ … ⋈ rn is called a universal relation since it involves all the attributes in the "universe" defined by R1 ∪ R2 ∪ … ∪ Rn
• 510. Universal Relation Approach  Dangling tuples may occur in practical database applications.  They represent incomplete information  E.g. may want to break up information about loans into:  (branch-name, loan-number)  (loan-number, amount)  (loan-number, customer-name)  Universal relation would require null values for the missing attributes (e.g., for a loan whose customer or amount is not yet recorded)
• 511. Universal Relation Approach (Contd.)  A particular decomposition defines a restricted form of incomplete information that is acceptable in our database.  Above decomposition requires at least one of customer-name, branch-name or amount in order to enter a loan number without using null values  Rules out storing of customer-name, amount without an appropriate loan-number (since it is a key, it can't be null either!)  Universal relation requires unique attribute names (unique role assumption): every attribute name must have a unique meaning in the entire database
• 512. Denormalization for Performance  May want to use a non-normalized schema for performance  E.g. displaying customer-name along with account-number and balance requires a join of account with depositor  Alternative 1: use a denormalized relation containing the attributes of account as well as depositor, with all the above attributes  faster lookup  extra space and extra execution time for updates  Alternative 2: use a materialized view defined as the join of account and depositor  same benefits and drawbacks as above, except that there is no extra coding effort for the programmer
• 513. Other Design Issues  Some aspects of database design are not caught by normalization  Examples of bad database design, to be avoided:  Instead of earnings(company-id, year, amount), use  earnings-2000, earnings-2001, earnings-2002, etc., all on the schema (company-id, earnings)  The above are in BCNF, but make querying across years difficult and need a new table each year
  • 515. Correctness of 3NF Decomposition Algorithm  3NF decomposition algorithm is dependency preserving (since there is a relation for every FD in Fc)  Decomposition is lossless join  A candidate key (C) is in one of the relations Ri in decomposition  Closure of candidate key under Fc must contain all attributes in R.  Follow the steps of attribute closure algorithm to show there is only one tuple in the join result for each tuple in Ri
• 516. Correctness of 3NF Decomposition Algorithm (Contd.)  Claim: if a relation Ri is in the decomposition generated by the above algorithm, then Ri satisfies 3NF.  Let Ri be generated from the dependency α → β  Let γ → B be any non-trivial functional dependency on Ri. (We need only consider FDs whose right-hand side is a single attribute.)  Now, B can be in either β or α, but not in both. Consider each case separately.
• 517. Correctness of 3NF Decomposition (Contd.)  Case 1: If B in β:  If γ is a superkey, the 2nd condition of 3NF is satisfied  Otherwise γ must contain some attribute not in α  Since γ → B is in F+ it must be derivable from Fc, by using attribute closure on γ.  Attribute closure could not have used α → β: if it had been used, α must be contained in the attribute closure of γ, which is not possible, since we assumed γ is not a superkey.
• 518. Correctness of 3NF Decomposition (Contd.)  Case 2: B is in α.  Since α is a candidate key, the third alternative in the definition of 3NF is trivially satisfied.  In fact, we cannot show that γ is a superkey.  This shows exactly why the third alternative is present in the definition of 3NF.  Q.E.D.
  • 528. An Instance of Banker-schema
• 530. Relation bc: An Example of Redundancy in a BCNF Relation
  • 531. An Illegal bc Relation
• 534. Chapter 8: Object-Oriented Databases  Need for Complex Data Types  The Object-Oriented Data Model  Object-Oriented Languages  Persistent Programming Languages  Persistent C++ Systems
  • 535. Need for Complex Data Types  Traditional database applications in data processing had conceptually simple data types  Relatively few data types, first normal form holds  Complex data types have grown more important in recent years  E.g. Addresses can be viewed as a  Single string, or  Separate attributes for each part, or  Composite attributes (which are not in first normal form)  E.g. it is often convenient to store multivalued attributes as-is, without creating a separate relation to store the values in first normal form  Applications  computer-aided design, computer-aided software engineering  multimedia and image databases, and document/hypertext databases.
  • 536. Object-Oriented Data Model  Loosely speaking, an object corresponds to an entity in the E-R model.  The object-oriented paradigm is based on encapsulating code and data related to an object into single unit.  The object-oriented data model is a logical data model (like the E-R model).  Adaptation of the object-oriented programming paradigm (e.g., Smalltalk, C++) to database systems.
  • 537. Object Structure  An object has associated with it:  A set of variables that contain the data for the object. The value of each variable is itself an object.  A set of messages to which the object responds; each message may have zero, one, or more parameters.  A set of methods, each of which is a body of code to implement a message; a method returns a value as the response to the message  The physical representation of data is visible only to the implementor of the object  Messages and responses provide the only external interface to an object.  The term message does not necessarily imply physical message passing. Messages can be implemented as procedure invocations.
• 538. Messages and Methods  Methods are programs written in a general-purpose language with the following features  only variables in the object itself may be referenced directly  data in other objects are referenced only by sending messages.  Methods can be read-only or update methods  Read-only methods do not change the value of the object  Strictly speaking, every attribute of an entity must be represented by a variable and two methods, one to read and the other to update the attribute  e.g., the attribute address is represented by a variable address and two messages get-address and set-address.  For convenience, many object-oriented data models permit direct access to variables of other objects.
• 539. Object Classes  Similar objects are grouped into a class; each such object is called an instance of its class  All objects in a class have the same  Variables, with the same types  message interface  methods  They may differ in the values assigned to variables  Example: Group objects for people into a person class  Classes are analogous to entity sets in the E-R model
• 540. Class Definition Example
class employee {
  /* Variables */
  string name;
  string address;
  date start-date;
  int salary;
  /* Messages */
  int annual-salary();
  string get-name();
  string get-address();
  int set-address(string new-address);
  int employment-length();
};
 Methods to read and set the other variables are also needed with strict encapsulation  Methods are defined separately  E.g. int employment-length() { return today() - start-date; }  int set-address(string new-address) { address = new-address; }
  • 541. Inheritance  E.g., class of bank customers is similar to class of bank employees, although there are differences  both share some variables and messages, e.g., name and address.  But there are variables and messages specific to each class e.g., salary for employees and credit-rating for customers.  Every employee is a person; thus employee is a specialization of person  Similarly, customer is a specialization of person.  Create classes person, employee and customer  variables/messages applicable to all persons associated with class person.  variables/messages specific to employees associated with class employee; similarly for customer
  • 542. Inheritance (Cont.)  Place classes into a specialization/IS-A hierarchy  variables/messages belonging to class person are inherited by class employee as well as customer  Result is a class hierarchy Note analogy with ISA Hierarchy in the E-R model
• 543. Class Hierarchy Definition
class person {
  string name;
  string address;
};
class customer isa person {
  int credit-rating;
};
class employee isa person {
  date start-date;
  int salary;
  . . .
};
class officer isa employee {
  int office-number;
  int expense-account-number;
};
• 544. Class Hierarchy Example (Cont.)  Full variable list for objects in the class officer:  office-number, expense-account-number: defined locally  start-date, salary: inherited from employee  name, address: inherited from person  Methods inherited similar to variables.  Substitutability — any method of a class, say person, can be invoked equally well with any object belonging to any subclass, such as subclass officer of person.  Class extent: set of all objects in the class. Two options:  1. Class extent of employee includes all officer, teller and secretary objects.  2. Class extent of employee includes only employee objects that are not in a subclass such as officer, teller, or secretary
  • 545. Example of Multiple Inheritance Class DAG for banking example.
• 546. Multiple Inheritance  With multiple inheritance a class may have more than one superclass.  The class/subclass relationship is represented by a directed acyclic graph (DAG)  Particularly useful when objects can be classified in more than one way, which are independent of each other  E.g. temporary/permanent is independent of officer/secretary/teller  Create a subclass for each combination of subclasses  Need not create subclasses for combinations that are not possible in the database being modeled  A class inherits variables and methods from all its superclasses  There is potential for ambiguity when a variable/message N with the same name is inherited from two different superclasses; the conflict must be flagged or resolved
• 547. More Examples of Multiple Inheritance  Conceptually, an object can belong to each of several subclasses  A person can play the roles of student, teacher or footballPlayer, or any combination of the three  E.g., a student teaching assistant who also plays football  Can use multiple inheritance to model "roles" of an object  That is, allow an object to take on any one or more of a set of types  But many systems insist an object should have a most-specific class
• 548. Object Identity  An object retains its identity even if some or all of the values of variables or definitions of methods change over time.  Object identity is a stronger notion of identity than in programming languages or data models not based on object orientation.  Value – data value; e.g. primary key value used in relational systems.  Name – supplied by user; used for variables in procedures.  Built-in – identity built into data model or programming language; no user-supplied identifier is required. This is the form of identity used in object-oriented systems.
  • 549. Object Identifiers  Object identifiers used to uniquely identify objects  Object identifiers are unique:  no two objects have the same identifier  each object has only one object identifier  E.g., the spouse field of a person object may be an identifier of another person object.  can be stored as a field of an object, to refer to another object.  Can be  system generated (created by database) or  external (such as social-security number)
• 550. Object Containment  Each component in a design may contain other components  Can be modeled as containment of objects.  Objects containing other objects are called composite objects.  Multiple levels of containment create a containment hierarchy
• 551. Object-Oriented Languages  Object-oriented concepts can be used in different ways  Object-orientation can be used as a design tool, and be encoded into, for example, a relational database  (analogous to modeling data with an E-R diagram and then converting to a set of relations)  The concepts of object orientation can be incorporated into a programming language that is used to manipulate the database.  Object-relational systems – add complex types and object-orientation to relational databases  Persistent programming languages – extend object-oriented programming languages to deal with databases
• 552. Persistent Programming Languages  Persistent programming languages allow objects to be created and stored in a database, and used directly from a programming language  allow data to be manipulated directly from the programming language  No need to go through SQL.  No need for explicit format (type) changes  format changes are carried out transparently by the system  Without a persistent programming language, format changes become a burden on the programmer  More code to be written  More chance of bugs
  • 553. Persistent Prog. Languages (Cont.)  Drawbacks of persistent programming languages  Due to power of most programming languages, it is easy to make programming errors that damage the database.  Complexity of languages makes automatic high-level optimization more difficult.  Do not support declarative querying as well as relational databases
• 554. Persistence of Objects  Approaches to make transient objects persistent include establishing  Persistence by Class – declare all objects of a class to be persistent; simple but inflexible.  Persistence by Creation – extend the syntax for creating objects to specify that an object is persistent.  Persistence by Marking – an object that is to persist beyond program execution is marked as persistent before program termination.  Persistence by Reachability – an object is persistent if it is declared explicitly persistent or is reachable from a persistent object.
  • 555. Object Identity and Pointers  A persistent object is assigned a persistent object identifier.  Degrees of permanence of identity:  Intraprocedure – identity persists only during the executions of a single procedure  Intraprogram – identity persists only during execution of a single program or query.  Interprogram – identity persists from one program execution to another, but may change if the storage organization is changed
• 556. Object Identity and Pointers (Cont.)  In O-O languages such as C++, an object identifier is actually an in-memory pointer.  Persistent pointer – persists beyond program execution  can be thought of as a pointer into the database  E.g. specify file identifier and offset into the file  Problems due to database reorganization have to be dealt with by keeping forwarding pointers
• 557. Storage and Access of Persistent Objects  How to find objects in the database:  Name objects (as you would name files)  Cannot scale to large numbers of objects.  Typically given only to class extents and other collections of objects, but not to individual objects.  Expose object identifiers or persistent pointers to the objects  Can be stored externally.  All objects have object identifiers.  Store collections of objects, and allow programs to iterate over the collections to find required objects  Class extents are a special case: the collection of all objects belonging to a class
• 558. Persistent C++ Systems  C++ language allows support for persistence to be added without changing the language  Declare a class called Persistent_Object with attributes and methods to support persistence  Overloading – ability to redefine standard function names and operators (i.e., +, -, the pointer dereference operator ->) when applied to new types  Template classes help to build a type-safe type system supporting collections and persistent types.  Together these mechanisms allow persistence to be provided without changing the language itself
• 559. ODMG C++ Object Definition Language  The Object Database Management Group (ODMG) is an industry consortium aimed at standardizing object-oriented databases  in particular persistent programming languages  Includes standards for C++, Smalltalk and Java  ODMG-93  ODMG-2.0 and 3.0 (which is 2.0 plus extensions to Java)  Our description is based on ODMG-2.0  The ODMG C++ standard avoids changes to the C++ language itself, providing functionality via template classes and class libraries
  • 560. ODMG Types  Template class d_Ref<class> used to specify references (persistent pointers)  Template class d_Set<class> used to define sets of objects.  Methods include insert_element(e) and delete_element(e)  Other collection classes such as d_Bag (set with duplicates allowed), d_List and d_Varray (variable length array)
  • 561. ODMG C++ ODL: Example class Branch : public d_Object { …. } class Person : public d_Object { public: d_String name; // should not use String! d_String address; }; class Account : public d_Object { private: d_Long balance; public: d_Long number; d_Set <d_Ref<Customer>> owners; int find_balance(); int update_balance(int delta); };
  • 562. ODMG C++ ODL: Example (Cont.) class Customer : public Person { public: d_Date member_from; d_Long customer_id; d_Ref<Branch> home_branch; d_Set <d_Ref<Account>> accounts; };
• 563. Implementing Relationships  Relationships between classes implemented by references  Special reference types enforce integrity by adding/removing inverse links.  Type d_Rel_Ref<Class, InvRef> is a reference to Class, where attribute InvRef of Class is the inverse reference.  Similarly, d_Rel_Set<Class, InvRef> is used for a set of references  Assignment method (=) of class d_Rel_Ref is overloaded
• 564. Implementing Relationships  E.g.
extern const char _owners[ ], _accounts[ ];
class Account : public d_Object {
  ....
  d_Rel_Set<Customer, _accounts> owners;
};
// .. Since strings can't be used in templates ...
const char _owners[] = "owners";
const char _accounts[] = "accounts";
• 565. ODMG C++ Object Manipulation Language  Uses persistent versions of C++ operators such as new(db)  d_Ref<Account> account = new(bank_db, "Account") Account;  new allocates the object in the specified database, rather than in memory.  The second argument ("Account") gives the type name used in the database.  Dereference operator -> when applied on a d_Ref<Account> reference loads the referenced object in memory (if not already present) before continuing with the usual C++ dereference.  Constructor for a class – a special method used to initialize objects when they are created.
• 566. ODMG C++ OML: Database and Object Functions  Class d_Database provides methods to  open a database: open(databasename)  give names to objects: set_object_name(object, name)  look up objects by name: lookup_object(name)  rename objects: rename_object(oldname, newname)  close a database: close()  Class d_Object is inherited by all persistent classes; it provides methods such as mark_modified(), which must be called before an object is updated
• 567. ODMG C++ OML: Example
int create_account_owner(String name, String address) {
  Database bank_db_obj;
  Database * bank_db = &bank_db_obj;
  bank_db->open("Bank-DB");
  d_Transaction Trans;
  Trans.begin();
  d_Ref<Account> account = new(bank_db) Account;
  d_Ref<Customer> cust = new(bank_db) Customer;
  cust->name = name;
  cust->address = address;
  cust->accounts.insert_element(account);
  account->owners.insert_element(cust);
  Trans.commit();
}
  • 568. ODMG C++ OML: Example  (Cont.) Class extents maintained automatically in the database.  To access a class extent: d_Extent<Customer> customerExtent(bank_db);  Class d_Extent provides method d_Iterator<T> create_iterator() to create an iterator on the class extent  Also provides select(pred) method to return iterator on objects that satisfy selection predicate pred.
• 569. ODMG C++ OML: Example of Iterators
int print_customers() {
  Database bank_db_obj;
  Database * bank_db = &bank_db_obj;
  bank_db->open("Bank-DB");
  d_Transaction Trans;
  Trans.begin();
  d_Extent<Customer> all_customers(bank_db);
  d_Iterator<d_Ref<Customer>> iter;
  iter = all_customers.create_iterator();
  d_Ref<Customer> p;
  while (iter.next(p))
    print_cust(p);   // process the customer referenced by p
  Trans.commit();
}
• 570. ODMG C++ Binding: Other Features  Declarative query language OQL, looks like SQL  Form query as a string, and execute it to get a set of results (actually a bag, since duplicates may be present)
d_Set<d_Ref<Account>> result;
d_OQL_Query q1("select a from Customer c, c.accounts a where c.name = 'Jones' and a.find_balance() > 100");
d_oql_execute(q1, result);
• 571. Making Pointer Persistence Transparent  Drawback of the ODMG C++ approach:  Two types of pointers  Programmer has to ensure mark_modified() is called, else database can become corrupted  ObjectStore approach  Uses exactly the same pointer type for in-memory and database objects  Persistence is transparent to applications  Except when creating objects  Same functions can be used on in-memory and persistent objects
• 572. Persistent Java Systems  ODMG-3.0 defines extensions to Java for persistence  Java does not support templates, so language extensions are required  Model for persistence: persistence by reachability  Matches Java's garbage collection model  Garbage collection needed on the database also  Only one pointer type for transient and persistent pointers  Class is made persistence capable by running a post-processor on the byte code generated by the Java compiler
  • 573. ODMG Java  Transaction must start accessing database from one of the root object (looked up by name)  finds other objects by following pointers from the root objects  Objects referred to from a fetched object are allocated space in memory, but not necessarily fetched  Fetching can be done lazily  An object with space allocated but not yet fetched is called a hollow object  When a hollow object is accessed, its data is fetched from disk.
  • 575. Specialization Hierarchy for the Bank Example
  • 577. Class DAG for the Bank Example
  • 578. Containment Hierarchy for Bicycle-Design Database
  • 579. Chapter 9: Object-Relational Databases  Nested Relations  Complex Types and Object Orientation  Querying with Complex Types  Creation of Complex Values and Objects  Comparison of Object-Oriented and Object-Relational Databases
  • 580. Object-Relational Data Models  Extend the relational data model by including object orientation and constructs to deal with added data types.  Allow attributes of tuples to have complex types, including non-atomic values such as nested relations.  Preserve relational foundations, in particular the declarative access to data, while extending modeling power.  Upward compatibility with existing relational languages.
• 581. Nested Relations  Motivation:  Permit non-atomic domains (atomic = indivisible)  Example of non-atomic domain: set of integers, or set of tuples  Allows more intuitive modeling for applications with complex data  Intuitive definition:  allow relations wherever we allow atomic (scalar) values — relations within relations  Retains mathematical foundation of relational model  Violates first normal form.
• 582. Example of a Nested Relation  Example: library information system  Each book has  a title,  a set of authors,  a publisher, and  a set of keywords  Non-1NF relation books
  • 583. 1NF Version of Nested Relation  1NF version of books flat-books
• 584. 4NF Decomposition of Nested Relation  Remove awkwardness of flat-books by assuming that the following multivalued dependencies hold:  title ↠ author  title ↠ keyword  title ↠ pub-name, pub-branch  Decompose flat-books into 4NF using the schemas:  (title, author)  (title, keyword)  (title, pub-name, pub-branch)
• 585. 4NF Decomposition of flat-books
  • 586. Problems with 4NF Schema  4NF design requires users to include joins in their queries.  1NF relational view flat-books defined by join of 4NF relations:  eliminates the need for users to perform joins,  but loses the one-to-one correspondence between tuples and documents.  And has a large amount of redundancy  Nested relations representation is much more natural here.
  • 587. Complex Types and SQL:1999  Extensions to SQL to support complex types include:  Collection and large object types  Nested relations are an example of collection types  Structured types  Nested record structures like composite attributes  Inheritance  Object orientation  Including object identifiers and references  Our description is mainly based on the SQL:1999 standard
  • 588. Collection Types  Set type (not in SQL:1999) create table books ( ….. keyword-set setof(varchar(20)) …… )  Sets are an instance of collection types. Other instances include  Arrays (are supported in SQL:1999)  E.g. author-array varchar(20) array[10]  Can access elements of array in usual fashion:  E.g. author-array[1]  Multisets (not supported in SQL:1999)  I.e., unordered collections, where an element may occur multiple times  Nested relations are sets of tuples
  • 589. Large Object Types  Large object types  clob: Character large objects book-review clob(10KB)  blob: binary large objects image blob(10MB) movie blob (2GB)  JDBC/ODBC provide special methods to access large objects in small pieces  Similar to accessing operating system files  Application retrieves a locator for the large object and then manipulates the large object from the host language
  • 590. Structured and Collection Types  Structured types can be declared and used in SQL create type Publisher as (name varchar(20), branch varchar(20)) create type Book as (title varchar(20), author-array varchar(20) array [10], pub-date date, publisher Publisher, keyword-set setof(varchar(20)))  Note: setof declaration of keyword-set is not supported by SQL:1999  Using an array to store authors lets us record the order of the authors  Structured types can be used to create tables create table books of Book
• 591. Structured and Collection Types (Cont.)  Structured types allow composite attributes of E-R diagrams to be represented directly.  Unnamed row types can also be used in SQL:1999 to define composite attributes  E.g. we can omit the declaration of type Publisher and instead use the following in declaring the type Book publisher row (name varchar(20), branch varchar(20))  Similarly, collection types allow multivalued attributes of E-R diagrams to be represented directly
• 592. Structured Types (Cont.)  We can create tables without creating an intermediate type  For example, the table books could also be defined as follows: create table books (title varchar(20), author-array varchar(20) array[10], pub-date date, publisher Publisher, keyword-list setof(varchar(20)))  Methods can be part of the type definition of a structured type: create type Employee as ( name varchar(20), salary integer) method giveraise (percent integer)  We create the method body separately create method giveraise (percent integer) for Employee begin set self.salary = self.salary + (self.salary * percent) / 100; end
• 593. Creation of Values of Complex Types  Values of structured types are created using constructor functions  E.g. Publisher('McGraw-Hill', 'New York')  Note: a value is not an object  SQL:1999 constructor functions  E.g. create function Publisher (n varchar(20), b varchar(20)) returns Publisher begin set name = n; set branch = b; end  Every structured type has a default constructor with no arguments, others can be defined as required  Values of row type can be constructed by listing values in parentheses  E.g. given row type row (name varchar(20), branch varchar(20)), a value can be written as ('McGraw-Hill', 'New York')
• 594. Creation of Values of Complex Types  Array construction array['Silberschatz', 'Korth', 'Sudarshan']  Set-valued attributes (not supported in SQL:1999)  set(v1, v2, …, vn)  To create a tuple of the books relation ('Compilers', array['Smith', 'Jones'], Publisher('McGraw-Hill', 'New York'), set('parsing', 'analysis'))  To insert the preceding tuple into the relation books, we can use an insert statement
• 595. Inheritance  Suppose that we have the following type definition for people: create type Person (name varchar(20), address varchar(20))  Using inheritance to define the student and teacher types create type Student under Person (degree varchar(20), department varchar(20)) create type Teacher under Person (salary integer, department varchar(20))
• 596. Multiple Inheritance  SQL:1999 does not support multiple inheritance  If our type system supports multiple inheritance, we can define a type for teaching assistant as follows: create type Teaching Assistant under Student, Teacher  To avoid a conflict between the two occurrences of department we can rename them create type Teaching Assistant under Student with (department as student-dept), Teacher with (department as teacher-dept)
• 597. Table Inheritance  Table inheritance allows an object to have multiple types by allowing an entity to exist in more than one table at once.  E.g. people table: create table people of Person  We can then define the students and teachers tables as subtables of people create table students of Student under people create table teachers of Teacher under people  Each tuple in a subtable (e.g. students and teachers) is implicitly present in its supertable (people)
• 598. Table Inheritance: Roles  Table inheritance is useful for modeling roles  permits a value to have multiple types, without having a most-specific type (unlike type inheritance).  e.g., an object can be in the students and teachers subtables simultaneously, without having to be in a subtable student-teachers that is under both students and teachers  object can gain/lose roles: corresponds to inserting/deleting the tuple from the subtable
  • 599. Table Inheritance: Consistency Requirements  Consistency requirements on subtables and supertables.  Each tuple of the supertable (e.g. people) can correspond to at most one tuple in each of the subtables (e.g. students and teachers)  Additional constraint in SQL:1999: All tuples corresponding to each other (that is, with the same values for inherited attributes) must be derived from one tuple (inserted into one table).  That is, each entity must have a most specific type
• 600. Table Inheritance: Storage Alternatives  Storage alternatives  1. Store only local attributes and the primary key of the supertable in subtable  Inherited attributes derived by means of a join with the supertable (see the sketch after this slide)  2. Each table stores all inherited and locally defined attributes  Supertables implicitly contain (inherited attributes of) all tuples in their subtables  Access to all attributes of a tuple is faster: no join required  If entities must have most specific type,
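To make alternative 1 concrete, a minimal SQL sketch (table and column names are illustrative assumptions, not from the slides): with people(id, name, address) holding the inherited attributes and students(id, degree, department) holding only the local attributes plus the supertable key, reconstructing a complete student tuple requires a join:
  -- alternative 1: the subtable stores only local attributes plus the key;
  -- a join with the supertable recovers the inherited attributes
  select p.name, p.address, s.degree, s.department
  from people p, students s
  where p.id = s.id
Under alternative 2 the same query would read students alone, at the cost of storing name and address redundantly.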
  • 601. Reference Types  Object-oriented languages provide the ability to create and refer to objects.  In SQL:1999  References are to tuples, and  References must be scoped,  I.e., can only point to tuples in one specified table  We will study how to define references first, and later see how to use references
• 602. Reference Declaration in SQL:1999  E.g. define a type Department with a field name and a field head which is a reference to the type Person, with table people as scope create type Department( name varchar(20), head ref(Person) scope people)  We can then create a table departments as follows create table departments of Department  We can omit the declaration scope people from the type declaration and instead specify it in the create table statement
• 603. Initializing Reference Typed Values  In Oracle, to create a tuple with a reference value, we can first create the tuple with a null reference and then set the reference separately by using the function ref(p) applied to a tuple variable  E.g. to create a department with name CS and head being the person named John, we use insert into departments values ('CS', null) update departments set head = (select ref(p) from people as p where name = 'John') where name = 'CS'
• 604. Initializing Reference Typed Values (Cont.)  SQL:1999 does not support the ref() function, and instead requires a special attribute to be declared to store the object identifier  The self-referential attribute is declared by adding a ref is clause to the create table statement: create table people of Person ref is oid system generated  Here, oid is an attribute name, not a keyword.  To get the reference to a tuple, the subquery shown earlier would use select p.oid instead of select ref(p)
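Putting the pieces together, a hedged sketch of the full SQL:1999 pattern with a system-generated oid (names follow the earlier Oracle example; exact syntax may vary across systems):
  insert into departments values ('CS', null);
  update departments
  set head = (select p.oid      -- self-referential attribute, not ref(p)
              from people as p
              where p.name = 'John')
  where name = 'CS';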
• 605. User Generated Identifiers  SQL:1999 allows object identifiers to be user-generated  The type of the object-identifier must be specified as part of the type definition of the referenced table, and  The table definition must specify that the reference is user generated  E.g. create type Person (name varchar(20), address varchar(20)) ref using varchar(20) create table people of Person ref is oid user generated
• 606. User Generated Identifiers (Cont.)  We can then use the identifier value when inserting a tuple into departments  Avoids need for a separate query to retrieve the identifier: E.g. insert into departments values ('CS', '02184567')  It is even possible to use an existing primary key value as the identifier, by including the ref from clause, and declaring the reference to be derived create type Person (name varchar(20) primary key, address varchar(20)) ref from(name)
• 607. Path Expressions  Find the names and addresses of the heads of all departments: select head->name, head->address from departments  An expression such as "head->name" is called a path expression  Path expressions help avoid explicit joins  If department head were not a reference, a join of departments with people would be required to get at the address  Makes expressing the query much easier for the user
  • 608. Querying with Structured Types  Find the title and the name of the publisher of each book. select title, publisher.name from books Note the use of the dot notation to access fields of the composite attribute (structured type) publisher
• 609. Collection-Valued Attributes  Collection-valued attributes can be treated much like relations, using the keyword unnest  The books relation has array-valued attribute author-array and set-valued attribute keyword-set  To find all books that have the word "database" as one of their keywords, select title from books where 'database' in (unnest(keyword-set))  Note: Above syntax is valid in SQL:1999, but
• 610. Collection-Valued Attributes (Cont.)  We can access individual elements of an array by using indices  E.g. if we know that a particular book has three authors, we could write: select author-array[1], author-array[2], author-array[3] from books where title = 'Database System Concepts'
• 611. Unnesting  The transformation of a nested relation into a form with fewer (or no) relation-valued attributes is called unnesting.  E.g. select title, A as author, publisher.name as pub_name, publisher.branch as pub_branch, K as keyword from books as B, unnest(B.author-array) as A, unnest(B.keyword-list) as K
  • 612. Nesting Nesting is the opposite of unnesting, creating a collection-valued attribute  NOTE: SQL:1999 does not support nesting  Nesting can be done in a manner similar to aggregation, but using the function set() in place of an aggregation operation, to create a set  To nest the flat-books relation on the attribute keyword: select title, author, Publisher(pub_name, pub_branch) as publisher, set(keyword) as keyword-list from flat-books groupby title, author, publisher  To nest on both authors and keywords: select title, set(author) as author-list, Publisher(pub_name, pub_branch) as publisher, set(keyword) as keyword-list from flat-books groupby title, publisher
  • 613. Nesting (Cont.)  Another approach to creating nested relations is to use subqueries in the select clause. select title, ( select author from flat-books as M where M.title=O.title) as author- set, Publisher(pub-name, pub-branch) as publisher, (select keyword from flat-books as N
  • 614. Functions and Procedures  SQL:1999 supports functions and procedures  Functions/procedures can be written in SQL itself, or in an external programming language  Functions are particularly useful with specialized data types such as images and geometric objects  E.g. functions to check if polygons overlap, or to compare images for similarity  Some databases support table-valued functions, which can return a relation as a result
• 615. SQL Functions  Define a function that, given a book title, returns the count of the number of authors (on the 4NF schema with relations books4 and authors). create function author-count(name varchar(20)) returns integer begin declare a-count integer; select count(author) into a-count from authors where authors.title = name; return a-count; end
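Assuming the function body ends as above, it can then be used like any built-in function; for instance, to find titles with more than one author on the 4NF schema:
  select title
  from books4
  where author-count(title) > 1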
  • 616. SQL Methods  Methods can be viewed as functions associated with structured types  They have an implicit first parameter called self which is set to the structured- type value on which the method is invoked  The method code can refer to attributes of the structured-type value using the self variable  E.g. self.a
• 617. SQL Functions and Procedures (cont.)  The author-count function could instead be written as a procedure: create procedure author-count-proc (in title varchar(20), out a-count integer) begin select count(author) into a-count from authors where authors.title = title end  Procedures can be invoked either from an SQL procedure or from embedded SQL, using the call statement.  E.g. from an SQL procedure declare a-count integer; call author-count-proc('Database Systems Concepts', a-count);  SQL:1999 allows more than one function/procedure of the same name (called name overloading), as long as the number of arguments differ, or at least the types of the arguments differ
• 618. External Language Functions/Procedures  SQL:1999 permits the use of functions and procedures written in other languages such as C or C++  Declaring external language procedures and functions create procedure author-count-proc(in title varchar(20), out count integer) language C external name
• 619. External Language Routines (Cont.)  Benefits of external language functions/procedures:  more efficient for many operations, and more expressive power  Drawbacks  Code to implement function may need to be loaded into database system and executed in the database system's address space  risk of accidental corruption of database structures  security risk, allowing users access to unauthorized data  There are alternatives, which give good security at the cost of potentially worse performance  Direct execution in the database system's space is used when efficiency is more important than security
• 620. Security with External Language Routines  To deal with security problems:  Use sandbox techniques  that is, use a safe language like Java, which cannot be used to access/damage other parts of the database code  Or, run external language functions/procedures in a separate process, with no access to the database process' memory  Parameters and results communicated via inter-process communication  Both have performance overheads  Many database systems support both of the above approaches, as well as direct execution in the database system's address space
• 621. Procedural Constructs  SQL:1999 supports a rich variety of procedural constructs  Compound statement  is of the form begin … end,  may contain multiple SQL statements between begin and end.  Local variables can be declared within a compound statement  While and repeat statements declare n integer default 0; while n < 10 do set n = n + 1 end while repeat set n = n – 1 until n = 0 end repeat
• 622. Procedural Constructs (Cont.)  For loop  Permits iteration over all results of a query  E.g. find total of all balances at the Perryridge branch declare n integer default 0; for r as select balance from account where branch-name = 'Perryridge' do set n = n + r.balance end for
• 623. Procedural Constructs (cont.)  Conditional statements (if-then-else)  E.g. to find sum of balances for each of three categories of accounts (with balance < 1000, >= 1000 and < 5000, >= 5000) if r.balance < 1000 then set l = l + r.balance elseif r.balance < 5000 then set m = m + r.balance else set h = h + r.balance end if  SQL:1999 also supports a case statement similar to the C case statement  Signaling of exception conditions, and declaring handlers for exceptions declare out_of_stock condition declare exit handler for out_of_stock begin … signal out_of_stock end  The handler here is exit -- causes enclosing begin..end to be exited
• 624. Comparison of O-O and O-R Databases  Summary of strengths of various database systems:  Relational systems  simple data types, powerful query languages, high protection.  Persistent-programming-language-based OODBs  complex data types, integration with programming language, high performance.  Object-relational systems  complex data types, powerful query languages, high protection.
• 625. Finding All Employees of a Manager  Procedure to find all employees who work directly or indirectly for manager mgr  Relation manager(empname, mgrname) specifies who directly works for whom  Result is stored in empl(name) create procedure findEmp(in mgr char(10)) begin create temporary table newemp(name char(10)); create temporary table temp(name char(10)); insert into newemp -- store all direct employees of mgr in newemp select empname from manager where mgrname = mgr
• 626. Finding All Employees of a Manager (cont.) repeat insert into empl -- add all new employees found to empl select name from newemp; insert into temp -- find all employees of people already found (select manager.empname from newemp, manager where newemp.empname = manager.mgrname) except ( -- but remove those who were found earlier select empname
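The slide is cut off mid-statement; a plausible sketch of how the loop ends, following the pattern set up above (newly found names are moved from temp to newemp, and the loop stops once no new employees are found):
      from empl);               -- remove employees found in earlier iterations
    delete from newemp;         -- replace newemp by the newly found employees
    insert into newemp select * from temp;
    delete from temp;
  until not exists (select * from newemp)
  end repeat;
  end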
  • 628. A Partially Nested Version of the flat-books Relation
  • 630. Introduction  XML: Extensible Markup Language  Defined by the WWW Consortium (W3C)  Originally intended as a document markup language not a database language  Documents have tags giving extra information about sections of the document  E.g. <title> XML </title> <slide> Introduction …</slide>  Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
  • 631. XML Introduction (Cont.)  The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents.  Much of the use of XML has been in data exchange applications, not as a replacement for HTML  Tags make data (relatively) self- documenting  E.g. <bank> <account> <account-number> A-101
  • 632. XML: Motivation  Data interchange is critical in today‘s networked world  Examples:  Banking: funds transfer  Order processing (especially inter-company orders)  Scientific data  Chemistry: ChemML, …  Genetics: BSML (Bio-Sequence Markup Language), …  Paper flow of information between organizations is being replaced by electronic flow of information
• 633. XML Motivation (Cont.)  Earlier generation formats were based on plain text with line headers indicating the meaning of fields  Similar in concept to email headers  Does not allow for nested structures, no standard "type" language  Tied too closely to low level document structure (lines, spaces, etc)  Each XML based standard defines what are valid elements, using  XML type specification languages to specify the syntax  DTD (Document Type Descriptors)
  • 634. Structure of XML Data  Tag: label for a section of data  Element: section of data beginning with <tagname> and ending with matching </tagname>  Elements must be properly nested  Proper nesting  <account> … <balance> …. </balance> </account>  Improper nesting  <account> … <balance> …. </account> </balance>
  • 635. Example of Nested Elements <bank-1> <customer> <customer-name> Hayes </customer-name> <customer-street> Main </customer-street> <customer-city> Harrison </customer-city> <account> <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> </account> <account> … </account> </customer> . . </bank-1>
  • 636. Motivation for Nesting  Nesting of data is useful in data transfer  Example: elements representing customer-id, customer name, and address nested within an order element  Nesting is not supported, or discouraged, in relational databases  With multiple orders, customer name and address are stored redundantly  normalization replaces nested structures in each order by foreign key into table storing customer name and address information  Nesting is supported in object-relational
• 637. Structure of XML Data (Cont.)  Mixture of text with sub-elements is legal in XML.  Example: <account> This account is seldom used any more. <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> </account>  Useful for document markup, but discouraged for data representation
• 638. Attributes  Elements can have attributes  <account acct-type = "checking"> <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> </account>  Attributes are specified by name=value pairs inside the starting tag of an element  An element may have several attributes, but each attribute name can occur only once
• 639. Attributes vs. Subelements  Distinction between subelement and attribute  In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents  In the context of data representation, the difference is unclear and may be confusing  Same information can be represented in two ways  <account account-number = "A-101"> ….
• 640. More on XML Syntax  Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag  <account number="A-101" branch="Perryridge" balance="200"/>  To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below  <![CDATA[<account> … </account>]]>  Here, <account> and </account> are treated as just strings
  • 641. Namespaces  XML data has to be exchanged between organizations  Same tag name may have different meaning in different organizations, causing confusion on exchanged documents  Specifying a unique string as an element name avoids confusion  Better solution: use unique- name:element-name  Avoid using long unique names all over
• 642. XML Document Schema  Database schemas constrain what information can be stored, and the data types of stored values  XML documents are not required to have an associated schema  However, schemas are very important for XML data exchange  Otherwise, a site cannot automatically interpret data received from another site  Two mechanisms for specifying XML schema:  Document Type Definition (DTD)  XML Schema
• 643. Document Type Definition (DTD)  The type of an XML document can be specified using a DTD  DTD constrains the structure of XML data  What elements can occur  What attributes can/must an element have  What subelements can/must occur inside each element, and how many times.  DTD does not constrain data types  All values represented as strings in XML  DTD syntax
• 644. Element Specification in DTD  Subelements can be specified as  names of elements, or  #PCDATA (parsed character data), i.e., character strings  EMPTY (no subelements) or ANY (anything can be a subelement)  Example <!ELEMENT depositor (customer-name, account-number)> <!ELEMENT customer-name (#PCDATA)> <!ELEMENT account-number (#PCDATA)>  Subelement specification may have regular expressions
• 645. Bank DTD <!DOCTYPE bank [ <!ELEMENT bank ((account | customer | depositor)+)> <!ELEMENT account (account-number, branch-name, balance)> <!ELEMENT customer (customer-name, customer-street, customer-city)> <!ELEMENT depositor (customer-name, account-number)> <!ELEMENT account-number (#PCDATA)> <!ELEMENT branch-name (#PCDATA)> <!ELEMENT balance (#PCDATA)> <!ELEMENT customer-name (#PCDATA)> <!ELEMENT customer-street (#PCDATA)> <!ELEMENT customer-city (#PCDATA)> ]>
• 646. Attribute Specification in DTD  Attribute specification: for each attribute  Name  Type of attribute  CDATA  ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)  more on this later  Whether  mandatory (#REQUIRED)  has a default value (value),  or neither (#IMPLIED)  Examples  <!ATTLIST account acct-type CDATA "checking">
• 647. IDs and IDREFs  An element can have at most one attribute of type ID  The ID attribute value of each element in an XML document must be distinct  Thus the ID attribute value is an object identifier  An attribute of type IDREF must contain the ID value of an element in the same document  An attribute of type IDREFS contains a set of (zero or more) ID values
• 648. Bank DTD with Attributes  Bank DTD with ID and IDREF attribute types. <!DOCTYPE bank-2 [ <!ELEMENT account (branch, balance)> <!ATTLIST account account-number ID #REQUIRED owners IDREFS #REQUIRED> <!ELEMENT customer (customer-name, customer-street, customer-city)> <!ATTLIST customer customer-id ID #REQUIRED accounts IDREFS #REQUIRED> … declarations for branch, balance, customer-name, customer-street and customer-city
• 649. XML data with ID and IDREF attributes <bank-2> <account account-number="A-401" owners="C100 C102"> <branch-name> Downtown </branch-name> <balance> 500 </balance> </account> <customer customer-id="C100" accounts="A-401"> <customer-name> Joe </customer-name> <customer-street> Monroe </customer-street> <customer-city> Madison </customer-city> </customer> <customer customer-id="C102" accounts="A-401 A-402"> <customer-name> Mary </customer-name> <customer-street> Erin </customer-street> <customer-city> Newark </customer-city> </customer>
  • 650. Limitations of DTDs  No typing of text elements and attributes  All values are strings, no integers, reals, etc.  Difficult to specify unordered sets of subelements  Order is usually irrelevant in databases  (A | B)* allows specification of an unordered set, but  Cannot ensure that each of A and B occurs only once
  • 651. XML Schema  XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports  Typing of values  E.g. integer, string, etc  Also, constraints on min/max values  User defined types  Is itself specified in XML syntax, unlike DTDs  More standard representation, but verbose  Is integrated with namespaces
• 652. XML Schema Version of Bank DTD <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="bank" type="BankType"/> <xsd:element name="account"> <xsd:complexType> <xsd:sequence> <xsd:element name="account-number" type="xsd:string"/> <xsd:element name="branch-name" type="xsd:string"/> <xsd:element name="balance" type="xsd:decimal"/> </xsd:sequence> </xsd:complexType> </xsd:element> ….. definitions of customer and depositor …. <xsd:complexType name="BankType"> <xsd:sequence> <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType>
  • 653. Querying and Transforming XML Data  Translation of information from one XML schema to another  Querying on XML data  Above two are closely related, and handled by the same tools  Standard XML querying/translation languages  XPath  Simple language consisting of path expressions  XSLT
  • 654. Tree Model of XML Data  Query and transformation languages are based on a tree model of XML data  An XML document is modeled as a tree, with nodes corresponding to elements and attributes  Element nodes have children nodes, which can be attributes or subelements  Text in an element is modeled as a text node child of the element  Children of a node are ordered according to their order in the XML document  Element and attribute nodes (except for the root node) have a single parent, which is an element node
• 655. XPath  XPath is used to address (select) parts of documents using path expressions  A path expression is a sequence of steps separated by "/"  Think of file names in a directory hierarchy  Result of path expression: set of values that, along with their containing elements/attributes, match the path expression
• 656. XPath (Cont.)  The initial "/" denotes root of the document (above the top-level tag)  Path expressions are evaluated left to right  Each step operates on the set of instances produced by the previous step  Selection predicates may follow any step in a path, in [ ]  E.g. /bank-2/account[balance > 400]  returns account elements with a balance value greater than 400  /bank-2/account[balance] returns account elements that contain a balance subelement
• 657. Functions in XPath  XPath provides several functions  The function count() at the end of a path counts the number of elements in the set generated by the path  E.g. /bank-2/account[count(customer) > 2]  Returns accounts with > 2 customers  Also function for testing position (1, 2, ..) of node w.r.t. siblings  Boolean connectives and and or and function not() can be used in predicates
• 658. More XPath Features  The "|" (union) operator is used to implement union  E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)  gives customers with either accounts or loans  However, "|" cannot be nested inside other operators.  "//" can be used to skip multiple levels of nodes  E.g. /bank-2//customer-name  finds any customer-name element anywhere under the /bank-2 element, regardless of the element in which it is contained.  A step in the path can go to: parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  • 659. XSLT  A stylesheet stores formatting options for a document, usually separately from document  E.g. HTML style sheet may specify font colors and sizes for headings, etc.  The XML Stylesheet Language (XSL) was originally designed for generating HTML from XML  XSLT is a general-purpose transformation language
• 660. XSLT Templates  Example of XSLT template with match and select part <xsl:template match="/bank-2/customer"> <xsl:value-of select="customer-name"/> </xsl:template> <xsl:template match="*"/>  The match attribute of xsl:template specifies a pattern in XPath  Elements in the XML document matching the pattern are processed by the actions within the xsl:template element  xsl:value-of selects (outputs) specified values (here, customer-name)  For elements that do not match any other template, the empty template match="*" ensures their contents are not output
  • 661. XSLT Templates (Cont.)  If an element matches several templates, only one is used  Which one depends on a complex priority scheme/user-defined priorities  We assume only one template matches any element
• 662. Creating XML Output  Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is  E.g. to wrap results in new XML elements. <xsl:template match="/bank-2/customer"> <customer> <xsl:value-of select="customer-name"/> </customer> </xsl:template> <xsl:template match="*"/>  Example output:
• 663. Creating XML Output (Cont.)  Note: Cannot directly insert an xsl:value-of tag inside another tag  E.g. cannot create an attribute for <customer> in the previous example by directly using xsl:value-of  XSLT provides a construct xsl:attribute to handle this situation  xsl:attribute adds an attribute to the preceding element  E.g. <customer> <xsl:attribute name="customer-id"> <xsl:value-of select="customer-id"/> </xsl:attribute> </customer> results in output of the form
• 664. Structural Recursion  Action of a template can be to recursively apply templates to the contents of a matched element  E.g. <xsl:template match="/bank"> <customers> <xsl:apply-templates/> </customers> </xsl:template> <xsl:template match="customer"> <customer> <xsl:value-of select="customer-name"/> </customer> </xsl:template> <xsl:template match="*"/>  Example output: <customers> <customer> John </customer> <customer> Mary </customer> </customers>
• 665. Joins in XSLT  XSLT keys allow elements to be looked up (indexed) by values of subelements or attributes  Keys must be declared (with a name), and the key() function can then be used for lookup. E.g.  <xsl:key name="acctno" match="account" use="account-number"/>  <xsl:value-of select="key('acctno', 'A-101')"/>  Keys permit (some) joins to be expressed in XSLT <xsl:key name="acctno" match="account" use="account-number"/> <xsl:key name="custno" match="customer" use="customer-name"/> <xsl:template match="depositor"> <cust-acct> <xsl:value-of select="key('custno', customer-name)"/> <xsl:value-of select="key('acctno', account-number)"/> </cust-acct> </xsl:template> <xsl:template match="*"/>
• 666. Sorting in XSLT  Using an xsl:sort directive inside a template causes all elements matching the template to be sorted  Sorting is done before applying other templates  E.g. <xsl:template match="/bank"> <xsl:apply-templates select="customer"> <xsl:sort select="customer-name"/> </xsl:apply-templates> </xsl:template>
• 667. XQuery  XQuery is a general purpose query language for XML data  Currently being standardized by the World Wide Web Consortium (W3C)  The textbook description is based on a March 2001 draft of the standard. The final version may differ, but major features likely to stay unchanged.  Alpha version of XQuery engine available free from Microsoft  XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL  XQuery uses a for … let … where … return … syntax  for ↔ SQL from  where ↔ SQL where
• 668. FLWR Syntax in XQuery  For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath  Simple FLWR expression in XQuery  find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag for $x in /bank-2/account let $acctno := $x/@account-number where $x/balance > 400 return <account-number> $acctno </account-number>  Let clause not really needed in this query, and the selection can be done in the XPath expression instead: for $x in /bank-2/account[balance > 400] return <account-number> $x/@account-number </account-number>
• 669. Path Expressions and Functions  Path expressions are used to bind variables in the for clause, but can also be used in other places  E.g. path expressions can be used in the let clause, to bind variables to results of path expressions  The function distinct( ) can be used to remove duplicates in path expression results  The function document(name) returns root of named document
• 670. Joins  Joins are specified in a manner very similar to SQL for $a in /bank/account, $c in /bank/customer, $d in /bank/depositor where $a/account-number = $d/account-number and $c/customer-name = $d/customer-name return <cust-acct> $c $a </cust-acct>  The same query can be expressed with the selections specified as XPath selections:
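One plausible rendering of that XPath-selection form, in the draft-XQuery style used above (a sketch, not official standard text):
  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor[account-number = $a/account-number and
                            customer-name = $c/customer-name]
  return <cust-acct> $c $a </cust-acct>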
  • 671. Changing Nesting Structure  The following query converts data from the flat structure for bank information into the nested structure used in bank-1 <bank-1> for $c in /bank/customer return <customer> $c/* for $d in /bank/depositor[customer-name = $c/customer- name], $a in /bank/account[account-number=$d/account- number] return $a </customer> </bank-1>  $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
• 672. XQuery Path Expressions  $c/text() gives text content of an element without any subelements/tags  XQuery path expressions support the "->" operator for dereferencing IDREFs  Equivalent to the id( ) function of XPath, but simpler to use  Can be applied to a set of IDREFs to get a set of results  June 2001 version of standard has changed "->" to "=>"
  • 673. Sorting in XQuery  Sortby clause can be used at the end of any expression. E.g. to return customers sorted by name for $c in /bank/customer return <customer> $c/* </customer> sortby(name)  Can sort at multiple levels of nesting (sort by customer-name, and by account- number within each customer) <bank-1> for $c in /bank/customer return
• 674. Functions and Other XQuery Features  User defined functions with the type system of XMLSchema function balances(xsd:string $c) returns list(xsd:numeric) { for $d in /bank/depositor[customer-name = $c], $a in /bank/account[account-number = $d/account-number] return $a/balance }  Types are optional for function parameters and results
  • 675. Application Program Interface  There are two standard application program interfaces to XML data:  SAX (Simple API for XML)  Based on parser model, user provides event handlers for parsing events  E.g. start of element, end of element  Not suitable for database applications  DOM (Document Object Model)  XML data is parsed into a tree representation  Variety of functions provided for traversing the DOM tree  E.g.: Java DOM API provides Node class with methods getParentNode( ), getFirstChild( ), getNextSibling( ) getAttribute( ), getData( ) (for text node) getElementsByTagName( ), …  Also provides functions for updating DOM tree
  • 676. Storage of XML Data  XML data can be stored in  Non-relational data stores  Flat files  Natural for storing XML  But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)  XML database  Database built specifically for storing XML data, supporting DOM model and declarative querying  Currently no commercial-grade systems  Relational databases  Data must be translated into relational form  Advantage: mature database systems
  • 677. Storage of XML in Relational Databases  Alternatives:  String Representation  Tree Representation  Map to relations
• 678. String Representation  Store each top level element as a string field of a tuple in a relational database  Use a single relation to store all elements, or  Use a separate relation for each top-level element type  E.g. account, customer, depositor relations  Each with a string-valued attribute to store the element  Indexing:  Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields (see the sketch below)
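A minimal DDL sketch of this scheme (relation and column names are assumptions for illustration): one relation per top-level element type, a CLOB holding the element text, plus extracted fields for indexing:
  create table account_elements (
    data           clob,         -- the full <account> element, stored as a string
    account_number varchar(20)   -- subelement value extracted for indexing
  );
  create index account_elements_idx on account_elements (account_number);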
  • 679. String Representation (Cont.)  Benefits:  Can store any XML data even without DTD  As long as there are many top-level elements in a document, strings are small compared to full document  Allows fast access to individual elements.  Drawback: Need to parse strings to access values inside the elements  Parsing is slow.
• 680. Tree Representation  Tree representation: model XML data as a tree and store it using relations  nodes(id, type, label, value)  child(child-id, parent-id)  [Figure: tree for a bank document, with nodes such as bank (id:1), account (id:2), customer (id:5), customer-name (id:3) and account-number (id:7)]  Each element/attribute is given a unique identifier  Type indicates element/attribute
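A minimal sketch of the two relations named on the slide (column types are assumptions):
  create table nodes (
    id    integer primary key,
    type  varchar(10),   -- 'element' or 'attribute'
    label varchar(20),   -- tag or attribute name
    value varchar(100)   -- text content, if any
  );
  create table child (
    child_id  integer references nodes (id),
    parent_id integer references nodes (id)
  );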
  • 681. Tree Representation (Cont.)  Benefit: Can store any XML data, even without DTD  Drawbacks:  Data is broken up into too many pieces, increasing space overheads  Even simple queries require a large number of joins, which can be slow
  • 682. Mapping XML Data to Relations  Map to relations  If DTD of document is known, can map data to relations  A relation is created for each element type  Elements (of type #PCDATA), and attributes are mapped to attributes of relations  More details on next slide …  Benefits:  Efficient storage  Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML  Drawbacks: need to know DTD,
  • 683. Mapping XML Data to Relations  Relation created for each element type (Cont.) contains  An id attribute to store a unique id for each element  A relation attribute corresponding to each element attribute  A parent-id attribute to keep track of parent element  As in the tree representation  Position information (ith child) can be store too  All subelements that occur only once can become relation attributes
• 684. Mapping XML Data to Relations (Cont.)  E.g. For bank-1 DTD with account elements nested within customer elements, create relations  customer(id, parent-id, customer-name, customer-street, customer-city)  parent-id can be dropped here since parent is the sole root element  All other attributes were subelements of type #PCDATA, and occur only once  account (id, parent-id, account-number, branch-name, balance)  parent-id keeps track of which customer an account occurs under  Same account may be represented many times with different parents
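A hedged DDL sketch of these two relations (column types are assumptions):
  create table customer (
    id              integer primary key,
    customer_name   varchar(20),
    customer_street varchar(20),
    customer_city   varchar(20)
  );
  create table account (
    id             integer primary key,
    parent_id      integer references customer (id),  -- which customer the account is nested under
    account_number varchar(20),
    branch_name    varchar(20),
    balance        numeric(12,2)
  );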
• 685. Chapter 11: Storage and File Structure  Overview of Physical Storage Media  Magnetic Disks  RAID  Tertiary Storage  Storage Access  File Organization  Organization of Records in Files  Data-Dictionary Storage  Storage Structures for Object-Oriented Databases
• 686. Classification of Physical Storage Media  Speed with which data can be accessed  Cost per unit of data  Reliability  data loss on power failure or system crash  physical failure of the storage device  Can differentiate storage into:  volatile storage: loses contents when power is switched off  non-volatile storage:  Contents persist even when power is switched off.  Includes secondary and tertiary storage, as well as battery-backed up main memory
• 687. Physical Storage Media  Cache – fastest and most costly form of storage; volatile; managed by the computer system hardware.  Main memory:  fast access (10s to 100s of nanoseconds; 1 nanosecond = 10–9 seconds)  generally too small (or too expensive) to store the entire database  capacities of up to a few Gigabytes widely used currently  Capacities have gone up and per-byte costs have decreased steadily
• 688. Physical Storage Media (Cont.)  Flash memory  Data survives power failure  Data can be written at a location only once, but location can be erased and written to again  Can support only a limited number of write/erase cycles.  Erasing of memory has to be done to an entire bank of memory  Reads are roughly as fast as main memory  But writes are slow (few microseconds), and erase is slower
  • 689. Physical Storage Media (Cont.)  Magnetic-disk  Data is stored on spinning disk, and read/written magnetically  Primary medium for the long-term storage of data; typically stores entire database.  Data must be moved from disk to main memory for access, and written back for storage  Much slower access than main memory (more on this later)  direct-access – possible to read data on disk in any order, unlike magnetic tape  Hard disks vs floppy disks  Capacities range up to roughly 100 GB currently
• 690. Physical Storage Media (Cont.)  Optical storage  non-volatile, data is read optically from a spinning disk using a laser  CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms  Write-once, read-many (WORM) optical disks used for archival storage (CD-R and DVD-R)  Multiple write versions also available (CD-RW, DVD-RW, and DVD-RAM)  Reads and writes are slower than with magnetic disk  Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks
• 691. Physical Storage Media (Cont.)  Tape storage  non-volatile, used primarily for backup (to recover from disk failure), and for archival data  sequential-access – much slower than disk  very high capacity (40 to 300 GB tapes available)  tape can be removed from drive  storage costs much cheaper than disk, but drives are expensive
• 693. Storage Hierarchy (Cont.)  primary storage: Fastest media but volatile (cache, main memory).  secondary storage: next level in hierarchy, non-volatile, moderately fast access time  also called on-line storage  E.g. flash memory, magnetic disks  tertiary storage: lowest level in hierarchy, non-volatile, slow access time  also called off-line storage  E.g. magnetic tape, optical storage
  • 694. Magnetic Hard Disk Mechanism NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
  • 695. Magnetic Disks  Read-write head  Positioned very close to the platter surface (almost touching it)  Reads or writes magnetically encoded information.  Surface of platter divided into circular tracks  Over 16,000 tracks per platter on typical hard disks  Each track is divided into sectors.  A sector is the smallest unit of data that can be read or written.  Sector size typically 512 bytes  Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)  To read/write a sector
• 696. Magnetic Disks (Cont.)  Earlier generation disks were susceptible to head-crashes  Surface of earlier generation disks had metal-oxide coatings which would disintegrate on head crash and damage all data on disk  Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted  Disk controller – interfaces between the computer system and the disk drive hardware.  accepts high-level commands to read or write a sector
• 697. Disk Subsystem  Multiple disks connected to a computer system through a controller  Controller functionality (checksum, bad sector remapping) often carried out by individual disks; reduces load on controller  Disk interface standards families  ATA (AT adaptor) range of standards  SCSI (Small Computer System Interconnect) range of standards
• 698. Performance Measures of Disks  Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of:  Seek time – time it takes to reposition the arm over the correct track.  Average seek time is 1/2 the worst case seek time.  Would be 1/3 if all tracks had the same number of sectors, and we ignore the time to start and stop arm movement  4 to 10 milliseconds on typical disks  Rotational latency – time it takes for the sector to be accessed to appear under the head.  Average latency is 1/2 of the worst case latency.  4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.)
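As a rough worked example using the figures above (the specific numbers are illustrative assumptions): with an average seek of 8 ms on a 7200 r.p.m. drive, one rotation takes 60000/7200 ≈ 8.33 ms, so average rotational latency ≈ 4.17 ms, and the access time before data transfer begins ≈ 8 + 4.17 ≈ 12 ms.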
• 699. Performance Measures (Cont.)  Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any failure.  Typically 3 to 5 years  Probability of failure of new disks is quite low, corresponding to a "theoretical MTTF" of 30,000 to 1,200,000 hours for a new disk  E.g., an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on average one will fail every 1200 hours
• 700. Optimization of Disk-Block Access  Block – a contiguous sequence of sectors from a single track  data is transferred between disk and main memory in blocks  sizes range from 512 bytes to several kilobytes  Smaller blocks: more transfers from disk  Larger blocks: more space wasted due to partially filled blocks  Typical block sizes today range from 4 to 16 kilobytes  Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized
  • 701. Optimization of Disk Block Access (Cont.)  File organization – optimize block access time by organizing the blocks to correspond to how data will be accessed  E.g. Store related information on the same or nearby cylinders.  Files may get fragmented over time  E.g. if data is inserted to/deleted from the file  Or free blocks on disk are scattered, and newly created file has its blocks scattered
• 702. Optimization of Disk Block Access (Cont.)  Nonvolatile write buffers speed up disk writes by writing blocks to a non-volatile RAM buffer immediately  Non-volatile RAM: battery backed up RAM or flash memory  Even if power fails, the data is safe and will be written to disk when power returns  Controller then writes to disk whenever the disk has no other requests or request has been pending for some time  Database operations that require data to be safely stored before continuing can continue without waiting for data to be written to disk  Writes can be reordered to minimize disk arm movement  Log disk – a disk devoted to writing a sequential log of block updates
• 703. RAID  RAID: Redundant Arrays of Independent Disks  disk organization techniques that manage a large number of disks, providing a view of a single disk of  high capacity and high speed by using multiple disks in parallel, and  high reliability by storing data redundantly, so that data can be recovered even if a disk fails  The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail.  E.g., a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of 1000 hours (approx. 41 days)
  • 704. Improvement of Reliability via Redundancy  Redundancy – store extra information that can be used to rebuild information lost in a disk failure  E.g., Mirroring (or shadowing)  Duplicate every disk. Logical disk consists of two physical disks.  Every write is carried out on both disks  Reads can take place from either disk  If one disk in a pair fails, data still available in the other  Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired  Probability of combined event is very small  Except for dependent failure modes such as fire
• 705. Improvement in Performance via Parallelism  Two main goals of parallelism in a disk system: 1. Load balance multiple small accesses to increase throughput 2. Parallelize large accesses to reduce response time.  Improve transfer rate by striping data across multiple disks.  Bit-level striping – split the bits of each byte across multiple disks  In an array of eight disks, write bit i of each byte to disk i.  Each access can read data at eight times the rate of a single disk.
• 706. RAID Levels  Schemes to provide redundancy at lower cost by using disk striping combined with parity bits  Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics  RAID Level 0: Block striping; non-redundant.  Used in high-performance applications where data loss is not critical.  RAID Level 1: Mirrored disks with block striping  Offers best write performance.  Popular for applications such as storing log files in a database system.
  • 707. RAID Levels (Cont.)  RAID Level 2: Memory-Style Error- Correcting-Codes (ECC) with bit striping.  RAID Level 3: Bit-Interleaved Parity  a single parity bit is enough for error correction, not just detection, since we know which disk has failed  When writing data, corresponding parity bits must also be computed and written to a parity bit disk  To recover data in a damaged disk, compute XOR of bits from other disks (including parity bit disk)
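A small worked instance of the parity idea (bit values are illustrative): with data bits 1, 0, 1 on disks 1-3, the parity disk stores 1 XOR 0 XOR 1 = 0. If disk 2 fails, its bit is recovered as the XOR of the surviving bits and the parity bit: 1 XOR 1 XOR 0 = 0, the lost value.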
  • 708. RAID Levels (Cont.)  RAID Level 3 (Cont.)  Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O.  Subsumes Level 2 (provides all its benefits, at lower cost).  RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from N other disks.  When writing data block, corresponding block of parity bits must also be computed
• 709. RAID Levels (Cont.)  RAID Level 4 (Cont.)  Provides higher I/O rates for independent block reads than Level 3  block read goes to a single disk, so blocks stored on different disks can be read in parallel  Provides higher transfer rates for reads of multiple blocks than no-striping  Before writing a block, parity data must be computed  Can be done by using old parity block, old value of current block and new value of the block
  • 710. RAID Levels (Cont.)  RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, rather than storing data in N disks and parity in 1 disk.  E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks.
• 711. RAID Levels (Cont.)  RAID Level 5 (Cont.)  Higher I/O rates than Level 4.  Block writes occur in parallel if the blocks and their parity blocks are on different disks.  Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.  RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against multiple disk failures
• 712. Choice of RAID Level  Factors in choosing RAID level  Monetary cost  Performance: Number of I/O operations per second, and bandwidth during normal operation  Performance during failure  Performance during rebuild of failed disk  Including time taken to rebuild failed disk  RAID 0 is used only when data safety is not important  E.g. data can be recovered quickly from other sources  Level 2 and 4 never used since they are subsumed by 3 and 5  Level 3 is not used anymore since bit-striping forces single block reads to access all disks, wasting disk arm movement
• 713. Choice of RAID Level (Cont.)  Level 1 provides much better write performance than level 5  Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes  Level 1 preferred for high update environments such as log disks  Level 1 has higher storage cost than level 5  disk drive capacities increasing rapidly (50%/year) whereas disk access times have decreased much less (x 3 in 10 years)  I/O requirements have increased greatly, e.g. for Web servers
• 714. Hardware Issues  Software RAID: RAID implementations done entirely in software, with no special hardware support  Hardware RAID: RAID implementations with special hardware  Use non-volatile RAM to record writes that are being executed  Beware: power failure during write can result in corrupted disk  E.g. failure after writing one block but before writing the second in a mirrored pair
  • 715. Hardware Issues (Cont.)  Hot swapping: replacement of disk while system is running, without power down  Supported by some hardware RAID systems,  reduces time to recovery, and improves availability greatly  Many systems maintain spare disks which are kept online, and used as replacements for failed disks immediately on detection of failure
• 716. Optical Disks  Compact disk-read only memory (CD-ROM)  Disks can be loaded into or removed from a drive  High storage capacity (640 MB per disk)  High seek times of about 100 msec (optical read head is heavier and slower)  Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s) compared to magnetic disks  Digital Video Disk (DVD)  DVD-5 holds 4.7 GB, and DVD-9 holds 8.5 GB  DVD-10 and DVD-18 are double sided
• 717. Magnetic Tapes  Hold large volumes of data and provide high transfer rates  Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT (Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB with Ampex helical scan format  Transfer rates from few to 10s of MB/s  Currently the cheapest storage medium  Tapes are cheap, but cost of drives is very high  Very slow access time in comparison to magnetic disks and optical disks  limited to sequential access.  Some formats (Accelis) provide faster seek (10s of seconds) at cost of lower capacity
  • 718. Storage Access  A database file is partitioned into fixed-length storage units called blocks. Blocks are units of both storage allocation and data transfer.  Database system seeks to minimize the number of block transfers between the disk and memory. We can reduce the number of disk accesses by keeping as many blocks as possible in main memory.
  • 719. Buffer Manager  Programs call on the buffer manager when they need a block from disk. 1. If the block is already in the buffer, the requesting program is given the address of the block in main memory 2. If the block is not in the buffer, 1. the buffer manager allocates space in the buffer for the block, replacing (throwing out) some other block, if required, to make space for the new block. 2. The block that is thrown out is written back to disk only if it was modified since the most recent time that it was written to/fetched from the disk.
  • 720. Buffer-Replacement Policies  Most operating systems replace the block least recently used (LRU strategy)  Idea behind LRU – use past pattern of block references as a predictor of future references  Queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user‘s query to predict future references  LRU can be a bad strategy for certain access patterns involving repeated scans of data
• 721. Buffer-Replacement Policies (Cont.)  Pinned block – memory block that is not allowed to be written back to disk.  Toss-immediate strategy – frees the space occupied by a block as soon as the final tuple of that block has been processed  Most recently used (MRU) strategy – system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block.
• 722. File Organization  The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields.  One approach:  assume record size is fixed  each file has records of one particular type only  different files are used for different relations  This case is easiest to implement; variable-length records are considered later
• 723. Fixed-Length Records  Simple approach:  Store record i starting from byte n ∗ (i – 1), where n is the size of each record.  Record access is simple but records may cross blocks  Modification: do not allow records to cross block boundaries  Deletion of record i: alternatives:  move records i + 1, . . ., n to i, . . . , n – 1  move record n to i  do not move records, but link all free records on a free list
• 724. Free Lists  Store the address of the first deleted record in the file header.  Use this first record to store the address of the second deleted record, and so on  Can think of these stored addresses as pointers since they "point" to the location of a record.  More space efficient representation: reuse space for normal attributes of free records to store pointers. (No pointers stored in in-use records.)
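A minimal Python sketch of this free-list idea (the class and its in-memory list stand in for the file and its header; names are illustrative): deleted slots store the index of the next free slot, exactly the pointer reuse described above.

    class FixedLengthFile:
        FREE_END = -1                       # "null" pointer ending the free list

        def __init__(self):
            self.slots = []                 # record i lives in self.slots[i]
            self.free_head = self.FREE_END  # head pointer, kept in the file header

        def insert(self, record):
            if self.free_head != self.FREE_END:   # reuse a deleted slot
                i = self.free_head
                self.free_head = self.slots[i]    # follow the stored pointer
                self.slots[i] = record
            else:                                 # no free slot: append at end
                i = len(self.slots)
                self.slots.append(record)
            return i

        def delete(self, i):
            self.slots[i] = self.free_head        # record space reused as pointer
            self.free_head = i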
• 725. Variable-Length Records  Variable-length records arise in database systems in several ways:  Storage of multiple record types in a file.  Record types that allow variable lengths for one or more fields.  Record types that allow repeating fields (used in some older data models).  Byte string representation  Attach an end-of-record (⊥) control character to the end of each record  Difficulty with deletion  Difficulty with growth
  • 726. Variable-Length Records: Slotted Page Structure  Slotted page header contains:  number of record entries  end of free space in the block  location and size of each record  Records can be moved around within a page to keep them contiguous with no empty space between them; entry in the header must be updated.
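The header bookkeeping can be sketched in a few lines of Python (illustrative layout, no free-space overflow check): records grow backwards from the end of the block, and the header keeps one (offset, size) entry per record.

    class SlottedPage:
        def __init__(self, block_size=4096):
            self.data = bytearray(block_size)
            self.slots = []                  # header: (offset, size) per record
            self.free_end = block_size       # end of free space in the block

        def insert(self, record):
            offset = self.free_end - len(record)
            self.data[offset:self.free_end] = record
            self.free_end = offset
            self.slots.append((offset, len(record)))
            return len(self.slots) - 1       # slot number identifies the record

        def get(self, slot):
            offset, size = self.slots[slot]
            return bytes(self.data[offset:offset + size])

Because records are addressed only through the slot array, they can be moved within the page (e.g., to compact free space) by updating just the header entries.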
• 727. Variable-Length Records (Cont.)  Fixed-length representation:  reserved space  pointers  Reserved space – can use fixed-length records of a known maximum length; unused space in shorter records filled with a null or end-of-record symbol.
  • 728. Pointer Method  Pointer method  A variable-length record is represented by a list of fixed-length records, chained together via pointers.  Can be used even if the maximum record length is not known
• 729. Pointer Method (Cont.)  Disadvantage of pointer structure: space is wasted in all records except the first in a chain.  Solution is to allow two kinds of block in file:  Anchor block – contains the first records of chains  Overflow block – contains records other than those that are the first records of chains.
  • 730. Organization of Records in Files  Heap – a record can be placed anywhere in the file where there is space  Sequential – store records in sequential order, based on the value of the search key of each record  Hashing – a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed
  • 731. Sequential File Organization  Suitable for applications that require sequential processing of the entire file  The records in the file are ordered by a search-key
• 732. Sequential File Organization (Cont.)  Deletion – use pointer chains  Insertion – locate the position where the record is to be inserted  if there is free space insert there  if no free space, insert the record in an overflow block  In either case, pointer chain must be updated  Need to reorganize the file from time to time to restore sequential order
• 733. Clustering File Organization  Simple file structure stores each relation in a separate file  Can instead store several relations in one file using a clustering file organization  E.g., clustering organization of customer and depositor:  good for queries involving depositor ⋈ customer, and for queries involving one single customer and his accounts  bad for queries involving only customer  results in variable size records
  • 734. Data Dictionary Storage Data dictionary (also called system catalog) stores metadata: that is, data about data, such as  Information about relations  names of relations  names and types of attributes of each relation  names and definitions of views  integrity constraints  User and accounting information, including passwords  Statistical and descriptive data  number of tuples in each relation  Physical file organization information  How relation is stored (sequential/hash/…)  Physical location of relation
• 735. Data Dictionary Storage (Cont.)  Catalog structure: can use either  specialized data structures designed for efficient access  a set of relations, with existing system features used to ensure efficient access  The latter alternative is usually preferred  A possible catalog representation: Relation-metadata = (relation-name, number-of-attributes, storage-organization, location) Attribute-metadata = (attribute-name, relation-name, domain-type, position, length) User-metadata = (user-name, encrypted-password, group) Index-metadata = (index-name, relation-name, index-type, index-attributes) View-metadata = (view-name, definition)
• 736. Mapping of Objects to Files  Mapping objects to files is similar to mapping tuples to files in a relational system; object data can be stored using file structures.  Objects in O-O databases may lack uniformity and may be very large; such objects have to be managed differently from records in a relational system.  Set fields with a small number of elements may be implemented using data structures such as linked lists.  Set fields with a larger number of elements may be implemented as B-trees or as separate relations
• 737. Mapping of Objects to Files (Cont.)  Objects are identified by an object identifier (OID); the storage system needs a mechanism to locate an object given its OID (this action is called dereferencing).  logical identifiers do not directly specify an object's physical location; must maintain an index that maps an OID to the object's actual location.  physical identifiers encode the location of the object so the object can be found directly. Physical OIDs typically have the following parts: 1. a volume or file identifier 2. a page identifier within the volume or file 3. an offset within the page
• 738. Management of Persistent Pointers  Physical OIDs may include a unique identifier. This identifier is stored in the object also and is used to detect references via dangling pointers.
• 739. Management of Persistent Pointers (Cont.)  Implement persistent pointers using OIDs; persistent pointers are substantially longer than in-memory pointers  Pointer swizzling cuts down on cost of locating persistent objects already in memory.  Software swizzling (swizzling on pointer dereference)  When a persistent pointer is first dereferenced, the pointer is swizzled (replaced by an in-memory pointer) after the object is located in memory.
• 740. Hardware Swizzling  With hardware swizzling, persistent pointers in objects need the same amount of space as in-memory pointers — extra storage external to the object is used to store the rest of the pointer information.  Uses virtual memory translation mechanism to efficiently and transparently convert between persistent pointers and in-memory pointers.  All persistent pointers in a page are swizzled when the page is first brought into memory
  • 741. Hardware Swizzling  Persistent pointer is conceptually split into two parts: a page identifier, and an offset within the page.  The page identifier in a pointer is a short indirect pointer: Each page has a translation table that provides a mapping from the short page identifiers to full database page identifiers.  Translation table for a page is small (at most 1024 pointers in a 4096 byte page with 4 byte pointer)
  • 742. Hardware Swizzling (Cont.)  Page image before swizzling (page located on disk)
• 743. Hardware Swizzling (Cont.)  When system loads a page into memory the persistent pointers in the page are swizzled as described below 1. Persistent pointers in each object in the page are located using object type information 2. For each persistent pointer (pi, oi) find its full page ID Pi 1. If Pi does not already have a virtual memory page allocated to it, allocate a virtual memory page to Pi and read-protect the page  Note: there need not be any physical space (whether in memory or on disk swap-space) allocated for the virtual memory page at this point. Space can be allocated later if (and when) the page is actually accessed.
• 744. Hardware Swizzling (Cont.)  When an in-memory pointer is dereferenced, if the operating system detects the page it points to has not yet been allocated storage, or is read-protected, a segmentation violation occurs.  Memory-mapping facilities such as the Unix mmap() call, combined with a handler for the segmentation violation, let the system run a function when this happens  The function does the following when it is invoked 1. Allocate storage (swap-space) for the page containing the referenced address
  • 745. Hardware Swizzling (Cont.) Page image after swizzling  Page with short page identifier 2395 was allocated address 5001. Observe change in pointers and translation table.
• 746. Hardware Swizzling (Cont.)  After swizzling, all short page identifiers point to virtual memory addresses allocated for the corresponding pages  functions accessing the objects are not even aware that the objects have persistent pointers, and do not need to be changed in any way!  can reuse existing code and libraries that use in-memory pointers  After this, the pointer dereference that triggered the swizzling can continue  Optimizations:  If all pages are allocated the same address as in the short page identifier, no changes to the page are needed at all
• 747. Disk versus Memory Structure of Objects  The format in which objects are stored in memory may be different from the format in which they are stored on disk in the database. Reasons are:  software swizzling – structure of persistent and in-memory pointers are different  database accessible from different machines, with different data representations  Make the physical representation of objects in the database independent of the machine and the compiler.  Can transparently convert from the disk representation to the form required on the specific machine
• 748. Large Objects  Large objects: binary large objects (blobs) and character large objects (clobs)  Examples include:  text documents  graphical data such as images and computer-aided designs  audio and video data  Large objects may need to be stored in a contiguous sequence of bytes when brought into memory.
• 749. Modifying Large Objects  If the application requires insert/delete of bytes from specified regions of an object:  B+-tree file organization (described later in Chapter 12) can be modified to represent large objects  Each leaf page of the tree stores between half and 1 page worth of data from the object  Special-purpose application programs outside the database are used to manipulate large objects, e.g. text data treated as a byte string manipulated by editors and formatters
  • 752. File of Figure 11.6, with Record 2 Deleted and All Records Moved
  • 753. File of Figure 11.6, With Record 2 deleted and Final Record Moved
  • 756. Clustering File Structure With Pointer Chains
• 761. Chapter 12: Indexing and Hashing  Basic Concepts  Ordered Indices  B+-Tree Index Files  B-Tree Index Files  Static Hashing  Dynamic Hashing  Comparison of Ordered Indexing and Hashing  Multiple-Key Access  Index Definition in SQL
• 762. Basic Concepts  Indexing mechanisms used to speed up access to desired data.  E.g., author catalog in library  Search Key – attribute or set of attributes used to look up records in a file.  An index file consists of records (called index entries) of the form search-key | pointer  Index files are typically much smaller than the original file
  • 763. Index Evaluation Metrics  Access types supported efficiently. E.g.,  records with a specified value in the attribute  or records with an attribute value falling in a specified range of values.  Access time  Insertion time  Deletion time  Space overhead
• 764. Ordered Indices  In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.  Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.  Also called clustering index  The search key of a primary index is usually but not necessarily the primary key.  Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.
  • 765. Dense Index Files  Dense index — Index record appears for every search-key value in the file.
  • 766. Sparse Index Files  Sparse Index: contains index records for only some search-key values.  Applicable when records are sequentially ordered on search-key  To locate a record with search-key value K we:  Find index record with largest search-key value < K  Search file sequentially starting at the record to which the index record points  Less space and less maintenance
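A compact Python sketch of this lookup (the list-based structures are illustrative): binary-search the index for the last entry whose key is ≤ K, then scan the file sequentially from the block that entry points to.

    import bisect

    def sparse_lookup(index_keys, index_blocks, blocks, k):
        # index_keys[i] is the first search key stored in block index_blocks[i]
        i = bisect.bisect_right(index_keys, k) - 1    # last index entry with key <= k
        if i < 0:
            return None
        for b in range(index_blocks[i], len(blocks)): # sequential scan from there
            for key, record in blocks[b]:
                if key == k:
                    return record
                if key > k:
                    return None                       # passed k: record not present
        return None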
  • 767. Example of Sparse Index Files
• 768. Multilevel Index  If primary index does not fit in memory, access becomes expensive.  To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it.  outer index – a sparse index of primary index  inner index – the primary index file  If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on
• 770. Index Update: Deletion  If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.  Single-level index deletion:  Dense indices – deletion of search-key is similar to file record deletion.  Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order); if the next search-key value already has an index entry, the entry is deleted instead of being replaced.
  • 771. Index Update: Insertion  Single-level index insertion:  Perform a lookup using the search-key value appearing in the record to be inserted.  Dense indices – if the search-key value does not appear in the index, insert it.  Sparse indices – if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.
• 772. Secondary Indices  Frequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition.  Example 1: In the account database stored sequentially by account number, we may want to find all accounts in a particular branch  Example 2: as above, but where we want to find all accounts with a specified balance or range of balances
  • 773. Secondary Index on balance field of account
• 774. Primary and Secondary Indices  Secondary indices have to be dense.  Indices offer substantial benefits when searching for records.  When a file is modified, every index on the file must be updated; updating indices imposes overhead on database modification.  Sequential scan using primary index is efficient, but a sequential scan using a secondary index is expensive
  • 775. B+-Tree Index Files B+-tree indices are an alternative to indexed-sequential files.  Disadvantage of indexed-sequential files: performance degrades as file grows, since many overflow blocks get created. Periodic reorganization of entire file is required.  Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the face of insertions and deletions. Reorganization of entire file is not required to maintain performance.
• 776. B+-Tree Index Files (Cont.)  A B+-tree is a rooted tree satisfying the following properties:  All paths from root to leaf are of the same length  Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.  A leaf node has between ⌈(n–1)/2⌉ and n–1 values  Special cases:  If the root is not a leaf, it has at least 2 children.  If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values.
  • 777. B+-Tree Node Structure  Typical node  Ki are the search-key values  Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).  The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn–1
• 778. Leaf Nodes in B+-Trees  Properties of a leaf node:  For i = 1, 2, . . ., n–1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. Only need bucket structure if search-key does not form a primary key.  If Li, Lj are leaf nodes and i < j, Li's search-key values are less than Lj's search-key values  Pn points to the next leaf node in search-key order
• 779. Non-Leaf Nodes in B+-Trees  Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:  All the search-keys in the subtree to which P1 points are less than K1  For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki–1 and less than Ki  All the search-keys in the subtree to which Pm points are greater than or equal to Km–1
  • 780. Example of a B+-tree B+-tree for account file (n = 3)
• 781. Example of B+-tree  B+-tree for account file (n = 5)  Leaf nodes must have between 2 and 4 values (⌈(n–1)/2⌉ and n–1, with n = 5).  Non-leaf nodes other than root must have between 3 and 5 children (⌈n/2⌉ and n, with n = 5).  Root must have at least 2 children.
• 782. Observations about B+-trees  Since the inter-node connections are done by pointers, "logically" close blocks need not be "physically" close.  The non-leaf levels of the B+-tree form a hierarchy of sparse indices.  The B+-tree contains a relatively small number of levels (logarithmic in the size of the main file), thus searches can be conducted efficiently.  Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time.
• 783. Queries on B+-Trees  Find all records with a search-key value of k. 1. Start with the root node 1. Examine the node for the smallest search-key value > k. 2. If such a value exists, assume it is Kj. Then follow Pj to the child node 3. Otherwise k ≥ Km–1, where there are m pointers in the node. Then follow Pm to the child node. 2. If the node reached by following the pointer above is not a leaf node, repeat the above procedure on the node, and follow the corresponding pointer. 3. Eventually reach a leaf node; if a key in it equals k, follow the corresponding pointer to the record or bucket, else no record with search-key value k exists.
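The procedure above translates almost line-for-line into code. A minimal Python sketch (node layout is illustrative: keys holds K1..Km–1, children holds P1..Pm for inner nodes, and leaves carry parallel keys/pointers lists):

    import bisect

    def btree_find(node, k):
        while not node.is_leaf:
            # index of the smallest key > k; if there is none this equals m-1,
            # i.e. k >= K_{m-1}, and we follow the last pointer Pm
            j = bisect.bisect_right(node.keys, k)
            node = node.children[j]
        # in the leaf, look for k itself
        j = bisect.bisect_left(node.keys, k)
        if j < len(node.keys) and node.keys[j] == k:
            return node.pointers[j]          # record, or bucket of records
        return None                          # no record with search-key k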
• 784. Queries on B+-Trees (Cont.)  In processing a query, a path is traversed in the tree from the root to some leaf node.  If there are K search-key values in the file, the path is no longer than ⌈log⌈n/2⌉(K)⌉.  A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry).  With 1 million search key values and n = 100, at most log50(1,000,000) = 4 nodes are accessed in a lookup.
  • 785. Updates on B+-Trees: Insertion  Find the leaf node in which the search-key value would appear  If the search-key value is already there in the leaf node, record is added to file and if necessary a pointer is inserted into the bucket.  If the search-key value is not there, then add the record to the main file and create a bucket if necessary. Then:
• 786. Updates on B+-Trees: Insertion (Cont.)  Splitting a node:  take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.  let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.  The splitting of nodes proceeds upwards till a node that is not full is found.  The next slide shows the result of splitting the node containing Brighton and Downtown on inserting Clearview.
  • 787. Updates on B+-Trees: Insertion (Cont.) B+-Tree before and after insertion of “Clearview”
• 788. Updates on B+-Trees: Deletion  Find the record to be deleted, and remove it from the main file and from the bucket (if present)  Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty  If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then  Insert all the search-key values in the two nodes into a single node (the one on the left), and delete the other node.
• 789. Updates on B+-Trees: Deletion  Otherwise, if the node has too few entries due to the removal, but the entries in the node and a sibling do not fit into a single node, then  Redistribute the pointers between the node and a sibling such that both have more than the minimum number of entries.  Update the corresponding search-key value in the parent of the node.  The node deletions may cascade upwards until a node with enough pointers is found.
• 790. Examples of B+-Tree Deletion  Before and after deleting "Downtown"  The removal of the leaf node containing "Downtown" did not result in its parent having too few pointers. So the cascaded deletions stopped with the deleted leaf node's parent.
• 791. Examples of B+-Tree Deletion (Cont.)  Deletion of "Perryridge" from result of previous example  Node with "Perryridge" becomes underfull (actually empty, in this special case) and is merged with its sibling.
• 792. Example of B+-tree Deletion (Cont.)  Before and after deletion of "Perryridge" from earlier example  Parent of leaf containing Perryridge became underfull, and borrowed a pointer from its left sibling  Search-key value in the parent's parent changes as a result
• 793. B+-Tree File Organization  Index file degradation problem is solved by using B+-Tree indices. Data file degradation problem is solved by using B+-Tree File Organization.  The leaf nodes in a B+-tree file organization store records, instead of pointers.  Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node.
• 794. B+-Tree File Organization (Cont.)  Example of B+-tree File Organization  Good space utilization important since records use more space than pointers.  To improve space utilization, involve more sibling nodes in redistribution during splits and merges  Involving 2 siblings in redistribution (to avoid split / merge where possible) results in each node having at least ⌊2n/3⌋ entries
  • 795. B-Tree Index Files  Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant storage of search keys.  Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a nonleaf node must be included.  Generalized B-tree leaf node  Nonleaf node – pointers Bi are the bucket or file record pointers.
  • 796. B-Tree Index File Example B-tree (above) and B+-tree (below) on same data
  • 797. B-Tree Index Files (Cont.)  Advantages of B-Tree indices:  May use less tree nodes than a corresponding B+-Tree.  Sometimes possible to find search-key value before reaching leaf node.  Disadvantages of B-Tree indices:  Only small fraction of all search-key values are found early  Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have greater depth than corresponding B+-Tree  Insertion and deletion more complicated than in B+-Trees
  • 798. Static Hashing  A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).  In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.  Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.  Hash function is used to locate records for access, insertion as well as deletion.
  • 799. Example of Hash File Organization (Cont.) Hash file organization of account file, using branch-name as key (See figure in next slide.)  There are 10 buckets,  The binary representation of the ith character is assumed to be the integer i.  The hash function returns the sum of the binary representations of the characters modulo 10  E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3
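For illustration, the example hash function in Python (a sketch: ord() gives real character codes, whereas the slide's toy encoding maps the i-th character to the integer i, so concrete bucket numbers may differ from the slide's h(Perryridge) = 5):

    def h(branch_name, n_buckets=10):
        # sum of the character codes, modulo the number of buckets
        return sum(ord(c) for c in branch_name) % n_buckets

    buckets = [[] for _ in range(10)]
    for record in [("Perryridge", "A-102", 400), ("Round Hill", "A-305", 350)]:
        buckets[h(record[0])].append(record)   # place each record in its bucket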
  • 800. Example of Hash File Organization Hash file organization of account file, using branch-name as key (see previous slide for details).
• 801. Hash Functions  Worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file.  An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values.  Ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.
• 802. Handling of Bucket Overflows  Bucket overflow can occur because of  Insufficient buckets  Skew in distribution of records. This can occur due to two reasons:  multiple records have same search-key value  chosen hash function produces non-uniform distribution of key values  Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
  • 803. Handling of Bucket Overflows (Cont.)  Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list.  Above scheme is called closed hashing.  An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.
• 804. Hash Indices  Hashing can be used not only for file organization, but also for index-structure creation.  A hash index organizes the search keys, with their associated record pointers, into a hash file structure.  Strictly speaking, hash indices are always secondary indices  if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessary.
  • 805. Example of Hash Index
• 806. Deficiencies of Static Hashing  In static hashing, function h maps search-key values to a fixed set B of bucket addresses.  Databases grow with time. If initial number of buckets is too small, performance will degrade due to too many overflows.  If file size at some point in the future is anticipated and number of buckets allocated accordingly, significant amount of space will be wasted initially.  If database shrinks, again space will be wasted.  One option is periodic re-organization of the file with a new hash function, but this is expensive.
• 807. Dynamic Hashing  Good for database that grows and shrinks in size  Allows the hash function to be modified dynamically  Extendable hashing – one form of dynamic hashing  Hash function generates values over a large range — typically b-bit integers, with b = 32.  At any time use only a prefix of the hash function to index into a table of bucket addresses.  Let the length of the prefix be i bits, 0 ≤ i ≤ 32.  Bucket address table size = 2^i. Initially i = 0  Value of i grows and shrinks as the size of the database grows and shrinks.
  • 808. General Extendable Hash Structure In this structure, i2 = i3 = i, whereas i1 = i – 1 (see next slide for details)
• 809. Use of Extendable Hash Structure  Each bucket j stores a value ij; all the entries that point to the same bucket have the same values on the first ij bits.  To locate the bucket containing search-key Kj: 1. Compute h(Kj) = X 2. Use the first i high order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket  To insert a record with search-key value Kj  follow same procedure as look-up and locate the bucket, say j.  If there is room in bucket j insert the record there; else split the bucket and re-attempt insertion (next slide)
• 810. Updates in Extendable Hash Structure  To split a bucket j when inserting record with search-key value Kj:  If i > ij (more than one pointer to bucket j)  allocate a new bucket z, and set ij and iz to the old ij + 1.  make the second half of the bucket address table entries pointing to j point to z  remove and reinsert each record in bucket j.  recompute new bucket for Kj and insert record in the bucket (further splitting is required if the bucket is still full)  If i = ij (only one pointer to bucket j)  increment i and double the size of the bucket address table.  replace each entry in the table by two entries that point to the same bucket; now i > ij, so use the first case above.
• 811. Updates in Extendable Hash Structure (Cont.)  When inserting a value, if the bucket is full after several splits (that is, i reaches some limit b) create an overflow bucket instead of splitting the bucket address table further.  To delete a key value,  locate it in its bucket and remove it.  The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table).  Coalescing of buckets can be done (can coalesce only with a "buddy" bucket having same value of ij and same ij–1 prefix, if it is present)
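The split logic of the last two slides can be condensed into a short Python sketch. It is illustrative only and makes one simplification flagged in the comments: it indexes the table with the low-order i bits of the hash value rather than the high-order prefix used above (the structure of the splits is the same). A production version would also add overflow buckets for the pathological case where many keys share one hash value.

    class Bucket:
        def __init__(self, depth):
            self.depth = depth                 # local depth i_j
            self.items = {}

    class ExtendableHash:
        BUCKET_SIZE = 2                        # tiny, to force early splits

        def __init__(self):
            self.i = 0                         # global depth
            self.table = [Bucket(0)]           # bucket address table, size 2^i

        def _bucket(self, key):
            # simplification: low-order i bits instead of a high-order prefix
            return self.table[hash(key) & ((1 << self.i) - 1)]

        def insert(self, key, value):
            b = self._bucket(key)
            if key in b.items or len(b.items) < self.BUCKET_SIZE:
                b.items[key] = value
                return
            if b.depth == self.i:              # only one pointer to b:
                self.table = self.table * 2    # double the table,
                self.i += 1                    # increment global depth
            b.depth += 1                       # split b into b and a new bucket z
            z = Bucket(b.depth)
            bit = 1 << (b.depth - 1)           # newly significant bit
            for j in range(len(self.table)):   # redirect half of b's pointers
                if self.table[j] is b and j & bit:
                    self.table[j] = z
            old, b.items = b.items, {}
            for k, v in old.items():           # remove and reinsert b's records
                self.insert(k, v)
            self.insert(key, value)            # may split again if still full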
  • 812. Use of Extendable Hash Structure: Example Initial Hash structure, bucket size = 2
  • 813. Example (Cont.)  Hash structure after insertion of one Brighton and two Downtown records
  • 814. Example (Cont.) Hash structure after insertion of Mianus record
  • 815. Example (Cont.) Hash structure after insertion of three Perryridge records
  • 816. Example (Cont.)  Hash structure after insertion of Redwood and Round Hill records
• 817. Extendable Hashing vs. Other Schemes  Benefits of extendable hashing:  Hash performance does not degrade with growth of file  Minimal space overhead  Disadvantages of extendable hashing  Extra level of indirection to find desired record  Bucket address table may itself become very big (larger than memory)  Need a tree structure to locate desired record in the structure!  Changing size of bucket address table is an expensive operation
• 818. Comparison of Ordered Indexing and Hashing  Cost of periodic re-organization  Relative frequency of insertions and deletions  Is it desirable to optimize average access time at the expense of worst-case access time?  Expected type of queries:  Hashing is generally better at retrieving records having a specified value of the key.  If range queries are common, ordered indices are to be preferred
• 819. Index Definition in SQL  Create an index create index <index-name> on <relation-name> (<attribute-list>) E.g.: create index b-index on branch(branch-name)  Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.  Not really required if the SQL unique integrity constraint is supported
• 820. Multiple-Key Access  Use multiple indices for certain types of queries.  Example: select account-number from account where branch-name = "Perryridge" and balance = 1000  Possible strategies for processing query using indices on single attributes: 1. Use index on branch-name to find accounts with branch-name = "Perryridge"; test balance = 1000.
• 821. Indices on Multiple Attributes  Suppose we have an index on combined search-key (branch-name, balance).  With the where clause where branch-name = "Perryridge" and balance = 1000 the index on the combined search-key will fetch only records that satisfy both conditions. Using separate indices is less efficient — we may fetch many records (or pointers) that satisfy only one of the conditions.  Can also efficiently handle where branch-name = "Perryridge" and balance < 1000
• 822. Grid Files  Structure used to speed the processing of general multiple search-key queries involving one or more comparison operators.  The grid file has a single grid array and one linear scale for each search-key attribute. The grid array has number of dimensions equal to number of search-key attributes.  Multiple cells of grid array can point to same bucket  To find the bucket for a search-key value, locate the cell using the linear scales and follow its pointer
  • 823. Example Grid File for account
• 824. Queries on a Grid File  A grid file on two attributes A and B can handle queries of all following forms with reasonable efficiency  (a1 ≤ A ≤ a2)  (b1 ≤ B ≤ b2)  (a1 ≤ A ≤ a2 ∧ b1 ≤ B ≤ b2)  E.g., to answer (a1 ≤ A ≤ a2 ∧ b1 ≤ B ≤ b2), use linear scales to find corresponding candidate grid array cells, and look up all the buckets those cells point to
• 825. Grid Files (Cont.)  During insertion, if a bucket becomes full, new bucket can be created if more than one cell points to it.  Idea similar to extendable hashing, but on multiple dimensions  If only one cell points to it, either an overflow bucket must be created or the grid size must be increased  Linear scales must be chosen to uniformly distribute records across cells.  Otherwise there will be too many overflow buckets
  • 826. Bitmap Indices  Bitmap indices are a special type of index designed for efficient querying on multiple keys  Records in a relation are assumed to be numbered sequentially from, say, 0  Given a number n it must be easy to retrieve record n  Particularly easy if records are of fixed size  Applicable on attributes that take on a relatively small number of distinct values
  • 827. Bitmap Indices (Cont.)  In its simplest form a bitmap index on an attribute has a bitmap for each value of the attribute  Bitmap has as many bits as records  In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwise
  • 828. Bitmap Indices (Cont.)  Bitmap indices are useful for queries on multiple attributes  not particularly useful for single attribute queries  Queries are answered using bitmap operations  Intersection (and)  Union (or)  Complementation (not)  Each operation takes two bitmaps of the same size and applies the operation on corresponding bits to get
• 829. Bitmap Indices (Cont.)  Bitmap indices generally very small compared with relation size  E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space used by relation.  If number of distinct attribute values is 8, bitmap is only 1% of relation size  Deletion needs to be handled properly  Existence bitmap to note if there is a valid record at a record location  Needed for complementation  not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap  Should keep bitmaps for all values, even null
• 830. Efficient Implementation of Bitmap Operations  Bitmaps are packed into words; a single word and (a basic CPU instruction) computes and of 32 or 64 bits at once  E.g. 1-million-bit maps can be anded with just 31,250 instructions  Counting number of 1s can be done fast by a trick:  Use each byte to index into a precomputed array of 256 elements each storing the count of 1s in the binary representation
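A small Python sketch of both tricks (illustrative: a Python integer stands in for the packed bit array, and the 256-entry table implements the byte-at-a-time popcount described above):

    ONES = [bin(b).count("1") for b in range(256)]    # 1s-count for each byte value

    def bitmap_and(a, b):
        return a & b                                  # word-wise AND in one step

    def count_ones(bitmap, n_bytes):
        # look up each byte of the bitmap in the precomputed table
        return sum(ONES[(bitmap >> (8 * i)) & 0xFF] for i in range(n_bytes))

    # e.g. records satisfying A = v1 and B = v2:
    #   matches = bitmap_and(bitmap_A_v1, bitmap_B_v2)
    #   n_matches = count_ones(matches, n_records // 8 + 1)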
• 832. Partitioned Hashing  Hash values are split into segments that depend on each attribute of the search-key (A1, A2, . . . , An) for an n-attribute search-key  Example: n = 2, for customer, search-key being (customer-street, customer-city)  search-key value (Main, Harrison) → hash value 101 111
• 833. Sequential File for account Records
• 834. Deletion of "Perryridge" From the B+-Tree of Figure 12.12
  • 836. Chapter 13: Query Processing  Overview  Measures of Query Cost  Selection Operation  Sorting  Join Operation  Other Operations  Evaluation of Expressions
  • 837. Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation
  • 838. Basic Steps in Query Processing (Cont.)  Parsing and translation  translate the query into its internal form. This is then translated into relational algebra.  Parser checks syntax, verifies relations  Evaluation  The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
• 839. Basic Steps in Query Processing: Optimization  A relational algebra expression may have many equivalent expressions  E.g., σbalance<2500(Πbalance(account)) is equivalent to Πbalance(σbalance<2500(account))  Each relational algebra operation can be evaluated using one of several different algorithms  Correspondingly, a relational-algebra expression can be evaluated in many ways.  Annotated expression specifying detailed evaluation strategy is called an evaluation plan.
  • 840. Basic Steps: Optimization (Cont.) Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost.  Cost is estimated using statistical information from the database catalog  e.g. number of tuples in each relation, size of tuples, etc. In this chapter we study  How to measure query costs  Algorithms for evaluating relational algebra operations
• 841. Measures of Query Cost  Cost is generally measured as total elapsed time for answering query  Many factors contribute to time cost  disk accesses, CPU, or even network communication  Typically disk access is the predominant cost, and is also relatively easy to estimate. Measured by taking into account  Number of seeks * average-seek-cost  Number of blocks read * average-block-read-cost  Number of blocks written * average-block-write-cost
• 842. Measures of Query Cost (Cont.)  For simplicity we just use number of block transfers from disk as the cost measure  We ignore the difference in cost between sequential and random I/O for simplicity  We also ignore CPU costs for simplicity  Cost depends on the size of the buffer in main memory  Having more memory reduces need for disk access  Amount of real memory available to buffer depends on other concurrent OS processes, and is hard to determine ahead of actual execution
  • 843. Selection Operation  File scan – search algorithms that locate and retrieve records that fulfill a selection condition.  Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.  Cost estimate (number of disk blocks scanned) = br  br denotes number of blocks containing records from relation r
  • 844. Selection Operation (Cont.)  A2 (binary search). Applicable if selection is an equality comparison on the attribute on which file is ordered.  Assume that the blocks of a relation are stored contiguously  Cost estimate (number of disk blocks to be scanned):  log2(br) — cost of locating the first tuple by a binary search on the blocks  Plus number of blocks containing records that satisfy selection condition
• 845. Selections Using Indices  Index scan – search algorithms that use an index  selection condition must be on search-key of index.  A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition  Cost = HTi + 1  A4 (primary index on nonkey, equality) Retrieve multiple records.  Records will be on consecutive blocks  Cost = HTi + number of blocks containing retrieved records  A5 (equality on search-key of secondary index).  Retrieve a single record if the search-key is a candidate key  Cost = HTi + 1
• 846. Selections Involving Comparisons  Can implement selections of the form σA≤V(r) or σA≥V(r) by using  a linear file scan or binary search,  or by using indices in the following ways:  A6 (primary index, comparison). (Relation is sorted on A)  For σA≥V(r) use index to find first tuple ≥ v and scan relation sequentially from there  For σA≤V(r) just scan relation sequentially till first tuple > v; do not use index  A7 (secondary index, comparison).  For σA≥V(r) use index to find first index entry ≥ v and scan index sequentially from there, to find pointers to records.
• 847. Implementation of Complex Selections  Conjunction: σθ1∧θ2∧. . .∧θn(r)  A8 (conjunctive selection using one index).  Select a combination of θi and algorithms A1 through A7 that results in the least cost for σθi(r).  Test other conditions on tuple after fetching it into memory buffer.  A9 (conjunctive selection using multiple-key index).  Use appropriate composite (multiple-key) index if available.  A10 (conjunctive selection by intersection of identifiers).
• 848. Algorithms for Complex Selections  Disjunction: σθ1∨θ2∨. . .∨θn(r)  A11 (disjunctive selection by union of identifiers).  Applicable if all conditions have available indices.  Otherwise use linear scan.  Use corresponding index for each condition, and take union of all the obtained sets of record pointers.  Then fetch records from file  Negation: σ¬θ(r)  Use linear scan on file
  • 849. Sorting  We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each tuple.  For relations that fit in memory, techniques like quicksort can be used. For relations that don‘t fit in memory, external sort-merge is a good choice.
• 850. External Sort-Merge  Let M denote memory size (in pages). 1. Create sorted runs. Let i be 0 initially. Repeatedly do the following till the end of the relation: (a) Read M blocks of relation into memory (b) Sort the in-memory blocks (c) Write sorted data to run Ri; increment i. Let the final value of i be N 2. Merge the runs (N-way merge). We assume (for now) that N < M. 1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page 2. repeat 1. Select the first record (in sort order) among all buffer pages 2. Write the record to the output buffer; if the output buffer is full, write it to disk. Delete the record from its input buffer page; if the buffer page becomes empty, read the next block (if any) of the run into it.
• 851. External Sort-Merge (Cont.)  If N ≥ M, several merge passes are required.  In each pass, contiguous groups of M – 1 runs are merged.  A pass reduces the number of runs by a factor of M – 1, and creates runs longer by the same factor.  E.g. If M = 11, and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs  Repeated passes are performed till all runs have been merged into one.
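A runnable Python sketch of the two phases (illustrative: in-memory lists stand in for M-block chunks and disk-resident runs):

    import heapq

    def create_runs(relation, M):
        # phase 1: sort M-"block" chunks of the relation into initial runs
        return [sorted(relation[i:i + M]) for i in range(0, len(relation), M)]

    def merge_pass(runs, M):
        # phase 2: one pass merges each contiguous group of M-1 runs
        return [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]

    def external_sort(relation, M):
        runs = create_runs(relation, M)
        while len(runs) > 1:               # repeated passes until one run remains
            runs = merge_pass(runs, M)
        return runs[0] if runs else []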
  • 852. Example: External Sorting Using Sort-Merge
• 853. External Merge Sort (Cont.)  Cost analysis:  Total number of merge passes required: ⌈logM–1(br / M)⌉.  Disk accesses for initial run creation as well as in each pass is 2br  for final pass, we don't count write cost  we ignore final write cost for all operations since the output of an operation may be sent to the parent operation without being written to disk  Thus total number of disk accesses for external sorting: br (2 ⌈logM–1(br / M)⌉ + 1)
  • 854. Join Operation  Several different algorithms to implement joins  Nested-loop join  Block nested-loop join  Indexed nested-loop join  Merge-join  Hash-join  Choice based on cost estimate  Examples use the following information
• 855. Nested-Loop Join  To compute the theta join r ⋈θ s for each tuple tr in r do begin for each tuple ts in s do begin test pair (tr, ts) to see if they satisfy the join condition θ if they do, add tr • ts to the result. end end  r is called the outer relation and s the inner relation of the join.  Requires no indices and can be used with any kind of join condition.
• 856. Nested-Loop Join (Cont.)  In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr ∗ bs + br disk accesses.  If the smaller relation fits entirely in memory, use that as the inner relation. Reduces cost to br + bs disk accesses.  Assuming worst case memory availability, cost estimate is  5000 ∗ 400 + 100 = 2,000,100 disk accesses with depositor as outer relation, and 10000 ∗ 100 + 400 = 1,000,400 disk accesses with customer as the outer relation.
  • 857. Block Nested-Loop Join  Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation. for each block Br of r do begin for each block Bs of s do begin for each tuple tr in Br do begin for each tuple ts in Bs do begin Check if (tr,ts) satisfy the join condition
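The pseudo-code above, completed as a runnable Python sketch (lists of blocks, each a list of tuples, stand in for the disk-resident relations; theta is the join condition):

    def block_nested_loop_join(r_blocks, s_blocks, theta):
        result = []
        for Br in r_blocks:              # each block of the outer relation r
            for Bs in s_blocks:          # inner relation re-read once per Br
                for tr in Br:
                    for ts in Bs:
                        if theta(tr, ts):
                            result.append(tr + ts)   # concatenate the tuples
        return result

Reading s once per block of r, rather than once per tuple of r, is exactly what drops the cost from nr ∗ bs + br to br ∗ bs + br block accesses, as the next slide works out.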
• 858. Block Nested-Loop Join (Cont.)  Worst case estimate: br ∗ bs + br block accesses.  Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation)  Best case: br + bs block accesses.  Improvements to nested loop and block nested loop algorithms:  In block nested-loop, use M – 2 disk blocks as blocking unit for outer relation, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output
  • 859. Indexed Nested-Loop Join  Index lookups can replace file scans if  join is an equi-join or natural join and  an index is available on the inner relation‘s join attribute  Can construct an index just to compute a join.  For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.  Worst case: buffer has space for only one page of r, and, for each tuple in r, we perform an index lookup on s.
  • 860. Example of Nested-Loop Join Costs  Compute depositor customer, with depositor as the outer relation.  Let customer have a primary B+-tree index on the join attribute customer- name, which contains 20 entries in each index node.  Since customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data
  • 861. Merge-Join 1. Sort both relations on their join attribute (if not already sorted on the join attributes). 2. Merge the sorted relations to join them 1. Join step is similar to the merge stage of the sort-merge algorithm. 2. Main difference is handling of duplicate values in join attribute — every pair with same value on join attribute must be matched
• 862. Merge-Join (Cont.)  Can be used only for equi-joins and natural joins  Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory)  Thus number of block accesses for merge-join is br + bs + the cost of sorting if relations are unsorted.  hybrid merge-join: If one relation is sorted, and the other has a secondary B+-tree index on the join attribute
• 863. Hash-Join  Applicable for equi-joins and natural joins.  A hash function h is used to partition tuples of both relations  h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join.  r0, r1, . . ., rn denote partitions of r tuples  Each tuple tr ∈ r is put in partition ri where i = h(tr [JoinAttrs]).  s0, s1, . . ., sn denote partitions of s tuples
• 865. Hash-Join (Cont.)  r tuples in ri need only to be compared with s tuples in si; they need not be compared with s tuples in any other partition, since:  an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.  If that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si.
• 866. Hash-Join Algorithm  The hash-join of r and s is computed as follows. 1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition. 2. Partition r similarly. 3. For each i: (a) Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier hash function h.  Relation s is called the build input and r is called the probe input.
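A single-pass Python sketch of the algorithm (illustrative: the relations are in-memory lists, key extracts JoinAttrs from a tuple, and Python's dict hashing plays the role of the "different hash function" used for the in-memory index):

    from collections import defaultdict

    def hash_join(r, s, key, n=8):
        h = lambda t: hash(key(t)) % (n + 1)       # partitioning hash function
        r_parts = defaultdict(list)
        s_parts = defaultdict(list)
        for ts in s:                               # 1. partition the build input s
            s_parts[h(ts)].append(ts)
        for tr in r:                               # 2. partition the probe input r
            r_parts[h(tr)].append(tr)
        result = []
        for i in range(n + 1):                     # 3. for each partition i:
            index = defaultdict(list)
            for ts in s_parts[i]:                  # (a) build in-memory index on s_i
                index[key(ts)].append(ts)
            for tr in r_parts[i]:                  # (b) probe with tuples of r_i
                for ts in index.get(key(tr), ()):
                    result.append(tr + ts)
        return result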
• 867. Hash-Join Algorithm (Cont.)  The value n and the hash function h are chosen such that each si should fit in memory.  Typically n is chosen as ⌈bs/M⌉ ∗ f where f is a "fudge factor", typically around 1.2  The probe relation partitions ri need not fit in memory  Recursive partitioning required if number of partitions n is greater than number of pages M of memory.  instead of partitioning n ways, use M – 1 partitions for s
  • 868. Handling of Overflows  Hash-table overflow occurs in partition si if si does not fit in memory. Reasons could be  Many tuples in s with same value for join attributes  Bad hash function  Partitioning is said to be skewed if some partitions have significantly more tuples than some others  Overflow resolution can be done in build phase  Partition si is further partitioned using different hash function.
• 869. Cost of Hash-Join  If recursive partitioning is not required: cost of hash join is 3(br + bs) + 2 ∗ nh  If recursive partitioning is required, number of passes required for partitioning s is ⌈logM–1(bs)⌉ – 1. This is because each final partition of s should fit in memory.  The number of partitions of probe relation r is the same as that for build relation s; the number of passes for partitioning of r is also the same as for s.
• 870. Example of Cost of Hash-Join  customer ⋈ depositor  Assume that memory size is 20 blocks  bdepositor = 100 and bcustomer = 400.  depositor is to be used as build input. Partition it into five partitions, each of size 20 blocks. This partitioning can be done in one pass.  Similarly, partition customer into five partitions, each of size 80. This is also done in one pass.  Therefore total cost: 3(100 + 400) = 1500 block transfers
• 871. Hybrid Hash-Join  Useful when memory sizes are relatively large, and the build input is bigger than memory.  Main feature of hybrid hash join: Keep the first partition of the build relation in memory.  E.g. With memory size of 25 blocks, depositor can be partitioned into five partitions, each of size 20 blocks.  Division of memory:  The first partition occupies 20 blocks of memory  1 block is used for input, and 1 block each for the other four partitions
• 872. Complex Joins  Join with a conjunctive condition: r ⋈θ1∧θ2∧. . .∧θn s  Either use nested loops/block nested loops, or  Compute the result of one of the simpler joins r ⋈θi s  final result comprises those tuples in the intermediate result that satisfy the remaining conditions θ1 ∧ . . . ∧ θi–1 ∧ θi+1 ∧ . . . ∧ θn  Join with a disjunctive condition r ⋈θ1∨θ2∨. . .∨θn s  Either use nested loops/block nested loops, or  Compute as the union of the records in individual joins r ⋈θi s
• 873. Other Operations  Duplicate elimination can be implemented via hashing or sorting.  On sorting duplicates will come adjacent to each other, and all but one set of duplicates can be deleted. Optimization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge.  Hashing is similar – duplicates will come into the same bucket.
  • 874. Other Operations : Aggregation  Aggregation can be implemented in a manner similar to duplicate elimination.  Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group.  Optimization: combine tuples in the same group during run generation and intermediate merges, by computing partial aggregate values
• 875. Other Operations: Set Operations  Set operations (∪, ∩ and –): can either use variant of merge-join after sorting, or variant of hash-join.  E.g., Set operations using hashing: 1. Partition both relations using the same hash function, thereby creating r0, r1, . . ., rn and s0, s1, . . ., sn 2. Process each partition i as follows. Using a different hashing function, build an in-memory hash index on ri after it is brought into memory. 3. r ∪ s: Add tuples in si to the hash index if they are not already in it. At the end of si add the tuples in the hash index to the result.
• 876. Other Operations: Outer Join  Outer join can be computed either as  A join followed by addition of null-padded non-participating tuples.  by modifying the join algorithms.  Modifying merge join to compute r ⟕ s  In r ⟕ s, non-participating tuples are those in r – ΠR(r ⋈ s)  Modify merge-join to compute r ⟕ s: During merging, for every tuple tr from r that does not match any tuple in s, output tr padded with nulls.
  • 877. Evaluation of Expressions  So far: we have seen algorithms for individual operations  Alternatives for evaluating an entire expression tree  Materialization: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat.  Pipelining: pass on tuples to parent operations even as an operation is being executed
• 878. Materialization  Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations.  E.g., in figure below, compute and store σbalance<2500(account), then compute and store its join with customer, and finally compute the projection on customer-name.
• 879. Materialization (Cont.)  Materialized evaluation is always applicable  Cost of writing results to disk and reading them back can be quite high  Our cost formulas for operations ignore cost of writing results to disk, so  Overall cost = Sum of costs of individual operations + cost of writing intermediate results to disk  Double buffering: use two output buffers for each operation; when one is full, write it to disk while the other is being filled
• 880. Pipelining  Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.  E.g., in previous expression tree, don't store result of σbalance<2500(account)  instead, pass tuples directly to the join. Similarly, don't store result of join, pass tuples directly to projection.  Much cheaper than materialization: no need to store a temporary relation to disk.
• 881. Pipelining (Cont.)  In demand driven or lazy evaluation  system repeatedly requests next tuple from top level operation  Each operation requests next tuple from children operations as required, in order to output its next tuple  In between calls, operation has to maintain "state" so it knows what to return next  Each operation is implemented as an iterator implementing the following operations  open()  E.g. file scan: initialize file scan, store pointer to beginning of file as state  E.g. merge join: sort relations and store pointers to the beginning of each sorted relation as state
  • 882. Pipelining (Cont.)  In produce-driven or eager pipelining  Operators produce tuples eagerly and pass them up to their parents  Buffer maintained between operators, child puts tuples in buffer, parent removes tuples from buffer  if buffer is full, child waits till there is space in the buffer, and then generates more tuples  System schedules operations that have space in output buffer and can process more input tuples
• 883. Evaluation Algorithms for Pipelining  Some algorithms are not able to output results even as they get input tuples  E.g. merge join, or hash join  These result in intermediate results being written to disk and then read back always  Algorithm variants are possible to generate (at least some) results on the fly, as input tuples are read in  E.g. hybrid hash join generates output tuples even as probe relation tuples in the in-memory partition (partition 0) are read in  Pipelined join technique: Hybrid hash join, modified to buffer partition 0 tuples of both relations in memory and pipeline them as they arrive
• 884. Complex Joins  Join involving three relations: loan ⋈ depositor ⋈ customer  Strategy 1. Compute depositor ⋈ customer; use result to compute loan ⋈ (depositor ⋈ customer)  Strategy 2. Compute loan ⋈ depositor first, and then join the result with customer.  Strategy 3. Perform the pair of joins at once. Build an index on loan for loan-number, and on customer for customer-name.
  • 886. Chapter 14: Query Optimization  Introduction  Catalog Information for Cost Estimation  Estimation of Statistics  Transformation of Relational Expressions  Dynamic Programming for Choosing Evaluation Plans
• 887. Introduction  Alternative ways of evaluating a given query  Equivalent expressions  Different algorithms for each operation (Chapter 13)  Cost difference between a good and a bad way of evaluating a query can be enormous  Example: performing r × s followed by a selection r.A = s.B is much slower than performing a join on the same condition  Need to estimate the cost of operations
  • 888. Introduction (Cont.) Relations generated by two equivalent expressions have the same set of attributes and contain the same set of tuples, although their attributes may be ordered differently.
  • 889. Introduction (Cont.)  Generation of query-evaluation plans for an expression involves several steps: 1. Generating logically equivalent expressions  Use equivalence rules to transform an expression into an equivalent one. 2. Annotating resultant expressions to get alternative query plans 3. Choosing the cheapest plan based on estimated cost
  • 890. Overview of chapter  Statistical information for cost estimation  Equivalence rules  Cost-based optimization algorithm  Optimizing nested subqueries  Materialized views and view maintenance
• 891. Statistical Information for Cost Estimation  nr: number of tuples in a relation r.  br: number of blocks containing tuples of r.  sr: size of a tuple of r.  fr: blocking factor of r — i.e., the number of tuples of r that fit into one block.  V(A, r): number of distinct values that appear in r for attribute A; same as the size of ΠA(r).  If tuples of r are stored together physically in a file, then: br = ⌈nr / fr⌉
• 892. Catalog Information about Indices  fi: average fan-out of internal nodes of index i, for tree-structured indices such as B+-trees.  HTi: number of levels in index i — i.e., the height of i.  For a balanced tree index (such as B+-tree) on attribute A of relation r, HTi = ⌈logfi(V(A,r))⌉.  For a hash index, HTi is 1.  LBi: number of lowest-level index blocks in i — i.e., the number of blocks at the leaf level of the index.
• 893. Measures of Query Cost  Recall that  Typically disk access is the predominant cost, and is also relatively easy to estimate.  The number of block transfers from disk is used as a measure of the actual cost of evaluation.  It is assumed that all transfers of blocks have the same cost.  Real life optimizers do not make this assumption, and distinguish between sequential and random disk access  We do not include the cost of writing output to disk.
• 894. Selection Size Estimation  Equality selection σA=v(r)  SC(A, r): number of records that will satisfy the selection  ⌈SC(A, r) / fr⌉ — number of blocks that these records will occupy  E.g. binary search cost estimate becomes Ea2 = ⌈log2(br)⌉ + ⌈SC(A, r) / fr⌉ – 1  Equality condition on a key attribute: SC(A, r) = 1
• 895. Statistical Information for Examples  faccount = 20 (20 tuples of account fit in one block)  V(branch-name, account) = 50 (50 branches)  V(balance, account) = 500 (500 different balance values)  naccount = 10,000 (account has 10,000 tuples)  Assume the following indices exist on account:
• 896. Selections Involving Comparisons  Selections of the form σA≤v(r) (case of σA≥v(r) is symmetric)  Let c denote the estimated number of tuples satisfying the condition.  If min(A,r) and max(A,r) are available in the catalog:  c = 0 if v < min(A,r)  c = nr · (v − min(A,r)) / (max(A,r) − min(A,r)) otherwise  In absence of statistical information c is assumed to be nr / 2.
• 897. Implementation of Complex Selections  The selectivity of a condition θi is the probability that a tuple in the relation r satisfies θi. If si is the number of satisfying tuples in r, the selectivity of θi is given by si / nr.  Conjunction: σθ1∧θ2∧...∧θn(r). The estimate for the number of tuples in the result is: nr · (s1 · s2 · ... · sn) / nr^n  Disjunction: σθ1∨θ2∨...∨θn(r). Estimated number of tuples: nr · (1 − (1 − s1/nr) · (1 − s2/nr) · ... · (1 − sn/nr))
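The conjunction and disjunction estimates above translate directly into code. Below is a minimal Python sketch of both formulas; the function names and the list-of-si representation are illustrative, not from the text:

    # Size estimates for conjunctive and disjunctive selections.
    # n_r is the number of tuples in r; sels is the list of s_i values,
    # the number of tuples satisfying each condition theta_i alone.
    def conjunction_estimate(n_r, sels):
        est = n_r
        for s in sels:
            est *= s / n_r              # multiply by selectivity s_i / n_r
        return est

    def disjunction_estimate(n_r, sels):
        p_none = 1.0
        for s in sels:
            p_none *= 1 - s / n_r       # probability no condition is satisfied
        return n_r * (1 - p_none)

    # With n_r = 10000, s1 = 200, s2 = 500:
    # conjunction_estimate -> 10.0, disjunction_estimate -> 690.0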
• 898. Join Operation: Running Example  Running example: depositor ⋈ customer  Catalog information for join examples:  ncustomer = 10,000.  fcustomer = 25, which implies that bcustomer = 10000/25 = 400.  ndepositor = 5000.  fdepositor = 50, which implies that bdepositor = 5000/50 = 100.  V(customer-name, depositor) = 2500, which implies that, on average, each customer has two accounts.
• 899. Estimation of the Size of Joins  The Cartesian product r × s contains nr · ns tuples; each tuple occupies sr + ss bytes.  If R ∩ S = ∅, then r ⋈ s is the same as r × s.  If R ∩ S is a key for R, then a tuple of s will join with at most one tuple from r  therefore, the number of tuples in r ⋈ s is no greater than the number of tuples in s.  If R ∩ S in S is a foreign key in S referencing R, then the number of tuples in r ⋈ s is exactly the same as the number of tuples in s.
• 900. Estimation of the Size of Joins (Cont.)  If R ∩ S = {A} is not a key for R or S.  If we assume that every tuple t in R produces tuples in R ⋈ S, the number of tuples in R ⋈ S is estimated to be: nr · ns / V(A, s)  If the reverse is true, the estimate obtained will be: nr · ns / V(A, r)  The lower of these two estimates is probably the more accurate one.
• 901. Estimation of the Size of Joins (Cont.)  Compute the size estimates for depositor ⋈ customer without using information about foreign keys:  V(customer-name, depositor) = 2500, and V(customer-name, customer) = 10000  The two estimates are 5000 * 10000/2500 = 20,000 and 5000 * 10000/10000 = 5000  We choose the lower estimate, which in this case is the same as our earlier computation using foreign keys.
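To make the arithmetic above concrete, here is a minimal Python sketch of the join-size rule (function and variable names are illustrative):

    # Estimated size of r JOIN s on common attribute A: take the lower
    # of n_r * n_s / V(A, s) and n_r * n_s / V(A, r).
    def join_size_estimate(n_r, n_s, v_a_r, v_a_s):
        return min(n_r * n_s / v_a_s, n_r * n_s / v_a_r)

    # depositor JOIN customer:
    # join_size_estimate(5000, 10000, 2500, 10000) -> 5000.0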
• 902. Size Estimation for Other Operations  Projection: estimated size of ΠA(r) = V(A, r)  Aggregation: estimated size of AgF(r) = V(A, r)  Set operations  For unions/intersections of selections on the same relation: rewrite and use size estimate for selections  E.g. σθ1(r) ∪ σθ2(r) can be rewritten as σθ1∨θ2(r)  For operations on different relations:  estimated size of r ∪ s = size of r + size of s  estimated size of r ∩ s = minimum of size of r and size of s  estimated size of r − s = size of r  All three estimates may be quite inaccurate, but provide upper bounds on the sizes.
• 903. Size Estimation (Cont.)  Outer join:  Estimated size of r ⟕ s = size of r ⋈ s + size of r  Case of right outer join is symmetric  Estimated size of r ⟗ s = size of r ⋈ s + size of r + size of s
• 904. Estimation of Number of Distinct Values  Selections: σθ(r)  If θ forces A to take a specified value: V(A, σθ(r)) = 1.  e.g., A = 3  If θ forces A to take on one of a specified set of values: V(A, σθ(r)) = number of specified values.  (e.g., (A = 1 ∨ A = 3 ∨ A = 4))  If the selection condition θ is of the form A op v: estimated V(A, σθ(r)) = V(A, r) · s, where s is the selectivity of the selection.
• 905. Estimation of Distinct Values (Cont.)  Joins: r ⋈ s  If all attributes in A are from r: estimated V(A, r ⋈ s) = min(V(A, r), nr⋈s)  If A contains attributes A1 from r and A2 from s, then estimated V(A, r ⋈ s) = min(V(A1, r) · V(A2 − A1, s), V(A1 − A2, r) · V(A2, s), nr⋈s)  More accurate estimates can be got using probability theory, but this one works fine generally
• 906. Estimation of Distinct Values (Cont.)  Estimation of distinct values is straightforward for projections.  They are the same in ΠA(r) as in r.  The same holds for grouping attributes of aggregation.  For aggregated values  For min(A) and max(A), the number of distinct values can be estimated as min(V(A,r), V(G,r)) where G denotes grouping attributes  For other aggregates, assume all values are distinct, and use V(G,r)
• 907. Transformation of Relational Expressions  Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples  Note: order of tuples is irrelevant  In SQL, inputs and outputs are multisets of tuples  Two expressions in the multiset version of the relational algebra are said to be equivalent if on every legal database instance the two expressions generate the same multiset of tuples
• 908. Equivalence Rules  1. Conjunctive selection operations can be deconstructed into a sequence of individual selections: σθ1∧θ2(E) = σθ1(σθ2(E))  2. Selection operations are commutative: σθ1(σθ2(E)) = σθ2(σθ1(E))  3. Only the last in a sequence of projection operations is needed, the others can be omitted: ΠL1(ΠL2(...(ΠLn(E))...)) = ΠL1(E)
  • 909. Pictorial Depiction of Equivalence Rules
• 910. Equivalence Rules (Cont.)  5. Theta-join operations (and natural joins) are commutative: E1 ⋈θ E2 = E2 ⋈θ E1  6. (a) Natural join operations are associative: (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)  (b) Theta joins are associative in the following manner: (E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3), where θ2 involves attributes from only E2 and E3.
• 911. Equivalence Rules (Cont.)  7. The selection operation distributes over the theta join operation under the following two conditions:  (a) When all the attributes in θ0 involve only the attributes of one of the expressions (E1) being joined: σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
• 912. Equivalence Rules (Cont.)  8. The projection operation distributes over the theta join operation as follows:  (a) if θ involves only attributes from L1 ∪ L2: ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))  (b) Consider a join E1 ⋈θ E2.  Let L1 and L2 be sets of attributes from E1 and E2, respectively.  Let L3 be attributes of E1 that are involved in join condition θ but are not in L1 ∪ L2, and let L4 be attributes of E2 that are involved in θ but are not in L1 ∪ L2. Then: ΠL1∪L2(E1 ⋈θ E2) = ΠL1∪L2((ΠL1∪L3(E1)) ⋈θ (ΠL2∪L4(E2)))
• 913. Equivalence Rules (Cont.)  9. The set operations union and intersection are commutative: E1 ∪ E2 = E2 ∪ E1, E1 ∩ E2 = E2 ∩ E1  (set difference is not commutative).  10. Set union and intersection are associative: (E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3), (E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)  11. The selection operation distributes over ∪, ∩ and −: σθ(E1 − E2) = σθ(E1) − σθ(E2), and similarly for ∪ and ∩ in place of −
• 914. Transformation Example  Query: Find the names of all customers who have an account at some branch located in Brooklyn.  Πcustomer-name(σbranch-city = "Brooklyn"(branch ⋈ (account ⋈ depositor)))  Transformation using rule 7a:  Πcustomer-name((σbranch-city = "Brooklyn"(branch)) ⋈ (account ⋈ depositor))
• 915. Example with Multiple Transformations  Query: Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000.  Πcustomer-name(σbranch-city = "Brooklyn" ∧ balance > 1000(branch ⋈ (account ⋈ depositor)))  Transformation using join associativity (Rule 6a):  Πcustomer-name(σbranch-city = "Brooklyn" ∧ balance > 1000((branch ⋈ account) ⋈ depositor))
• 917. Projection Operation Example  Πcustomer-name((σbranch-city = "Brooklyn"(branch) ⋈ account) ⋈ depositor)  When we compute (σbranch-city = "Brooklyn"(branch) ⋈ account) we obtain a relation whose schema is: (branch-name, branch-city, assets, account-number, balance)  Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get:  Πcustomer-name((Πaccount-number(σbranch-city = "Brooklyn"(branch) ⋈ account)) ⋈ depositor)
• 918. Join Ordering Example  For all relations r1, r2, and r3, (r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)  If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose (r1 ⋈ r2) ⋈ r3 so that we compute and store a smaller temporary relation.
• 919. Join Ordering Example (Cont.)  Consider the expression Πcustomer-name((σbranch-city = "Brooklyn"(branch)) ⋈ account ⋈ depositor)  Could compute account ⋈ depositor first, and join the result with σbranch-city = "Brooklyn"(branch), but account ⋈ depositor is likely to be a large relation.
• 920. Enumeration of Equivalent Expressions  Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression  Conceptually, generate all equivalent expressions by repeatedly executing the following step until no more expressions can be found:  for each expression found so far, use all applicable equivalence rules, and add newly generated expressions to the set of expressions found so far  The above approach is very expensive in space and time
  • 921. Evaluation Plan  An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.
  • 922. Choice of Evaluation Plans  Must consider the interaction of evaluation techniques when choosing evaluation plans: choosing the cheapest algorithm for each operation independently may not yield best overall algorithm. E.g.  merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation.  nested-loop join may provide opportunity for pipelining
• 923. Cost-Based Optimization  Consider finding the best join-order for r1 ⋈ r2 ⋈ . . . ⋈ rn.  There are (2(n – 1))!/(n – 1)! different join orders for the above expression. With n = 7, the number is 665280; with n = 10, the number is greater than 176 billion!  No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of {r1, r2, . . . , rn} is computed only once and stored for future use.
• 924. Dynamic Programming in Optimization  To find best join tree for a set of n relations:  To find best plan for a set S of n relations, consider all possible plans of the form: S1 ⋈ (S – S1) where S1 is any non-empty subset of S.  Recursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the 2^n – 1 alternatives.  When plan for any subset is computed, store it and reuse it when it is required again, instead of recomputing it
• 925. Join Order Optimization Algorithm
procedure findbestplan(S)
   if (bestplan[S].cost ≠ ∞)   // bestplan[S] already computed
      return bestplan[S]
   // else bestplan[S] has not been computed earlier, compute it now
   for each non-empty subset S1 of S such that S1 ≠ S
      P1 = findbestplan(S1)
      P2 = findbestplan(S – S1)
      A = best algorithm for joining results of P1 and P2
      cost = P1.cost + P2.cost + cost of A
      if cost < bestplan[S].cost
         bestplan[S].cost = cost
         bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
   return bestplan[S]
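For concreteness, here is a small Python rendering of the same dynamic program, memoizing the best plan per subset of relations. The two cost functions are stand-ins supplied by the caller; a real optimizer would estimate block transfers for the best join algorithm:

    from functools import lru_cache

    def find_best_plan(relations, scan_cost, join_cost):
        # relations: iterable of relation names
        # scan_cost(r): cost of scanning a single relation
        # join_cost(S1, S2): cost of joining the results of two subsets
        @lru_cache(maxsize=None)
        def best(S):
            if len(S) == 1:
                (r,) = S
                return scan_cost(r), r
            items = sorted(S)
            best_cost, best_plan = float('inf'), None
            # enumerate every non-empty proper subset S1 of S
            for mask in range(1, 2 ** len(items) - 1):
                S1 = frozenset(x for i, x in enumerate(items) if mask >> i & 1)
                S2 = S - S1
                c1, p1 = best(S1)
                c2, p2 = best(S2)
                cost = c1 + c2 + join_cost(S1, S2)
                if cost < best_cost:
                    best_cost, best_plan = cost, (p1, p2)
            return best_cost, best_plan
        return best(frozenset(relations))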
  • 926. Left Deep Join Trees  In left-deep join trees, the right- hand-side input for each join is a relation, not the result of an intermediate join.
• 927. Cost of Optimization  With dynamic programming, the time complexity of optimization with bushy trees is O(3^n).  With n = 10, this number is 59000 instead of 176 billion!  Space complexity is O(2^n)  To find best left-deep join tree for a set of n relations:  Consider n alternatives with one relation as right-hand-side input and the other relations as left-hand-side input.  Using (recursively computed and stored) least-cost join orders for the left-hand-side alternatives, choose the cheapest of the n alternatives.
• 928. Interesting Orders in Cost-Based Optimization  Consider the expression (r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5  An interesting sort order is a particular sort order of tuples that could be useful for a later operation.  Generating the result of r1 ⋈ r2 ⋈ r3 sorted on the attributes common with r4 or r5 may be useful, but generating it sorted on the attributes common to only r1 and r2 is not useful.  Using merge-join to compute r1 ⋈ r2 ⋈ r3 may be costlier, but may provide an output sorted in an interesting order.
  • 929. Heuristic Optimization  Cost-based optimization is expensive, even with dynamic programming.  Systems may use heuristics to reduce the number of choices that must be made in a cost- based fashion.  Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:
• 930. Steps in Typical Heuristic Optimization  1. Deconstruct conjunctive selections into a sequence of single selection operations (Equiv. rule 1).  2. Move selection operations down the query tree for the earliest possible execution (Equiv. rules 2, 7a, 7b, 11).  3. Execute first those selection and join operations that will produce the smallest relations (Equiv. rule 6).  4. Replace Cartesian product operations that are followed by a selection condition by join operations.
• 931. Structure of Query Optimizers  The System R/Starburst optimizer considers only left-deep join orders. This reduces optimization complexity and generates plans amenable to pipelined evaluation. System R/Starburst also uses heuristics to push selections and projections down the query tree.  Heuristic optimization used in some versions of Oracle:  Repeatedly pick the "best" relation to join next, starting from each of n starting points, and pick the best among the resulting plans.
  • 932. Structure of Query Optimizers (Cont.)  Some query optimizers integrate heuristic selection and the generation of alternative access plans.  System R and Starburst use a hierarchical procedure based on the nested-block concept of SQL: heuristic rewriting followed by cost-based join-order optimization.  Even with the use of heuristics, cost- based query optimization imposes a substantial overhead.
• 933. Optimizing Nested Subqueries**  SQL conceptually treats nested subqueries in the where clause as functions that take parameters and return a single value or set of values  Parameters are variables from the outer level query that are used in the nested subquery; such variables are called correlation variables  E.g. select customer-name from borrower where exists (select * from depositor where depositor.customer-name = borrower.customer-name)
• 934. Optimizing Nested Subqueries (Cont.)  Correlated evaluation may be quite inefficient since  a large number of calls may be made to the nested query  there may be unnecessary random I/O as a result  SQL optimizers attempt to transform nested subqueries to joins where possible, enabling use of efficient join techniques  E.g.: the earlier nested query can be rewritten as shown below
• 935. Optimizing Nested Subqueries (Cont.)  In general, SQL queries of the form below can be rewritten as shown  Rewrite: select … from L1 where P1 and exists (select * from L2 where P2)  To: create table t1 as select distinct V from L2 where P21  select … from L1, t1 where P1 and P22  (P21 contains predicates in P2 that do not involve any correlation variables; P22 reintroduces the predicates involving correlation variables, with relations renamed appropriately)
• 936. Optimizing Nested Subqueries (Cont.)  In our example, the original nested query would be transformed to  create table t1 as select distinct customer-name from depositor  select customer-name from borrower, t1 where t1.customer-name = borrower.customer-name  The process of replacing a nested query by a query with a join (possibly with a temporary relation) is called decorrelation.
• 937. Materialized Views**  A materialized view is a view whose contents are computed and stored.  Consider the view  create view branch-total-loan(branch-name, total-loan) as select branch-name, sum(amount) from loan group by branch-name  Materializing the above view would be very useful if the total loan amount is required frequently  Saves the effort of finding multiple tuples and adding up their amounts
  • 938. Materialized View Maintenance  The task of keeping a materialized view up-to-date with the underlying data is known as materialized view maintenance  Materialized views can be maintained by recomputation on every update  A better option is to use incremental view maintenance  Changes to database relations are used to compute changes to materialized view, which is then updated
• 939. Incremental View Maintenance  The changes (inserts and deletes) to a relation or expression are referred to as its differential  Sets of tuples inserted into and deleted from r are denoted ir and dr  To simplify our description, we only consider inserts and deletes  We replace an update to a tuple by deletion of the tuple followed by insertion of the updated tuple  We describe below how to compute the change to the result of each relational-algebra operation, given changes to its inputs
• 940. Join Operation  Consider the materialized view v = r ⋈ s and an update to r  Let rold and rnew denote the old and new states of relation r  Consider the case of an insert to r:  We can write rnew ⋈ s as (rold ∪ ir) ⋈ s  And rewrite the above to (rold ⋈ s) ∪ (ir ⋈ s)  But (rold ⋈ s) is simply the old value of the materialized view, so the incremental change to the view is just ir ⋈ s
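A minimal Python sketch of this incremental rule, with relations as lists of dicts joined on a shared attribute (the representation is illustrative, not the text's notation):

    # Incremental maintenance of v = r JOIN s when i_r is inserted into r:
    # the change to the view is exactly i_r JOIN s.
    def maintain_join_insert(v, i_r, s, attr):
        for t in i_r:
            for u in s:
                if t[attr] == u[attr]:
                    v.append({**t, **u})   # add the newly derivable tuples
        return v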
• 941. Selection and Projection Operations  Selection: Consider a view v = σθ(r).  vnew = vold ∪ σθ(ir)  vnew = vold − σθ(dr)  Projection is a more difficult operation  R = (A,B), and r(R) = {(a,2), (a,3)}  ΠA(r) has a single tuple (a).  If we delete the tuple (a,2) from r, we should not delete the tuple (a) from ΠA(r), but if we then delete (a,3) as well, we should delete the tuple (a)  For each tuple in a projection ΠA(r), we will keep a count of how many times it was derived
• 942. Aggregation Operations  count: v = Agcount(B)(r)  When a set of tuples ir is inserted  For each tuple t in ir, if the corresponding group is already present in v, we increment its count, else we add a new tuple with count = 1  When a set of tuples dr is deleted  for each tuple t in dr we look for the group t.A in v, and subtract 1 from the count for the group.  If the count becomes 0, we delete from v the tuple for the group t.A  sum: v = Agsum(B)(r)  We maintain the sum in a manner similar to count, except we add/subtract the B value instead of adding/subtracting 1 for the count
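A minimal Python sketch of count and sum maintenance under this scheme, with each aggregate kept as a dict keyed by group value (the names and layout are illustrative):

    # v_count: {group value: count}, v_sum: {group value: running sum}
    def insert_tuple(v_count, v_sum, t, group='A', val='B'):
        g = t[group]
        v_count[g] = v_count.get(g, 0) + 1
        v_sum[g] = v_sum.get(g, 0) + t[val]

    def delete_tuple(v_count, v_sum, t, group='A', val='B'):
        g = t[group]
        v_count[g] -= 1
        v_sum[g] -= t[val]
        if v_count[g] == 0:       # last tuple of the group is gone
            del v_count[g]
            del v_sum[g]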
• 943. Aggregate Operations (Cont.)  min, max: v = Agmin(B)(r).  Handling insertions on r is straightforward.  Maintaining the aggregate values min and max on deletions may be more expensive. We have to look at the other tuples of r that are in the same group to find the new minimum/maximum
• 944. Other Operations  Set intersection: v = r ∩ s  when a tuple is inserted in r we check if it is present in s, and if so we add it to v.  If the tuple is deleted from r, we delete it from the intersection if it is present.  Updates to s are symmetric  The other set operations, union and set difference, are handled in a similar fashion.  Outer joins are handled in much the same way as joins, but with some extra work
• 945. Handling Expressions  To handle an entire expression, we derive expressions for computing the incremental change to the result of each sub-expression, starting from the smallest sub-expressions.  E.g. consider E1 ⋈ E2 where each of E1 and E2 may be a complex expression  Suppose the set of tuples to be inserted into E1 is given by D1  (computed earlier, since smaller sub-expressions are handled first)  Then the set of tuples to be inserted into E1 ⋈ E2 is given by D1 ⋈ E2
• 946. Query Optimization and Materialized Views  Rewriting queries to use materialized views:  A materialized view v = r ⋈ s is available  A user submits a query r ⋈ s ⋈ t  We can rewrite the query as v ⋈ t  Whether to do so depends on cost estimates for the two alternatives  Replacing a use of a materialized view by the view definition:  A materialized view v = r ⋈ s is available, but without any index on it  User submits a query σA=10(v).
• 947. Materialized View Selection  Materialized view selection: "What is the best set of views to materialize?"  This decision must be made on the basis of the system workload  Indices are just like materialized views; the problem of index selection is closely related to that of materialized view selection, although it is simpler.  Some database systems provide tools to help the database administrator with index and materialized view selection.
• 949. Selection Cost Estimate Example  σbranch-name = "Perryridge"(account)  Number of blocks is baccount = 500: 10,000 tuples in the relation; each block holds 20 tuples.  Assume account is sorted on branch-name.  V(branch-name, account) is 50  10000/50 = 200 tuples of the account relation pertain to the Perryridge branch
• 950. Selections Using Indices  Index scan – search algorithms that use an index; condition is on the search-key of the index.  A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition: EA3 = HTi + 1  A4 (primary index on nonkey, equality). Retrieve multiple records. Let the search-key attribute be A: EA4 = HTi + ⌈SC(A, r)/fr⌉
• 951. Cost Estimate Example (Indices)  Consider the query σbranch-name = "Perryridge"(account), with the primary index on branch-name.  Since V(branch-name, account) = 50, we expect that 10000/50 = 200 tuples of the account relation pertain to the Perryridge branch.  Since the index is a clustering index, 200/20 = 10 block reads are required to read the account tuples.  Several index blocks must also be read; with HTi = 2, the total cost is 12 block reads.
• 952. Selections Involving Comparisons  Implement selections of the form σA≤v(r) or σA≥v(r) by using a linear file scan or binary search, or by using indices in the following ways:  A6 (primary index, comparison). The cost estimate is: EA6 = HTi + ⌈c/fr⌉, where c is the estimated number of tuples satisfying the condition. In absence of statistical information c is assumed to be nr/2.  A7 (secondary index, comparison). The cost estimate is: EA7 = HTi + ⌈LBi · c/nr⌉ + c, with c as above.
• 953. Example of Cost Estimate for Complex Selection  Consider a selection on account with the following condition: where branch-name = "Perryridge" and balance = 1200  Consider using algorithm A8:  The branch-name index is clustering, and if we use it the cost estimate is 12 block reads (as we saw before).  The balance index is non-clustering, and V(balance, account) = 500, so the selection would retrieve 10,000/500 = 20 accounts
• 954. Example (Cont.)  Consider using algorithm A10:  Use the index on balance to retrieve set S1 of pointers to records with balance = 1200.  Use index on branch-name to retrieve set S2 of pointers to records with branch-name = "Perryridge".  S1 ∩ S2 = set of pointers to records with branch-name = "Perryridge" and balance = 1200.  The number of pointers retrieved (20 and 200) fit into a single leaf page; we read four index blocks to retrieve the two sets of pointers and compute their intersection.
  • 955. Chapter 15: Transactions  Transaction Concept  Transaction State  Implementation of Atomicity and Durability  Concurrent Executions  Serializability  Recoverability  Implementation of Isolation  Transaction Definition in SQL  Testing for Serializability.
  • 956. Transaction Concept  A transaction is a unit of program execution that accesses and possibly updates various data items.  A transaction must see a consistent database.  During transaction execution the database may be inconsistent.  When the transaction is committed, the database must be consistent.  Two main issues to deal with:  Failures of various kinds, such as hardware failures and system crashes  Concurrent execution of multiple transactions
• 957. ACID Properties  To preserve the integrity of data, the database system must ensure:  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.  Consistency. Execution of a transaction in isolation preserves the consistency of the database.  Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.  That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.  Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
• 958. Example of Fund Transfer  Transaction to transfer $50 from account A to account B:  1. read(A)  2. A := A – 50  3. write(A)  4. read(B)  5. B := B + 50  6. write(B)  Consistency requirement – the sum of A and B is unchanged by the execution of the transaction.  Atomicity requirement — if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, else an inconsistency will result.
• 959. Example of Fund Transfer (Cont.)  Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.  Isolation requirement — if between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be). Can be ensured trivially by running transactions serially, that is one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see.
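As an illustration (not from the text), the same transfer can be written against Python's sqlite3 module, where commit and rollback provide the atomicity and durability just described; the database file, table and column names are made up for the example:

    import sqlite3

    conn = sqlite3.connect('bank.db')
    try:
        cur = conn.cursor()
        cur.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        cur.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
        conn.commit()      # both updates become durable together
    except Exception:
        conn.rollback()    # a failure part-way leaves the database unchanged
        raise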
  • 960. Transaction State  Active, the initial state; the transaction stays in this state while it is executing  Partially committed, after the final statement has been executed.  Failed, after the discovery that normal execution can no longer proceed.  Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:  restart the transaction – only if no internal logical error  kill the transaction  Committed, after successful completion.
• 962. Implementation of Atomicity and Durability  The recovery-management component of a database system implements the support for atomicity and durability.  The shadow-database scheme:  assume that only one transaction is active at a time.  a pointer called db_pointer always points to the current consistent copy of the database.  all updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk.
• 963. Implementation of Atomicity and Durability (Cont.)  The shadow-database scheme:  Assumes disks do not fail  Useful for text editors, but extremely inefficient for large databases: executing a single transaction requires copying the entire database.
• 964. Concurrent Executions  Multiple transactions are allowed to run concurrently in the system. Advantages are:  increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk  reduced average response time for transactions: short transactions need not wait behind long ones.  Concurrency control schemes – mechanisms to achieve isolation, i.e., to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database
  • 965. Schedules  Schedules – sequences that indicate the chronological order in which instructions of concurrent transactions are executed  a schedule for a set of transactions must consist of all instructions of those transactions  must preserve the order in which the instructions appear in each individual transaction.
• 966. Example Schedules  Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.
• 967. Example Schedule (Cont.)  Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1. In both Schedule 1 and Schedule 3, the sum A + B is preserved.
• 968. Example Schedules (Cont.)  The following concurrent schedule (Schedule 4 in the text) does not preserve the value of the sum A + B.
  • 969. Serializability  Basic Assumption – Each transaction preserves database consistency.  Thus serial execution of a set of transactions preserves database consistency.  A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of: 1. conflict serializability 2. view serializability
• 970. Conflict Serializability  Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.  1. li = read(Q), lj = read(Q). li and lj don't conflict.  2. li = read(Q), lj = write(Q). They conflict.  3. li = write(Q), lj = read(Q). They conflict.  4. li = write(Q), lj = write(Q). They conflict.
• 971. Conflict Serializability (Cont.)  If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.  We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule  Example of a schedule that is not conflict serializable: T3 reads Q, T4 writes Q, then T3 writes Q; we are unable to swap instructions to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>.
  • 972. Conflict Serializability (Cont.)  Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by series of swaps of non- conflicting instructions. Therefore Schedule 3 is conflict serializable.
• 973. View Serializability  Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:  1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.  2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.  3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.
• 974. View Serializability (Cont.)  A schedule S is view serializable if it is view equivalent to a serial schedule.  Every conflict serializable schedule is also view serializable.  Schedule 9 (from text) — a schedule which is view-serializable but not conflict serializable.
• 975. Other Notions of Serializability  Schedule 8 (from text) given below produces the same outcome as the serial schedule <T1, T5>, yet is not conflict equivalent or view equivalent to it.  Determining such equivalence requires analysis of operations other than read and write.
• 976. Recoverability  Need to address the effect of transaction failures on concurrently running transactions.  Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj.  The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read
  • 977. Recoverability (Cont.)  Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)
• 978. Recoverability (Cont.)  Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.  Every cascadeless schedule is also recoverable  It is desirable to restrict the schedules to those that are cascadeless
• 979. Implementation of Isolation  Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.  A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.  Concurrency-control schemes trade off between the amount of concurrency they allow and the amount of overhead that they incur.
  • 980. Transaction Definition in SQL  Data manipulation language must include a construct for specifying the set of actions that comprise a transaction.  In SQL, a transaction begins implicitly.  A transaction in SQL ends by:  Commit work commits current transaction and begins a new one.  Rollback work causes current transaction to abort.  Levels of consistency specified by SQL-92:
• 981. Levels of Consistency in SQL-92  Serializable — default  Repeatable read — only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others.  Read committed — only committed records can be read, but successive reads of a record may return different (but committed) values.  Lower degrees of consistency are useful for gathering approximate information about the database, e.g., statistics for the query optimizer.
• 982. Testing for Serializability  Consider some schedule of a set of transactions T1, T2, ..., Tn  Precedence graph — a directed graph where the vertices are the transactions (names).  We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item on which the conflict arose earlier.  We may label the arc by the item that was accessed.
• 983. Example Schedule (Schedule A)  [schedule table over transactions T1–T5, with operations including read(X), read(Y), read(Z), read(V), read(W), write(Y), write(Z), read(U)]
• 984. Precedence Graph for Schedule A
• 985. Test for Conflict Serializability  A schedule is conflict serializable if and only if its precedence graph is acyclic.  Cycle-detection algorithms exist which take order n^2 time, where n is the number of vertices in the graph. (Better algorithms take order n + e where e is the number of edges.)  If the precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph.
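A depth-first-search cycle test over the precedence graph is a direct way to implement this check; here is a hedged Python sketch (the graph representation is illustrative):

    # graph: {transaction: set of transactions it precedes}
    # Every transaction must appear as a key.
    def is_conflict_serializable(graph):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {t: WHITE for t in graph}
        def has_cycle(t):
            color[t] = GRAY
            for u in graph[t]:
                if color[u] == GRAY:                 # back edge: cycle found
                    return True
                if color[u] == WHITE and has_cycle(u):
                    return True
            color[t] = BLACK
            return False
        return not any(color[t] == WHITE and has_cycle(t) for t in graph)

    # is_conflict_serializable({'T1': {'T2'}, 'T2': set()})   -> True
    # is_conflict_serializable({'T1': {'T2'}, 'T2': {'T1'}})  -> False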
  • 986. Test for View Serializability  The precedence graph test for conflict serializability must be modified to apply to a test for view serializability.  The problem of checking if a schedule is view serializable falls in the class of NP-complete problems. Thus existence of an efficient algorithm is unlikely. However practical algorithms that just check some sufficient conditions for view serializability can still be used.
• 987. Concurrency Control vs. Serializability Tests  Testing a schedule for serializability after it has executed is a little too late!  Goal – to develop concurrency control protocols that will assure serializability. They will generally not examine the precedence graph as it is being created; instead a protocol will impose a discipline that avoids nonserializable schedules. We will study such protocols in Chapter 16.
  • 989. Schedule 2 -- A Serial Schedule in Which T2 is Followed by T1
  • 990. Schedule 5 -- Schedule 3 After Swapping A Pair of Instructions
  • 991. Schedule 6 -- A Serial Schedule That is Equivalent to Schedule 3
  • 993. Precedence Graph for (a) Schedule 1 and (b) Schedule 2
• 997. Chapter 16: Concurrency Control  Lock-Based Protocols  Timestamp-Based Protocols  Validation-Based Protocols  Multiple Granularity  Multiversion Schemes  Deadlock Handling  Insert and Delete Operations  Concurrency in Index Structures
• 998. Lock-Based Protocols  A lock is a mechanism to control concurrent access to a data item  Data items can be locked in two modes:  1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using the lock-X instruction.  2. shared (S) mode. Data item can only be read. S-lock is requested using the lock-S instruction.  Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted.
• 999. Lock-Based Protocols (Cont.)  Lock-compatibility matrix  A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions  Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
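A minimal Python sketch of the grant rule against the S/X compatibility matrix (the names are illustrative, not a real lock manager):

    # True where the held mode (row) is compatible with the request (column)
    COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False,
                  ('X', 'S'): False, ('X', 'X'): False}

    def can_grant(requested, held_modes):
        # grant only if the request is compatible with every lock
        # currently held on the item by other transactions
        return all(COMPATIBLE[(held, requested)] for held in held_modes)

    # can_grant('S', ['S', 'S']) -> True
    # can_grant('X', ['S'])      -> False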
  • 1000. Lock-Based Protocols (Cont.)  Example of a transaction performing locking: T2: lock-S(A); read (A); unlock(A); lock-S(B); read (B); unlock(B); display(A+B)  Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong.  A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
  • 1001. Pitfalls of Lock-Based Protocols Consider the partial schedule  Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.  Such a situation is called a deadlock.  To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
• 1002. Pitfalls of Lock-Based Protocols (Cont.)  The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.  Starvation is also possible if the concurrency control manager is badly designed. For example:  A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.  The same transaction is repeatedly rolled back due to deadlocks.  A concurrency control manager can be designed to prevent starvation.
• 1003. The Two-Phase Locking Protocol  This is a protocol which ensures conflict-serializable schedules.  Phase 1: Growing Phase  transaction may obtain locks  transaction may not release locks  Phase 2: Shrinking Phase  transaction may release locks  transaction may not obtain locks  The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquired its final lock).
• 1004. The Two-Phase Locking Protocol (Cont.)  Two-phase locking does not ensure freedom from deadlocks  Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.  Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
• 1005. The Two-Phase Locking Protocol (Cont.)  There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.  However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense: Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
  • 1006. Lock Conversions  Two-phase locking with lock conversions: – First Phase:  can acquire a lock-S on item  can acquire a lock-X on item  can convert a lock-S to a lock-X (upgrade) – Second Phase:  can release a lock-S  can release a lock-X  can convert a lock-X to a lock-S (downgrade)
• 1007. Automatic Acquisition of Locks  A transaction Ti issues the standard read/write instruction, without explicit locking calls.  The operation read(D) is processed as:
if Ti has a lock on D
   then read(D)
else begin
   if necessary wait until no other transaction has a lock-X on D
   grant Ti a lock-S on D;
   read(D)
end
• 1008. Automatic Acquisition of Locks (Cont.)  write(D) is processed as:
if Ti has a lock-X on D
   then write(D)
else begin
   if necessary wait until no other trans. has any lock on D,
   if Ti has a lock-S on D
      then upgrade lock on D to lock-X
   else grant Ti a lock-X on D
   write(D)
end;
 All locks are released after commit or abort
• 1009. Implementation of Locking  A lock manager can be implemented as a separate process to which transactions send lock and unlock requests  The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock)  The requesting transaction waits until its request is answered
• 1010. Lock Table  Black rectangles indicate granted locks, white ones indicate waiting requests  Lock table also records the type of lock granted or requested  New request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks  Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted  If transaction aborts, all waiting or granted requests of the transaction are deleted  the lock manager may keep a list of locks held by each transaction, to implement this efficiently
• 1011. Graph-Based Protocols  Graph-based protocols are an alternative to two-phase locking  Impose a partial ordering → on the set D = {d1, d2 ,..., dh} of all data items.  If di → dj then any transaction accessing both di and dj must access di before accessing dj.  Implies that the set D may now be viewed as a directed acyclic graph, called a database graph.  The tree-protocol is a simple kind of graph protocol.
• 1012. Tree Protocol  Only exclusive locks are allowed.  The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.  Data items may be unlocked at any time.  A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.
• 1013. Graph-Based Protocols (Cont.)  The tree protocol ensures conflict serializability as well as freedom from deadlock.  Unlocking may occur earlier in the tree-locking protocol than in the two-phase locking protocol.  shorter waiting times, and increase in concurrency  protocol is deadlock-free, no rollbacks are required  the abort of a transaction can still lead to cascading rollbacks. (This correction has to be made in the book also.)
• 1014. Timestamp-Based Protocols  Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).  The protocol manages concurrent execution such that the time-stamps determine the serializability order.  In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:  W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully.  R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully.
• 1015. Timestamp-Based Protocols (Cont.)  The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.  Suppose a transaction Ti issues a read(Q)  1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.  2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
• 1016. Timestamp-Based Protocols (Cont.)  Suppose that transaction Ti issues write(Q).  If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.  If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.  Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
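The read(Q) and write(Q) rules reduce to a few comparisons; here is a hedged Python sketch, with each item carrying its R- and W-timestamps (the dict layout is illustrative):

    # item = {'R': r_timestamp, 'W': w_timestamp, 'value': ...}
    def ts_read(ts_i, item):
        if ts_i < item['W']:
            return 'ROLLBACK'          # Ti would read an overwritten value
        item['R'] = max(item['R'], ts_i)
        return 'OK'

    def ts_write(ts_i, item):
        if ts_i < item['R'] or ts_i < item['W']:
            return 'ROLLBACK'          # late write: value already read,
                                       # or Ti writes an obsolete value
        item['W'] = ts_i
        return 'OK'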
• 1017. Example Use of the Protocol  A partial schedule for several data items for transactions with timestamps 1, 2, 3, 4, 5  [schedule table over T1–T5, with operations including read(X), read(Y), write(Y), write(Z), read(Z), read(X), write(Z), write(Y), and the aborts of two transactions]
• 1018. Correctness of Timestamp-Ordering Protocol  The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form: transaction with smaller timestamp → transaction with larger timestamp.  Thus, there will be no cycles in the precedence graph
  • 1019. Recoverability and Cascade Freedom  Problem with timestamp-ordering protocol:  Suppose Ti aborts, but Tj has read a data item written by Ti  Then Tj must abort; if Tj had been allowed to commit earlier, the schedule is not recoverable.  Further, any transaction that has read a data item written by Tj must abort  This can lead to cascading rollback --- that is, a chain of rollbacks
• 1020. Thomas' Write Rule  Modified version of the timestamp-ordering protocol in which obsolete write operations may be ignored under certain circumstances.  When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, rather than rolling back Ti as the timestamp ordering protocol would have done, this write operation can be ignored.
• 1021. Validation-Based Protocol  Execution of transaction Ti is done in three phases.  1. Read and execution phase: Transaction Ti writes only to temporary local variables  2. Validation phase: Transaction Ti performs a ``validation test'' to determine if local variables can be written without violating serializability.  3. Write phase: If Ti is validated, the updates are applied to the database; otherwise, Ti is rolled back.  The three phases of concurrently executing transactions can be interleaved, but each transaction must go through the three phases in that order.
• 1022. Validation-Based Protocol (Cont.)  Each transaction Ti has 3 timestamps  Start(Ti): the time when Ti started its execution  Validation(Ti): the time when Ti entered its validation phase  Finish(Ti): the time when Ti finished its write phase  Serializability order is determined by the timestamp given at validation time, to increase concurrency. Thus TS(Ti) is given the value of Validation(Ti).
• 1023. Validation Test for Transaction Tj  If for all Ti with TS(Ti) < TS(Tj) either one of the following conditions holds:  finish(Ti) < start(Tj)  start(Tj) < finish(Ti) < validation(Tj) and the set of data items written by Ti does not intersect with the set of data items read by Tj  then validation succeeds and Tj can be committed. Otherwise, validation fails and Tj is aborted.  Justification: Either the first condition is satisfied, and there is no overlapped execution, or the second condition is satisfied and the writes of Tj do not affect reads of Ti since they occur after Ti has finished its reads, while the writes of Ti do not affect reads of Tj since Tj does not read any item written by Ti.
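The test translates directly into code; a hedged Python sketch, where each transaction record holds its three timestamps plus read and write sets (the field names are illustrative):

    # tj, ti: dicts with 'start', 'validation', 'finish',
    # 'read_set' and 'write_set' (sets of data item names)
    def validate(tj, earlier):
        for ti in earlier:             # all Ti with TS(Ti) < TS(Tj)
            if ti['finish'] < tj['start']:
                continue               # condition 1: no overlap at all
            if (tj['start'] < ti['finish'] < tj['validation'] and
                    not (ti['write_set'] & tj['read_set'])):
                continue               # condition 2: overlap but no conflict
            return False               # validation fails: abort Tj
        return True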
• 1024. Schedule Produced by Validation  Example of schedule produced using validation:  T14: read(B)  T15: read(B), B := B − 50, read(A), A := A + 50  T14: read(A), (validate), display(A+B)  T15: (validate), write(B), write(A)
• 1025. Multiple Granularity  Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones  Can be represented graphically as a tree (but don't confuse with the tree-locking protocol)  When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.
• 1026. Example of Granularity Hierarchy  The highest level in the example hierarchy is the entire database; the levels below are of type area, file and record, in that order.
• 1027. Intention Lock Modes  In addition to S and X lock modes, there are three additional lock modes with multiple granularity:  intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks.  intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks  shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive-mode locks.
• 1028. Compatibility Matrix with Intention Lock Modes  The compatibility matrix for all lock modes is:
          IS    IX    S     SIX   X
   IS     ✓     ✓     ✓     ✓     ×
   IX     ✓     ✓     ×     ×     ×
   S      ✓     ×     ✓     ×     ×
   SIX    ✓     ×     ×     ×     ×
   X      ×     ×     ×     ×     ×
• 1029. Multiple Granularity Locking Scheme  Transaction Ti can lock a node Q, using the following rules:  1. The lock compatibility matrix must be observed.  2. The root of the tree must be locked first, and may be locked in any mode.  3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.  4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.  5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).  6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.  Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.
• 1030. Multiversion Schemes  Multiversion schemes keep old versions of data items to increase concurrency.  Multiversion Timestamp Ordering  Multiversion Two-Phase Locking  Each successful write results in the creation of a new version of the data item written.  Use timestamps to label versions.  When a read(Q) operation is issued, select an appropriate version of Q based on the timestamp of the transaction, and return the value of the selected version.
• 1031. Multiversion Timestamp Ordering  Each data item Q has a sequence of versions <Q1, Q2,...., Qm>. Each version Qk contains three data fields:  Content -- the value of version Qk.  W-timestamp(Qk) -- timestamp of the transaction that created (wrote) version Qk  R-timestamp(Qk) -- largest timestamp of a transaction that successfully read version Qk  When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti).  R-timestamp of Qk is updated whenever a transaction Tj reads Qk, and TS(Tj) > R-timestamp(Qk).
• 1032. Multiversion Timestamp Ordering (Cont.)  The multiversion timestamp scheme presented here ensures serializability.  Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).  1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.  2. If transaction Ti issues a write(Q), and if TS(Ti) < R-timestamp(Qk), then Ti is rolled back; otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten, and otherwise a new version of Q is created.
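Version selection for a multiversion read can be sketched in a few lines of Python (the triple layout is an illustrative encoding):

    # versions: list of [w_ts, r_ts, value], sorted by ascending w_ts
    def mv_read(ts_i, versions):
        # pick the version with the largest W-timestamp <= TS(Ti)
        k = max(i for i, v in enumerate(versions) if v[0] <= ts_i)
        versions[k][1] = max(versions[k][1], ts_i)   # update R-timestamp
        return versions[k][2]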
• 1033. Multiversion Two-Phase Locking  Differentiates between read-only transactions and update transactions  Update transactions acquire read and write locks, and hold all locks up to the end of the transaction. That is, update transactions follow rigorous two-phase locking.  Each successful write results in the creation of a new version of the data item written.  Each version of a data item has a single timestamp, whose value is obtained from a counter ts-counter that is incremented during commit processing.
• 1034. Multiversion Two-Phase Locking (Cont.)  When an update transaction wants to read a data item, it obtains a shared lock on it, and reads the latest version.  When it wants to write an item, it obtains an X lock on it; it then creates a new version of the item and sets this version's timestamp to ∞.  When update transaction Ti completes, commit processing occurs:  Ti sets the timestamp on the versions it has created to ts-counter + 1  Ti increments ts-counter by 1  Read-only transactions that start after Ti increments ts-counter will see the values updated by Ti; those that start before will see the values before Ti's updates.
• 1035. Deadlock Handling  Consider the following two transactions:  T1: write(X), write(Y)  T2: write(Y), write(X)  Schedule with deadlock:  T1: lock-X on X, write(X)  T2: lock-X on Y, write(Y), then wait for lock-X on X  T1: wait for lock-X on Y
• 1036. Deadlock Handling  System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.  Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:  Require that each transaction locks all its data items before it begins execution (predeclaration).
• 1037. More Deadlock Prevention Strategies  The following schemes use transaction timestamps for the sake of deadlock prevention alone.  wait-die scheme — non-preemptive  older transaction may wait for younger one to release data item. Younger transactions never wait for older ones; they are rolled back instead.  a transaction may die several times before acquiring the needed data item  wound-wait scheme — preemptive  older transaction wounds (forces rollback of) younger transaction instead of waiting for it. Younger transactions may wait for older ones.
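Both schemes are just a timestamp comparison at lock-conflict time; a hedged Python sketch of the two decisions (the return labels are illustrative):

    # ts_req: timestamp of the requesting transaction
    # ts_hold: timestamp of the transaction holding the lock
    def wait_die(ts_req, ts_hold):
        # older (smaller timestamp) requester may wait; younger dies
        return 'WAIT' if ts_req < ts_hold else 'DIE'

    def wound_wait(ts_req, ts_hold):
        # older requester wounds (rolls back) the younger holder;
        # younger requester waits
        return 'WOUND' if ts_req < ts_hold else 'WAIT'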
• 1038. Deadlock Prevention (Cont.)  Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.  Timeout-Based Schemes:  a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.
• 1039. Deadlock Detection  Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E),  V is a set of vertices (all the transactions in the system)  E is a set of edges; each element is an ordered pair Ti → Tj.  If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
  • 1040. Deadlock Detection (Cont.) Wait-for graph without a cycle Wait-for graph with a cycle
• 1041. Deadlock Recovery  When deadlock is detected:  Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur minimum cost.  Rollback -- determine how far to roll back the transaction  Total rollback: Abort the transaction and then restart it.  More effective to roll back the transaction only as far as necessary to break the deadlock.  Starvation happens if the same transaction is always chosen as victim; include the number of rollbacks in the cost factor to avoid starvation
• 1042. Insert and Delete Operations  If two-phase locking is used:  A delete operation may be performed only if the transaction deleting the tuple has an exclusive lock on the tuple to be deleted.  A transaction that inserts a new tuple into the database is given an X-mode lock on the tuple  Insertions and deletions can lead to the phantom phenomenon.  A transaction that scans a relation (e.g., find all accounts in Perryridge) and a transaction that inserts a tuple into the relation (e.g., insert a new Perryridge account) may conflict in spite of not accessing any tuple in common.
• 1043. Insert and Delete Operations (Cont.)  The transaction scanning the relation is reading information that indicates what tuples the relation contains, while a transaction inserting a tuple updates the same information.  The information should be locked.  One solution:  Associate a data item with the relation, to represent the information about what tuples the relation contains.  Transactions scanning the relation acquire a shared lock on the data item.  Transactions inserting or deleting a tuple acquire an exclusive lock on the data item.
• 1044. Index Locking Protocol  Every relation must have at least one index. Access to a relation must be made only through one of the indices on the relation.  A transaction Ti that performs a lookup must lock all the index buckets that it accesses, in S-mode.  A transaction Ti may not insert a tuple ti into a relation r without updating all indices on r.
• 1045. Weak Levels of Consistency  Degree-two consistency: differs from two-phase locking in that S-locks may be released at any time, and locks may be acquired at any time  X-locks must be held till end of transaction  Serializability is not guaranteed; the programmer must ensure that no erroneous database state will occur  Cursor stability:  For reads, each tuple is locked, read, and the lock is immediately released  X-locks are held till end of transaction
• 1046. Weak Levels of Consistency in SQL  SQL allows non-serializable executions  Serializable: is the default  Repeatable read: allows only committed records to be read, and repeating a read should return the same value (so read locks should be retained)  However, the phantom phenomenon need not be prevented  T1 may see some records inserted by T2, but may not see others inserted by T2  Read committed: same as degree-two consistency, but most systems implement it as cursor-stability
• 1047. Concurrency in Index Structures  Indices are unlike other database items in that their only job is to help in accessing data.  Index-structures are typically accessed very often, much more than other database items.  Treating index-structures like other database items leads to low concurrency. Two-phase locking on an index may result in transactions executing practically one-at-a-time.  It is acceptable to have nonserializable concurrent access to an index as long as the accuracy of the index is maintained.
  • 1048. Concurrency in Index Structures  (Cont.) Example of index concurrency protocol:  Use crabbing instead of two-phase locking on the nodes of the B+-tree, as follows. During search/insertion/deletion:  First lock the root node in shared mode.  After locking all required children of a node in shared mode, release the lock on the node.  During insertion/deletion, upgrade leaf node locks to exclusive mode.
  • 1050. Partial Schedule Under Two- Phase Locking
  • 1051. Incomplete Schedule With a Lock Conversion
  • 1054. Serializable Schedule Under the Tree Protocol
  • 1057. Schedule 5, A Schedule Produced by Using Validation
  • 1060. Wait-for Graph With No Cycle
  • 1062. Nonserializable Schedule with Degree-Two Consistency
  • 1063. B+-Tree For account File with n = 3.
• 1064. Insertion of "Clearview" Into the B+-Tree of Figure 16.21
  • 1066. Chapter 17: Recovery System  Failure Classification  Storage Structure  Recovery and Atomicity  Log-Based Recovery  Shadow Paging  Recovery With Concurrent Transactions  Buffer Management  Failure with Loss of Nonvolatile Storage  Advanced Recovery Techniques  ARIES Recovery Algorithm
• 1067. Failure Classification  Transaction failure:  Logical errors: transaction cannot complete due to some internal error condition  System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)  System crash: a power failure or other hardware or software failure causes the system to crash.  Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by system crash  Database systems have numerous integrity checks to prevent corruption of disk data
• 1068. Recovery Algorithms  Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures  Focus of this chapter  Recovery algorithms have two parts  1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures  2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
• 1069. Storage Structure  Volatile storage:  does not survive system crashes  examples: main memory, cache memory  Nonvolatile storage:  survives system crashes  examples: disk, tape, flash memory, non-volatile (battery backed up) RAM  Stable storage:  a mythical form of storage that survives all failures  approximated by maintaining multiple copies on distinct nonvolatile media
• 1070. Stable-Storage Implementation  Maintain multiple copies of each block on separate disks  copies can be at remote sites to protect against disasters such as fire or flooding.  Failure during data transfer can still result in inconsistent copies: Block transfer can result in  Successful completion  Partial failure: destination block has incorrect information  Total failure: destination block was never updated  Protecting storage media from failure during data transfer (one solution):  Execute output operation as follows (assuming two copies of each block):  1. Write the information onto the first physical block.  2. When the first write successfully completes, write the same information onto the second physical block.  3. The output is completed only after the second write successfully completes.
• 1071. Stable-Storage Implementation (Cont.)  Protecting storage media from failure during data transfer (cont.):  Copies of a block may differ due to failure during output operation. To recover from failure:  1. First find inconsistent blocks:  1. Expensive solution: Compare the two copies of every disk block.  2. Better solution:  Record in-progress disk writes on non-volatile storage (non-volatile RAM or special area of disk).  Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these blocks.
• 1072. Data Access  Physical blocks are those blocks residing on the disk.  Buffer blocks are the blocks residing temporarily in main memory.  Block movements between disk and main memory are initiated through the following two operations:  input(B) transfers the physical block B to main memory.  output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.  Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept.
• 1073. Data Access (Cont.)  Transaction transfers data items between system buffer blocks and its private work-area using the following operations:  read(X) assigns the value of data item X to the local variable xi.  write(X) assigns the value of local variable xi to data item X in the buffer block.  both these commands may necessitate the issue of an input(BX) instruction before the assignment, if the block BX in which X resides is not already in memory.  Transactions perform read(X) while accessing X for the first time; all subsequent accesses are to the local copy. After the last access, the transaction executes write(X).
• 1074. Example of Data Access (figure: buffer blocks A and B in main memory, private work areas of transactions T1 and T2, with input(A), output(B), read(X) and write(Y) operations moving data between disk and memory)
• 1075. Recovery and Atomicity  Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state.  Consider transaction Ti that transfers $50 from account A to account B; goal is either to perform all database modifications made by Ti or none at all.  Several output operations may be required for Ti (to output A and B). A failure may occur after one of these modifications has been made but before all of them are made.
  • 1076. Recovery and Atomicity (Cont.)  To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database itself.  We study two approaches:  log-based recovery, and  shadow-paging  We assume (initially) that transactions run serially, that is, one after the other.
• 1077. Log-Based Recovery  A log is kept on stable storage.  The log is a sequence of log records, and maintains a record of update activities on the database.  When transaction Ti starts, it registers itself by writing a <Ti start> log record  Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write, and V2 is the value to be written to X.  The log record notes that Ti has performed a write on data item Xj; Xj had value V1 before the write, and will have value V2 after the write.  When Ti finishes its last statement, the log record <Ti commit> is written.  We assume for now that log records are written directly to stable storage (that is, they are not buffered)
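As a running illustration for the log-based schemes that follow, here is a minimal Python sketch of the three record types just described. The Log class and tuple layout are inventions for this sketch; the list stands in for stable storage.

    class Log:
        def __init__(self):
            self.records = []                # stands in for stable storage
        def start(self, ti):
            self.records.append(('start', ti))             # <Ti start>
        def update(self, ti, x, v1, v2):
            self.records.append(('update', ti, x, v1, v2)) # <Ti, X, V1, V2>
        def commit(self, ti):
            self.records.append(('commit', ti))            # <Ti commit>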
• 1078. Deferred Database Modification  The deferred database modification scheme records all modifications to the log, but defers all the writes to after partial commit.  Assume that transactions execute serially  Transaction starts by writing <Ti start> record to log.  A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X  Note: old value is not needed for this scheme
• 1079. Deferred Database Modification (Cont.)  During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are there in the log.  Redoing a transaction Ti (redo Ti) sets the value of all data items updated by the transaction to the new values.  Crashes can occur while  the transaction is executing the original updates, or  while recovery action is being taken  example transactions T0 and T1 (T0 executes before T1):
• 1080. Deferred Database Modification (Cont.)  Below we show the log as it appears at three instances of time.  If the log on stable storage at the time of crash is as in case: (a) no redo actions need to be taken; (b) redo(T0) must be performed, since <T0 commit> is present; (c) redo(T0) must be performed followed by redo(T1), since <T0 commit> and <T1 commit> are present
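A minimal sketch of deferred-modification recovery using the Log layout above (with the deferred scheme's <Ti, X, V> records stored as ('update', ti, x, v)): a transaction's writes are replayed if and only if both its start and commit records are in the log.

    def recover_deferred(log, db):
        committed = {r[1] for r in log.records if r[0] == 'commit'}
        for rec in log.records:                # forward pass over the log
            if rec[0] == 'update' and rec[1] in committed:
                _, ti, x, v_new = rec
                db[x] = v_new                  # redo: install the new value

Running this twice leaves the database unchanged, which is exactly the idempotence that crashes during recovery require.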
• 1081. Immediate Database Modification  The immediate database modification scheme allows database updates of an uncommitted transaction to be made as the writes are issued  since undoing may be needed, update logs must have both old value and new value  Update log record must be written before database item is written  We assume that the log record is output directly to stable storage  Can be extended to postpone log record output, so long as prior to execution of an output(B) operation for a data block B, all log records corresponding to items in B are flushed to stable storage
• 1082. Immediate Database Modification Example

    Log                     Write              Output
    <T0 start>
    <T0, A, 1000, 950>
    <T0, B, 2000, 2050>
                            A = 950
                            B = 2050
    <T0 commit>
    <T1 start>
    <T1, C, 700, 600>
                            C = 600
                                               BB, BC
    <T1 commit>
                                               BA

 Note: BX denotes block containing X.
• 1083. Immediate Database Modification (Cont.)  Recovery procedure has two operations instead of one:  undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti  redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti  Both operations must be idempotent  That is, even if the operation is executed multiple times the effect is the same as if it is executed once  Needed since operations may get re-executed during recovery
  • 1084. Immediate DB Modification Recovery Example Below we show the log as it appears at three instances of time. Recovery actions in each case above are: (a) undo (T0): B is restored to 2000 and A to 1000. (b) undo (T1) and redo (T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively. (c) redo (T0) and redo (T1): A and B are set to 950 and 2050 respectively. Then C is set to 600
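A sketch of undo and redo under immediate modification, again using the earlier Log layout (five-field update records). Both operations only install values, so executing them multiple times has the same effect as executing them once:

    def undo(ti, log, db):
        for rec in reversed(log.records):      # backwards from the last record
            if rec[0] == 'update' and rec[1] == ti:
                _, _, x, v_old, _ = rec
                db[x] = v_old                  # restore the old value

    def redo(ti, log, db):
        for rec in log.records:                # forward from the first record
            if rec[0] == 'update' and rec[1] == ti:
                _, _, x, _, v_new = rec
                db[x] = v_new                  # install the new value

On the example above, undo(T1) walks the log backwards restoring C to 700, while redo(T0) walks forward setting A to 950 and B to 2050.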
• 1085. Checkpoints  Problems in recovery procedure as discussed earlier: 1. searching the entire log is time-consuming 2. we might unnecessarily redo transactions which have already output their updates to the database  Streamline recovery procedure by periodically performing checkpointing 1. Output all log records currently residing in main memory onto stable storage. 2. Output all modified buffer blocks to the disk. 3. Write a log record <checkpoint> onto stable storage.
• 1086. Checkpoints (Cont.)  During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti. 1. Scan backwards from end of log to find the most recent <checkpoint> record 2. Continue scanning backwards till a record <Ti start> is found. 3. Need only consider the part of log following above start record. Earlier part of log can be ignored during recovery, and can be erased whenever desired. 4. For all transactions (starting from Ti or later) perform redo(Ti) if the log contains <Ti commit>, and (under immediate modification) undo(Ti) otherwise
• 1087. Example of Checkpoints (figure: timeline with checkpoint time Tc, failure time Tf, and transactions T1–T4)  T1 can be ignored (updates already output to disk due to checkpoint)
• 1088. Shadow Paging  Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions execute serially  Idea: maintain two page tables during the lifetime of a transaction – the current page table, and the shadow page table  Store the shadow page table in nonvolatile storage, such that state of the database prior to transaction execution may be recovered.  Shadow page table is never modified during execution  To start with, both the page tables are identical. Only the current page table is used for data item accesses during execution of the transaction.
• 1090. Shadow Paging: Example of shadow and current page tables after write to page 4
• 1091. Shadow Paging (Cont.)  To commit a transaction 1. Flush all modified pages in main memory to disk 2. Output current page table to disk 3. Make the current page table the new shadow page table, as follows:  keep a pointer to the shadow page table at a fixed (known) location on disk.  to make the current page table the new shadow page table, simply update the pointer to point to current page table on disk  Once pointer to shadow page table has been written, transaction is committed.  No recovery is needed after a crash — new transactions can start right away, using the shadow page table.  Pages not pointed to from current/shadow page table should be freed (garbage collected).
• 1092. Shadow Paging (Cont.)  Advantages of shadow-paging over log-based schemes  no overhead of writing log records  recovery is trivial  Disadvantages:  Copying the entire page table is very expensive  Can be reduced by using a page table structured like a B+-tree  No need to copy entire tree, only need to copy paths in the tree that lead to updated leaf nodes  Commit overhead is high even with above extension  Need to flush every updated page, and the page table
  • 1093. Recovery With Concurrent Transactions  We modify the log-based recovery schemes to allow multiple transactions to execute concurrently.  All transactions share a single disk buffer and a single log  A buffer block can have data items updated by one or more transactions  We assume concurrency control using strict two-phase locking;  i.e. the updates of uncommitted transactions should not be visible to other transactions  Otherwise how to perform undo if T1 updates A, then T2 updates A and commits, and finally T1 has to abort?  Logging is done as described earlier.
• 1094. Recovery With Concurrent Transactions (Cont.)  Checkpoints are performed as before, except that the checkpoint log record is now of the form <checkpoint L> where L is the list of transactions active at the time of the checkpoint  We assume no updates are in progress while the checkpoint is carried out (will relax this later)  When the system recovers from a crash, it first does the following: 1. Initialize undo-list and redo-list to empty 2. Scan the log backwards from the end, stopping when the first <checkpoint L> record is found. For each record found during the scan:  if the record is <Ti commit>, add Ti to redo-list  if the record is <Ti start> and Ti is not in redo-list, add Ti to undo-list 3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list
  • 1095. Recovery With Concurrent Transactions (Cont.)  At this point undo-list consists of incomplete transactions which must be undone, and redo-list consists of finished transactions that must be redone.  Recovery now continues as follows: 1. Scan log backwards from most recent record, stopping when <Ti start> records have been encountered for every Ti in undo-list.  During the scan, perform undo for each log record that belongs to a transaction in undo-list.
  • 1096. Example of Recovery  Go over the steps of the recovery algorithm on the following log: <T0 start> <T0, A, 0, 10> <T0 commit> <T1 start> <T1, B, 0, 10> <T2 start> /* Scan in Step 4 stops here */ <T2, C, 0, 10> <T2, C, 10, 20> <checkpoint {T1, T2}> <T3 start> <T3, A, 10, 20> <T3, D, 0, 10> <T3 commit>
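A sketch of the undo-list / redo-list construction on a log like the one above; records use the tuple layout of the earlier sketches, with ('checkpoint', L) carrying the active-transaction list L. The backward scan of the slides is expressed here as a forward scan from the last checkpoint, which computes the same sets.

    def build_lists(records):
        # index of the most recent <checkpoint L> record
        cp = max(i for i, r in enumerate(records) if r[0] == 'checkpoint')
        undo_list = list(records[cp][1])       # start from L
        redo_list = []
        for rec in records[cp + 1:]:           # records after the checkpoint
            if rec[0] == 'start':
                undo_list.append(rec[1])
            elif rec[0] == 'commit':
                undo_list.remove(rec[1])
                redo_list.append(rec[1])
        return undo_list, redo_list

On the example log this yields undo-list = [T1, T2] and redo-list = [T3]; T0 committed before the checkpoint, so it needs no further attention.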
• 1097. Log Record Buffering  Log record buffering: log records are buffered in main memory, instead of being output directly to stable storage.  Log records are output to stable storage when a block of log records in the buffer is full, or a log force operation is executed.  Log force is performed to commit a transaction by forcing all its log records (including the commit record) to stable storage.  Several log records can thus be output using a single output operation, reducing the I/O cost.
  • 1098. Log Record Buffering (Cont.)  The rules below must be followed if log records are buffered:  Log records are output to stable storage in the order in which they are created.  Transaction Ti enters the commit state only when the log record <Ti commit> has been output to stable storage.  Before a block of data in main memory is output to the database, all log records pertaining to data in that block must have been output to stable storage.
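A sketch of how the third rule (write-ahead logging) can be enforced at the point where a buffer block is written out; log_buffer, stable_log, and disk are illustrative stand-ins.

    def output_block(block_id, data, log_buffer, stable_log, disk):
        # WAL: flush every buffered log record created so far, so that all
        # records pertaining to data in this block reach stable storage first
        stable_log.extend(log_buffer)
        log_buffer.clear()
        disk.write(block_id, data)             # only now output the block

Flushing the whole buffer is coarser than strictly necessary (only records up to the block's last update are required), but it is simple and automatically preserves rule 1, the creation order of log records.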
• 1099. Database Buffering  Database maintains an in-memory buffer of data blocks  When a new block is needed, if buffer is full an existing block needs to be removed from buffer  If the block chosen for removal has been updated, it must be output to disk  As a result of the write-ahead logging rule, if a block with uncommitted updates is output to disk, log records with undo information for the updates are output to the log on stable storage first.  No updates should be in progress on a block when it is output to disk. Can be ensured as follows.  Before writing a data item, transaction acquires exclusive lock on block containing the data item
• 1100. Buffer Management (Cont.)  Database buffer can be implemented either  in an area of real main-memory reserved for the database, or  in virtual memory  Implementing buffer in reserved main-memory has drawbacks:  Memory is partitioned before-hand between database buffer and applications, limiting flexibility.  Needs may change, and although operating system knows best how memory should be divided up at any time, it cannot change the partitioning of memory.
  • 1101. Buffer Management (Cont.)  Database buffers are generally implemented in virtual memory in spite of some drawbacks:  When operating system needs to evict a page that has been modified, to make space for another page, the page is written to swap space on disk.  When database decides to write buffer page to disk, buffer page may be in swap space, and may have to be read from swap space on disk and output to the database on disk, resulting in extra I/O!  Known as dual paging problem.
• 1102. Failure with Loss of Nonvolatile Storage  So far we assumed no loss of non-volatile storage  Technique similar to checkpointing used to deal with loss of non-volatile storage  Periodically dump the entire content of the database to stable storage  No transaction may be active during the dump procedure; a procedure similar to checkpointing must take place  Output all log records currently residing in main memory onto stable storage.  Output all buffer blocks onto the disk.  Copy the contents of the database to stable storage.  Output a record <dump> to log on stable storage.
• 1104. Advanced Recovery Techniques  Support high-concurrency locking techniques, such as those used for B+-tree concurrency control  Operations like B+-tree insertions and deletions release locks early.  They cannot be undone by restoring old values (physical undo), since once a lock is released, other transactions may have updated the B+-tree.  Instead, insertions (resp. deletions) are undone by executing a deletion (resp. insertion) operation (known as logical undo).  For such operations, undo log records must contain the logical undo information rather than the old values
• 1105. Advanced Recovery Techniques (Cont.)  Operation logging is done as follows: 1. When operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique identifier of the operation instance. 2. While operation is executing, normal log records with physical redo and physical undo information are logged. 3. When operation completes, <Ti, Oj, operation-end, U> is logged, where U contains information needed to perform a logical undo.  If crash/rollback occurs before operation completes:  the operation-end log record is not found, and the physical undo information is used to undo the operation
• 1106. Advanced Recovery Techniques (Cont.)  Rollback of transaction Ti is done as follows:  Scan the log backwards 1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a special redo-only log record <Ti, X, V1>. 2. If a <Ti, Oj, operation-end, U> record is found  Rollback the operation logically using the undo information U.  Updates performed during rollback are logged just like during normal operation execution.
• 1107. Advanced Recovery Techniques (Cont.)  Scan the log backwards (cont.): 3. If a redo-only record is found, ignore it 4. If a <Ti, Oj, operation-abort> record is found:  skip all preceding log records for Ti until the record <Ti, Oj, operation-begin> is found. 5. Stop the scan when the record <Ti, start> is found 6. Add a <Ti, abort> record to the log  Some points to note:  Cases 3 and 4 above can occur only if the database crashes while a transaction is being rolled back
• 1108. Advanced Recovery Techniques (Cont.)  The following actions are taken when recovering from system crash 1. Scan log forward from last <checkpoint L> record 1. Repeat history by physically redoing all updates of all transactions, 2. Create an undo-list during the scan as follows  undo-list is set to L initially  Whenever <Ti start> is found, Ti is added to undo-list  Whenever <Ti commit> or <Ti abort> is found, Ti is deleted from undo-list
• 1109. Advanced Recovery Techniques (Cont.)  Recovery from system crash (cont.) 2. Scan log backwards, performing undo on log records of transactions found in undo-list.  Transactions are rolled back as described earlier.  When <Ti start> is found for a transaction Ti in undo-list, write a <Ti abort> log record.  Stop scan when <Ti start> records have been found for all Ti in undo-list  This undoes the effects of incomplete transactions
• 1110. Advanced Recovery Techniques (Cont.)  Checkpointing is done as follows: 1. Output all log records in memory to stable storage 2. Output to disk all modified buffer blocks 3. Output to log on stable storage a <checkpoint L> record. Transactions are not allowed to perform any actions while checkpointing is in progress.  Fuzzy checkpointing allows transactions to progress while the most time-consuming parts of checkpointing are in progress
• 1111. Advanced Recovery Techniques (Cont.)  Fuzzy checkpointing is done as follows: 1. Temporarily stop all updates by transactions 2. Write a <checkpoint L> log record and force log to stable storage 3. Note list M of modified buffer blocks 4. Now permit transactions to proceed with their actions 5. Output to disk all modified buffer blocks in list M  blocks should not be updated while being output  Follow WAL: all log records pertaining to a block must be output before the block is output
• 1113. ARIES  ARIES is a state of the art recovery method  Incorporates numerous optimizations to reduce overheads during normal processing and to speed up recovery  The "advanced recovery algorithm" we studied earlier is modeled after ARIES, but greatly simplified by removing optimizations  Unlike the advanced recovery algorithm, ARIES 1. Uses a log sequence number (LSN) to identify log records, and stores LSNs in pages to identify which updates have already been applied to a database page 2. Uses physiological redo 3. Uses a dirty page table to avoid unnecessary redos 4. Uses fuzzy checkpointing that does not require dirty pages to be written out at checkpoint time
• 1114. ARIES Optimizations  Physiological redo  Affected page is physically identified, action within page can be logical  Used to reduce logging overheads  e.g. when a record is deleted and all other records have to be moved to fill hole  Physiological redo can log just the record deletion  Physical redo would require logging of old and new values for much of the page  Requires page to be output to disk atomically  Easy to achieve with hardware RAID, also supported by some disk systems  Incomplete page output can be detected by checksum techniques
• 1115. ARIES Data Structures  Log sequence number (LSN) identifies each log record  Must be sequentially increasing  Typically an offset from beginning of log file to allow fast access  Easily extended to handle multiple log files  Each page contains a PageLSN which is the LSN of the last log record whose effects are reflected on the page  To update a page:  X-latch the page, and write the log record  Update the page  Record the LSN of the log record in PageLSN  Unlock page
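A sketch of this page-update protocol in Python; the Page, transaction, and log objects are illustrative assumptions, not ARIES internals.

    def aries_update(page, txn, redo_info, undo_info, log):
        page.latch_exclusive()            # X-latch the page
        lsn = log.append(txn.id, txn.last_lsn, redo_info, undo_info)
        txn.last_lsn = lsn                # maintains the PrevLSN chain
        page.apply(redo_info)             # perform the update
        page.page_lsn = lsn               # record the LSN in PageLSN
        page.unlatch()

Recording the LSN in PageLSN is what later lets the redo pass decide, by comparison alone, whether this update is already reflected on the page.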
• 1116. ARIES Data Structures (Cont.)  Each log record contains LSN of previous log record of the same transaction LSN TransId PrevLSN RedoInfo UndoInfo  LSN in log record may be implicit  Special redo-only log record called compensation log record (CLR) used to log actions taken during recovery that never need to be undone LSN TransID UndoNextLSN RedoInfo  Also serves the role of the operation-abort log records used in the advanced recovery algorithm
• 1117. ARIES Data Structures (Cont.)  DirtyPageTable  List of pages in the buffer that have been updated  Contains, for each such page  PageLSN of the page  RecLSN is an LSN such that log records before this LSN have already been applied to the page version on disk  Set to current end of log when a page is inserted into dirty page table (just before being updated)  Recorded in checkpoints, helps to minimize redo work  Checkpoint log record  Contains: DirtyPageTable and list of active transactions; for each active transaction, LastLSN, the LSN of the last log record written by the transaction  A fixed position on disk notes the LSN of the last completed checkpoint log record
• 1118. ARIES Recovery Algorithm  ARIES recovery involves three passes  Analysis pass: Determines  Which transactions to undo  Which pages were dirty (disk version not up to date) at time of crash  RedoLSN: LSN from which redo should start  Redo pass:  Repeats history, redoing all actions from RedoLSN  RecLSN and PageLSNs are used to avoid redoing actions already reflected on page  Undo pass:  Rolls back all incomplete transactions  Transactions whose abort was complete earlier are not undone
• 1119. ARIES Recovery: Analysis  Analysis pass  Starts from last complete checkpoint log record  Reads in DirtyPageTable from log record  Sets RedoLSN = min of RecLSNs of all pages in DirtyPageTable  In case no pages are dirty, RedoLSN = checkpoint record's LSN  Sets undo-list = list of transactions in checkpoint log record  Reads LSN of last log record for each transaction in undo-list from the checkpoint log record
  • 1120. ARIES Recovery: Analysis (Cont.) Analysis pass (cont.)  Scans forward from checkpoint  If any log record found for transaction not in undo-list, adds transaction to undo-list  Whenever an update log record is found  If page is not in DirtyPageTable, it is added with RecLSN set to LSN of the update log record  If transaction end log record found, delete transaction from undo-list  Keeps track of last log record for each transaction in undo-list  May be needed for later undo
• 1121. ARIES Redo Pass  Redo Pass: Repeats history by replaying every action not already reflected in the page on disk, as follows:  Scans forward from RedoLSN. Whenever an update log record is found: 1. If the page is not in DirtyPageTable or the LSN of the log record is less than the RecLSN of the page in DirtyPageTable, then skip the log record 2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from disk is less than the LSN of the log record, redo the log record  NOTE: if either test is negative, the effects of the log record have already appeared on the page
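A sketch of the redo pass, with update records assumed to carry lsn, page_id, and a redo action, and dirty_page_table mapping page_id to RecLSN. All names are illustrative.

    def redo_pass(log, redo_lsn, dirty_page_table, fetch_page):
        for rec in log.records_from(redo_lsn):     # forward scan
            if rec.kind != 'update':
                continue
            rec_lsn = dirty_page_table.get(rec.page_id)
            if rec_lsn is None or rec.lsn < rec_lsn:
                continue                           # test 1: skip the record
            page = fetch_page(rec.page_id)         # test 2: check PageLSN
            if page.page_lsn < rec.lsn:
                page.apply(rec.redo)               # not yet on page: redo it
                page.page_lsn = rec.lsn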
• 1122. ARIES Undo Actions  When an undo is performed for an update log record:  Generate a CLR containing the undo action performed (actions performed during undo are logged physically or physiologically)  CLR for record n is noted as n' in the figure  Set UndoNextLSN of the CLR to the PrevLSN value of the update log record (arrows in the figure indicate UndoNextLSN values)  ARIES supports partial rollback  Used e.g. to handle deadlocks by rolling back just enough to release required locks  (figure: log records 1–6 and CLRs 6', 5', 4', 3', 2', 1'; forward actions resume after partial rollbacks of records 3 and 4 initially, later 5 and 6, then full rollback)
• 1123. ARIES: Undo Pass  Undo pass  Performs backward scan on log, undoing all transactions in undo-list  Backward scan optimized by skipping unneeded log records as follows:  Next LSN to be undone for each transaction set to LSN of last log record for transaction found by analysis pass.  At each step pick largest of these LSNs to undo, skip back to it and undo it  After undoing a log record  For ordinary log records, set next LSN to be undone for transaction to PrevLSN noted in the log record  For compensation log records (CLRs), set it to the UndoNextLSN noted in the record; records in between are skipped since they have already been undone
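A sketch of the undo pass with this skipping optimization. next_undo maps each transaction in undo-list to the LSN it must undo next (initialized by the analysis pass); write_clr stands in for performing the undo and logging the CLR.

    def undo_pass(log, next_undo, write_clr):
        while next_undo:
            txn = max(next_undo, key=next_undo.get)  # largest LSN to undo
            rec = log.fetch(next_undo[txn])
            if rec.kind == 'update':
                write_clr(rec)                       # undo it, logging a CLR
                nxt = rec.prev_lsn
            else:                                    # a CLR: skip the records
                nxt = rec.undo_next_lsn              # it already undid
            if nxt is None:                          # reached <Ti start>:
                del next_undo[txn]                   # write <Ti abort> here
            else:
                next_undo[txn] = nxt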
  • 1124. Other ARIES Features  Recovery Independence  Pages can be recovered independently of others  E.g. if some disk pages fail they can be recovered from a backup while other pages are being used  Savepoints:  Transactions can record savepoints and roll back to a savepoint  Useful for complex transactions  Also used to rollback just enough to release locks on deadlock
• 1125. Other ARIES Features (Cont.)  Fine-grained locking:  Index concurrency algorithms that permit tuple level locking on indices can be used  These require logical undo, rather than physical undo, as in advanced recovery algorithm  Recovery optimizations: For example:  Dirty page table can be used to prefetch pages during redo  Out of order redo is possible:  redo can be postponed on a page being fetched from disk, and performed once the page has been fetched
  • 1127. Remote Backup Systems  Remote backup systems provide high availability by allowing transaction processing to continue even if the primary site is destroyed.
• 1128. Remote Backup Systems (Cont.)  Detection of failure: Backup site must detect when primary site has failed  to distinguish primary site failure from link failure, maintain several communication links between the primary and the remote backup.  Transfer of control:  To take over control, the backup site first performs recovery using its copy of the database and all the log records it has received from the primary.  Thus, completed transactions are redone and incomplete transactions are rolled back.  When the backup site takes over processing, it becomes the new primary.
• 1129. Remote Backup Systems (Cont.)  Time to recover: To reduce delay in takeover, backup site periodically processes the redo log records (in effect, performing recovery from previous database state), performs a checkpoint, and can then delete earlier parts of the log.  Hot-Spare configuration permits very fast takeover:  Backup continually processes redo log records as they arrive, applying the updates locally.
• 1130. Remote Backup Systems (Cont.)  Ensure durability of updates by delaying transaction commit until update is logged at backup; avoid this delay by permitting lower degrees of durability.  One-safe: commit as soon as transaction's commit log record is written at primary  Problem: updates may not arrive at backup before it takes over.  Two-very-safe: commit when transaction's commit log record is written at primary and backup  Reduces availability, since transactions cannot commit if either site fails.
  • 1133. Portion of the Database Log Corresponding to T0 and T1
  • 1134. State of the Log and Database Corresponding to T0 and T1
  • 1135. Portion of the System Log Corresponding to T0 and T1
  • 1136. State of System Log and Database Corresponding to T0 and T1
  • 1137. Chapter 18: Database System Architectures  Centralized Systems  Client--Server Systems  Parallel Systems  Distributed Systems  Network Types
• 1138. Centralized Systems  Run on a single computer system and do not interact with other computer systems.  General-purpose computer system: one to a few CPUs and a number of device controllers that are connected through a common bus that provides access to shared memory.  Single-user system (e.g., personal computer or workstation): desk-top unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user.  Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. Serve a large number of users who are connected to the system via terminals. Often called server systems.
  • 1140. Client-Server Systems  Server systems satisfy requests generated at m client systems, whose general structure is shown below:
  • 1141. Client-Server Systems (Cont.)  Database functionality can be divided into:  Back-end: manages access structures, query evaluation and optimization, concurrency control and recovery.  Front-end: consists of tools such as forms, report-writers, and graphical user interface facilities.  The interface between the front-end and the back-end is through SQL or through an application program interface.
• 1142. Client-Server Systems (Cont.)  Advantages of replacing mainframes with networks of workstations or personal computers connected to back-end server machines:  better functionality for the cost  flexibility in locating resources and expanding facilities  better user interfaces  easier maintenance  Server systems can be broadly categorized into two kinds: transaction servers and data servers
  • 1143. Transaction Servers  Also called query server systems or SQL server systems; clients send requests to the server system where the transactions are executed, and results are shipped back to the client.  Requests specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism.  Transactional RPC allows many RPC calls to collectively form a transaction.
  • 1144. Transaction Server Process Structure  A typical transaction server consists of multiple processes accessing data in shared memory.  Server processes  These receive user queries (transactions), execute them and send results back  Processes may be multithreaded, allowing a single process to execute several user queries concurrently  Typically multiple multithreaded server processes
  • 1145. Transaction Server Processes (Cont.)  Log writer process  Server processes simply add log records to log record buffer  Log writer process outputs log records to stable storage.  Checkpoint process  Performs periodic checkpoints  Process monitor process  Monitors other processes, and takes recovery actions if any of the other processes fail  E.g. aborting any transactions being executed
• 1147. Transaction System Processes (Cont.)  Shared memory contains shared data  Buffer pool  Lock table  Log buffer  Cached query plans (reused if same query submitted again)  All database processes can access shared memory  To ensure that no two processes are accessing the same data structure at the same time, mutual exclusion is implemented using either operating system semaphores or atomic instructions such as test-and-set
• 1148. Transaction System Processes (Cont.)  To avoid overhead of interprocess communication for lock request/grant, each database process operates directly on the lock table data structure (Section 16.1.4) instead of sending requests to lock manager process  Mutual exclusion ensured on the lock table using semaphores, or more commonly, atomic instructions  If a lock can be obtained, the lock table is updated directly in shared memory  If a lock cannot be immediately obtained, a lock request is noted in the lock table and the process (or thread) then waits for lock to be granted  When a lock is released, releasing process updates lock table to record release of lock, as well as the grant of the lock to waiting requests (if any)
  • 1149. Data Servers  Used in LANs, where there is a very high speed connection between the clients and the server, the client machines are comparable in processing power to the server machine, and the tasks to be executed are compute intensive.  Ship data to client machines where processing is performed, and then ship results back to the server machine.
• 1150. Data Servers (Cont.)  Page-Shipping versus Item-Shipping  Smaller unit of shipping ⇒ more messages  Worth prefetching related items along with requested item  Page shipping can be thought of as a form of prefetching  Locking  Overhead of requesting and getting locks from server is high due to message delays  Can grant locks on requested and prefetched items; with page shipping, transaction is granted lock on whole page.  Locks on a prefetched item can be called back by the server, and returned by the client if the item has not been used.
  • 1151. Data Servers (Cont.)  Data Caching  Data can be cached at client even in between transactions  But check that data is up-to-date before it is used (cache coherency)  Check can be done when requesting lock on data item  Lock Caching  Locks can be retained by client system even in between transactions  Transactions can acquire cached locks locally, without contacting server
• 1152. Parallel Systems  Parallel database systems consist of multiple processors and multiple disks connected by a fast interconnection network.  A coarse-grain parallel machine consists of a small number of powerful processors  A massively parallel or fine grain parallel machine utilizes thousands of smaller processors.  Two main performance measures: throughput (the number of tasks that can be completed in a given time interval) and response time (the amount of time it takes to complete a single task)
• 1153. Speed-Up and Scale-Up  Speedup: a fixed-sized problem executing on a small system is given to a system which is N-times larger.  Measured by: speedup = (small system elapsed time) / (large system elapsed time)  Speedup is linear if the ratio equals N.  Scaleup: increase the size of both the problem and the system  N-times larger system used to perform N-times larger job
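For reference, the two measures can be written compactly. Writing T_small(Q) for the elapsed time of the small system on problem Q, T_N for the N-times larger system, and Q_N for the N-times larger problem (notation introduced here for convenience, not from the slides):

    \[
      \mathrm{speedup}(N) = \frac{T_{\mathrm{small}}(Q)}{T_{N}(Q)}, \qquad
      \mathrm{scaleup}(N) = \frac{T_{\mathrm{small}}(Q)}{T_{N}(Q_{N})}
    \]

Speedup is linear when speedup(N) = N; scaleup is linear when scaleup(N) = 1.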
• 1154. Speedup (figure)
• 1155. Scaleup (figure)
• 1156. Batch and Transaction Scaleup  Batch scaleup:  A single large job; typical of most database queries and scientific simulation.  Use an N-times larger computer on N-times larger problem.  Transaction scaleup:  Numerous small queries submitted by independent users to a shared database; typical of transaction processing and timesharing systems.  N-times as many users submitting requests (hence, N-times as many requests) to an N-times larger database, on an N-times larger computer.
• 1157. Factors Limiting Speedup and Scaleup  Speedup and scaleup are often sublinear due to:  Startup costs: Cost of starting up multiple processes may dominate computation time, if the degree of parallelism is high.  Interference: Processes accessing shared resources (e.g., system bus, disks, or locks) compete with each other, thus spending time waiting on other processes, rather than performing useful work.
• 1158. Interconnection Network Architectures  Bus. System components send data on and receive data from a single communication bus;  Does not scale well with increasing parallelism.  Mesh. Components are arranged as nodes in a grid, and each component is connected to all adjacent components  Communication links grow with growing number of components, and so scales better.  But may require up to 2√n hops to send a message between components at opposite corners of the grid
  • 1160. Parallel Database Architectures  Shared memory -- processors share a common memory  Shared disk -- processors share a common disk  Shared nothing -- processors share neither a common memory nor common disk  Hierarchical -- hybrid of the above architectures
• 1162. Shared Memory  Processors and disks have access to a common memory, typically via a bus or through an interconnection network.  Extremely efficient communication between processors — data in shared memory can be accessed by any processor without having to move it using software.  Downside – architecture is not scalable beyond 32 or 64 processors, since the bus or the interconnection network becomes a bottleneck
• 1163. Shared Disk  All processors can directly access all disks via an interconnection network, but the processors have private memories.  The memory bus is not a bottleneck  Architecture provides a degree of fault-tolerance — if a processor fails, the other processors can take over its tasks since the database is resident on disks that are accessible from all processors.  Examples: IBM Sysplex and DEC clusters (now part of Compaq) running Rdb were early commercial users of this architecture
• 1164. Shared Nothing  Node consists of a processor, memory, and one or more disks. Processors at one node communicate with another processor at another node using an interconnection network. A node functions as the server for the data on the disk or disks the node owns.  Examples: Teradata, Tandem, Oracle nCUBE  Data accessed from local disks (and local memory accesses) do not pass through the interconnection network
  • 1165. Hierarchical  Combines characteristics of shared- memory, shared-disk, and shared- nothing architectures.  Top level is a shared-nothing architecture – nodes connected by an interconnection network, and do not share disks or memory with each other.  Each node of the system could be a shared-memory system with a few processors.
• 1166. Distributed Systems  Data spread over multiple machines (also referred to as sites or nodes)  Network interconnects the machines  Data shared by users on multiple machines
• 1167. Distributed Databases  Homogeneous distributed databases  Same software/schema on all sites, data may be partitioned among sites  Goal: provide a view of a single database, hiding details of distribution  Heterogeneous distributed databases  Different software/schema on different sites  Goal: integrate existing databases to provide useful functionality  Differentiate between local transactions (which access data only at the initiating site) and global transactions (which access data at one or more other sites)
  • 1168. Trade-offs in Distributed  Systems Sharing data – users at one site able to access the data residing at some other sites.  Autonomy – each site is able to retain a degree of control over data stored locally.  Higher system availability through redundancy — data can be replicated at remote sites, and system can function even if a site fails.  Disadvantage: added complexity
• 1169. Implementation Issues for Distributed Databases  Atomicity needed even for transactions that update data at multiple sites  A transaction cannot be committed at one site and aborted at another  The two-phase commit protocol (2PC) is used to ensure atomicity  Basic idea: each site executes the transaction till just before commit, and then leaves the final decision to a coordinator  Each site must follow the decision of the coordinator, even if there is a failure while waiting for the coordinator's decision  To do so, updates of the transaction are logged to stable storage and the transaction is recorded as waiting
• 1170. Network Types  Local-area networks (LANs) – composed of processors that are distributed over small geographical areas, such as a single building or a few adjacent buildings.  Wide-area networks (WANs) – composed of processors distributed over a large geographical area.  Discontinuous connection – WANs, such as those based on periodic dial-up (using, e.g., UUCP), that are connected only for part of the time
• 1171. Networks Types (Cont.)  WANs with continuous connection are needed for implementing distributed database systems  Groupware applications such as Lotus Notes can work on WANs with discontinuous connection:  Data is replicated.  Updates are propagated to replicas periodically.  No global locking is possible, and copies of data may be independently updated.  Non-serializable executions can thus result.
• 1176. Chapter 19: Distributed Databases  Heterogeneous and Homogeneous Databases  Distributed Data Storage  Distributed Transactions  Commit Protocols  Concurrency Control in Distributed Databases  Availability  Distributed Query Processing  Heterogeneous Distributed Databases  Directory Systems
• 1177. Distributed Database System  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that run on each site are independent of each other  Transactions may access data at one or more sites
• 1178. Homogeneous Distributed Databases  In a homogeneous distributed database  All sites have identical software  Are aware of each other and agree to cooperate in processing user requests.  Each site surrenders part of its autonomy in terms of right to change schemas or software  Appears to user as a single system  In a heterogeneous distributed database  Different sites may use different schemas and software
• 1179. Distributed Data Storage  Assume relational data model  Replication  System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance.  Fragmentation  Relation is partitioned into several fragments stored in distinct sites  Replication and fragmentation can be combined  Relation is partitioned into several fragments; the system maintains several replicas of each fragment
• 1180. Data Replication  A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites.  Full replication of a relation is the case where the relation is stored at all sites.  Fully redundant databases are those in which every site contains a copy of the entire database.
• 1181. Data Replication (Cont.)  Advantages of Replication  Availability: failure of site containing relation r does not result in unavailability of r if replicas exist.  Parallelism: queries on r may be processed by several nodes in parallel.  Reduced data transfer: relation r is available locally at each site containing a replica of r.  Disadvantages of Replication  Increased cost of updates: each replica of relation r must be updated.  Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented
• 1182. Data Fragmentation  Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r.  Horizontal fragmentation: each tuple of r is assigned to one or more fragments  Vertical fragmentation: the schema for relation r is split into several smaller schemas  All schemas must contain a common candidate key (or superkey) to ensure a lossless-join decomposition
• 1183. Horizontal Fragmentation of account Relation

    account1 = σ branch-name="Hillside" (account)

    branch-name   account-number   balance
    Hillside      A-305            500
    Hillside      A-226            336
    Hillside      A-155            62

    account2 = σ branch-name="Valleyview" (account)

    branch-name   account-number   balance
    Valleyview    A-177            205
    Valleyview    A-402            10000
    Valleyview    A-408            1123
    Valleyview    A-639            750
• 1184. Vertical Fragmentation of employee-info Relation

    deposit1 = Π branch-name, customer-name, tuple-id (employee-info)

    branch-name   customer-name   tuple-id
    Hillside      Lowman          1
    Hillside      Camp            2
    Valleyview    Camp            3
    Valleyview    Kahn            4
    Hillside      Kahn            5
    Valleyview    Kahn            6
    Valleyview    Green           7

    deposit2 = Π account-number, balance, tuple-id (employee-info)

    account-number   balance   tuple-id
    A-305            500       1
    A-226            336       2
    A-177            205       3
    A-402            10000     4
    A-155            62        5
    A-408            1123      6
    A-639            750       7
• 1185. Advantages of Fragmentation  Horizontal:  allows parallel processing on fragments of a relation  allows a relation to be split so that tuples are located where they are most frequently accessed  Vertical:  allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed  tuple-id attribute allows efficient joining of vertical fragments
• 1186. Data Transparency  Data transparency: Degree to which system user may remain unaware of the details of how and where the data items are stored in a distributed system  Consider transparency issues in relation to:  Fragmentation transparency  Replication transparency  Location transparency
• 1187. Naming of Data Items - Criteria 1. Every data item must have a system-wide unique name. 2. It should be possible to find the location of data items efficiently. 3. It should be possible to change the location of data items transparently. 4. Each site should be able to create new data items autonomously.
• 1188. Centralized Scheme - Name Server  Structure:  name server assigns all names  each site maintains a record of local data items  sites ask name server to locate non-local data items  Advantages:  satisfies naming criteria 1-3  Disadvantages:  does not satisfy naming criterion 4  name server is a potential performance bottleneck
• 1189. Use of Aliases  Alternative to centralized scheme: each site prefixes its own site identifier to any name that it generates, e.g., site17.account.  Fulfills having a unique identifier, and avoids problems associated with central control.  However, fails to achieve network transparency.  Solution: Create a set of aliases for data items; store the mapping of aliases to the real names at each site.
• 1191. Distributed Transactions  Transaction may access data at several sites.  Each site has a local transaction manager responsible for:  Maintaining a log for recovery purposes  Participating in coordinating the concurrent execution of the transactions executing at that site.  Each site has a transaction coordinator, which is responsible for:  Starting the execution of transactions that originate at the site  Distributing subtransactions to appropriate sites for execution  Coordinating the termination of each transaction that originates at the site
• 1193. System Failure Modes  Failures unique to distributed systems:  Failure of a site.  Loss of messages  Handled by network transmission control protocols such as TCP/IP  Failure of a communication link  Handled by network protocols, by routing messages via alternative links  Network partition  A network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them  Note: a subsystem may consist of a single node
• 1194. Commit Protocols  Commit protocols are used to ensure atomicity across sites  a transaction which executes at multiple sites must either be committed at all the sites, or aborted at all the sites.  not acceptable to have a transaction committed at one site and aborted at another  The two-phase commit (2PC) protocol is widely used  The three-phase commit (3PC) protocol is more complicated and more expensive, and is rarely used in practice
• 1195. Two Phase Commit Protocol (2PC)  Assumes fail-stop model – failed sites simply stop working, and do not cause any other harm, such as sending incorrect messages to other sites.  Execution of the protocol is initiated by the coordinator after the last step of the transaction has been reached.  The protocol involves all the local sites at which the transaction executed  Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci
• 1196. Phase 1: Obtaining a Decision  Coordinator asks all participants to prepare to commit transaction Ti.  Ci adds the record <prepare T> to the log and forces log to stable storage  sends prepare T messages to all sites at which T executed  Upon receiving message, transaction manager at site determines if it can commit the transaction  if not, add a record <no T> to the log and send abort T message to Ci  if the transaction can be committed, then:  add the record <ready T> to the log  force all records onto stable storage  send ready T message to Ci
• 1197. Phase 2: Recording the Decision  T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted.  Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces record onto stable storage. Once the record reaches stable storage it is irrevocable (even if failures occur)  Coordinator sends a message to each participant informing it of the decision
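The two phases fit in a few lines; this Python sketch of the coordinator side assumes hypothetical site objects with prepare/inform methods and a log whose force call reaches stable storage before returning.

    def two_phase_commit(t, sites, log):
        log.force(('prepare', t))                 # <prepare T>, then ask all
        votes = [s.prepare(t) for s in sites]     # participants to prepare
        if all(v == 'ready' for v in votes):
            decision = 'commit'
        else:
            decision = 'abort'
        log.force((decision, t))                  # decision now irrevocable
        for s in sites:
            s.inform(t, decision)                 # phase 2 messages
        return decision

Note that the decision record is forced to the log before any participant is informed; that ordering is what makes the decision survive a coordinator crash.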
• 1198. Handling of Failures - Site Failure  When site Si recovers, it examines its log to determine the fate of transactions active at the time of the failure.  Log contains <commit T> record: site executes redo(T)  Log contains <abort T> record: site executes undo(T)  Log contains <ready T> record: site must consult Ci to determine the fate of T.
• 1199. Handling of Failures - Coordinator Failure  If coordinator fails while the commit protocol for T is executing, then participating sites must decide on T's fate: 1. If an active site contains a <commit T> record in its log, then T must be committed. 2. If an active site contains an <abort T> record in its log, then T must be aborted. 3. If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T. T can therefore be aborted.
• 1200. Handling of Failures - Network Partition  If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol.  If the coordinator and its participants belong to several partitions:  Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol to deal with failure of the coordinator.  No harm results, but sites may still have to wait for the decision from the coordinator.
• 1201. Recovery and Concurrency Control  In-doubt transactions have a <ready T> record, but neither a <commit T> nor an <abort T> log record.  The recovering site must determine the commit-abort status of such transactions by contacting other sites; this can be slow and can potentially block recovery.  Recovery algorithms can note lock information in the log.
• 1202. Three Phase Commit (3PC)  Assumptions:  No network partitioning  At any point, at least one site must be up.  At most K sites (participants as well as coordinator) can fail  Phase 1: Obtaining Preliminary Decision: Identical to 2PC Phase 1.  Every site is ready to commit if instructed to do so  Phase 2 of 2PC is split into 2 phases, Phase 2 and Phase 3 of 3PC  In phase 2 coordinator makes a decision as in 2PC (called the pre-commit decision) and records it in multiple (at least K) sites  In phase 3, coordinator sends commit/abort message to all participating sites  Under 3PC, knowledge of the pre-commit decision can be used to commit despite coordinator failure
• 1203. Alternative Models of Transaction Processing  Notion of a single transaction spanning multiple sites is inappropriate for many applications  E.g. transaction crossing an organizational boundary  No organization would like to permit an externally initiated transaction to block local transactions for an indeterminate period  Alternative models carry out transactions by sending messages  Code to handle messages must be carefully designed to ensure atomicity and durability
• 1204. Alternative Models (Cont.)  Motivating example: funds transfer between two banks  Two phase commit would have the potential to block updates on the accounts involved in funds transfer  Alternative solution:  Debit money from source account and send a message to other site  Site receives message and credits destination account  Messaging has long been used for distributed transactions (even before computers were invented!)  Atomicity issue: once the transaction sending a message commits, the message must be guaranteed to be delivered, and it must not be delivered if the sending transaction aborts
• 1205. Error Conditions with Persistent Messaging  Code to handle messages has to take care of variety of failure situations (even assuming guaranteed message delivery)  E.g. if destination account does not exist, failure message must be sent back to source site  When failure message is received from destination site, or destination site itself does not exist, money must be deposited back in source account  Problem if source account has been closed – get humans to take care of the problem
• 1206. Persistent Messaging and Workflows  Workflows provide a general model of transactional processing involving multiple sites and possibly human processing of certain steps  E.g. when a bank receives a loan application, it may need to  Contact external credit-checking agencies  Get approvals of one or more managers and then respond to the loan application  We study workflows in Chapter 24 (Section 24.2)
• 1207. Implementation of Persistent Messaging  Sending site protocol 1. Sending transaction writes message to a special relation messages-to-send. The message is also given a unique identifier.  Writing to this relation is treated as any other update, and is undone if the transaction aborts.  The message remains locked until the sending transaction commits 2. A message delivery process monitors the messages-to-send relation  When a new message is found, the message is sent to its destination  When an acknowledgment is received from a destination, the message is deleted from messages-to-send
• 1208. Implementation of Persistent Messaging (Cont.)  Receiving site protocol: When a message is received 1. it is written to a received-messages relation if it is not already present (the message id is used for this check). The transaction performing the write is committed 2. An acknowledgement (with message id) is then sent to the sending site.  There may be very long delays in message delivery coupled with repeated messages  Could result in processing of duplicate messages if we are not careful!  Option 1: messages are never deleted from received-messages  Option 2: messages are given timestamps; messages older than some cut-off are deleted
• 1210. Concurrency Control  Modify concurrency control schemes for use in distributed environment.  We assume that each site participates in the execution of a commit protocol to ensure global transaction atomicity.  We assume all replicas of any item are updated  Will see how to relax this in case of site failures later
• 1211. Single-Lock-Manager Approach  System maintains a single lock manager that resides in a single chosen site, say Si  When a transaction needs to lock a data item, it sends a lock request to Si and lock manager determines whether the lock can be granted immediately  If yes, lock manager sends a message to the site which initiated the request  If no, request is delayed until it can be granted, at which time a message is sent to the initiating site
• 1212. Single-Lock-Manager Approach (Cont.)  The transaction can read the data item from any one of the sites at which a replica of the data item resides.  Writes must be performed on all replicas of a data item  Advantages of scheme:  Simple implementation  Simple deadlock handling  Disadvantages of scheme are:  Bottleneck: lock manager site becomes a bottleneck
• 1213. Distributed Lock Manager  In this approach, functionality of locking is implemented by lock managers at each site  Lock managers control access to local data items  But special protocols may be used for replicas  Advantage: work is distributed and can be made robust to failures  Disadvantage: deadlock detection is more complicated  Lock managers cooperate for deadlock detection
• 1214. Primary Copy  Choose one replica of data item to be the primary copy.  Site containing the replica is called the primary site for that data item  Different data items can have different primary sites  When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q.  Implicitly gets lock on all replicas of the data item
• 1215. Majority Protocol  Local lock manager at each site administers lock and unlock requests for data items stored at that site.  When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager.  If Q is locked in an incompatible mode, then the request is delayed until it can be granted.  When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.
• 1216. Majority Protocol (Cont.)  In case of replicated data  If Q is replicated at n sites, then a lock request message must be sent to more than half of the n sites in which Q is stored.  The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q.  When writing the data item, transaction performs writes on all replicas.  Benefit  Can be used even when some sites are unavailable
• 1217. Biased Protocol  Local lock manager at each site as in majority protocol, however, requests for shared locks are handled differently than requests for exclusive locks.  Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q.  Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q from the lock managers at all sites containing a replica of Q.
• 1218. Quorum Consensus Protocol  A generalization of both majority and biased protocols  Each site is assigned a weight.  Let S be the total of all site weights  Choose two values read quorum Qr and write quorum Qw  Such that Qr + Qw > S and 2 * Qw > S  Quorums can be chosen (and S computed) separately for each item
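A small sketch makes the conditions checkable: with weights summing to S, Qr + Qw > S forces every read quorum to overlap every write quorum, and 2 * Qw > S forces any two write quorums to overlap.

    def valid_quorums(weights, qr, qw):
        s = sum(weights)
        return qr + qw > s and 2 * qw > s

    # Four sites of weight 1 (S = 4): Qr = 2, Qw = 3 is a valid choice,
    # and Qr = 1, Qw = 4 reproduces read-one-write-all behaviour.
    print(valid_quorums([1, 1, 1, 1], 2, 3))   # True
    print(valid_quorums([1, 1, 1, 1], 1, 4))   # True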
• 1219. Deadlock Handling  Consider the following two transactions and history, with item X and transaction T1 at site 1, and item Y and transaction T2 at site 2:  T1: write(X); write(Y)  T2: write(Y); write(X)  At site 1: T1 obtains an X-lock on X and writes X; T2 then waits for the X-lock on X.  At site 2: T2 obtains an X-lock on Y and writes Y; T1 then waits for the X-lock on Y.  Result: deadlock which cannot be detected locally at either site
• 1220. Centralized Approach  A global wait-for graph is constructed and maintained in a single site; the deadlock-detection coordinator  Real graph: Real, but unknown, state of the system.  Constructed graph: Approximation generated by the controller during the execution of its algorithm.  The global wait-for graph can be constructed when:  a new edge is inserted in or removed from one of the local wait-for graphs.
• 1221. Local and Global Wait-For Graphs (figure: local wait-for graphs at individual sites and the combined global wait-for graph)
• 1222. Example Wait-For Graph for False Cycles (figure: initial state)
• 1223. False Cycles (Cont.)  Suppose that starting from the state shown in the figure, 1. T2 releases resources at S1  resulting in a remove T1 → T2 message from the transaction manager at site S1 to the coordinator 2. And then T2 requests a resource held by T3 at site S2  resulting in an insert T2 → T3 message from S2 to the coordinator  Suppose further that the insert message reaches the coordinator before the delete message: the coordinator would then observe a false cycle T1 → T2 → T3 → T1, even though no such cycle ever existed in reality
• 1224. Unnecessary Rollbacks  Unnecessary rollbacks may result when deadlock has indeed occurred and a victim has been picked, and meanwhile one of the transactions was aborted for reasons unrelated to the deadlock.  Unnecessary rollbacks can result from false cycles in the global wait-for graph; however, likelihood of false cycles is low.
• 1225. Timestamping  Timestamp based concurrency-control protocols can be used in distributed systems  Each transaction must be given a unique timestamp  Main problem: how to generate a timestamp in a distributed fashion  Each site generates a unique local timestamp using either a logical counter or the local clock.  Global unique timestamp is obtained by concatenating the unique local timestamp with the unique site identifier.
• 1226. Timestamping (Cont.)  A site with a slow clock will assign smaller timestamps  Still logically correct: serializability not affected  But: "disadvantages" transactions  To fix this problem  Define within each site Si a logical clock (LCi), which generates the unique local timestamp  Require that Si advance its logical clock whenever a request is received from a transaction Ti with timestamp <x,y> and x is greater than the current value of LCi; in that case, Si sets LCi to x + 1
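A sketch of such a site clock in Python: the logical clock forms the high-order part of the timestamp and the site identifier the low-order part, so lexicographic comparison of the pairs gives a global total order. Names are illustrative.

    class SiteClock:
        def __init__(self, site_id):
            self.site_id = site_id
            self.lc = 0                        # logical clock LCi
        def new_timestamp(self):
            self.lc += 1
            return (self.lc, self.site_id)     # <local timestamp, site id>
        def observe(self, ts):
            x, _ = ts                          # timestamp of incoming request
            if x > self.lc:
                self.lc = x + 1                # advance past what was seen

Calling observe on every incoming request is what keeps a slow site's clock from lagging arbitrarily far behind its peers.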
• 1227. Replication with Weak Consistency  Many commercial databases support replication of data with weak degrees of consistency (i.e., without a guarantee of serializability)  E.g.: master-slave replication: updates are performed at a single "master" site, and propagated to "slave" sites.  Propagation is not part of the update transaction: it is decoupled  May be immediately after transaction commits
• 1228. Replication with Weak Consistency (Cont.)  Replicas should see a transaction-consistent snapshot of the database  That is, a state of the database reflecting all effects of all transactions up to some point in the serialization order, and no effects of any later transactions.  E.g. Oracle provides a create snapshot statement to create a snapshot of a relation or a set of relations at a remote site  snapshot refresh either by recomputation or by incremental update
• 1229. Multimaster Replication  With multimaster replication (also called update-anywhere replication) updates are permitted at any replica, and are automatically propagated to all replicas  Basic model in distributed databases, where transactions are unaware of the details of replication, and database system propagates updates as part of the same transaction  Coupled with 2 phase commit
• 1230. Lazy Propagation (Cont.)  Two approaches to lazy propagation  Updates at any replica translated into update at primary site, and then propagated back to all replicas  Updates to an item are ordered serially  But transactions may read an old value of an item and use it to perform an update, resulting in non-serializability  Updates are performed at any replica and propagated to all other replicas  Causes even more serialization problems:  Same data item may be updated concurrently at multiple sites!  Conflict detection is a problem
• 1232. Availability  High availability: time for which the system is not fully usable should be extremely low (e.g., 99.99% availability).  Robustness: ability of the system to function in spite of failures of components.  Failures are more likely in large distributed systems.  To be robust, a distributed system must detect failures, reconfigure itself so that computation can continue, and recover when a site or link is repaired.
• 1233. Reconfiguration  Reconfiguration:  Abort all transactions that were active at a failed site  Making them wait could interfere with other transactions since they may hold locks on other sites  However, in case only some replicas of a data item failed, it may be possible to continue transactions that had accessed data at a failed site (more on this later)  If replicated data items were at the failed site, update the system catalog to remove them from the list of replicas.
• 1234. Reconfiguration (Cont.)  Since a network partition may not be distinguishable from a site failure, the following situations must be avoided:  Two or more central servers elected in distinct partitions  More than one partition updating a replicated data item  Updates must be able to continue even if some sites are down  Solution: majority-based approach  Alternative: "read one, write all available"
• 1235. Majority-Based Approach  The majority protocol for distributed concurrency control can be modified to work even if some sites are unavailable.  Each replica of each item has a version number, which is updated when the replica is updated, as outlined below.  A lock request is sent to at least half the sites at which item replicas are stored, and the operation continues only when a lock is obtained on a majority of the sites.
• 1236. Majority-Based Approach (Cont.)  Write operations  find the highest version number, as reads do, and set the new version number to the old highest version + 1  Writes are then performed on all locked replicas, and the version number on these replicas is set to the new version number  Failures (network and site) cause no problems as long as  sites at commit contain a majority of replicas of any updated data items  during reads a majority of replicas are available to find version numbers  Subject to the above, two-phase commit can be used.
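As an illustration, here is a sketch of the read and write rules with version numbers (not the book's code; the Replica interface and helper names are hypothetical):

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the majority protocol with version numbers.
public class MajorityProtocol {
    interface Replica {
        boolean tryLock();       // request a lock at the replica's site
        long version();          // current version number of the replica
        String value();          // current value of the replica
        void install(String v, long version);  // write value + version
        void unlock();
    }

    // Lock replicas until a majority is held; fail if too few respond.
    static List<Replica> lockMajority(List<Replica> replicas) {
        List<Replica> locked = new ArrayList<>();
        for (Replica r : replicas)
            if (r.tryLock()) locked.add(r);
        if (locked.size() <= replicas.size() / 2)
            throw new IllegalStateException("no majority available");
        return locked;
    }

    // Read: consult a locked majority, return the highest-version value.
    static String read(List<Replica> replicas) {
        List<Replica> locked = lockMajority(replicas);
        Replica newest = locked.get(0);
        for (Replica r : locked)
            if (r.version() > newest.version()) newest = r;
        String result = newest.value();
        locked.forEach(Replica::unlock);
        return result;
    }

    // Write: new version = highest version in the majority + 1, written
    // to all locked replicas (committed via 2PC in a real system).
    static void write(List<Replica> replicas, String v) {
        List<Replica> locked = lockMajority(replicas);
        long highest = 0;
        for (Replica r : locked) highest = Math.max(highest, r.version());
        for (Replica r : locked) r.install(v, highest + 1);
        locked.forEach(Replica::unlock);
    }
}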
• 1237. Read One Write All (Available)  The biased protocol is a special case of quorum consensus:  it allows reads to read any one replica, but updates require all replicas to be available at commit time (called read one, write all).  Read one, write all available (ignoring failed sites) is attractive, but incorrect:  a failed link may come back up without a disconnected site ever being aware that it was disconnected.  The site then has old values, and reads at that site may return stale data.
• 1238. Site Reintegration  When a failed site recovers, it must catch up with all updates that it missed while it was down.  Problem: updates may be happening to items whose replicas are stored at the site while the site is recovering.  Solution 1: halt all updates on the system while reintegrating a site  Unacceptable disruption  Solution 2: lock all replicas of all data items at the site, update them to the latest version, then release the locks.
• 1239. Comparison with Remote Backup  Remote backup (hot spare) systems (Section 17.10) are also designed to provide high availability.  Remote backup systems are simpler and have lower overhead:  all actions are performed at a single site, and only log records are shipped;  no need for distributed concurrency control or two-phase commit.  Using distributed databases with replicas of data items can provide higher availability, at the cost of higher overhead.
• 1240. Coordinator Selection  Backup coordinators  site which maintains enough information locally to assume the role of coordinator if the actual coordinator fails  executes the same algorithms and maintains the same internal state information as the actual coordinator  allows fast recovery from coordinator failure but involves overhead during normal processing.
• 1241. Bully Algorithm  If site Si sends a request that is not answered by the coordinator within a time interval T, assume that the coordinator has failed; Si tries to elect itself as the new coordinator.  Si sends an election message to every site with a higher identification number; Si then waits for any of these sites to answer within T.  If no response arrives within T, assume that all sites with numbers greater than i have failed; Si elects itself as the new coordinator.
• 1242. Bully Algorithm (Cont.)  If no message is sent within T', assume the site with a higher number has failed; Si restarts the algorithm.  After a failed site recovers, it immediately begins execution of the same algorithm.  If there are no active sites with higher numbers, the recovered site forces all sites with lower numbers to let it become the coordinator site, even if there is a currently active coordinator.
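A minimal sketch of one election round at a site (illustrative; the Site interface and its messaging method are assumptions, and real implementations use asynchronous messages with the timeouts T and T'):

import java.util.List;

public class BullyElection {
    interface Site {
        int id();
        // Send an election message and wait up to the timeout for an answer.
        boolean electionAnsweredWithin(long timeoutMillis);
    }

    // Runs one round of the election at the site with identifier selfId.
    // Returns selfId if this site elects itself coordinator, or -1 if a
    // higher-numbered site answered (the caller then waits T' for that
    // site's announcement and restarts the algorithm if none arrives).
    static int elect(int selfId, List<Site> sites, long timeoutMillis) {
        boolean higherSiteAlive = false;
        for (Site s : sites) {
            if (s.id() > selfId && s.electionAnsweredWithin(timeoutMillis)) {
                higherSiteAlive = true;   // defer to the higher-numbered site
            }
        }
        return higherSiteAlive ? -1 : selfId;  // no answer: elect self
    }
}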
• 1244. Distributed Query Processing  For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses.  In a distributed system, other issues must be taken into account:  the cost of data transmission over the network;  the potential gain in performance from having several sites process parts of the query in parallel.
• 1245. Query Transformation  Translating algebraic queries on fragments:  It must be possible to construct relation r from its fragments.  Replace relation r by the expression that constructs it from its fragments.  Consider the horizontal fragmentation of the account relation into account1 = σ branch-name = "Hillside" (account) and account2 = σ branch-name = "Valleyview" (account).  The query σ branch-name = "Hillside" (account) becomes σ branch-name = "Hillside" (account1 ∪ account2).
• 1246. Example Query (Cont.)  Since account1 has only tuples pertaining to the Hillside branch, we can eliminate the selection operation on it.  Applying the definition of account2, we obtain σ branch-name = "Hillside" (σ branch-name = "Valleyview" (account)).  This expression is the empty set regardless of the contents of the account relation.  The final strategy is for the Hillside site to return account1 as the result of the query.
• 1247. Simple Join Processing  Consider the following relational-algebra expression, in which the three relations are neither replicated nor fragmented: account ⋈ depositor ⋈ branch  account is stored at site S1  depositor at S2  branch at S3  For a query issued at site SI, the system needs to produce the result at site SI.
• 1248. Possible Query Processing Strategies  Ship copies of all three relations to site SI and choose a strategy for processing the entire query locally at site SI.  Ship a copy of the account relation to site S2 and compute temp1 = account ⋈ depositor at S2. Ship temp1 from S2 to S3, and compute temp2 = temp1 ⋈ branch at S3. Ship the result temp2 to SI.  Devise similar strategies, exchanging the roles of S1, S2, S3.
• 1249. Semijoin Strategy  Let r1 be a relation with schema R1 stored at site S1, and r2 a relation with schema R2 stored at site S2.  Evaluate the expression r1 ⋈ r2 and obtain the result at S1:  1. Compute temp1 ← Π R1 ∩ R2 (r1) at S1.  2. Ship temp1 from S1 to S2.  3. Compute temp2 ← r2 ⋈ temp1 at S2.  4. Ship temp2 from S2 to S1.  5. Compute r1 ⋈ temp2 at S1; this is the result of r1 ⋈ r2.
• 1250. Formal Definition  The semijoin of r1 with r2 is denoted by r1 ⋉ r2 and is defined by Π R1 (r1 ⋈ r2).  Thus, r1 ⋉ r2 selects those tuples of r1 that contributed to r1 ⋈ r2.  In step 3 above, temp2 = r2 ⋉ r1.  For joins of several relations, the above strategy can be extended to a series of semijoin steps.
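To make the data flow concrete, here is a small single-JVM sketch (illustrative, not from the book) of the semijoin steps for relations sharing an attribute A, with relations modeled as lists of attribute-to-value maps:

import java.util.*;

public class SemijoinDemo {
    // Step 1 at S1: project r1 onto the shared attribute A.
    static Set<Object> projectOnA(List<Map<String, Object>> r1) {
        Set<Object> keys = new HashSet<>();
        for (Map<String, Object> t : r1) keys.add(t.get("A"));
        return keys;   // temp1, shipped to S2
    }

    // Step 3 at S2: temp2 = r2 ⋉ temp1 (tuples of r2 that will join).
    static List<Map<String, Object>> semijoin(List<Map<String, Object>> r2,
                                              Set<Object> temp1) {
        List<Map<String, Object>> temp2 = new ArrayList<>();
        for (Map<String, Object> t : r2)
            if (temp1.contains(t.get("A"))) temp2.add(t);
        return temp2;  // shipped back to S1, usually much smaller than r2
    }

    // Step 5 at S1: join r1 with the reduced temp2 to get r1 ⋈ r2.
    static List<Map<String, Object>> join(List<Map<String, Object>> r1,
                                          List<Map<String, Object>> temp2) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> t1 : r1)
            for (Map<String, Object> t2 : temp2)
                if (Objects.equals(t1.get("A"), t2.get("A"))) {
                    Map<String, Object> merged = new HashMap<>(t1);
                    merged.putAll(t2);
                    out.add(merged);
                }
        return out;
    }
}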
• 1251. Join Strategies that Exploit Parallelism  Consider r1 ⋈ r2 ⋈ r3 ⋈ r4, where relation ri is stored at site Si. The result must be presented at site S1.  r1 is shipped to S2 and r1 ⋈ r2 is computed at S2; simultaneously r3 is shipped to S4 and r3 ⋈ r4 is computed at S4.  S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced; S4 likewise ships tuples of (r3 ⋈ r4) to S1.
• 1252. Heterogeneous Distributed Databases  Many database applications require data from a variety of preexisting databases located in a heterogeneous collection of hardware and software platforms.  Data models may differ (hierarchical, relational, etc.).  Transaction-commit protocols may be incompatible.  Concurrency control may be based on different techniques at different sites.
• 1253. Advantages  Preservation of investment in existing  hardware  system software  applications  Local autonomy and administrative control  Allows use of special-purpose DBMSs  A step towards a unified homogeneous DBMS  Full integration into a homogeneous DBMS faces technical and organizational hurdles.
• 1254. Unified View of Data  Agreement on a common data model  Typically the relational model  Agreement on a common conceptual schema  Different names for the same relation/attribute  The same relation/attribute name may mean different things  Agreement on a single representation of shared data  E.g., data types, precision, character sets
• 1255. Query Processing  Several issues arise in query processing in a heterogeneous database:  Schema translation  Write a wrapper for each data source to translate data to a global schema  Wrappers must also translate updates on the global schema to updates on the local schema  Limited query capabilities  Some data sources allow only restricted forms of selections  E.g., web forms, flat-file data sources  Queries have to be broken up and processed partly at the source and partly at a different site.
• 1256. Mediator Systems  Mediator systems integrate multiple heterogeneous data sources by providing an integrated global view, along with query facilities on that global view.  Unlike full-fledged multidatabase systems, mediators generally do not bother about transaction processing.  But the terms mediator and multidatabase are sometimes used interchangeably.  The term virtual database is also used to refer to mediator systems.
• 1258. Directory Systems  Typical kinds of directory information:  Employee information such as name, id, email, phone, office address, ...  Even personal information to be accessed from multiple places  E.g., Web browser bookmarks  White pages  Entries organized by name or identifier  Meant for forward lookup, to find more about an entry  Yellow pages  Entries organized by properties; meant for reverse lookup, to find entries matching specific requirements.
• 1259. Directory Access Protocols  Most commonly used directory access protocol:  LDAP (Lightweight Directory Access Protocol)  Simplified from the earlier X.500 protocol  Question: why not use database protocols like ODBC/JDBC?  Answer:  Simplified protocols for a limited type of data access, evolved in parallel to ODBC/JDBC  Provide a nice hierarchical naming mechanism, similar to file-system directories.
• 1260. LDAP: Lightweight Directory Access Protocol  LDAP Data Model  Data Manipulation  Distributed Directory Trees
• 1261. LDAP Data Model  LDAP directories store entries  Entries are similar to objects  Each entry must have a unique distinguished name (DN)  A DN is made up of a sequence of relative distinguished names (RDNs)  E.g. of a DN:  cn=Silberschatz, ou=Bell Labs, o=Lucent, c=USA  Standard RDN attributes include cn (common name), ou (organizational unit), o (organization), and c (country).
• 1262. LDAP Data Model (Cont.)  Entries can have attributes  Attributes are multi-valued by default  LDAP has several built-in types  Binary, string, time types  tel: telephone number; postalAddress: postal address  LDAP allows definition of object classes  Object classes specify attribute names and types  Can use inheritance to define object classes.
• 1263. LDAP Data Model (Cont.)  Entries are organized into a directory information tree according to their DNs  Leaf-level entries usually represent specific objects  Internal-node entries represent objects such as organizational units, organizations or countries  Children of a node inherit the DN of the parent, and add on their own RDNs  E.g., children of the internal node with DN c=USA have DNs such as o=Lucent, c=USA.
• 1264. LDAP Data Manipulation  Unlike SQL, LDAP does not define DDL or DML  Instead, it defines a network protocol for carrying out DDL and DML operations  Users use an API or vendor-specific front ends  LDAP also defines a file format  LDAP Data Interchange Format (LDIF)  The querying mechanism is very simple: only selection and projection.
• 1265. LDAP Queries  An LDAP query must specify  Base: a node in the DIT from which the search is to start  A search condition  Boolean combination of conditions on attributes of entries  Equality, wild-cards and approximate equality supported  A scope  Just the base, the base and its children, or the entire subtree from the base  Attributes to be returned
• 1266. LDAP URLs  The first part of the URL specifies the server and the DN of the base  ldap://aura.research.bell-labs.com/o=Lucent,c=USA  Optional further parts are separated by the ? symbol  ldap://aura.research.bell-labs.com/o=Lucent,c=USA??sub?cn=Korth  The optional parts specify  1. attributes to return (empty means all)  2. scope (sub indicates entire subtree)  3. search condition (cn=Korth)
• 1267. C Code using LDAP API
#include <stdio.h>
#include <ldap.h>
int main()
{
   LDAP *ld;
   LDAPMessage *res, *entry;
   char *dn, *attr, *attrList[] = {"telephoneNumber", NULL};
   BerElement *ptr;
   int i;
   /* Open a connection to the server */
   ld = ldap_open("aura.research.bell-labs.com", LDAP_PORT);
• 1268. C Code using LDAP API (Cont.)
   ldap_search_s(ld, "o=Lucent, c=USA", LDAP_SCOPE_SUBTREE,
                 "cn=Korth", attrList, /* attrsonly */ 0, &res);
   /* attrsonly = 1 => return only schema, not actual results */
   printf("found %d entries", ldap_count_entries(ld, res));
   for (entry = ldap_first_entry(ld, res); entry != NULL;
        entry = ldap_next_entry(ld, entry)) {
      dn = ldap_get_dn(ld, entry);
      printf("dn: %s", dn);   /* dn: DN of matching entry */
      ldap_memfree(dn);
• 1269. LDAP API (Cont.)  The LDAP API also has functions to create, update and delete entries  Each function call behaves as a separate transaction  LDAP does not support atomicity of updates
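For comparison with the C API above, the same lookup can be written against Java's standard JNDI LDAP provider (a sketch; the server and base DN reuse the earlier example):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.*;

public class LdapSearch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://aura.research.bell-labs.com");
        DirContext ctx = new InitialDirContext(env);

        // Search the subtree under o=Lucent, c=USA for cn=Korth,
        // asking only for the telephoneNumber attribute.
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setReturningAttributes(new String[] { "telephoneNumber" });

        NamingEnumeration<SearchResult> results =
            ctx.search("o=Lucent, c=USA", "(cn=Korth)", controls);
        while (results.hasMore()) {
            SearchResult entry = results.next();
            System.out.println("dn: " + entry.getNameInNamespace());
            System.out.println(entry.getAttributes());
        }
        ctx.close();
    }
}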
• 1270. Distributed Directory Trees  Organizational information may be split into multiple directory information trees  The suffix of a DIT gives an RDN sequence to be tagged onto all entries to get an overall DN  E.g., two DITs, one with suffix o=Lucent, c=USA and another with suffix o=Lucent, c=India  Organizations often split up DITs based on geographical location or organizational structure  Many LDAP implementations support replication (master-slave or multi-master) of DITs (not part of the LDAP 3 standard).
• 1271. END OF CHAPTER  EXTRA SLIDES (MATERIAL NOT IN BOOK)  Copyright: Silberschatz, Korth and Sudarshan
• 1272. Three Phase Commit (3PC)  Assumptions:  No network partitioning  At any point, at least one site must be up  At most K sites (participants as well as coordinator) can fail  Phase 1: Obtaining a Preliminary Decision: identical to 2PC Phase 1.  Every site is ready to commit if instructed to do so.  Under 2PC, each site is obligated to wait for the decision from the coordinator.
• 1273. Phase 2. Recording the Preliminary Decision  Coordinator adds a decision record (<abort T> or <precommit T>) to its log and forces the record to stable storage.  Coordinator sends a message to each participant informing it of the decision.  Participant records the decision in its log.  If an abort decision was reached, the participant aborts locally.  If a pre-commit decision was reached, the participant records <precommit T> in its log and sends an acknowledgment to the coordinator.
• 1274. Phase 3. Recording the Decision in the Database  Executed only if the decision in phase 2 was to precommit.  Coordinator collects acknowledgements. It sends a <commit T> message to the participants as soon as it receives K acknowledgements.  Coordinator adds the record <commit T> to its log and forces the record to stable storage.
• 1275. Handling Site Failure  Site failure. Upon recovery, a participating site examines its log and does the following:  Log contains <commit T> record: site executes redo(T)  Log contains <abort T> record: site executes undo(T)  Log contains <ready T> record, but no <abort T> or <precommit T> record: site consults Ci to determine the fate of T.  If Ci says T aborted, site executes undo(T) (and writes an <abort T> record); if Ci says T committed, site executes redo(T).
• 1276. Handling Site Failure (Cont.)  Log contains <precommit T> record, but no <abort T> or <commit T>: site consults Ci to determine the fate of T.  If Ci says T aborted, site executes undo(T)  If Ci says T committed, site executes redo(T)  If Ci says T is still in the precommit state, the site resumes the protocol at this point  Log contains no <ready T> record for a transaction T: site executes undo(T) and writes an <abort T> record.
• 1277. Coordinator-Failure Protocol  1. The active participating sites select a new coordinator, Cnew  2. Cnew requests the local status of T from each participating site  3. Each participating site, including Cnew, determines the local status of T:  Committed: the log contains a <commit T> record  Aborted: the log contains an <abort T> record  Ready: the log contains a <ready T> record but no <abort T> or <precommit T> record  Precommitted: the log contains a <precommit T> record but no <abort T> or <commit T> record  Not ready: the log contains neither a <ready T> nor an <abort T> record.
• 1278. Coordinator-Failure Protocol (Cont.)  5. Cnew decides either to commit or abort T, or to restart the three-phase commit protocol:  Commit state for any one participant: commit T.  Abort state for any one participant: abort T.  Precommit state for any one participant, and the above two cases do not hold: a precommit message is sent to those participants in the uncertain state, and the protocol is resumed from that point.
• 1279. Fully Distributed Deadlock Detection Scheme  Each site has a local wait-for graph; the system combines information in these graphs to detect deadlock.  (Figure: local wait-for graphs at sites 1, 2 and 3 over transactions T1 ... T5, and the combined global wait-for graph T1 → T2 → T3 → T4 → T5.)
• 1280. Fully Distributed Approach (Cont.)  System model: a transaction runs at a single site, and makes requests to other sites for accessing non-local data.  Each site maintains its own local wait-for graph in the normal fashion: there is an edge Ti → Tj if Ti is waiting on a lock held by Tj (note: Ti and Tj may be non-local).  Additionally, an arc Ti → Tex exists in the graph at site Sk if Ti is waiting for a data item at another site.
• 1281. Fully Distributed Approach (Cont.)  Centralized deadlock detection: all graph edges are sent to a central deadlock detector.  Distributed deadlock detection: the "path pushing" algorithm.  Path pushing is initiated when a site detects a local cycle involving Tex, which indicates the possibility of a deadlock.  Suppose the cycle at site Si is Tex → Ti → Tj → ... → Tn → Tex.
• 1282. Fully Distributed Approach: Example  Site 1: EX(3) → T1 → T2 → T3 → EX(2)  Site 2: EX(1) → T3 → T4 → T5 → EX(3)  Site 3: EX(2) → T5 → T1 → EX(1)  EX(i) indicates Tex, where the wait is on/by a transaction at site i.
• 1283. Fully Distributed Approach: Example (Cont.)  A site passes wait-for information along a path in its graph:  Let EX(j) → Ti → ... → Tn → EX(k) be a path in the local wait-for graph at site m.  Site m "pushes" the path information to site k if i > n.  Example:  Site 1 does not pass information: 1 > 3 is false.  Site 2 does not pass information: 3 > 5 is false.  Site 3 passes (T5, T1) to site 1 because 5 > 1.
• 1284. Fully Distributed Approach (Cont.)  After the path EX(2) → T5 → T1 → EX(1) has been pushed to site 1, we have:  Site 1: EX(2) → T5 → T1 → T2 → T3 → EX(2)  Site 2: EX(1) → T3 → T4 → T5 → EX(3)  Site 3: EX(2) → T5 → T1 → EX(1)
• 1285. Fully Distributed Approach (Cont.)  After the push, only site 1 has new edges. Site 1 passes (T5, T1, T2, T3) to site 2, since 5 > 3 and T3 is waiting for a data item at site 2.  The new state of the local wait-for graph at site 2: T5 → T1 → T2 → T3 → T4 → T5 -- deadlock detected.  Site 3: EX(2) → T5 → T1 → EX(1)
• 1287. Naming of Replicas and Fragments  Each replica and each fragment of a data item must have a unique name.  Use postscripts to determine those replicas that are replicas of the same data item, and those fragments that are fragments of the same data item.  Fragments of the same data item: ".f1", ".f2", ..., ".fn"  Replicas of the same data item: ".r1", ".r2", ..., ".rn"  site17.account.f3.r2 refers to replica 2 of fragment 3 of account, at site 17.
• 1288. Name-Translation Algorithm
if name appears in the alias table
   then expression := map(name)
   else expression := name;
function map(n)
   if n appears in the replica table
      then result := name of replica of n;
   if n appears in the fragment table
      then begin
         result := expression to construct fragment;
         for each n' in result do begin
            replace n' in result with map(n');
         end
      end
   return result;
• 1289. Example of Name-Translation Scheme  A user at the Hillside branch (site S1) uses the alias local-account for the local fragment account.f1 of the account relation.  When this user references local-account, the query-processing subsystem looks up local-account in the alias table, and replaces local-account with S1.account.f1.  If S1.account.f1 is replicated, the system must consult the replica table to choose a replica.
• 1290. Transparency and Updates  Must ensure that all replicas of a data item are updated and that all affected fragments are updated.  Consider the account relation and the insertion of the tuple ("Valleyview", A-733, 600).  Horizontal fragmentation of account:  account1 = σ branch-name = "Hillside" (account)  account2 = σ branch-name = "Valleyview" (account)
• 1291. Transparency and Updates (Cont.)  Vertical fragmentation of deposit into deposit1 and deposit2  The tuple ("Valleyview", A-733, "Jones", 600) must be split into two fragments:  one to be inserted into deposit1  one to be inserted into deposit2  If deposit is replicated, the tuple ("Valleyview", A-733, "Jones", 600) must be inserted in all replicas.  Problem: if deposit is accessed concurrently, it is possible that one replica will be updated earlier than another.
• 1295. Network Topology (Cont.)  A partitioned system is split into two (or more) subsystems (partitions) that lack any connection.  Tree-structured: low installation and communication costs; the failure of a single link can partition the network.  Ring: at least two links must fail for a partition to occur; communication cost is high.  Star:  the failure of a single link results in a partition, but one of the resulting partitions contains only a single site;  failure of the central site disconnects every site.
• 1296. Robustness  A robust system must:  Detect site or link failures  Reconfigure the system so that computation may continue  Recover when a processor or link is repaired  Handling failure types:  Retransmit lost messages  Unacknowledged retransmits indicate link failure; find an alternative route for the message.  Failure to find an alternative route is a symptom of network partition.
• 1297. Procedure to Reconfigure the System  If replicated data is stored at the failed site, update the catalog so that queries do not reference the copy at the failed site.  Transactions active at the failed site should be aborted.  If the failed site is a central server for some subsystem, an election must be held to determine the new server.  The reconfiguration scheme must work correctly in case of network partitions.
• 1299. Figure 19.7
• 1300. Figure 19.13
• 1301. Figure 19.14
  • 1302. Chapter 20: Parallel Databases  Introduction  I/O Parallelism  Interquery Parallelism  Intraquery Parallelism  Intraoperation Parallelism  Interoperation Parallelism  Design of Parallel Systems
• 1303. Introduction  Parallel machines are becoming quite common and affordable  Prices of microprocessors, memory and disks have dropped sharply  Databases are growing increasingly large  large volumes of transaction data are collected and stored for later analysis  multimedia objects like images are increasingly stored in databases  Large-scale parallel database systems are increasingly used for storing large volumes of data and for processing time-consuming decision-support queries.
• 1304. Parallelism in Databases  Data can be partitioned across multiple disks for parallel I/O.  Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel  data can be partitioned and each processor can work independently on its own partition.  Queries are expressed in a high-level language (SQL, translated to relational algebra), which makes parallelization easier.
• 1305. I/O Parallelism  Reduce the time required to retrieve relations from disk by partitioning the relations across multiple disks.  Horizontal partitioning: tuples of a relation are divided among many disks such that each tuple resides on one disk.  Partitioning techniques (number of disks = n):  Round-robin: send the i-th tuple inserted in the relation to disk i mod n.
• 1306. I/O Parallelism (Cont.)  Partitioning techniques (cont.):  Range partitioning:  Choose an attribute as the partitioning attribute.  A partitioning vector [v0, v1, ..., vn-2] is chosen.  Let v be the partitioning-attribute value of a tuple. Tuples such that vi ≤ v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0, and tuples with v ≥ vn-2 go to disk n - 1.  E.g., with a partitioning vector [5, 11], a tuple with partitioning-attribute value 2 goes to disk 0, a tuple with value 8 goes to disk 1, and a tuple with value 20 goes to disk 2.
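A compact sketch of the three partitioning techniques (illustrative, not from the slides; tuples are reduced to their integer partitioning-attribute values):

import java.util.Arrays;

public class Partitioning {
    // Round-robin: the i-th inserted tuple goes to disk i mod n.
    static int roundRobin(long insertionIndex, int n) {
        return (int) (insertionIndex % n);
    }

    // Hash partitioning: hash the partitioning attribute into 0..n-1.
    static int hashPartition(int value, int n) {
        return Math.floorMod(Integer.hashCode(value), n);
    }

    // Range partitioning with vector [v0, ..., vn-2]:
    // v < v0 -> disk 0; vi <= v < vi+1 -> disk i+1; v >= vn-2 -> disk n-1.
    static int rangePartition(int value, int[] vector) {
        int pos = Arrays.binarySearch(vector, value);
        // binarySearch encodes the insertion point as -(point) - 1 on a miss
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] vector = {5, 11};                          // the slides' example
        System.out.println(rangePartition(2, vector));   // prints 0
        System.out.println(rangePartition(8, vector));   // prints 1
        System.out.println(rangePartition(20, vector));  // prints 2
    }
}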
• 1307. Comparison of Partitioning Techniques  Evaluate how well partitioning techniques support the following types of data access:  1. Scanning the entire relation.  2. Locating a tuple associatively -- point queries.  E.g., r.A = 25.  3. Locating all tuples such that the value of a given attribute lies within a specified range -- range queries.  E.g., 10 ≤ r.A < 25.
  • 1308. Comparison of Partitioning Techniques (Cont.) Round robin:  Advantages  Best suited for sequential scan of entire relation on each query.  All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.  Range queries are difficult to process  No clustering -- tuples are scattered across all disks
  • 1309. Comparison of Partitioning Techniques(Cont.) Hash partitioning:  Good for sequential access  Assuming hash function is good, and partitioning attributes form a key, tuples will be equally distributed between disks  Retrieval work is then well balanced between disks.  Good for point queries on partitioning attribute  Can lookup single disk, leaving others available for answering other queries.
  • 1310. Comparison of Partitioning Techniques (Cont.) Range partitioning:  Provides data clustering by partitioning attribute value.  Good for sequential access  Good for point queries on partitioning attribute: only one disk needs to be accessed.  For range queries on partitioning attribute, one to a few disks may need to be accessed Remaining disks are available for other queries.
• 1311. Partitioning a Relation across Disks  If a relation contains only a few tuples that will fit into a single disk block, then assign the relation to a single disk.  Large relations are preferably partitioned across all the available disks.  If a relation consists of m disk blocks and there are n disks available in the system, then the relation should be allocated min(m, n) disks.
• 1312. Handling of Skew  The distribution of tuples to disks may be skewed: some disks have many tuples, while others have fewer tuples.  Types of skew:  Attribute-value skew  Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.  Can occur with range partitioning and hash partitioning.  Partition skew  With range partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others; less likely with hash partitioning if a good hash function is used.
• 1313. Handling Skew in Range Partitioning  To create a balanced partitioning vector (assuming the partitioning attribute forms a key of the relation):  Sort the relation on the partitioning attribute.  Construct the partition vector by scanning the relation in sorted order as follows:  after every 1/n-th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector.  n denotes the number of partitions to be constructed.
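A sketch of this construction (illustrative; assumes the values array holds the sorted partitioning-attribute values of all tuples):

import java.util.Arrays;

public class PartitionVector {
    // values: sorted partitioning-attribute values of all tuples.
    // n: number of partitions wanted. Returns the n-1 cut points.
    static int[] build(int[] values, int n) {
        int[] vector = new int[n - 1];
        int tuplesPerPartition = values.length / n;
        for (int i = 1; i < n; i++)
            // after every 1/n-th of the relation, record the next value
            vector[i - 1] = values[i * tuplesPerPartition];
        return vector;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 5, 8, 9, 11, 15, 20, 30, 31, 40};
        // 12 tuples, 3 partitions: cut points at positions 4 and 8
        System.out.println(Arrays.toString(build(sorted, 3)));  // [8, 20]
    }
}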
• 1314. Handling Skew Using Histograms  A balanced partitioning vector can be constructed from a histogram in a relatively straightforward fashion  Assume uniform distribution within each range of the histogram  The histogram can be constructed by scanning the relation, or by sampling (blocks containing) tuples of the relation.
• 1315. Handling Skew Using Virtual Processor Partitioning  Skew in range partitioning can be handled elegantly using virtual processor partitioning:  create a large number of partitions (say 10 to 20 times the number of processors)  assign virtual processors to partitions either in round-robin fashion or based on the estimated cost of processing each virtual partition  Basic idea:  if any normal partition would have been skewed, the skew is likely to be spread across a number of virtual partitions;  skewed virtual partitions get spread across several processors, so work is distributed evenly.
  • 1316. Interquery Parallelism  Queries/transactions execute in parallel with one another.  Increases transaction throughput; used primarily to scale up a transaction processing system to support a larger number of transactions per second.  Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
• 1317. Cache Coherency Protocol  Example of a cache-coherency protocol for shared-disk systems:  Before reading/writing to a page, the page must be locked in shared/exclusive mode.  On locking a page, the page must be read from disk.  Before unlocking a page, the page must be written to disk if it was modified.  More complex protocols with fewer disk reads/writes exist.  Cache-coherency protocols for shared-nothing systems are similar; each database page is assigned a home processor.
• 1318. Intraquery Parallelism  Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.  Two complementary forms of intraquery parallelism:  Intraoperation parallelism: parallelize the execution of each individual operation in the query.  Interoperation parallelism: execute the different operations in a query expression in parallel.
• 1319. Parallel Processing of Relational Operations  Our discussion of parallel algorithms assumes:  read-only queries  shared-nothing architecture  n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where disk Di is associated with processor Pi.  If a processor has multiple disks, they can simply simulate a single disk Di.  Shared-nothing architectures can be efficiently simulated on shared-memory and shared-disk systems.
• 1320. Parallel Sort  Range-Partitioning Sort  Choose processors P0, ..., Pm, where m ≤ n - 1, to do the sorting.  Create a range-partition vector with m entries, on the sorting attributes.  Redistribute the relation using range partitioning  all tuples that lie in the i-th range are sent to processor Pi  Pi stores the tuples it receives temporarily on disk Di  This step requires I/O and communication overhead.  Each processor Pi sorts its partition of the relation locally.  Each processor executes the same operation (sort) in parallel with the other processors, without any interaction with them (data parallelism).  The final merge operation is trivial: range partitioning ensures that, for 1 ≤ i < j ≤ m, the key values in processor Pi are all less than the key values in Pj.
• 1321. Parallel Sort (Cont.)  Parallel External Sort-Merge  Assume the relation has already been partitioned among disks D0, ..., Dn-1 (in whatever manner).  Each processor Pi locally sorts the data on disk Di.  The sorted runs on each processor are then merged to get the final sorted output.  Parallelize the merging of sorted runs as follows:  range-partition the sorted runs at each processor across the processors; each processor merges the streams as they are received; the sorted outputs are then concatenated.
• 1322. Parallel Join  The join operation requires pairs of tuples to be tested to see if they satisfy the join condition; if they do, the pair is added to the join output.  Parallel join algorithms attempt to split the pairs to be tested over several processors. Each processor then computes part of the join locally.  In a final step, the results from each processor can be collected together to produce the final result.
• 1323. Partitioned Join  For equi-joins and natural joins, it is possible to partition the two input relations across the processors and compute the join locally at each processor.  Let r and s be the input relations, and suppose we want to compute r ⋈ r.A=s.B s.  r and s are each partitioned into n partitions, denoted r0, r1, ..., rn-1 and s0, s1, ..., sn-1.  Either range partitioning or hash partitioning can be used, as long as both relations are partitioned on the join attributes with the same partitioning function.
• 1325. Fragment-and-Replicate Join  Partitioning is not possible for some join conditions  e.g., non-equijoin conditions such as r.A > s.B.  For joins where partitioning is not applicable, parallelization can be accomplished by the fragment-and-replicate technique  depicted on the next slide.  Special case: asymmetric fragment-and-replicate:  One of the relations, say r, is partitioned; any partitioning technique can be used.  The other relation, s, is replicated across all the processors.
• 1326. Depiction of Fragment-and-Replicate Joins  (Figure: a. asymmetric fragment-and-replicate; b. general fragment-and-replicate.)
• 1327. Fragment-and-Replicate Join (Cont.)  General case: reduces the sizes of the relations at each processor.  r is partitioned into n partitions, r0, r1, ..., rn-1; s is partitioned into m partitions, s0, s1, ..., sm-1.  Any partitioning technique may be used.  There must be at least m * n processors.  Label the processors as P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1,m-1.  Pi,j computes the join of ri with sj. In order to do so, ri is replicated to Pi,0, Pi,1, ..., Pi,m-1, while sj is replicated to P0,j, P1,j, ..., Pn-1,j.
• 1328. Fragment-and-Replicate Join (Cont.)  Both versions of fragment-and-replicate work with any join condition, since every tuple in r can be tested with every tuple in s.  Usually has a higher cost than partitioning, since one of the relations (for asymmetric fragment-and-replicate) or both relations (for general fragment-and-replicate) have to be replicated.  Sometimes asymmetric fragment-and-replicate is preferable even though partitioning could be used  e.g., when s is small and r is large and already partitioned, replicating s may be cheaper than repartitioning both relations on the join attributes.
• 1329. Partitioned Parallel Hash-Join  Parallelizing the partitioned hash join:  Assume s is smaller than r, so s is chosen as the build relation.  A hash function h1 takes the join-attribute value of each tuple in s and maps the tuple to one of the n processors.  Each processor Pi reads the tuples of s that are on its disk Di and sends each tuple to the appropriate processor based on h1; let si denote the tuples of relation s that are sent to processor Pi.
• 1330. Partitioned Parallel Hash-Join (Cont.)  Once the tuples of s have been distributed, the larger relation r is redistributed across the n processors using the same hash function h1  let ri denote the tuples of relation r that are sent to processor Pi.  As the r tuples are received at the destination processors, they are repartitioned using the function h2  (just as the probe relation is partitioned in the sequential hash-join algorithm).  Each processor then executes the build and probe phases of the hash-join algorithm on its local partitions si and ri to produce its share of the final result.
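The following single-JVM sketch (illustrative; threads, disks, and the second-level hash function h2 are elided, with an in-memory hash table standing in for the local build/probe phases) shows the redistribution and local-join structure:

import java.util.*;

public class ParallelHashJoin {
    record Tuple(int a, String payload) {}

    static List<String> join(List<Tuple> r, List<Tuple> s, int n) {
        // Phase 1: redistribute both relations using h1(A) = A mod n.
        List<List<Tuple>> rPart = partition(r, n), sPart = partition(s, n);
        // Phase 2: each "processor" i joins its local r_i and s_i.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<Integer, List<Tuple>> build = new HashMap<>();
            for (Tuple t : sPart.get(i))     // build on the smaller s
                build.computeIfAbsent(t.a(), k -> new ArrayList<>()).add(t);
            for (Tuple t : rPart.get(i))     // probe with r
                for (Tuple match : build.getOrDefault(t.a(), List.of()))
                    out.add(t.payload() + "-" + match.payload());
        }
        return out;
    }

    static List<List<Tuple>> partition(List<Tuple> rel, int n) {
        List<List<Tuple>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
        for (Tuple t : rel) parts.get(Math.floorMod(t.a(), n)).add(t);
        return parts;
    }
}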
• 1331. Parallel Nested-Loop Join  Assume that  relation s is much smaller than relation r, and that r is stored by partitioning  there is an index on a join attribute of relation r at each of the partitions of relation r.  Use asymmetric fragment-and-replicate, with relation s being replicated and relation r using its existing partitioning.  Each processor Pj where a partition of relation s is stored reads the tuples of s stored on its disk and replicates them to every processor holding a partition of relation r; each such processor then performs an indexed nested-loop join of s with its local partition of r.
• 1332. Other Relational Operations  Selection σθ(r)  If θ is of the form ai = v, where ai is an attribute and v a value:  if r is partitioned on ai, the selection is performed at a single processor.  If θ is of the form l ≤ ai ≤ u (i.e., θ is a range selection) and the relation has been range-partitioned on ai:  the selection is performed at each processor whose partition overlaps with the specified range of values.
• 1333. Other Relational Operations (Cont.)  Duplicate elimination  Perform it using either of the parallel sort techniques  eliminate duplicates as soon as they are found during sorting.  Can also partition the tuples (using either range or hash partitioning) and perform duplicate elimination locally at each processor.  Projection  Projection without duplicate elimination can be performed on tuples in parallel as they are read from disk.
• 1334. Grouping/Aggregation  Partition the relation on the grouping attributes and then compute the aggregate values locally at each processor.  Can reduce the cost of transferring tuples during partitioning by partly computing aggregate values before partitioning.  Consider the sum aggregation operation:  perform the aggregation at each processor on its locally stored tuples, yielding partial sums; then partition the partial results on the grouping attributes, and add up the partial sums at each destination processor to get the final result.
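A sketch of this pre-aggregation for sum (illustrative helper names; tuples are (group, value) pairs):

import java.util.HashMap;
import java.util.Map;

public class PartialAggregation {
    // Local phase at each processor: group -> partial sum, so only one
    // partial-sum tuple per local group crosses the network.
    static Map<String, Long> localSums(Iterable<Map.Entry<String, Long>> tuples) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> t : tuples)
            partial.merge(t.getKey(), t.getValue(), Long::sum);
        return partial;
    }

    // Final phase at the destination processor of a group partition:
    // add up the partial sums received from all processors.
    static Map<String, Long> combine(Iterable<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> p : partials)
            p.forEach((g, sum) -> total.merge(g, sum, Long::sum));
        return total;
    }
}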
• 1335. Cost of Parallel Evaluation of Operations  If there is no skew in the partitioning and no overhead due to the parallel evaluation, a parallel operation is expected to take 1/n of the time of the corresponding sequential operation (a speed-up of n).  If skew and overheads are taken into account, the time taken by a parallel operation can be estimated as Tpart + Tasm + max(T0, T1, ..., Tn-1)  Tpart is the time for partitioning the relations,  Tasm is the time for assembling the results, and  Ti is the time taken for the operation at processor Pi.
• 1336. Interoperator Parallelism  Pipelined parallelism  Consider a join of four relations r1 ⋈ r2 ⋈ r3 ⋈ r4  Set up a pipeline that computes the three joins in parallel  Let P1 be assigned the computation of temp1 = r1 ⋈ r2  And P2 be assigned the computation of temp2 = temp1 ⋈ r3  And P3 be assigned the computation of temp2 ⋈ r4  Each of these operations can execute in parallel, sending result tuples to the next operation even as they are produced.
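A minimal sketch of this pipeline using threads and bounded queues (illustrative; the "joins" are faked by tagging strings, the point being the streaming dataflow with DONE as an end-of-stream marker):

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineDemo {
    static final String DONE = "<done>";

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> temp1 = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> temp2 = new ArrayBlockingQueue<>(16);

        Thread p1 = new Thread(() -> {           // temp1 = r1 join r2
            try {
                for (String t : List.of("t1", "t2", "t3"))
                    temp1.put(t + "*r2");
                temp1.put(DONE);
            } catch (InterruptedException ignored) {}
        });
        Thread p2 = new Thread(() -> {           // temp2 = temp1 join r3
            try {
                for (String t = temp1.take(); !t.equals(DONE); t = temp1.take())
                    temp2.put(t + "*r3");
                temp2.put(DONE);
            } catch (InterruptedException ignored) {}
        });
        Thread p3 = new Thread(() -> {           // result = temp2 join r4
            try {
                for (String t = temp2.take(); !t.equals(DONE); t = temp2.take())
                    System.out.println(t + "*r4");
            } catch (InterruptedException ignored) {}
        });
        p1.start(); p2.start(); p3.start();
        p1.join(); p2.join(); p3.join();
    }
}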
• 1337. Factors Limiting Utility of Pipeline Parallelism  Pipeline parallelism is useful since it avoids writing intermediate results to disk.  Useful with a small number of processors, but does not scale up well with more processors. One reason is that pipeline chains do not attain sufficient length.  Cannot pipeline operators which do not produce output until all inputs have been accessed (e.g., aggregate and sort).
• 1338. Independent Parallelism  Independent parallelism  Consider a join of four relations r1 ⋈ r2 ⋈ r3 ⋈ r4  Let P1 be assigned the computation of temp1 = r1 ⋈ r2  And P2 be assigned the computation of temp2 = r3 ⋈ r4  And P3 be assigned the computation of temp1 ⋈ temp2  P1 and P2 can work independently in parallel  P3 has to wait for input from P1 and P2  Can pipeline the output of P1 and P2 to P3, combining independent parallelism and pipelined parallelism  Does not provide a high degree of parallelism
• 1339. Query Optimization  Query optimization in parallel databases is significantly more complex than query optimization in sequential databases.  Cost models are more complicated, since we must take into account partitioning costs and issues such as skew and resource contention.  When scheduling an execution tree in a parallel system, we must decide:  how to parallelize each operation, and how many processors to use for it;  what operations to pipeline, what operations to execute independently in parallel, and what operations to execute sequentially, one after the other.  Determining the amount of resources to allocate for each operation is a problem.  E.g., allocating more processors than optimal can result in high communication overhead.  Long pipelines should be avoided, as the final operation may wait a long time for inputs while holding precious resources.
• 1340. Query Optimization (Cont.)  The number of parallel evaluation plans from which to choose is much larger than the number of sequential evaluation plans.  Therefore heuristics are needed during optimization.  Two alternative heuristics for choosing parallel plans:  No pipelining and no inter-operation parallelism; just parallelize every operation across all processors.  Finding the best plan is then much easier: use a standard optimization technique, but with a new cost model.  The Volcano parallel database popularized the exchange-operator model  an exchange operator is introduced into query plans to partition and distribute tuples.
• 1341. Design of Parallel Systems  Some issues in the design of parallel systems:  Parallel loading of data from external sources is needed in order to handle large volumes of incoming data.  Resilience to failure of some processors or disks:  the probability of some disk or processor failing is higher in a parallel system;  operation (perhaps with degraded performance) should be possible in spite of failures.
• 1342. Design of Parallel Systems (Cont.)  On-line reorganization of data and schema changes must be supported.  For example, index construction on terabyte databases can take hours or days even on a parallel system.  Need to allow other processing (insertions/deletions/updates) to be performed on the relation even as the index is being constructed.  Basic idea: index construction tracks changes and "catches up" on the changes at the end.
• 1344. Figure 21.1
• 1345. Overview  Web Interfaces to Databases  Performance Tuning  Performance Benchmarks  Standardization  E-Commerce  Legacy Systems
• 1346. The World Wide Web  The Web is a distributed information system based on hypertext.  Most Web documents are hypertext documents formatted via the HyperText Markup Language (HTML).  HTML documents contain  text along with font specifications and other formatting instructions  hypertext links to other documents, which can be associated with regions of the text  forms, enabling users to enter data which can then be sent back to the Web server
• 1347. Web Interfaces to Databases  Why interface databases to the Web?  1. Web browsers have become the de facto standard user interface to databases  Enable large numbers of users to access databases from anywhere  Avoid the need for downloading/installing specialized code, while providing a good graphical user interface  E.g.: banks, airline/car reservations, university course registration/grading, ...
• 1348. Web Interfaces to Databases (Cont.)  2. Dynamic generation of documents  Limitations of static HTML documents  Cannot customize fixed Web documents for individual users.  Problematic to update Web documents, especially if multiple Web documents replicate data.  Solution: generate Web documents dynamically from data stored in a database.  Can tailor the display based on user information stored in the database.  E.g., tailored ads, tailored weather and local news, ...  Displayed information is up-to-date, unlike static Web pages.
• 1349. Uniform Resource Locators  In the Web, the functionality of pointers is provided by Uniform Resource Locators (URLs).  URL example: http://www.bell-labs.com/topics/book/db-book  The first part indicates how the document is to be accessed:  "http" indicates that the document is to be accessed using the HyperText Transfer Protocol.  The second part gives the unique name of a machine on the Internet.  The rest of the URL identifies the document within the machine.
• 1350. HTML and HTTP  HTML provides formatting, hypertext link, and image display features.  HTML also provides input features  Select from a set of options  Pop-up menus, radio buttons, check lists  Enter values  Text boxes  Filled-in input is sent back to the server, to be acted upon by an executable at the server  The HyperText Transfer Protocol (HTTP) is used for communication with the Web server
• 1351. Sample HTML Source Text
<html> <body>
<table border cols=3>
<tr> <td> A-101 </td> <td> Downtown </td> <td> 500 </td> </tr>
…
</table>
<center> The <i>account</i> relation </center>
<form action="BankQuery" method=get>
Select account/loan and enter number <br>
<select name="type">
• 1352. Display of Sample HTML Source
• 1353. Client-Side Scripting and Applets  Browsers can fetch certain scripts (client-side scripts) or programs along with documents, and execute them in "safe mode" at the client site  Javascript  Macromedia Flash and Shockwave for animation/games  VRML  Applets  Client-side scripts/programs allow documents to be active  e.g., animation by executing programs at the local site
• 1354. Client-Side Scripting and Security  Security mechanisms are needed to ensure that malicious scripts do not cause damage to the client machine  Easy for limited-capability scripting languages, harder for general-purpose programming languages like Java  E.g., Java's security system ensures that Java applet code does not make any system calls directly  Disallows dangerous actions such as file writes  Notifies the user about potentially dangerous actions.
• 1355. Web Servers  A Web server can easily serve as a front end to a variety of information services.  The document name in a URL may identify an executable program that, when run, generates an HTML document.  When an HTTP server receives a request for such a document, it executes the program and sends back the HTML document that is generated.  The Web client can pass extra arguments with the name of the document.
• 1357. Two-Tier Web Architecture  Multiple levels of indirection have overheads  Alternative: two-tier architecture
• 1358. HTTP and Sessions  The HTTP protocol is connectionless  That is, once the server replies to a request, the server closes the connection with the client and forgets all about the request  In contrast, Unix logins and JDBC/ODBC connections stay connected until the client disconnects  retaining user authentication and other information  Motivation: reduces load on the server  operating systems have tight limits on the number of open connections on a machine
• 1359. Sessions and Cookies  A cookie is a small piece of text containing identifying information  Sent by the server to the browser on first interaction  Sent by the browser to the server that created the cookie on further interactions  part of the HTTP protocol  The server saves information about the cookies it issued, and can use it when serving a request  E.g., authentication information, and user preferences
• 1360. Servlets  The Java Servlet specification defines an API for communication between the Web server and the application program  E.g., methods to get parameter values and to send HTML text back to the client  The application program (also called a servlet) is loaded into the Web server  Two-tier model  Each request spawns a new thread in the Web server  the thread is closed once the request is serviced
• 1361. Example Servlet Code
public class BankQuery extends HttpServlet {
   public void doGet(HttpServletRequest request, HttpServletResponse result)
         throws ServletException, IOException {
      String type = request.getParameter("type");
      String number = request.getParameter("number");
      … code to find the loan amount/account balance …
      … using JDBC to communicate with the database …
      … we assume the value is stored in the variable balance …
• 1362. Server-Side Scripting  Server-side scripting simplifies the task of connecting a database to the Web  Define an HTML document with embedded executable code/SQL queries.  Input values from HTML forms can be used directly in the embedded code/SQL queries.  When the document is requested, the Web server executes the embedded code/SQL queries to generate the actual HTML document.
• 1363. Improving Web Server Performance  Performance is an issue for popular Web sites  May be accessed by millions of users every day, thousands of requests per second at peak time  Caching techniques are used to reduce the cost of serving pages by exploiting commonalities between requests  At the server site:  caching of JDBC connections between servlet requests
• 1365. Performance Tuning  Adjusting various parameters and design choices to improve system performance for a specific application.  Tuning is best done by  1. identifying bottlenecks, and  2. eliminating them.  Can tune a database system at three levels:  Hardware -- e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor.  Database system parameters -- e.g., set buffer size to avoid paging of the buffer, set checkpointing intervals to limit log size.  Higher-level database design -- e.g., schema, indices, and transactions.
• 1366. Bottlenecks  Performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components: these are called bottlenecks  E.g., 80% of the code may take up only 20% of the time, while 20% of the code takes up 80% of the time  Worth spending most time on the 20% of the code that takes 80% of the time  Bottlenecks may be in hardware (e.g., disks are very busy, CPU is idle) or in software.
• 1367. Identifying Bottlenecks  Transactions request a sequence of services  e.g., CPU, disk I/O, locks  With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served  Can model the database as a queueing system with a queue for each service  transactions repeatedly do the following:  request a service, wait in the queue for the service, and get serviced  Bottlenecks in a database system typically show up as very high utilization of a particular service (and, correspondingly, very long queues for it).
• 1368. Queues in a Database System
• 1369. Tunable Parameters  Tuning of hardware  Tuning of schema  Tuning of indices  Tuning of materialized views  Tuning of transactions
• 1370. Tuning of Hardware  Even well-tuned transactions typically require a few I/O operations  A typical disk supports about 100 random I/O operations per second  Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)  The number of I/O operations per transaction can be reduced by keeping more data in memory.
• 1371. Hardware Tuning: Five-Minute Rule  Question: which data should be kept in memory?  If a page is accessed n times per second, keeping it in memory saves n × (price-per-disk-drive / accesses-per-second-per-disk)  Cost of keeping a page in memory: price-per-MB-of-memory / pages-per-MB-of-memory  Break-even point: the value of n for which the above costs are equal  If accesses are more frequent than this, the saving is greater than the cost  Solving the above equation with current disk and memory prices leads to the 5-minute rule: a page accessed every 5 minutes or more often should be kept in memory.
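As a worked instance of the break-even computation, with illustrative prices (assumptions for the example, not figures from the slides): $1000 per disk supporting 100 accesses per second, $10 per MB of memory, and 256 4-KB pages per MB:

n \cdot \frac{\$1000}{100\ \text{accesses/sec}} = \frac{\$10\ \text{per MB}}{256\ \text{pages per MB}}
\quad\Rightarrow\quad
n = \frac{10/256}{10} \approx 0.0039\ \text{accesses/sec}

i.e., one access roughly every 256 seconds, which is about five minutes; pages accessed more often than that are cheaper to keep in memory.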
• 1372. Hardware Tuning: One-Minute Rule  For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1 MB of data at a time:  1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory.  Prices of disk and memory have changed greatly over the years, but the ratios have not changed much.
• 1373. Hardware Tuning: Choice of RAID Level  To use RAID 1 or RAID 5?  Depends on the ratio of reads and writes  RAID 5 requires 2 block reads and 2 block writes to write out one data block  If an application requires r reads and w writes per second  RAID 1 requires r + 2w I/O operations per second  RAID 5 requires r + 4w I/O operations per second  For reasonably large r and w, this requires lots of disks to handle the workload.
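As a worked example with an illustrative workload (assumed numbers, not from the slides): with r = 100 reads and w = 50 writes per second, RAID 1 needs 100 + 2·50 = 200 I/O operations per second, while RAID 5 needs 100 + 4·50 = 300. At the roughly 100 random I/Os per second per disk cited earlier, that is raw I/O capacity of about 2 versus 3 disks, so write-heavy workloads favor RAID 1.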
• 1374. Tuning the Database Design  Schema tuning  Vertically partition relations to isolate the data that is accessed most often -- only fetch needed information.  E.g., split account into two relations, (account-number, branch-name) and (account-number, balance).  branch-name need not be fetched unless required  Improve performance by storing a denormalized relation  E.g., store the join of account and depositor; branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
• 1375. Tuning the Database Design (Cont.)  Index tuning  Create appropriate indices to speed up slow queries/updates  Speed up slow updates by removing excess indices (tradeoff between queries and updates)  Choose the type of index (B-tree/hash) appropriate for the most frequent types of queries  Choose which index to make clustered  Index tuning wizards look at the past history of queries and updates (the workload) and recommend indices accordingly.
• 1376. Tuning the Database Design (Cont.)  Materialized views  Materialized views can help speed up certain queries  particularly aggregate queries  Overheads  Space  Time for view maintenance  Immediate view maintenance: done as part of the update transaction  time overhead paid by the update transaction  Deferred view maintenance: done only when required  the update transaction is not affected, but system time is spent on view maintenance
• 1377. Tuning the Database Design (Cont.)  How to choose the set of materialized views  Helping one transaction type by introducing a materialized view may hurt others  Choice of materialized views depends on costs  Users often have no idea of the actual cost of operations  Overall, manual selection of materialized views is tedious
• 1378. Tuning of Transactions  Basic approaches to tuning of transactions  Improve set orientation  Reduce lock contention  Rewriting of queries to improve performance was important in the past, but smart optimizers have made this less important  Communication overhead and query-handling overheads are a significant part of the cost of each call  Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query.
• 1379. Tuning of Transactions (Cont.)  Reducing lock contention  Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions  E.g., a large query to compute bank statistics versus regular bank transactions  To reduce contention  Use multi-version concurrency control  E.g., Oracle "snapshots", which support multi-version 2PL
• 1380. Tuning of Transactions (Cont.)  Long update transactions cause several problems  Exhaust lock space  Exhaust log space  and also greatly increase recovery time after a crash, and may even exhaust log space during recovery if the recovery algorithm is badly designed!  Use mini-batch transactions to limit the number of updates that a single transaction can carry out. E.g., if a single large transaction updates every record of a very large relation, the log may grow too big.
• 1381. Performance Simulation  Performance simulation using a queueing model is useful to predict bottlenecks as well as the effects of tuning changes, even without access to the real system  Queueing model as we saw earlier  Models activities that go on in parallel  The simulation model is quite detailed, but usually omits some low-level details  Model service time, but disregard details of service  E.g., approximate disk read time by using an average disk read time
• 1383. Performance Benchmarks  Suites of tasks used to quantify the performance of software systems  Important in comparing database systems, especially as systems become more standards-compliant  Commonly used performance measures:  Throughput (transactions per second, or tps)  Response time (delay from submission of transaction to return of result)  Availability or mean time to failure
• 1384. Performance Benchmarks (Cont.)  Suites of tasks used to characterize performance  a single task is not enough for complex systems  Beware when computing average throughput of different transaction types  E.g., suppose a system runs transaction type A at 99 tps and transaction type B at 1 tps.  Given an equal mixture of types A and B, throughput is not (99+1)/2 = 50 tps; the correct figure is the harmonic mean n / (1/t1 + 1/t2 + ... + 1/tn), here 2 / (1/99 + 1/1) ≈ 1.98 tps.  Running one transaction of each type takes time 1 + 0.01 seconds, giving a throughput of about 1.98 tps.
• 1385. Database Application Classes  Online transaction processing (OLTP)  requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions  Decision-support applications  including online analytical processing (OLAP) applications  require good query evaluation algorithms and query optimization  The architecture of some database systems is tuned to one of the two classes.
• 1386. Benchmark Suites  The Transaction Processing Council (TPC) benchmark suites are widely used.  TPC-A and TPC-B: simple OLTP application modeling a bank teller application, with and without communication  Not used anymore  TPC-C: complex OLTP application modeling an inventory system  Current standard for OLTP benchmarking
• 1387. Benchmark Suites (Cont.)  TPC benchmarks (cont.)  TPC-D: complex decision-support application  Superseded by TPC-H and TPC-R  TPC-H: (H for ad hoc) based on TPC-D, with some extra queries  Models ad hoc queries which are not known beforehand  Total of 22 queries with emphasis on aggregation  prohibits materialized views  permits indices only on primary and foreign keys
• 1388. TPC Performance Measures  TPC performance measures  transactions per second with specified constraints on response time  transactions per second per dollar accounts for the cost of owning the system  TPC benchmarks require database sizes to be scaled up with increasing transactions per second  reflects real-world applications, where more customers mean a larger database and more transactions per second  External audit of TPC performance numbers is mandatory.
• 1389. TPC Performance Measures (Cont.)  Two types of tests for TPC-H and TPC-R  Power test: runs queries and updates sequentially, then takes the mean to find queries per hour  Throughput test: runs queries and updates concurrently  multiple streams running in parallel, each generating queries, with one parallel update stream  Composite queries-per-hour metric: the square root of the product of the power and throughput metrics.
• 1390. Other Benchmarks  OODB transactions require a different set of benchmarks.  The OO7 benchmark has several different operations, and provides a separate benchmark number for each kind of operation  Reason: it is hard to define what a typical OODB application looks like  Benchmarks for XML are being discussed
  • 1392. Standardization  The complexity of contemporary database systems and the need for their interoperation require a variety of standards.  syntax and semantics of programming languages  functions in application program interfaces  data models (e.g. object oriented/object relational databases)  Formal standards are standards developed by a standards organization 139 2
  • 1393. Standardization (Cont.)  Anticipatory standards lead the market place, defining features that vendors then implement  Ensure compatibility of future products  But at times become very large and unwieldy since standards bodies may not pay enough attention to ease of implementation (e.g.,SQL-92 or SQL:1999)  Reactionary standards attempt to standardize features that vendors have already implemented, possibly in 139 different ways. 3
• 1394. SQL Standards History  SQL developed by IBM in late 70s/early 80s  SQL-86 first formal standard  IBM SAA standard for SQL in 1987  SQL-89 added features to SQL-86 that were already implemented in many systems  Was a reactionary standard  SQL-92 added many new features to SQL-89 (anticipatory standard)  Defines levels of compliance (entry, intermediate, and full)
• 1395. SQL Standards History (Cont.)  SQL:1999  Adds variety of new features --- extended data types, object orientation, procedures, triggers, etc.  Broken into several parts  SQL/Framework (Part 1): overview  SQL/Foundation (Part 2): types, schemas, tables, query/update statements, security, etc.  SQL/CLI (Call Level Interface) (Part 3): API interface  SQL/PSM (Persistent Stored Modules) (Part 4): procedural language extensions
• 1396. SQL Standards History (Cont.)  More parts undergoing standardization process  Part 7: SQL/Temporal: temporal data  Part 9: SQL/MED (Management of External Data)  Interfacing of database to external data sources  Allows other databases, even files, to be viewed as part of the database  Part 10: SQL/OLB (Object Language Bindings): embedding SQL in Java
• 1397. Database Connectivity Standards  Open DataBase Connectivity (ODBC) standard for database interconnectivity  based on Call Level Interface (CLI) developed by X/Open consortium  defines application programming interface, and SQL features that must be supported at different levels of compliance  JDBC standard used for Java  X/Open XA standards define transaction management standards for supporting distributed 2-phase commit  OLE-DB: API like ODBC, but intended to support non-database sources of data such as flat files
• 1398. Object Oriented Databases Standards  Object Database Management Group (ODMG) standard for object-oriented databases  version 1 in 1993 and version 2 in 1997, version 3 in 2000  provides language independent Object Definition Language (ODL) as well as several language specific bindings  Object Management Group (OMG) standard for distributed software based on objects  Object Request Broker (ORB) provides the infrastructure for objects to communicate across a distributed system
• 1399. XML-Based Standards  Several XML based standards for E-commerce  E.g. RosettaNet (supply chain), BizTalk  Define catalogs, service descriptions, invoices, purchase orders, etc.  XML wrappers are used to export information from relational databases to XML  Simple Object Access Protocol (SOAP): XML based remote procedure call standard
• 1401. E-Commerce  E-commerce is the process of carrying out various activities related to commerce through electronic means  Activities include:  Presale activities: catalogs, advertisements, etc.  Sale process: negotiations on price/quality of service  Marketplace: e.g. stock exchange, auctions, reverse auctions  Payment for sale
• 1402. E-Catalogs  Product catalogs must provide searching and browsing facilities  Organize products into intuitive hierarchy  Keyword search  Help customer with comparison of products  Customization of catalog  Negotiated pricing for specific organizations  Special discounts for customers based on past history
• 1403. Marketplaces  Marketplaces help in negotiating the price of a product when there are multiple sellers and buyers  Several types of marketplaces  Reverse auction  Auction  Exchange  Real world marketplaces can be quite complicated due to product differentiation  Database issues:  Authenticate bidders  Record buy/sell bids securely
• 1404. Types of Marketplace  Reverse auction system: single buyer, multiple sellers.  Buyer states requirements, sellers bid for supplying items. Lowest bidder wins. (also known as tender system)  Open bidding vs. closed bidding  Auction: Multiple buyers, single seller  Simplest case: only one instance of each item is being sold  Highest bidder for an item wins  More complicated with multiple copies, where buyers bid for a specific number of copies
• 1405. Order Settlement  Order settlement: payment for goods and delivery  Insecure means for electronic payment: send credit card number  Buyers may present someone else's credit card numbers  Seller has to be trusted to bill only for agreed-on item  Seller has to be trusted not to pass on the credit card number to unauthorized people
• 1406. Secure Payment Systems  All information must be encrypted to prevent eavesdropping  Public/private key encryption widely used  Must prevent person-in-the-middle attacks  E.g. someone impersonates seller or bank/credit card company and fools buyer into revealing information  Encrypting messages alone doesn't solve this problem  More on this in next slide  Three-way communication between seller, buyer, and credit-card company is used to make payment secure
• 1407. Secure Payment Systems (Cont.)  Digital certificates are used to prevent impersonation/man-in-the-middle attack  Certification agency creates digital certificate by encrypting, e.g., seller's public key using its own private key  Verifies seller's identity by external means first!  Seller sends certificate to buyer  Customer uses public key of certification agency to decrypt certificate and find the seller's public key
• 1408. Digital Cash  Credit-card payment does not provide anonymity  The SET protocol hides buyer's identity from seller  But even with SET, buyer can be traced with help of credit card company  Digital cash systems provide anonymity similar to that provided by physical cash  E.g. DigiCash  Based on encryption techniques that make it impossible to find out who made a purchase
• 1410. Legacy Systems  Legacy systems are older-generation systems that are incompatible with current generation standards and systems but still in production use  E.g. applications written in Cobol that run on mainframes  Today's hot new system is tomorrow's legacy system!  Porting legacy system applications to a more modern environment is problematic  Very expensive, since legacy system may involve millions of lines of code, written over decades
• 1411. Legacy Systems (Cont.)  Rewriting legacy application requires a first phase of understanding what it does  Often legacy code has no documentation or outdated documentation  reverse engineering: process of going over legacy code to  Come up with schema designs in ER or OO model  Find out what procedures and processes are implemented, to get a high level view of the system
• 1412. Legacy Systems (Cont.)  Switching over from old to new system is a major problem  Production systems are in everyday use, generating new data  Stopping the system may bring all of a company's activities to a halt, causing enormous losses  Big-bang approach: 1. Implement complete new system 2. Populate it with data from old system (no transactions can run while this step is executed)
• 1413. Legacy Systems (Cont.)  Chicken-little approach:  Replace legacy system one piece at a time  Use wrappers to interoperate between legacy and new code  E.g. replace front end first, with wrappers on legacy backend  Old front end can continue working in this phase in case of problems with new front end  Replace back end, one functional unit at a time  All parts that share a database may have to be replaced together, or wrapper is needed on database also  Drawback: significant extra development effort to build the wrappers
• 1415. Chapter 22: Advanced Querying and Information Retrieval  Decision-Support Systems  Data Analysis  OLAP  Extended aggregation features in SQL  Windowing and ranking  Data Mining  Data Warehousing  Information-Retrieval Systems  Including Web search
• 1416. Decision Support Systems  Decision-support systems are used to make business decisions, often based on data collected by on-line transaction-processing systems.  Examples of business decisions:  What items to stock?  What insurance premium to charge?  Who to send advertisements to?  Examples of data used for making decisions  Retail sales transaction details
• 1417. Decision-Support Systems: Overview  Data analysis tasks are simplified by specialized tools and SQL extensions  Example tasks  For each product category and each region, what were the total sales in the last quarter and how do they compare with the same quarter last year  As above, for each product category and each customer category  Statistical analysis packages (e.g., S++) can be interfaced with databases  Statistical analysis is a large field; we will not study it here  Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases.  A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.
• 1418. Data Analysis and OLAP  Aggregate functions summarize large volumes of data  Online Analytical Processing (OLAP)  Interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay)  Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.  Given a relation used for data analysis, we can identify some of its attributes as measure attributes and others as dimension attributes
• 1419. Cross Tabulation of sales by item-name and color  The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.  A cross-tab is a table where  values for one of the dimension attributes form the row headers, values for another dimension attribute form the column headers
• 1420. Relational Representation of Crosstabs  Crosstabs can be represented as relations  The value all is used to represent aggregates  The SQL:1999 standard actually uses null values in place of all  More on this later….
• 1421. Three-Dimensional Data Cube  A data cube is a multidimensional generalization of a crosstab  Cannot view a three-dimensional object in its entirety, but crosstabs can be used as views on a data cube
• 1422. Online Analytical Processing  The operation of changing the dimensions used in a cross-tab is called pivoting  Suppose an analyst wishes to see a cross-tab on item-name and color for a fixed value of size, for example, large, instead of the sum across all sizes.  Such an operation is referred to as slicing.  The operation is sometimes called dicing, particularly when values for multiple dimensions are fixed.
• 1423. Hierarchies on Dimensions  Hierarchy on dimension attributes: lets dimensions be viewed at different levels of detail  E.g. the dimension DateTime can be used to aggregate by hour of day, date, day of week, month, quarter or year
• 1424. Cross Tabulation With Hierarchy  Crosstabs can be easily extended to deal with hierarchies  Can drill down or roll up on a hierarchy
• 1425. OLAP Implementation  The earliest OLAP systems used multidimensional arrays in memory to store data cubes, and are referred to as multidimensional OLAP (MOLAP) systems.  OLAP implementations using only relational database features are called relational OLAP (ROLAP) systems  Hybrid systems, which store some summaries in memory and store the base data and other summaries in a relational database, are called hybrid OLAP (HOLAP) systems.
• 1426. OLAP Implementation (Cont.)  Early OLAP systems precomputed all possible aggregates in order to provide online response  Space and time requirements for doing so can be very high  2^n combinations of group by on n dimensions  It suffices to precompute some aggregates, and compute others on demand from one of the precomputed aggregates  Can compute aggregate on (item-name, color) from an aggregate on (item-name, color, size)  For all but a few "non-decomposable" aggregates such as median, this is cheaper than computing the aggregate from scratch  Several optimizations are available for computing multiple aggregates together
• 1427. Extended Aggregation  SQL-92 aggregation quite limited  Many useful aggregates are either very hard or impossible to specify  Data cube  Complex aggregates (median, variance)  binary aggregates (correlation, regression curves)  ranking queries ("assign each student a rank based on the total marks")  SQL:1999 OLAP extensions provide a variety of aggregation functions to address these limitations
• 1428. Extended Aggregation in SQL:1999  The cube operation computes union of group by's on every subset of the specified attributes  E.g. consider the query select item-name, color, size, sum(number) from sales group by cube(item-name, color, size) This computes the union of eight different groupings of the sales relation: { (item-name, color, size), (item-name, color), (item-name, size), (color, size), (item-name), (color), (size), ( ) }
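For reference, the same computation can be written with explicit grouping sets, which SQL:1999 also provides — a sketch only, reusing the sales relation above (hyphenated names follow the slides' convention; real systems typically require underscores):

    select item-name, color, size, sum(number)
    from sales
    group by grouping sets (
        (item-name, color, size), (item-name, color), (item-name, size),
        (color, size), (item-name), (color), (size), () )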
• 1429. Extended Aggregation (Cont.)  Relational representation of crosstab that we saw earlier, but with null in place of all, can be computed by select item-name, color, sum(number) from sales group by cube(item-name, color)  The function grouping() can be applied on an attribute  Returns 1 if the value is a null value representing all, and returns 0 in all other cases. select item-name, color, size, sum(number), grouping(item-name) as item-name-flag, grouping(color) as color-flag, grouping(size) as size-flag from sales group by cube(item-name, color, size)  Can use the function decode() in the select clause to replace such nulls by a value such as all  E.g. replace item-name in first query by decode(grouping(item-name), 1, 'all', item-name)
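Putting the pieces together, a full version of the crosstab query with the nulls replaced by all might look as follows (a sketch; decode() is a vendor function, as the slide notes, and case or coalesce can be substituted):

    select decode(grouping(item-name), 1, 'all', item-name) as item-name,
           decode(grouping(color), 1, 'all', color) as color,
           sum(number)
    from sales
    group by cube(item-name, color)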
• 1430. Extended Aggregation (Cont.)  The rollup construct generates union on every prefix of specified list of attributes  E.g. select item-name, color, size, sum(number) from sales group by rollup(item-name, color, size)  Generates union of four groupings: { (item-name, color, size), (item-name, color), (item-name), ( ) }  Rollup can be used to generate aggregates at multiple levels of a hierarchy.  E.g., suppose table itemcategory(item-name, category) gives the category of each item. Then select category, item-name, sum(number) from sales, itemcategory where sales.item-name = itemcategory.item-name group by rollup(category, item-name) would give a hierarchical summary by item-name and by category.
• 1431. Extended Aggregation (Cont.)  Multiple rollups and cubes can be used in a single group by clause  Each generates a set of group by lists; the cross product of the sets gives the overall set of group by lists  E.g., select item-name, color, size, sum(number) from sales group by rollup(item-name), rollup(color, size) generates the groupings { (item-name, color, size), (item-name, color), (item-name), (color, size), (color), ( ) }
• 1432. Ranking  Ranking is done in conjunction with an order by specification.  Given a relation student-marks(student-id, marks) find the rank of each student. select student-id, rank() over (order by marks desc) as s-rank from student-marks  An extra order by clause is needed to get them in sorted order select student-id, rank() over (order by marks desc) as s-rank from student-marks order by s-rank
• 1433. Ranking (Cont.)  Ranking can be done within partitions of the data.  "Find the rank of students within each section." select student-id, section, rank() over (partition by section order by marks desc) as sec-rank from student-marks, student-section where student-marks.student-id = student-section.student-id order by section, sec-rank  Multiple rank clauses can occur in a single select clause  Ranking is done after applying group by clause/aggregation  Exercises:  Find students with top n ranks (see the sketch after this slide)  Many systems provide special (non-standard) syntax for "top-n" queries  Rank students by sum of their marks in different courses
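A minimal sketch of the top-n exercise using only the standard rank() function (n = 10 chosen arbitrarily for illustration; many systems also offer non-standard top-n or limit syntax):

    select student-id, s-rank
    from (select student-id,
                 rank() over (order by marks desc) as s-rank
          from student-marks) as ranked
    where s-rank <= 10
    order by s-rank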
• 1434. Ranking (Cont.)  Other ranking functions:  percent_rank (within partition, if partitioning is done)  cume_dist (cumulative distribution)  fraction of tuples with preceding values  row_number (non-deterministic in presence of duplicates)  SQL:1999 permits the user to specify nulls first or nulls last select student-id, rank() over (order by marks desc nulls last) as s-rank from student-marks
• 1435. Ranking (Cont.)  For a given constant n, the ranking function ntile(n) takes the tuples in each partition in the specified order, and divides them into n buckets with equal numbers of tuples. For instance, we can sort employees by salary, and use ntile(3) to find which range (bottom third, middle third, or top third) each employee is in, and compute the total salary earned by employees in each range: select threetile, sum(salary) from (select salary, ntile(3) over (order by salary) as threetile from employee) as s group by threetile
• 1436. Windowing  E.g.: "Given sales values for each date, calculate for each date the average of the sales on that day, the previous day, and the next day"  Such moving average queries are used to smooth out random variations.  In contrast to group by, the same tuple can exist in multiple windows  Window specification in SQL:  Ordering of tuples, size of window for each tuple, aggregate function  E.g. given relation sales(date, value) select date, sum(value) over (order by date rows between 1 preceding and 1 following) from sales  Examples of other window specifications:  rows between unbounded preceding and current row  rows unbounded preceding
• 1437. Windowing (Cont.)  Can do windowing within partitions  E.g. Given a relation transaction(account-number, date-time, value), where value is positive for a deposit and negative for a withdrawal  "Find total balance of each account after each transaction on the account" select account-number, date-time, sum(value) over (partition by account-number order by date-time rows unbounded preceding) as balance from transaction order by account-number, date-time
• 1439. Data Mining  Broadly speaking, data mining is the process of semi-automatically analyzing large databases to find useful patterns  Like knowledge discovery in artificial intelligence, data mining discovers statistical rules and patterns  Differs from machine learning in that it deals with large volumes of data stored primarily on disk.  Some types of knowledge discovered from a database can be represented by a set of rules
• 1440. Applications of Data Mining  Prediction based on past history  Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ..) and past history  Predict if a customer is likely to switch brand loyalty  Predict if a customer is likely to respond to "junk mail"  Predict if a pattern of phone calling card usage is likely to be fraudulent  Some examples of prediction mechanisms:  Classification  Given a training set consisting of items belonging to different classes, and a new item whose class is unknown, predict which class it belongs to
• 1441. Applications of Data Mining (Cont.)  Descriptive Patterns  Associations  Find books that are often bought by the same customers. If a new customer buys one such book, suggest that he buys the others too.  Other similar applications: camera accessories, clothes, etc.  Associations may also be used as a first step in detecting causation  E.g. association between exposure to chemical X and cancer, or between a new medicine and cardiac problems
• 1442. Classification Rules  Classification rules help assign new objects to a set of classes. E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?  Classification rules for above example could use a variety of knowledge, such as educational level of applicant, salary of applicant, age of applicant, etc.  ∀ person P, P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
• 1443. Decision Tree (figure)
• 1444. Construction of Decision Trees  Training set: a data sample in which the grouping for each tuple is already known.  Consider credit risk example: Suppose degree is chosen to partition the data at the root.  Since degree has a small number of possible values, one child is created for each value.  At each child node of the root, further classification is done if required. Here, partitions are defined by income.  Since income is a continuous attribute, some number of intervals are chosen, and one child created for each interval.  Different classification algorithms use different ways of choosing which attribute to partition on at each node, and what the intervals, if any, are.  In general, different branches of the tree can grow to different levels, and different nodes at the same level may use different partitioning attributes
• 1445. Construction of Decision Trees (Cont.)  Greedy top down generation of decision trees.  Each internal node of the tree partitions the data into groups based on a partitioning attribute, and a partitioning condition for the node  More on choosing partitioning attribute/condition shortly  Algorithm is greedy: the choice is made once and not revisited as more of the tree is constructed  The data at a node is not partitioned further if either  all (or most) of the items at the node belong to the same class, or  all attributes have been considered, and no further partitioning is possible
• 1446. Best Splits  Idea: evaluate different attributes and partitioning conditions and pick the one that best improves the "purity" of the training set examples  The initial training set has a mixture of instances from different classes and is thus relatively impure  E.g. if degree exactly predicts credit risk, partitioning on degree would result in each child having instances of only one class  I.e., the child nodes would be pure  The purity of a set S of training instances can be measured quantitatively in several ways.  Notation: number of classes = k, number of instances = |S|, fraction of instances in class i = pi.  The Gini measure of purity is defined as Gini(S) = 1 − Σ_{i=1..k} pi²  When all instances are in a single class, the Gini value is 0; it reaches its maximum of 1 − 1/k when each class has the same number of instances
• 1447. Best Splits (Cont.)  Another measure of purity is the entropy measure, which is defined as entropy(S) = − Σ_{i=1..k} pi log2 pi  When a set S is split into multiple sets Si, i = 1, 2, …, r, we can measure the purity of the resultant set of sets as purity(S1, S2, …, Sr) = Σ_{i=1..r} (|Si| / |S|) purity(Si)  The information gain due to a particular split of S into Si, i = 1, 2, …, r: Information-gain(S, {S1, S2, …, Sr}) = purity(S) − purity(S1, S2, …, Sr)
• 1448. Best Splits (Cont.)  Measure of "cost" of a split: Information-content(S, {S1, S2, …, Sr}) = − Σ_{i=1..r} (|Si| / |S|) log2 (|Si| / |S|)  Information-gain ratio = Information-gain(S, {S1, S2, …, Sr}) / Information-content(S, {S1, S2, …, Sr})  The best split for an attribute is the one that gives the maximum information gain ratio  Continuous valued attributes  Can be ordered in a fashion meaningful to classification  e.g. integer and real values  Categorical attributes  Cannot be meaningfully ordered (e.g. country, school/university, item-color, …)
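A quick numeric check of the two purity measures (numbers invented, not from the slides): for a node containing two equally frequent classes, p1 = p2 = 0.5, so

    \text{Gini}(S) = 1 - (0.5^2 + 0.5^2) = 0.5, \qquad \text{entropy}(S) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1

Both measures are at their worst (most impure) values for k = 2, as expected for a perfectly mixed node.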
• 1449. Finding Best Splits  Categorical attributes:  Multi-way split, one child for each value  may have too many children in some cases  Binary split: try all possible breakup of values into two sets, and pick the best  Continuous valued attributes  Binary split:  Sort values in the instances, try each as a split point  E.g. if values are 1, 10, 15, 25, split at 1, 10, 15
• 1450. Decision-Tree Construction Algorithm
procedure GrowTree(S)
    Partition(S);
procedure Partition(S)
    if (purity(S) > δp or |S| < δs) then return;
    for each attribute A
        evaluate splits on attribute A;
    use best split found (across all attributes) to partition S into S1, S2, …, Sr;
    for i = 1, 2, …, r
        Partition(Si);
• 1451. Decision Tree Construction Algorithms (Cont.)  A variety of algorithms have been developed to  Reduce CPU cost and/or  Reduce I/O cost when handling datasets larger than memory  Improve accuracy of classification  Decision tree may be overfitted, i.e., overly tuned to the given training set  Pruning of decision tree may be done on branches that have too few training instances  When a subtree is pruned, an internal node becomes a leaf, and its class is set to the majority class of the instances that map to the node
• 1452. Other Types of Classifiers  Further types of classifiers  Neural net classifiers  Bayesian classifiers  Neural net classifiers use the training data to train artificial neural nets  Widely studied in AI, won't cover here  Bayesian classifiers use Bayes theorem, which says p(cj | d) = p(d | cj) p(cj) / p(d), where p(cj | d) = probability of instance d being in class cj; p(d | cj) = probability of generating instance d given class cj; p(cj) = probability of occurrence of class cj; and p(d) = probability of instance d occurring
• 1453. Naïve Bayesian Classifiers  Bayesian classifiers require  computation of p(d | cj)  precomputation of p(cj)  p(d) can be ignored since it is the same for all classes  To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate p(d | cj) = p(d1 | cj) * p(d2 | cj) * … * p(dn | cj)
• 1454. Regression  Regression deals with the prediction of a value, rather than a class.  Given values for a set of variables, X1, X2, …, Xn, we wish to predict the value of a variable Y.  One way is to infer coefficients a0, a1, …, an such that Y = a0 + a1 * X1 + a2 * X2 + … + an * Xn  Finding such a linear polynomial is called linear regression.  In general, the process of finding a curve that fits the data is also called curve fitting.
• 1455. Association Rules  Retail shops are often interested in associations between different items that people buy.  Someone who buys bread is quite likely also to buy milk  A person who bought the book Database System Concepts is quite likely also to buy the book Operating System Concepts.  Association information can be used in several ways.  E.g. when a customer buys a particular book, an online shop may suggest associated books.  Association rules: e.g. bread ⇒ milk
• 1456. Association Rules (Cont.)  Rules have an associated support, as well as an associated confidence.  Support is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.  E.g. suppose only 0.001 percent of all purchases include milk and screwdrivers. The support for the rule milk ⇒ screwdrivers is low.  We usually want rules with a reasonably high support  Rules with low support are usually not very useful
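The slide defines support; its standard companion measure, confidence (the definition below is the usual one from the data-mining literature, not taken from this slide), is the fraction of transactions containing the antecedent that also contain the consequent:

    \text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}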
• 1457. Finding Association Rules  We are generally only interested in association rules with reasonably high support (e.g. support of 2% or greater)  Naïve algorithm 1. Consider all possible sets of relevant items. 2. For each set find its support (i.e. count how many transactions purchase all items in the set).  Large itemsets: sets with sufficiently high support 3. Use large itemsets to generate association rules
• 1458. Finding Support  Few itemsets: determine support of all itemsets via a single pass on set of transactions  A count is maintained for each itemset, initially set to 0.  When a transaction is fetched, the count is incremented for each set of items that is contained in the transaction.  Large itemsets: sets with a high count at the end of the pass  Many itemsets: If memory not enough to hold all counts for all itemsets use multiple passes, considering only some itemsets in each pass.  Optimization: Once an itemset is eliminated because its count (support) is too small, none of its supersets needs to be considered.  The a priori technique to find large itemsets: first count supports of 1-item sets and discard those with low support; in each later pass, consider as candidates only those itemsets all of whose proper subsets were found to be large in earlier passes
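A minimal SQL sketch of the first a priori pass, counting single-item supports — the purchases(transaction-id, item) relation and the threshold value are assumed for illustration; neither appears on the slide:

    select item, count(distinct transaction-id) as support
    from purchases
    group by item
    having count(distinct transaction-id) >= 1000  -- minimum support threshold (assumed)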
• 1459. Other Types of Associations  Basic association rules have several limitations  Deviations from the expected probability are more interesting  E.g. if many people purchase bread, and many people purchase cereal, quite a few would be expected to purchase both (prob1 * prob2)  We are interested in positive as well as negative correlations between sets of items  Positive correlation: co-occurrence is higher than predicted  Negative correlation: co-occurrence is lower than predicted
• 1460. Clustering  Clustering: Intuitively, finding clusters of points in the given data such that similar points lie in the same cluster  Can be formalized using distance metrics in several ways  E.g. Group points into k sets (for a given k) such that the average distance of points from the centroid of their assigned group is minimized  Centroid: point defined by taking average of coordinates in each dimension.  Another metric: minimize the average distance between every pair of points in a cluster
• 1461. Hierarchical Clustering  Example from biological classification  (the word classification here does not mean a prediction mechanism)  chordata: mammalia (leopards, humans) and reptilia (snakes, crocodiles)  Other examples: Internet directory systems (e.g. Yahoo, more on this later)  Agglomerative clustering algorithms build small clusters, then cluster small clusters into bigger clusters, and so on
• 1462. Clustering Algorithms  Clustering algorithms have been designed to handle very large datasets  E.g. the Birch algorithm  Main idea: use an in-memory R-tree to store points that are being clustered  Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some distance away  If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
• 1463. Collaborative Filtering  Goal: predict what movies/books/… a person may be interested in, on the basis of  Past preferences of the person  Other people with similar past preferences  The preferences of such people for a new movie/book/…  One approach based on repeated clustering  Cluster people on the basis of preferences for movies  Then cluster movies on the basis of being liked by the same clusters of people
• 1464. Other Types of Mining  Text mining: application of data mining to textual documents  E.g. cluster Web pages to find related pages  E.g. cluster pages a user has visited to organize their visit history  E.g. classify Web pages automatically into a Web directory  Data visualization systems help users examine large volumes of data and detect patterns visually
• 1466. Data Warehousing  Large organizations have complex internal organizations, and have data stored at different locations, on different operational (transaction processing) systems, under different schemas  Data sources often store only current data, not historical data  Corporate decision making requires a unified view of all organizational data
• 1467. Data Warehousing (figure)
• 1468. Components of Data Warehouse  When and how to gather data  Source driven architecture: data sources transmit new information to warehouse, either continuously or periodically (e.g. at night)  Destination driven architecture: warehouse periodically requests new information from data sources  Keeping warehouse exactly synchronized with data sources (e.g. using two-phase commit) is too expensive  Usually OK to have slightly out-of-date data at warehouse
• 1469. Components of Data Warehouse (Cont.)  Data cleansing  E.g. correct mistakes in addresses  E.g. misspellings, zip code errors  Merge address lists from different sources and purge duplicates  Keep only one address record per household ("householding")  How to propagate updates  Warehouse schema may be a (materialized) view of schema from data sources  Efficient techniques for update of materialized views are needed
• 1471. Warehouse Schemas  Typically warehouse data is multidimensional, with very large fact tables  Examples of dimensions: item-id, date/time of sale, store where sale was made, customer identifier  Examples of measures: number of items sold, price of items  Dimension values are usually encoded using small integers and mapped to full values via dimension tables
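A design like this — a central fact table joined to dimension tables — is commonly called a star schema. A sketch of a typical warehouse query against one (all table and column names below are assumed purely for illustration):

    select d.year, s.city, sum(f.number-sold)
    from sales-fact f, date-dimension d, store-dimension s
    where f.date-id = d.date-id
      and f.store-id = s.store-id
    group by d.year, s.city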
• 1473. Information Retrieval Systems  Information retrieval (IR) systems use a simpler data model than database systems  Information organized as a collection of documents  Documents are unstructured, no schema  Information retrieval locates relevant documents, on the basis of user input such as keywords or example documents  e.g., find documents containing the words "database systems"
• 1474. Information Retrieval Systems (Cont.)  Differences from database systems  IR systems don't deal with transactional updates (including concurrency control and recovery)  Database systems deal with structured data, with schemas that define the data organization  IR systems deal with some querying issues not generally addressed by database systems  Approximate searching by keywords
• 1475. Keyword Search  In full text retrieval, all the words in each document are considered to be keywords.  We use the word term to refer to the words in a document  Information-retrieval systems typically allow query expressions formed using keywords and the logical connectives and, or, and not  Ands are implicit, even if not explicitly specified  Ranking of documents on the basis of estimated relevance to a query is critical  Relevance ranking is based on factors such as  Term frequency  Frequency of occurrence of query keyword in document  Inverse document frequency
• 1476. Relevance Ranking Using Terms  TF-IDF (Term frequency/Inverse Document frequency) ranking:  Let n(d) = number of terms in the document d, and n(d, t) = number of occurrences of term t in the document d.  Then the relevance of a document d to a term t is defined as r(d, t) = log(1 + n(d, t) / n(d))  The relevance of d to a query Q is r(d, Q) = Σ_{t∈Q} r(d, t) / n(t), where n(t) is the number of documents that contain term t  The log factor is to avoid excessive weight being given to frequently occurring terms
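A quick numeric illustration of the term-frequency factor (numbers invented): a document with n(d) = 100 terms, 5 of which are occurrences of t, gives

    r(d, t) = \log(1 + 5/100) = \log(1.05) \approx 0.049

(using natural logarithms), so a term making up 5% of a document contributes only a modest score, as the damping is intended to achieve.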
• 1477. Relevance Ranking Using Terms (Cont.)  Most systems add to the above model  Words that occur in title, author list, section headings, etc. are given greater importance  Words whose first occurrence is late in the document are given lower importance  Very common words such as "a", "an", "the", "it" etc. are eliminated  Called stop words  Proximity: if keywords in query occur close together in the document, the document is ranked higher than if they occur far apart
• 1478. Relevance Using Hyperlinks  When using keyword queries on the Web, the number of documents is enormous (many billions)  Number of documents relevant to a query can be enormous if only term frequencies are taken into account  Using term frequencies makes "spamming" easy  E.g. a travel agent can add many occurrences of the words "travel agent" to his page to make its rank very high  Most of the time people are looking for pages from popular sites
• 1479. Relevance Using Hyperlinks (Cont.)  Solution: use number of hyperlinks to a site as a measure of the popularity or prestige of the site  Count only one hyperlink from each site (why?)  Popularity measure is for site, not for individual page  Most hyperlinks are to root of site  Site-popularity computation is cheaper than page popularity computation  Refinements  When computing prestige based on links to a site, give more weightage to links from sites that themselves have higher prestige
• 1480. Relevance Using Hyperlinks (Cont.)  Connections to social networking theories that ranked prestige of people  E.g. the president of the US has a high prestige since many people know him  Someone known by multiple prestigious people has high prestige  Hub and authority based ranking  A hub is a page that stores links to many pages (on a topic)  An authority is a page that contains actual information on a topic  Each page gets a hub prestige based on the prestige of the authorities it points to, and an authority prestige based on the prestige of the hubs that point to it
• 1481. Similarity Based Retrieval  Similarity based retrieval - retrieve documents similar to a given document  Similarity may be defined on the basis of common words  E.g. find k terms in A with highest r(d, t) and use these terms to find relevance of other documents; each of the terms carries a weight of r(d, t)  Similarity can be used to refine answer set to keyword query
• 1482. Synonyms and Homonyms  Synonyms  E.g. document: "motorcycle repair", query: "motorcycle maintenance"  need to realize that "maintenance" and "repair" are synonyms  System can extend query as "motorcycle and (repair or maintenance)"  Homonyms  E.g. "object" has different meanings as noun/verb  Can disambiguate meanings (to some extent) from the context
• 1483. Indexing of Documents  An inverted index maps each keyword Ki to a set of documents Si that contain the keyword  Documents identified by identifiers  Inverted index may record  Keyword locations within document to allow proximity based ranking  Counts of number of occurrences of keyword to compute TF  and operation: Finds documents that contain all of K1, K2, …, Kn.  Intersection S1 ∩ S2 ∩ … ∩ Sn  or operation: Finds documents that contain at least one of K1, K2, …, Kn.  Union S1 ∪ S2 ∪ … ∪ Sn
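If the inverted index were held in a relation posting(keyword, doc-id) — an assumed schema, purely for illustration — the and operation becomes an intersection expressed as a join:

    select p1.doc-id
    from posting p1, posting p2
    where p1.keyword = 'database'
      and p2.keyword = 'systems'
      and p1.doc-id = p2.doc-id  -- intersection S1 ∩ S2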
• 1484. Measuring Retrieval Effectiveness  IR systems save space by using index structures that support only approximate retrieval. May result in:  false negative (false drop) - some relevant documents may not be retrieved.  false positive - some irrelevant documents may be retrieved.  For many applications a good index should not permit any false drops, but may permit a few false positives.  Relevant performance metrics:  Precision - what percentage of the retrieved documents are actually relevant to the query  Recall - what percentage of the documents relevant to the query were retrieved
• 1485. Measuring Retrieval Effectiveness (Cont.)  Ranking order can also result in false positives/false negatives  Recall vs. precision tradeoff:  Can increase recall by retrieving many documents (down to a low level of relevance ranking), but many irrelevant documents would be fetched, reducing precision  Measures of retrieval effectiveness:  Recall as a function of number of documents fetched, or  Precision as a function of recall  Equivalently, as a function of number of documents fetched  E.g. "precision of 75% at recall of 50%, and 60% at a recall of 75%"
• 1486. Web Crawling  Web crawlers are programs that locate and gather information on the Web  Recursively follow hyperlinks present in known documents, to find other documents  Starting from a seed set of documents  Fetched documents  Handed over to an indexing system  Can be discarded after indexing, or stored as a cached copy  Crawling the entire Web would take a very large amount of time
• 1487. Web Crawling (Cont.)  Crawling is done by multiple processes on multiple machines, running in parallel  Set of links to be crawled stored in a database  New links found in crawled pages added to this set, to be crawled later  Indexing process also runs on multiple machines  Creates a new copy of index instead of modifying old index
• 1488. Browsing  Storing related documents together in a library facilitates browsing  users can see not only requested document but also related ones.  Browsing is facilitated by classification system that organizes logically related documents together.  Organization is hierarchical: classification hierarchy
• 1489. A Classification Hierarchy For A Library System (figure)
• 1490. Classification DAG  Documents can reside in multiple places in a hierarchy in an information retrieval system, since physical location is not important.  Classification hierarchy is thus a directed acyclic graph (DAG)
• 1491. A Classification DAG For A Library Information Retrieval System (figure)
• 1492. Web Directories  A Web directory is just a classification hierarchy on Web pages  E.g. Yahoo! Directory, Open Directory project  Issues:  What should the directory hierarchy be?  Given a document, which nodes of the directory are categories relevant to the document  Often done manually  Classification of documents into a hierarchy (manual or automatic)
• 1494. Overview  Temporal Data  Spatial and Geographic Databases  Multimedia Databases  Mobility and Personal Databases
• 1495. Time In Databases  While most databases tend to model reality at a point in time (at the "current" time), temporal databases model the states of the real world across time.  Facts in temporal relations have associated times when they are valid, which can be represented as a union of intervals.  The transaction time for a fact is the time interval during which the fact is current within the database system.
• 1496. Time In Databases (Cont.)  Example of a temporal relation:  Temporal query languages have been proposed to simplify modeling of time as well as time-related queries
• 1497. Time Specification in SQL-92  date: four digits for the year (1--9999), two digits for the month (1--12), and two digits for the date (1--31).  time: two digits for the hour, two digits for the minute, and two digits for the second, plus optional fractional digits.  timestamp: the fields of date and time, with six fractional digits for the seconds field.  Times are specified in the Universal Coordinated Time (UTC) standard, with support for time zones
• 1498. Temporal Query Languages  Predicates precedes, overlaps, and contains on time intervals.  Intersect can be applied on two intervals, to give a single (possibly empty) interval; the union of two intervals may or may not be a single interval.  A snapshot of a temporal relation at time t consists of the tuples that are valid at time t, with the time-interval attributes projected out.  Temporal selection: involves the time attributes
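Once the valid-time interval is stored as explicit columns, a snapshot can be taken with an ordinary selection — the schema and timestamp below are assumed for illustration:

    select account-number, balance
    from account-history
    where valid-from <= timestamp '2002-01-01 00:00:00'
      and valid-to   >  timestamp '2002-01-01 00:00:00'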
• 1499. Temporal Query Languages (Cont.)  Functional dependencies must be used with care: adding a time field may invalidate functional dependency  A temporal functional dependency X → Y holds on a relation schema R if, for all legal instances r of R, all snapshots of r satisfy the functional dependency X → Y.  SQL:1999 Part 7 (SQL/Temporal) is a proposed extension to SQL:1999 to improve support of temporal data.
• 1501. Spatial and Geographic Databases  Spatial databases store information related to spatial locations, and support efficient storage, indexing and querying of spatial data.  Special purpose index structures are important for accessing spatial data, and for processing spatial join queries.  Computer Aided Design (CAD) databases store design information about how objects are constructed  E.g.: designs of buildings, aircraft, layouts of integrated circuits
• 1502. Representation of Geometric Information  Various geometric constructs can be represented in a database in a normalized fashion.  Represent a line segment by the coordinates of its endpoints.  Approximate a curve by partitioning it into a sequence of segments  Create a list of vertices in order, or  Represent each segment as a separate tuple that also carries with it the identifier of the curve (2D features such as roads).  Closed polygons  List of vertices in order, starting vertex is the same as the ending vertex, or  Represent boundary edges as separate tuples, with each containing the identifier of the polygon
• 1504. Representation of Geometric Information (Cont.)  Representation of points and line segments in 3-D similar to 2-D, except that points have an extra z component  Represent arbitrary polyhedra by dividing them into tetrahedrons, like triangulating polygons.  Alternative: List their faces, each of which is a polygon, along with an indication of which side of the face is inside the polyhedron.
• 1505. Design Databases  Represent design components as objects (generally geometric objects); the connections between the objects indicate how the design is structured.  Simple two-dimensional objects: points, lines, triangles, rectangles, polygons.  Complex two-dimensional objects: formed from simple objects via union, intersection, and difference operations.
• 1506. Representation of Geometric Constructs  (a) Difference of cylinders (b) Union of cylinders  Design databases also store non-spatial information about objects (e.g., construction material, color, etc.)  Spatial integrity constraints are important.
• 1507. Geographic Data  Raster data consist of bit maps or pixel maps, in two or more dimensions.  Example 2-D raster image: satellite image of cloud cover, where each pixel stores the cloud visibility in a particular area.  Additional dimensions might include the temperature at different altitudes at different regions, or measurements taken at different points in time.  Design databases generally do not store raster data
• 1508. Geographic Data (Cont.)  Vector data are constructed from basic geometric objects: points, line segments, triangles, and other polygons in two dimensions, and cylinders, spheres, cuboids, and other polyhedrons in three dimensions.  Vector format often used to represent map data.  Roads can be considered as two-dimensional and represented by lines and curves.
• 1509. Applications of Geographic Data  Examples of geographic data  map data for vehicle navigation  distribution network information for power, telephones, water supply, and sewage  Vehicle navigation systems store information about roads and services for the use of drivers:  Spatial data: e.g., road/restaurant/gas-station coordinates  Non-spatial data: e.g., one-way streets, speed limits, traffic congestion  Global Positioning System (GPS) units use satellite information to find the vehicle's current location
• 1510. Spatial Queries  Nearness queries request objects that lie near a specified location.  Nearest neighbor queries, given a point or an object, find the nearest object that satisfies given conditions.  Region queries deal with spatial regions. e.g., ask for objects that lie partially or fully inside a specified region.  Queries that compute intersections or unions of regions.
• 1511. Spatial Queries (Cont.)  Spatial data is typically queried using a graphical query language; results are also displayed in a graphical manner.  Graphical interface constitutes the front-end  Extensions of SQL with abstract data types, such as lines, polygons and bit maps, have been proposed to interface with the back-end.  allows relational databases to store and retrieve spatial information
• 1512. Indexing of Spatial Data  k-d tree - early structure used for indexing in multiple dimensions.  Each level of a k-d tree partitions the space into two.  choose one dimension for partitioning at the root level of the tree.  choose another dimension for partitioning in nodes at the next level and so on, cycling through the dimensions.  In each node, approximately half of the points stored in the sub-tree fall on one side and half on the other.
• 1513. Division of Space by a k-d Tree  Each line in the figure (other than the outside box) corresponds to a node in the k-d tree  the maximum number of points in a leaf node has been set to 1.  The numbering of the lines in the figure indicates the level of the tree at which the corresponding node appears
• 1514. Division of Space by Quadtrees  Quadtrees  Each node of a quadtree is associated with a rectangular region of space; the top node is associated with the entire target space.  Each non-leaf node divides its region into four equal sized quadrants  correspondingly each such node has four child nodes corresponding to the four quadrants and so on  Leaf nodes have between zero and some fixed maximum number of points (set to 1 in example).
• 1515. Quadtrees (Cont.)  PR quadtree: stores points; space is divided based on regions, rather than on the actual set of points stored.  Region quadtrees store array (raster) information.  A node is a leaf node if all the array values in the region that it covers are the same. Otherwise, it is subdivided further into four children of equal area, and is therefore an internal node.  Each node corresponds to a sub-array of values.  The sub-arrays corresponding to leaves either contain just a single array element, or have multiple array elements, all of which have the same value
• 1516. R-Trees  R-trees are an N-dimensional extension of B+-trees, useful for indexing sets of rectangles and other polygons.  Supported in many modern database systems, along with variants like R+-trees and R*-trees.  Basic idea: generalize the notion of a one-dimensional interval associated with each B+-tree node to an N-dimensional interval, that is, an N-dimensional rectangle.
• 1517. R-Trees (Cont.)  A rectangular bounding box is associated with each tree node.  Bounding box of a leaf node is a minimum sized rectangle that contains all the rectangles/polygons associated with the leaf node.  The bounding box associated with a non-leaf node contains the bounding boxes of all its children.  Bounding box of a node serves as its key in its parent node (if any)  Bounding boxes of children of a node are allowed to overlap
• 1518. Example R-Tree  A set of rectangles (solid line) and the bounding boxes (dashed line) of the nodes of an R-tree for the rectangles. The R-tree is shown on the right.
• 1519. Search in R-Trees  To find data items (rectangles/polygons) intersecting (overlapping) a given query point/region, do the following, starting from the root node:  If the node is a leaf node, output the data items whose keys intersect the given query point/region.  Else, for each child of the current node whose bounding box overlaps the query point/region, recursively search the child  Can be very inefficient in worst case
• 1520. Insertion in R-Trees  To insert a data item:  Find a leaf to store it, and add it to the leaf  To find leaf, follow a child (if any) whose bounding box contains the bounding box of the data item, else the child whose overlap with the data item's bounding box is maximum  Handle overflows by splits (as in B+-trees)  Split procedure is different though (see below)  Adjust bounding boxes starting from the leaf upwards
• 1521. Splitting an R-Tree Node  Quadratic split divides the entries in a node into two new nodes as follows 1. Find pair of entries with "maximum separation"  that is, the pair such that the bounding box of the two would have the maximum wasted space (area of bounding box – sum of areas of two entries) 2. Place these entries in two new nodes 3. Repeatedly find the entry with "maximum preference" for one of the two new nodes, and assign the entry to that node  Preference of an entry to a node is the increase in area of the bounding box if the entry is added to the other node
• 1522. Deleting in R-Trees  Deletion of an entry in an R-tree done much like a B+-tree deletion.  In case of underfull node, borrow entries from a sibling if possible, else merge sibling nodes  Alternative approach removes all entries from the underfull node, deletes the node, then reinserts all entries
• 1524. Multimedia Databases  To provide such database functions as indexing and consistency, it is desirable to store multimedia data in a database  rather than storing them outside the database, in a file system  The database must handle large object representation.  Similarity-based retrieval must be provided by special index structures.  Must provide guaranteed steady retrieval rates for continuous-media data
• 1525. Multimedia Data Formats  Store and transmit multimedia data in compressed form  JPEG and GIF the most widely used formats for image data.  MPEG standard for video data uses commonalities among a sequence of frames to achieve a greater degree of compression.  MPEG-1 quality comparable to VHS video tape.  stores a minute of 30-frame-per-second video and audio in approximately 12.5 MB
• 1526. Continuous-Media Data  Most important types are video and audio data.  Characterized by high data volumes and real-time information-delivery requirements.  Data must be delivered sufficiently fast that there are no gaps in the audio or video.  Data must be delivered at a rate that does not cause overflow of system buffers.  Synchronization among distinct data streams must be maintained
• 1527. Video Servers  Video-on-demand systems deliver video from central video servers, across a network, to terminals  Must guarantee end-to-end delivery rates  Current video-on-demand servers are based on file systems; existing database systems do not meet real-time response requirements.  Multimedia data are stored on several disks (RAID configuration), or on tertiary storage for less frequently accessed data.
• 1528. Similarity-Based Retrieval  Examples of similarity based retrieval  Pictorial data: Two pictures or images that are slightly different as represented in the database may be considered the same by a user.  E.g., identify similar designs for registering a new trademark.  Audio data: Speech-based user interfaces allow the user to give a command or identify a data item by speaking.  E.g., test user input against stored commands
• 1530. Mobile Computing Environments  A mobile computing environment consists of mobile computers, referred to as mobile hosts, and a wired network of computers.  Mobile host may be able to communicate with wired network through a wireless digital communication network  Wireless local-area networks (within a building)  E.g. Avaya's Orinico Wireless LAN  Wide area networks
• 1531. Mobile Computing Environments (Cont.)  A model for mobile communication  Mobile hosts communicate to the wired network via computers referred to as mobile support (or base) stations.  Each mobile support station manages those mobile hosts within its cell.  When mobile hosts move between cells, there is a handoff of control from one mobile support station to another.  Direct communication, without going through a mobile support station, is also possible between nearby mobile hosts
• 1532. Database Issues in Mobile Computing  New issues for query optimization.  Connection time charges and number of bytes transmitted  Energy (battery power) is a scarce resource and its usage must be minimized  Mobile user's location may be a parameter of the query  GIS queries  Techniques to track locations of large numbers of mobile hosts  Broadcast data can enable any number of clients to receive the same data at no extra cost  leads to interesting querying and data caching issues.  Users may need to be able to perform database updates even while the mobile computer is disconnected.
• 1533. Routing and Query Processing  Must consider these competing costs:  User time.  Communication cost  Connection time - used to assign monetary charges in some cellular systems.  Number of bytes, or packets, transferred - used to compute charges in digital cellular systems  Time-of-day based charges - vary based on peak or off-peak periods  Energy - optimize use of battery power by minimizing reception and transmission costs
• 1534. Broadcast Data  Mobile support stations can broadcast frequently-requested data  Allows mobile hosts to wait for needed data, rather than having to consume energy transmitting a request  Supports mobile hosts without transmission capability  A mobile host may optimize energy costs by determining if a query can be answered using only cached data  If not, then it must either:  Wait for the data to be broadcast  Transmit a request for data, and must know when the relevant data will be broadcast  Broadcast data may be transmitted according to a fixed schedule or a changeable schedule.
• 1535. Disconnectivity and Consistency  A mobile host may remain in operation during periods of disconnection.  Problems created if the user of the mobile host issues queries and updates on data that resides or is cached locally:  Recoverability: Updates entered on a disconnected machine may be lost if the mobile host fails. Since the mobile host represents a single point of failure, stable storage cannot be simulated well.
• 1536. Mobile Updates  Partitioning via disconnection is the normal mode of operation in mobile computing.  For data updated by only one mobile host, simple to propagate update when mobile host reconnects  in other cases data may become invalid and updates may conflict.  When data are updated by other computers, invalidation reports inform a reconnected mobile host of out-of-date cache entries  however, mobile host may miss a report.  Version-numbering-based schemes guarantee only that if two hosts independently update the same version of a document, the clash will be detected eventually, when the hosts exchange information either directly or through a common host.  More on this shortly
• 1537. Detecting Inconsistent Updates  Version vector scheme used to detect inconsistent updates to documents at different hosts (sites).  Copies of document d at hosts i and j are inconsistent if 1. the copy of document d at i contains updates performed by host k that have not been propagated to host j (k may be the same as i), and 2. the copy of d at j contains updates performed by host l that have not been propagated to host i (l may be the same as j)
• 1538. Detecting Inconsistent Updates (Cont.)  When two hosts i and j connect to each other they check if the copies of all documents d that they share are consistent: 1. If the version vectors are the same on both hosts (that is, for each k, Vd,i[k] = Vd,j[k]) then the copies of d are identical. 2. If, for each k, Vd,i[k] ≤ Vd,j[k], and the version vectors are not identical, then the copy of document d at host i is older than the one at host j; host i can then replace its copy of d (and its version vector) with the copies from host j
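A small worked example (vectors invented): with three hosts, suppose Vd,i = [2, 1, 0] and Vd,j = [2, 3, 1]. Since Vd,i[k] ≤ Vd,j[k] for every k and the vectors differ, case 2 applies: the copy at i is older, and i can adopt j's copy. If instead Vd,i = [3, 1, 0] and Vd,j = [2, 3, 1], neither vector dominates the other, so the copies are inconsistent and must be merged (see the next slide).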
• 1539. Handling Inconsistent Updates  Dealing with inconsistent updates is hard in general. Manual intervention often required to merge the updates.  Version vector schemes  were developed to deal with failures in a distributed file system, where inconsistencies are rare.  are used to maintain a unified file system between a fixed host and a mobile computer, where updates at the two hosts have to be merged periodically.  Also used for similar purposes in groupware systems.
• 1540. END OF CHAPTER  Copyright: Silberschatz, Korth and Sudarshan
• 1541. Chapter 24: Advanced Transaction Processing  Transaction-Processing Monitors  Transactional Workflows  High-Performance Transaction Systems  Main memory databases  Real-Time Transaction Systems  Long-Duration Transactions  Transaction management in multidatabase systems
• 1542. Transaction Processing Monitors  TP monitors initially developed as multithreaded servers to support large numbers of terminals from a single process.  Provide infrastructure for building and administering complex transaction processing systems with a large number of clients and multiple servers.  Provide services such as:  Presentation facilities to simplify creating user interfaces
• 1544. TP Monitor Architectures (Cont.)  Process-per-client model - instead of an individual login session per terminal, a server process communicates with the terminal, handles authentication, and executes actions.  Memory requirements are high  Multitasking - high CPU overhead for context switching between processes  Single-process model - all remote terminals connect to a single server process.  Used in client-server environments
• 1545. TP Monitor Architectures (Cont.)  Many-server single-router model - multiple application server processes access a common database; clients communicate with the applications through a single communication process that routes requests.  Independent server processes for multiple applications  Multithreaded server process  Can run on a parallel or distributed database  Many-server many-router model - multiple communication processes interact with clients, each routing requests to the application servers.
• 1546. Detailed Structure of a TP Monitor (figure)
• 1547. Detailed Structure of a TP Monitor  Queue manager handles incoming messages  Some queue managers provide persistent or durable message queueing: contents of the queue are safe even if the system fails.  Durable queueing of outgoing messages is important  the application server writes messages to the durable queue as part of a transaction  once the transaction commits, the TP monitor guarantees the message is eventually delivered, regardless of system failures.
• 1548. Application Coordination Using TP Monitors  A TP monitor treats each subsystem as a resource manager that provides transactional access to some set of resources.  The interface between the TP monitor and the resource manager is defined by a set of transaction primitives  The resource manager interface is defined by the X/Open Distributed Transaction Processing standard.  TP monitor systems provide a transactional remote procedure call interface to their services.
• 1550. Transactional Workflows  Workflows are activities that involve the coordinated execution of multiple tasks performed by different processing entities.  With the growth of networks, and the existence of multiple autonomous database systems, workflows provide a convenient way of carrying out tasks that involve multiple systems.  Example of a workflow: delivery of an email message, which goes through several mail systems to reach its destination.
• 1552. Loan Processing Workflow  In the past, workflows were handled by creating and forwarding paper forms  Computerized workflows aim to automate many of the tasks, but humans still play a role, e.g. in approving loans.
• 1553. Transactional Workflows  Must address the following issues to computerize a workflow.  Specification of workflows - detailing the tasks that must be carried out and defining the execution requirements.  Execution of workflows - execute transactions specified in the workflow while also providing traditional database safeguards related to the correctness of computations, data integrity, and durability.  E.g.: a loan application should not get lost even if the system fails.  Extend transaction concepts to the context of workflows.
• 1554. Workflow Specification  Static specification of task coordination:  Tasks and dependencies among them are defined before the execution of the workflow starts.  Can establish preconditions for execution of each task: tasks are executed only when their preconditions are satisfied.  Preconditions are defined through dependencies:  Execution states of other tasks, e.g. "task ti cannot start until task tj has ended"  Output values of other tasks, e.g. "task ti can start if task tj returns a specified value"
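The following minimal Python sketch shows one way such a static specification might look, with both execution-state and output-value dependencies; the task names and the dictionary-based format are illustrative assumptions, not a standard workflow language.

    # Hypothetical static workflow specification: each task lists the
    # dependencies that must hold before it may start.

    workflow = {
        "verify_credit":  {"requires": []},
        "appraise_asset": {"requires": []},
        # execution-state dependency: both checks must have ended
        "approve_loan":   {"requires": [("verify_credit", "ended"),
                                        ("appraise_asset", "ended")]},
        # output-value dependency: runs only if approval returned true
        "disburse_funds": {"requires": [("approve_loan", "returned_true")]},
    }

    def ready_tasks(state: dict) -> list:
        """Return tasks not yet run whose preconditions hold in `state`."""
        return [t for t, spec in workflow.items()
                if t not in state and
                all(cond in state.get(dep, set()) for dep, cond in spec["requires"])]

    # Example: after both checks end, only approve_loan becomes ready.
    state = {"verify_credit": {"ended"}, "appraise_asset": {"ended"}}
    print(ready_tasks(state))  # -> ['approve_loan']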
• 1555. Workflow Specification (Cont.)  Dynamic task coordination: e.g. an electronic mail routing system in which the tasks to be scheduled for a given mail message depend on the destination address and on which intermediate routers are functioning.
• 1556. Failure-Atomicity Requirements  The usual ACID transactional requirements are too strong or unimplementable for workflow applications.  However, workflows must satisfy some limited transactional properties that guarantee a process is not left in an inconsistent state.  Acceptable termination states - every execution of a workflow should terminate in a state that satisfies the failure-atomicity requirements.
• 1557. Execution of Workflows  Workflow management systems include:  Scheduler - program that processes workflows by submitting various tasks for execution, monitoring various events, and evaluating conditions related to intertask dependencies  Task agents - control the execution of a task by a processing entity.  Mechanism to query the state of the workflow system.
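As a rough illustration of the scheduler role just described, here is a hypothetical Python loop that repeatedly submits tasks whose intertask dependencies are satisfied and records their completion events; the data structures and the `execute` callback are assumptions for illustration, not a real workflow engine.

    # Hypothetical scheduler sketch: submit tasks whose dependencies are
    # done, then record completion events, until no tasks remain.

    def run_workflow(tasks: dict, execute) -> list:
        """tasks maps name -> set of prerequisite task names.
        `execute` plays the task agent: it runs one task by name."""
        done, order = set(), []
        while len(done) < len(tasks):
            ready = [t for t, deps in tasks.items()
                     if t not in done and deps <= done]
            if not ready:
                raise RuntimeError("cyclic or unsatisfiable dependencies")
            for t in ready:          # submit each ready task to its agent
                execute(t)
                done.add(t)          # monitor the completion event
                order.append(t)
        return order

    tasks = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(run_workflow(tasks, execute=lambda t: None))  # -> ['a', 'b', 'c', 'd']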
• 1558. Workflow Management System Architectures  Centralized - a single scheduler schedules the tasks for all concurrently executing workflows.  used in workflow systems where the data is stored in a central database.  easier to keep track of the state of a workflow.  Partially distributed - has one (instance of a) scheduler for each workflow.  Fully distributed - has no scheduler; the task agents coordinate their execution by communicating with each other.
• 1559. Workflow Scheduler  Ideally, the scheduler should execute a workflow only after ensuring that it will terminate in an acceptable state.  Consider a workflow consisting of two tasks S1 and S2, with the failure-atomicity requirement that either both or neither of the subtransactions should be committed.  Suppose the systems executing S1 and S2 do not provide prepared-to-commit states, and S1 and S2 do not have compensating transactions; then it is impossible to guarantee this failure-atomicity requirement.
• 1560. Recovery of a Workflow  Ensure that if a failure occurs in any of the workflow-processing components, the workflow eventually reaches an acceptable termination state.  Failure-recovery routines need to restore the state information of the scheduler at the time of failure, including the information about the execution states of each task.  Log status information on stable storage.
• 1561. Recovery of a Workflow (Cont.)  Persistent messages: messages are stored in a permanent message queue and therefore not lost in case of failure.  Described in detail in Chapter 19 (Distributed Databases)  Before an agent commits, it writes to the persistent message queue whatever messages need to be sent out.  The persistent message system must make sure the messages get delivered eventually if and only if the transaction commits.  The message system needs to resend a message until it is acknowledged by the destination.
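To illustrate the commit-then-deliver guarantee, here is a minimal, hypothetical Python sketch of messages staged inside a transaction and moved to a persistent queue only on commit; the in-memory `durable_queue` list stands in for stable storage, which is an assumption made purely for illustration.

    # Hypothetical sketch: messages written to a durable queue inside a
    # transaction are delivered only if (and after) the transaction commits.

    durable_queue = []          # stands in for a persistent message queue

    class Transaction:
        def __init__(self):
            self.pending = []    # messages staged by this transaction

        def send(self, msg):
            self.pending.append(msg)   # staged, not yet visible

        def commit(self):
            # atomically move staged messages to the durable queue;
            # after this point the system must deliver them eventually
            durable_queue.extend(self.pending)
            self.pending.clear()

        def abort(self):
            self.pending.clear()       # staged messages are discarded

    t1 = Transaction()
    t1.send("debit account A")
    t1.abort()                   # nothing reaches the queue
    t2 = Transaction()
    t2.send("credit account B")
    t2.commit()
    print(durable_queue)         # -> ['credit account B']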
• 1562. HIGH-PERFORMANCE TRANSACTION SYSTEMS  Copyright: Silberschatz, Korth and Sudarshan
• 1563. High-Performance Transaction Systems  High-performance hardware and parallelism help improve the rate of transaction processing, but are insufficient by themselves to obtain high performance:  Disk I/O is a bottleneck - I/O time (about 10 milliseconds) has not decreased at a rate comparable to the increase in processor speeds.  Parallel transactions may attempt to read or write the same data item, resulting in data conflicts that reduce effective parallelism.
• 1564. Main-Memory Database  Commercial 64-bit systems can support main memories of tens of gigabytes.  Memory-resident data allows faster processing of transactions.  Disk-related limitations remain:  Logging is a bottleneck when the transaction rate is high.  Use group commit to reduce the number of output operations (studied two slides ahead).  If the update rate for modified buffer blocks is high, the disk data-transfer rate could become a bottleneck.
• 1565. Main-Memory Database Optimizations  To reduce space overheads, main-memory databases can use structures with pointers crossing multiple pages; in disk databases, the I/O cost of traversing multiple pages would be excessively high.  No need to pin buffer pages in memory before data are accessed, since buffer pages will never be replaced.  Design query-processing techniques to minimize space overhead, so that main-memory limits are not exceeded during query evaluation.
• 1566. Group Commit  Idea: instead of performing output of log records to stable storage as soon as a transaction is ready to commit, wait until  the log buffer block is full, or  a transaction has been waiting sufficiently long after being ready to commit.  Results in fewer output operations per committed transaction, and correspondingly higher throughput.  However, commits are delayed until a sufficiently large group of transactions is ready to commit, or a transaction has been waiting long enough, leading to slightly increased response time.
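A minimal, hypothetical Python sketch of this policy: commit records accumulate in a log buffer, which is flushed only when the buffer fills or the oldest waiting commit exceeds a delay bound. The capacity and timeout constants are illustrative assumptions.

    # Hypothetical group-commit sketch: log records are flushed to stable
    # storage only when the buffer fills or a waiting commit times out.

    import time

    BUFFER_CAPACITY = 4          # illustrative log-buffer size (records)
    MAX_WAIT_SECONDS = 0.05      # illustrative commit-delay bound

    log_buffer, flushes = [], 0
    oldest_wait = None

    def flush():
        global flushes, oldest_wait
        flushes += 1             # one output operation covers many commits
        log_buffer.clear()
        oldest_wait = None

    def commit(txn_id):
        """Append a commit record; flush on full buffer or timeout."""
        global oldest_wait
        log_buffer.append(("commit", txn_id))
        if oldest_wait is None:
            oldest_wait = time.monotonic()
        if (len(log_buffer) >= BUFFER_CAPACITY or
                time.monotonic() - oldest_wait >= MAX_WAIT_SECONDS):
            flush()

    for txn in range(8):
        commit(txn)
    print(flushes)               # -> 2: eight commits, only two disk writes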
• 1567. Real-Time Transaction Systems  In systems with real-time constraints, correctness of execution involves both database consistency and the satisfaction of deadlines.  Hard deadline - serious problems may occur if the task is not completed within the deadline.  Firm deadline - the task has zero value if it is completed after the deadline.  Soft deadline - the task has diminishing value if it is completed after the deadline.  The wide variance of execution times for read and write operations on disks complicates the transaction-management problem for time-constrained systems.
• 1568. Long-Duration Transactions  Traditional concurrency-control techniques do not work well when user interaction is required:  Long duration: design edit sessions are very long  Exposure of uncommitted data: e.g., a partial update to a design  Subtasks: support partial rollback  Recoverability: on a crash, the state should be restored even for yet-to-be-committed data, so user work is not lost.
• 1569. Long-Duration Transactions  Represent as a nested transaction, with atomic database operations (read/write) at the lowest level.  If a transaction fails, only active short-duration transactions abort.  Active long-duration transactions resume once any short-duration transactions have recovered.  Key issues are the efficient management of long-duration waits and the possibility of aborts.  Need alternatives to waits and aborts.
• 1570. Concurrency Control  Correctness without serializability:  Correctness depends on the specific consistency constraints for the databases.  Correctness depends on the properties of operations performed by each transaction.  Use database consistency constraints to split the database into subdatabases on which concurrency can be managed separately.  Treat some operations besides read and write as fundamental low-level operations, and extend concurrency control to deal with them.
• 1571. Concurrency Control (Cont.)  A non-conflict-serializable schedule that preserves the sum A + B (figure)
• 1572. Nested and Multilevel Transactions  A nested or multilevel transaction T is represented by a set T = {t1, t2, ..., tn} of subtransactions and a partial order P on T.  A subtransaction ti in T may abort without forcing T to abort.  Instead, T may either restart ti, or simply choose not to run ti.  If ti commits, this action does not make ti permanent (unlike the situation in Chapter 15); instead, ti commits to T, and may still abort (or require compensation) if T aborts.
• 1573. Nested and Multilevel Transactions (Cont.)  Subtransactions can themselves be nested/multilevel transactions.  Lowest level of nesting: standard read and write operations.  Nesting can create higher-level operations that may enhance concurrency.  Types of nested/multilevel transactions:  Multilevel transaction: a subtransaction of T is permitted to release locks on completion.  Saga: a multilevel long-duration transaction.  Nested transaction: locks held by a subtransaction ti of T are automatically assigned to T on completion of ti.
• 1574. Example of Nesting  Rewrite transaction T1 using subtransactions Ta and Tb that perform increment or decrement operations:  T1 consists of  T1,1, which subtracts 50 from A  T1,2, which adds 50 to B  Rewrite transaction T2 using subtransactions Tc and Td that perform increment or decrement operations:  T2 consists of  T2,1, which subtracts 10 from B  T2,2, which adds 10 to A
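As a rough illustration, the following Python sketch expresses T1 and T2 as sequences of increment/decrement subtransactions; because increments and decrements on the same account commute with each other, the subtransactions can be interleaved more freely than plain reads and writes. The starting balances and helper function are illustrative assumptions (the completion of T2,2 above follows the standard textbook example).

    # Hypothetical sketch of the nesting example: transactions expressed as
    # commuting increment/decrement subtransactions on accounts A and B.

    balances = {"A": 100, "B": 200}

    def incr(account, amount):
        """A low-level subtransaction: increments commute with each other."""
        balances[account] += amount

    # T1 = {T1,1: subtract 50 from A, T1,2: add 50 to B}
    T1 = [("A", -50), ("B", +50)]
    # T2 = {T2,1: subtract 10 from B, T2,2: add 10 to A}
    T2 = [("B", -10), ("A", +10)]

    # Interleave the subtransactions: T1,1  T2,1  T1,2  T2,2.
    for account, amount in [T1[0], T2[0], T1[1], T2[1]]:
        incr(account, amount)

    # The sum A + B is preserved regardless of the interleaving order.
    print(balances, sum(balances.values()))  # -> {'A': 60, 'B': 240} 300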
• 1575. Compensating Transactions  Alternative to the undo operation; compensating transactions deal with the problem of cascading rollbacks.  Instead of undoing all changes made by the failed transaction, action is taken to "compensate" for the failure.  Consider a long-duration transaction Ti representing a travel reservation, with subtransactions Ti,1, which makes airline reservations, Ti,2, which reserves rental cars, and Ti,3, which reserves a hotel room.  Suppose the hotel cancels the reservation: instead of undoing the entire transaction, only the effects of Ti,3 need to be compensated for.
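A minimal, hypothetical Python sketch of this compensation pattern: each step pairs an action with a compensating action, and on failure the completed steps are compensated in reverse order (a saga-style execution). The step names are illustrative assumptions.

    # Hypothetical compensation sketch: run steps in order; if one fails,
    # run the compensating actions of completed steps in reverse order.

    def run_with_compensation(steps):
        """steps: list of (name, action, compensate) triples."""
        completed = []
        try:
            for name, action, compensate in steps:
                action()
                completed.append((name, compensate))
        except Exception as failure:
            for name, compensate in reversed(completed):
                compensate()                 # compensate, don't undo via log
            return f"compensated after failure: {failure}"
        return "committed"

    def hotel_fails():
        raise RuntimeError("hotel is fully booked")

    trip = [
        ("airline", lambda: print("reserve flight"), lambda: print("cancel flight")),
        ("car",     lambda: print("reserve car"),    lambda: print("cancel car")),
        ("hotel",   hotel_fails,                     lambda: None),
    ]
    print(run_with_compensation(trip))
    # reserves flight and car, then cancels car and flight in reverse order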
• 1576. Implementation Issues  For long-duration transactions to survive system crashes, we must log not only changes to the database, but also changes to internal system data pertaining to these transactions.  Logging of updates is made more complex by physically large data items (CAD designs, document text); it is undesirable to store both old and new values.  Two approaches exist to reducing the overhead of ensuring the recoverability of large data items.
• 1577. Transaction Management in Multidatabase Systems  Transaction management is complicated in multidatabase systems because of the assumption of autonomy.  Global 2PL - each local site uses strict 2PL (locks are released at the end); locks set as a result of a global transaction are released only when that transaction reaches its end.  Guarantees global serializability.  Due to autonomy requirements, however, sites cannot cooperate to execute a common concurrency-control scheme.
• 1578. Transaction Management  Local transactions are executed by each local DBMS, outside of the MDBS system's control.  Global transactions are executed under multidatabase control.  Local autonomy - local DBMSs cannot communicate directly to synchronize global transaction execution, and the multidatabase has no control over local transaction execution.  A local concurrency-control scheme is needed to ensure that each DBMS's schedule is serializable.
• 1579. Two-Level Serializability  Each DBMS ensures local serializability among its local transactions, including those that are part of a global transaction.  The multidatabase ensures serializability among global transactions alone, ignoring the orderings induced by local transactions.  2LSR does not ensure global serializability; however, it can fulfill requirements for strong correctness.
• 1580. Two-Level Serializability (Cont.)  Local-read protocol: local transactions have read access to global data; all access to local data by global transactions is disallowed.  A transaction has a value dependency if the value that it writes to a data item at one site depends on a value that it read for a data item on another site.  For strong correctness: no transaction may have a value dependency.  Global-read-write/local-read protocol: global transactions may read and write global data, while local transactions retain read access to global data.
• 1581. Global Serializability  Even if no information is available concerning the structure of the various concurrency-control schemes, a very restrictive protocol that ensures serializability is available.  Transaction graph: a graph whose vertices are global transaction names and site names.  An undirected edge (Ti, Sk) exists if Ti is active at site Sk.  Global serializability is assured if the transaction graph contains no cycle.
• 1582. Ensuring Global Serializability  Each site Si has a special data item, called the ticket.  Every global transaction Tj that runs at site Si writes to the ticket at site Si.  This ensures global transactions are serialized at each site, regardless of the local concurrency-control method, so long as the method guarantees local serializability.  The global transaction manager decides the serial ordering of global transactions by controlling the order in which tickets are accessed.
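Here is a minimal, hypothetical Python sketch of the ticket idea: every global transaction reads and writes the ticket at each site it touches, which forces a direct conflict between global transactions at that site, so the local scheduler must order them. The data layout and logging are illustrative assumptions.

    # Hypothetical ticket sketch: each site holds one ticket data item;
    # every global transaction at a site reads and writes that ticket,
    # forcing local conflicts that serialize the global transactions.

    sites = {"S1": {"ticket": 0}, "S2": {"ticket": 0}}
    access_log = []   # (site, transaction) in ticket-access order

    def take_ticket(site, txn):
        """Executed inside txn's local subtransaction at `site`."""
        sites[site]["ticket"] += 1          # read + write: a guaranteed conflict
        access_log.append((site, txn))

    # The global transaction manager controls ticket order: T1 before T2.
    for txn in ("T1", "T2"):
        for site in ("S1", "S2"):
            take_ticket(site, txn)

    # Tickets impose the same relative order of T1 and T2 at every site.
    print(access_log)
    # -> [('S1', 'T1'), ('S2', 'T1'), ('S1', 'T2'), ('S2', 'T2')]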
• 1584. Weak Levels of Consistency  Use alternative notions of consistency that do not ensure serializability, to improve performance.  Degree-two consistency avoids cascading aborts without necessarily ensuring serializability.  Unlike two-phase locking, S-locks may be released at any time, and locks may be acquired at any time.  X-locks, however, cannot be released until the transaction either commits or aborts.
• 1585. Example Schedule with Degree-Two Consistency  Nonserializable schedule with degree-two consistency (Figure 20.5), in which T3 reads the value of Q before and after that value is written by T4:

        T3                T4
        lock-S(Q)
        read(Q)
        unlock(Q)
                          lock-X(Q)
                          read(Q)
                          write(Q)
        lock-S(Q)
                          unlock(Q)
        read(Q)
        unlock(Q)
• 1586. Cursor Stability  Form of degree-two consistency designed for programs written in general-purpose, record-oriented languages (e.g., Pascal, C, Cobol, PL/I, Fortran).  Rather than locking the entire relation, cursor stability ensures that:  the tuple that is currently being processed by the iteration is locked in shared mode, and  any modified tuples are locked in exclusive mode until the transaction commits or aborts.
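To make the locking discipline concrete, here is a minimal, hypothetical Python sketch of a cursor-stability scan: only the tuple under the cursor is S-locked while it is examined, and only modified tuples keep X-locks to the end of the transaction. The relation contents and lock bookkeeping are illustrative assumptions.

    # Hypothetical cursor-stability sketch: S-lock only the tuple under the
    # cursor; upgrade to an X-lock (held to end of transaction) on update.

    relation = [{"id": 1, "qty": 5}, {"id": 2, "qty": 0}, {"id": 3, "qty": 7}]
    held_x_locks = []            # X-locks persist until commit/abort

    def scan_and_update(predicate, update):
        for tuple_ in relation:
            s_lock = tuple_["id"]             # S-lock the current tuple only
            if predicate(tuple_):
                held_x_locks.append(s_lock)   # upgrade: X-lock kept to txn end
                update(tuple_)
            # otherwise the S-lock is released as the cursor moves on

    scan_and_update(lambda t: t["qty"] == 0,
                    lambda t: t.update(qty=10))   # restock empty items
    print(relation)       # -> id 2 now has qty 10
    print(held_x_locks)   # -> [2]: only the modified tuple stays locked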