Phases of Database Design
Data Requirements → Specification of requirements and results → Conceptual Design → Conceptual Schema → Logical Design → Logical Schema → Physical Design → Physical Schema
• Conceptual design begins with the collection of requirements and results needed from the database (ER Diag.)
• Logical schema is a description of the structure of the database (Relational, Network, etc.)
• Physical schema is a description of the implementation (programs, tables, dictionaries, catalogs)
Overview of Database Design
• Requirements Analysis: Understand what data will be stored in
the database, and the operations it will be subject to.
• Conceptual Design: (ER Model is used at this stage.)
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should
we store in the database?
• What are the integrity constraints or business rules that hold?
• A database ‘schema’ in the ER Model can be represented
pictorially (ER diagrams).
• Can map an ER diagram into a relational schema.
• Logical Design: Convert the conceptual database design into the
data model underlying the DBMS chosen for the application.
Overview of Database Design (cont.)
[Figure: ER diagram. Employees (ssn, name, lot) and Departments (did, dname, budget) are linked by Works_In (attribute: since); Reports_To relates Employees to itself with the roles supervisor and subordinate.]
• Relationship: Association among two or more entities. E.g., Peter works in
Pharmacy department.
• Relationship Set: Collection of similar relationships.
• An n-ary relationship set R relates n entity sets E1 ... En; each relationship in
R involves entities e1 ∈ E1, ..., en ∈ En
• Same entity set could participate in different relationship sets, or in different
“roles” in same set.
• Relationship sets can also have descriptive attributes (e.g., the since attribute of
Works_In). A relationship is uniquely identified by participating entities
without reference to descriptive attributes.
Self Relationship
Sometimes entities in an entity set relate to other
entities in the same set; this is a self relationship.
Here employees manage some other employees.
The labels “manager” and “worker” are called the roles of
the self relationship.
Key Constraints (a.k.a. Cardinality)
• Consider Works_In (in previous slide): An employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the key constraint on Manages.
[Figure: ER diagram. Employees (ssn, name, lot) and Departments (did, dname, budget) linked by Manages (attribute: since).]
[Figure: the four cardinality types: 1-to-1, 1-to-Many, Many-to-1, Many-to-Many.]
Constraints are IMPORTANT because they must be ENFORCED
when IMPLEMENTING the database
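One way such a constraint might be enforced at implementation time is sketched below with Python's built-in sqlite3 module. The table layout (a Manages table with columns did, ssn, since) and all sample values are assumptions made for illustration; the point is only that making did the key of Manages encodes "each dept has at most one manager".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relational version of Manages: did is the primary key, so a
# department can appear in at most one row, i.e. it has at most one manager.
conn.execute(
    "CREATE TABLE Manages (did INTEGER PRIMARY KEY, ssn TEXT NOT NULL, since TEXT)"
)
conn.execute("INSERT INTO Manages VALUES (51, '123-22-3666', '2020-01-01')")

try:
    # A second manager for department 51 violates the key constraint.
    conn.execute("INSERT INTO Manages VALUES (51, '231-31-5368', '2021-06-01')")
    second_manager_accepted = True
except sqlite3.IntegrityError:
    second_manager_accepted = False

# The same employee may still manage a different department.
conn.execute("INSERT INTO Manages VALUES (56, '123-22-3666', '2022-03-01')")
manages_rows = conn.execute("SELECT COUNT(*) FROM Manages").fetchone()[0]
```

Here the DBMS, not the application, rejects the violating row, which is the sense in which the constraint is enforced.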
Key Constraints (ternary relationships)
Each employee can work in at most one department, at a single location.
[Figure: ER diagram. Works_In relates Employees (ssn, name, lot), Departments (did, dname, budget) and Location (name). Example instances connect employees 12-233, 12-354, 12-243, 12-299 with departments D10, D12, D13 and locations Rome, London, Paris.]
Participation Constraints
• Does every department have a manager?
• If so, this is a participation constraint: the participation of Departments
in Manages is said to be total (vs. partial).
• Every Department MUST have at least one employee
• Every employee MUST work in at least one department
• There may exist employees managing no department
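[Figure: ER diagram. Employees (ssn, name, lot) and Departments (did, dname, budget), linked by Works_In (since) and by a second relationship set that also has a since attribute.]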
Weak Entities
• A weak entity can be identified uniquely only by considering the
primary key of another (owner) entity.
• Owner entity set and weak entity set must participate in a one-to-many
relationship set (one owner, many weak entities).
• Weak entity sets must have total participation in this identifying relationship
set.
• transac# is a discriminator within a group of transactions in an ATM.
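[Figure: ER diagram. ATM (atmID, address) is the owner entity set; Transactions (transac#, amount, type) is the weak entity set; the identifying relationship set has the attribute since.]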
ISA (‘is a’) Hierarchies
As in C++, or other PLs, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity.
[Figure: ER diagram. Employees (ssn, name, lot) with ISA subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid).]
Aggregation Employees
Customer (M)          Rep (1)
Customernum (PK)      Repnum (PK)
Customername          Lastname
Street                Firstname
City                  Street
State                 City
Zip                   State
Balance               Zip
CreditLimit           Commission
Repnum (FK)           Rate
One Rep can have one or more customers (one-to-many); the relationship is
created with a primary key and a foreign key.
Notation example
Always remember: the many side has the foreign key. In this case the many side is the Customer
table, which therefore has the foreign key Repnum, the primary key of the Rep table.
Referential Integrity
Referential integrity means that a foreign key must match the related
primary key in both actual values and data types.
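A minimal sketch of how a DBMS can check this, using Python's built-in sqlite3 module. The Customer and Rep tables are cut down to a few columns, and every sample value below is hypothetical; note that SQLite only checks foreign keys once the PRAGMA shown is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this check off by default
conn.execute("CREATE TABLE Rep (Repnum TEXT PRIMARY KEY, Lastname TEXT)")
conn.execute("""
    CREATE TABLE Customer (
        Customernum  TEXT PRIMARY KEY,
        Customername TEXT,
        Repnum       TEXT REFERENCES Rep(Repnum)  -- foreign key on the many side
    )
""")
conn.execute("INSERT INTO Rep VALUES ('20', 'Kaiser')")
# Two customers may share one rep (one-to-many).
conn.execute("INSERT INTO Customer VALUES ('148', 'Alpha Appliance', '20')")
conn.execute("INSERT INTO Customer VALUES ('282', 'Brookings Direct', '20')")

try:
    # Repnum '77' has no matching primary key value in Rep.
    conn.execute("INSERT INTO Customer VALUES ('999', 'No Such Rep Ltd', '77')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

customers_of_rep_20 = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE Repnum = '20'"
).fetchone()[0]
```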
Referential Integrity
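Example:
SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6
• Schema / Relation: the table together with its column headings
• Attribute: a column; Degree = 4 (four attributes)
• Tuple / relational instance: a row; Cardinality = 2 (two tuples)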
From ER Model to Relational Model
So… how do we convert an ER diagram into a table?? Simple!!
Basic Ideas:
◦ Build a table for each entity set
◦ Make a column in the table for each attribute in the entity set
◦ Primary Key
◦ Constraints
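Example – Strong Entity Set
[Figure: two strong entity sets and their tables, one with attributes SID, Name, Major, GPA and one with attributes SSN, Name, Dept.]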
◦ Construct a table with one column for each attribute in the weak entity set
◦ Remember to include discriminator
◦ Add one extra column on the right side of the table to hold the primary
key of the Strong Entity Set (the entity set that the weak entity set depends on)
◦ Primary Key of the weak entity set = Discriminator + foreign key
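The steps above can be sketched with Python's built-in sqlite3 module, reusing the ATM and Transactions example. This is only an illustration under assumptions: the column name transac_no stands in for transac#, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE ATM (atmID TEXT PRIMARY KEY, address TEXT)")
conn.execute("""
    CREATE TABLE Transactions (
        transac_no INTEGER NOT NULL,                   -- discriminator (transac#)
        amount     REAL,
        type       TEXT,
        atmID      TEXT NOT NULL REFERENCES ATM(atmID), -- key of the owner entity set
        PRIMARY KEY (atmID, transac_no)                -- foreign key + discriminator
    )
""")
conn.execute("INSERT INTO ATM VALUES ('A1', '1 Main St'), ('A2', '9 Side Rd')")

# The same transaction number can be reused at a different ATM ...
conn.execute("INSERT INTO Transactions VALUES (1, 50.0, 'withdrawal', 'A1')")
conn.execute("INSERT INTO Transactions VALUES (1, 20.0, 'deposit', 'A2')")
try:
    # ... but not twice at the same ATM.
    conn.execute("INSERT INTO Transactions VALUES (1, 75.0, 'withdrawal', 'A1')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

transaction_count = conn.execute("SELECT COUNT(*) FROM Transactions").fetchone()[0]
```

Declaring atmID as NOT NULL is one way to reflect the total participation of the weak entity set in its identifying relationship.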
Example – Weak Entity Set
[Figure: ER diagrams and tables; only labels survive. Attribute labels: Age, SID, Name, Major, GPA, SSN, Street, City. Placeholder labels: P-Key3, E-Set 3.]
Representing Multivalue Attribute
For each multivalue attribute in an entity set/relationship set
◦ Build a new relation schema with two columns
◦ One column for the primary keys of the entity set/relationship set that has the
multivalue attribute
◦ Another column for the multivalue attribute. Each cell of this column holds only
one value, so each value is represented as a unique tuple
◦ Primary key for this schema is the union of all attributes
Example – Multivalue attribute
[Figure: entity set Student with attributes SID, Name, Major, GPA and the multivalue attribute Children.]
Student
SID   Name   Major  GPA
1234  John   CS     2.8
5678  Homer  EE     3.6
Children
Stud_SID  Children
1234      Johnson
1234      Mary
5678      Bart
5678      Lisa
5678      Maggie
The primary key for this table is Stud_SID + Children, the union of all attributes.
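The Student example might be set up as follows with Python's built-in sqlite3 module. The table name Student_Children is an assumption made here; the rows are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (SID INTEGER PRIMARY KEY, Name TEXT, Major TEXT, GPA REAL)"
)
conn.execute("""
    CREATE TABLE Student_Children (
        Stud_SID INTEGER REFERENCES Student(SID),  -- key of the owning entity set
        Children TEXT,                             -- one value per row
        PRIMARY KEY (Stud_SID, Children)           -- union of all attributes
    )
""")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(1234, "John", "CS", 2.8), (5678, "Homer", "EE", 3.6)],
)
conn.executemany(
    "INSERT INTO Student_Children VALUES (?, ?)",
    [(1234, "Johnson"), (1234, "Mary"),
     (5678, "Bart"), (5678, "Lisa"), (5678, "Maggie")],
)

# All values of the multivalue attribute for one student.
children_of_5678 = [
    row[0]
    for row in conn.execute(
        "SELECT Children FROM Student_Children WHERE Stud_SID = ? ORDER BY Children",
        (5678,),
    )
]
```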
Representing Class Hierarchy
Two general approaches depending on disjointness and completeness
◦ For non-disjoint and/or non-complete class hierarchy:
Create a table for each superclass entity set according to the normal entity set translation method.
Create a table for each subclass entity set with a column for each of the attributes of that entity
set plus one for each attribute of the primary key of the superclass entity set.
This primary key from the superclass entity set is also used as the primary key for this new table.
Example
[Figure: ER diagram. Person (SSN, Name, Gender) with ISA subclass Student (SID, Status, Major, GPA).]
Person
SSN   Name   Gender
1234  Homer  Male
5678  Marge  Female
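A sketch of this translation for Person and Student, using Python's built-in sqlite3 module. The Person rows come from the example; the single Student row (SID, Status, Major, GPA values) is hypothetical and only there to show how inherited attributes are recovered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Person (SSN INTEGER PRIMARY KEY, Name TEXT, Gender TEXT)")
conn.execute("""
    CREATE TABLE Student (
        SSN    INTEGER PRIMARY KEY REFERENCES Person(SSN),  -- superclass key reused
        SID    INTEGER,
        Status TEXT,
        Major  TEXT,
        GPA    REAL
    )
""")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(1234, "Homer", "Male"), (5678, "Marge", "Female")],
)
# Hypothetical subclass row: only one person is also a student (non-complete).
conn.execute("INSERT INTO Student VALUES (1234, 9001, 'Full-time', 'EE', 3.6)")

# Inherited attributes (Name, Gender) are recovered by joining on the shared key.
student_view = conn.execute("""
    SELECT p.Name, p.Gender, s.Major
    FROM Student AS s JOIN Person AS p ON p.SSN = s.SSN
""").fetchall()
```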
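Disjoint and Complete mapping
[Figure: ER diagram. SJSU people (SID) with ISA subclasses Student (Major, GPA) and Faculty (Dept).]
[Figure: relationship set member between an entity set with attributes SID, Name and the entity set Dept (Code), mapped to a table whose columns are the primary keys of the participating entity sets. Surviving labels: Primary Key of Advisor, Primary key of Dept.]
SID   Code
1234  04
5678  08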
Query Processing
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing (Cont.)
Parsing and translation
◦ translate the query into its internal form. This is then translated into relational algebra.
◦ Parser checks syntax, verifies relations
Evaluation
◦ The query-execution engine takes a query-evaluation plan, executes that plan, and returns the
answers to the query.
Basic Steps in Query Processing : Optimization
A relational algebra expression may have many equivalent expressions
◦ E.g., σ salary < 75000 (π salary (instructor)) is equivalent to
π salary (σ salary < 75000 (instructor))
Each relational algebra operation can be evaluated using one of several different algorithms
◦ Correspondingly, a relational-algebra expression can be evaluated in many ways.
Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.
◦ E.g., can use an index on salary to find instructors with salary < 75000,
◦ or can perform complete relation scan and discard instructors with salary ≥ 75000
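The equivalence of the two expressions can be checked on a small relation. The sketch below is plain Python over sets of tuples; the instructor rows are hypothetical, and the two helper functions are illustrative stand-ins for σ and π, not a real query engine.

```python
# Hypothetical instructor relation: (name, salary) tuples.
instructor = {("Katz", 75000), ("Kim", 80000), ("Mozart", 40000), ("Gold", 40000)}

def project_salary(rows):
    """π salary: keep only the salary column (last field); a set drops duplicates."""
    return {(row[-1],) for row in rows}

def select_low_salary(rows):
    """σ salary < 75000: keep rows whose salary (last field) is below 75000."""
    return {row for row in rows if row[-1] < 75000}

plan_a = select_low_salary(project_salary(instructor))  # σ applied after π
plan_b = project_salary(select_low_salary(instructor))  # π applied after σ
```

Both plans give the same answer, but they do different amounts of work on the way, which is what the optimizer exploits when it chooses an evaluation plan.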
Relational Algebra
Fundamentals
What is Relational Algebra?
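[Figure: ER diagram. Supplier (PK Sno, Sname, Location) participates with cardinality (0,n) and Part (PK Pno, Pdesc, Colour) with cardinality (1,n) in the relationship Supplies (O_date).]
Supplier (Sno, Sname, Location)
Part (Pno, Pdesc, Colour)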
Supplies
Sno Pno O_date
s1 p1 nov 3
s2 p2 nov 4
s3 p1 nov 5
s3 p3 nov 6
s4 p1 nov 7
s4 p2 nov 8
s4 p4 nov 9
SELECTION:
select <table_name> where <condition>
alternatively
σ <condition> (table_name)
Example:
Select Supplier where Location = ‘Bos’
alternatively
σ Location = ‘Bos’ (Supplier)
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Answer
Sno  Sname  Location
s2   Ajax   Bos
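A minimal Python sketch of selection on the Supplier table. The function name select and the representation of a table as a list of dictionaries are choices made here for illustration.

```python
# Supplier table from the example, one dict per row.
supplier = [
    {"Sno": "s1", "Sname": "Acme", "Location": "NY"},
    {"Sno": "s2", "Sname": "Ajax", "Location": "Bos"},
    {"Sno": "s3", "Sname": "Apex", "Location": "Chi"},
    {"Sno": "s4", "Sname": "Ace", "Location": "LA"},
    {"Sno": "s5", "Sname": "A-1", "Location": "Phil"},
]

def select(table, condition):
    """σ condition (table): keep the rows for which condition(row) is true."""
    return [row for row in table if condition(row)]

# σ Location = 'Bos' (Supplier)
answer = select(supplier, lambda row: row["Location"] == "Bos")
```

The rows in answer keep all three columns, so the answer has the same schema as Supplier.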
SELECTION Exercise:
• Observations:
– There is only one input table.
– Both Cardholder and the answer table have the same schema (list of columns)
– Every row in the answer has the value ‘Modena’ in the b_addr column.
PROJECTION Example:
Project Supplier over Sname
alternatively
π Sname (Supplier)
Supplier
Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil
Answer
Sname
Acme
Ajax
Apex
Ace
A-1
PROJECTION Exercise:
Project Cardholder over b_addr
alternatively
π b_addr (Cardholder)
• Observations:
– There is only one input table.
– The schema of the answer table is the same as the list of columns in the query.
– If there are many Cardholders living at the same address these are not duplicated
in the answer table.
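A small Python sketch of projection, including the removal of duplicates noted in the observations. The Cardholder rows are hypothetical; only the column names b_name and b_addr come from the slides.

```python
def project(table, columns):
    """π columns (table): keep only the named columns and drop duplicate rows."""
    answer = []
    for row in table:
        projected = {column: row[column] for column in columns}
        if projected not in answer:  # a relation is a set: no duplicate rows
            answer.append(projected)
    return answer

# Hypothetical Cardholder rows; two cardholders live at the same address.
cardholder = [
    {"b_name": "Rossi", "b_addr": "Modena"},
    {"b_name": "Bianchi", "b_addr": "Modena"},
    {"b_name": "Verdi", "b_addr": "Bologna"},
]

# π b_addr (Cardholder)
addresses = project(cardholder, ["b_addr"])
```

Three input rows give two answer rows, because the shared address appears only once.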
• The Cartesian product of two sets is a set of pairs of elements (tuples), one from each
set.
• If the original sets are already sets of tuples, then the tuples in the Cartesian product
are correspondingly bigger: each one joins a tuple from the first set with a tuple from the second.
• Syntax:
<table_name> x <table_name>
• As we have seen, Cartesian products are usually unrelated to a real-world thing. They
normally contain some noise tuples.
• However they may be useful as a first step.
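As an illustration, the product Supplier x Part can be computed in Python with itertools.product. The Supplier rows are the ones used in the selection example; the Pdesc and Colour values of the Part rows are hypothetical.

```python
from itertools import product

# Supplier rows: (Sno, Sname, Location)
supplier = [
    ("s1", "Acme", "NY"),
    ("s2", "Ajax", "Bos"),
    ("s3", "Apex", "Chi"),
    ("s4", "Ace", "LA"),
    ("s5", "A-1", "Phil"),
]
# Part rows: (Pno, Pdesc, Colour); descriptions and colours are made up.
part = [
    ("p1", "bolt", "red"),
    ("p2", "nut", "blue"),
    ("p3", "screw", "green"),
    ("p4", "cog", "red"),
]

# Supplier x Part: every supplier row is paired with every part row,
# and each answer row is the concatenation of the two.
supplier_x_part = [s_row + p_row for s_row, p_row in product(supplier, part)]
```

Most of the resulting rows pair a supplier with a part it never supplied; these are the noise tuples that a later selection has to filter out.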
CARTESIAN PRODUCT:
Supplier (5 rows) x Part (4 rows)
Schema of the answer: Sno, Sname, Location, Pno, Pdesc, Colour
[Figure: Names x Addresses, with the noise tuples marked.]
Info = project cardholder over b_name, b_addr
How many rows? 36
UNION:
Table1 union Table2
alternatively
Table1 ∪ Table2
• Observations:
– This operation is impossible unless both tables involved have the same schemas.
Why?
– Because rows from both tables must fit into a single answer table; hence they must
“look alike”.
– The answer may have fewer rows than the two tables put together, because some rows might already belong to both tables.
UNION Example:
Answer = Part1Suppliers ∪ Part2Suppliers
Part1Suppliers    Part2Suppliers    Part1Suppliers union Part2Suppliers
Sno               Sno               Sno
s1                s2                s1
s3                s4                s2
s4                                  s3
                                    s4
UNION Exercise:
• Find the borrower numbers of all borrowers who have either borrowed or reserved a
book (any book).
Reservers = project Reserves over borrowerid
Borrowers = project Borrows over borrowerid
Answer = Borrowers union Reservers
alternatively
Reservers = π borrowerid (Reserves)
Borrowers = π borrowerid (Borrows)
Answer = Borrowers ∪ Reservers
Borrowers     Reservers     Borrowers union Reservers
borrowerid    borrowerid    borrowerid
1234          1345          1234
1325          1325          1325
2653          9823          2653
7635          2653          7635
9823          7635          9823
5342                        5342
                            1345
Borrower numbers that appear in both tables are not duplicated in the answer.
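A short Python sketch of union with the schema check described in the observations. Representing a table as a (schema, rows) pair is an assumption made here; the borrower numbers are the ones from the exercise.

```python
def union(table1, table2):
    """Table1 ∪ Table2 for tables given as (schema, set of row tuples)."""
    schema1, rows1 = table1
    schema2, rows2 = table2
    if schema1 != schema2:
        raise ValueError("union needs two tables with the same schema")
    return (schema1, rows1 | rows2)  # a set keeps each row only once

borrowers = (("borrowerid",), {(1234,), (1325,), (2653,), (7635,), (9823,), (5342,)})
reservers = (("borrowerid",), {(1345,), (1325,), (9823,), (2653,), (7635,)})

answer_schema, answer_rows = union(borrowers, reservers)
```

The six borrowers and five reservers give only seven answer rows, since four borrower numbers occur in both tables.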
INTERSECTION:
Table1 intersect Table2
alternatively
Table1 ∩ Table2
• Observations:
– This operation is impossible unless both tables involved have the same schemas.
Why?
– Because rows from both tables must fit into a single answer table; hence they must
“look alike”.
INTERSECTION Example:
Part1Suppliers = project (select Supplies where Pno = ‘p1’) over Sno
Part2Suppliers = project (select Supplies where Pno = ‘p2’) over Sno
Answer = Part1Suppliers ∩ Part2Suppliers
Part1Suppliers    Part2Suppliers    Part1Suppliers intersect Part2Suppliers
Sno               Sno               Sno
s1                s2                s4
s3                s4
s4
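The whole example, from the Supplies table to the answer, can be traced in a few lines of Python. The helper name suppliers_of is an assumption; it combines the select and project steps of the two definitions.

```python
# Supplies table: (Sno, Pno, O_date)
supplies = [
    ("s1", "p1", "nov 3"),
    ("s2", "p2", "nov 4"),
    ("s3", "p1", "nov 5"),
    ("s3", "p3", "nov 6"),
    ("s4", "p1", "nov 7"),
    ("s4", "p2", "nov 8"),
    ("s4", "p4", "nov 9"),
]

def suppliers_of(pno):
    """project (select Supplies where Pno = pno) over Sno"""
    return {sno for sno, part_no, _o_date in supplies if part_no == pno}

part1_suppliers = suppliers_of("p1")
part2_suppliers = suppliers_of("p2")

# Part1Suppliers ∩ Part2Suppliers: suppliers that supply both p1 and p2.
answer = part1_suppliers & part2_suppliers
```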
INTERSECTION Exercise:
• Find the borrower numbers of all borrowers who have borrowed and reserved a book.
Answer = Borrowers ∩ Reservers
Borrowers     Reservers     Borrowers intersect Reservers
borrowerid    borrowerid    borrowerid
1234          1345          1325
1325          1325          2653
2653          9823          7635
7635          2653          9823
9823          7635
5342
SET DIFFERENCE:
Table1 minus Table2
alternatively
Table1 \ Table2
• Observations:
– This operation is impossible unless both tables involved have the same schemas.
Why?
– Because it only makes sense to calculate the set difference if the two sets have
elements in common.
SET DIFFERENCE Example:
Part1Suppliers = project (select Supplies where Pno = ‘p1’) over Sno
Part2Suppliers = project (select Supplies where Pno = ‘p2’) over Sno
Part1Suppliers    Part2Suppliers    Part1Suppliers minus Part2Suppliers
Sno               Sno               Sno
s1                s2                s1
s3                s4                s3
s4
SET DIFFERENCE Exercise:
• Find the borrower numbers of all borrowers who have borrowed something and
reserved nothing.
Reservers = project Reserves over borrowerid
Borrowers = project Borrows over borrowerid
Answer = Borrowers \ Reservers
Borrowers     Reservers     Borrowers minus Reservers
borrowerid    borrowerid    borrowerid
1234          1345          1234
1325          1325          5342
2653          9823
7635          2653
9823          7635
5342
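The exercise can be checked with Python's built-in set type, using the borrower numbers from the tables. The second expression is an addition made here to show that, unlike union and intersection, set difference depends on the order of its operands.

```python
borrowers = {1234, 1325, 2653, 7635, 9823, 5342}
reservers = {1345, 1325, 9823, 2653, 7635}

# Borrowers \ Reservers: borrowed something and reserved nothing.
answer = borrowers - reservers

# Swapping the operands answers a different question:
# reserved something and borrowed nothing.
reversed_answer = reservers - borrowers
```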