Database Design
The process of designing the general structure of the database:
Logical Design – Deciding on the database schema. Database design
requires that we find a “good” collection of relation schemas.
Business decision – What attributes should we record in the database?
Computer Science decision – What relation schemas should we have
and how should the attributes be distributed among the various relation
schemas?
Physical Design – Deciding on the physical layout of the database
The Entity-Relationship Model
Models an enterprise as a collection of entities and relationships
Entity: a “thing” or “object” in the enterprise that is distinguishable
from other objects
Described by a set of attributes
Relationship: an association among several entities
Represented diagrammatically by an entity-relationship diagram
Database Application Architectures
[Figure: two-tier architecture (labeled "Old") and three-tier architecture (labeled "Modern"); the diagram includes a web browser client]
Database Management System Internals
Storage management
Query processing
Transaction processing
Storage Management
Storage manager is a program module that provides the interface
between the low-level data stored in the database and the application
programs and queries submitted to the system.
The storage manager is responsible for:
Interaction with the file manager
Efficient storing, retrieving and updating of data
Issues:
Storage Access
File organization
Indexing and hashing
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Query Processing (Cont.)
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Cost difference between a good and a bad way of evaluating a query can
be enormous
Need to estimate the cost of operations
Depends critically on statistical information about relations which the
database must maintain
Need to estimate statistics for intermediate results to compute cost of
complex expressions
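To make the cost gap concrete, the sketch below evaluates one query in two equivalent ways and counts how many tuple comparisons each needs. The relation contents, their sizes, and the "tuples examined" cost measure are illustrative assumptions; a real optimizer estimates cost from stored statistics rather than by running both plans.

```python
# Hypothetical toy relations (not from the slides).
# account: (account_number, balance); depositor: (customer_name, account_number)
account = [(f"A-{i}", (i % 10) * 100) for i in range(200)]
depositor = [(f"cust{i}", f"A-{i}") for i in range(200)]

def join_then_select(target_balance):
    """Nested-loop join over all pairs, filtering on balance at the end."""
    examined = 0
    result = []
    for d in depositor:
        for a in account:
            examined += 1
            if d[1] == a[0] and a[1] == target_balance:
                result.append((d[0], a[0], a[1]))
    return result, examined

def select_then_join(target_balance):
    """Filter account on balance first, then join the smaller intermediate result."""
    examined = 0
    selected = []
    for a in account:
        examined += 1
        if a[1] == target_balance:
            selected.append(a)
    result = []
    for d in depositor:
        for a in selected:
            examined += 1
            if d[1] == a[0]:
                result.append((d[0], a[0], a[1]))
    return result, examined

rows_a, cost_a = join_then_select(900)
rows_b, cost_b = select_then_join(900)
print(cost_a, cost_b)
```

Both functions return the same tuples, but the second examines far fewer pairs because the selection shrinks the intermediate result before the join. Estimating the size of that intermediate result is exactly the kind of statistic the points above refer to.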
Transaction Management
What if the system fails?
What if more than one user is concurrently updating the same data?
A transaction is a collection of operations that performs a single
logical function in a database application
Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g.,
power failures and operating system crashes) and transaction
failures.
Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
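As a small illustration of a transaction as a single logical function, the sketch below transfers funds between two accounts using Python's standard sqlite3 module. The table layout, account numbers, and the `transfer` helper are hypothetical; the example only covers a transaction failure, not a system crash or concurrent access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated transaction failure between the two updates.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account number."""
    return dict(conn.execute("SELECT account_number, balance FROM account"))
```

If the failure is triggered, the debit is rolled back and both balances stay as they were, so the database never shows money removed from one account without arriving in the other.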
Overall System Structure
Relational Model
Example of a Relation
Attribute Types
Each attribute of a relation has a name
The set of allowed values for each attribute is called the domain of the
attribute
Attribute values are (normally) required to be atomic; that is, indivisible
E.g. the value of an attribute can be an account number,
but cannot be a set of account numbers
Domain is said to be atomic if all its members are atomic
The special value null is a member of every domain
The null value causes complications in the definition of many operations
Relation Schema
A1, A2, …, An are attributes
R = (A1, A2, …, An) is a relation schema
Example: instructor = (ID, name, dept_name, salary)
Example: STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
Formally, given domains D1, D2, …, Dn, a relation r is a subset of
D1 × D2 × … × Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di
Schema of a relation consists of
attribute definitions
name
type/domain
integrity constraints
Example: STUDENT(Name: string, Ssn: string, Home_phone: string,
Address: string, Office_phone: string, Age: integer, Gpa: real)
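The formal definition can be checked directly on small sets. The sketch below assumes two tiny, made-up domains and shows that a relation is simply a subset of their Cartesian product; the names `D_id`, `D_name`, and `is_relation_over` are hypothetical.

```python
from itertools import product

# Hypothetical, deliberately tiny domains (not from the slides).
D_id = {1, 2}
D_name = {"Ann", "Bob"}

# D_id x D_name: every possible (ID, name) pair.
all_pairs = set(product(D_id, D_name))

# A relation r on schema (ID, name) is any subset of that product.
r = {(1, "Ann"), (2, "Bob")}

def is_relation_over(r, *domains):
    """Check that every tuple has one value per domain, drawn from that domain."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in r
    )
```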
Relation Instance: Unordered
The current values (relation instance) of a relation are specified by a
table
An element t of r is a tuple, represented by a row in a table
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
The customer relation (attributes are the columns, tuples are the rows):
customer_name   customer_street   customer_city
Jones           Main              Harrison
Smith           North             Rye
Curry           North             Rye
Lindsay         Park              Pittsfield
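One way to see why tuple order is irrelevant is to model the relation as a set of tuples, as in this minimal sketch using the customer rows above. Representing a relation as a Python set is a modeling choice for illustration, not a statement about how a DBMS stores rows.

```python
# The customer relation as a set of (customer_name, customer_street, customer_city) tuples.
customer = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

# The same tuples listed in a different order describe the same relation.
customer_reordered = {
    ("Lindsay", "Park", "Pittsfield"),
    ("Curry", "North", "Rye"),
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
}

same_relation = customer == customer_reordered
```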
Database
A database consists of multiple relations
Information about an enterprise is broken up into parts, with each
relation storing one part of the information
E.g.
account : information about accounts
depositor : which customer owns which account
customer : information about customers
[Figures: the customer relation, the account relation, and the depositor relation]
Why Split Information Across Relations?
Storing all information as a single relation such as
bank(account_number, balance, customer_name, ..)
results in
repetition of information
e.g., if two customers own an account (What gets repeated?)
the need for null values
e.g., to represent a customer without an account
Normalization theory deals with how to design relational
schemas
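The repetition problem can be seen on a few rows. In the sketch below, the rows of the single `bank` relation are made up for illustration (account A-1 is assumed to be jointly owned), and the split into `account` and `depositor` is done by simple projection.

```python
# Hypothetical rows of bank(account_number, balance, customer_name).
# Account A-1 is jointly owned, so its balance appears in two tuples.
bank = {
    ("A-1", 500, "Jones"),
    ("A-1", 500, "Smith"),
    ("A-2", 700, "Curry"),
}

# Projection onto (account_number, balance): each balance is stored once.
account = {(acct, bal) for acct, bal, _ in bank}

# Projection onto (customer_name, account_number): who owns which account.
depositor = {(name, acct) for acct, _, name in bank}

# Number of bank tuples that repeat the balance of A-1.
copies_of_a1_balance = sum(1 for acct, _, _ in bank if acct == "A-1")
```

In the single relation, the balance of the shared account is stored once per owner, so an update must touch every copy. After the split, the balance lives in exactly one account tuple.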
Keys
Let K ⊆ R
K is a superkey of R if values for K are sufficient to identify a unique tuple of
each possible relation r(R)
by “possible r ” we mean a relation r that could exist in the enterprise
we are modeling.
Example: {customer_name, customer_street} and
{customer_name}
are both superkeys of Customer, if no two customers can possibly have
the same name
In real life, an attribute such as customer_id would be used instead of
customer_name to uniquely identify customers, but we omit it to keep
our examples small, and instead assume customer names are unique.
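The sketch below tests whether a set of attributes identifies tuples uniquely in one given instance of customer. The helper name `identifies_uniquely` is hypothetical. Note the limitation: a superkey must hold for every possible relation, so a check on one instance can refute a superkey claim but cannot prove it.

```python
schema = ("customer_name", "customer_street", "customer_city")
customer = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
]

def identifies_uniquely(rows, schema, key):
    """Return True if no two distinct tuples in rows agree on every attribute in key."""
    idx = [schema.index(a) for a in key]
    seen = set()
    for row in set(rows):  # a relation is a set, so duplicate rows are ignored
        projected = tuple(row[i] for i in idx)
        if projected in seen:
            return False
        seen.add(projected)
    return True
```

For this instance, {customer_name} and {customer_name, customer_street} pass the check, while {customer_street, customer_city} fails because Smith and Curry share the same street and city.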
Keys (Cont.)
K is a candidate key if K is a superkey and K is minimal
Example: {customer_name} is a candidate key for Customer, since it
is a superkey and no proper subset of it is a superkey.
Primary key: a candidate key chosen as the principal means of
identifying tuples within a relation
Should choose an attribute whose value never, or very rarely,
changes.
E.g. email address is unique, but may change
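Minimality can be illustrated by enumerating attribute sets from smallest to largest and keeping only those that are unique on an instance and contain no smaller set already kept. This is a brute-force sketch on the sample customer rows; the function name is hypothetical, and the result describes this one instance only, not every possible relation.

```python
from itertools import combinations

schema = ("customer_name", "customer_street", "customer_city")
customer = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
]

def minimal_unique_keys(rows, schema):
    """Return the minimal attribute sets whose values are unique across rows."""
    rows = set(rows)

    def unique_on(key):
        idx = [schema.index(a) for a in key]
        return len({tuple(r[i] for i in idx) for r in rows}) == len(rows)

    found = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(schema) + 1):
        for key in combinations(schema, size):
            if unique_on(key) and not any(set(k) <= set(key) for k in found):
                found.append(key)
    return found
```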
Foreign Keys
A relation schema may have an attribute that corresponds to the primary
key of another relation. The attribute is called a foreign key.
E.g. customer_name and account_number attributes of depositor are
foreign keys to customer and account respectively.
Only values occurring in the primary key attribute of the referenced
relation may occur in the foreign key attribute of the referencing
relation.
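The constraint can be seen in action with Python's standard sqlite3 module. The sketch below assumes simplified column types and made-up rows. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and other systems may behave differently by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off unless enabled
conn.execute(
    "CREATE TABLE customer (customer_name TEXT PRIMARY KEY, "
    "customer_street TEXT, customer_city TEXT)"
)
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute(
    "CREATE TABLE depositor ("
    "customer_name TEXT REFERENCES customer(customer_name), "
    "account_number TEXT REFERENCES account(account_number), "
    "PRIMARY KEY (customer_name, account_number))"
)

# Hypothetical sample rows.
conn.execute("INSERT INTO customer VALUES ('Jones', 'Main', 'Harrison')")
conn.execute("INSERT INTO account VALUES ('A-1', 500)")

def try_insert_depositor(name, account_number):
    """Insert a depositor row; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO depositor VALUES (?, ?)", (name, account_number))
        return True
    except sqlite3.IntegrityError:
        return False
```

Inserting ('Jones', 'A-1') succeeds because both values occur in the referenced primary keys. A row naming an unknown customer or account is rejected.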
Schema Diagram