RDBMS Section A
Lesson No.
1.1 Introduction
1.2 Database Architecture
1.3 Entity Relationship Model
1.4 Relational Data Model
1.5 Data Models, Keys and Languages
1.6 Relational Algebra
1.7 Database Design
1.8 Normalization
1.9 Database Integrity and Recovery
Website: [Link]
Syllabus
Course Objective:
● To give fundamental knowledge of databases and database management systems.
● To explain the basic concepts of architecture of database.
● To make the learners acquainted with data management issues.
Learning Outcome:
● Understanding the core terms, concepts, and tools of relational database
management systems.
● Understanding database design and logic development for database
programming.
● Apply SQL or Relational Algebra operations to find solutions for a given
application
● Apply normalization techniques to improve database design
SECTION-A
Database Management Systems: Definition, Characteristics, Advantages of Using
DBMS Approach and disadvantages of traditional file environment systems, Three
Schema Architecture, Data Independence – Physical and Logical data Independence,
Database Administrators and its responsibilities.
Introduction to ER Model: Weak Entity Sets, Strong Entity Sets, mapping
cardinalities, generalization, specialization, aggregation.
Relational Database [RDBMS]: The Relational Database Model, Concepts and
Terminology, Characteristics of Relations, CODD’s 12 rules for a fully RDBMS.
Constraints: Integrity Constraints- Entity, Domain and Referential Integrity
constraints, Business Rules, Keys - Super Keys, Candidate Keys, Primary Keys,
Secondary Keys and Foreign Keys.
Relational Algebra: Basic Operations, Additional Operations, Example Queries.
Normalization: Functional Dependency, Full Functional Dependency, Partial
Dependency, Transitive Dependency, Normal Forms.
SECTION-B
Transaction Management: Transaction Concept, ACID Properties, Transaction
States. Database Concurrency: Problems of Concurrent databases, Serializability
and Recoverability, Concurrency Control Methods - Two Phase Locking,
Timestamping. Database Recovery: Recovery Concepts, Recovery Techniques-
Deferred update, Immediate Update, Shadow Paging.
Introduction to Oracle: Oracle as client/server architecture, getting started, creating,
modifying, and dropping databases. Tables - Inserting, updating, deleting data from
databases, SELECT statement, Data constraints (Null values, Default values,
primary, unique and foreign key concepts), Queries for Relational Algebra.
Computing expressions, renaming columns, logical operators, range searching,
pattern matching, Oracle functions, grouping data from tables in SQL, manipulating
dates.
Working with SQL: triggers, use of data base triggers, types of triggers, how to apply
database triggers, BEFORE vs. AFTER triggers, combinations, syntax for creating
and dropping triggers.
Introduction
1.1.0 Objectives
1.1.1 Introduction
1.1.2 Data and Information
1.1.3 Traditional File Processing System
1.1.4 Limitations of File Processing System
1.1.5 Database and Database System
1.1.6 Components of database system
1.1.7 Database Management System
1.1.8 Advantages of DBMS over Traditional File Processing System
1.1.9 Summary
1.1.10 Keywords
1.1.11 Short Answer Type Questions
1.1.12 Long Answer Type Questions
1.1.13 Suggested Readings
1.1.0 Objectives
After completing this lesson, you will be able to:
• Define data and information
• Discuss traditional file processing system, its characteristics and
limitations
• Define database and explain its components
1.1.1 Introduction:
Data is the collection of unorganized facts, concepts or instructions. Information is
the processed form of data. The Traditional File Processing System is a method for
storing and organizing data in computer files, but it has significant disadvantages:
data redundancy, lack of flexibility, data dependency, etc.
In this lesson, we first provide the formal definitions of data, information, and the
traditional file processing system. Then we describe the limitations of the traditional
file processing system and finally discuss the concept of a database.
1.1.2 Data and Information
The term data refers to factual information especially that used for analysis and
based on reasoning or calculation. Data itself has no meaning, but becomes
information when it is interpreted. Information is a collection of facts or data that
is communicated. However, in many contexts the two terms are used as synonyms.
Data, by the way, is the plural of datum; information comes from the Latin
informationem, 'concept, idea' or 'outline'. The relationship between data and
information works this way:
Data → Processing → Information
1.1.3 Traditional File Processing System
…
In a similar manner, the Department of Computer Science (DCS) will also maintain
a record of its employees for recording their day-to-day activities. Thus, the file
maintained by the DCS will also contain the name, DOB, DOJ, permanent address,
correspondence address and other fields specific to the department.
Formally, a record is a collection of related fields that are treated as a single unit.
Further all the records are grouped together to form a file. All the related files are
grouped together to form a file. All the related files are grouped together to form a
database. Thus, in the above example, all the files of the university are interrelated;
when grouped together, they form the database of the university, which consists of
the account file, the student record file, the employee record file, etc.
So, the traditional file processing system has, for each application, a separate
master file and its own set of personal files. Such file-based approaches, which came
into being with the first commercial applications of computers, did provide increased
efficiency in data processing compared to earlier manual, paper record-based
systems. But as the demand for efficiency and speed increased, the computer-based,
simple file-oriented approach to information processing started suffering from a
number of limitations, which are explained in the following section.
1.1.4 Limitations of File Processing System
…
requires knowing the format of the file from which the data is to be shared, so it
becomes very difficult.
4. Incompatible File Formats: Each programmer stores data in files in a format
of his own choice, as there is no standard file format. Thus, it becomes very
difficult to share data among files having different formats. Even if one
programmer has to work on a file stored by a different programmer, he must
first understand the format of that file and only then start working on it.
5. Lack of Data Security: The data stored in the files must be protected from
unauthorized access. Since, in the traditional file processing system,
application programs are added to the system on the basis of queries that
are not predefined, it is difficult to enforce security measures across these
application programs.
6. Lack of Data Integrity: Data integrity means data correctness; for example,
Basic Pay cannot be negative. Such data constraints can be imposed in the
traditional file processing system only by writing appropriate code in the
application programs. If more such constraints are to be added in the future,
we need to modify the application programs, and sometimes it is not possible
to change the application code. This may lead to poor data, which may result
in bad or wrong decisions.
7. Data Dependency: Data dependence means it is impossible to change the
storage structure without affecting the application programs. For example, if
we change the delimiter separating the fields in the records of a file from a
tab to a double space, we must change the code in every application program
that accesses data from that file, as the structure of the file has changed.
This destroys the independence of changing the storage structure without
changing the application code.
8. Lack of Flexibility: In the traditional file processing system, programmers are
told in advance which types of queries need to be answered by the
application, and they code those queries using the data files for that
application. But in today's fast-moving and competitive business
environment there is, apart from such regular queries, a need to respond to
unanticipated queries, and for these the system will fail; the programmer
then has to code for each such query. This limitation leads to the lack of
flexibility of the file system.
9. Inadequate data modeling of the real world: The file system approach is
unable to model the basic entities, relationships and events facing the
organization every day. Complex data and inter-file relationships cannot be
formally defined to the system.
10. Concurrency Problem: Concurrency means simultaneous access to the same
file by two or more users. When data in a file is simultaneously accessed by
two or more users for updating, there must be a mechanism that ensures the
data does not end up inconsistent. In the file system it is not possible to
implement such a feature, or if possible, it is very difficult to implement.
1.1.5 Database and Database System: A database is a collection of related data.
Databases can be of varying sizes and varying complexity. In order to overcome the
limitations of the traditional file processing system, the concept of database systems
was introduced. A database system is basically a computerized record-keeping
system, i.e. a computerized system whose overall purpose is to maintain information
and to make that information available on demand.
In other words, a database is a collection of interrelated data stored in a
database server; these data are stored in the form of tables. The primary aim of a
database is to provide a way to store and retrieve information in a fast and efficient
manner. There are a number of characteristics that distinguish the database
approach from the traditional file management system. In the file system approach,
each user defines and implements the files needed for a specific application to run.
For example, in the sales department of an enterprise, one user may maintain the
details of how many sales personnel are in the department and their grades in a
separate file, while another user maintains the salary details of those salespersons
in another separate file.
Although both users are interested in the data of the salespersons, each keeps
the details in a separate file and needs different programs to manipulate it. This
leads to wastage of space and redundancy or replication of data, which may cause
confusion; sharing of data among various users is not possible; and data
inconsistency may occur. These files have no inter-relationship among the data
stored in them. Therefore, in traditional file processing, every user defines their own
constraints and implements the files needed for their applications.
In the database approach, a single repository of data is maintained that is
defined once and then accessed by many users. The fundamental characteristic of
the database approach is that the database system contains not only the data but
also a complete definition or description of the database structure and constraints.
These definitions are stored in a system catalog, which contains information about
the structure and definitions of the database. The information stored in the catalog
is called metadata, and it describes the primary database. Hence this approach will
work with any type of database, for example an insurance database, an airline
database, a banking database, finance details, or an enterprise information
database; whereas in the traditional file processing system the application is
developed for a specific purpose and will access a specific database only.
The other main characteristic of the database approach is that it allows
multiple users to access the database at the same time, so sharing of data is
possible. The database must include concurrency control software to ensure that
several users trying to update the same data at the same time do so in a controlled
manner. In the file system approach, by contrast, many programmers create files
over a long period, and the various files have different formats and are written in
various application languages.
A multi-user database whose users have a variety of applications must provide
facilities for defining multiple views. In the traditional file system, if any changes are
made to the structure of the files, all the programs are affected, so changes to the
structure of a file may require changing all programs that access the file. But in the
case of the database approach, the structure of the database is stored in the system
catalog, separately from the application programs that access it. This property is
known as program-data independence.
Databases can be used to provide persistent storage for program objects and
data structures, which resulted in the object-oriented database approach.
Traditional systems suffered from the impedance mismatch problem and difficulty
in accessing the data, which is avoided in object-oriented database systems. A
database can also be used to represent complex relationships among data and to
retrieve and update related data easily and efficiently.
It is possible to define and enforce integrity constraints for the data stored in
the database. The database also provides facilities for recovering from hardware and
software failures; the backup and recovery subsystem is responsible for this. The
database approach reduces application development time considerably when
compared to the file system approach, makes up-to-date information available to all
users, and provides security for the data stored in the system.
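As an illustration of declaring such an integrity constraint once, instead of re-coding it in every application, consider the Basic Pay rule from the previous section. The table and column names below are hypothetical; this is a sketch, not a prescribed design:

-- Domain and entity integrity declared in the schema itself.
CREATE TABLE Employee (
    Emp_Id    INT PRIMARY KEY,                      -- entity integrity
    Emp_Name  VARCHAR(50) NOT NULL,
    Basic_Pay DECIMAL(10,2) CHECK (Basic_Pay >= 0)  -- Basic Pay cannot be negative
);

-- The DBMS itself rejects this insert; no application code is involved.
INSERT INTO Employee VALUES (1, 'Ram Singh', -5000);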
Self-Check Exercise-I
Q1. Why do we need database?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
Q2. What is data inconsistency?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
The database has five files, each of which stores data records of the same type.
A database system provides the following features:
• Self-contained nature of database systems (a database contains both data and
meta-data).
• Data Independence: application programs and queries are independent of
how data is actually stored.
• Data sharing.
• Controlling redundancies and inconsistencies.
• Secure access to database; restricting unauthorized access.
• Enforcing Integrity Constraints.
• Backup and Recovery from system crashes.
• Support for multiple-users and concurrent access.
1.1.6 Components of Database System
Figure: A database system, in which end users access the database, consisting of
data and application programs.
Hardware:
The hardware of the database system consists of:
• The secondary storage devices - usually magnetic disks, hard disks, CD-ROMs
and floppy disks - that are used to hold the stored data, together with the
associated I/O devices, device controllers, etc.
• The processor(s) and associated main memory that are used to support the
execution of the database system software.
Software:
A software layer is present in the database system between the physical database
itself (i.e. where the data is actually stored) and the users of the system; it is known
as the Database Management System (DBMS). All requests from users for access to
the database are handled by the DBMS. One general function of the DBMS is thus
the shielding of database users from hardware-level details. Basically, the DBMS
acts like a bridge between users and the database.
Users:
There are a number of users who can access or retrieve data on demand using the
application programs and interfaces provided by the DBMS. Each type of user needs
different software capabilities. Following are the categories of users:
• Application Programmers
• End Users
• Database Administrator (DBA)
End Users: End users are those who need not know about the presence of the
database system or any other system supporting their usage. These users interact
with the system via the interfaces (menu- or form-driven) provided by the DBMS.
For example, users of Automatic Teller Machines (ATMs) fall under this category.
…
Database systems range from small systems that run on personal computers to
huge systems that run on mainframes. Some common examples of database
applications are as follows:
• Billing System at Super Stores
• Patient Management System
• Student Record Management System
• Computerized Account Department at Educational Institutes
• Computerized Flight Reservation System
The DBMS relieves the user from knowing how data is stored physically and the
complex algorithms used for performing operations on the database.
Conceptually, what happens is the following:
1. A user issues an access request, using some particular data sublanguage
(DDL, DML, and DCL using SQL).
2. The DBMS intercepts that request and analyzes it.
3. The DBMS inspects, in turn, the external schema for that user, the
corresponding external/conceptual mapping, the conceptual schema, the
conceptual/internal mapping, and the storage structure definition.
4. The DBMS executes the necessary operations on the stored database.
Self-Check Exercise-II
Q3. What is difference between DBMS and traditional file system?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
Q4. Does DBMS maintain integrity of data?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
1.1.8 Advantages of DBMS over Traditional File Processing System
Controlling Redundancy:
For example, in the case of a college database, there is some common data about
each student which has to be maintained in every application, like Roll No, Name,
Class, Phone No, Address, etc. This causes the problem of redundancy, which
results in wastage of storage space and makes the data difficult to maintain. In a
centralized database, however, data can be shared by a number of applications: the
whole college can maintain its computerized data in a database containing Roll No,
Name, Class, Father's Name, Address, Phone No and Date of Birth, which need not
be stored repeatedly in each application as in a file system, because every other
application can access this information by joining relations on the basis of the
common column, i.e. Roll No. Suppose a user of the Library system needs the Name
and Address of a particular student; by joining the Library and General Office
relations on the column Roll No, he or she can easily retrieve this information. Thus
we can say that the centralized system of a DBMS reduces the redundancy of data
to a great extent, but cannot eliminate it, because Roll No is still repeated in all the
relations.
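In SQL terms, the retrieval described above might look like the following sketch, where General_Office and Library are hypothetical tables standing for the two applications:

-- The Library application fetches Name and Address by joining on the
-- common column Roll_no, instead of keeping its own redundant copy.
SELECT g.Name, g.Address
FROM General_Office g
JOIN Library l ON l.Roll_no = g.Roll_no
WHERE g.Roll_no = 5;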
Avoiding Inconsistency: Redundancy leads to situations where two entries for the
same object do not agree with each other (that is, one is updated and the other is
not). At such a time the database is said to be inconsistent. An inconsistent
database is capable of supplying incorrect or conflicting information, so there
should be no inconsistency in a database. Clearly, inconsistency can be avoided far
better in a centralized system than in a file system.
Let us consider again the example of the college system, and suppose that the
student with Roll Number 5 shifts from Amritsar to Jalandhar. The address
information of Roll Number 5 must then be updated wherever Roll Number and
Address occur in the system. In the case of a file system, the information must be
updated separately in each application; if we update only three places and forget to
update the fourth application, the whole system shows inconsistent results for Roll
Number 5.
In the case of a DBMS, Roll Number and Address occur together only once, in
the General-Office table. So only a single update is needed, after which every other
application retrieves the address information from General-Office. All applications
thus get the current and latest information from this single update operation, which
is propagated to the whole database and to all other applications automatically; this
property is called Propagation of Update.
We can say that the redundancy of data greatly affects the consistency of
data: if redundancy is less, it is easy to enforce consistency. Thus a DBMS can
avoid inconsistency to a great extent.
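A sketch of that single update, again using the hypothetical General_Office table:

-- One update in the one place where Roll_no and Address are stored together;
-- every application that joins on Roll_no now retrieves the new address.
UPDATE General_Office
SET Address = 'Jalandhar'
WHERE Roll_no = 5;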
Restricting Unauthorized Access: When multiple users share a database, some
users may be permitted only to retrieve data, whereas others are allowed both to
retrieve and to update. Hence the type of access operation (retrieval or update) must
also be controlled. Typically, users or user groups are given account numbers
protected by passwords, which they can use to gain access to the database. A DBMS
should provide a security and authorization subsystem, which the DBA uses to
create accounts and to specify account restrictions. The DBMS then enforces these
restrictions automatically.
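In SQL, such restrictions are commonly expressed with GRANT statements. A sketch, assuming two hypothetical accounts, clerk1 and officer1:

-- clerk1 may only retrieve; officer1 may both retrieve and update.
GRANT SELECT ON General_Office TO clerk1;
GRANT SELECT, UPDATE ON General_Office TO officer1;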
1.1.9 Summary
Data is defined as a collection of meaningful facts which can be stored and
processed by computers or humans. Information is the processed form of data,
which helps us in making decisions. The traditional file processing system has, for
each application, a separate master file and its own set of personal files. A record is
a collection of related fields that are treated as a single unit; all the records are
grouped together to form a file, and all the related files are grouped together to form
a database. The computer-based, simple file-oriented approach to information
processing suffers from a number of limitations, like data redundancy, data
inconsistency, difficulty in sharing data, concurrency problems, etc.
In order to overcome the limitations of the traditional file processing system,
the concept of database systems was introduced. A database system is basically a
computerized record-keeping system, i.e. a computerized system whose overall
purpose is to maintain information and to make that information available on
demand. A database system comprises four major components: data, hardware,
software, and users.
1.1.10 Keywords
DBMS: Database Management System
Application Programs: Software that runs on a system to help users perform
various tasks
Concurrency: Ability of a system to handle multiple tasks or processes
simultaneously
Data Integrity: Data integrity refers to the accuracy, consistency, and reliability of
data in a database or information system.
Data Inconsistency: Data inconsistency refers to situations in which there are
discrepancies or contradictions in information stored within a dataset, database, or
system.
Query Language: A query language is a computer programming language used to
communicate with and manipulate databases or information systems.
Relational Database Management Systems
Database Architecture
1.2.0 Objectives
1.2.1 Introduction
1.2.2 DBMS Architecture
1.2.3 Data Independence
1.2.4 Mapping between Different Levels
1.2.5 Users of database
1.2.6 DBA and its responsibilities
1.2.7 Database Schema and Instance
1.2.8 Summary
1.2.9 Keywords
1.2.10 Short Answer Type Questions
1.2.11 Long Answer Type Questions
1.2.12 Suggested Readings
1.2.0 Objectives
After completing this lesson, you will be able to:
• Explain the three-level DBMS architecture (ANSI/SPARC model)
• Define data independence
• Describe the mappings between the different levels of the DBMS architecture
• Identify the users of a database
• Discuss the roles played by the DBA
1.2.1 Introduction
The three levels of the architecture of a DBMS are also known as the ANSI/SPARC
model. The goal of this three-level architecture is to separate the user applications
from the physical database. The ANSI/SPARC model is divided into three levels, known as the
internal, conceptual, and external levels. In a DBMS based on three-schema
architecture, each user group refers to its own external schema. Hence, the DBMS
must transform a request specified on an external schema into a request against the
conceptual schema, and then into a request on the internal schema for processing
over the stored database.
The process of transforming requests and results between levels is called mapping.
There are two levels of mappings in the architecture – the conceptual/internal
mapping and external/conceptual mapping. The three level architecture is then used
to explain the concept of data independence, which can be defined as the capacity to
change the schema at one level of a database system without having to change the
schema at the next higher level. There are two types of data independence – Logical
data Independence and Physical data independence.
Figure: The three-level (ANSI/SPARC) architecture. End users work with external
views at the external level; the external/conceptual mapping connects these to the
conceptual view at the conceptual level; the conceptual/internal mapping connects
the conceptual view to the internal view at the internal level, which describes the
stored database.
…
Internal View
It is at the lowest level of abstraction, closest to the physical storage method used. It
indicates how the data will be stored and describes the data structures and access
methods to be used by the database. The internal view is expressed by the internal
schema, which contains the definition of the stored record, the method of
representing the data fields, and the access aids used. At least the following aspects
are considered at this level:
…
Efficiency considerations are the most important at this level and the data structures
are chosen to provide an efficient database. The internal view does not deal with the
physical devices directly. Instead it views a physical device as a collection of physical
pages and allocates space in terms of logical pages.
The separation of the conceptual view from the internal view enables us to provide a
logical description of the database without the need to specify physical structures.
This is often called physical data independence. Separating the external views from
the conceptual view enables us to change the conceptual view without affecting the
external views. This separation is sometimes called logical data independence.
Assuming the three-level view of the database, a number of mappings are needed to
enable the users working with the external views. For example, the payroll office
may have an external view of the database that consists of limited, payroll-related
information only.
The conceptual view of the database may contain academic staff, general staff, casual
staff, etc. A mapping will need to be created in which all the staff in the different
categories are combined into one category for the payroll office. The conceptual view
would include information about each staff member's position, the date employment
started, full-time or part-time status, and so on. This will need to be mapped to the
salary level for the payroll office. Also, if there is some change in the conceptual view,
the external view can stay the same if the mapping is changed.
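One way such an external view could be defined over the conceptual schema, without duplicating any data, is sketched below; the table and column names are assumed for illustration:

-- The payroll office's external view: staff from all categories combined,
-- exposing only the fields that office needs.
CREATE VIEW payroll_staff AS
SELECT staff_id, staff_name, salary_level FROM academic_staff
UNION ALL
SELECT staff_id, staff_name, salary_level FROM general_staff
UNION ALL
SELECT staff_id, staff_name, salary_level FROM casual_staff;

If the conceptual schema later changes (say, a category is split), only this view definition (the mapping) needs to change; users of payroll_staff are unaffected.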
…
2. It also hides the implementation details or hardware-level details from its users
so that the users can concentrate on the program only.
3. The various changes made at different levels are absorbed by the mappings at
those levels.
4. Physical data independence allows changes to be made in the storage of data
without affecting application programs.
5. Logical data independence ensures that application programs are not affected
even if new fields are added to, or old fields are deleted from, the existing data.
6. Physical data independence allows the files to migrate from one kind of storage
device to another.
7. Logical data independence ensures that changes to the conceptual schema do
not require the external schemas to be changed.
8. Physical data independence ensures that changes to the internal schema do not
require the conceptual schema to be changed.
1.2.4 Mapping between Different Levels
…
Mapping between the conceptual and the internal level specifies the method of
deriving the conceptual record from the physical database. Again, differences similar
to those that exist between external and conceptual views could exist between the
conceptual and internal views. Such differences are indicated and resolved in the
mapping.
Differences that could exist, besides the difference in names, include the following:
• Representation of numeric values could be different in the two views. One view
could consider a field to be decimal, whereas the other view may regard the
field as binary. A two-way transformation between such values can easily be
incorporated in the mapping. If, however, the values are stored in a binary
format, the range of values may be limited by the underlying hardware.
• Representation of string data can be considered by the two views to be coded
differently. One view may perceive the string data to be in ASCII code, the other
view may consider the data to be in EBCDIC code. Again, two-way
transformation can be provided.
• The value for a field in one view could be computed from the values in one or
more fields of the other view. For example, the external view may use a field
containing a person’s age, whereas the conceptual view contains the date of
birth. The age value could be derived from the date of birth by using a date
function available from the operating system. Another example of a computed
field would be where an external view requires the value of the hours worked
during a week in a field, whereas the conceptual view contains fields
representing the hours worked each day of the week. The former can be derived
from the latter by simple addition. These two examples of transformation
between the external and conceptual views are not bidirectional: one cannot
uniquely reflect a change in the total hours worked during a week back to the
hours worked during each day of the week. Therefore, a user's attempt to
modify the corresponding external fields will not be allowed by the DBMS.
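A derived field such as age is typically exposed through a view over the conceptual schema. The sketch below uses Oracle-flavoured date functions (MONTHS_BETWEEN, SYSDATE) on an assumed employee table; other systems have equivalents:

-- External view exposing age, computed from the stored date_of_birth.
-- The DBMS can answer reads, but will not let users update the derived column.
CREATE VIEW employee_ext AS
SELECT emp_id,
       emp_name,
       FLOOR(MONTHS_BETWEEN(SYSDATE, date_of_birth) / 12) AS age
FROM employee;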
Such mapping between the conceptual and internal levels is a correspondence
that indicates how each conceptual record is to be stored and the
characteristics and size of each field of the record. Changing the storage
structure of the record involves changing the conceptual view to internal view
mapping so that the conceptual view does not require any alteration.
The conceptual view can assume that the database contains a sequence of
records for each conceptual record type. These records could be accessed
sequentially or randomly. The actual storage could have been done to optimize
performance. A conceptual record may be split into two records, with the less
frequently used record (part of the original record) on a slower storage device
and the more frequently used record on a faster device. The stored record
could be in a physical sequence, or one or more indices may be implemented
for faster access to record occurrences by the index fields. Pointers may exist
in the physical records to access the next record occurrence in various orders.
These structures are hidden from the conceptual view by the mapping between
the two.
Self-Check Exercise-II
Q3: Why is it necessary to enforce standards in Databases?
Ans...................................................................................................................
….....................................................................................................................
….....................................................................................................................
Q4: What are the different roles played by stakeholders in DBMS?
Ans...................................................................................................................
….....................................................................................................................
….....................................................................................................................
1.2.6 DBA and its responsibilities
Defining the conceptual schema
The DBA must decide what information is to be held in the database and create the
corresponding conceptual schema, using the conceptual
DDL. The object (compiled) form of that schema will be used by the DBMS in
responding to access requests. The source (uncompiled) form will act as a reference
document for the users of the system.
Defining the internal schema
The DBA must also decide how the data is to be represented in the stored database.
This process is usually referred to as physical database design. Having done the
physical design, the DBA must then create the corresponding storage structure
definition (i.e., the internal schema), using the internal DDL. In addition, the
corresponding conceptual/internal mapping must be defined.
Liaising with users
It is the business of the DBA to liaise with users, to ensure that the data they require
is available, and to write (or help the users write) the necessary external schemas,
using the applicable external DDL. In addition, the mapping between any given
external schema and the conceptual schema must also be defined. In practice, the
external DDL will probably include the means for specifying that mapping, though
the schemas and the mappings should be clearly separable.
Defining security and integrity procedures
The conceptual DDL should include facilities for specifying security and integrity rules
that can be regarded as part of the conceptual schema.
Defining backup and recovery procedures
Once an enterprise is committed to a database system, it becomes critically dependent
on the successful operation of that system. In the event of damage to any portion of
the database-caused by human error, say, or a failure in the hardware or supporting
operating system-it is essential to be able to repair the data concerned with the
minimum of delay and with as little effect as possible on the rest of the system. For
example, the availability of data that has not been damaged should ideally not be
affected. The DBA must define and implement an appropriate recovery scheme,
involving, e.g., periodic unloading (“dumping”) of the database to backup storage, and
procedures for reloading the database when necessary from the most recent dump.
Monitoring performance and responding to changing requirements
The DBA is responsible for so organizing the system as to get the performance that is
“best for the enterprise” and for making the appropriate adjustments as requirements
change. For example, it might be necessary to reorganize the stored database on a
periodic basis to ensure that performance levels remain acceptable.
1.2.7 Database Schema and Instance
The description of a database is called the database schema; it is specified during
database design and is not expected to change frequently. Consider, for example,
the following schema constructs:
University_Dept:
Dept_Id Dept_name Date_Of_Estb Head
Student_Record
Stu_Id Stu_Name D_O_Adm Class Session
Employee_Record
Emp_ID Emp_Name Emp_Add Dept D_O_J
The actual data in a database may change quite frequently. For example, the
database shown above changes every time we add a student. The data in the
database at a particular moment in time is called a database state or snapshot. It is
also called the
current set of occurrences or instances in the database. In the given database state,
each schema construct has its own current set of instances. For example, the
Student_Record construct will contain the set of individual student records as its
instances. Every time we insert or delete a record, or change the value of a data item
in a record, we change one state of the database into another state.
The distinction between database schema and database state is very important. When
we define a new database, we specify its database schema only to the DBMS. At this
point, the corresponding database state is the empty state with no data. We get the
initial state of the database when the database state is first populated or loaded with
the initial data. From then on, every time an update operation is applied to the
database, we get another database state. At any point in time, the database has a
current state.
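The distinction shows up directly in SQL: the schema is what a CREATE TABLE statement declares, while every INSERT, UPDATE or DELETE moves the database from one state to the next. A sketch based on the Student_Record construct above (the Session column is renamed here, since SESSION is a reserved word in several systems):

-- Schema: declared once, at design time.
CREATE TABLE Student_Record (
    Stu_Id     INT PRIMARY KEY,
    Stu_Name   VARCHAR(50),
    D_O_Adm    DATE,
    Class      VARCHAR(20),
    Session_Yr VARCHAR(10)
);

-- State: empty after creation; each insert yields a new database state.
INSERT INTO Student_Record
VALUES (1, 'Ram Singh', DATE '2024-07-01', 'BCA-4', '2024-25');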
1.2.8 Summary:
The three level architecture is divided into three levels: the external level, the
conceptual level, and the internal level. The external or user view is at the highest
level of database abstraction where only those portions of the database of concern to a
user or application program are included.
One conceptual view represents the entire database. The conceptual view is defined by
the conceptual schema. It describes all the records and relationships included in the
conceptual view and, therefore, in the database.
The internal view indicates how the data will be stored and describes the data
structures and access methods to be used by the database. It is expressed by the
internal schema, which contains the definition of the stored record, the method of
representing the data fields, and the access aids used. The DBMS provides users with
a method of abstracting their data requirements and removes the drudgery of
specifying the details of the storage and maintenance of data. The DBMS insulates
users from changes that occur in the database. Two levels of database independence
are provided by the system. Physical independence allows changes in the physical
level of data storage without affecting the conceptual view. Logical independence
allows the conceptual view to be changed without affecting the external view.
1.2.9 Keywords
View: A database view is a virtual table that is based on the result of a SELECT query.
Data Mapping: Data mapping refers to the process of defining the relationships and
connections between two distinct data models, schemas, or formats.
Stored Procedure: A stored procedure is a precompiled collection of one or more SQL
statements or procedural statements, which are stored in a database and can be
executed later.
DBA: The Database Administrator is responsible for managing the database
User liaison: User liaison typically involves establishing and maintaining a positive
relationship between an organization and its users or customers.
Schema: The overall structure of the database is called database schema
Relational Database Management Systems
Entity Relationship Model
1.3.0 Objectives
1.3.1 Introduction
1.3.2 Entity Relationship Model
1.3.3 Basic Concepts
1.3.4 Mapping Cardinalities
1.3.5 Entity relationship Diagram
1.3.6 Weak and Strong Entity sets
1.3.7 Aggregation
1.3.8 Summary
1.3.9 Keywords
1.3.10 Short Answer Type Questions
1.3.11 Long Answer Type Questions
1.3.12 Suggested Readings
1.3.0 Objectives
After completing this lesson, you will be able to:
• Define E-R Model
• Explain the mapping Cardinalities
• Define and Draw E-R diagrams
• Define weak and strong entity sets
• Define aggregation
1.3.1 Introduction:
The Entity-Relationship Model is a high-level conceptual data model developed by Chen in
1976 to facilitate database design. The E-R model is shown diagrammatically using an E-R
diagram, which represents the elements of the conceptual model, their meanings, and the
relationships between those elements, independent of any particular DBMS and
implementation details. Cardinality of a relationship between entities is calculated by
measuring how many instances of one entity are related to a single instance of another. One
of the main limitations of the E-R model is that it cannot express relationships among
relationships. To represent such relationships among relationships, we combine the entity
sets and their relationship to form a higher-level entity set. This process of combining entity
sets and their relationships to form a higher-level entity set, so as to represent relationships
among relationships, is called Aggregation.
1.3.2 Entity Relationship Model
The entity-relationship model is based on the perception of a real world that consists of
a set of basic objects called entities, and of relationships among these objects. It was
developed to facilitate database design by allowing the specification of an enterprise
schema which represents the overall logical structure of a database. The E-R Model is
extremely useful in mapping the meanings and interactions of real-world enterprises
into a conceptual schema. The entity-relationship model was originally proposed by Peter
Chen in 1976 as a way to unify the network and relational database views. Simply stated, the
ER model is a conceptual data model that views the real world as entities and
relationships. A basic component of the model is the entity relationship diagram, which
is used to visually represent data objects. For the database designer, the utility of the
ER model is:
1) It maps well to the relational model. The constructs used in the ER model can
easily be transformed into relational tables.
2) It is simple and easy to understand with a minimum of training.
Therefore the model can be used by the database designer to
communicate the design to the end user.
3) In addition, the model can be used as a design plan by the database
developer to implement a data model in specific database management
software.
1.3.3 Basic Concepts
There are three basic notions that the E-R data model employs – entity sets,
relationship sets, and attributes.
Entity: An entity is a thing or object in the real world that is distinguishable from all
other objects. For example, each person in an enterprise is an entity. An entity has set of
properties and the values for some set of properties may uniquely identify an entity. For
example, the employee Id of a person uniquely identifies one particular person in the
enterprise. Entities are principal data object about which information is to be collected.
Entities are usually recognizable concepts, either concrete or abstract such as person, places,
things, or events which have relevance to the database. Some specific examples of entities are
EMPLOYEES, PROJECTS and INVOICES. An entity is analogous to a table in the relational
model. Entities are classified as :
1) Independent: An independent entity is one that does not rely on
another for identification.
2) Dependent: A dependent entity is one that relies on another for
identification.
Special Entity Types: Associative entities (also known as intersection entities) are
entities used to associate two or more entities in order to reconcile a many-to-many
relationship. Subtype entities are used in generalization hierarchies to represent a
subset of instances of their parent entity, called the supertype, but which have
attributes or relationships that apply only to the subset.
Entity set: An entity set is a set of entities of the same type that share the same
properties, or attributes. The set of all persons who are customers at a given bank, for
example, can be defined as the entity set customer. The individual entities that
constitute the set are said to be the extension of the entity set. Thus all the customers
of the given bank are the extensions of the entity set customer.
Attributes: An entity is represented by a set of attributes. Attributes are descriptive
properties possessed by each member of an entity set. For example, a customer
entity set of a given bank has the attributes like account number, customer name,
customer address etc. For each attributes, there is a set of permitted values, called the
domain or value set, of that attribute. The domain of attribute customer-name might
be the set of all text strings of a certain length.
A database thus includes a collection of entity sets each of which contains any
number of entities of the same type. For example, a bank database consists of entity
sets like customer, loan etc.
An attribute, as used in the E-R model, can be characterized by the following
attribute types:
1. Simple and composite attributes: Simple attributes are those that
cannot be divided into subparts, i.e. into other attributes. For example,
customer account number is a simple attribute. Composite attributes are
those that can be further divided into subparts or attributes. For
example, the customer_name attribute of an entity can be considered a
composite attribute because it can be further divided into subparts like
first name, middle name and last name.
2. Single-valued and multivalued attributes: Attributes that have a single
value for a particular entity are known as single-valued attributes. For example,
the employee_Id of an employee in an enterprise will be single-valued for every
employee. Multivalued attributes are those attributes that have multiple values
for an entity. For example, employee_dependent_names for a particular employee
in an enterprise can have zero, one or more names, depending on the number of
dependents of the employee.
3. Null attributes: A null value is used when an entity does not have a
value for an attribute. For example, if a particular employee has no
dependents, then the value of employee_dependent_names for that
employee in the enterprise will be null. Null can also designate that an
attribute value is unknown. An unknown value may be either missing or
not known.
4. Derived attributes: The value of this type of attribute is derived from the
values of other related attributes or entities. For example, the age of an
employee can be derived from the date_of_birth attribute of the employee;
the two attributes are therefore related.
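When such attributes are later mapped to tables, a multivalued attribute typically becomes a separate table keyed by the owning entity, while a derived attribute is computed rather than stored. A sketch with assumed names:

-- Multivalued attribute employee_dependent_names: one row per dependent,
-- so an employee can have zero, one or many dependents.
CREATE TABLE Employee_Dependent (
    Emp_Id         INT,
    Dependent_Name VARCHAR(50),
    PRIMARY KEY (Emp_Id, Dependent_Name)
);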
Relationship Sets: A relationship is an association among several entities. For
example, we can define a relationship that associates an employee Ram Singh with
Department Computer Science. This relationship specifies that Employee Ram Singh is
working in the department Computer Science.
A relationship set is a set of relationships of the same type. Formally, it is a
mathematical relation on n ≥ 2 entity sets. If E1, E2, …, En are entity sets, then a
relationship set R is a subset of {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En},
where (e1, e2, …, en) is a relationship.
Consider the two entity sets employee and department. We define the
relationship set works-for to denote the association between employees and the
departments in which they work.
The association between entity sets is referred to as participation; that is, the
entities e1 ∈ E1, e2 ∈ E2, …, en ∈ En participate in the relationship set R. A
relationship instance in an E-R schema represents an association between the
named entities in the real-world enterprise that is being modeled.
The function that an entity plays in a relationship is called that entity’s role.
Since entity sets participating in a relationship set are generally distinct, roles are
implicit and are not usually specified. However, they are useful when the meaning of a
relationship needs clarification. Such is a case when the entity sets of a relationship
set are not distinct, i.e., the same entity set participates in a relationship set more than
once, in different roles. In this type of relationship set, which is sometimes called a
recursive relationship set, explicit role names are necessary to specify how an entity
participates in a relationship instance.
A relationship can also have descriptive attributes. Consider a relationship set
depositor with entity sets customer and account. We could associate the attribute
access-date with that relationship set to record the most recent date on which a
customer accessed an account.
The number of entity sets that participate in a relationship set is also the
degree of the relationship set. A binary relationship set is of degree 2; a ternary
relationship set is of degree 3.
…
1.3.4 Entity Relationship Diagrams: The overall logical structure of a database can
be expressed graphically by an E-R diagram. The relative simplicity and pictorial
clarity of this diagramming technique may well account in large part for the
widespread use of the E-R model. Such a diagram consists of the following components:
• Rectangles, which represent entity sets.
• Ellipses, which represent attributes.
• Diamonds, which represent relationship sets.
• Lines, which link attributes to entity sets and entity sets to
relationship sets.
• Double ellipses, which represent multivalued attributes.
• Dashed ellipses, which denote derived attributes.
• Double lines, which indicate total participation of an entity in a
relationship set.
To distinguish the mapping cardinalities of a relationship set, we draw either a directed
line (→) or an undirected line (-) between the relationship set and the entity set.
Consider the following E-R Diagram, which consists of two entity sets, customer
and loan, related through a binary relationship set borrower. The attributes associated
with customer are customer_name, social-security, customer-street, and customer-
city. The attributes associated with loan are loan-number and amount. The
relationship set borrower may be many to many, one to many, many to one or one to
one.
Figure: E-R diagram showing the entity sets Customer (customer_name,
social_security, customer_street, customer_city) and Loan (loan_number, amount)
connected by the relationship set Borrower.
In the above E-R diagram, underlined attributes are acting as primary keys of the
corresponding entity sets.
Direction: The direction of a relationship indicates the originating entity of the
relationship. The entity from which a relationship originates is the parent entity; the
entity where the relationship terminates is the child entity.
The type of the relationship is determined by the direction of the line connecting
the relationship component and the entity. To distinguish different types of
relationships, we draw either a directed line or an undirected line between the
relationship set and the entity set. A directed line is used to indicate one occurrence
and an undirected line is used to indicate many occurrences in a relationship.
Consider the entities DEPARTMENT, MANAGER, EMPLOYEE and PROJECT.
The relationship between a DEPARTMENT and a MANAGER is usually one-to-one;
there is only one manager per department. This relationship between entities is
shown below. Each entity is represented by a rectangle and a directed line indicates
the relationship between them. The relationship between DEPARTMENT and
MANAGER is 1:1.
Figure: One-to-one relationship between Department and Manager.
Note that a one-to-one relationship between two entity sets does not imply that for
an occurrence of an entity from one set there must at all times be an occurrence of
an entity in the other set. In the case of an organisation, there could be times when
a department is without a manager, or when an employee who is classified as a
manager is without a department.
A one-to-many relationship exists from the entity MANAGER to the entity
EMPLOYEE because there are several employees reporting to one manager. As we
have pointed out, there could be an occurrence of the entity type MANAGER having
zero occurrences of the entity type EMPLOYEE reporting to him or her. The reverse
relationship, from EMPLOYEE to MANAGER, would be many-to-one, since a single
manager may supervise many employees.
Figure: One-to-many relationship from Manager to Employee.
The relationship between the entity EMPLOYEE and the entity PROJECT can be
derived as follows: each employee could be involved in a number of different
projects, and a number of employees could be working on a given project. This
relationship between EMPLOYEE and PROJECT is many-to-many. It is illustrated
below.
Figure: Many-to-many relationship between Employee and Project.
Figure: One-to-one relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
Figure: One-to-many relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
M:1: If multiple customers participate in a single loan and one customer can take
only one loan.
Figure: Many-to-one relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
M:M: If multiple customers participate in a single loan and one customer can take
more than one loan.
Figure: Many-to-many relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
1.3.5 Weak and Strong Entity Sets
An entity set may not have sufficient attributes to form a primary key; such an
entity set is termed a weak entity set. An entity set that does have a primary key is
termed a strong entity set. For example, consider the entity payment, which has the
three attributes payment-number, payment-date and payment-amount. Although
each payment entity is distinct, payments for different loans may share the same
payment number. Thus, this entity set does not have a primary key; it is a weak
entity set. For a weak entity set to be meaningful, it must be part of a one-to-many
relationship set. This relationship set should have no descriptive attributes, since
any required attributes can be associated with the weak entity set. A member of a
strong entity set is a dominant entity, whereas a member of a weak entity set is a
subordinate entity.
Although a weak entity set does not have a primary key, we nevertheless need a
means of distinguishing among all those entities in the entity set that depend on one
particular strong entity. The discriminator of a weak entity set is a set of attributes that
allows this distinction to be made. For example, the discriminator of the weak entity set
payment is the attribute payment number, since, for each loan, a payment number uniquely
identifies one single payment for that loan. The discriminator of a weak entity set is also
called the partial key of the entity set.
The primary key of a weak entity set is formed by the primary key of the strong
entity set on which the weak entity set is existence dependent, plus the weak entity
set's discriminator. In the case of the entity set payment, its primary key is
(loan-number, payment-number), where loan-number identifies the dominant entity
of a payment, and payment-number distinguishes payment entities within the same
loan.
The identifying dominant entity set is said to own the weak entity set that it
identifies. The relationship that associates the weak entity set with an owner is the
identifying relationship. In our example, loan-payment is the identifying relationship
for payment.
A weak entity set in E-R Diagram is represented by a double outlined box, and
the corresponding identifying relationship by a double outlined diamond.
The relationship between a weak entity set and a strong entity set can be
expressed with the following example, in which loan-payment is the identifying
relationship for the payment entity. The double lines indicate total participation of
the weak entity set in the identifying relationship: every payment must be related
via loan-payment to some loan. The arrow from loan-payment to loan indicates that
each payment is for a single loan. The discriminator of a weak entity set is
underlined with dashed lines rather than a solid line.
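In relational terms, the weak entity set payment maps to a table whose primary key combines the owner's key with the discriminator, as in this sketch:

-- payment_number alone is not unique; the key borrows loan_number from the
-- owning strong entity set loan (assumed to exist with key loan_number).
CREATE TABLE payment (
    loan_number    VARCHAR(10) REFERENCES loan(loan_number),
    payment_number INT,
    payment_date   DATE,
    payment_amt    DECIMAL(10,2),
    PRIMARY KEY (loan_number, payment_number)
);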
Self-Check Exercise
…
Q4. Does every relation consist of both weak and strong entity sets?
Ans….….................................................................................................................
…...........................................................................................................................
…...........................................................................................................................
Figure: The strong entity set Loan (loan_number, amount) is connected through the
identifying relationship Loan_payment to the weak entity set Payment
(payment_number, payment_date, payment_amt).
Aggregation
One of the limitations of the E-R model is that it cannot express relationships among
relationships. Aggregation allows us to treat the relationship set borrower and the
entity sets customer and loan as a higher-level entity set called borrower.
the same manner as is any other entity set. A common situation for aggregation is
shown in figure (b).
Figure (a): E-R diagram in which Customer (customer_name, social_security,
customer_street, customer_city) and Loan (loan_number, amount) are related
through Borrower, and the relationship Loan-officer associates the entity set
Employee (e-social_security, employee_name, telephone_number) with each
customer-loan pair.
Figure (b): Aggregation. The Borrower relationship set between Customer and Loan
is treated as a higher-level entity set, which participates in the Loan-officer
relationship with Employee.
38
BCA Sem-4 PAPER: BCAB2204T
[Figure: Generalization – account (acc_no, balance) is the higher-level entity set; saving account (interest rate) and current account (overdraft amount) are the lower-level entity sets, connected to account by an ISA relationship.]
Account is the higher level entity set and Saving account and Current account are
lower level entity sets. Generalization proceeds from the recognition that a number of
entity sets share some common features. Based on their commonalities, generalization
synthesizes these entity sets into a simple, higher-level entity set. Generalization is
used to emphasize the similarities among lower-level entity sets and to hide the
differences; it also permits an economy of representation in that shared attributes are
not repeated.
A crucial property of the higher-level and lower-level entities created by
specialization and generalization is attribute inheritance. The attributes of the higher-
level entity sets are said to be inherited by the lower-level entity sets. For example:
savings-account and checking-account inherit the attributes of account. Thus saving-
account is described by its account-number, balance and Interest rate and Checking
account is described by its account-number, balance, and overdraft-amount attributes.
A lower-level entity set also inherits participation in the relationship sets in which its
higher-level entity set participates. Both the savings-account and checking-account
entity sets participate in the depositor relationship set. Attribute inheritance applies
through all tiers of lower-level entity sets: if, for example, checking accounts were
further specialized into standard, gold, and senior accounts, those lower-level entity
sets would inherit the attributes and relationship participation of both checking-account
and account.
Whether a given portion of an E-R model was arrived at by specialization or
generalization, the outcome is basically the same:
• A higher-level entity set with attributes and relationships that apply to
all of its lower-level entity sets.
• Lower-level entity sets with distinctive features that apply only within a
particular lower-level entity set.
Converting E-R diagrams into tables
A database that conforms to an E-R database schema can be represented by a
collection of tables. For each entity set and for each relationship set in the database,
there is a unique table that is assigned the name of the corresponding entity set or
relationship set. Each table has multiple columns, each of which has a unique name.
Both the E-R model and the relational model are abstract, logical
representations of real-world enterprises. Because the two models employ similar
design principles, we can convert an E-R design into a relational design. Consequently,
converting a database representation from an E-R diagram to a table format is the basis
for deriving a relational database design from an E-R diagram. In the following section,
we explain how to convert an E-R diagram, or schema, into tables.
[Figure: E-R diagram with entity sets customer (customer_name, social_security, customer_street, customer_city) and loan (loan_number, amount), related through the borrower relationship set.]
Consider first the entity set loan, which has two attributes: loan_number and amount. We
represent this entity set by the table loan, with two columns, as shown in the following figure.
The row (L-17, 1000) in the loan table
means that loan number L-17 has an amount of Rs. 1000. We can add a new entity to the
database by inserting a row into the table. We can also delete or modify rows.
Table : loan
Loan_number amount
L-17 1000
Let D1 denote the set of all loan numbers and let D2 denote the set of all amounts. Any
row of the loan table must consist of a 2-tuple (v1, v2), where v1 is a loan number
(v1 belongs to D1) and v2 is an amount (v2 belongs to D2). In general, the loan table will contain
only a subset of the set of all possible rows. We refer to the set of all possible rows of
loan as the Cartesian product of D1 and D2 denoted by D1 X D2.
In general, if we have a table of n columns, we denote the Cartesian product of
D1, D2, D3, … Dn by D1X D2 X D3 …..X Dn.
Similarly, as another example, consider the entity set customer of the E-R diagram
shown above. This entity set has the attributes customer_name, social_security,
customer_street, customer_city. The table corresponding to customer has four
columns as shown in Figure below:
Table : Customer
Customer_name  Social_security  Customer_street  Customer_city
Suresh         123-4-567        Phase I          Patiala
Tabular Representation of Weak Entity Sets
Let A be a weak entity set with attributes a1, a2, a3, ..., am, and let B be the strong
entity set on which A depends, the primary key of B consisting of attributes
b1, b2, b3, ..., bn. We represent the entity set A by a table called A with one column for each
attribute of the set:
{ a1, a2, a3, ..., am } U { b1, b2, b3, ..., bn }
As an illustration, consider the entity set payment consisting of attributes
payment_number, payment_date, payment_amount. The primary key of the loan entity
set, on which payment is dependent, is loan_number. Thus , payment is represented
by a table with four columns labeled loan_number, payment_number, payment_date,
payment_amount as shown in following figure:
Table: payment
Loan_number  Payment_number  Payment_date  Payment_amount
L-17         67              10-11-2006    1000
Tabular Representation of Relationship sets
Let R be a relationship set; let a1, a2, a3, ..., an be the set of attributes formed by the
union of the primary keys of each of the entity sets participating in R; and let the
descriptive attributes (if any) of R be b1, b2, b3, ..., bn. We represent this relationship set
by a table called R with one column for each attribute of the set:
{ a1, a2, a3……,an} U { b1, b2, b3……,bn }
As an illustration, consider the relationship set borrower in the E-R Diagram shown
above. This relationship set involves the following two entity sets:
1) Customer, with the primary key social_security.
2) Loan, with the primary key loan_number.
Since the relationship set has no attributes, the borrower table has two columns labeled
social_security and loan_number as shown in figure below:
Table : borrower
Social_security Loan_number
123-4-567 67
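As a hedged illustration (the column types are assumed, not given in the text), the borrower table could be declared in SQL with the two borrowed primary keys forming its own composite key:
CREATE TABLE borrower
 (social_security VARCHAR2(11) REFERENCES customer (social_security),
  loan_number     VARCHAR2(6)  REFERENCES loan (loan_number),
  PRIMARY KEY (social_security, loan_number));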
The case of a relationship set linking a weak entity set to the corresponding strong entity set is
special. As we noted earlier, these relationships are many-to-one and have no descriptive
attributes. Furthermore, the primary key of a weak entity set includes the primary key of the
strong entity set. In the E-R diagram shown below, the weak entity set payment is dependent
on the strong entity set loan via the relationship set loan-payment. The primary key of payment
is {loan_number, payment_number} and the primary key of loan is {loan_number}. Since
loan-payment has no descriptive attributes, the table for loan-payment would have two
columns, loan_number and payment_number. The table for the entity set payment has four
columns: loan_number, payment_number, payment_date, payment_amount. Thus, the loan-
payment table is redundant. In general, the table for the relationship set linking a weak entity
set with its corresponding strong entity set is redundant and does not need to be present in a
tabular representation of the E-R diagram.
[Figure: The weak entity set payment dependent on the strong entity set loan via the identifying relationship set loan-payment.]
Multivalued Attributes
We have seen that attributes in an E-R diagram generally map directly into columns
for the appropriate tables. Multivalued attributes, however, are an exception; new
tables are created for these attributes.
For a multivalued attribute M, we create a table T with a column C that
corresponds to M and columns corresponding to primary key of the entity set or
relationship set of which M is an attribute. For example, consider the E-R diagram that
includes the multivalued attribute dependent_name. For this multivalued attribute, we
create a table dependent_name, with columns dname, referring to the dependent_name
attribute of employee, and e_social_security, representing the primary key of the entity
set employee. Each dependent of an employee is represented as a unique row in the
table.
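A minimal SQL sketch of such a table, assuming the employee table is keyed on e_social_security (the column types are illustrative):
CREATE TABLE dependent_name
 (e_social_security VARCHAR2(11) REFERENCES employee (e_social_security),
  dname             VARCHAR2(30),
  PRIMARY KEY (e_social_security, dname)); -- one row per dependent of an employee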
Tabular Representation of Generalization
There are two different methods for transforming to a tabular form an E-R diagram
that includes generalization.
1. Create a table for the higher-level entity set. For each lower-level entity set, create a
table that includes a column for each of the attributes of that entity set plus a column
for each attribute of the primary key of the higher-level entity set. Thus, for the E-R
diagram, we have three tables:
• Account, with attributes account_number and balance.
• Savings_account, with attributes account_number and interest_rate.
• Checking_account, with attributes account_number and
overdraft_amount.
2. If the generalization is disjoint and complete – that is if no entity is a member of two
lower-level entity sets directly below a higher-level entity set, and if every entity in the
higher-level entity set is also a member of one of the lower-level entity sets – then an
alternative representation is possible. Here, create no table for the higher level entity
set. Instead, for each lower-level entity set, create a table that includes a column for
each of the attributes of that entity set plus a column for each attribute of the higher
level entity set. Then, for the E-R diagram we have two tables:
• Saving_account, with attributes account_number, balance, and interest_rate.
• Checking_account, with attributes account_number, balance, and overdraft_amount.
The saving_account and checking_account relations corresponding to these tables both
have account_number as the primary key.
If the second method were used for an overlapping generalization, some values
such as balance would be stored twice unnecessarily. Similarly, if the generalization
were not complete – that is, if some accounts were neither savings nor checking
accounts – then such accounts could not be represented with the second option.
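The first method might be realized in SQL as follows (a sketch with assumed column types; the foreign keys tie each lower-level table back to account):
CREATE TABLE account
 (account_number VARCHAR2(10) PRIMARY KEY,
  balance        NUMBER(9,2));
CREATE TABLE savings_account
 (account_number VARCHAR2(10) PRIMARY KEY REFERENCES account (account_number),
  interest_rate  NUMBER(4,2));
CREATE TABLE checking_account
 (account_number   VARCHAR2(10) PRIMARY KEY REFERENCES account (account_number),
  overdraft_amount NUMBER(9,2));
Under the second method, the account table would be dropped and the balance column would appear directly in savings_account and checking_account.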
Specialization
Specialization is the process of taking subsets of a higher-level entity set to form lower-level
entity sets. It is the process of defining a set of subclasses of an entity type, which
is called the superclass of the specialization. The subclasses are defined on the basis
of some distinct characteristics of the entities in the superclass.
For example, specialization of the Employee entity type may yield the subclasses
Salaried_Employee and Hourly_Employee, based on the method of pay, as shown below.
[Figure: Specialization of the employee entity type (emp_number, emp_name) into the subclasses Salaried_Employee (basic_pay) and Hourly_Employee (hourly_rate) via an ISA relationship.]
1.3.7 Summary
The entity-relationship model is based on the perception of a real world that consists of
a set of basic objects called entities, and of relationships among these objects. There
are three basic notions that the E-R data model employs – entity sets, relationship
sets, and attributes. An entity is a thing or object in the real world that is
distinguishable from all other objects. An entity set is a set of entities of the same type
that share the same properties, or attributes. An entity is represented by a set of
attributes. Attributes are descriptive properties possessed by each member of an entity
set. A relationship is an association among several entities.
A relationship set is a set of relationships of the same type. Mapping
Cardinalities, or cardinality ratios, express the number of entities to which another
entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets,
although occasionally they contribute to the description of relationship sets that
involve more than two entity sets. The overall logical structure of a database can be
expressed graphically by an E-R diagram. The relative simplicity and pictorial clarity of
this diagramming technique may well account in large part for the widespread use of
the E-R model. An entity set may not have sufficient attributes to form a primary key;
such an entity set is termed a weak entity set. An entity set that has a primary key is
termed a strong entity set. One limitation of the E-R model is that it is not possible
to express relationships among relationships. This limitation can be removed through
aggregation. Aggregation is an abstraction through which relationships are treated as
higher-level entities.
1.3.8 Keywords
Aggregation: Aggregation generally refers to the process of combining different
elements or components into a single, unified entity.
Cardinality: It refers to the size or multiplicity of a relation.
Set: a set is a well-defined collection of distinct objects, considered as an object in its
own right.
Specialization: Specialization is the process of defining a set of subclasses from a
superclass
1.3.9 Short Answer Type Questions:
Q1. Define entity, entity set, attribute, relationship and relationship set.
Q2. What do you mean by an attribute? Explain various types of attributes.
Q3. What is the difference between generalization and specialization?
Q4. How does an E-R diagram represent the concepts of specialization and
generalization?
Q5. Define aggregation with a suitable example. Why is it needed?
1.3.10 Long Answer Type Questions
Q1. What do you mean by mapping cardinality? Explain various mapping
cardinalities with suitable examples.
Q2. What is an E-R Diagram? Explain various constructs for drawing E-R
Diagram.
Q3. Explain E-R Model.
Q4. Explain the steps for converting E-R Diagrams into tables.
1.3.11 Suggested Readings:
• Bipin C. Desai, An introduction to Database System, Galgotia Publication, New
Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems,
Addison Wesley.
Relational Database Management Systems
1.4.0 Objectives
1.4.1 Introduction
1.4.2 Relational Data Model Concepts
1.4.3 Constraints
1.4.4 Summary
1.4.5 Keywords
1.4.6 Short Answer Type Questions
1.4.7 Long Answer Type Questions
1.4.8 Suggested Readings
1.4.0 Objectives
After reading this lesson you will be able to
• Understand the basic concepts of Relational Data Model
• Understand constraints in the Relational Data Model
1.4.1 Introduction
The relational data model is an abstract theory of data that is based on certain aspects
of mathematics (principally set theory and predicate logic). The principles of the
relational model were originally laid down in 1969-70 by Dr. E. F. Codd, at that time a
member of IBM. The relational model is a way of looking at data: it stores data in the
form of tables. A relational database is one that allows you to group its data items into
one or more independent tables that can be related to one another by using fields
common to each related table.
1.4.2 Relational Data Model Concepts
The relational model represents the database as a collection of relations (tables); each
row of a relation is called a tuple, and each column in the tuple is called an attribute.
The number of attributes in a relation determines its degree; a relation with three
attributes, for example, has degree 3.
Domains
A domain definition specifies the kind of data represented by the attribute. More
particularly, a domain is a set of all possible values that an attribute may validly
contain. Domains are often confused with data types, but this is inaccurate: data type
is a physical concept while domain is a logical one. "Number" is a data type and "Age"
is a domain.
To give another example, "Street name" and "Surname" might both be
represented as text fields, but they are obviously different kinds of text fields; they
belong to different domains.
Domain is also a broader concept than data type, in that a domain definition includes a
more specific description of the valid data. For example, consider the domain Degree Awarded,
which represents the degrees awarded by a university. In the database schema this attribute
might be defined as Text[3], but it is not just any three-character string: it is a member of the
set {BS, BA, MA, MS, PhD, LLB, MD}. It is not always convenient to list every valid value –
think of a domain with a hundred or so members, such as the kinds of objects in a museum
exhibit. In such instances it is useful to define the domain in terms of rules, which can be used
to determine whether any specific value is a member of the set of all valid values.
For example, Person Age could be defined as “an integer in the range 0 to 120”
whereas Exhibit Age (age of any object for exhibition) might simply by “an integer equal
to or greater than 0”.
Body of a Relation
The body of the relation consists of an unordered set of zero or more tuples.
There are some important concepts here. First, the relation is unordered: record
numbers do not apply to relations. Second, a relation with 0 tuples (an empty relation)
still qualifies as a relation. Third, a relation is a set, and the items in a set are, by
definition, uniquely identifiable. Therefore, for a table to qualify as a relation, each
record must be uniquely identifiable and the table must contain no duplicate rows.
Keys of a Relation
It is a set of one or more columns whose combined values are unique among all
occurrences in a given table. A key is the relational means to specify uniqueness.
1.4.3 Constraints:
There are two types of constraints
1) Table Constraint
2) Column Constraint
Table Constraint
If the constraint spans multiple columns, the user will have to use table-level
constraints. Likewise, if the data constraint attached to a specific cell in a table
references the contents of another cell in the table, the user will have to use table-level
constraints.
Primary Key as a table-level constraint:
E.g. CREATE TABLE sales_order_details (s_order_no VARCHAR2(6), product_no VARCHAR2(6),
PRIMARY KEY (s_order_no, product_no));
Column Level Constraint
If the constraints are defined along with the column definition, they are called
column-level constraints. They are local to a specific column.
Primary Key as a column level constraint
Create table client ( client_no varchar2(6) Primary Key…);
Features of Constraint
1) NOT NULL CONDITION
2) UNIQUENESS
3) PRIMARY KEY identification
4) FOREIGN KEY
5) CHECK the column value against a specified condition
Some important constraints features and their implementation have been discussed
below:
Primary Key Constraints
A PRIMARY KEY constraint designates a column or combination of columns as the
table’s primary key. To satisfy a PRIMARY KEY constraint, both the following
conditions must be true :
1) No primary key value can appear in more than one row in the table.
2) No column that is part of the primary key can contain a null value.
A table can have only one primary key.
A primary key column cannot be of data type LONG or LONG RAW. You cannot
designate the same column or combination of columns as both a primary key and a
unique key, or as both a primary key and a cluster key. However, you can designate the
same column or combination of columns as both a primary key and a foreign key.
Defining Primary Keys
You can use the column_constraint syntax to define a primary key on a single column.
Example
The following statement creates the DEPT table and defines and enables
a primary key on the DEPTNO column :
CREATE TABLE dept
( deptno NUMBER (2) CONSTRAINT pk_dept PRIMARY KEY,
dname VARCHAR2(10));
The pk_dept constraint identifies the deptno column as the primary key of the dept
table. This constraint ensures that no two departments in the table have the same
department number and that no department number is NULL.
Alternatively, you can define and enable this constraint with table constraint syntax :
CREATE TABLE dept
(deptno NUMBER(2),
dname VARCHAR2(9),
loc VARCHAR2(10),
constraint PK_DEPT PRIMARY KEY (deptno));
Self- Check Exercise-I
Q1. What are the conditions to implement primary key constraint?
Ans…..............................................................................................................
…....................................................................................................................
.......................................................................................................................
Q2. What are the types of structural constraints?
Ans…..............................................................................................................
…....................................................................................................................
.......................................................................................................................
Referential Integrity Constraints
A referential integrity (foreign key) constraint relates rows of a child table to rows of a
parent table. The following restrictions apply:
1) The child and parent tables must be on the same database. They
cannot be on different nodes of a distributed database.
2) The foreign key and the referenced key can be in the same table.
In this case, the parent and child tables are the same.
3) To satisfy a referential integrity constraint, each row of the child
table must meet one of the following conditions:
a) The value of the row’s foreign key must appear as a
referenced key value in one of the parent table’s rows . The
row in the child table is said to depend on the referenced
key in the parent table.
b) The value of one of the columns that makes up the foreign
key must be null.
A referential integrity constraint is defined in the child table. A referential integrity
constraint definition can include any of the following key words:
1) FOREIGN KEY: identifies the column or combination of columns in
the child table that makes up the foreign key. Use this keyword only
when you define a foreign key with a table constraint clause.
2) REFERENCES: identifies the parent table and the column or
combination of columns that make up the referenced key. If you
identify only the parent table and omit the column names, the
foreign key automatically references the primary key of the parent
table. The corresponding columns of the referenced key and the
foreign key must match in number and data types.
ON DELETE CASCADE: allows deletion of referenced key values in the parent table that have
dependent rows in the child table, and causes Oracle to automatically delete the dependent
rows from the child table to maintain referential integrity. If you omit this option, Oracle
forbids deletion of referenced key values in the parent table that have dependent rows in the
child table.
In the examples that follow, we define a referential integrity constraint within a CREATE
TABLE statement. Alternatively, you can create the table without the constraint and then add
the constraint later with an ALTER TABLE statement.
You can define multiple foreign keys in a table. Also, a single column can be part of
more than one foreign key.
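For example, assuming the EMP and DEPT tables described below, the constraint could equally be added after table creation, optionally with the ON DELETE CASCADE option:
ALTER TABLE emp
 ADD CONSTRAINT fk_deptno
 FOREIGN KEY (deptno) REFERENCES dept (deptno)
 ON DELETE CASCADE;
With ON DELETE CASCADE in place, deleting a department row automatically deletes the employee rows that reference it.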
Defining Referential Integrity Constraints
You can use column_constraint syntax to define a referential integrity constraint in which
the foreign key is made up of a single column .
Example
The following statement creates the EMP table and defines and enables a foreign key
on the DEPTNO column that references the primary key on the DEPTNO column of the
DEPT table:
CREATE TABLE emp
(empno    NUMBER(4),
 ename    VARCHAR2(10),
 job      VARCHAR2(9),
 mgr      NUMBER(4),
 hiredate DATE,
 sal      NUMBER(7,2),
 deptno   CONSTRAINT fk_deptno REFERENCES dept (deptno));
The constraint FK_DEPTNO ensures that all employees in the EMP table work in a
department in the DEPT table. However, employees can have null department numbers.
Before you define and enable this constraint you must define and enable a constraint
that designates the DEPTNO column of the DEPT table as a primary or unique key.
Note that the referential integrity constraint definition does not use the FOREIGN KEY
keyword to identify the columns that make up the foreign key. Because the constraint
is defined with a column constraint clause on the DEPTNO column, the foreign key is
automatically on the DEPTNO column.
Note that the constraint definition identifies both the parent table and the columns of
the referenced key. Because the referenced key is the parent table’s primary key, the
referenced key column names are optional.
Note that the above statement omits the DEPTNO column's data type. Because
this column is a foreign key, Oracle automatically assigns it the data type of the
DEPTNO column to which the foreign key refers.
Alternatively, you can define a referential integrity constraint with table_constraint
syntax :
CREATE TABLE emp
 (empno    NUMBER(4),
  ename    VARCHAR2(10),
  job      VARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  comm     NUMBER(7,2),
  deptno   NUMBER(2),
  CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept (deptno));
Note that the foreign key definitions in both the above statements omit the ON DELETE
CASCADE option , causing Oracle to forbid the deletion of a department if any
employee works in that department .
Now let us take a simple example with the following relations, on which we will see the
various operations of the relational model. The relations are:
1) Supplier records
2) Part records
3) Shipment records
The Supplier records
SNo Name Status City
S1 Suneet 20 Qadian
S2 Ankit 10 Amritsar
S3 Amit 10 Amritsar
As we discussed earlier, in this context we assume that each row in the Supplier table
is identified by a unique SNo (supplier number), which uniquely identifies the entire
row in the table. Likewise, each part has a unique PNo (part number). Also, we assume
that no more than one shipment exists for a given supplier/part combination in the
Shipments table.
Note that the relations Parts and Shipments have PNo (part number) in common,
and the Supplier and Shipments relations have SNo (supplier number) in common. The
Supplier and Parts relations have City in common. For example, the fact that supplier
S3 and part P2 are located in the same city is represented by the appearance of the
same value, Amritsar, in the city column of the corresponding tuples of the two relations.
Self- Check Exercise-II
Q3. Referential Integrity constraint is applied on foreign key or primary key or
both?
Ans…..............................................................................................................
…...................................................................................................................
.......................................................................................................................
Q4. Which command is given to delete all referenced tables of a parent table?
Ans…..............................................................................................................
…...................................................................................................................
......................................................................................................................
Query 1: Find supplier numbers for suppliers who supply part P2.
In order to get this information we have to search for part P2 in the SP table. For this
a loop is constructed to find the records of P2 and, on getting the records, the
corresponding supplier numbers are printed.
Algorithm
do until no more shipments;
get next shipment where PNO=P2;
print SNO;
end;
Query 2 : Find part numbers for parts supplied by supplier S2.
In order to get this information we have to search for supplier S2 in the SP table. For
this a loop is constructed to find the records of S2 and, on getting the records, the
corresponding part numbers are printed.
Algorithm
do until no more parts ;
get next shipment where SNO=S2;
print PNO;
end;
Since both the queries involve the same logic and are very simple, we can conclude that the
retrieval operation of this model is simple and symmetric.
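In SQL, assuming a shipments table SP with columns SNO and PNO, the two queries are equally symmetric one-liners:
SELECT sno FROM sp WHERE pno = 'P2';  -- suppliers who supply part P2
SELECT pno FROM sp WHERE sno = 'S2';  -- parts supplied by supplier S2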
Structured Query Language (SQL)
Structured query language (SQL) pronounced as “sequel” is the set of commands that
all programs and users must use to access data within the database. Application
programmers and Oracle tools often allow users to access the database without directly
using SQL, but these applications in turn must use SQL when executing the user’s
request.
Historically, the paper "A Relational Model of Data for Large Shared Data
Banks" by Dr. E. F. Codd was published in June 1970 in the Association for Computing
Machinery (ACM) journal, Communications of the ACM. Codd's model is now accepted
as the definitive model for relational database management systems (RDBMS). The
language, Structured English Query Language (SEQUEL) was developed by IBM
Corporation to use Codd's model. SEQUEL later became SQL. In 1979, Relational
Software, Inc introduced the first commercially available implementation of SQL.
Today, SQL is accepted as the standard RDBMS language. The latest SQL standard
published by ANSI and ISO is often called SQL-92 (and sometimes SQL2).
Benefits of SQL
This section describes many of the reasons for SQL’s widespread acceptance by relational
database vendors as well as end users. The strengths of SQL benefit all ranges of users
including application programmers, database administrators, and management and end
users.
Non-Procedural Language
SQL is a non-procedural language because it:
1) processes sets of records rather than one record at a time; and
2) provides automatic navigation to the data.
SQL is used by all types of users, including:
1) System administrators
2) Database administrators
3) Security administrators
4) Application programmers
5) Decision support system personnel
6) Many other types of end users
Unified Language
SQL provides commands for a variety of tasks including :
1. Querying data;
2. Inserting, updating, and deleting rows in a table;
3. Creating, replacing, altering, and dropping objects;
4. Controlling access to the database and its objects;
5. Guaranteeing database consistency and integrity.
SQL unifies all the above tasks in one consistent language.
Common Language for all Relational Databases
Because all major relational database management systems support SQL, you can
transfer all skills you have gained with SQL from one database to another. In addition,
since all programmes written in SQL are portable, they can often be moved from one
database to another with very little modification.
Embedded SQL
Embedded SQL refers to the use of standard SQL commands embedded within a
procedural programming language. Embedded SQL is a collection of these commands:
1) All SQL commands, such as SELECT and INSERT, available with interactive SQL;
2) Flow control commands, such as PREPARE and OPEN, which integrate the standard
SQL commands with a procedural programming language.
The Oracle precompilers support embedded SQL. The Oracle precompilers
interpret embedded SQL statements and translate them into statements that can be
understood by procedural language compilers. Each of these Oracle precompilers
translates embedded SQL programmes into a different procedural language:
The Pro*Ada precompiler
The Pro*C/C++ precompiler
The Pro*COBOL precompiler
The Pro*FORTRAN precompiler
The Pro*Pascal precompiler
The Pro*PL/I precompiler
Database Objects:
Oracle supports two types of database objects:
Schema Objects: A schema is a collection of logical structures of data, or schema
objects. A schema is owned by a database user and has the same name as the user.
Each user owns a single schema. Schema objects can be created and manipulated with
SQL and include objects such as tables, views, indexes, sequences, procedures, and
functions.
Non-schema Objects: Other types of objects are also stored in the database and can
be created and manipulated with SQL, but are not contained in a schema. These
include profiles, roles, and users.
9) Columns in the same table or view cannot have the same name. However,
columns in different tables or views can have the same name.
10) Procedures or functions contained in the same package can have the
same name, provided that their arguments differ in number or data
types. Creating multiple procedures or functions with the same name in
the same package with different arguments is called overloading the
procedure or function.
Objects Naming Guidelines
There are several helpful guidelines for naming objects and their parts :
1) Use full, descriptive, pronounceable names (or well-known
abbreviations).
2) Use consistent naming rules.
3) Use the same name to describe the same entity or attributes across
tables.
4) When naming objects, balance the objective of keeping names short and
easy to use with the objective of making names as long and descriptive
as possible. When in doubt, choose the more descriptive name because
many people may use the objects in the database over a period of time.
Your counterpart ten years from now may have difficulty understanding
a database with names like PMDD instead of PAYMENT_DUE_DATE.
5) Using consistent naming rules helps users understand the part that each
object plays in your application. One such rule might be to begin the names
of all tables belonging to the FINANCE application with FIN_.
6) Use the same names to describe the same things across tables. For
examples, the department number columns of the EMP and DEPT tables
should be named DEPTNO.
Advantages of Relational Model
The major advantages of the relational model are :
1. Structural independence
In the relational model, changes in the database structure do not affect
data access. When it is possible to make changes to the database structure
without affecting the DBMS's capability to access data, we say that
structural independence has been achieved. So, the relational database model has
structural independence.
2. Conceptual simplicity:
We have seen that both the hierarchical and the network database
models were conceptually simple. But the relational database model is even
simpler at the conceptual level. Since the relational data model frees the
designer from the physical data storage details, the designers can concentrate
on the logical view of the database.
3. Design, implementation, maintenance and usage ease:
The relational database model achieves both data independence and
structure independence making the database design, maintenance,
administration and usage much easier than the other models.
4. Ad hoc query capability:
The presence of a very powerful, flexible and easy-to-use query capability is
one of the main reasons for the immense popularity of the relational database
model. The query language of the relational database model, Structured Query
Language (SQL), makes ad hoc queries a reality. SQL is a fourth-generation
language (4GL). A 4GL allows the user to specify what must be done without
specifying how it must be done. So, using SQL, users can specify what
information they want and leave the details of how to get the information to the
database.
Disadvantages of Relational Model
The relational model's disadvantages are very minor compared to its advantages, and its
capabilities far outweigh the shortcomings. Moreover, the drawbacks of relational
database systems could be avoided if proper corrective measures are taken. The drawbacks
stem not from shortcomings in the database model itself but from the way it is
implemented.
Some of the disadvantages are :
1. Hardware overheads:
Relational database systems hide the implementation complexities and the
physical data storage details from the users. To do this, and to make things easier
for the users, relational database systems need more powerful computers
and data storage devices; the RDBMS needs powerful machines to run
smoothly. But the processing power of modern computers is increasing at an
exponential rate, so in today's scenario the need for more processing power is no
longer a very big issue.
2. Ease of design can lead to bad design
The relational database is easy to design and use. The users need not
know the complex details of physical data storage; they need not know how the
data is actually stored in order to access it. This ease of design and use can,
however, lead to the development and implementation of very poorly designed
databases. Because the database is efficient, these design inefficiencies will not
come to light when the database is designed and when there is only a small
amount of data. As the database grows, the poorly designed databases will slow
the system down and will result in performance degradation and data corruption.
3. Information island phenomenon
As we have said before, the relational database systems are easy to
implement and use. This will create a situation where too many people or
departments will create their own databases and applications.
These information islands will prevent the information integration that is
essential for the smooth and efficient functioning of the organization. These
individual databases will also create problems like data inconsistency, data
duplication, data redundancy and so on.
But as we have said all these issues are minor when compared to the advantages and
all these issues could be avoided if the organization has a properly designed database
and has enforced good database standards.
1.4.4 Summary
The relational data model was first introduced by Dr. E. F. Codd, an Oxford-trained
mathematician, while he was working at the IBM Research Center around 1970. He
presented this idea in a classic paper, and it attracted immediate attention due to its
simplicity and mathematical foundations. It also drew the immediate attention of the
computing industry because of the simple way in which it represented information,
using the well-understood convention of tables of values as its building block.
one of the most popular developments in the database technology because it can be
used for representing most of the real-world objects and the relationships between
them.
1.4.5 Keywords
Referential Integrity: Referential Integrity is a concept in Database Management
Systems (DBMS) that ensures relationships between tables are maintained
consistently.
Schema Object: A database schema is a collection of database objects, including
tables, views, indexes, sequences, procedures, functions, and more, organized and
defined to represent the structure of a database.
Stored Procedure: A stored procedure is a precompiled collection of one or more SQL
statements that can be executed as a single unit.
Rollback: Refers to the process of undoing or reverting a set of transactions that were
not committed to the database.
1.4.6 Short Answer Type Questions
Q1. What is the difference between a key and an index?
Q2. Differentiate between a UNIQUE key and a PRIMARY key.
Q3. What are the features of column-level constraints?
Q4. What are the schema and non-schema objects?
1.4.7 Long Answer Type Questions
Q1. Explain the various concepts of relational data model.
Q2. Explain the various constraints in relational data model.
Q3. Explain row level constraint with examples.
1.4.8 Suggested Readings
• Bipin C. Desai, An introduction to Database System, Galgotia Publication,
New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems,
Addison Wesley.
Relational Database Management Systems
1.5.0 Objectives
1.5.1 Introduction
1.5.2 Database Utilities
1.5.3 Data Models
1.5.3.1 High-level data models
1.5.3.2 Low-level data models
1.5.3.3 Representational data models
1.5.4 What is a Key?
1.5.4.1 Super Key
1.5.4.2 Candidate Key
1.5.4.3 Primary Key
1.5.4.4 Unique Key
1.5.4.5 Foreign Key
1.5.5 Database Languages
1.5.6 Data Definition Language
1.5.7 Data Manipulation Language
1.5.8 Data Control Language
1.5.9 Summary
1.5.10 Keywords
1.5.11 Short Answer Type Questions
1.5.12 Long Answer Type Questions
1.5.13 Suggested Readings
1.5.0 Objectives
After completing this lesson, you will be able to:
• Explain the database utilities
• Data Model and various data models
• Define Key
• Different type of keys – Super, Candidate, Primary, Unique and
Foreign
• What is DBMS Language?
• DDL, DML, DCL
1.5.1 Introduction
DBMSs have database utilities that help the DBA in managing the database
system. Common utilities include loading, backup, file reorganization and
performance monitoring. One fundamental characteristic of the database approach
is that it provides some level of data abstraction by hiding details of data storage
that are not needed by most database users. A data model is a collection of
concepts that can be used to describe the structure of a database and that provides
the necessary means to achieve the data abstraction. We can categorize the data
models into High-level or conceptual data models and Low-level or physical data
models. High-level or conceptual data models are those that provide concepts that
are close to the way many users perceive data. Entity Relationship model is a
popular example of high-level data model. Low-level or physical data models are
those that provide concepts that describe the details of how data is stored in the
computer. Between the two broad categories of data models is representational or
implementation data model which provide concepts that may be understood by end
users. Hierarchical, Network and relational are the popular examples of this
category of data model. A key is a property of the entity set, rather than of the
individual entities. Any two individual entities in the set are prohibited from having
the same value of the key attributes at the same time. There are different types of keys
– Super, Candidate, Primary, Unique and Foreign. Keys help in maintaining data
integrity and consistency.
Other utilities may be available for sorting files, handling data compression,
monitoring access by users, and performing other functions.
In object-based data models, an object contains values describing its state together
with the set of operations associated with the object, that is, its behavior. The object is
said to encapsulate both state and behavior.
1.5.3.2 Low-level or physical data models: Low-level or physical data models are
those that provide concepts that describe the details of how data is stored in the
computer. They represent information such as record formats, record ordering, and
access paths. An access path is a structure that makes the search for particular
database records efficient. One of the most important low-level data models is the
unifying data model.
1.5.3.3 Representational or implementation data models: Between the two
broad categories of data models is the representational or implementation data model,
which provides concepts that may be understood by end users. These are used to
describe data at the logical and view levels of the three-level architecture of a DBMS. In
contrast to object-based data models, these are used to specify the overall logical
structure of the database and to provide a higher-level description of the
implementation. The most popular representational data models are:
a. Hierarchical data model
b. Network data model
c. Relational data model
Representational data models represent data by using record structures and
hence are sometimes called record-based data models.
Self-Check Exercise-I
Q1: What is data modelling?
Ans……….....................................................................................................
…...........................................................................................................................
…...........................................................................................................................
Q2: What are the utilities of DBMS?
Ans…......................................................................................................................
…...........................................................................................................................
…....................................................................................................................
1.5.4.4 Unique Key
It is similar to a primary key; the only difference is that a unique key allows NULL
values, whereas a primary key does not.
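A small sketch of the difference (the pan_no column is hypothetical, added only for illustration):
CREATE TABLE client
 (client_no VARCHAR2(6)  PRIMARY KEY, -- no NULLs allowed; only one per table
  pan_no    VARCHAR2(10) UNIQUE);     -- must be unique when present, but may be NULL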
Self-Check Exercise-II
Q3: Which keys consist of primary key?
Ans…………............................................................................................................
…...........................................................................................................................
…...........................................................................................................................
Q4: Is it possible to have a foreign key when there is only one table in a database?
Ans…………............................................................................................................
…...........................................................................................................................
…...........................................................................................................................
The following are basic SQL commands that are used as DML statements:
Sr. No.  Purpose                                 SQL statement
1.       Retrieve data from one or more tables   SELECT
2.       Add new rows to a table                 INSERT
3.       Remove rows from a table                DELETE
4.       Change data in rows of a table          UPDATE
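As a brief illustration, using the DEPT table from the earlier lesson (the inserted values are examples only):
SELECT * FROM dept WHERE deptno = 10;                    -- retrieve rows
INSERT INTO dept (deptno, dname) VALUES (50, 'EXPORT');  -- add a row
UPDATE dept SET dname = 'EXPORTS' WHERE deptno = 50;     -- change a row
DELETE FROM dept WHERE deptno = 50;                      -- remove a row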
1.5.9 Summary
The main purpose of data modeling is to help in understanding the meaning of the
data and to facilitate communication about information requirements. A data model
supports communication between the users and database designers. The goal of
data model is to make sure that all data objects required by the database are
completely and accurately represented because the data model uses easily
understood notations and natural language which can be reviewed and verified as
correct by the end users. A key is a property of the entity set, rather than of the
individual entities. Any two individual entities in the set are prohibited from having
the same value of the key attributes at the same time. There are different types of keys
– Super, Candidate, Primary, Unique and Foreign.
DBMS must provide appropriate languages for various categories of DBMS
users. After database design, a DBMS is chosen to implement the database. Most
common DBMS languages are DDL, DML, and DCL. Data Definition Language
(DDL) is used by the DBA and by database designers to define database schemas.
The DBMS provides a data manipulation language (DML) for manipulating data in
database. Data Control Language (DCL) is used for granting and revoking the
privileges to the users for using the database.
1.5.10 Keywords
DDL: DDL stands for Data Definition Language, which is a subset of SQL
(Structured Query Language) used for defining and managing the structure of a
relational database.
DML: DML stands for Data Manipulation Language, which is a subset of SQL
(Structured Query Language) responsible for managing data within a relational
database.
DCL: DCL stands for Data Control Language, which is a subset of SQL (Structured
Query Language) responsible for managing access to data within a relational
database.
Relational Database Management Systems
RELATIONAL ALGEBRA
1.6.0 Objectives
1.6.1 Introduction
1.6.2 Relational Algebra
1.6.3 Summary
1.6.4 Keywords
1.6.5 Short Answer Type Questions
1.6.6 Long Answer type Questions
1.6.7 Suggested Readings
1.6.0 Objectives
After reading this lesson you will be able to learn the concepts of Relational
Algebra.
1.6.1 Introduction:
Relational algebra is a procedural language that can be used to tell the DBMS
how to build a new relation from one or more relations in the database. While using
relational algebra, the user has to specify both what is required and the procedure or
steps to obtain the required output. Relational algebra is a formal language. It is used
as the basis for other high-level Data Manipulation Languages (DMLs) for relational
databases. It illustrates the basic operations required of any DML and serves as the
standard of comparison for other relational languages.
1.6.2 Relational Algebra
The relational algebra is a theoretical language with operations that work on one
or more relations to define another relation without changing the original relation(s).
Thus, both the operands and the results are relations and so the output from one
operation can become the input to another operation. This allows expressions to be
nested in the relational algebra just as we nest arithmetic operations. This property is
called closure: relations are closed under the algebra just as numbers are closed under
arithmetic operations.
There are many variations of the operations that are included in relational
algebra. Codd originally proposed eight operations, but several others have been
developed.
The five fundamental operations in relational algebra are :
1) Selection
2) Projection
3) Cartesian Product
4) Union
5) Difference
They perform most of the data retrieval operations; the additional operations can be
expressed in terms of these five basic operations.
In relational algebra each operation takes one or more relations as its operands
and produces another relation as its result. Consider an example of mathematical
algebra as shown below :
3+5=8
Here 3 and 5 are operands and + is an arithmetic operator which gives result as 8.
Similarly, in relational algebra R1+ R2 = R3. Here R1 and R2 are relations
(operands) and + is the relational operator which gives R3 as a resultant relation.
A) BASIC RELATIONAL ALGEBRA OPERATIONS
Basic relational algebra operations are also called traditional set operators. The
various traditional set operators are:
1) UNION
2) INTERSECTION
3) DIFFERENCE
4) CARTESIAN PRODUCT
UNION
In mathematical set theory, the union of two sets is the set of all elements
belonging to either set (or to both). The set which results from the union must not, of
course, contain duplicate elements. It is denoted by U. Thus the union of the sets:
S1 = { 1 , 2 , 3 , 4, 5 } and
S2 = { 4 , 5 , 6, 7 , 8 }
would be the set { 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 } .
A union operation on two relational tables follows the same basic principle but
is more complex in practice. In order to perform the union operation, both operand
relations must be union compatible, i.e. they must have the same number of columns,
with corresponding columns drawn from the same domain (that is, of the same data type).
Suppose two tables, R and S have the following tuples at some instant in time
and that their header parts are as shown below:
R
Cust_name Cust_status
Sham Good
Rahul Excellent
Mohan Bad
Sachin Excellent
Dinesh Bad
S
Cust_name Cust_status
Karan Bad
Sham Good
Sachin Excellent
Rohan Average
These can certainly be combined into one table containing a valid relation by the
relational union operator (R U S) as follows :
RUS
Cust_name Cust_status
Sham Good
Rahul Excellent
Mohan Bad
Sachin Excellent
Dinesh Bad
Karan Bad
Rohan Average
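In SQL, the same result is obtained with the UNION operator, which (like the algebraic union) eliminates duplicates; this sketch assumes tables named R and S with the columns shown above:
SELECT cust_name, cust_status FROM r
UNION
SELECT cust_name, cust_status FROM s;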
INTERSECTION
In mathematics an intersection of two sets produces a set, which contains all
the elements that are common to both sets. Thus the intersection of two sets:
S1 = { 1 , 2 , 3 , 4 , 5 } and
S2 = { 4 , 5 , 6 , 7 , 8 }
would be { 4 , 5 } .
In the above example both tables are union compatible and can be intersected.
The intersection operation on the R and S tables defined above would be:
Cust_name Cust_status
Sham Good
Sachin Excellent
The intersection operator is used in the similar fashion to the union operator,
but provides an ‘and ‘ function.
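SQL expresses this directly with the INTERSECT operator (again assuming tables named R and S):
SELECT cust_name, cust_status FROM r
INTERSECT
SELECT cust_name, cust_status FROM s;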
DIFFERENCE
In mathematics, the difference between two sets S1 and S2 produces a set
containing all the members of the first set that are not in the second. It is denoted by
the "–" sign. The order in which the difference is taken is obviously significant. Thus the
difference between two sets:
S1 = { 1 , 2 , 3 , 4 , 5 }
Minus
S2 = { 4 , 5 , 6 , 7 , 8 }
Would be { 1 , 2 , 3 } and between
S2 = { 4 , 5 , 6 ,7 , 8 }
Minus
S1 = { 1, 2 , 3 , 4 , 5 }
would be { 6 , 7 , 8 }
As with the other set operations discussed so far, the difference operation can also be
performed only on tables that are union compatible. The difference operation on the R and S
tables (R – S) defined above would return:
R–S
Cust_name Cust_status
Rahul Excellent
Mohan Bad
Dinesh Bad
And for S – R
Cust_name Cust_status
Karan Bad
Rohan Average
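In Oracle SQL the difference operator is called MINUS (standard SQL calls it EXCEPT); as in the algebra, the order of the operands matters:
SELECT cust_name, cust_status FROM r
MINUS
SELECT cust_name, cust_status FROM s;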
Self-Check Exercise-I
Q1: What are the basic algebraic operations?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
………………………..…..............................................................................................
Q2. What is the difference between algebraic operations and boolean operations?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
………………………..…..............................................................................................
CARTESIAN PRODUCT
In mathematics, the Cartesian product of two sets is the set of all ordered pairs
of elements such that the first element in each pair belongs to the first set and the
second element in each pair belongs to the second set. It is denoted by a cross (x).
For example, given two sets:
S1 = { 1 , 2 , 3 } and
S2 = { 4 , 5 , 6 }
The Cartesian product S1 x S2 is the set :
{ ( 1, 4 ), (1, 5 ), (1 , 6 ), ( 2, 4 ), (2, 5 ), (2 , 6 ), ( 3, 4 ), (3, 5 ), (3 , 6 ) }
Consider the two tables with sample population as below
Female
Name Job
Komal Clerk
Amita Sales
Sonia Production
Nidhi Clerk
Male
Name Job
Rohit Clerk
Amit Sales
Sohan Production
Nitin Clerk
Assume that the tables refer to male and female staff respectively. Now, in order
to obtain all possible inter-staff marriages, the Cartesian product can be taken giving
the Table MALE_FEMALE.
[Table MALE_FEMALE: the 16 rows of the Cartesian product, pairing every female row with every male row.]
In order to preserve unique names for attributes, the original attribute names
have had to be concatenated with the original table names. The new table has also
been given an identity.
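In SQL the Cartesian product is simply a join with no join condition; the column aliases below implement the renaming just described (a sketch assuming tables named female and male):
SELECT f.name AS f_name, f.job AS f_job,
       m.name AS m_name, m.job AS m_job
FROM   female f, male m;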
B) SPECIAL RELATIONAL OPERATIONS
There are four special relational algebra operations which are as under
1) SELECTION
2) PROJECTION
3) JOIN
4) DIVISION
Selection
The selection operator yields a horizontal subset of a given relation – that is, the
subset of tuples (rows) within the given relation for which a particular condition is
satisfied.
In mathematics a set can have any number of subsets. A set is said to be a
subset of another if all its members are also members of the other set. Thus, in the
following example:
S1 = { 1 , 2 , 3 , 4 , 5 }
S2 = { 2 , 3 , 4 }
S2 is a subset of S1. Since the body part of a table is a set, it is possible for it to
have subsets, that is a selection from its tuples can be used to form another relation.
However, this would be a meaningless operation if no new information were to
be gained from the new relation. On the other hand, a subset of, say, an EMPLOYEE
relation containing all tuples that represent employees who earn more than some given
salary would be useful. What is required is that some explicit restriction be placed on
the sub-setting operation.
Restriction was originally defined on relations only and is achieved using
comparison operators such as equal to (=), not equal to (!=), greater than (>),
less than (<), greater than or equal to (>=) and less than or equal to (<=).
Example : Consider the database having following tables :
The Supplier table
Sno  Sname   Status  City
S1   Suneet  20      Qadian
S2   Ankit   10      Amritsar
S3   Amit    30      Amritsar
S4   Raj     20      Amritsar
Sno - Supplier number of the supplier; it is unique
Sname - Supplier name
City - City of the supplier
Status - Status of the city, e.g. A-grade cities may have status 10, B-grade
cities may have status 20, and so on.
Example:
S WHERE CITY = 'Qadian'
Sno  Sname   Status  City
S1   Suneet  20      Qadian
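The same selection in SQL (assuming the supplier table is named s):
SELECT * FROM s WHERE city = 'Qadian';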
PROJECTION
The projection operation on a table simply forms another table by copying
specified columns (both header and body parts) from the original table, eliminating any
duplicated rows. The projection operator yields a vertical subset of a given relation –
that is, the subset obtained by selecting specified attributes, in a specified left to right
order, and then eliminating duplicate tuples within the attributes selected. It is
denoted by pi (π). For example consider the table EMPLOYEE as shown:
Table: Employee
Personnel_number  Name   Age  Salary
123               Sham   23   7500
124               Karan  43   10000
125               Rahul  23   10000
π age,salary (employee)
Age Salary
23 7500
43 10000
23 10000
π personnel_number,name (employee)
Personnel_number Name
123 Sham
124 Karan
125 Rahul
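In SQL, projection is the column list of a SELECT statement, with DISTINCT supplying the duplicate elimination that the π operator performs:
SELECT DISTINCT age, salary FROM employee;
SELECT DISTINCT personnel_number, name FROM employee;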
JOIN
The most general form of join operation is called a theta join, where theta has
the same meaning as ‘compares with’ as it was used in the context of the restriction
operation. That is, it stands for any of the comparative operators equals, not equals,
greater than and so forth. A theta join is performed on two tables, which have one or
more columns in common which are domain compatible.
It forms a new table which contains all the columns from both the joined tables,
and whose tuples are those that satisfy the specified restriction.
For example consider the tables:
EMPLOYEE_PRODUCT
Name Product
Raja Pen
Sparsh Pen
Raja Pencil
Sparsh Rubber
PRODUCT_CUSTOMER
C_Product Customer
Pen Karan
Pen Suneet
Pencil Suneet
The tables list employees who make products and customers who buy those
products and can be joined over the columns ‘product’ and ‘c_product’ in both tables
since the values in both columns are domain compatible. The result of a theta join,
where the restriction is that the product attribute values in EMPLOYEE_PRODUCT
should be equal to the product attribute values in PRODUCT_CUSTOMER would be:
Table EMPLOYEE_PRODUCT_CUSTOMER
Name Product C_Product Customer
Raja Pen Pen Karan
Raja Pen Pen Suneet
Sparsh Pen Pen Karan
Sparsh Pen Pen Suneet
Raja Pencil Pencil Suneet
Note: If both tables have same common column then one of the common column
has to be renamed in the resultant table to preserve the uniqueness of the names in its
header part.
In the above example the theta operator was ‘equals’, and this, the most
common form of theta join, is referred to as an equi-join. Note that an equi-join must
always result in a table which has pairs of columns, like ‘product’ and ‘c_product’ in the
above example, which contain identical lists of attribute values.
By far the most common form of join is a variation of the equi-join where this
duplication of column values is eliminated by taking a projection of the table which
includes only one of the duplicated columns. This is referred to as a natural join.
The natural join of the tables in the last example would give the table:
Name Product Customer
Raja Pen Karan
Raja Pen Suneet
Sparsh Pen Karan
Sparsh Pen Suneet
Raja Pencil Suneet
It may help in understanding the different types of join if the operation is looked
at from a different point of view. The join is actually a composite operator. The theta
join is a Cartesian product operation on the two tables followed by a restriction
operation on the resultant table.
Self-Check Exercise-II
Q3: Which algebraic operation is used to find common elements in two tables?
Ans……………………………………………………………………………………………………
…………………………………………………………………………………………………………
………………………..…..........................................................................................
Q4. What is join?
Ans……………………………………………………………………………………………………
…………………………………………………………………………………………………………
………………………..…...........................................................................................
The tuples of the Cartesian product of the two tables in the earlier example
would be :
Name Product C_Product Customer
Raja Pen Pen Karan
Raja Pen Pen Suneet
Raja Pen Pencil Suneet
Sparsh Pen Pen Karan
Sparsh Pen Pen Suneet
Sparsh Pen Pencil Suneet
……. …… …… …..
Raja Pencil Pencil Suneet
The restriction operation on this product selects only those tuples from this
relation which conform to the restriction. In the example, the restriction was that the
‘product’ attributes should have equal values in each tuple; the result is the table
EMPLOYEE_PRODUCT_CUSTOMER shown above.
Since theta equated to ‘equals’, this was an equi-join. By carrying out a further
projection operation which eliminates one of the duplicated ‘product’ columns resulting
from the equi-join, the natural join is obtained.
Thus, the join operator is a combination of the Cartesian product, selection and
projection operators.
The examples given so far have all been of so-called inner joins. The fact that
Sparsh makes a Rubber is not recorded in any of the resultant tables from the joins,
because the joining values must exist in both tables. If it suffices that the value exists in
only one table, then a so-called outer join is produced.
An outer join of the EMPLOYEE_PRODUCT and PRODUCT_ CUSTOMER tables
exemplified above would return :
Employee_name Product_name Customer_name
Raja Pen Karan
Raja Pen Suneet
Sparsh Pen Karan
Sparsh Pen Suneet
Raja Pencil Suneet
Sparsh Rubber -
The expression A JOIN B is defined if and only if, for every unqualified attribute-name
that is common to A and B, the underlying domain is the same for both relations. Assume that
this condition is satisfied. Let the qualified attribute-names for A and B, in their left-to-right
order, be A.A1, ..., A.Am and B.B(m+1), ..., B.B(m+n) respectively.
Let Ci, ..., Cj be the unqualified attribute-names that are common to A and B,
and let Br, ..., Bs be the unqualified attribute-names remaining for B (with
their relative order undisturbed) after removal of Ci, ..., Cj.
Then A JOIN B is defined to be equivalent to (A TIMES B) [A.A1, ..., A.Am,
B.Br, ..., B.Bs]
where A.Ci = B.Ci
and ...
and A.Cj = B.Cj
Apply this definition to JOIN operation on Emp and Dept tables with following
attributes:
EMP(empno,ename,job,sal,deptno)
DEPT(deptno,dname,loc)
EMP JOIN DEPT = (EMP TIMES DEPT)
[EMP.empno, EMP.ename, EMP.job, EMP.sal, EMP.deptno, DEPT.dname, DEPT.loc]
where EMP.deptno = DEPT.deptno
So, we can say that JOIN is a combination of the Product, Selection and Projection
operators. JOIN is an associative operator, which means:
(A JOIN B) JOIN C = A JOIN (B JOIN C).
JOIN is also commutative:
A JOIN B = B JOIN A
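A minimal Python sketch of the joins discussed above, using the EMPLOYEE_PRODUCT and PRODUCT_CUSTOMER tables; the equi-join is written as a Cartesian product followed by a restriction, and the natural join as a further projection:

emp_prod = [("Raja", "Pen"), ("Sparsh", "Pen"), ("Raja", "Pencil"), ("Sparsh", "Rubber")]
prod_cust = [("Pen", "Karan"), ("Pen", "Suneet"), ("Pencil", "Suneet")]

def equi_join(r, s):
    # Cartesian product followed by a restriction on the common column.
    return [(name, prod, c_prod, cust)
            for name, prod in r for c_prod, cust in s if prod == c_prod]

def natural_join(r, s):
    # Equi-join followed by a projection dropping the duplicated column.
    return [(name, prod, cust) for name, prod, _, cust in equi_join(r, s)]

def outer_join(r, s):
    # Tuples of r with no match in s are kept, padded with a null marker "-".
    result = []
    for name, prod in r:
        matches = [cust for c_prod, cust in s if c_prod == prod]
        for cust in matches or ["-"]:
            result.append((name, prod, cust))
    return result

print(natural_join(emp_prod, prod_cust))  # Sparsh/Rubber is lost (inner join)
print(outer_join(emp_prod, prod_cust))    # ('Sparsh', 'Rubber', '-') is retained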
DIVISION
The division operator divides a dividend relation A of degree m+n (the degree of
a relation is its number of columns) by a divisor relation B of degree n, and produces a
resultant relation of degree m.
Relation A
Sno Pno
S1 P1
S1 P2
S1 P3
S1 P4
S1 P5
S1 P6
S2 P1
S2 P2
S3 P2
S4 P2
S4 P4
S4 P5
Relation B
CASE 1:
Pno
P1

CASE 2:
Pno
P2
P4

CASE 3:
Pno
P1
P2
P3
P4
P5
P6
A DIVIDED BY B
CASE 1:    CASE 2:    CASE 3:
Sno        Sno        Sno
S1         S1         S1
S2         S4
In this example the dividend relation A has two attributes, Sno and Pno (degree 2), and
the divisor relation B has only one attribute, Pno (degree 1). A divided by B therefore gives
a resultant relation of degree 2 − 1 = 1; that is, it has only the one attribute Sno.
A     Sno × Pno
--- = ----------- = Sno
B        Pno
The resultant relation contains those Sno values that are paired in A with every
Pno value appearing in the divisor relation B.
For example, in CASE 2:
P2 is supplied by S1, S2, S3 and S4;
P4 is supplied by S1 and S4.
S1 and S4 are the common suppliers who supply both P2 and P4, so the resultant
relation has tuples S1 and S4.
In CASE 3, there is only one supplier, S1, who supplies all the parts from P1 to P6.
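A minimal Python sketch of division on relation A above, reproducing the three cases:

# Division: a supplier qualifies only if it is paired with EVERY part
# in the divisor relation B.
a = {("S1", p) for p in ["P1", "P2", "P3", "P4", "P5", "P6"]} \
    | {("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
       ("S4", "P2"), ("S4", "P4"), ("S4", "P5")}

def divide(dividend, divisor):
    suppliers = {sno for sno, _ in dividend}
    return {sno for sno in suppliers
            if all((sno, pno) in dividend for pno in divisor)}

print(divide(a, {"P1"}))                                # CASE 1: {'S1', 'S2'}
print(divide(a, {"P2", "P4"}))                          # CASE 2: {'S1', 'S4'}
print(divide(a, {"P1", "P2", "P3", "P4", "P5", "P6"}))  # CASE 3: {'S1'}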
1.6.3 Summary
Relational algebra is a procedural language which specifies the operations to be
performed on existing relations to derive result relations. Relational algebra
operations can be divided into basic and special relational operators. Relational
calculus is a non-procedural language which offers an alternative way of formulating
queries. It is based on predicate calculus: a query is formulated as a set of predicates
that the answer must satisfy, instead of as a series of successive operations together
with the objects involved in those operations.
1.6.4 Keywords
Universal Set: a set that contains all the elements or objects under consideration
for a particular discussion or problem.
Union: a fundamental operation that combines the elements of two or more sets
to create a new set.
Intersection: the set of elements that are common to two or more sets.
Difference: the set of elements that belong to one set but not to another.
Cartesian product: an operation in set theory that combines elements from two
sets to create a new set of ordered pairs. It is denoted by the symbol "×".
1.6.5 Short Answer Type Questions
Q1. What is relational algebra and what are its uses?
Q2. Explain the following operations with examples:
1. Union 2. Intersection 3. Difference
4. Cartesian Product 5. Division
1.6.6 Long Answer Type Questions
Q1: Consider the following relational database schema consisting of the four
relation schemas:
passenger ( pid, pname, pgender, pcity)
agency ( aid, aname, acity)
flight (fid, fdate, time, src, dest)
booking (pid, aid, fid, fdate)
Answer the following questions using relational algebra queries:
a) Get the complete details of all flights to New Delhi.
b) Get the details about all flights from Chennai to New Delhi
c) Find only the flight numbers for passenger with pid 123 for flights to Chennai
before 06/11/2020.
d) Find the passenger names for passengers who have bookings on at least one
flight.
1.6.7 Suggested Readings
• Bipin C. Desai, An introduction to Database System, Galgotia Publication, New
Delhi.
• C. J. Date, An Introduction to Database Systems, Sixth Edition, Addison Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems,
Addison Wesley.
Relational Database Management Systems
1.7.0 Objectives
1.7.1 Introduction
1.7.2 Functional Dependency
1.7.2.1 Basic Concepts
1.7.2.2 Closure of a Set of Functional Dependencies
1.7.2.3 Closure of Attribute Sets
1.7.3 Decomposition
1.7.3.1 Desirable Properties of Decomposition
1.7.3.2 Dependency Preservation
1.7.3.3 Repetition of Information
1.7.4 Problems arising out of bad database Design
1.7.5 Summary
1.7.6 Keywords
1.7.7 Short Answer Type Questions
1.7.8 Long Answer Type Questions
1.7.9 Suggested Readings
1.7.0 Objectives
After completing this lesson, you will be able to:
• Define Functional Dependency and its importance in database design
• Understand decomposition of relation
• Understand the problems that arise due to bad database design
1.7.1 Introduction
The concept of functional dependency is the basis for Normalization. The
functional dependencies are the consequence of the interrelationships among
attributes of a relation (table) represented by some link or association. Care must be
taken that the database design is good, and that requires careful decomposition
of relations into further relations. In the following sections we will study how to
decompose relations so as to arrive at a good database design. If decomposition is
not done with care, the result will be a bad database design, with problems such as
repetition of data.
1.7.2 Functional Dependencies
Functional dependencies play a key role in differentiating good database designs
from bad database design. A functional dependency is a type of constraint that is a
generalization of the notion of key.
1.7.2.1 Basic Concepts
Functional dependencies are constraints on the set of legal relations. They allow
us to express facts about the enterprise that we are modeling with our database.
Functional Dependency is a many-to-one relationship from one set of attributes
to another within a given relation.
We define the notion of a super-key as follows. Let R be a relation schema. A
subset K of R is a super-key of R if, in any legal relation r(R), for all pairs t1 and t2 of
tuples in r such that t1 ≠ t2, t1[K] ≠ t2[K]. That is, no two tuples in any legal relation
r(R) may have the same value on attribute set K.
The notion of functional dependency generalizes the notion of super-key.
Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency
α → β
holds on schema R if, in any legal relation r(R) for all pairs of tuples t1 and t2 in r such
that t1[α] = t2 [α], it is also the case that t1[β] = t2[β].
Using the functional-dependency notation, we say that K is a super-key of R if
K→ R. That is K is a super-key if, whenever t1[K] = t2 [K] it is also the case that t1[R] =
t2 [R] (that is t1 = t2).
Functional dependencies allow us to express constraints that we cannot express
with super-keys. Consider the schema :
Loan-info-schema = (loan-number, branch-name, customer-name, amount) which
is simplification of the lending-schema that we saw earlier. The set of functional
dependencies that we expect to hold on this relation schema is :
loan-number → amount
loan-number → branch-name
We would not, however, expect the functional dependency
loan-number → customer-name
to hold, since in general a given loan can be made to more than one customer (for
example, to both members of a husband – wife pair).
We shall use functional dependencies in two ways:
• To test relations to see whether they are legal under a given set of functional
dependencies. If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
• To specify constraints on the set of legal relations. We shall thus concern
ourselves with only those relations that satisfy a given set of functional
dependencies. If we wish to constrain ourselves to relations on schema R
that satisfy a set F of functional dependencies, we say that F holds on R.
Let us consider the relation r of figure below:
A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4
Sample relation r
to see which functional dependencies are satisfied. Observe that A → C is satisfied.
There are two tuples that have an A value of a1; these have the same C value, namely
c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are
no other pairs of distinct tuples that have the same A value. The functional
dependency C → A is not satisfied, however. To see this, consider the tuples t1 =
(a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These two tuples have the same C value, c2, but
they have different A values, a2 and a3, respectively. Thus we have found a pair of
tuples t1 and t2 such that t1[C] = t2[C] but t1[A] ≠ t2[A].
Many other functional dependencies are satisfied by r, including, for example,
the functional dependency AB → D. Note that we use AB as shorthand for {A, B}, to
conform with standard practice. Observe that there is no pair of distinct tuples t1 and
t2 such that t1[AB] = t2[AB]. Therefore, if t1[AB] = t2[AB], it must be that t1 = t2 and thus
t1[D] = t2[D]. So r satisfies AB → D.
Some functional dependencies are said to be trivial because they are satisfied by all relations.
For example, A → A is satisfied by all relations involving attribute A. Reading the definition of
functional dependency literally, we see that, for all tuples t1 and t2 such that t1[A] = t2[A], it is the
case that t1[A] = t2[A]. Similarly, AB → A is satisfied by all relations involving attribute A. In
general, a functional dependency of the form α → β is trivial if β ⊆ α.
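A minimal Python sketch of testing whether a relation satisfies a functional dependency, using the sample relation r above:

# The FD lhs -> rhs is violated if two tuples agree on lhs but not on rhs.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

def satisfies(relation, lhs, rhs):
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False     # same lhs value, different rhs values
        seen[key] = val
    return True

print(satisfies(r, ["A"], ["C"]))       # True:  A -> C holds
print(satisfies(r, ["C"], ["A"]))       # False: c2 maps to both a2 and a3
print(satisfies(r, ["A", "B"], ["D"]))  # True:  AB -> D holds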
To distinguish between the concepts of a relation satisfying a dependency and a
dependency holding on a schema, we return to the banking example. If we consider the
customer relation (on customer-schema) in Figure below, we see that customer-street →
customer-city is satisfied. However, we believe that in the real world, two cities can have
streets with the same name.
Customer-name Customer-street Customer-city
Jones Main Harrison
Smith North Rye
Hayes Main Harrison
Curry North Rye
Lindsay Park Pittsfield
Turner Putnam Stamford
Williams Nassau Princeton
Adams Spring Pittsfield
Johnson Alma Palo Alto
Glenn Sand Hill Woodside
Brooks Senator Brooklyn
Green Walnut Stamford
The customer relation
Thus, it is possible, at some time to have an instance of the customer relation in which
customer-street→ customer-city is not satisfied. So we would not include customer-street→
customer-city in the set of functional dependencies that hold on Customer-schema.
Self-Check Exercise-I
Q1: What is functional dependency?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
…………..….................................................................................................................
Q2. Why is it necessary to identify functional dependencies?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
…………..…...............................................................................................................
In the loan relation (on loan-schema) of figure below, we see that the dependency loan-
number → amount is satisfied. In contrast to the case of customer-city and customer-street in
customer-schema, we do believe that the real world enterprise that we are modeling requires
each loan to have only one amount. Therefore we want to require that loan-number→ amount
be satisfied by the loan relation at all times. In other words, we require that the constraint
loan-number → amount hold on Loan-schema.
The loan relation:
Loan-number Branch-name Amount
L-17 Downtown 1000
L-23 Redwood 2000
L-15 Perryridge 1500
L-14 Downtown 1500
L-93 Mianus 500
L-11 Round Hill 900
L-29 Pownal 1200
L-16 North Town 1300
L-18 Downtown 2000
L-25 Perryridge 2500
L-10 Brighton 2200
In the branch relation of Figure below, we see that branch-name→ assets is
satisfied, as is assets→ branch-name. We want to require that branch-name→ assets
hold on branch-schema. However we do not wish to require that assets→ branch-name
hold since it is possible to have several branches that have the same asset value.
The functional dependencies we expect to hold on the remaining schemas of the
banking example are:
• On account-schema:
account-number → branch-name
account-number → balance
• On depositor-schema:
No functional dependencies

1.7.2.2 Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are other functional dependencies
that are logically implied by F. The set of all functional dependencies logically implied
by F is called the closure of F, denoted by F+.
Axioms or rules of inference provide a simpler technique for reasoning about functional
dependencies. In the rules that follow, we use Greek letters for sets of attributes, and
uppercase Roman letters from the beginning of the alphabet for individual attributes. We use
αβ to denote α U β.
We can use the following three rules to find implied functional dependencies. By
applying these rules repeatedly, we can find all of F+, given F. This collection of rules is
called Armstrong’s axioms in honor of the person who first proposed it.
• Reflexivity rule. If α is a set of attributes and β ⊆ α, then
α → β holds.
• Augmentation rule. If α → β holds and γ is a set of attributes, then
γα → γβ holds.
• Transitivity rule. If α → β holds and β→ γ holds, then α → γ holds.
Armstrong’s axioms are sound, because they do not generate any incorrect
functional dependencies. They are complete, because for a given set F of functional
dependencies, they allow us to generate all F+.
Although Armstrong’s axioms are complete, it is tiresome to use them directly
for the computation of F+. To simplify matters further, we list additional rules. It is
possible to use Armstrong’s axioms to prove that these rules are correct.
• Union rule. If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule. If α → β holds and γβ→ δ holds, then αγ→ δ
holds.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A→ B, A→ C, CG→ H, CG→ I, B→ H}. We list several
members of F+ here.
• A→ H. Since A→ B and B→ H hold, we apply the transitivity rule. Observe
that it was much easier to use Armstrong’s axioms to show that A→ H
holds than it was to argue directly from the definitions, as we did earlier
in this section.
• CG→ HI. Since CG→ H and CG→ I, the union rule implies that CG→ HI.
• AG→ I. Since A→ C and CG→ I, the pseudotransitivity rule implies that
AG→ I holds.
Another way of finding that AG→ I holds is as follows. We use the augmentation
rule on A→ C to infer AG→ CG. Applying the transitivity rule to this dependency and
CG→ I, we infer AG→ I.
Figure below shows a procedure that demonstrates formally how to use
Armstrong’s axioms to compute F+. In this procedure, when a functional dependency is
added to F+, it may be already present, and in that case there is no change to F+. We
will also see an alternative way of computing F+ in the next section.
F+ := F
repeat
    for each functional dependency f in F+
        apply the reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
The left-hand and right-hand sides of a functional dependency are both subsets
of R. Since a set of size n has 2^n subsets, there are a total of 2^n × 2^n = 2^(2n) possible
functional dependencies, where n is the number of attributes in R. Each iteration of
the repeat loop of the procedure, except the last iteration, adds at least one functional
dependency to F+. Thus, the procedure is guaranteed to terminate.
1.7.2.3 Closure of Attribute Sets
To test whether a set α is a super-key, we must devise an algorithm for computing
the set of attributes functionally determined by α. One way of doing this is to compute F+,
take all functional dependencies with α as the left-hand side, and take the union of the
right-hand sides of all such dependencies. However doing so can be expensive, since F+ can
be large.
An efficient algorithm for computing the set of attributes functionally determined by
α is useful not only for testing whether α is a super-key, but also for several other tasks, as
we will see later in this section.
Let α be a set of attributes. We call the set of all attributes functionally
determined by α under a set F of functional dependencies the closure of α under F; we
denote it by α+. Figure below shows an algorithm written in pseudocode to compute α +.
The input is a set F of functional dependencies and the set α of attributes. The output
is stored in the variable result.
result := α
while (changes to result) do
    for each functional dependency β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the
functional dependencies defined in preceding section. We start with result = AG. The
first time that we execute the while loop to test functional dependency, we find that
• A → B causes us to include B in result. To see why, we observe that A → B is in F
and A ⊆ result (which is AG), so result := result ∪ B.
• A→ C causes result to become ABCG.
• CG→ H causes result to become ABCGH.
• CG→ I causes result to become ABCGHI.
The second time that we execute the while loop, no new attributes are added to
result, and the algorithm terminates.
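A minimal Python sketch of the attribute-closure algorithm, using the set F from the preceding section; attributes are single-character strings and each dependency is a (left side, right side) pair of sets:

F = [({"A"}, {"B"}), ({"A"}, {"C"}),
     ({"C", "G"}, {"H"}), ({"C", "G"}, {"I"}), ({"B"}, {"H"})]

def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:                      # repeat until no new attributes appear
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma         # beta -> gamma adds gamma to the closure
                changed = True
    return result

print(sorted(closure({"A", "G"}, F)))   # ['A', 'B', 'C', 'G', 'H', 'I']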
Let us see why the algorithm of the figure above is correct. The first step is correct, since α → α
always holds (by the reflexivity rule). We claim that, for any subset β of result, α → β holds. Since we
start the while loop with α → result being true, we can add γ to result only if β ⊆ result and β → γ
holds. But then result → β by the reflexivity rule, so α → β by transitivity. Another application of
transitivity shows that α → γ (using α → β and β → γ). The union rule implies that α → result ∪ γ,
so α functionally determines any new result generated in the while loop. Thus, any attribute
returned by the algorithm is in α+.
It is easy to see that the algorithm finds all of α+. If there is an attribute in α+ that is not
yet in result, then there must be a functional dependency β → γ for which β ⊆ result and at
least one attribute in γ is not in result.
It turns out that, in the worst case, this algorithm may take an amount of time
quadratic in the size of F.
There are several uses of the attribute closure algorithm:
• To test if α is a super-key, we compute α+, and check if α+ contains all
attributes of R.
• We can check whether a functional dependency α → β holds (or, in other words,
is in F+) by checking whether β ⊆ α+. That is, we compute α+ by using attribute
closure and then check whether it contains β. This test is particularly useful, as
we will see later in this chapter.
• It gives us an alternative way to compute F+: for each γ ⊆ R, we find the
closure γ+, and for each S ⊆ γ+, we output a functional dependency
γ → S.
1.7.3 Decomposition
A bad database design suggests that we should decompose a relation schema that has
many attributes into several schemas with fewer attributes. Careless decomposition, however,
may lead to another form of bad design.
Consider an alternative design in which we decompose Lending-schema into the
following two-schemas:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Using the lending relation described in the “Problems arising out of bad database
design” section (Figure W), which we will discuss later, we construct our new relations
branch-customer (on Branch-customer-schema) and customer-loan (on Customer-loan-schema):
branch-customer = Π branch-name, branch-city, assets, customer-name (lending)
customer-loan = Π customer-name, loan-number, amount (lending)
Figures X and Y respectively show the resulting branch-customer and customer-loan
relations.
Branch-name Branch-city Assets Customer-name
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Mianus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
Pownal Bennington 300000 Williams
North Town Rye 3700000 Hayes
Downtown Brooklyn 9000000 Johnson
Perryridge Horseneck 1700000 Glenn
Brighton Brooklyn 7100000 Brooks
Figure X: The relation branch-customer
Customer-name Loan-number Amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-14 1500
Jones L-93 500
Turner L-11 900
Williams L-29 1200
Hayes L-16 1300
Johnson L-18 2000
Glenn L-25 2500
Brooks L-10 2200
Figure Y: The relation customer-loan
Figure Z below shows the result of computing branch-customer ⋈ customer-loan.
Branch-name Branch-city Assets Customer-name Loan-number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Perryridge Horseneck 1700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-17 1000
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-15 1500
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
When we compare this relation and the lending relation with which we started
(Figure W), we notice a difference: although every tuple that appears in the lending
relation appears in branch-customer ⋈ customer-loan, there are tuples in branch-
customer ⋈ customer-loan that are not in lending. In our example, branch-customer
⋈ customer-loan has the following additional tuples:
(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)
Consider the query “Find all bank branches that have made a loan in an
amount less than $1000.” If we look back at Figure W, we see that the only branches
with loan amounts less than $1000 are Mianus and Round Hill. However, when we
apply the expression
Π branch-name (σ amount < 1000 (branch-customer ⋈ customer-loan))
we obtain three branch names: Mianus, Round Hill and Downtown.
A closer examination of this example shows why. If a customer happens to have
several loans from different branches, we cannot tell which loan belongs to which
branch. Thus, when we join branch-customer and customer-loan, we obtain not only the
tuples we had originally in lending, but also several additional tuples. Although we
have more tuples in branch-customer ⋈ customer-loan, we actually have less information.
We are no longer able, in general, to represent in the database information about which
customers are borrowers from which branch. Because of this loss of information, we
call the decomposition of Lending-schema into Branch-customer-schema and Customer-
loan-schema a lossy decomposition, or a lossy-join decomposition. A decomposition
that is not a lossy join decomposition is a lossless join decomposition. It should be
clear from our example that a lossy-join decomposition is, in general a bad database
design.
Why is the decomposition lossy? There is one attribute in common between
Branch-customer-schema and Customer-loan-schema:
Branch-customer-schema ∩ Customer-loan-schema = {customer-name}
The only way that we can represent a relationship between, for example, loan number
and branch-name is through customer-name. This representation is not adequate because a
customer may have several loans, yet these loans are not necessarily obtained from the same
branch.
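A minimal Python sketch of this lossy join, assuming a simplified lending relation that keeps only branch-name, customer-name and loan-number; joining the two projections back together produces exactly the kind of spurious tuples described above:

lending = {
    ("Downtown", "Jones", "L-17"),   # (branch-name, customer-name, loan-number)
    ("Mianus",   "Jones", "L-93"),
}
branch_customer = {(b, c) for b, c, l in lending}   # project out loan-number
customer_loan   = {(c, l) for b, c, l in lending}   # project out branch-name

# Natural join over the only common attribute, customer-name:
rejoined = {(b, c, l)
            for b, c in branch_customer
            for c2, l in customer_loan if c == c2}

print(rejoined - lending)
# {('Downtown', 'Jones', 'L-93'), ('Mianus', 'Jones', 'L-17')} -- spurious tuples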
Let us consider another alternative design, in which we decompose Lending-
schema into the following two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
There is one attribute in common between these two schemas:
Branch-schema ∩ Loan-info-schema = {branch-name}
Thus, the only way that we can represent a relationship between, for example,
customer-name and assets is through branch-name. The difference between this example
and the preceding one is that the assets of a branch are the same, regardless of the
customer to which we are referring, whereas the lending branch associated with a
certain loan amount does depend on the customer to which we are referring. For a
given branch-name, there is exactly one assets value and exactly one branch-city;
whereas a similar statement cannot be made for customer-name. That is the functional
dependency
Branch-name→ {assets, branch-city}
holds, but customer-name does not functionally determine loan-number.
The notion of lossless joins is central to much of relational database design.
Therefore, we restate the preceding examples more concisely and more formally. Let R
be a relation schema. A set of relation schemas {R1, R2, … , Rn} is a decomposition of
R if
R = R1 ∪ R2 ∪ … ∪ Rn
That is, {R1, R2, … , Rn} is a decomposition of R if, for i = 1, 2, …, n, each Ri is a subset
of R, and every attribute in R appears in at least one Ri.
Let r be a relation on schema R, and let ri = ΠRi (r) for i = 1, 2, …, n. That is, {r1, r2,
… , rn} is the database that results from decomposing r into {R1, R2, …, Rn}. It is
always the case that
r ⊆ r1 ⋈ r2 ⋈ … ⋈ rn
To see that this assertion is true, consider a tuple t in relation r. When we compute
the relations r1, r2, …, rn, the tuple t gives rise to one tuple ti in each ri, i = 1, 2, …, n. These n
tuples combine to regenerate t when we compute r1 ⋈ r2 ⋈ … ⋈ rn. The details are left for you
to complete as an exercise. Therefore, every tuple in r appears in r1 ⋈ r2 ⋈ … ⋈ rn.
In general, r ≠ r1 ⋈ r2 ⋈ … ⋈ rn. As an illustration, consider our earlier example, in
which
• n=2
• R = Lending-schema.
• R1 = Branch-customer-schema.
• R2 = customer-loan-schema.
• r = the relation shown in Figure W.
• r1 = the relation shown in Figure X.
• r2 = the relation shown in Figure Y.
• r1 ⋈ r2 = the relation shown in Figure Z.
Note that the relations in Figure W and Z are not the same.
To have a lossless-join decomposition, we need to impose constraints on the set
of possible relations. We found that the decomposition of Lending-schema into Branch-
schema and Loan-info-schema is lossless because the functional dependency
branch-name → {branch-city, assets}
holds on Branch-schema. We say that a relation is legal if it satisfies all rules, or
constraints, that we impose on our database.
Let C represent a set of constraints on the database and let R be a relation
schema. A decomposition {R1, R2…….Rn} of R is a lossless join decomposition if for all
relations r on schema R that are legal under C,
r = ΠR1 (r) ⋈ ΠR2 (r) ⋈ … ⋈ ΠRn (r)
Self-Check Exercise-II
Q3. What is transitive closure?
Ans…................................................................................................................
..…..................................................................................................................
…...................................................................................................................
Q4. Which algebraic notation is used to list tuples?
Ans…................................................................................................................
….....................................................................................................................
….....................................................................................................................
1.7.3.1 Desirable Properties of Decomposition
We can use a given set of functional dependencies in designing a relational
database in which most of the undesirable properties discussed in section 1.7.4 do not
occur. When we design such systems, it may become necessary to decompose a relation
into several relations.
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-
number, amount)
The set F of functional dependencies that we require to hold on Lending-schema
is:
branch-name → {branch-city, assets}
loan-number → {amount, branch-name}
Lending-schema is an example of a bad database design. Assume that we
decompose it to the following three relations:
Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
We claim that this decomposition has several desirable properties, which we
discuss next.
Lossless-join decomposition
When we decompose a relation into a number of smaller relations, it is crucial
that the decomposition be lossless. We must first present a criterion for determining
whether a decomposition is lossy.
Let R be a relation schema, and let F be a set of functional dependencies on R.
Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join
decomposition of R if at least one of the following functional dependencies is in F+.
o R1 ∩ R2 → R1
o R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a super-key of either R1 or R2, the decomposition
of R is a lossless-join decomposition. We can use attribute closure to efficiently test for
super-keys, as we have seen earlier.
We now demonstrate that our decomposition of Lending-schema is a lossless-
join decomposition. We first decompose Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since branch-name → {branch-city, assets}, the augmentation rule for
functional dependencies implies that
branch-name → {branch-name, branch-city, assets}
Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our
initial decomposition is a lossless-join decomposition.
Next we decompose loan-info-schema into
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This step results in a lossless-join decomposition, since loan-number is a
common attribute and loan-number → {amount, branch-name}.
For the general case of decomposition of a relation into multiple parts at once
the test for lossless join decomposition is more complicated.
While the test for binary decomposition is clearly a sufficient condition for
lossless join, it is a necessary condition only if all constraints are functional
dependencies.
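A minimal Python sketch of this binary lossless-join test, reusing the attribute-closure function sketched earlier; the schema and dependency names are those of the Lending-schema example:

def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma
                changed = True
    return result

def is_lossless(r1, r2, fds):
    # {R1, R2} is lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+,
    # i.e. the closure of the common attributes covers one of the schemas.
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c

branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
F = [({"branch-name"}, {"branch-city", "assets"}),
     ({"loan-number"}, {"amount", "branch-name"})]
print(is_lossless(branch, loan_info, F))  # True: branch-name is a key of Branch-schema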
1.7.3.2 Dependency Preservation
There is another goal in relational database design: dependency preservation.
When an update is made to the database, the system should be able to check that the
update will not create an illegal relation-that is, one that does not satisfy all the given
functional dependencies. If we are to check updates efficiently, we should design
relational-database schemas that allow update validation without the computation of
joins.
To decide whether joins must be computed to check an update, we need to determine
what functional dependencies can be tested by checking each relation individually. Let F be
a set of functional dependencies on a schema R and let R1, R2…..Rn be a decomposition of R.
The restriction of F to Ri is the set Fi of all functional dependencies in F+ that include only
attributes of Ri. Since all functional dependencies in a restriction involve satisfaction of only
one relation schema, it is possible to test such a dependency for satisfaction by checking
only one relation.
Note that the definition of restriction uses all dependencies in F+, not just those in
F. For instance, suppose F= {A→B, B→C} and we have a decomposition into AC and AB.
The restriction of F to AC is then A→C, since A→C is in F+ even though it is not in F.
The set of restrictions F1, F2, …, Fn is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the restrictions is sufficient. Let F' =
F1 ∪ F2 ∪ … ∪ Fn. F' is a set of functional dependencies on schema R, but, in general, F'
≠ F. However, even if F' ≠ F, it may be that F'+ = F+. If the latter is true, then every
dependency in F is logically implied by F', and, if we verify that F' is satisfied, we have
verified that F is satisfied. We say that a decomposition having the property F'+ = F+ is a
dependency-preserving decomposition.
Figure V shows an algorithm for testing dependency preservation. The input is a set
D = {R1, R2, …, Rn} of decomposed relation schemas and a set F of functional dependencies.
This algorithm is expensive since it requires computation of F+; we will describe another
algorithm that is more efficient after giving an example of testing for dependency preservation.
compute F+;
for each schema Ri in D do
begin
    Fi := the restriction of F+ to Ri;
end
F' := Φ
for each restriction Fi do
begin
    F' := F' ∪ Fi
end
compute F'+;
if (F'+ = F+) then return (true)
else return (false);
Figure V: Testing for dependency preservation
The following test, applied to each functional dependency α → β in F, checks
whether the dependency is preserved by the decomposition without computing F+:

result := α
repeat
    for each Ri in the decomposition
        t := (result ∩ Ri)+ ∩ Ri
        result := result ∪ t
until (result does not change)

If result contains all attributes in β, then α → β is preserved; the decomposition is
dependency preserving if and only if this test succeeds for every dependency in F.
Note that instead of precomputing the restriction of F on Ri and using it for
computing the attribute closure of result, we use attribute closure on (result ∩ Ri) with
respect to F, and then intersect it with Ri, to get an equivalent result. This procedure
takes polynomial time, instead of the exponential time required to compute F+.
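A minimal Python sketch of this polynomial-time test; the helper closure is the attribute-closure function sketched earlier, and the example dependencies F = {A→B, B→C} with the decomposition into AB and AC are the ones used in the restriction discussion above:

def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma
                changed = True
    return result

def is_preserved(alpha, beta, decomposition, fds):
    # Grow result using attribute closure on (result ∩ Ri), intersected with
    # Ri, until it stabilizes; alpha -> beta is preserved iff beta ⊆ result.
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return beta <= result

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
D1 = [{"A", "B"}, {"B", "C"}]                          # decomposition of R = ABC
print(all(is_preserved(a, b, D1, F) for a, b in F))    # True: preserving
D2 = [{"A", "B"}, {"A", "C"}]
print(is_preserved({"B"}, {"C"}, D2, F))               # False: B -> C is lost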
1.7.3.3 Repetition of Information
The decomposition of Lending-schema does not suffer from the problem of
repetition of information that we discuss in the section on bad database design. In
Lending-schema, it was necessary to repeat the city and assets of a branch for each
loan. The decomposition separates branch and loan data into distinct relations,
thereby eliminating this redundancy. Similarly, observe that, if a single loan is made to
several customers, we must repeat the amount of the loan once for each customer (as
well as the city and assets of the branch) in Lending-schema. In the decomposition, the
relation on schema Borrower-schema contains the loan-number, customer-name
relationship, and no other schema does. Therefore, we have one tuple for each
customer of a loan in only the relation on Borrower-schema. In the other relations
involving loan-number (those on schemas Loan-schema and Borrower-schema), only one
tuple per loan needs to appear.
1.7.4 Problems arising out of bad database design (Pitfalls in Relational-Database
design)
Let us look at what can go wrong in a bad database design. Among the
undesirable properties that a bad design may have are:
• Repetition of information
• Inability to represent certain information.
We shall discuss these problems with the help of a modified database design for
our banking example: Suppose the information concerning loans is kept in one single
relation, lending which is defined over the relation schema
Lending-schema = (branch-name, branch-city, assets, customer-
name, loan-number, amount)
Figure below shows an instance of the relation lending (Lending-schema). A tuple t in
the lending relation has the following intuitive meaning:
• t[assets] is the asset figure for the branch named t[branch-name]
• t[branch-city] is the city which the branch named t[branch-name] is
located
• t[loan-number] is the number assigned to a loan made by the branch
named t[branch-name] to the customer named t[customer-name]
• t[amount] is the amount of the loan whose number is t[loan-number]
Suppose that we wish to add a new loan to our database. Say that the loan is
made by the Perryridge branch to Adams in the amount of $1500. Let the loan-number
be L-31. In our design, we need a tuple with values on all the attributes of Lending
schema. Thus we must repeat the asset and city data for the Perryridge branch, and
must add the tuple
(Perryridge, Horseneck, 1700000, Adams, L-31, 1500)
to the lending relation. In general, the asset and city data for a branch must appear
once for each loan made by that branch.
Branch-name Branch-city Assets Customer-name Loan-number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
Figure W: Sample lending relation
The repetition of information in our alternative design is undesirable. Repeating
information wastes space. Furthermore it complicates updating the database. Suppose
for example, that the assets of the Perryridge branch change from 1700000 to
1900000. Under our original design, one tuple of the branch relation needs to be
changed. Under our alternative design, many tuples of the lending relation need to be
changed. Thus updates are more costly under the alternative design than under the
original design. When we perform the update in the alternative design database, we
must ensure that every tuple pertaining to the Perryridge branch is updated, or else
our database will show two different asset values for the Perryridge branch.
That observation is central to understanding why the alternative design is bad.
We know that a bank branch has a unique value of assets, so given a branch name we
can uniquely identify the assets value. On the other hand, we know that a branch may
make many loans, so given a branch name, we cannot uniquely determine a loan
number. In other words, we say that the functional dependency.
branch-name → assets
Holds on Lending-schema, but we do not expect the functional dependency
branch-name → loan- number to hold. The fact that a branch has particular value of
assets and the fact that a branch makes a loan are independent, and, as we have seen,
these facts are best represented in separate relations. We shall see that we can use
functional dependencies to specify formally when a database design is good.
Another problem with the Lending-schema design is that we cannot represent
directly the information concerning a branch (branch-name, branch-city, assets)
unless there exists at least one loan at the branch. This is because tuples in the
lending relation require values for loan-number, amount and customer-name.
One solution to this problem is to introduce null values, as we did to handle updates
through views. However, null values are difficult to handle. If we are not willing to deal with null
values, then we can create the branch information only when the first loan application at that
branch is made. Worse, we would have to delete this information when all the loans have been
paid. Clearly, this situation is undesirable, since, under our original database design, the branch
information would be available regardless of whether or not loans are currently maintained at
the branch, and without resorting to null values.
1.7.5 Summary
Functional dependencies play a key role in differentiating good database designs
from bad ones. An attribute Y of a relation R is said to be functionally dependent
upon an attribute X of R if and only if each value of X in R has associated with it
exactly one value of Y in R at any given time. This is written X → Y, where X is
known as the determinant and Y as the dependent. Using the concept of functional
dependencies, we decompose relations. A bad database design suggests that we
should decompose a relation schema that has many attributes into several schemas
with fewer attributes. Careless decomposition, however, may lead to another form of
bad design. When we decompose a relation into a number of smaller relations, it is
crucial that the decomposition be lossless.
1.7.6 Keywords
Determinant: In a functional dependency X -> Y, X is referred to as the determinant.
The determinant uniquely determines the values of the dependent attribute Y.
Dependent: In a functional dependency X -> Y, Y is referred to as the dependent. The
values of Y depend on the values of the determinant X.
Transitivity: It refers to a property of functional dependencies. It describes how
dependencies can be inferred or propagated through a set of functional dependencies.
1.7.7 Short Answer Type Questions
Q1. Decompose relation employee(ID,name,street,Credit,street,city,salary) .
Q2. How can we combine the two functional dependencies:
a) A->BC b) A->B
Q3. How many superkeys are in the given relation R(A,B,C,D,E) with the
following functional dependencies:
a) ABC -> DE and b) D -> AB
1.7.8 Long Answer Type Questions
Q1. What do you mean by Functional Dependency? What is its importance in
Database design? Explain with example.
Q2. Why do we need decomposition? What is its purpose? What are the steps
involved in decomposing relations?
Q3. What are the various problems that arise due to bad database design?
1.7.9 Suggested Readings:
• Bipin C. Desai, An introduction to Database System, Galgotia Publication, New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database
Systems, Addison Wesley.
Relational Database Management Systems
NORMALIZATION
1.8.0 Objectives
1.8.1 Introduction
1.8.2 Normalization
1.8.3 First Normal Form
1.8.4 Second Normal Form
1.8.5 Third Normal Form
1.8.6 Boyce-Codd Normal Form
1.8.7 Multi-valued Dependency
1.8.8 Fourth Normal Form
1.8.9 Join Dependencies and Fifth Normal Form
1.8.10 Database Design Process
1.8.11 Summary
1.8.12 Keywords
1.8.13 Short Answer Type Questions
1.8.14 Long Answer Type Questions
1.8.15 Suggested Readings
1.8.0 Objectives
After completing this lesson, you will be able to:
• Define Normalization, its need and importance
• Various types of Normal Forms
• Define Multivalued Functional Dependency
• Understand database design process
1.8.1 Introduction:
In this lesson, we will discuss the normalization process and define the first
three normal forms for relation schemas. The definitions of second and third normal
form presented here are based on the functional dependencies and primary keys of a
relation schema. More general definitions of these normal forms, which take into
account all candidate keys of a relation rather than just the primary key, are also
presented. We also define Boyce-Codd Normal Form (BCNF), and further normal
forms that are based on other types of data dependencies. We first informally discuss
what normal forms are and what the motivation behind their development was. We
then present first normal form (1NF). Then we present definitions of second normal
form (2NF) and third normal form (3NF) respectively that are based on primary keys.
Then we will proceed for multivalued dependency and further the fourth and fifth
Normal Forms that are based on MVDs. In the last we will discuss about the
database design process.
1.8.2 Normalization
The normalization process as first proposed by Codd (1972) takes a relation
schema through a series of tests to “certify” whether or not it belongs to a certain
normal form. Initially Codd proposed three normal forms, which he called first,
second and third normal form. A stronger definition of 3NF was proposed later by
Boyce and Codd and is known as Boyce-Codd normal form. All these normal forms
are based on the functional dependencies among the attributes of a relation. Later
fourth normal form (4NF) and a fifth normal forms (5NF) were proposed, based on the
concepts of multi-valued dependencies and join dependencies, respectively.
Normalization of data can be looked on as a process during which unsatisfactory
relation schemas are decomposed by breaking up their attributes into smaller
relation schemas that possess desirable properties. The normalization process
provides database designers with:
• A formal framework for analyzing relation schemas based on their keys and on
the functional dependencies among their attributes.
• A series of tests that can be carried out on individual relation schemas
so that the relational database can be normalized to any degree. When
a test fails, the relation violating that test must be decomposed into
relations that individually meet the normalization tests.
• To free relations from undesirable insertion, deletion and update
anomalies.
Normal forms, when considered in isolation from other factors, do not guarantee a
good database design. It is generally not sufficient to check separately that each relation
schema in the database is, say, in BCNF or 3NF. Rather, the process of normalization
through decomposition must also confirm the existence of additional properties that the
relation schemas, taken together, should possess. Two of these properties are:
• The lossless-join or nonadditive-join property, which guarantees that
the spurious tuple problem does not occur
• The dependency preservation property, which ensures that all
functional dependencies are represented in some of the individual
resulting relations.
In this section we concentrate on an intuitive discussion of the normalization
process. Notice that the normal forms mentioned in this section are not the only possible
ones. Additional normal forms may be defined to meet other desirable criteria, based on
additional types of constraints. The normal forms up to BCNF are defined by considering
only the functional dependency and key constraints, whereas 4NF considers an additional
constraint called a multi-valued dependency and 5NF considers an additional constraint
called a join dependency. The practical utility of normal forms becomes questionable when
the constraints on which they are based are hard to understand or to detect by the
database designers and users who must discover these constraints.
Another point worth noting is that the database designers need not normalize
to the highest possible normal form. Relations may be left in lower normal forms for
performance reasons.
Before proceeding further, we recall the definitions of keys of a relation
schema. A super key of a relation schema R = {A1, A2, …, An} is a set of
attributes S ⊆ R with the property that no two tuples t1 and t2 in any
legal relation state r of R will have t1[S] = t2[S]. A key K is a super-key with the
additional property that removal of any attribute from K will cause K not to be a
super-key any more. The difference between a key and a super key is that a key has to
be “minimal”; that is, if we have a key K = {A1, A2, …, Ak}, then K – {Ai} is not a key for
1 <= i <= k. In the figure given below, {SSN} is a key for EMPLOYEE, whereas {SSN}, {SSN,
ENAME}, {SSN, ENAME, BDATE} etc. are all super keys.
EMPLOYEE
ENAME SSN BDATE ADDRESS DNUMBER
(SSN is the primary key; DNUMBER is a foreign key)
If a relation schema has more than one “minimal” key, each is called a
candidate key. One of the candidate keys is arbitrarily designated to be the primary
key. In the figure above, {SSN} is the only candidate key for EMPLOYEE, so it is also the
primary key.
An attribute of relation schema R is called a prime attribute of R if it is a
member of any key of R. An attribute is called nonprime if it is not a prime attribute-
that is, if it is not a member of any candidate key.
We now present the first three normal forms: 1NF, 2NF and 3NF. These were
proposed by Codd (1972) as a sequence to ultimately achieve the desirable state of
3NF relations by progressing through the intermediate states of 1NF and 2NF if
needed.
1.8.3 First Normal Form (1 NF)
First normal form is now considered to be part of the formal definition of a
relation; historically, it was defined to disallow multi-valued attributes, composite
attributes, and their combinations. It states that the domains of attributes must
include only atomic (simple, indivisible) values and that the value of any
attribute in a tuple must be a single value from the domain of that attribute.
Hence, 1NF disallows having a set of values, a tuple of values or a combination of
both as an attribute value for a single tuple. In other words, 1NF disallows “relations
within relations” or “relations as attributes of tuples”. The only attribute values
permitted by 1NF are single atomic (or indivisible) values.
Consider the DEPARTMENT relation schema shown in following figure, whose
primary key is DNUMBER, and suppose that we extend it by including the
DLOCATIONS attribute shown within dotted lines. We assume that each department
can have a number of locations. The DEPARTMENT schema and example extension
are shown in Figures that follow. As we can see, this is not in 1NF because
DLOCATIONS is not an atomic attribute, as illustrated by the first tuple in Figure (b).
There are two ways we can look at the DLOCATIONS attribute:
• The domain of DLOCATIONS contains atomic values, but some tuples
can have a set of these values. In this case, DNUMBER does not
functionally determine DLOCATIONS.
• The domain of DLOCATIONS contains sets of values and hence is
nonatomic. In this case, DNUMBER → DLOCATIONS, because each set is
considered a single member of the attribute domain. (In this case we can
consider the domain of DLOCATIONS to be the power set of the set of single
locations; that is, the domain is made up of all possible subsets of the set of single
locations.)
DEPARTMENT
DNAME DNUMBER DMGRSSN DLOCATIONS

Figure showing normalization into 1NF: (a) a relation schema that is not in 1NF;
(b) an example relation instance; (c) the 1NF relation with redundancy.
In either case, the DEPARTMENT relation of the figures above is not in 1NF; in
fact, it does not even qualify as a relation. We break up its attributes into the two
relations DEPARTMENT and DEPT_LOCATIONS shown here:
DEPARTMENT (DNAME, DNUMBER, DMGRSSN)
DEPT_LOCATIONS (DNUMBER, DLOCATION)
The idea is to remove the attribute DLOCATIONS that violates 1NF and place
it in a separate relation DEPT_LOCATIONS along with the primary key DNUMBER of
DEPARTMENT. The primary key of this relation is the combination {DNUMBER,
DLOCATION}, as shown in Figure above. A distinct tuple in DEPT_LOCATIONS exists
for each location of a department. The DLOCATIONS attribute is removed from the
DEPARTMENT relation of the figure showing the normalization into 1NF, decomposing
the non-1NF relation into the two 1NF relations DEPARTMENT and DEPT_LOCATIONS
of the figure above.
Notice that a second way to normalize into 1NF is to have a tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in Figure (c). In
this case, the primary key becomes the combination {DNUMBER, DLOCATION}, and
redundancy exists in the tuples. The first solution is superior because it does not suffer
from this redundancy problem. In fact, if we choose the second solution, it will be
decomposed further during subsequent normalization steps into the first solution.
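A minimal Python sketch of the first (preferred) 1NF normalization; the department data values below are illustrative only:

# Move the multi-valued DLOCATIONS attribute into a separate relation
# keyed by {DNUMBER, DLOCATION}.
department_non1nf = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},   # illustrative data
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": ["Stafford"]},
]

department = [{k: t[k] for k in ("DNAME", "DNUMBER", "DMGRSSN")}
              for t in department_non1nf]
dept_locations = [{"DNUMBER": t["DNUMBER"], "DLOCATION": loc}
                  for t in department_non1nf for loc in t["DLOCATIONS"]]

print(department)       # every attribute value is now atomic
print(dept_locations)   # one tuple per (department, location) pair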
The first normal form also disallows composite attributes that are themselves
multi-valued. These are called nested relations because each tuple can have a
relation within it. Figure A below shows how an EMP_PROJ relation can be shown if
nesting is allowed. Each tuple represents an employee entity, and a relation PROJS
(PNUMBER, HOURS} within each tuple represents the employee’s projects and the
hours per week that the employee works on each project. The schema of the
EMP_PROJ relation can be represented as follows:
EMP_PROJ (SSN, ENAME, {PROJS (PNUMBER, HOURS)})
Self-Check Exercise-I
Q1: What are the challenges in non-normalized data handling?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
Q2. Who proposed the concept of Normalization?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
The set braces {} identify the attribute PROJS as multi-valued, and we list the component
attributes that form PROJS between parentheses (). Interestingly, recent research into the
relational model is attempting to allow and formalize nested relations, which were
disallowed early on by 1NF.
Notice that SSN is the primary key of the EMP_PROJ relation in Figure A(a)
and (b), while PNUMBER is the partial primary key of each nested relation; that is,
within each tuple, the nested relation attributes into a new relation and propagate
the primary key into; the primary key of the new relation will combine the partial key
with the primary key of the original relation. Decomposition and primary key
propagation yield the schemas shown in Figure A(c).
Here is Figure A:
a) EMP_PROJ (SSN, ENAME, PROJS {PNUMBER, HOURS}) – the nested relation schema
b) EMP_PROJ – an example instance with nested PROJS tuples
c) EMP_PROJ1 (SSN, ENAME)
EMP_PROJ2 (SSN, PNUMBER, HOURS)
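Assuming simple column types, the decomposition in Figure A(c) could be declared as follows (a sketch):

create table EMP_PROJ1
(SSN char(9),
ENAME varchar(30),
primary key (SSN))

create table EMP_PROJ2
(SSN char(9),
PNUMBER integer,
HOURS numeric(4,1),
primary key (SSN, PNUMBER),
foreign key (SSN) references EMP_PROJ1)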
It should be noted that restricting relations to 1NF leads to the problems associated with multi-valued dependencies and 4NF, discussed later in this lesson.
1.8.4 Second Normal Form (2NF)
Second normal form is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X, (X − {A}) *→ Y (the dependency no longer holds). A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y. In Figure B below, {SSN, PNUMBER} → HOURS is a full dependency (neither SSN → HOURS nor PNUMBER → HOURS holds). However, the dependency {SSN, PNUMBER} → ENAME is partial because SSN → ENAME holds.
EMP_PROJ (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
fd1: {SSN, PNUMBER} → HOURS
fd2: SSN → ENAME
fd3: PNUMBER → {PNAME, PLOCATION}
Figure B: the EMP_PROJ relation and its functional dependencies.
A relation schema R is in 2NF if every non-key attribute of R is fully functionally dependent on the primary key of R. EMP_PROJ is not in 2NF, because ENAME, PNAME, and PLOCATION are only partially dependent on the primary key {SSN, PNUMBER}. 2NF normalization decomposes EMP_PROJ into three relations, in each of which the non-key attributes are fully dependent on the primary key:
EP1 (SSN, PNUMBER, HOURS) – fd1
EP2 (SSN, ENAME) – fd2
EP3 (PNUMBER, PNAME, PLOCATION) – fd3
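A sketch of this 2NF decomposition in SQL, with column types assumed:

create table EP2
(SSN char(9),
ENAME varchar(30),
primary key (SSN))

create table EP3
(PNUMBER integer,
PNAME varchar(20),
PLOCATION varchar(20),
primary key (PNUMBER))

create table EP1
(SSN char(9),
PNUMBER integer,
HOURS numeric(4,1),
primary key (SSN, PNUMBER),
foreign key (SSN) references EP2,
foreign key (PNUMBER) references EP3)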
1.8.5 Third Normal Form (3NF)
Third normal form is based on the concept of transitive dependency: a relation schema R is in 3NF if it is in 2NF and no non-key attribute of R is transitively dependent on the primary key. Consider the relation EMP_DEPT (ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN), in which SSN → DNUMBER and DNUMBER → {DNAME, DMGRSSN} hold, so DNAME and DMGRSSN are transitively dependent on SSN via DNUMBER.
3NF normalization decomposes EMP_DEPT into:
ED1 (ENAME, SSN, BDATE, ADDRESS, DNUMBER)
ED2 (DNUMBER, DNAME, DMGRSSN)
Intuitively, we see that ED1 and ED2 represent independent entity facts about employees and departments. A NATURAL JOIN operation on ED1 and ED2 will recover the original relation EMP_DEPT without generating spurious tuples.
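A sketch of the 3NF decomposition in SQL, with column types assumed:

create table ED2
(DNUMBER integer,
DNAME varchar(20),
DMGRSSN char(9),
primary key (DNUMBER))

create table ED1
(ENAME varchar(30),
SSN char(9),
BDATE date,
ADDRESS varchar(40),
DNUMBER integer,
primary key (SSN),
foreign key (DNUMBER) references ED2)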
1.8.6 Boyce-Codd Normal Form (BCNF)
Boyce-Codd normal form is stricter than 3NF, meaning that every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. Intuitively, we can see the need for a stronger normal form than 3NF by going back to the LOTS relation schema of the figure below, with its four functional dependencies fd1 through fd4.
LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)
fd1: PROPERTY_ID# → {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE}
fd2: {COUNTY_NAME, LOT#} → {PROPERTY_ID#, AREA, PRICE, TAX_RATE}
fd3: COUNTY_NAME → TAX_RATE
fd4: AREA → PRICE
Suppose that we have thousands of lots in the relation but the lots are from only two counties, Marion County and Liberty County. Suppose also that lot sizes in Marion County are only 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 acres, and lot sizes in Liberty County are restricted to 1.1, 1.2, …, 1.9, and 2.0 acres. In such a situation we
should have the additional functional dependency fd5: AREA → COUNTY_NAME. If we add this to the other dependencies, the relation schema LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA) is still in 3NF, because COUNTY_NAME is a prime attribute.
The area versus county relationship represented by fd5 can be represented by only 16 tuples in a separate relation R(AREA, COUNTY_NAME), since there are only 16 possible AREA values. This representation reduces the redundancy of repeating the same information in the thousands of LOTS1A tuples. BCNF is a stronger normal form that would disallow LOTS1A and suggest the need for decomposing it.
The definition of BCNF differs slightly from the definition of 3NF. A relation schema R is in BCNF if whenever a functional dependency X → A holds in R, then X is a super-key of R. The only difference between BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime if X is not a super-key, is absent from BCNF.
In our example, fd5 violates BCNF in LOTS1A because AREA is not a super-key of LOTS1A. Note that fd5 satisfies 3NF in LOTS1A because COUNTY_NAME is a prime attribute (condition (b)), but this condition does not exist in the definition of BCNF. We can decompose LOTS1A into two BCNF relations, LOTS1AX and LOTS1AY, shown in Figure C(a).
In practice, most relation schemas that are in 3NF are also in BCNF. Only if a dependency X → A exists in a relation schema R, with X not a super-key and A a prime attribute, will R be in 3NF but not in BCNF. The relation schema R shown in Figure C(b) illustrates the general case of such a relation.
It is best to have relation schemas in BCNF; if that is not possible, 3NF will do. However, 2NF and 1NF are not considered good relation schema designs. These normal forms were developed historically as stepping stones to 3NF and BCNF.
Here is Figure C:
(a) LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA)
fd1: PROPERTY_ID# → {COUNTY_NAME, LOT#, AREA}
fd2: {COUNTY_NAME, LOT#} → {PROPERTY_ID#, AREA}
fd5: AREA → COUNTY_NAME
BCNF normalization decomposes LOTS1A into:
LOTS1AX (PROPERTY_ID#, AREA, LOT#)
LOTS1AY (AREA, COUNTY_NAME)
(b) R (A, B, C)
fd1: {A, B} → C
fd2: C → B
R is in 3NF but not in BCNF, because C is not a super-key while B is a prime attribute.
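A sketch of the BCNF decomposition of LOTS1A in SQL; the # characters are dropped from the column names for valid SQL, and the column types are assumed:

create table LOTS1AY
(AREA numeric(3,1),
COUNTY_NAME varchar(20),
primary key (AREA))

create table LOTS1AX
(PROPERTY_ID char(10),
AREA numeric(3,1),
LOT integer,
primary key (PROPERTY_ID),
foreign key (AREA) references LOTS1AY)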
Self-Check Exercise-II
Q3. How do you identify which normal form a given relation is in?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
Q4. Does normalization reduce the number of tables?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
Figure D:
(a) EMP (ENAME, PNAME, DNAME)
(b) EMP_PROJECTS (ENAME, PNAME) and EMP_DEPENDENTS (ENAME, DNAME)
(c) SUPPLY (SNAME, PARTNAME, PROJNAME)
(d) The three projections of SUPPLY:
R1 (SNAME, PARTNAME): (Smith, Bolt), (Smith, Nut), (Adamsky, Bolt), (Walton, Nut), (Adamsky, Nail)
R2 (SNAME, PROJNAME): (Smith, ProjX), (Smith, ProjY), (Adamsky, ProjY), (Walton, ProjZ), (Adamsky, ProjX)
R3 (PARTNAME, PROJNAME): (Bolt, ProjX), (Nut, ProjY), (Bolt, ProjY), (Nut, ProjZ), (Nail, ProjX)
1.8.7 Multi-valued Dependencies (MVDs)
A multi-valued dependency X →→ Y, specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation instance r of R: if two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties:
• t3[X] = t4[X] = t1[X] = t2[X]
• t3[Y] = t1[Y] and t4[Y] = t2[Y]
• t3[R − (XY)] = t2[R − (XY)] and t4[R − (XY)] = t1[R − (XY)]
Whenever X →→ Y holds, we say that X multi-determines Y. Because of the symmetry in the definition, whenever X →→ Y holds in R, so does X →→ (R − (XY)). Recall that R − (XY) is the same as R − (X ∪ Y); call it Z. Hence X →→ Y implies X →→ Z, and the pair is therefore sometimes written as X →→ Y/Z.
The formal definition specifies that, given a particular value of X, the set of values of Y determined by this value of X is completely determined by X alone and does not depend on the values of the remaining attributes Z of the relation schema R. Hence, whenever two tuples exist that have distinct values of Y but the same value of X, these values of Y must be repeated with every distinct value of Z that occurs with that same value of X. This informally corresponds to Y being a multi-valued attribute of the entities represented by the tuples in R.
In Figure D(a), the MVDs ENAME →→ PNAME and ENAME →→ DNAME, or ENAME →→ PNAME/DNAME, hold in the EMP relation. The employee with ENAME 'Smith' works on the projects with PNAME 'X' and 'Y' and has two dependents with DNAME 'John' and 'Anna'. If we store only the first two tuples in EMP (<'Smith', 'X', 'John'> and <'Smith', 'Y', 'Anna'>), we must also store the tuples <'Smith', 'X', 'Anna'> and <'Smith', 'Y', 'John'> to show that {'X', 'Y'} and {'John', 'Anna'} are associated only with 'Smith'; that is, there is no association between PNAME and DNAME.
An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R. For example, the relation EMP_PROJECTS in Figure D(b) has the trivial MVD ENAME →→ PNAME. An MVD that satisfies neither (a) nor (b) is called a nontrivial MVD. A trivial MVD will hold in any relation instance r of R; it is called trivial because it does not specify any constraint on R.
If we have a nontrivial MVD in a relation, we may have to repeat values redundantly in the tuples. In the EMP relation of Figure D(a), the values 'X' and 'Y' of PNAME are repeated with each value of DNAME (or, by symmetry, the values 'John' and 'Anna' of DNAME are repeated with each value of PNAME). This redundancy is clearly undesirable. However, the EMP schema is in BCNF because no functional dependencies hold in EMP. Therefore, we need to define a fourth normal form that is stronger than BCNF and disallows relation schemas such as EMP. We first discuss some of the properties of MVDs and consider how they are related to functional dependencies.
1.8.8 Fourth Normal Form (4NF)
We now present the definition of 4NF, which is violated when a relation has undesirable multi-valued dependencies and hence can be used to identify and decompose such relations. A relation schema R is in 4NF with respect to a set of dependencies F if, for every nontrivial multi-valued dependency X →→ Y in F+, X is a super-key for R.
The EMP relation of Figure D(a) is not in 4NF because in the nontrivial MVDs ENAME →→ PNAME and ENAME →→ DNAME, ENAME is not a super-key of EMP. We
decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, shown in Figure D(b). Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because ENAME →→ PNAME is a trivial MVD in EMP_PROJECTS and ENAME →→ DNAME is a trivial MVD in EMP_DEPENDENTS. In fact, no nontrivial MVDs hold in either EMP_PROJECTS or EMP_DEPENDENTS. No FDs hold in these relation schemas either.
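A sketch of the 4NF decomposition in SQL; both relations are all-key, and the column types are assumed:

create table EMP_PROJECTS
(ENAME varchar(30),
PNAME varchar(20),
primary key (ENAME, PNAME))

create table EMP_DEPENDENTS
(ENAME varchar(30),
DNAME varchar(20),
primary key (ENAME, DNAME))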
To illustrate why it is important to keep relations in 4NF, Figure E(a) shows the EMP relation with an additional employee, Brown, who has three dependents ('Jim', 'Joan', and 'Bob') and works on four different projects ('W', 'X', 'Y', and 'Z'). There are 16 tuples in EMP in Figure E(a). If we decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, as shown in Figure E(b), we need to store a total of only 11 tuples in the two relations. Moreover, these tuples are much smaller than the tuples in EMP. In addition, the update anomalies associated with multi-valued dependencies are avoided. For example, if Brown starts working on another project, we must insert three tuples in EMP – one for each dependent. If we forget to insert any one of those, the relation becomes inconsistent in that it incorrectly implies a relationship between projects and dependents. However, only a single tuple need be inserted in the 4NF relation EMP_PROJECTS. Similar problems occur with deletion and modification anomalies if a relation is not in 4NF.
The EMP relation in Figure D(a) is not in 4NF because it represents two independent 1:N relationships: one between employees and the projects they work on, and the other between employees and their dependents. We sometimes have a relationship among three entities that depends on all three participating entities, such as the SUPPLY relation shown in Figure D(c) (consider only the tuples in Figure D(c) above the dotted line for now). In this case a tuple represents a supplier supplying a specific part to a particular project, so there are no nontrivial MVDs. The SUPPLY relation is already in 4NF and should not be decomposed. Notice that relations containing nontrivial MVDs tend to be all-key relations; that is, their key is all their attributes taken together.
Figure E: (a) the EMP relation after adding employee Brown (16 tuples in all); (b) the 4NF relations EMP_PROJECTS (ENAME, PNAME) and EMP_DEPENDENTS (ENAME, DNAME), holding only 11 tuples in total.
1.8.9 Join Dependencies and Fifth Normal Form
We saw that LJ1 and LJ1' give the condition for a relation schema R to be decomposed into two schemas R1 and R2 such that the decomposition has the lossless join property. However, in some cases there may be no lossless join decomposition into two relation schemas, yet there may be a lossless join decomposition into more than two relation schemas. These cases are handled by the join dependency and fifth normal form. It is important to note that such cases occur very rarely and are difficult to detect in practice.
A join dependency (JD), denoted by JD(R1, R2, …, Rn) and specified on relation schema R, specifies a constraint on instances r of R. The constraint states that every legal instance r of R should have a lossless join decomposition into R1, R2, …, Rn; that is,
*(ΠR1(r), ΠR2(r), …, ΠRn(r)) = r
Notice that an MVD is a special case of a JD where n = 2. A join dependency JD(R1, R2, …, Rn) specified on relation schema R is trivial if one of the relation schemas Ri in JD(R1, R2, …, Rn) is equal to R. Such a dependency is called trivial because it has the lossless join property for any relation instance r of R and hence does not specify any constraint on R. We can now define fifth normal form, which is also called project-join normal form. A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multi-valued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a super-key of R.
For an example, consider once again the SUPPLY relation of Figure D(c). With no additional constraints, it does not have a lossless decomposition into any number of smaller tables. Suppose, however, that the following additional constraint always holds: whenever a supplier s supplies part p, and a project j uses part p, and the supplier s supplies at least one part to project j, then supplier s will also be supplying part p to project j. This constraint can be restated in other ways, and it specifies a join dependency JD(R1, R2, R3) among the three projections R1 (SNAME, PARTNAME), R2 (SNAME, PROJNAME), and R3 (PARTNAME, PROJNAME) of SUPPLY. If this constraint holds, the tuples below the dotted line in Figure D(c) must exist in any legal instance of the SUPPLY relation, and SUPPLY with the join dependency is decomposed into the three relations R1, R2, and R3, each of which is in 5NF. Notice that applying NATURAL JOIN to any two of these relations produces spurious tuples, but applying NATURAL JOIN to all three together does not. The reader should verify this on the example relation of Figure D(c) and its projections in Figure D(d). This is because only the JD exists but no MVDs are specified. Notice too that the JD(R1, R2, R3) is specified on all legal relation instances, not just on the one shown in Figure D(c).
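The three-way join can be checked in SQL; a sketch, assuming the projection tables R1, R2, and R3 have been populated as in Figure D(d):

select distinct R1.SNAME, R1.PARTNAME, R2.PROJNAME
from R1, R2, R3
where R1.SNAME = R2.SNAME
and R1.PARTNAME = R3.PARTNAME
and R2.PROJNAME = R3.PROJNAME

This query computes the natural join of all three projections and returns exactly the tuples of SUPPLY; joining any two of the tables alone would produce spurious tuples.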
Discovering JDs in practical databases with hundreds of attributes is difficult; hence, current practice of database design pays scant attention to them.
1.8.10 Overall Database Design Process
In normalization we have assumed that we have a schema R and proceeded to normalize it. There are several ways in which we could have come up with the schema R:
• R could have been generated when converting an E-R diagram to a set of tables.
• R could have been a single relation containing all the attributes that are of interest. The normalization process breaks up R into smaller relations.
• R could have been the result of some ad hoc design of relations, which we then test to verify that it satisfies a desired normal form.
Now we examine the implications of these approaches, and also the practical issues in database design, including de-normalization for performance and examples of bad database design not detected by normalization.
E-R Model and Normalization
If we carefully define an E-R diagram, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization. However, there can be functional dependencies between the attributes of an entity. For instance, suppose an employee entity had attributes department-number and department-address, and there is a functional dependency department-number → department-address. We would then need to normalize the relation generated from employee.
Most examples of such dependencies arise out of poor E-R diagram design. In the above example, if we had drawn the E-R diagram correctly, we would have created a department entity with attribute department-address and a relationship between employee and department. Similarly, a relationship involving more than two entities may not be in a desirable normal form; since most relationships are binary, such cases are relatively rare. (In fact, some E-R diagram variants actually make it difficult or impossible to specify non-binary relationships.)
Functional dependencies can help us detect poor E-R design. If the generated relations are not in the desired normal form, the problem can be fixed in the E-R diagram. That is, normalization can be done formally as part of data modeling. Alternatively, normalization can be left to the designer's intuition during E-R modeling and can be done formally on the relations generated from the E-R model.
De-normalization for Performance
Occasionally database designers choose a schema that has redundant information; that is, it is not normalized. They use the redundancy to improve performance for specific applications. The penalty paid for not using a normalized schema is the extra work (in terms of coding time and execution time) needed to keep redundant data consistent.
For instance, suppose that the name of an account holder has to be displayed
along with the account number and balance every time the account is accessed. In
our normalized schema, this requires a join of account with depositor.
One alternative to computing the join on the fly is to store a relation containing all the attributes of account and depositor. This makes displaying the account information faster. However, the balance information for an account is repeated for every person who owns the account, and all copies must be updated by the application whenever the account balance is updated. The process of taking a normalized schema and making it non-normalized is called de-normalization, and designers use it to tune the performance of systems to support time-critical operations.
A better alternative, supported by many database systems today, is to use the normalized schema and additionally store the join of account and depositor as a materialized view. (Recall that a materialized view is a view whose result is stored in the database and brought up to date when the relations used in the view are updated.) Like de-normalization, using materialized views does have space and time overheads; however, it has the advantage that keeping the view up to date is the job of the database system, not the application programmer.
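In the book's notation, such a view might be declared as follows; this is a sketch, and the exact materialized-view syntax varies from one database system to another:

create materialized view account-info as
select depositor.customer-name, account.account-number, account.balance
from depositor, account
where depositor.account-number = account.account-number

Queries can then read account-info directly, while the system keeps it consistent with account and depositor.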
Other Design Issues
There are some aspects of database design that are not addressed by normalization and can thus lead to bad database design. We give examples here; obviously, such designs should be avoided.
Consider a company database, where we want to store the earnings of companies in different years. A relation earnings (company-id, year, amount) could be used to store the earnings information. The only functional dependency on this relation is {company-id, year} → amount, and the relation is in BCNF.
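A sketch of this design in SQL, with assumed column types:

create table earnings
(company-id char(10),
year integer,
amount numeric(12,2),
primary key (company-id, year))

A new year requires no schema change; for example, the earnings of a (hypothetical) company 'C1' for 2001 are retrieved with:

select amount from earnings where company-id = 'C1' and year = 2001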
An alternative design is to use multiple relations, each storing the earnings for a different year. Let us say the years of interest are 2000, 2001, and 2002; we would then have relations of the form earnings-2000, earnings-2001, and earnings-2002, all of which are on the schema (company-id, earnings). The only functional dependency on each relation would be company-id → earnings, so these relations are also in BCNF.
However, this alternative design is clearly a bad idea – we would have to create a new relation every year, and would also have to write new queries every year to take each new relation into account. Queries would also be more complicated, since they may have to refer to many relations.
Yet another way of representing the same data is to have a single relation company-year (company-id, earnings-2000, earnings-2001, earnings-2002). Here the only functional dependencies are from company-id to the other attributes, and again the relation is in BCNF. This design is also a bad idea, since it has problems similar to the previous design – the schema and the queries have to be changed every year. Queries would also be more complicated, since they may have to refer to many attributes.
Representations such as the company-year relation, with one column for each value of an attribute, are called crosstabs; they are widely used in spreadsheets and reports and in data-analysis tools. While such representations are useful for display to users, for the reasons just given they are not desirable in a database design. SQL extensions have been proposed to convert data from a normal relational representation to a crosstab for display.
1.8.11 Summary
Normalization is a design technique that is widely used as a guide in designing relational databases. It is a process of decomposing a relation into relations with fewer attributes, minimizing the redundancy of data and minimizing insertion, deletion, and updation anomalies. It may be defined as a step-by-step reversible process of transforming an unnormalized relation into relations with progressively simpler structures. A relation is in first normal form if all the attribute values are atomic and non-decomposable. A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key. A relation is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key. A relation is in BCNF if and only if every determinant is a candidate key. A relation is in 4NF if it is in BCNF and it contains no nontrivial multi-valued dependencies. And finally, a relation is in 5NF, or project-join normal form, if it cannot be further decomposed losslessly into any number of smaller relations.
1.8.12 Keywords
Multi-valued dependency: a concept in relational database theory that extends the idea of functional dependencies. While functional dependencies express relationships between attributes within a single tuple (row), multi-valued dependencies describe relationships between sets of attributes across multiple tuples.
Lossless decomposition: a property in database normalization, specifically in the context of decomposing a relation (table) into multiple smaller relations while preserving the ability to reconstruct the original relation through a join operation.
1.8.13 Short Answer Type Questions
Q1. Does every relation having two attributes satisfy Boyce-Codd Normal Form? If yes, justify your answer, giving a suitable example.
Q2. What do you mean by Normalization? Why is there a need for normalization?
Q3. Define Join Dependency with an example.
Q4. Define Multi-valued Dependency, giving an example.
1.8.14 Long Answer Type Questions
Q1. Explain First, Second, and Third Normal Forms with the help of examples.
Q2. Explain Boyce-Codd Normal Form with an example. How is it different from Third Normal Form?
Q3. Explain Fourth Normal Form with an example.
Q4. Explain Fifth Normal Form and Join Dependency, using a suitable example.
1.8.15 Suggested Readings
• Bipin C. Desai, An introduction to Database System, Galgotia
Publication, New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database
Systems, Addison Wesley.
Relational Database Management Systems
1.9.0 Objectives
1.9.1 Introduction
1.9.2 Database Integrity
1.9.3 Domain Constraints
1.9.4 Referential Integrity
1.9.4.1 Referential Integrity and ER Model
1.9.4.2 Database Modification
1.9.4.3 Referential Integrity in SQL
1.9.5 Assertions
1.9.6 Database Recovery
1.9.7 ACID Properties
1.9.8 System Recovery
1.9.9 Summary
1.9.10 Keywords
1.9.11 Short Answer Type Questions
1.9.12 Long Answer Type Questions
1.9.13 Suggested Readings
1.9.0 Objectives
After completing this lesson, you will be able to:
• Understand database Integrity
• Understand database recovery
1.9.1 Introduction
After designing the database, we need to take measures to protect it. Protecting the database means taking care of database integrity, and in the coming sections we will study the various methods for maintaining the integrity of the database. Database protection also includes data recovery: if the database gets corrupted for some reason, such as a hard disk failure, how do we recover it?
1.9.2 Database Integrity
The term integrity refers to the correctness or accuracy of data in the database. Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus integrity constraints guard against accidental damage to the database. We have already seen two forms of integrity constraints:
• Key declarations – the stipulation that certain attributes form a candidate key for a given entity set.
• Form of a relationship – many-to-many, one-to-many, one-to-one.
In general, an integrity constraint can be an arbitrary predicate pertaining to the database. However, arbitrary predicates may be costly to test, so we concentrate on integrity constraints that can be tested with minimal overhead. In addition to protecting against the accidental introduction of inconsistency, the data stored in the database needs to be protected from unauthorized access and malicious destruction or alteration.
1.9.3 Domain Constraints
Domain constraints are the most elementary form of integrity constraint: declaring an attribute to be of a particular domain constrains the values it can take. For instance, if amounts in different currencies are declared as distinct domains such as Dollars and Pounds, an attempt to compare or assign a Dollars value to a Pounds variable can be flagged by the system as a probable programmer error, where the programmer forgot about the differences in currency.
Declaring different domains for different currencies helps catch such errors.
Values of one domain can be cast (that is, converted) to another domain. If the attribute A in relation r is of type Dollars, we can convert it to Pounds by writing
cast r.A as Pounds
In a real application we would of course multiply r.A by a currency conversion factor before casting it to Pounds. SQL also provides drop domain and alter domain clauses to drop or modify domains that have been created earlier.
The check clause in SQL permits domains to be restricted in powerful ways that most programming-language type systems do not permit. Specifically, the check clause permits the schema designer to specify a predicate that must be satisfied by any value assigned to a variable whose type is the domain. For instance, a check clause can ensure that an hourly-wage domain allows only values at least as large as a specified value (such as the minimum wage):
create domain HourlyWage numeric(5,2)
constraint wage-value-test check (value >=4.00)
The domain HourlyWage has a constraint that ensures that the hourly wage is greater than or equal to 4.00. The clause constraint wage-value-test is optional and is used to give the name wage-value-test to the constraint. The name can be used to indicate which constraint an update violated.
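For example, with a hypothetical table W that has a column of type HourlyWage, an offending insert would be rejected and the violation reported under the constraint's name:

create table W (wage HourlyWage)
insert into W values (3.50) -- rejected: violates wage-value-test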
The check clause can also be used to restrict a domain so that it does not contain any null values:
create domain AccountNumber char(10)
constraint account-number-test check (value not null)
As another example, the domain can be restricted to contain only a specified set of values by using the in clause:
create domain AccountType char(10)
constraint account-type-test
check (value in ('Checking', 'Saving'))
The preceding check conditions can be tested quite easily when a tuple is inserted or modified. However, in general, check conditions can be more complex (and harder to check), since subqueries that refer to other relations are permitted in the check condition. For example, the following constraint could be specified on the relation deposit:
check (branch-name in (select branch-name from branch))
The check condition verifies that the branch-name in each tuple in the deposit relation is actually the name of a branch in the branch relation. Thus the condition has to be checked not only when a tuple is inserted or modified in deposit, but also when the relation branch changes (in this case, when a tuple is deleted or modified in relation branch).
The preceding constraint is actually an example of a class of constraints called
referential-integrity constraints.
Complex check conditions can be useful when we want to ensure integrity of
data but we should use them with care, since they may be costly to test.
1.9.4 Referential Integrity
Often, we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. This condition is called referential integrity.
Basic Concepts
Consider a pair of relations r(R) and s(S), and the natural join r ⋈ s. There may be a tuple tr in r that does not join with any tuple in s; that is, there is no ts in s such that tr[R ∩ S] = ts[R ∩ S]. Such tuples are called dangling tuples. Depending on the entity set or relationship set being modeled, dangling tuples may or may not be acceptable.
Suppose there is a tuple t1 in the account relation with t1[branch-name] = "Lunartown", but there is no tuple in the branch relation for the Lunartown branch. This situation would be undesirable. We expect the branch relation to list all bank branches; therefore, tuple t1 would refer to an account at a branch that does not exist. Clearly we would like to have an integrity constraint that prohibits dangling tuples of this sort.
Not all instances of dangling tuples are undesirable, however. Assume that there is a tuple t2 in the branch relation with t2[branch-name] = "Mokan", but there is no tuple in the account relation for the Mokan branch. In this case a branch exists that has no accounts. Although this situation is not common, it may arise when a branch is opened or is about to close. Thus we do not want to prohibit this situation.
The distinction between these two examples arises from two facts:
• The attribute branch-name in the Account schema is a foreign key referencing the primary key of the Branch schema.
• The attribute branch-name in the Branch schema is not a foreign key. (Recall that a foreign key is a set of attributes in a relation schema that forms a primary key for another schema.)
In the Lunartown example, tuple t1 in account has a value on the foreign key branch-
name that does not appear in branch. In the Mokan-branch example tuple t2 in branch has a
value on branch-name that does not appear in account, but branch-name is not a foreign key.
Thus the distinction between our two examples of dangling tuples is the presence of a foreign
key.
Let r1(R1) and r2(R2) be relations with primary keys K1 and K2, respectively. We say that a subset α of R2 is a foreign key referencing K1 in relation r1 if it is required that for every tuple t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[α]. Requirements of this form are called referential integrity constraints, or subset dependencies. The latter term arises because the preceding referential-integrity constraint can be written as Πα(r2) ⊆ ΠK1(r1). Note that for a referential-integrity constraint to make sense, either α must be equal to K1, or α and K1 must be compatible sets of attributes.
Self-Check Exercise-I
Q1. What are dangling tuples?
Ans…................................................................................................................
…......................................................................................................................
…......................................................................................................................
Q2. What is check clause?
Ans…................................................................................................................
…......................................................................................................................
…......................................................................................................................
1.9.4.1 Referential Integrity and ER Model
Figure: a relationship set R among entity sets E1, E2, …, En−1, En. The relation schema derived from R includes the primary-key attributes of each Ei, and each of these attribute sets forms a foreign key referencing the relation derived from the corresponding entity set.
1.9.4.2 Database Modification
Database modifications can cause violations of referential integrity. We list here the test that the system must perform for each type of modification, given a foreign key α in r2 referencing the primary key K1 of r1:
• Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K1] = t2[α]; that is,
t2[α] ∈ ΠK1(r1)
• Delete. If a tuple t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1:
σα = t1[K1](r2)
If this set is not empty, either the delete command is rejected as an error, or the tuples that reference t1 must themselves be deleted. The latter solution may lead to cascading deletions, since tuples may reference tuples that reference t1, and so on.
• Update. We must consider two cases for update: updates to the referencing relation (r2) and updates to the referenced relation (r1).
❑ If a tuple t2 is updated in relation r2 and the update modifies values of the foreign key α, then a test similar to the insert case is made. Let t2′ denote the new value of tuple t2. The system must ensure that
t2′[α] ∈ ΠK1(r1)
❑ If a tuple t1 is updated in r1 and the update modifies values of the primary key K1, then a test similar to the delete case is made. The system must compute
σα = t1[K1](r2)
using the old value of t1 (the value before the update is applied). If this set is not empty, either the update is rejected as an error, or the update is cascaded in a manner similar to delete.
1.9.4.3 Referential Integrity in SQL
Because of an on delete cascade clause associated with a foreign-key declaration, if a delete of a tuple in branch results in this referential-integrity constraint being violated, the system does not reject the delete. Instead, the delete "cascades" to the account relation, deleting the tuples that refer to the branch tuple that was deleted. Similarly, the system does not reject an update to a field referenced by the constraint even if it violates the constraint; instead, the system updates the field branch-name of the referencing tuples in account to the new value as well. SQL also allows the foreign key clause to specify actions other than cascade if the constraint is violated: the referencing field (here, branch-name) can be set to null (by using set null in place of cascade), or to the default value for the domain (by using set default).
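Such a cascading foreign key might be declared as follows; this is a sketch, a variant of the account definition given later in this section with cascading actions added:

create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
foreign key (branch-name) references branch
on delete cascade
on update cascade,
check (balance >= 0))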
If there is a chain of foreign-key dependencies across multiple relations, a deletion or update at one end of the chain can propagate across the entire chain. An interesting case, where the foreign-key constraint on a relation references the same relation, appears in the exercises. If a cascading update or delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the transaction. As a result, all the changes caused by the transaction and its cascading actions are undone.
create table customer
(customer-name char(20),
customer-street char(30),
customer-city char(30),
primary key (customer-name))

create table branch
(branch-name char(15),
branch-street char(30),
assets integer,
primary key (branch-name),
check (assets >= 0))

create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
foreign key (branch-name) references branch,
check (balance >= 0))

create table depositor
(customer-name char(20),
account-number char(10),
primary key (customer-name, account-number),
foreign key (customer-name) references customer,
foreign key (account-number) references account)
Null values complicate the semantics of referential-integrity constraints in SQL. Attributes of foreign keys are allowed to be null, provided that they have not otherwise been declared to be non-null. If all the columns of a foreign key are non-null in a given tuple, the usual definition of the foreign-key constraint is used for that tuple. If any of the foreign-key columns is null, the tuple is defined automatically to satisfy the constraint.
This definition may not always be the right choice, so SQL also provides constructs that allow you to change the behavior with null values; we do not discuss these constructs here. To avoid such complexity, it is best to ensure that all columns of a foreign-key specification are declared to be non-null.
Transactions may consist of several steps, and integrity may be violated temporarily after one step, but a later step may remove the violation. For instance, suppose we have a relation married-person with primary key name and an attribute spouse, and suppose that spouse is a foreign key on married-person. That is, the constraint says that the spouse attribute must contain a name that is present in the married-person table. Suppose we wish to note the fact that John and Mary are married to each other by inserting two tuples, one for John and one for Mary, into the above relation. The insertion of the first tuple would violate the foreign-key constraint, regardless of which of the two tuples is inserted first. After the second tuple is inserted, the foreign-key constraint would hold again.
To handle such situations, integrity constraints can be checked at the end of a transaction rather than at intermediate steps.
1.9.5 Assertions
An assertion is a predicate expressing a condition that we wish the database always to satisfy. Domain constraints and referential-integrity constraints are special forms of assertions. We have paid substantial attention to these forms of assertions because they are easily tested and apply to a wide range of database applications. However, there are many constraints that we cannot express by using only these special forms. Two examples of such constraints are:
• The sum of all loan amounts for each branch must be less than the sum of all account balances at the branch.
• Every loan has at least one customer who maintains an account with a minimum balance of $1000.00.
An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
Here is how the first of the two example constraints can be written. Since SQL does not provide a "for all X, P(X)" construct (where P is a predicate), we are forced to implement the constraint by the equivalent "not exists X such that not P(X)" construct, which can be written in SQL. We write:
create assertion sum-constraint check
(not exists (select * from branch
where (select sum (amount) from loan
where loan.branch-name = branch.branch-name)
>= (select sum (balance) from account
where account.branch-name = branch.branch-name)))
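The second constraint can be written along the same lines; a sketch, assuming a loan relation with key loan-number and a borrower relation linking customers to loans, as in the standard banking schema:

create assertion balance-constraint check
(not exists (select * from loan
where not exists (select * from borrower, depositor, account
where loan.loan-number = borrower.loan-number
and borrower.customer-name = depositor.customer-name
and depositor.account-number = account.account-number
and account.balance >= 1000)))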
1.9.6 Database Recovery
Recovery in a database system means, primarily, recovering the database itself: restoring the database to a state that is known to be correct (or, rather, consistent) after some failure has rendered the current state inconsistent. The fundamental unit of recovery is the transaction. Consider the following pseudocode, which adds a new shipment of 1000 units of part P1 by supplier S5 and updates the corresponding total quantity TOTQTY for that part:
BEGIN TRANSACTION ;
INSERT INTO SP
RELATION { TUPLE { S# S# ('S5'),
P# P# ('P1'),
QTY QTY (1000) } } ;
IF any error occurred THEN GO TO UNDO ; END IF ;
UPDATE P WHERE P# = P# ('P1')
{ TOTQTY := TOTQTY + 1000 } ;
IF any error occurred THEN GO TO UNDO ; END IF ;
COMMIT ;
GO TO FINISH ;
UNDO :
ROLLBACK ;
FINISH :
RETURN ;
The point of the example is that what is presumably intended to be a single atomic operation – "add a new shipment" – in fact involves two updates to the database: one INSERT operation and one UPDATE operation. What is more, the database is not even consistent between those two updates; it temporarily violates the constraint that the value of TOTQTY for part P1 is supposed to be equal to the sum of all QTY values for part P1. Thus a logical unit of work (i.e., a transaction) is not necessarily just a single database operation; rather, it is in general a sequence of several such operations that transforms a consistent state of the database into another consistent state, without necessarily preserving consistency at all intermediate points.
Now, it is clear that what must not be allowed to happen in the example is for one of the updates to be executed and the other not, because that would leave the database in an inconsistent state. Ideally, of course, we would like a cast-iron guarantee that both updates will be executed. Unfortunately, it is impossible to provide such a guarantee – there is always a chance that things will go wrong, and go wrong, moreover, at the worst possible moment. For example, a system crash might occur between the INSERT and the UPDATE, or an arithmetic overflow might occur on the UPDATE. But a system that supports transaction management does provide the next best thing to such a guarantee. Specifically, it guarantees that if the transaction executes some updates and then a failure occurs (for whatever reason) before the transaction reaches its planned termination, then those updates will be undone. Thus the transaction either executes in its entirety or is totally canceled, i.e., made as if it never executed at all. In this way, a sequence of operations that is fundamentally not atomic can be made to look as if it were atomic from an external point of view.
The system component that provides this atomicity – or semblance of atomicity – is known as the transaction manager (also known as the transaction processing monitor or TP monitor), and the COMMIT and ROLLBACK operations are the key to the way it works:
• The COMMIT operation signals successful end-of-transaction; it tells the transaction manager that a logical unit of work has been successfully completed, the database is (or should be) in a consistent state again, and all of the updates made by that unit of work can now be committed, or made permanent.
• By contrast, the ROLLBACK operation signals unsuccessful end-of-transaction; it tells the transaction manager that something has gone wrong, the database might be in an inconsistent state, and all of the updates made by the logical unit of work so far must be rolled back, or undone.
In the example, therefore, we issue a COMMIT if we get through the two updates successfully, which will commit the changes in the database and make them permanent. If anything goes wrong, however – i.e., if either of the updates raises an error condition – then we issue a ROLLBACK instead, to undo any changes made so far. Note: even if we were to issue a COMMIT instead, the system should in principle check the database integrity constraints, detect the fact that the database is inconsistent, and force a ROLLBACK anyway. However, we cannot assume that the system is aware of all pertinent constraints, and so the user-issued ROLLBACK is necessary. In any case, commercial DBMSs did not do very much COMMIT-time integrity checking at the time of writing.
Incidentally, we should point out that a realistic application will not only update the database (or attempt to) but will also send some kind of message back to the end user indicating what has happened. In the example, we might send the message "shipment added" if the COMMIT is reached, or the message "error – shipment not added" otherwise. Message handling, in turn, has additional implications for recovery.
Note: At this juncture you might be wondering how it is possible to undo an update. The answer, of course, is that the system maintains a log or journal, on tape or (more commonly) disk, on which details of all updates – in particular, before- and after-images of the updated objects – are recorded. Thus, if it becomes necessary to undo some particular update, the system can use the corresponding log entry to restore the updated object to its previous value.
(Actually, the foregoing paragraph is somewhat oversimplified. In practice the log will consist of two portions, an active or online portion and an archive or offline portion. The online portion is used during normal system operation to record details of updates as they are performed and is normally held on disk. When the online portion becomes full, its contents are transferred to the offline portion, which – because it is always processed sequentially – can be held on tape.)
One further point: the system must guarantee that individual statements are themselves atomic (all or nothing). This consideration becomes particularly significant in relational systems, where statements are set-level and typically operate on many tuples at a time; it must not be possible for such a statement to fail in the middle and leave the database in an inconsistent state (e.g., with some tuples updated and some not). In other words, if an error does occur in the middle of such a statement, then the database must remain totally unchanged.
• Transaction Recovery
A transaction begins with successful execution of a BEGIN TRANSACTION statement, and it ends with successful execution of either a COMMIT or a ROLLBACK statement. COMMIT establishes what is called a commit point (also known, especially in commercial products, as a synchpoint). A commit point thus corresponds to the end of a logical unit of work, and hence to a point at which the database is, or should be, in a consistent state. ROLLBACK, by contrast, rolls the database back to the state it was in at BEGIN TRANSACTION, which effectively means back to the previous commit point. (The phrase "the previous commit point" is still accurate even in the case of the first transaction in the program, if we agree to think of the first BEGIN TRANSACTION in the program as tacitly establishing an initial "commit point".)
Note: Throughout this section the term "database" really means just that portion of the database being accessed by the transaction under consideration. Other transactions might be executing in parallel with that transaction and making changes to their own portions, and so "the total database" might not be in a fully consistent state at a commit point. However, ignoring this possibility does not materially affect the issue at hand.
When a commit point is established:
1. All updates made by the executing program since the previous commit point are committed; that is, they are made permanent. Prior to the commit point, all such updates should be regarded as tentative only – tentative in the sense that they might subsequently be undone (i.e., rolled back). Once committed, an update is guaranteed never to be undone (this is the definition of "committed").
2. All database positioning is lost and all tuple locks are released. "Database positioning" here refers to the idea that at any given time an executing program will typically have addressability to certain tuples (e.g., via certain cursors in the case of SQL); this addressability is lost at a commit point. "Tuple locks" are explained in the next chapter. Note that some systems do provide an option by which the program might in fact be able to retain addressability to certain tuples (and therefore retain certain tuple locks) from one transaction to the next.
Point 2 here – excluding the remark about possibly retaining some addressability and hence possibly retaining certain tuple locks – also applies if a transaction terminates with ROLLBACK instead of COMMIT. Point 1, of course, does not.
Self-Check Exercise-II
Q3. What is transaction recovery?
Ans…..................................................................................................................
…........................................................................................................................
…........................................................................................................................
Q4. What is the difference between ROLLBACK and COMMIT?
Ans…..................................................................................................................
…........................................................................................................................
…........................................................................................................................
Figure: a single program execution as a sequence of transactions (1st, 2nd, 3rd, …), each bounded by BEGIN TRANSACTION and a terminating COMMIT or ROLLBACK.
Note carefully that COMMIT and ROLLBACK terminate the transaction, not the program. In general, a single program execution will consist of a sequence of several transactions running one after another, as illustrated in the figure above. Now let us return to the example of the previous section. In that example we included explicit tests for errors and issued an explicit ROLLBACK if any error was detected. But of course the system cannot assume that application programs will always include explicit tests for all possible errors. Therefore the system will issue an implicit ROLLBACK for any transaction that fails, for any reason, to reach its planned termination (where "planned termination" means either an explicit COMMIT or an explicit ROLLBACK).
We can now see, therefore, that transactions are not only the unit of work but also the unit of recovery. For if a transaction successfully commits, then the system will guarantee that its updates will be permanently installed in the database, even if the system crashes the very next moment. It is quite possible, for instance, that the
system might crash after the COMMIT has been honored but before the updates have been physically written to the database – they might still be waiting in a main-memory buffer and so be lost at the time of the crash. Even if that happens, the system's restart procedure will still install those updates in the database; it is able to discover the values to be written by examining the relevant entries in the log. (It follows that the log must be physically written before COMMIT processing can complete – the write-ahead log rule.) Thus the restart procedure will recover any transactions that completed successfully but did not manage to get their updates physically written prior to the crash; hence, as stated earlier, transactions are indeed the unit of recovery.
Note: As we will see in the next chapter, transactions are also the unit of concurrency. Further, since they are supposed to transform a consistent state of the database into another consistent state, they can also be regarded as a unit of integrity.
1.9.7 The ACID Properties
Transactions have four important properties: atomicity, consistency, isolation, and durability (referred to colloquially as "the ACID properties").
• Atomicity: Transactions are atomic (all or nothing).
• Consistency: Transactions preserve database consistency. That is, a transaction transforms a consistent state of the database into another consistent state, without necessarily preserving consistency at all intermediate points.
• Isolation: Transactions are isolated from one another. That is, even though in general there will be many transactions running concurrently, any given transaction's updates are concealed from all the rest until that transaction commits. Another way of saying the same thing is that, for any two distinct transactions T1 and T2, T1 might see T2's updates (after T2 has committed) or T2 might see T1's updates (after T1 has committed), but certainly not both.
• Durability: Once a transaction commits, its updates survive in the database, even if there is a subsequent system crash.
1.9.8 System Recovery
The system must be prepared to recover not only from purely local failures, such as the occurrence of an overflow condition within an individual transaction, but also from "global" failures, such as a power outage. A local failure, by definition, affects only the transaction in which the failure has actually occurred. A global failure, by contrast, affects all of the transactions in progress at the time of the failure and hence has significant system-wide implications. In this section and the next we briefly consider what is involved in recovering from a global failure. Such failures fall into two categories:
• System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
• Media failures (e.g., a head crash on a disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.
The key point regarding system failure is that the contents of main memory are lost (in particular, the database buffers are lost). The precise state of any transaction that was in progress at the time of the failure is therefore no longer known; such a transaction can never be successfully completed and so must be undone – i.e., rolled back – when the system restarts.
Furthermore, it might also be necessary to redo, at restart time, certain transactions that did successfully complete prior to the crash but did not manage to get their updates transferred from the database buffers to the physical database.
The obvious question therefore arises: how does the system know at restart time which transactions to undo and which to redo? The answer is as follows. At certain prescribed intervals – typically, whenever some prescribed number of entries has been written to the log – the system automatically takes a checkpoint. Taking a checkpoint involves (a) physically writing ("force writing") the contents of the database buffers out to the physical database, and (b) physically writing a special checkpoint record out to the physical log. The checkpoint record gives a list of all transactions that were in progress at the time the checkpoint was taken. To see how this information is used, consider the following figure, which is read as follows (note that time in the figure flows from left to right):
• A system failure has occurred at time tf.
• The most recent checkpoint prior to time tf was taken at time tc.
• Transactions of type T1 completed prior to time tc.
• Transactions of type T2 started prior to time tc and completed after time tc and before time tf.
• Transactions of type T3 also started prior to time tc but did not complete by time tf.
• Transactions of type T4 started after time tc and completed before time tf.
• Finally, transactions of type T5 also started after time tc but did not complete by time tf.
Figure: the five transaction categories T1 through T5 plotted against time, relative to the checkpoint at time tc and the system failure at time tf.
It should be clear that when the system is restarted, transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone. Note, however, that transactions of type T1 do not enter into the restart process at all, because their updates were forced to the database at time tc as part of the checkpoint process. Note too that transactions that terminated unsuccessfully (i.e., with a rollback) before time tf also do not enter into the restart process at all (why not?).
At restart time, therefore, the system first goes through the following procedure in order to identify all transactions of types T2 through T5:
1. Start with two lists of transactions, the UNDO list and the REDO list. Set the UNDO list equal to the list of all transactions given in the most recent checkpoint record; set the REDO list to empty.
2. Search forward through the log, starting from the checkpoint record.
3. If a BEGIN TRANSACTION log entry is found for transaction T, add T to the UNDO list.
4. If a COMMIT log entry is found for transaction T, move T from the UNDO list to the REDO list.
5. When the end of the log is reached, the UNDO and REDO lists identify, respectively, transactions of types T3 and T5 and transactions of types T2 and T4.
The system now works backward through the log, undoing the transactions in the UNDO list; then it works forward again, redoing the transactions in the REDO list. Note: Restoring the database to a consistent state by undoing work is sometimes called backward recovery. Similarly, restoring it to a consistent state by redoing work is sometimes called forward recovery.
Finally, when all such recovery activities are complete, then (and only then) the system is ready to accept new work.
• Media Recovery
A media failure is a failure, such as a disk head crash or a disk controller failure, in which some portion of the database has been physically destroyed. Recovery from such a failure basically involves reloading (or restoring) the database from a backup copy (or dump) and then using the log – both active and archive portions, in general – to redo all transactions that completed since that backup copy was taken. There is no need to undo transactions that were still in progress at the time of the failure, since by definition all updates of such transactions have been undone (actually lost) anyway.
The need to be able to perform media recovery implies the need for a dump/restore (or unload/reload) utility. The dump portion of that utility is used to make backup copies of the database on demand. (Such copies can be kept on tape or other archival storage; it is not necessary that they be on direct-access media.) After a media failure, the restore portion of the utility is used to recreate the database from a specified backup copy.
1.9.9 Summary
The term integrity refers to the correctness or accuracy of data in the database. Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. In general, an integrity constraint can be an arbitrary predicate pertaining to the database. Domain constraints are the most elementary form of integrity constraint. Often, we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation; this condition is called referential integrity. Recovery in a database system means, primarily, recovering the database itself: that is, restoring the database to a state that is known to be correct (or rather, consistent) after some failure has rendered the current state inconsistent. Transactions have four important properties: atomicity, consistency, isolation, and durability. The system must be prepared to recover not only from purely local failures, such as the occurrence of an overflow condition within an individual transaction, but also from "global" failures, such as a power outage. A media failure is a failure, such as a disk head crash or a disk controller failure, in which some portion of the database has been physically destroyed.
1.9.10 Keywords
ACID: the transaction properties Atomicity, Consistency, Isolation and Durability.
Transaction: a logical unit of work that is performed on a database.
Domain: a universe of discourse which defines the data in the problem.
1.9.11 Short Answer Type Questions:
Q1. What do you understand by data integrity? Explain the various types of integrity constraints along with suitable examples.
Q2. What do you understand by database recovery? Explain the various types of recovery techniques.
Q3. What do you understand by a transaction? Explain the ACID properties of transactions.
1.9.12 Long Answer Type Questions:
Q1. What is referential integrity?
Q2. Consider the schema:
employee(employee-name, street, city)
works(employee-name, company-name, salary)
company(company-name, city)
manages(employee-name, manager-name)
Give an SQL DDL definition for the tables of this database. Identify the referential-integrity constraints that should hold, and include them in the DDL definition.
1.9.13 Suggested Readings:
• Bipin C. Desai, An introduction to Database System, Galgotia
Publication, New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database
Systems, Addison Wesley.