RDBMS Section A
Lesson No.
1.1 Introduction
1.2 Database Architecture
1.3 Entity Relationship Model
1.4 Relational Data Model
1.5 Data Models, Keys and Languages
1.6 Relational Algebra
1.7 Database Design
1.8 Normalization
1.9 Database Integrity and Recovery
Website: [Link]
Syllabus
Course Objective:
● To give fundamental knowledge of databases and database management systems.
● To explain the basic concepts of architecture of database.
● To make the learners acquainted with data management issues.
Learning Outcome:
● Understanding the core terms, concepts, and tools of relational database
management systems.
● Understanding database design and logic development for database
programming.
● Apply SQL or Relational Algebra operations to find solutions for a given
application
● Apply normalization techniques to improve database design
SECTION-A
Database Management Systems: Definition, Characteristics, Advantages of Using
DBMS Approach and disadvantages of traditional file environment systems, Three
Schema Architecture, Data Independence – Physical and Logical data Independence,
Database Administrators and its responsibilities.
Introduction to ER Model: Weak Entity Sets, Strong Entity Sets, mapping
cardinalities, generalization, specialization, aggregation.
Relational Database [RDBMS]: The Relational Database Model, Concepts and
Terminology, Characteristics of Relations, CODD’s 12 rules for a fully RDBMS.
Constraints: Integrity Constraints- Entity, Domain and Referential Integrity
constraints, Business Rules, Keys - Super Keys, Candidate Keys, Primary Keys,
Secondary Keys and Foreign Keys.
Relational Algebra: Basic Operations, Additional Operations, Example Queries.
Normalization: Functional Dependency, Full Functional Dependency, Partial
Dependency, Transitive Dependency, Normal Forms.
SECTION-B
Transaction Management: Transaction Concept, ACID Properties, Transaction
States. Database Concurrency: Problems of Concurrent databases, Serializability
and Recoverability, Concurrency Control Methods - Two Phase Locking,
Timestamping. Database Recovery: Recovery Concepts, Recovery Techniques-
Deferred update, Immediate Update, Shadow Paging.
Introduction to Oracle: Oracle as client/server architecture, getting started, creating,
modifying, and dropping databases. Tables - Inserting, updating, deleting data from
databases, SELECT statement, Data constraints (Null values, Default values,
primary, unique and foreign key concepts), Queries for Relational Algebra.
Computing expressions, renaming columns, logical operators, range searching,
pattern matching, Oracle functions, grouping data from tables in SQL, manipulating
dates.
Working with SQL: triggers, use of data base triggers, types of triggers, how to apply
database triggers, BEFORE vs. AFTER triggers, combinations, syntax for creating
and dropping triggers.
Introduction
1.1.0 Objectives
1.1.1 Introduction
1.1.2 Data and Information
1.1.3 Traditional File Processing System
1.1.4 Limitations of File Processing System
1.1.5 Database and Database System
1.1.6 Components of database system
1.1.7 Database Management System
1.1.8 Advantages of DBMS over Traditional File Processing System
1.1.9 Summary
1.1.10 Keywords
1.1.11 Short Answer Type Questions
1.1.12 Long Answer Type Questions
1.1.13 Suggested Readings
1.1.0 Objectives
After completing this lesson, you will be able to:
• Define data and information
• Discuss traditional file processing system, its characteristics and
limitations
• Define database and explain its components
1.1.1 Introduction:
Data is the collection of unorganized facts, concepts or instructions. Information is
the processed form of data. The Traditional File Processing System is a method for
storing and organizing data in computer files, but it has significant disadvantages:
data redundancy, lack of flexibility, data dependency, etc.
In this lesson, we first provide the formal definitions of data, information, and the
traditional file processing system. Then we describe the limitations of the traditional
file processing system and finally discuss the concept of a database.
1.1.2 Data and Information
The term data refers to factual information especially that used for analysis and
based on reasoning or calculation. Data itself has no meaning, but becomes
information when it is interpreted. Information is a collection of facts or data that
is communicated. However, in many contexts the two terms are used as synonyms.
Data, by the way, is the plural of datum; information comes from the Latin
informationem, 'concept, idea' or 'outline'. The relationship between data and
information works this way:
Data → Processing → Information
1.1.3 Traditional File Processing System
…
In a similar manner, the Department of Computer Science (DCS) will also maintain
a record of its employees for recording their day-to-day activities. Thus, the file
maintained by the DCS will also contain the name, DOB, DOJ, permanent address,
correspondence address and other fields specific to the department.
Formally, a record is a collection of related fields that are treated as a single unit.
Further all the records are grouped together to form a file. All the related files are
grouped together to form a file. All the related files are grouped together to form a
database. Thus, in the above example, all the files of the university are interrelated;
when grouped together, they form the database of the university, which consists of
the account file, the student record file, the employee record file, etc.
So, the traditional file processing system has, for each application, a separate
master file and its own set of personal files. Such file-based approaches, which came
into being with the first commercial applications of computers, did provide increased
efficiency in data processing compared to earlier manual, paper record-based
systems. But as the demand for efficiency and speed increased, the computer-based,
simple file-oriented approach to information processing started suffering from a
number of limitations, which are explained in the following section.
1.1.4 Limitations of File Processing System
…
requires knowing the format of the file from which the data is to be shared, so it
becomes very difficult.
4. Incompatible File Formats: Each programmer stores data in files in a format
of his own choice, as there is no standard file format. Thus, it becomes very
difficult to share data among files having different formats. Even if one
programmer has to work on a file stored by a different programmer, he must
first understand the format of that file and only then start working on it.
5. Lack of Data Security: The data stored in the files must be protected from
unauthorized access. Since, in the traditional file processing system,
application programs are added to the system on the basis of queries that
are not predefined, it is difficult to enforce security measures across these
application programs.
6. Lack of Data Integrity: Data integrity means data correctness; for example,
Basic Pay cannot be negative. Such data constraints can be imposed in the
traditional file processing system only by writing appropriate code in the
application programs. If more such constraints are to be added in the future,
we need to modify the application programs, and sometimes it is not possible
to change the application code. This may lead to poor data, which may result
in bad or wrong decisions.
7. Data Dependency: Data dependence means it is impossible to change the
storage structure without affecting the application programs. For example, if
we change the delimiter separating the fields in the records of a file from a
tab to a double space, we must change the code in every application program
that accesses data from that file, as the structure of the file has changed.
This destroys the independence of changing the storage structure without
changing the application code.
8. Lack of Flexibility: In the traditional file processing system, programmers are
told in advance which types of queries need to be answered by the
application, and they code those queries using the data files for that
application. But in today's fast-moving and competitive business
environment there is, apart from such regular queries, a need to respond to
unanticipated queries, and for these the system will fail; the programmer
then has to code for each such query. This limitation leads to the lack of
flexibility of the file system.
9. Inadequate data modeling of the real world: The file system approach is
unable to model the basic entities, relationships and events facing the
organization every day. Complex data and inter-file relationships cannot be
formally defined to the system.
10. Concurrency Problem: Concurrency means simultaneous access to the same
file by two or more users. When data in a file is simultaneously accessed by
two or more users for updating, there must be a mechanism that ensures the
data does not end up inconsistent. In the file system it is not possible to
implement such a feature, or if possible, it is very difficult to implement.
1.1.5 Database and Database System: A database is a collection of related data.
Databases can be of varying sizes and varying complexity. In order to overcome the
limitations of the traditional file processing system, the concept of database systems
was introduced. A database system is basically a computerized record-keeping
system, i.e. a computerized system whose overall purpose is to maintain information
and to make that information available on demand.
In other words, a database is a collection of interrelated data stored in a
database server; these data are stored in the form of tables. The primary aim of a
database is to provide a way to store and retrieve information in a fast and efficient
manner. There are a number of characteristics that distinguish the database
approach from the traditional file management system. In the file system approach,
each user defines and implements the files needed for a specific application to run.
For example, in the sales department of an enterprise, one user may maintain the
details of how many sales personnel are in the department and their grades in a
separate file, while another user maintains the salary details of those salespersons
in another separate file.
Although both users are interested in the data of the salespersons, each keeps
the details in a separate file and needs different programs to manipulate it. This
leads to wastage of space and redundancy or replication of data, which may cause
confusion; sharing of data among various users is not possible; and data
inconsistency may occur. These files have no inter-relationship among the data
stored in them. Therefore, in traditional file processing, every user defines their own
constraints and implements the files needed for their applications.
In the database approach, a single repository of data is maintained that is
defined once and then accessed by many users. The fundamental characteristic of
the database approach is that the database system contains not only the data but
also a complete definition or description of the database structure and constraints.
These definitions are stored in a system catalog, which contains information about
the structure and definitions of the database. The information stored in the catalog
is called metadata, and it describes the primary database. Hence this approach will
work with any type of database, for example an insurance database, an airline
database, a banking database, finance details, or an enterprise information
database; whereas in the traditional file processing system the application is
developed for a specific purpose and will access a specific database only.
The other main characteristic of the database approach is that it allows
multiple users to access the database at the same time, so sharing of data is
possible. The database must include concurrency control software to ensure that
several users trying to update the same data at the same time do so in a controlled
manner. In the file system approach, by contrast, many programmers create files
over a long period, and the various files have different formats and are written in
various application languages.
A multi-user database whose users have a variety of applications must provide
facilities for defining multiple views. In the traditional file system, if any changes are
made to the structure of the files, all the programs are affected, so changes to the
structure of a file may require changing all programs that access the file. But in the
case of the database approach, the structure of the database is stored in the system
catalog, separately from the application programs that access it. This property is
known as program-data independence.
Databases can be used to provide persistent storage for program objects and
data structures, which resulted in the object-oriented database approach.
Traditional systems suffered from the impedance mismatch problem and difficulty
in accessing the data, which is avoided in object-oriented database systems. A
database can also be used to represent complex relationships among data and to
retrieve and update related data easily and efficiently.
It is possible to define and enforce integrity constraints for the data stored in
the database. The database also provides facilities for recovering from hardware and
software failures; the backup and recovery subsystem is responsible for this. The
database approach reduces application development time considerably when
compared to the file system approach, makes up-to-date information available to all
users, and provides security for the data stored in the system.
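As an illustration of declaring such an integrity constraint once, instead of re-coding it in every application, consider the Basic Pay rule from the previous section. The table and column names below are hypothetical; this is a sketch, not a prescribed design:

-- Domain and entity integrity declared in the schema itself.
CREATE TABLE Employee (
    Emp_Id    INT PRIMARY KEY,                      -- entity integrity
    Emp_Name  VARCHAR(50) NOT NULL,
    Basic_Pay DECIMAL(10,2) CHECK (Basic_Pay >= 0)  -- Basic Pay cannot be negative
);

-- The DBMS itself rejects this insert; no application code is involved.
INSERT INTO Employee VALUES (1, 'Ram Singh', -5000);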
Self-Check Exercise-I
Q1. Why do we need database?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
Q2. What is data inconsistency?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
The database has five files, each of which stores data records of the same type.
A database system provides the following features:
• Self-contained nature of database systems (a database contains both data and
meta-data).
• Data Independence: application programs and queries are independent of
how data is actually stored.
• Data sharing.
• Controlling redundancies and inconsistencies.
• Secure access to database; restricting unauthorized access.
• Enforcing Integrity Constraints.
• Backup and Recovery from system crashes.
• Support for multiple-users and concurrent access.
1.1.6 Components of Database System
Figure: A database system, in which end users access the database, consisting of
data and application programs.
Hardware:
The hardware of the database system consists of:
• The secondary storage devices - usually magnetic disks, hard disks, CD-ROMs
and floppy disks - that are used to hold the stored data, together with the
associated I/O devices, device controllers, etc.
• The processor(s) and associated main memory that are used to support the
execution of the database system software.
Software:
A software layer is present in the database system between the physical database
itself (i.e. where the data is actually stored) and the users of the system; it is known
as the Database Management System (DBMS). All requests from users for access to
the database are handled by the DBMS. One general function of the DBMS is thus
the shielding of database users from hardware-level details. Basically, the DBMS
acts like a bridge between users and the database.
Users:
There are a number of users who can access or retrieve data on demand using the
application programs and interfaces provided by the DBMS. Each type of user needs
different software capabilities. Following are the categories of users:
• Application Programmers
• End Users
• Database Administrator (DBA)
End Users: End users are those who need not know about the presence of the
database system or any other system supporting their usage. These users interact
with the system via the interfaces (menu- or form-driven) provided by the DBMS.
For example, users of Automatic Teller Machines (ATMs) fall under this category.
…
Database systems range from small systems that run on personal computers to
huge systems that run on mainframes. Some common examples of database
applications are as follows:
• Billing System at Super Stores
• Patient Management System
• Student Record Management System
• Computerized Account Department at Educational Institutes
• Computerized Flight Reservation System
The DBMS relieves the user from knowing how data is stored physically and the
complex algorithms used for performing operations on the database.
Conceptually, what happens is the following:
1. A user issues an access request, using some particular data sublanguage
(DDL, DML, and DCL using SQL).
2. The DBMS intercepts that request and analyzes it.
3. The DBMS inspects, in turn, the external schema for that user, the
corresponding external/conceptual mapping, the conceptual schema, the
conceptual/internal mapping, and the storage structure definition.
4. The DBMS executes the necessary operations on the stored database.
Self-Check Exercise-II
Q3. What is difference between DBMS and traditional file system?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
Q4. Does DBMS maintain integrity of data?
Ans..........................................................................................................................
................................................................................................................................
................................................................................................................................
1.1.8 Advantages of DBMS over Traditional File Processing System
Controlling Redundancy:
For example, in the case of a college database, there is some common data about
each student which has to be maintained in every application, like Roll No, Name,
Class, Phone No, Address, etc. This causes the problem of redundancy, which
results in wastage of storage space and makes the data difficult to maintain. In a
centralized database, however, data can be shared by a number of applications: the
whole college can maintain its computerized data in a database containing Roll No,
Name, Class, Father's Name, Address, Phone No and Date of Birth, which need not
be stored repeatedly in each application as in a file system, because every other
application can access this information by joining relations on the basis of the
common column, i.e. Roll No. Suppose a user of the Library system needs the Name
and Address of a particular student; by joining the Library and General Office
relations on the column Roll No, he or she can easily retrieve this information. Thus
we can say that the centralized system of a DBMS reduces the redundancy of data
to a great extent, but cannot eliminate it, because Roll No is still repeated in all the
relations.
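In SQL terms, the retrieval described above might look like the following sketch, where General_Office and Library are hypothetical tables standing for the two applications:

-- The Library application fetches Name and Address by joining on the
-- common column Roll_no, instead of keeping its own redundant copy.
SELECT g.Name, g.Address
FROM General_Office g
JOIN Library l ON l.Roll_no = g.Roll_no
WHERE g.Roll_no = 5;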
Avoiding Inconsistency: Redundancy leads to situations where two entries for the
same object do not agree with each other (that is, one is updated and the other is
not). At such a time the database is said to be inconsistent. An inconsistent
database is capable of supplying incorrect or conflicting information, so there
should be no inconsistency in a database. Clearly, inconsistency can be avoided far
better in a centralized system than in a file system.
Let us consider again the example of the college system, and suppose that the
student with Roll Number 5 shifts from Amritsar to Jalandhar. The address
information of Roll Number 5 must then be updated wherever Roll Number and
Address occur in the system. In the case of a file system, the information must be
updated separately in each application; if we update only three places and forget to
update the fourth application, the whole system shows inconsistent results for Roll
Number 5.
In the case of a DBMS, Roll Number and Address occur together only once, in
the General-Office table. So only a single update is needed, after which every other
application retrieves the address information from General-Office. All applications
thus get the current and latest information from this single update operation, which
is propagated to the whole database and to all other applications automatically; this
property is called Propagation of Update.
We can say that the redundancy of data greatly affects the consistency of
data: if redundancy is less, it is easy to enforce consistency. Thus a DBMS can
avoid inconsistency to a great extent.
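A sketch of that single update, again using the hypothetical General_Office table:

-- One update in the one place where Roll_no and Address are stored together;
-- every application that joins on Roll_no now retrieves the new address.
UPDATE General_Office
SET Address = 'Jalandhar'
WHERE Roll_no = 5;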
Restricting Unauthorized Access: When multiple users share a database, some
users may be permitted only to retrieve data, whereas others are allowed both to
retrieve and to update. Hence the type of access operation (retrieval or update) must
also be controlled. Typically, users or user groups are given account numbers
protected by passwords, which they can use to gain access to the database. A DBMS
should provide a security and authorization subsystem, which the DBA uses to
create accounts and to specify account restrictions. The DBMS then enforces these
restrictions automatically.
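In SQL, such restrictions are commonly expressed with GRANT statements. A sketch, assuming two hypothetical accounts, clerk1 and officer1:

-- clerk1 may only retrieve; officer1 may both retrieve and update.
GRANT SELECT ON General_Office TO clerk1;
GRANT SELECT, UPDATE ON General_Office TO officer1;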
1.1.9 Summary
Data is defined as a collection of meaningful facts which can be stored and
processed by computers or humans. Information is the processed form of data,
which helps us in making decisions. The traditional file processing system has, for
each application, a separate master file and its own set of personal files. A record is
a collection of related fields that are treated as a single unit; all the records are
grouped together to form a file, and all the related files are grouped together to form
a database. The computer-based, simple file-oriented approach to information
processing suffers from a number of limitations, like data redundancy, data
inconsistency, difficulty in sharing data, concurrency problems, etc.
In order to overcome the limitations of the traditional file processing system,
the concept of database systems was introduced. A database system is basically a
computerized record-keeping system, i.e. a computerized system whose overall
purpose is to maintain information and to make that information available on
demand. A database system comprises four major components: data, hardware,
software, and users.
1.1.10 Keywords
DBMS: Database Management System
Application Programs: Software that runs on a system to help users perform
various tasks
Concurrency: Ability of a system to handle multiple tasks or processes
simultaneously
Data Integrity: Data integrity refers to the accuracy, consistency, and reliability of
data in a database or information system.
Data Inconsistency: Data inconsistency refers to situations in which there are
discrepancies or contradictions in information stored within a dataset, database, or
system.
Query Language: A query language is a computer programming language used to
communicate with and manipulate databases or information systems.
Relational Database Management Systems
Database Architecture
1.2.0 Objectives
1.2.1 Introduction
1.2.2 DBMS Architecture
1.2.3 Data Independence
1.2.4 Mapping between Different Levels
1.2.5 Users of database
1.2.6 DBA and its responsibilities
1.2.7 Database Schema and Instance
1.2.8 Summary
1.2.9 Keywords
1.2.10 Short Answer Type Questions
1.2.11 Long Answer Type Questions
1.2.12 Suggested Readings
1.2.0 Objectives
After completing this lesson, you will be able to:
• Explain the three-level DBMS architecture (ANSI/SPARC model)
• Define data independence
• Describe the mappings between the different levels of the DBMS architecture
• Identify the users of a database
• Discuss the roles played by the DBA
1.2.1 Introduction
The three levels of the architecture of a DBMS are also known as the ANSI/SPARC
model. The goal of this three-level architecture is to separate the user applications
from the physical database. The ANSI/SPARC model is divided into three levels, known as the
internal, conceptual, and external levels. In a DBMS based on three-schema
architecture, each user group refers to its own external schema. Hence, the DBMS
must transform a request specified on an external schema into a request against the
conceptual schema, and then into a request on the internal schema for processing
over the stored database.
The process of transforming requests and results between levels is called mapping.
There are two levels of mappings in the architecture – the conceptual/internal
mapping and external/conceptual mapping. The three level architecture is then used
to explain the concept of data independence, which can be defined as the capacity to
change the schema at one level of a database system without having to change the
schema at the next higher level. There are two types of data independence – Logical
data Independence and Physical data independence.
Figure: The three-level (ANSI/SPARC) architecture. End users work with external
views at the external level; the external/conceptual mapping connects these to the
conceptual view at the conceptual level; the conceptual/internal mapping connects
the conceptual view to the internal view at the internal level, which describes the
stored database.
…
Internal View
It is at the lowest level of abstraction, closest to the physical storage method used. It
indicates how the data will be stored and describes the data structures and access
methods to be used by the database. The internal view is expressed by the internal
schema, which contains the definition of the stored record, the method of
representing the data fields, and the access aids used. At least the following aspects
are considered at this level:
…
Efficiency considerations are the most important at this level and the data structures
are chosen to provide an efficient database. The internal view does not deal with the
physical devices directly. Instead it views a physical device as a collection of physical
pages and allocates space in terms of logical pages.
The separation of the conceptual view from the internal view enables us to provide a
logical description of the database without the need to specify physical structures.
This is often called physical data independence. Separating the external views from
the conceptual view enables us to change the conceptual view without affecting the
external views. This separation is sometimes called logical data independence.
Assuming the three-level view of the database, a number of mappings are needed to
enable the users working with the external views. For example, the payroll office
may have an external view of the database that consists of limited, payroll-related
information only.
The conceptual view of the database may contain academic staff, general staff, casual
staff, etc. A mapping will need to be created in which all the staff in the different
categories are combined into one category for the payroll office. The conceptual view
would include information about each staff member's position, the date employment
started, full-time or part-time status, and so on. This will need to be mapped to the
salary level for the payroll office. Also, if there is some change in the conceptual view,
the external view can stay the same if the mapping is changed.
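One way such an external view could be defined over the conceptual schema, without duplicating any data, is sketched below; the table and column names are assumed for illustration:

-- The payroll office's external view: staff from all categories combined,
-- exposing only the fields that office needs.
CREATE VIEW payroll_staff AS
SELECT staff_id, staff_name, salary_level FROM academic_staff
UNION ALL
SELECT staff_id, staff_name, salary_level FROM general_staff
UNION ALL
SELECT staff_id, staff_name, salary_level FROM casual_staff;

If the conceptual schema later changes (say, a category is split), only this view definition (the mapping) needs to change; users of payroll_staff are unaffected.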
…
2. It also hides the implementation details or hardware-level details from its users
so that the users can concentrate on the program only.
3. The various changes made at different levels are absorbed by the mappings at
those levels.
4. Physical data independence allows changes to be made in the storage of data
without affecting application programs.
5. Logical data independence ensures that application programs are not affected
even if new fields are added to, or old fields are deleted from, the existing data.
6. Physical data independence allows the files to migrate from one kind of storage
device to another.
7. Logical data independence ensures that changes to the conceptual schema do
not require the external schemas to be changed.
8. Physical data independence ensures that changes to the internal schema do not
require the conceptual schema to be changed.
1.2.4 Mapping between Different Levels
…
Mapping between the conceptual and the internal level specifies the method of
deriving the conceptual record from the physical database. Again, differences similar
to those that exist between external and conceptual views could exist between the
conceptual and internal views. Such differences are indicated and resolved in the
mapping.
Differences that could exist, besides the difference in names, include the following:
• Representation of numeric values could be different in the two views. One view
could consider a field to be decimal, whereas the other view may regard the
field as binary. A two-way transformation between such values can easily be
incorporated in the mapping. If, however, the values are stored in a binary
format, the range of values may be limited by the underlying hardware.
• Representation of string data can be considered by the two views to be coded
differently. One view may perceive the string data to be in ASCII code, the other
view may consider the data to be in EBCDIC code. Again, two-way
transformation can be provided.
• The value for a field in one view could be computed from the values in one or
more fields of the other view. For example, the external view may use a field
containing a person’s age, whereas the conceptual view contains the date of
birth. The age value could be derived from the date of birth by using a date
function available from the operating system. Another example of a computed
field would be where an external view requires the value of the hours worked
during a week in a field, whereas the conceptual view contains fields
representing the hours worked each day of the week. The former can be derived
from the latter by simple addition. These two examples of transformation
between the external and conceptual views are not bidirectional: one cannot
uniquely reflect a change in the total hours worked during a week back to the
hours worked during each day of the week. Therefore, a user's attempt to
modify the corresponding external fields will not be allowed by the DBMS.
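A derived field such as age is typically exposed through a view over the conceptual schema. The sketch below uses Oracle-flavoured date functions (MONTHS_BETWEEN, SYSDATE) on an assumed employee table; other systems have equivalents:

-- External view exposing age, computed from the stored date_of_birth.
-- The DBMS can answer reads, but will not let users update the derived column.
CREATE VIEW employee_ext AS
SELECT emp_id,
       emp_name,
       FLOOR(MONTHS_BETWEEN(SYSDATE, date_of_birth) / 12) AS age
FROM employee;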
Such mapping between the conceptual and internal levels is a correspondence
that indicates how each conceptual record is to be stored and the
characteristics and size of each field of the record. Changing the storage
structure of the record involves changing the conceptual view to internal view
mapping so that the conceptual view does not require any alteration.
The conceptual view can assume that the database contains a sequence of
records for each conceptual record type. These records could be accessed
sequentially or randomly. The actual storage could have been done to optimize
performance. A conceptual record may be split into two records, with the less
frequently used record (part of the original record) on a slower storage device
and the more frequently used record on a faster device. The stored record
could be in a physical sequence, or one or more indices may be implemented
for faster access to record occurrences by the index fields. Pointers may exist
in the physical records to access the next record occurrence in various orders.
These structures are hidden from the conceptual view by the mapping between
the two.
Self-Check Exercise-II
Q3: Why is it necessary to enforce standards in Databases?
Ans...................................................................................................................
….....................................................................................................................
….....................................................................................................................
Q4: What are the different roles played by stakeholders in DBMS?
Ans...................................................................................................................
….....................................................................................................................
….....................................................................................................................
1.2.6 DBA and its responsibilities
Defining the conceptual schema
The DBA must decide what information is to be held in the database and create the
corresponding conceptual schema, using the conceptual
DDL. The object (compiled) form of that schema will be used by the DBMS in
responding to access requests. The source (uncompiled) form will act as a reference
document for the users of the system.
Defining the internal schema
The DBA must also decide how the data is to be represented in the stored database.
This process is usually referred to as physical database design. Having done the
physical design, the DBA must then create the corresponding storage structure
definition (i.e., the internal schema), using the internal DDL. In addition, the
corresponding conceptual/internal mapping must be defined.
Liaising with users
It is the business of the DBA to liaise with users, to ensure that the data they require
is available, and to write (or help the users write) the necessary external schemas,
using the applicable external DDL. In addition, the mapping between any given
external schema and the conceptual schema must also be defined. In practice, the
external DDL will probably include the means for specifying that mapping, though
the schemas and the mappings should be clearly separable.
Defining security and integrity procedures
The conceptual DDL should include facilities for specifying security and integrity rules
that can be regarded as part of the conceptual schema.
Defining backup and recovery procedures
Once an enterprise is committed to a database system, it becomes critically dependent
on the successful operation of that system. In the event of damage to any portion of
the database-caused by human error, say, or a failure in the hardware or supporting
operating system-it is essential to be able to repair the data concerned with the
minimum of delay and with as little effect as possible on the rest of the system. For
example, the availability of data that has not been damaged should ideally not be
affected. The DBA must define and implement an appropriate recovery scheme,
involving, e.g., periodic unloading (“dumping”) of the database to backup storage, and
procedures for reloading the database when necessary from the most recent dump.
Monitoring performance and responding to changing requirements
The DBA is responsible for so organizing the system as to get the performance that is
“best for the enterprise” and for making the appropriate adjustments as requirements
change. For example, it might be necessary to reorganize the stored database on a
periodic basis to ensure that performance levels remain acceptable.
1.2.7 Database Schema and Instance
The description of a database is called the database schema; it is specified during
database design and is not expected to change frequently. Consider, for example,
the following schema constructs:
University_Dept:
Dept_Id Dept_name Date_Of_Estb Head
Student_Record
Stu_Id Stu_Name D_O_Adm Class Session
Employee_Record
Emp_ID Emp_Name Emp_Add Dept D_O_J
The actual data in a database may change quite frequently. For example, the
database shown above changes every time we add a student. The data in the
database at a particular moment in time is called a database state or snapshot. It is
also called the
current set of occurrences or instances in the database. In the given database state,
each schema construct has its own current set of instances. For example, the
Student_Record construct will contain the set of individual student records as its
instances. Every time we insert or delete a record, or change the value of a data item
in a record, we change one state of the database into another state.
The distinction between database schema and database state is very important. When
we define a new database, we specify its database schema only to the DBMS. At this
point, the corresponding database state is the empty state with no data. We get the
initial state of the database when the database state is first populated or loaded with
the initial data. From then on, every time an update operation is applied to the
database, we get another database state. At any point in time, the database has a
current state.
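The distinction shows up directly in SQL: the schema is what a CREATE TABLE statement declares, while every INSERT, UPDATE or DELETE moves the database from one state to the next. A sketch based on the Student_Record construct above (the Session column is renamed here, since SESSION is a reserved word in several systems):

-- Schema: declared once, at design time.
CREATE TABLE Student_Record (
    Stu_Id     INT PRIMARY KEY,
    Stu_Name   VARCHAR(50),
    D_O_Adm    DATE,
    Class      VARCHAR(20),
    Session_Yr VARCHAR(10)
);

-- State: empty after creation; each insert yields a new database state.
INSERT INTO Student_Record
VALUES (1, 'Ram Singh', DATE '2024-07-01', 'BCA-4', '2024-25');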
1.2.8 Summary:
The three level architecture is divided into three levels: the external level, the
conceptual level, and the internal level. The external or user view is at the highest
level of database abstraction where only those portions of the database of concern to a
user or application program are included.
One conceptual view represents the entire database. The conceptual view is defined by
the conceptual schema. It describes all the records and relationships included in the
conceptual view and, therefore, in the database.
The internal view indicates how the data will be stored and describes the data
structures and access methods to be used by the database. It is expressed by the
internal schema, which contains the definition of the stored record, the method of
representing the data fields, and the access aids used. The DBMS provides users with
a method of abstracting their data requirements and removes the drudgery of
specifying the details of the storage and maintenance of data. The DBMS insulates
users from changes that occur in the database. Two levels of database independence
are provided by the system. Physical independence allows changes in the physical
level of data storage without affecting the conceptual view. Logical independence
allows the conceptual view to be changed without affecting the external view.
1.2.9 Keywords
View: A database view is a virtual table that is based on the result of a SELECT query.
Data Mapping: Data mapping refers to the process of defining the relationships and
connections between two distinct data models, schemas, or formats.
Stored Procedure: A stored procedure is a precompiled collection of one or more SQL
statements or procedural statements, which are stored in a database and can be
executed later.
DBA: The Database Administrator is responsible for managing the database
User liaison: User liaison typically involves establishing and maintaining a positive
relationship between an organization and its users or customers.
Schema: The overall structure of the database is called database schema
Relational Database Management Systems
Entity Relationship Model
1.3.0 Objectives
1.3.1 Introduction
1.3.2 Entity Relationship Model
1.3.3 Basic Concepts
1.3.4 Mapping Cardinalities
1.3.5 Entity relationship Diagram
1.3.6 Weak and Strong Entity sets
1.3.7 Aggregation
1.3.8 Summary
1.3.9 Keywords
1.3.10 Short Answer Type Questions
1.3.11 Long Answer Type Questions
1.3.12 Suggested Readings
1.3.0 Objectives
After completing this lesson, you will be able to:
• Define E-R Model
• Explain the mapping Cardinalities
• Define and Draw E-R diagrams
• Define weak and strong entity sets
• Define aggregation
1.3.1 Introduction:
The Entity-Relationship Model is a high-level conceptual data model developed by Chen in
1976 to facilitate database design. The E-R model is shown diagrammatically using an E-R
diagram, which represents the elements of the conceptual model, their meanings, and the
relationships between those elements, independent of any particular DBMS and
implementation details. Cardinality of a relationship between entities is calculated by
measuring how many instances of one entity are related to a single instance of another. One
of the main limitations of the E-R model is that it cannot express relationships among
relationships. To represent such relationships among relationships, we combine the entity
sets and their relationship to form a higher-level entity set. This process of combining entity
sets and their relationships to form a higher-level entity set, so as to represent relationships
among relationships, is called Aggregation.
1.3.2 Entity Relationship Model
The entity-relationship model is based on the perception of a real world that consists of
a set of basic objects called entities, and of relationships among these objects. It was
developed to facilitate database design by allowing the specification of an enterprise
schema which represents the overall logical structure of a database. The E-R Model is
extremely useful in mapping the meanings and interactions of real-world enterprises
into a conceptual schema. The entity-relationship model was originally proposed by Peter
Chen in 1976 as a way to unify the network and relational database views. Simply stated, the
ER model is a conceptual data model that views the real world as entities and
relationships. A basic component of the model is the entity relationship diagram, which
is used to visually represent data objects. For the database designer, the utility of the
ER model is:
1) It maps well to the relational model. The constructs used in the ER model can
easily be transformed into relational tables.
2) It is simple and easy to understand with a minimum of training.
Therefore the model can be used by the database designer to
communicate the design to the end user.
3) In addition, the model can be used as a design plan by the database
developer to implement a data model in specific database management
software.
1.3.3 Basic Concepts
There are three basic notions that the E-R data model employs – entity sets,
relationship sets, and attributes.
Entity: An entity is a thing or object in the real world that is distinguishable from all
other objects. For example, each person in an enterprise is an entity. An entity has set of
properties and the values for some set of properties may uniquely identify an entity. For
example, the employee Id of a person uniquely identifies one particular person in the
enterprise. Entities are principal data object about which information is to be collected.
Entities are usually recognizable concepts, either concrete or abstract such as person, places,
things, or events which have relevance to the database. Some specific examples of entities are
EMPLOYEES, PROJECTS and INVOICES. An entity is analogous to a table in the relational
model. Entities are classified as :
1) Independent: An independent entity is one that does not rely on
another for identification.
2) Dependent: A dependent entity is one that relies on another for
identification.
Special Entity Types: Associative entities (also known as intersection entities) are
entities used to associate two or more entities in order to reconcile a many-to-many
relationship. Subtype entities are used in generalization hierarchies to represent a
subset of instances of their parent entity, called the supertype, but which have
attributes or relationships that apply only to the subset.
Entity set: An entity set is a set of entities of the same type that share the same
properties, or attributes. The set of all persons who are customers at a given bank, for
example, can be defined as the entity set customer. The individual entities that
constitute the set are said to be the extension of the entity set. Thus all the customers
of the given bank are the extensions of the entity set customer.
Attributes: An entity is represented by a set of attributes. Attributes are descriptive
properties possessed by each member of an entity set. For example, a customer
entity set of a given bank has the attributes like account number, customer name,
customer address etc. For each attributes, there is a set of permitted values, called the
domain or value set, of that attribute. The domain of attribute customer-name might
be the set of all text strings of a certain length.
A database thus includes a collection of entity sets each of which contains any
number of entities of the same type. For example, a bank database consists of entity
sets like customer, loan etc.
An attribute, as used in the E-R model, can be characterized by the following
attribute types:
1. Simple and composite attributes: Simple attributes are those that
cannot be divided into subparts, i.e. into other attributes. For example,
customer account number is a simple attribute. Composite attributes are
those that can be further divided into subparts or attributes. For
example, the customer_name attribute of an entity can be considered a
composite attribute because it can be further divided into subparts like
first name, middle name and last name.
2. Single-valued and multivalued attributes: Attributes that have a single
value for a particular entity are known as single-valued attributes. For example,
the employee_Id of an employee in an enterprise will be single-valued for every
employee. Multivalued attributes are those attributes that have multiple values
for an entity. For example, employee_dependent_names for a particular employee
in an enterprise can have zero, one or more names, depending on the number of
dependents of the employee.
3. Null attributes: A null value is used when an entity does not have a
value for an attribute. For example, if a particular employee has no
dependents, then the value of employee_dependent_names for that
employee in the enterprise will be null. Null can also designate that an
attribute value is unknown. An unknown value may be either missing or
not known.
4. Derived attributes: The value of this type of attribute is derived from the
values of other related attributes or entities. For example, the age of an
employee can be derived from the date_of_birth attribute of the employee;
the two attributes are therefore related.
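When such attributes are later mapped to tables, a multivalued attribute typically becomes a separate table keyed by the owning entity, while a derived attribute is computed rather than stored. A sketch with assumed names:

-- Multivalued attribute employee_dependent_names: one row per dependent,
-- so an employee can have zero, one or many dependents.
CREATE TABLE Employee_Dependent (
    Emp_Id         INT,
    Dependent_Name VARCHAR(50),
    PRIMARY KEY (Emp_Id, Dependent_Name)
);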
Relationship Sets: A relationship is an association among several entities. For
example, we can define a relationship that associates an employee Ram Singh with
Department Computer Science. This relationship specifies that Employee Ram Singh is
working in the department Computer Science.
A relationship set is a set of relationships of the same type. Formally, it is a
mathematical relation on n ≥ 2 entity sets. If E1, E2, …, En are entity sets, then a
relationship set R is a subset of {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En},
where (e1, e2, …, en) is a relationship.
Consider the two entity sets employee and department. We define the
relationship set works-for to denote the association between employees and the
departments in which they work.
The association between entity sets is referred to as participation; that is, the
entities e1 ∈ E1, e2 ∈ E2, …, en ∈ En participate in the relationship set R. A
relationship instance in an E-R schema represents an association between the
named entities in the real-world enterprise that is being modeled.
The function that an entity plays in a relationship is called that entity’s role.
Since entity sets participating in a relationship set are generally distinct, roles are
implicit and are not usually specified. However, they are useful when the meaning of a
relationship needs clarification. Such is a case when the entity sets of a relationship
set are not distinct, i.e., the same entity set participates in a relationship set more than
once, in different roles. In this type of relationship set, which is sometimes called a
recursive relationship set, explicit role names are necessary to specify how an entity
participates in a relationship instance.
A relationship can also have descriptive attributes. Consider a relationship set
depositor with entity sets customer and account. We could associate the attribute
access-date with that relationship set to record the most recent date on which a
customer accessed an account.
The number of entity sets that participate in a relationship set is also the
degree of the relationship set. A binary relationship set is of degree 2; a ternary
relationship set is of degree 3.
…
1.3.4 Entity Relationship Diagrams: The overall logical structure of a database can
be expressed graphically by an E-R diagram. The relative simplicity and pictorial
clarity of this diagramming technique may well account in large part for the
widespread use of the E-R model. Such a diagram consists of the following components:
• Rectangles, which represent entity sets.
• Ellipses, which represent attributes.
• Diamonds, which represent relationship sets.
• Lines, which link attributes to entity sets and entity sets to
relationship sets.
• Double ellipses, which represent multivalued attributes.
• Dashed ellipses, which denote derived attributes.
• Double lines, which indicate total participation of an entity in a
relationship set.
To distinguish the mapping cardinalities of a relationship set, we draw either a directed
line (→) or an undirected line (-) between the relationship set and the entity set.
Consider the following E-R Diagram, which consists of two entity sets, customer
and loan, related through a binary relationship set borrower. The attributes associated
with customer are customer_name, social-security, customer-street, and customer-
city. The attributes associated with loan are loan-number and amount. The
relationship set borrower may be many to many, one to many, many to one or one to
one.
Figure: E-R diagram showing the entity sets Customer (customer_name,
social_security, customer_street, customer_city) and Loan (loan_number, amount)
connected by the relationship set Borrower.
In the above E-R diagram, underlined attributes are acting as primary keys of the
corresponding entity sets.
Direction: The direction of a relationship indicates the originating entity of the
relationship. The entity from which a relationship originates is the parent entity; the
entity where the relationship terminates is the child entity.
The type of the relationship is determined by the direction of the line connecting
the relationship component and the entity. To distinguish different types of
relationships, we draw either a directed line or an undirected line between the
relationship set and the entity set. A directed line is used to indicate one occurrence
and an undirected line is used to indicate many occurrences in a relationship.
Consider the entities DEPARTMENT, MANAGER, EMPLOYEE and PROJECT.
The relationship between a DEPARTMENT and a MANAGER is usually one-to-one;
there is only one manager per department. This relationship between entities is
shown below. Each entity is represented by a rectangle and a directed line indicates
the relationship between them. The relationship between DEPARTMENT and
MANAGER is 1:1.
Figure: One-to-one relationship between Department and Manager.
Note that a one-to-one relationship between two entity sets does not imply that for
an occurrence of an entity from one set there must at all times be an occurrence of
an entity in the other set. In the case of an organisation, there could be times when
a department is without a manager, or when an employee who is classified as a
manager is without a department.
A one-to-many relationship exists from the entity MANAGER to the entity
EMPLOYEE because there are several employees reporting to one manager. As we
have pointed out, there could be an occurrence of the entity type MANAGER having
zero occurrences of the entity type EMPLOYEE reporting to him or her. The reverse
relationship, from EMPLOYEE to MANAGER, would be many-to-one, since a single
manager may supervise many employees.
Figure: One-to-many relationship from Manager to Employee.
The relationship between the entity EMPLOYEE and the entity PROJECT can be
derived as follows: each employee could be involved in a number of different
projects, and a number of employees could be working on a given project. This
relationship between EMPLOYEE and PROJECT is many-to-many. It is illustrated
below.
Figure: Many-to-many relationship between Employee and Project.
Figure: One-to-one relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
Figure: One-to-many relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
M:1: If multiple customers participate in a single loan and one customer can take
only one loan.
Figure: Many-to-one relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
M:M: If multiple customers participate in a single loan and one customer can take
more than one loan.
Figure: Many-to-many relationship between Customer (c_name, phone_no, address,
city) and Loan (loan_no, amount).
1.3.5 Weak and Strong Entity Sets
An entity set may not have sufficient attributes to form a primary key; such an
entity set is termed a weak entity set. An entity set that does have a primary key is
termed a strong entity set. For example, consider the entity payment, which has the
three attributes payment-number, payment-date and payment-amount. Although
each payment entity is distinct, payments for different loans may share the same
payment number. Thus, this entity set does not have a primary key; it is a weak
entity set. For a weak entity set to be meaningful, it must be part of a one-to-many
relationship set. This relationship set should have no descriptive attributes, since
any required attributes can be associated with the weak entity set. A member of a
strong entity set is a dominant entity, whereas a member of a weak entity set is a
subordinate entity.
Although a weak entity set does not have a primary key, we nevertheless need a
means of distinguishing among all those entities in the entity set that depend on one
particular strong entity. The discriminator of a weak entity set is a set of attributes that
allows this distinction to be made. For example, the discriminator of the weak entity set
payment is the attribute payment number, since, for each loan, a payment number uniquely
identifies one single payment for that loan. The discriminator of a weak entity set is also
called the partial key of the entity set.
The primary key of a weak entity set is formed by the primary key of the strong
entity set on which the weak entity set is existence dependent, plus the weak entity
set's discriminator. In the case of the entity set payment, its primary key is
(loan-number, payment-number), where loan-number identifies the dominant entity
of a payment, and payment-number distinguishes payment entities within the same
loan.
The identifying dominant entity set is said to own the weak entity set that it
identifies. The relationship that associates the weak entity set with an owner is the
identifying relationship. In our example, loan-payment is the identifying relationship
for payment.
A weak entity set in E-R Diagram is represented by a double outlined box, and
the corresponding identifying relationship by a double outlined diamond.
The relationship between a weak entity set and a strong entity set can be
expressed with the following example, in which loan-payment is the identifying
relationship for the payment entity. The double lines indicate total participation of
the weak entity set in the identifying relationship: every payment must be related
via loan-payment to some loan. The arrow from loan-payment to loan indicates that
each payment is for a single loan. The discriminator of a weak entity set is
underlined with dashed lines rather than a solid line.
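In relational terms, the weak entity set payment maps to a table whose primary key combines the owner's key with the discriminator, as in this sketch:

-- payment_number alone is not unique; the key borrows loan_number from the
-- owning strong entity set loan (assumed to exist with key loan_number).
CREATE TABLE payment (
    loan_number    VARCHAR(10) REFERENCES loan(loan_number),
    payment_number INT,
    payment_date   DATE,
    payment_amt    DECIMAL(10,2),
    PRIMARY KEY (loan_number, payment_number)
);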
Self-Check Exercise
…
Q4. Does every relation consist of both weak and strong entity sets?
Ans….….................................................................................................................
…...........................................................................................................................
…...........................................................................................................................
Figure: The strong entity set Loan (loan_number, amount) is connected through the
identifying relationship Loan_payment to the weak entity set Payment
(payment_number, payment_date, payment_amt).
Aggregation
One of the limitations of the E-R model is that it cannot express relationships among
relationships. Aggregation allows us to treat the relationship set borrower and the
entity sets customer and loan as a higher-level entity set called borrower.
the same manner as is any other entity set. A common situation for aggregation is
shown in figure (b).
Figure (a): E-R diagram in which Customer (customer_name, social_security,
customer_street, customer_city) and Loan (loan_number, amount) are related
through Borrower, and the relationship Loan-officer associates the entity set
Employee (e-social_security, employee_name, telephone_number) with each
customer-loan pair.
Figure (b): Aggregation. The Borrower relationship set between Customer and Loan
is treated as a higher-level entity set, which participates in the Loan-officer
relationship with Employee.
38
BCA Sem-4 PAPER: BCAB2204T
[Figure: Generalization – account (acc_no, balance) is the higher-level entity set; saving account (interest rate) and current account (overdraft amount) are the lower-level entity sets, connected to account by an ISA relationship.]
Account is the higher level entity set and Saving account and Current account are
lower level entity sets. Generalization proceeds from the recognition that a number of
entity sets share some common features. Based on their commonalities, generalization
synthesizes these entity sets into a simple, higher-level entity set. Generalization is
used to emphasize the similarities among lower-level entity sets and to hide the
differences; it also permits an economy of representation in that shared attributes are
not repeated.
A crucial property of the higher-level and lower-level entities created by
specialization and generalization is attribute inheritance. The attributes of the higher-
level entity sets are said to be inherited by the lower-level entity sets. For example:
savings-account and checking-account inherit the attributes of account. Thus saving-
account is described by its account-number, balance and Interest rate and Checking
account is described by its account-number, balance, and overdraft-amount attributes.
A lower-level entity set also inherits participation in the relationship sets in which its
higher-level entity set participates. Both the savings-account and checking-account
entity sets participate in the depositor relationship set. Attribute inheritance applies
through all tiers of lower-level entity sets: if, for example, checking accounts were
further specialized into standard, gold, and senior accounts, those lower-level entity
sets would inherit the attributes and relationship participation of both checking-account
and account.
Whether a given portion of an E-R model was arrived at by specialization or
generalization, the outcome is basically the same:
• A higher-level entity set with attributes and relationships that apply to
all of its lower-level entity sets.
• Lower-level entity sets with distinctive features that apply only within a
particular lower-level entity set.
Converting E-R diagrams into tables
A database that conforms to an E-R database schema can be represented by a
collection of tables. For each entity set and for each relationship set in the database,
there is a unique table that is assigned the name of the corresponding entity set or
relationship set. Each table has multiple columns, each of which has a unique name.
Both the E-R model and the relational model are abstract, logical
representations of real-world enterprises. Because the two models employ similar
design principles, we can convert an E-R design into a relational design. Consequently,
converting a database representation from an E-R diagram to a table format is the basis
for deriving a relational database design from an E-R diagram. In the following section,
we explain how to convert an E-R diagram, or schema, into tables.
[Figure: E-R diagram with entity sets customer (customer_name, social_security, customer_street, customer_city) and loan (loan_number, amount), related through the borrower relationship set.]
Consider first the entity set loan, which has two attributes: loan_number and amount. We
represent this entity set by the table loan, with two columns, as shown in the following figure.
The row (L-17, 1000) in the loan table
means that loan number L-17 has an amount of Rs. 1000. We can add a new entity to the
database by inserting a row into the table. We can also delete or modify rows.
Table : loan
Loan_number amount
L-17 1000
Let D1 denote the set of all loan numbers and let D2 denote the set of all amounts. Any
row of the loan table must consist of a 2-tuple (v1, v2), where v1 is a loan number
(v1 belongs to D1) and v2 is an amount (v2 belongs to D2). In general, the loan table will contain
only a subset of the set of all possible rows. We refer to the set of all possible rows of
loan as the Cartesian product of D1 and D2 denoted by D1 X D2.
In general, if we have a table of n columns, we denote the Cartesian product of
D1, D2, D3, … Dn by D1X D2 X D3 …..X Dn.
Similarly, as another example, consider the entity set customer of the E-R diagram
shown above. This entity set has the attributes customer_name, social_security,
customer_street, customer_city. The table corresponding to customer has four
columns as shown in Figure below:
Table : Customer
Customer_name  Social_security  Customer_street  Customer_city
Suresh         123-4-567        Phase I          Patiala
Tabular Representation of Weak Entity Sets
Let A be a weak entity set with attributes a1, a2, a3, ..., am, and let B be the strong
entity set on which A depends, the primary key of B consisting of attributes
b1, b2, b3, ..., bn. We represent the entity set A by a table called A with one column for each
attribute of the set:
{ a1, a2, a3, ..., am } U { b1, b2, b3, ..., bn }
As an illustration, consider the entity set payment consisting of attributes
payment_number, payment_date, payment_amount. The primary key of the loan entity
set, on which payment is dependent, is loan_number. Thus , payment is represented
by a table with four columns labeled loan_number, payment_number, payment_date,
payment_amount as shown in following figure:
Table: payment
Loan_number  Payment_number  Payment_date  Payment_amount
L-17         67              10-11-2006    1000
Tabular Representation of Relationship sets
Let R be a relationship set; let a1, a2, a3, ..., an be the set of attributes formed by the
union of the primary keys of each of the entity sets participating in R; and let the
descriptive attributes (if any) of R be b1, b2, b3, ..., bn. We represent this relationship set
by a table called R with one column for each attribute of the set:
{ a1, a2, a3……,an} U { b1, b2, b3……,bn }
As an illustration, consider the relationship set borrower in the E-R Diagram shown
above. This relationship set involves the following two entity sets:
1) Customer, with the primary key social_security.
2) Loan, with the primary key loan_number.
Since the relationship set has no attributes, the borrower table has two columns labeled
social_security and loan_number as shown in figure below:
Table : borrower
Social_security Loan_number
123-4-567 67
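As a hedged illustration (the column types are assumed, not given in the text), the borrower table could be declared in SQL with the two borrowed primary keys forming its own composite key:
CREATE TABLE borrower
 (social_security VARCHAR2(11) REFERENCES customer (social_security),
  loan_number     VARCHAR2(6)  REFERENCES loan (loan_number),
  PRIMARY KEY (social_security, loan_number));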
The case of a relationship set linking a weak entity set to the corresponding strong entity set is
special. As we noted earlier, these relationships are many-to-one and have no descriptive
attributes. Furthermore, the primary key of a weak entity set includes the primary key of the
strong entity set. In the E-R diagram shown below, the weak entity set payment is dependent
on the strong entity set loan via the relationship set loan-payment. The primary key of payment
is {loan_number, payment_number} and the primary key of loan is {loan_number}. Since
loan-payment has no descriptive attributes, the table for loan-payment would have two
columns, loan_number and payment_number. The table for the entity set payment has four
columns: loan_number, payment_number, payment_date, payment_amount. Thus, the loan-
payment table is redundant. In general, the table for the relationship set linking a weak entity
set with its corresponding strong entity set is redundant and does not need to be present in a
tabular representation of the E-R diagram.
[Figure: The weak entity set payment dependent on the strong entity set loan via the identifying relationship set loan-payment.]
Multivalued Attributes
We have seen that attributes in an E-R diagram generally map directly into columns
for the appropriate tables. Multivalued attributes, however, are an exception; new
tables are created for these attributes.
For a multivalued attribute M, we create a table T with a column C that
corresponds to M and columns corresponding to primary key of the entity set or
relationship set of which M is an attribute. For example, consider the E-R diagram that
includes the multivalued attribute dependent_name. For this multivalued attribute, we
create a table dependent_name, with columns dname, referring to the dependent_name
attribute of employee, and e_social_security, representing the primary key of the entity
set employee. Each dependent of an employee is represented as a unique row in the
table.
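A minimal SQL sketch of such a table, assuming the employee table is keyed on e_social_security (the column types are illustrative):
CREATE TABLE dependent_name
 (e_social_security VARCHAR2(11) REFERENCES employee (e_social_security),
  dname             VARCHAR2(30),
  PRIMARY KEY (e_social_security, dname)); -- one row per dependent of an employee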
Tabular Representation of Generalization
There are two different methods for transforming to a tabular form an E-R diagram
that includes generalization.
1. Create a table for the higher-level entity set. For each lower-level entity set, create a
table that includes a column for each of the attributes of that entity set plus a column
for each attribute of the primary key of the higher-level entity set. Thus, for the E-R
diagram, we have three tables:
• Account, with attributes account_number and balance.
• Savings_account, with attributes account_number and interest_rate.
• Checking_account, with attributes account_number and
overdraft_amount.
2. If the generalization is disjoint and complete – that is if no entity is a member of two
lower-level entity sets directly below a higher-level entity set, and if every entity in the
higher-level entity set is also a member of one of the lower-level entity sets – then an
alternative representation is possible. Here, create no table for the higher level entity
set. Instead, for each lower-level entity set, create a table that includes a column for
each of the attributes of that entity set plus a column for each attribute of the higher
level entity set. Then, for the E-R diagram we have two tables:
• Saving_account, with attributes account_number, balance, and interest_rate.
• Checking_account, with attributes account_number, balance, and overdraft_amount.
The saving_account and checking_account relations corresponding to these tables both
have account_number as the primary key.
If the second method were used for an overlapping generalization, some values
such as balance would be stored twice unnecessarily. Similarly, if the generalization
were not complete – that is, if some accounts were neither savings nor checking
accounts – then such accounts could not be represented with the second option.
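The first method might be realized in SQL as follows (a sketch with assumed column types; the foreign keys tie each lower-level table back to account):
CREATE TABLE account
 (account_number VARCHAR2(10) PRIMARY KEY,
  balance        NUMBER(9,2));
CREATE TABLE savings_account
 (account_number VARCHAR2(10) PRIMARY KEY REFERENCES account (account_number),
  interest_rate  NUMBER(4,2));
CREATE TABLE checking_account
 (account_number   VARCHAR2(10) PRIMARY KEY REFERENCES account (account_number),
  overdraft_amount NUMBER(9,2));
Under the second method, the account table would be dropped and the balance column would appear directly in savings_account and checking_account.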
Specialization
Specialization is the process of taking subsets of a higher-level entity set to form lower-level
entity sets. It is the process of defining a set of subclasses of an entity type, which
is called the superclass of the specialization. The subclasses are defined on the basis
of some distinct characteristics of the entities in the superclass.
For example, specialization of the Employee entity type may yield the subclasses
Salaried_Employee and Hourly_Employee, based on the method of pay, as shown below.
[Figure: Specialization of the employee entity type (emp_number, emp_name) into the subclasses Salaried_Employee (basic_pay) and Hourly_Employee (hourly_rate) via an ISA relationship.]
1.3.7 Summary
The entity-relationship model is based on the perception of a real world that consists of
a set of basic objects called entities, and of relationships among these objects. There
are three basic notions that the E-R data model employs – entity sets, relationship
sets, and attributes. An entity is a thing or object in the real world that is
distinguishable from all other objects. An entity set is a set of entities of the same type
that share the same properties, or attributes. An entity is represented by a set of
attributes. Attributes are descriptive properties possessed by each member of an entity
set. A relationship is an association among several entities.
A relationship set is a set of relationships of the same type. Mapping
Cardinalities, or cardinality ratios, express the number of entities to which another
entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets,
although occasionally they contribute to the description of relationship sets that
involve more than two entity sets. The overall logical structure of a database can be
expressed graphically by an E-R diagram. The relative simplicity and pictorial clarity of
this diagramming technique may well account in large part for the widespread use of
the E-R model. An entity set may not have sufficient attributes to form a primary key;
such an entity set is termed a weak entity set. An entity set that has a primary key is
termed a strong entity set. One limitation of the E-R model is that it is not possible
to express relationships among relationships. This limitation can be removed through
aggregation. Aggregation is an abstraction through which relationships are treated as
higher-level entities.
1.3.8 Keywords
Aggregation: Aggregation generally refers to the process of combining different
elements or components into a single, unified entity.
Cardinality: It refers to the size or multiplicity of a relation.
Set: a set is a well-defined collection of distinct objects, considered as an object in its
own right.
Specialization: Specialization is the process of defining a set of subclasses from a
superclass
1.3.9 Short Answer Type Questions:
Q1. Define entity, entity set, attribute, relationship and relationship set.
Q2. What do you mean by an attribute? Explain various types of attributes.
Q3. What is the difference between generalization and specialization?
Q4. How does an E-R diagram represent the concepts of specialization and
generalization?
Q5. Define aggregation with a suitable example. Why is it needed?
1.3.10 Long Answer Type Questions
Q1. What do you mean by mapping cardinality? Explain various mapping
cardinalities with suitable examples.
Q2. What is an E-R Diagram? Explain various constructs for drawing E-R
Diagram.
Q3. Explain E-R Model.
Q4. Explain the steps for converting E-R Diagrams into tables.
1.3.11 Suggested Readings:
• Bipin C. Desai, An introduction to Database System, Galgotia Publication, New
Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems,
Addison Wesley.
Relational Database Management Systems
1.4.0 Objectives
1.4.1 Introduction
1.4.2 Relational Data Model Concepts
1.4.3 Constraints
1.4.4 Summary
1.4.5 Keywords
1.4.6 Short Answer Type Questions
1.4.7 Long Answer Type Questions
1.4.8 Suggested Readings
1.4.0 Objectives
After reading this lesson you will be able to
• Understand the basic concepts of Relational Data Model
• Understand constraints in the Relational Data Model
1.4.1 Introduction
The relational data model is an abstract theory of data that is based on certain aspects
of mathematics (principally set theory and predicate logic). The principles of the
relational model were originally laid down in 1969-70 by Dr. E. F. Codd, at that time a
member of IBM. The relational model is a way of looking at data: it stores data in the
form of tables. A relational database is one that allows you to group its data items into
one or more independent tables that can be related to one another by using fields
common to each related table.
1.4.2 Relational Data Model Concepts
The relational model represents the database as a collection of relations (tables); each
row of a relation is called a tuple, and each column in the tuple is called an attribute.
The number of attributes in a relation determines its degree; a relation with three
attributes, for example, has degree 3.
Domains
A domain definition specifies the kind of data represented by the attribute. More
particularly, a domain is a set of all possible values that an attribute may validly
contain. Domains are often confused with data types, but this is inaccurate: data type
is a physical concept while domain is a logical one. "Number" is a data type and "Age"
is a domain.
To give another example, "Street name" and "Surname" might both be
represented as text fields, but they are obviously different kinds of text fields; they
belong to different domains.
Domain is also a broader concept than data type, in that a domain definition includes a
more specific description of the valid data. For example, consider the domain Degree Awarded,
which represents the degrees awarded by a university. In the database schema this attribute
might be defined as Text[3], but it is not just any three-character string: it is a member of the
set {BS, BA, MA, MS, PhD, LLB, MD}. It is not always convenient to list every valid value –
think of a domain with a hundred or so members, such as the kinds of objects in a museum
exhibit. In such instances it is useful to define the domain in terms of rules, which can be used
to determine whether any specific value is a member of the set of all valid values.
For example, Person Age could be defined as “an integer in the range 0 to 120”
whereas Exhibit Age (age of any object for exhibition) might simply by “an integer equal
to or greater than 0”.
Body of a Relation
The body of the relation consists of an unordered set of zero or more tuples.
There are some important concepts here. First, the relation is unordered: record
numbers do not apply to relations. Second, a relation with 0 tuples (an empty relation)
still qualifies as a relation. Third, a relation is a set, and the items in a set are, by
definition, uniquely identifiable. Therefore, for a table to qualify as a relation, each
record must be uniquely identifiable and the table must contain no duplicate rows.
Keys of a Relation
It is a set of one or more columns whose combined values are unique among all
occurrences in a given table. A key is the relational means to specify uniqueness.
1.4.3 Constraints:
There are two types of constraints
1) Table Constraint
2) Column Constraint
Table Constraint
If the constraint spans multiple columns, the user will have to use table-level
constraints. Likewise, if the data constraint attached to a specific cell in a table
references the contents of another cell in the table, the user will have to use table-level
constraints.
Primary Key as a table-level constraint:
E.g. CREATE TABLE sales_order_details (s_order_no VARCHAR2(6), product_no VARCHAR2(6),
PRIMARY KEY (s_order_no, product_no));
Column Level Constraint
If the constraints are defined along with the column definition, they are called
column-level constraints. They are local to a specific column.
Primary Key as a column level constraint
Create table client ( client_no varchar2(6) Primary Key…);
Features of Constraint
1) NOT NULL CONDITION
2) UNIQUENESS
3) PRIMARY KEY identification
4) FOREIGN KEY
5) CHECK the column value against a specified condition
Some important constraints features and their implementation have been discussed
below:
Primary Key Constraints
A PRIMARY KEY constraint designates a column or combination of columns as the
table’s primary key. To satisfy a PRIMARY KEY constraint, both the following
conditions must be true :
1) No primary key value can appear in more than one row in the table.
2) No column that is part of the primary key can contain a null value.
A table can have only one primary key.
A primary key column cannot be of data type LONG or LONG RAW. You cannot
designate the same column or combination of columns as both a primary key and a
unique key, or as both a primary key and a cluster key. However, you can designate the
same column or combination of columns as both a primary key and a foreign key.
Defining Primary Keys
You can use the column_constraint syntax to define a primary key on a single column.
Example
The following statement creates the DEPT table and defines and enables
a primary key on the DEPTNO column :
CREATE TABLE dept
( deptno NUMBER (2) CONSTRAINT pk_dept PRIMARY KEY,
dname VARCHAR2(10));
The pk_dept constraint identifies the deptno column as the primary key of the dept
table. This constraint ensures that no two departments in the table have the same
department number and that no department number is NULL.
Alternatively, you can define and enable this constraint with table constraint syntax :
CREATE TABLE dept
(deptno NUMBER(2),
dname VARCHAR2(9),
loc VARCHAR2(10),
constraint PK_DEPT PRIMARY KEY (deptno));
Self- Check Exercise-I
Q1. What are the conditions to implement primary key constraint?
Ans…..............................................................................................................
…....................................................................................................................
.......................................................................................................................
Q2. What are the types of structural constraints?
Ans…..............................................................................................................
…....................................................................................................................
.......................................................................................................................
Referential Integrity Constraints
A referential integrity (foreign key) constraint relates rows of a child table to rows of a
parent table. The following restrictions apply:
1) The child and parent tables must be on the same database. They
cannot be on different nodes of a distributed database.
2) The foreign key and the referenced key can be in the same table.
In this case, the parent and child tables are the same.
3) To satisfy a referential integrity constraint, each row of the child
table must meet one of the following conditions:
a) The value of the row’s foreign key must appear as a
referenced key value in one of the parent table’s rows . The
row in the child table is said to depend on the referenced
key in the parent table.
b) The value of one of the columns that makes up the foreign
key must be null.
A referential integrity constraint is defined in the child table. A referential integrity
constraint definition can include any of the following key words:
1) FOREIGN KEY: identifies the column or combination of columns in
the child table that makes up the foreign key. Use this keyword only
when you define a foreign key with a table constraint clause.
2) REFERENCES: identifies the parent table and the column or
combination of columns that make up the referenced key. If you
identify only the parent table and omit the column names, the
foreign key automatically references the primary key of the parent
table. The corresponding columns of the referenced key and the
foreign key must match in number and data types.
ON DELETE CASCADE: allows deletion of referenced key values in the parent table that have
dependent rows in the child table, and causes Oracle to automatically delete the dependent
rows from the child table to maintain referential integrity. If you omit this option, Oracle
forbids deletion of referenced key values in the parent table that have dependent rows in the
child table.
In the examples that follow, we define a referential integrity constraint within a CREATE
TABLE statement. Alternatively, you can create the table without the constraint and then add
the constraint later with an ALTER TABLE statement.
You can define multiple foreign keys in a table. Also, a single column can be part of
more than one foreign key.
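For example, assuming the EMP and DEPT tables described below, the constraint could equally be added after table creation, optionally with the ON DELETE CASCADE option:
ALTER TABLE emp
 ADD CONSTRAINT fk_deptno
 FOREIGN KEY (deptno) REFERENCES dept (deptno)
 ON DELETE CASCADE;
With ON DELETE CASCADE in place, deleting a department row automatically deletes the employee rows that reference it.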
Defining Referential Integrity Constraints
You can use column_constraint syntax to define a referential integrity constraint in which
the foreign key is made up of a single column .
Example
The following statement creates the EMP table and defines and enables a foreign key
on the DEPTNO column that references the primary key on the DEPTNO column of the
DEPT table:
CREATE TABLE emp
(empno    NUMBER(4),
 ename    VARCHAR2(10),
 job      VARCHAR2(9),
 mgr      NUMBER(4),
 hiredate DATE,
 sal      NUMBER(7,2),
 deptno   CONSTRAINT fk_deptno REFERENCES dept (deptno));
The constraint FK_DEPTNO ensures that all employees in the EMP table work in a
department in the DEPT table. However, employees can have null department numbers.
Before you define and enable this constraint you must define and enable a constraint
that designates the DEPTNO column of the DEPT table as a primary or unique key.
Note that the referential integrity constraint definition does not use the FOREIGN KEY
keyword to identify the columns that make up the foreign key. Because the constraint
is defined with a column constraint clause on the DEPTNO column, the foreign key is
automatically on the DEPTNO column.
Note that the constraint definition identifies both the parent table and the columns of
the referenced key. Because the referenced key is the parent table’s primary key, the
referenced key column names are optional.
Note that the above statement omits the DEPTNO column's data type. Because
this column is a foreign key, Oracle automatically assigns it the data type of the
DEPTNO column to which the foreign key refers.
Alternatively, you can define a referential integrity constraint with table_constraint
syntax :
CREATE TABLE emp
 (empno    NUMBER(4),
  ename    VARCHAR2(10),
  job      VARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  comm     NUMBER(7,2),
  deptno   NUMBER(2),
  CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept (deptno));
Note that the foreign key definitions in both the above statements omit the ON DELETE
CASCADE option , causing Oracle to forbid the deletion of a department if any
employee works in that department .
Now let us take a simple example with the following relations, on which we will see the
various operations of the relational model. The relations are:
1) Supplier records
2) Part records
3) Shipment records
The Supplier records
SNo Name Status City
S1 Suneet 20 Qadian
S2 Ankit 10 Amritsar
S3 Amit 10 Amritsar
As we discussed earlier, in this context we assume that each row in the Supplier table
is identified by a unique SNo (supplier number), which uniquely identifies the entire
row in the table. Likewise, each part has a unique PNo (part number). Also, we assume
that no more than one shipment exists for a given supplier/part combination in the
Shipments table.
Note that the relations Parts and Shipments have PNo (part number) in common,
and the Supplier and Shipments relations have SNo (supplier number) in common. The
Supplier and Parts relations have City in common. For example, the fact that supplier
S3 and part P2 are located in the same city is represented by the appearance of the
same value, Amritsar, in the city column of the corresponding tuples of the two relations.
Self- Check Exercise-II
Q3. Referential Integrity constraint is applied on foreign key or primary key or
both?
Ans…..............................................................................................................
…...................................................................................................................
.......................................................................................................................
Q4. Which command is given to delete all referenced tables of a parent table?
Ans…..............................................................................................................
…...................................................................................................................
......................................................................................................................
Query 1: Find supplier numbers for suppliers who supply part P2.
In order to get this information we have to search for part P2 in the SP table. For this
a loop is constructed to find the records of P2 and, on getting the records, the
corresponding supplier numbers are printed.
Algorithm
do until no more shipments;
get next shipment where PNO=P2;
print SNO;
end;
Query 2 : Find part numbers for parts supplied by supplier S2.
In order to get this information we have to search for supplier S2 in the SP table. For
this a loop is constructed to find the records of S2 and, on getting the records, the
corresponding part numbers are printed.
Algorithm
do until no more parts ;
get next shipment where SNO=S2;
print PNO;
end;
Since both the queries involve the same logic and are very simple, we can conclude that the
retrieval operation of this model is simple and symmetric.
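In SQL, assuming a shipments table SP with columns SNO and PNO, the two queries are equally symmetric one-liners:
SELECT sno FROM sp WHERE pno = 'P2';  -- suppliers who supply part P2
SELECT pno FROM sp WHERE sno = 'S2';  -- parts supplied by supplier S2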
Structured Query Language (SQL)
Structured query language (SQL) pronounced as “sequel” is the set of commands that
all programs and users must use to access data within the database. Application
programmers and Oracle tools often allow users to access the database without directly
using SQL, but these applications in turn must use SQL when executing the user’s
request.
Historically, the paper "A Relational Model of Data for Large Shared Data
Banks" by Dr. E. F. Codd was published in June 1970 in the Association for Computing
Machinery (ACM) journal, Communications of the ACM. Codd's model is now accepted
as the definitive model for relational database management systems (RDBMS). The
language, Structured English Query Language (SEQUEL) was developed by IBM
Corporation to use Codd's model. SEQUEL later became SQL. In 1979, Relational
Software, Inc introduced the first commercially available implementation of SQL.
Today, SQL is accepted as the standard RDBMS language. The latest SQL standard
published by ANSI and ISO is often called SQL-92 (and sometimes SQL2).
Benefits of SQL
This section describes many of the reasons for SQL’s widespread acceptance by relational
database vendors as well as end users. The strengths of SQL benefit all ranges of users
including application programmers, database administrators, and management and end
users.
Non-Procedural Language
SQL is a non-procedural language because it:
1) processes sets of records rather than one record at a time; and
2) provides automatic navigation to the data.
SQL is used by all types of users, including:
1) System administrators
2) Database administrators
3) Security administrators
4) Application programmers
5) Decision support system personnel
6) Many other types of end users
Unified Language
SQL provides commands for a variety of tasks including :
1. Querying data;
2. Inserting, updating, and deleting rows in a table;
3. Creating, replacing, altering, and dropping objects;
4. Controlling access to the database and its objects;
5. Guaranteeing database consistency and integrity.
SQL unifies all the above tasks in one consistent language.
Common Language for all Relational Databases
Because all major relational database management systems support SQL, you can
transfer all skills you have gained with SQL from one database to another. In addition,
since all programmes written in SQL are portable, they can often be moved from one
database to another with very little modification.
Embedded SQL
Embedded SQL refers to the use of standard SQL commands embedded within a
procedural programming language. Embedded SQL is a collection of these commands:
1) All SQL commands, such as SELECT and INSERT, available with interactive SQL;
2) Flow control commands, such as PREPARE and OPEN, which integrate the standard
SQL commands with a procedural programming language.
The Oracle precompilers support embedded SQL. The Oracle precompilers
interpret embedded SQL statements and translate them into statements that can be
understood by procedural language compilers. Each of these Oracle precompilers
translates embedded SQL programmes into a different procedural language:
The Pro*Ada precompiler
The Pro*C/C++ precompiler
The Pro*COBOL precompiler
The Pro*FORTRAN precompiler
The Pro*Pascal precompiler
The Pro*PL/I precompiler
Database Objects:
Oracle supports two types of database objects:
Schema Objects: A schema is a collection of logical structures of data, or schema
objects. A schema is owned by a database user and has the same name as the user.
Each user owns a single schema. Schema objects can be created and manipulated with
SQL and include objects such as tables, views, indexes, sequences, procedures, and
functions.
Non-schema Objects: Other types of objects are also stored in the database and can
be created and manipulated with SQL, but are not contained in a schema. These
include profiles, roles, and users.
9) Columns in the same table or view cannot have the same name. However,
columns in different tables or views can have the same name.
10) Procedures or functions contained in the same package can have the
same name, provided that their arguments differ in number or data
types. Creating multiple procedures or functions with the same name in
the same package with different arguments is called overloading the
procedure or function.
Objects Naming Guidelines
There are several helpful guidelines for naming objects and their parts :
1) Use full, descriptive, pronounceable names (or well-known
abbreviations).
2) Use consistent naming rules.
3) Use the same name to describe the same entity or attributes across
tables.
4) When naming objects, balance the objective of keeping names short and
easy to use with the objective of making names as long and descriptive
as possible. When in doubt, choose the more descriptive name because
many people may use the objects in the database over a period of time.
Your counterpart ten years from now may have difficulty understanding
a database with names like PMDD instead of PAYMENT_DUE_DATE.
5) Using consistent naming rules helps users understand the part that each
object plays in your application. One such rule might be to begin the names
of all tables belonging to the FINANCE application with FIN_.
6) Use the same names to describe the same things across tables. For
examples, the department number columns of the EMP and DEPT tables
should be named DEPTNO.
Advantages of Relational Model
The major advantages of the relational model are :
1. Structural independence
In the relational model, changes in the database structure do not affect
data access. When it is possible to make changes to the database structure
without affecting the DBMS's capability to access data, we say that
structural independence has been achieved. So, the relational database model has
structural independence.
2. Conceptual simplicity:
We have seen that both the hierarchical and the network database
models were conceptually simple. But the relational database model is even
simpler at the conceptual level. Since the relational data model frees the
designer from the physical data storage details, the designers can concentrate
on the logical view of the database.
3. Design, implementation, maintenance and usage ease:
The relational database model achieves both data independence and
structure independence making the database design, maintenance,
administration and usage much easier than the other models.
4. Ad hoc query capability:
The presence of a very powerful, flexible and easy-to-use query capability is
one of the main reasons for the immense popularity of the relational database
model. The query language of the relational database model, Structured Query
Language (SQL), makes ad hoc queries a reality. SQL is a fourth-generation
language (4GL). A 4GL allows the user to specify what must be done without
specifying how it must be done. So, using SQL, users can specify what
information they want and leave the details of how to get the information to the
database.
Disadvantages of Relational Model
The relational model's disadvantages are very minor compared to its advantages, and its
capabilities far outweigh the shortcomings. Moreover, the drawbacks of relational
database systems could be avoided if proper corrective measures are taken. The drawbacks
stem not from shortcomings in the database model itself but from the way it is
implemented.
Some of the disadvantages are :
1. Hardware overheads:
Relational database systems hide the implementation complexities and the
physical data storage details from the users. To do this, and to make things easier
for the users, relational database systems need more powerful computers
and data storage devices; the RDBMS needs powerful machines to run
smoothly. But the processing power of modern computers is increasing at an
exponential rate, so in today's scenario the need for more processing power is no
longer a very big issue.
2. Ease of design can lead to bad design
The relational database is easy to design and use. The users need not
know the complex details of physical data storage; they need not know how the
data is actually stored in order to access it. This ease of design and use can,
however, lead to the development and implementation of very poorly designed
databases. Because the database is efficient, these design inefficiencies will not
come to light when the database is designed and when there is only a small
amount of data. As the database grows, the poorly designed databases will slow
the system down and will result in performance degradation and data corruption.
3. Information island phenomenon
As we have said before, the relational database systems are easy to
implement and use. This will create a situation where too many people or
departments will create their own databases and applications.
These information islands will prevent the information integration that is
essential for the smooth and efficient functioning of the organization. These
individual databases will also create problems like data inconsistency, data
duplication, data redundancy and so on.
But as we have said all these issues are minor when compared to the advantages and
all these issues could be avoided if the organization has a properly designed database
and has enforced good database standards.
1.4.4 Summary
The relational data model was first introduced by Dr. E. F. Codd, an Oxford-trained
mathematician, while he was working at the IBM Research Center around 1970. He
presented this idea in a classic paper, and it attracted immediate attention due to its
simplicity and mathematical foundations. It also drew the immediate attention of the
computing industry because of the simple way in which it represented information,
using the well-understood convention of tables of values as its building block.
one of the most popular developments in the database technology because it can be
used for representing most of the real-world objects and the relationships between
them.
1.4.5 Keywords
Referential Integrity: Referential Integrity is a concept in Database Management
Systems (DBMS) that ensures relationships between tables are maintained
consistently.
Schema Object: A database schema is a collection of database objects, including
tables, views, indexes, sequences, procedures, functions, and more, organized and
defined to represent the structure of a database.
Stored Procedure: A stored procedure is a precompiled collection of one or more SQL
statements that can be executed as a single unit.
Rollback: Refers to the process of undoing or reverting a set of transactions that were
not committed to the database.
1.4.6 Short Answer Type Questions
Q1. What is the difference between a key and an index?
Q2. Differentiate between a UNIQUE key and a PRIMARY key.
Q3. What are the features of column-level constraints?
Q4. What are the schema and non-schema objects?
1.4.7 Long Answer Type Questions
Q1. Explain the various concepts of relational data model.
Q2. Explain the various constraints in relational data model.
Q3. Explain row level constraint with examples.
1.4.8 Suggested Readings
• Bipin C. Desai, An introduction to Database System, Galgotia Publication,
New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems,
Addison Wesley.
Relational Database Management Systems
1.5.0 Objectives
1.5.1 Introduction
1.5.2 Database Utilities
1.5.3 Data Models
1.5.3.1 High-level data models
1.5.3.2 Low-level data models
1.5.3.3 Representational data models
1.5.4 What is a Key?
1.5.4.1 Super Key
1.5.4.2 Candidate Key
1.5.4.3 Primary Key
1.5.4.4 Unique Key
1.5.4.5 Foreign Key
1.5.5 Database Languages
1.5.6 Data Definition Language
1.5.7 Data Manipulation Language
1.5.8 Data Control Language
1.5.9 Summary
1.5.10 Keywords
1.5.11 Short Answer Type Questions
1.5.12 Long Answer Type Questions
1.5.13 Suggested Readings
1.5.0 Objectives
After completing this lesson, you will be able to:
• Explain the database utilities
• Data Model and various data models
• Define Key
• Different type of keys – Super, Candidate, Primary, Unique and
Foreign
• What is DBMS Language?
• DDL, DML, DCL
1.5.1 Introduction
DBMSs have database utilities that help the DBA in managing the database
system. Common utilities include loading, backup, file reorganization and
performance monitoring. One fundamental characteristic of the database approach
is that it provides some level of data abstraction by hiding details of data storage
that are not needed by most database users. A data model is a collection of
concepts that can be used to describe the structure of a database and that provides
the necessary means to achieve the data abstraction. We can categorize the data
models into High-level or conceptual data models and Low-level or physical data
models. High-level or conceptual data models are those that provide concepts that
are close to the way many users perceive data. Entity Relationship model is a
popular example of high-level data model. Low-level or physical data models are
those that provide concepts that describe the details of how data is stored in the
computer. Between the two broad categories of data models is representational or
implementation data model which provide concepts that may be understood by end
users. Hierarchical, Network and relational are the popular examples of this
category of data model. A key is a property of the entity set, rather than of the
individual entities. Any two individual entities in the set are prohibited from having
the same value of the key attributes at the same time. There are different types of keys
– Super, Candidate, Primary, Unique and Foreign. Keys help in maintaining data
integrity and consistency.
Other utilities may be available for sorting files, handling data compression,
monitoring access by users, and performing other functions.
In object-based data models, an object contains values describing its state together
with the set of operations associated with the object, that is, its behavior. The object is
said to encapsulate both state and behavior.
1.5.3.2 Low-level or physical data models: Low-level or physical data models are
those that provide concepts that describe the details of how data is stored in the
computer. They represent information such as record formats, record ordering, and
access paths. An access path is a structure that makes the search for particular
database records efficient. One of the most important low-level data models is the
unifying data model.
1.5.3.3 Representational or implementation data models: Between the two
broad categories of data models is the representational or implementation data model,
which provides concepts that may be understood by end users. These are used to
describe data at the logical and view levels of the three-level architecture of a DBMS. In
contrast to object-based data models, these are used to specify the overall logical
structure of the database and to provide a higher-level description of the
implementation. The most popular representational data models are:
a. Hierarchical data model
b. Network data model
c. Relational data model
Representational data models represent data by using record structures and
hence are sometimes called record-based data models.
Self-Check Exercise-I
Q1: What is data modelling?
Ans……….....................................................................................................
…...........................................................................................................................
…...........................................................................................................................
Q2: What are the utilities of DBMS?
Ans…......................................................................................................................
…...........................................................................................................................
…....................................................................................................................
1.5.4.4 Unique Key
It is similar to a primary key; the only difference is that a unique key allows NULL
values, whereas a primary key does not.
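A small sketch of the difference (the pan_no column is hypothetical, added only for illustration):
CREATE TABLE client
 (client_no VARCHAR2(6)  PRIMARY KEY, -- no NULLs allowed; only one per table
  pan_no    VARCHAR2(10) UNIQUE);     -- must be unique when present, but may be NULL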
Self-Check Exercise-II
Q3: Which keys consist of primary key?
Ans…………............................................................................................................
…...........................................................................................................................
…...........................................................................................................................
Q4: Is it possible to have a foreign key when there is only one table in a database?
Ans…………............................................................................................................
…...........................................................................................................................
…...........................................................................................................................
The following are basic SQL commands that are used as DML statements:
Sr. No.  Purpose                                 SQL statement
1.       Retrieve data from one or more tables   SELECT
2.       Add new rows to a table                 INSERT
3.       Remove rows from a table                DELETE
4.       Change data in rows of a table          UPDATE
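As a brief illustration, using the DEPT table from the earlier lesson (the inserted values are examples only):
SELECT * FROM dept WHERE deptno = 10;                    -- retrieve rows
INSERT INTO dept (deptno, dname) VALUES (50, 'EXPORT');  -- add a row
UPDATE dept SET dname = 'EXPORTS' WHERE deptno = 50;     -- change a row
DELETE FROM dept WHERE deptno = 50;                      -- remove a row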
1.5.9 Summary
The main purpose of data modeling is to help in understanding the meaning of the
data and to facilitate communication about information requirements. A data model
supports communication between the users and database designers. The goal of
data model is to make sure that all data objects required by the database are
completely and accurately represented because the data model uses easily
understood notations and natural language which can be reviewed and verified as
correct by the end users. A key is a property of the entity set, rather than of the
individual entities. Any two individual entities in the set are prohibited from having
the same value of the key attributes at the same time. There are different types of keys
– Super, Candidate, Primary, Unique and Foreign.
DBMS must provide appropriate languages for various categories of DBMS
users. After database design, a DBMS is chosen to implement the database. Most
common DBMS languages are DDL, DML, and DCL. Data Definition Language
(DDL) is used by the DBA and by database designers to define database schemas.
The DBMS provides a data manipulation language (DML) for manipulating data in
database. Data Control Language (DCL) is used for granting and revoking the
privileges to the users for using the database.
1.5.10 Keywords
DDL: DDL stands for Data Definition Language, which is a subset of SQL
(Structured Query Language) used for defining and managing the structure of a
relational database.
DML: DML stands for Data Manipulation Language, which is a subset of SQL
(Structured Query Language) responsible for managing data within a relational
database.
DCL: DCL stands for Data Control Language, which is a subset of SQL (Structured
Query Language) responsible for managing access to data within a relational
database.
Relational Database Management Systems
RELATIONAL ALGEBRA
1.6.0 Objectives
1.6.1 Introduction
1.6.2 Relational Algebra
1.6.3 Summary
1.6.4 Keywords
1.6.5 Short Answer Type Questions
1.6.6 Long Answer type Questions
1.6.7 Suggested Readings
1.6.0 Objectives
After reading this lesson you will be able to learn the concepts of Relational
Algebra.
1.6.1 Introduction:
Relational algebra is a procedural language that can be used to tell the DBMS
how to build a new relation from one or more relations in the database. While using
relational algebra, the user has to specify both what is required and the procedure or
steps to obtain the required output. Relational algebra is a formal language. It is used
as the basis for other high-level Data Manipulation Languages (DMLs) for relational
databases. It illustrates the basic operations required of any DML and serves as the
standard of comparison for other relational languages.
1.6.2 Relational Algebra
The relational algebra is a theoretical language with operations that work on one
or more relations to define another relation without changing the original relation(s).
Thus, both the operands and the results are relations and so the output from one
operation can become the input to another operation. This allows expressions to be
nested in the relational algebra just as we nest arithmetic operations. This property is
called closure: relations are closed under the algebra just as numbers are closed under
arithmetic operations.
There are many variations of the operations that are included in relational
algebra. Codd originally proposed eight operations, but several others have been
developed.
The five fundamental operations in relational algebra are :
1) Selection
2) Projection
3) Cartesian Product
4) Union
5) Difference
They perform most of the data retrieval operations; the additional operations can be
expressed in terms of these five basic operations.
In relational algebra each operation takes one or more relations as its operands
and produces another relation as its result. Consider an example of mathematical
algebra as shown below :
3+5=8
Here 3 and 5 are operands and + is an arithmetic operator which gives result as 8.
Similarly, in relational algebra R1+ R2 = R3. Here R1 and R2 are relations
(operands) and + is the relational operator which gives R3 as a resultant relation.
A) BASIC RELATIONAL ALGEBRA OPERATIONS
Basic relational algebra operations are also called traditional set operators. The
various traditional set operators are:
1) UNION
2) INTERSECTION
3) DIFFERENCE
4) CARTESIAN PRODUCT
UNION
In mathematical set theory, the union of two sets is the set of all elements
belonging to either set (or to both). The set which results from the union must not, of
course, contain duplicate elements. It is denoted by U. Thus the union of the sets:
S1 = { 1 , 2 , 3 , 4, 5 } and
S2 = { 4 , 5 , 6, 7 , 8 }
would be the set { 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 } .
A union operation on two relational tables follows the same basic principle but
is more complex in practice. In order to perform the union operation, both operand
relations must be union compatible, i.e. they must have the same number of columns,
with corresponding columns drawn from the same domain (that is, of the same data type).
Suppose two tables, R and S have the following tuples at some instant in time
and that their header parts are as shown below:
R
Cust_name Cust_status
Sham Good
Rahul Excellent
Mohan Bad
Sachin Excellent
Dinesh Bad
S
Cust_name Cust_status
Karan Bad
Sham Good
Sachin Excellent
Rohan Average
These can certainly be combined into one table containing a valid relation by the
relational union operator (R U S) as follows :
RUS
Cust_name Cust_status
Sham Good
Rahul Excellent
Mohan Bad
Sachin Excellent
Dinesh Bad
Karan Bad
Rohan Average
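In SQL, the same result is obtained with the UNION operator, which (like the algebraic union) eliminates duplicates; this sketch assumes tables named R and S with the columns shown above:
SELECT cust_name, cust_status FROM r
UNION
SELECT cust_name, cust_status FROM s;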
INTERSECTION
In mathematics an intersection of two sets produces a set, which contains all
the elements that are common to both sets. Thus the intersection of two sets:
S1 = { 1 , 2 , 3 , 4 , 5 } and
S2 = { 4 , 5 , 6 , 7 , 8 }
would be { 4 , 5 } .
In the above example both tables are union compatible and can be intersected.
The intersection operation on the R and S tables defined above would be:
Cust_name Cust_status
Sham Good
Sachin Excellent
The intersection operator is used in the similar fashion to the union operator,
but provides an ‘and ‘ function.
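SQL expresses this directly with the INTERSECT operator (again assuming tables named R and S):
SELECT cust_name, cust_status FROM r
INTERSECT
SELECT cust_name, cust_status FROM s;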
DIFFERENCE
In mathematics, the difference between two sets S1 and S2 produces a set
containing all the members of the first set that are not in the second. It is denoted by
the "–" sign. The order in which the difference is taken is obviously significant. Thus the
difference between two sets:
S1 = { 1 , 2 , 3 , 4 , 5 }
Minus
S2 = { 4 , 5 , 6 , 7 , 8 }
Would be { 1 , 2 , 3 } and between
S2 = { 4 , 5 , 6 ,7 , 8 }
Minus
S1 = { 1, 2 , 3 , 4 , 5 }
would be { 6 , 7 , 8 }
As with the other set operations discussed so far, the difference operation can also be
performed only on tables that are union compatible. The difference operation on the R and S
tables (R – S) defined above would return:
R–S
Cust_name Cust_status
Rahul Excellent
Mohan Bad
Dinesh Bad
And for S – R
Cust_name Cust_status
Karan Bad
Rohan Average
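In Oracle SQL the difference operator is called MINUS (standard SQL calls it EXCEPT); as in the algebra, the order of the operands matters:
SELECT cust_name, cust_status FROM r
MINUS
SELECT cust_name, cust_status FROM s;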
Self-Check Exercise-I
Q1: What are the basic algebraic operations?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
………………………..…..............................................................................................
Q2. What is the difference between algebraic operations and boolean operations?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
………………………..…..............................................................................................
CARTESIAN PRODUCT
In mathematics, the Cartesian product of two sets is the set of all ordered pairs
of elements such that the first element in each pair belongs to the first set and the
second element in each pair belongs to the second set. It is denoted by a cross (x).
For example, given two sets:
S1 = { 1 , 2 , 3 } and
S2 = { 4 , 5 , 6 }
The Cartesian product S1 x S2 is the set :
{ ( 1, 4 ), (1, 5 ), (1 , 6 ), ( 2, 4 ), (2, 5 ), (2 , 6 ), ( 3, 4 ), (3, 5 ), (3 , 6 ) }
Consider the two tables with sample population as below
Female
Name Job
Komal Clerk
Amita Sales
Sonia Production
Nidhi Clerk
Male
Name Job
Rohit Clerk
Amit Sales
Sohan Production
Nitin Clerk
Assume that the tables refer to male and female staff respectively. Now, in order
to obtain all possible inter-staff marriages, the Cartesian product can be taken giving
the Table MALE_FEMALE.
[Table MALE_FEMALE: the 16 rows of the Cartesian product, pairing every female row with every male row.]
In order to preserve unique names for attributes, the original attribute names
have had to be concatenated with the original table names. The new table has also
been given an identity.
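In SQL the Cartesian product is simply a join with no join condition; the column aliases below implement the renaming just described (a sketch assuming tables named female and male):
SELECT f.name AS f_name, f.job AS f_job,
       m.name AS m_name, m.job AS m_job
FROM   female f, male m;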
B) SPECIAL RELATIONAL OPERATIONS
There are four special relational algebra operations which are as under
1) SELECTION
2) PROJECTION
3) JOIN
4) DIVISION
Selection
The selection operator yields a horizontal subset of a given relation – that is, the
subset of tuples (rows) within the given relation for which a particular condition is
satisfied.
In mathematics a set can have any number of subsets. A set is said to be a
subset of another if all its members are also members of the other set. Thus, in the
following example:
S1 = { 1 , 2 , 3 , 4 , 5 }
S2 = { 2 , 3 , 4 }
S2 is a subset of S1. Since the body part of a table is a set, it is possible for it to
have subsets, that is a selection from its tuples can be used to form another relation.
However, this would be a meaningless operation if no new information were to
be gained from the new relation. On the other hand, a subset of, say, an EMPLOYEE
relation containing all tuples that represent employees who earn more than some given
salary would be useful. What is required is that some explicit restriction be placed on
the sub-setting operation.
Restriction was originally defined on relations only and is achieved using
comparison operators such as equal to (=), not equal to (!=), greater than (>),
less than (<), greater than or equal to (>=) and less than or equal to (<=).
Example : Consider the database having following tables :
The Supplier table
Sno  Sname   Status  City
S1   Suneet  20      Qadian
S2   Ankit   10      Amritsar
S3   Amit    30      Amritsar
S4   Raj     20      Amritsar
Sno - Supplier number of the supplier; it is unique
Sname - Supplier name
City - City of the supplier
Status - Status of the city, e.g. A-grade cities may have status 10, B-grade
cities may have status 20, and so on.
Example:
S WHERE CITY = 'Qadian'
Sno  Sname   Status  City
S1   Suneet  20      Qadian
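The same selection in SQL (assuming the supplier table is named s):
SELECT * FROM s WHERE city = 'Qadian';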
PROJECTION
The projection operation on a table simply forms another table by copying
specified columns (both header and body parts) from the original table, eliminating any
duplicated rows. The projection operator yields a vertical subset of a given relation –
that is, the subset obtained by selecting specified attributes, in a specified left to right
order, and then eliminating duplicate tuples within the attributes selected. It is
denoted by pi (π). For example consider the table EMPLOYEE as shown:
Table: Employee
Personnel_number  Name   Age  Salary
123               Sham   23   7500
124               Karan  43   10000
125               Rahul  23   10000
π age,salary (employee)
Age Salary
23 7500
43 10000
23 10000
π personnel_number,name (employee)
Personnel_number Name
123 Sham
124 Karan
125 Rahul
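In SQL, projection is the column list of a SELECT statement, with DISTINCT supplying the duplicate elimination that the π operator performs:
SELECT DISTINCT age, salary FROM employee;
SELECT DISTINCT personnel_number, name FROM employee;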
JOIN
The most general form of join operation is called a theta join, where theta has
the same meaning as ‘compares with’ as it was used in the context of the restriction
operation. That is, it stands for any of the comparative operators equals, not equals,
greater than and so forth. A theta join is performed on two tables, which have one or
more columns in common which are domain compatible.
It forms a new table which contains all the columns from both the joined tables,
and whose tuples are those that satisfy the specified restriction.
For example consider the tables:
EMPLOYEE_PRODUCT
Name Product
Raja Pen
Sparsh Pen
Raja Pencil
Sparsh Rubber
PRODUCT_CUSTOMER
C_Product Customer
Pen Karan
Pen Suneet
Pencil Suneet
The tables list employees who make products and customers who buy those
products and can be joined over the columns ‘product’ and ‘c_product’ in both tables
since the values in both columns are domain compatible. The result of a theta join,
where the restriction is that the product attribute values in EMPLOYEE_PRODUCT
should be equal to the product attribute values in PRODUCT_CUSTOMER would be:
Table EMPLOYEE_PRODUCT_CUSTOMER
Name Product C_Product Customer
Raja Pen Pen Karan
Raja Pen Pen Suneet
Sparsh Pen Pen Karan
Sparsh Pen Pen Suneet
Raja Pencil Pencil Suneet
Note: If both tables have same common column then one of the common column
has to be renamed in the resultant table to preserve the uniqueness of the names in its
header part.
In the above example the theta operator was ‘equals’, and this, the most
common form of theta join, is referred to as an equi-join. Note that an equi-join must
always result in a table which has pairs of columns, like ‘product’ and ‘c_product’ in the
above example, which contain identical lists of attribute values.
By far the most common form of join is a variation of the equi-join where this
duplication of column values is eliminated by taking a projection of the table which
includes only one of the duplicated columns. This is referred to as a natural join.
The natural join of the tables in the last example would give the table:
Name Product Customer
Raja Pen Karan
Raja Pen Suneet
Sparsh Pen Karan
Sparsh Pen Suneet
Raja Pencil Suneet
It may help in understanding the different types of join if the operation is looked
at from a different point of view. The join is actually a composite operator. The theta
join is a Cartesian product operation on the two tables followed by a restriction
operation on the resultant table.
Self-Check Exercise-II
Q3: Which algebraic operation is used to find common elements in two tables?
Ans……………………………………………………………………………………………………
…………………………………………………………………………………………………………
………………………..…..........................................................................................
Q4. What is join?
Ans……………………………………………………………………………………………………
…………………………………………………………………………………………………………
………………………..…...........................................................................................
The tuples of the Cartesian product of the two tables in the earlier example
would be :
Name Product C_Product Customer
Raja Pen Pen Karan
Raja Pen Pen Suneet
Raja Pen Pencil Suneet
Sparsh Pen Pen Karan
Sparsh Pen Pen Suneet
Sparsh Pen Pencil Suneet
……. …… …… …..
Raja Pencil Pencil Suneet
The restriction operation on this product selects only those tuples from this
relation which conform to the restriction. In the example, the restriction was that the
‘product’ attributes should have equal values in each tuple; the result is the table
EMPLOYEE_PRODUCT_CUSTOMER shown above.
Since theta equated to ‘equals’, this was an equi-join. By carrying out a further
projection operation which eliminates one of the duplicated ‘product’ columns resulting
from the equi-join, the natural join is obtained.
Thus, the join operator is a combination of the Cartesian product, selection and
projection operators.
The examples given so far have all been of so-called inner joins. The fact that
Sparsh makes a Rubber is not recorded in any of the resultant tables from the joins,
because the joining values must exist in both tables. If it suffices that the value exists in
only one table, then a so-called outer join is produced.
An outer join of the EMPLOYEE_PRODUCT and PRODUCT_ CUSTOMER tables
exemplified above would return :
Employee_name Product_name Customer_name
Raja Pen Karan
Raja Pen Suneet
Sparsh Pen Karan
Sparsh Pen Suneet
Raja Pencil Suneet
Sparsh Rubber -
The expression A JOIN B is defined if and only if, for every unqualified attribute-name
that is common to A and B, the underlying domain is the same for both relations. Assume that
this condition is satisfied. Let the qualified attribute-names for A and B, in their left-to-right
order, be A.A1, ..., A.Am and B.B(m+1), ..., B.B(m+n) respectively.
Let Ci, ..., Cj be the unqualified attribute-names that are common to A and B,
and let Br, ..., Bs be the unqualified attribute-names remaining for B (with
their relative order undisturbed) after removal of Ci, ..., Cj.
Then A JOIN B is defined to be equivalent to (A TIMES B) [A.A1, ..., A.Am,
B.Br, ..., B.Bs]
where A.Ci = B.Ci
and ...
and A.Cj = B.Cj
Apply this definition to JOIN operation on Emp and Dept tables with following
attributes:
EMP(empno,ename,job,sal,deptno)
DEPT(deptno,dname,loc)
EMP JOIN DEPT = (EMP TIMES DEPT)
[EMP.empno, EMP.ename, EMP.job, EMP.sal, EMP.deptno, DEPT.dname, DEPT.loc]
where EMP.deptno = DEPT.deptno
So, we can say that JOIN is a combination of the Product, Selection and Projection
operators. JOIN is an associative operator, which means:
(A JOIN B) JOIN C = A JOIN (B JOIN C).
JOIN is also commutative:
A JOIN B = B JOIN A
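A minimal Python sketch of the joins discussed above, using the EMPLOYEE_PRODUCT and PRODUCT_CUSTOMER tables; the equi-join is written as a Cartesian product followed by a restriction, and the natural join as a further projection:

emp_prod = [("Raja", "Pen"), ("Sparsh", "Pen"), ("Raja", "Pencil"), ("Sparsh", "Rubber")]
prod_cust = [("Pen", "Karan"), ("Pen", "Suneet"), ("Pencil", "Suneet")]

def equi_join(r, s):
    # Cartesian product followed by a restriction on the common column.
    return [(name, prod, c_prod, cust)
            for name, prod in r for c_prod, cust in s if prod == c_prod]

def natural_join(r, s):
    # Equi-join followed by a projection dropping the duplicated column.
    return [(name, prod, cust) for name, prod, _, cust in equi_join(r, s)]

def outer_join(r, s):
    # Tuples of r with no match in s are kept, padded with a null marker "-".
    result = []
    for name, prod in r:
        matches = [cust for c_prod, cust in s if c_prod == prod]
        for cust in matches or ["-"]:
            result.append((name, prod, cust))
    return result

print(natural_join(emp_prod, prod_cust))  # Sparsh/Rubber is lost (inner join)
print(outer_join(emp_prod, prod_cust))    # ('Sparsh', 'Rubber', '-') is retained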
DIVISION
The division operator divides a dividend relation A of degree m+n (the degree of
a relation is its number of columns) by a divisor relation B of degree n, and produces a
resultant relation of degree m.
Relation A
Sno Pno
S1 P1
S1 P2
S1 P3
S1 P4
S1 P5
S1 P6
S2 P1
S2 P2
S3 P2
S4 P2
S4 P4
S4 P5
Relation B
CASE 1:
Pno
P1

CASE 2:
Pno
P2
P4

CASE 3:
Pno
P1
P2
P3
P4
P5
P6
A DIVIDED BY B
CASE 1:    CASE 2:    CASE 3:
Sno        Sno        Sno
S1         S1         S1
S2         S4
In this example the dividend relation A has two attributes, Sno and Pno (degree 2), and
the divisor relation B has only one attribute, Pno (degree 1). A divided by B therefore gives
a resultant relation of degree 2 − 1 = 1; that is, it has only the one attribute Sno.
A     Sno × Pno
--- = ----------- = Sno
B        Pno
The resultant relation contains those Sno values that are paired in A with every
Pno value appearing in the divisor relation B.
For example, in CASE 2:
P2 is supplied by S1, S2, S3 and S4;
P4 is supplied by S1 and S4.
S1 and S4 are the common suppliers who supply both P2 and P4, so the resultant
relation has tuples S1 and S4.
In CASE 3, there is only one supplier, S1, who supplies all the parts from P1 to P6.
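A minimal Python sketch of division on relation A above, reproducing the three cases:

# Division: a supplier qualifies only if it is paired with EVERY part
# in the divisor relation B.
a = {("S1", p) for p in ["P1", "P2", "P3", "P4", "P5", "P6"]} \
    | {("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
       ("S4", "P2"), ("S4", "P4"), ("S4", "P5")}

def divide(dividend, divisor):
    suppliers = {sno for sno, _ in dividend}
    return {sno for sno in suppliers
            if all((sno, pno) in dividend for pno in divisor)}

print(divide(a, {"P1"}))                                # CASE 1: {'S1', 'S2'}
print(divide(a, {"P2", "P4"}))                          # CASE 2: {'S1', 'S4'}
print(divide(a, {"P1", "P2", "P3", "P4", "P5", "P6"}))  # CASE 3: {'S1'}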
1.6.3 Summary
Relational algebra is a procedural language which specifies the operations to be
performed on existing relations to derive result relations. Relational algebra
operations can be divided into basic and special relational operators. Relational
calculus is a non-procedural language which offers an alternative way of formulating
queries. It is based on predicate calculus: a query is formulated as a set of predicates
that the answer must satisfy, instead of as a series of successive operations together
with the objects involved in those operations.
1.6.4 Keywords
Universal Set: a set that contains all the elements or objects under consideration
for a particular discussion or problem.
Union: a fundamental operation that combines the elements of two or more sets
to create a new set.
Intersection: the set of elements that are common to two or more sets.
Difference: the set of elements that belong to one set but not to another.
Cartesian product: an operation in set theory that combines elements from two
sets to create a new set of ordered pairs. It is denoted by the symbol "×".
1.6.5 Short Answer Type Questions
Q1. What is relational algebra and what are its uses?
Q2. Explain the following operations with examples:
1. Union 2. Intersection 3. Difference
4. Cartesian Product 5. Division
1.6.6 Long Answer Type Questions
Q1: Consider the following relational database schema consisting of the four
relation schemas:
passenger ( pid, pname, pgender, pcity)
agency ( aid, aname, acity)
flight (fid, fdate, time, src, dest)
booking (pid, aid, fid, fdate)
Answer the following questions using relational algebra queries:
a) Get the complete details of all flights to New Delhi.
b) Get the details about all flights from Chennai to New Delhi
c) Find only the flight numbers for passenger with pid 123 for flights to Chennai
before 06/11/2020.
d) Find the passenger names for passengers who have bookings on at least one
flight.
1.6.7 Suggested Readings
• Bipin C. Desai, An introduction to Database System, Galgotia Publication, New
Delhi.
• C. J. Date, An Introduction to Database Systems, Sixth Edition, Addison Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems,
Addison Wesley.
Relational Database Management Systems
1.7.0 Objectives
1.7.1 Introduction
1.7.2 Functional Dependency
1.7.2.1 Basic Concepts
1.7.2.2 Closure of a Set of Functional Dependencies
1.7.2.3 Closure of Attribute Sets
1.7.3 Decomposition
1.7.3.1 Desirable Properties of Decomposition
1.7.3.2 Dependency Preservation
1.7.3.3 Repetition of Information
1.7.4 Problems arising out of bad database Design
1.7.5 Summary
1.7.6 Keywords
1.7.7 Short Answer Type Questions
1.7.8 Long Answer Type Questions
1.7.9 Suggested Readings
1.7.0 Objectives
After completing this lesson, you will be able to:
• Define Functional Dependency and its importance in database design
• Understand decomposition of relation
• Understand the problems that arise due to bad database design
1.7.1 Introduction
The concept of functional dependency is the basis for Normalization. The
functional dependencies are the consequence of the interrelationships among
attributes of a relation (table) represented by some link or association. Care must be
taken that the database design is good, and that requires careful decomposition
of relations into further relations. In the following sections we will study how to
decompose relations so as to arrive at a good database design. If decomposition is
not done with care, the result will be a bad database design, with problems such as
repetition of data.
1.7.2 Functional Dependencies
Functional dependencies play a key role in differentiating good database designs
from bad database design. A functional dependency is a type of constraint that is a
generalization of the notion of key.
1.7.2.1 Basic Concepts
Functional dependencies are constraints on the set of legal relations. They allow
us to express facts about the enterprise that we are modeling with our database.
Functional Dependency is a many-to-one relationship from one set of attributes
to another within a given relation.
We define the notion of a super-key as follows. Let R be a relation schema. A
subset K of R is a super-key of R if, in any legal relation r(R), for all pairs t1 and t2 of
tuples in r such that t1 ≠ t2, t1[K] ≠ t2[K]. That is, no two tuples in any legal relation
r(R) may have the same value on attribute set K.
The notion of functional dependency generalizes the notion of super-key.
Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency
α → β
holds on schema R if, in any legal relation r(R) for all pairs of tuples t1 and t2 in r such
that t1[α] = t2 [α], it is also the case that t1[β] = t2[β].
Using the functional-dependency notation, we say that K is a super-key of R if
K→ R. That is K is a super-key if, whenever t1[K] = t2 [K] it is also the case that t1[R] =
t2 [R] (that is t1 = t2).
Functional dependencies allow us to express constraints that we cannot express
with super-keys. Consider the schema :
Loan-info-schema = (loan-number, branch-name, customer-name, amount) which
is simplification of the lending-schema that we saw earlier. The set of functional
dependencies that we expect to hold on this relation schema is :
loan-number → amount
loan-number → branch-name
We would not, however, expect the functional dependency
loan-number → customer-name
to hold, since in general a given loan can be made to more than one customer (for
example, to both members of a husband – wife pair).
We shall use functional dependencies in two ways:
• To test relations to see whether they are legal under a given set of functional
dependencies. If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
• To specify constraints on the set of legal relations. We shall thus concern
ourselves with only those relations that satisfy a given set of functional
dependencies. If we wish to constrain ourselves to relations on schema R
that satisfy a set F of functional dependencies, we say that F holds on R.
Let us consider the relation r of figure below:
A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4
Sample relation r
to see which functional dependencies are satisfied. Observe that A → C is satisfied.
There are two tuples that have an A value of a1; these have the same C value, namely
c1. Similarly, the two tuples with an A value of a2 have the same C value, c2. There are
no other pairs of distinct tuples that have the same A value. The functional
dependency C → A is not satisfied, however. To see this, consider the tuples t1 =
(a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These two tuples have the same C value, c2, but
they have different A values, a2 and a3, respectively. Thus we have found a pair of
tuples t1 and t2 such that t1[C] = t2[C] but t1[A] ≠ t2[A].
Many other functional dependencies are satisfied by r, including, for example,
the functional dependency AB → D. Note that we use AB as shorthand for {A, B}, to
conform with standard practice. Observe that there is no pair of distinct tuples t1 and
t2 such that t1[AB] = t2[AB]. Therefore, if t1[AB] = t2[AB], it must be that t1 = t2 and thus
t1[D] = t2[D]. So r satisfies AB → D.
Some functional dependencies are said to be trivial because they are satisfied by all relations.
For example, A → A is satisfied by all relations involving attribute A. Reading the definition of
functional dependency literally, we see that, for all tuples t1 and t2 such that t1[A] = t2[A], it is the
case that t1[A] = t2[A]. Similarly, AB → A is satisfied by all relations involving attribute A. In
general, a functional dependency of the form α → β is trivial if β ⊆ α.
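A minimal Python sketch of testing whether a relation satisfies a functional dependency, using the sample relation r above:

# The FD lhs -> rhs is violated if two tuples agree on lhs but not on rhs.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

def satisfies(relation, lhs, rhs):
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False     # same lhs value, different rhs values
        seen[key] = val
    return True

print(satisfies(r, ["A"], ["C"]))       # True:  A -> C holds
print(satisfies(r, ["C"], ["A"]))       # False: c2 maps to both a2 and a3
print(satisfies(r, ["A", "B"], ["D"]))  # True:  AB -> D holds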
To distinguish between the concepts of a relation satisfying a dependency and a
dependency holding on a schema, we return to the banking example. If we consider the
customer relation (on customer-schema) in Figure below, we see that customer-street →
customer-city is satisfied. However, we believe that in the real world, two cities can have
streets with the same name.
Customer-name Customer-street Customer-city
Jones Main Harrison
Smith North Rye
Hayes Main Harrison
Curry North Rye
Lindsay Park Pittsfield
Turner Putnam Stamford
Williams Nassau Princeton
Adams Spring Pittsfield
Johnson Alma Palo Alto
Glenn Sand Hill Woodside
Brooks Senator Brooklyn
Green Walnut Stamford
The customer relation
Thus, it is possible, at some time to have an instance of the customer relation in which
customer-street→ customer-city is not satisfied. So we would not include customer-street→
customer-city in the set of functional dependencies that hold on Customer-schema.
Self-Check Exercise-I
Q1: What is functional dependency?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
…………..….................................................................................................................
Q2. Why is it necessary to identify functional dependencies?
Ans…………………………………………………………………………………………………………
……………………………………………………………………………………………………………
…………..…...............................................................................................................
In the loan relation (on loan-schema) of figure below, we see that the dependency loan-
number → amount is satisfied. In contrast to the case of customer-city and customer-street in
customer-schema, we do believe that the real world enterprise that we are modeling requires
each loan to have only one amount. Therefore we want to require that loan-number→ amount
be satisfied by the loan relation at all times. In other words, we require that the constraint
loan-number → amount hold on Loan-schema.
The loan relation:
Loan-number Branch-name Amount
L-17 Downtown 1000
L-23 Redwood 2000
L-15 Perryridge 1500
L-14 Downtown 1500
L-93 Mianus 500
L-11 Round Hill 900
L-29 Pownal 1200
L-16 North Town 1300
L-18 Downtown 2000
L-25 Perryridge 2500
L-10 Brighton 2200
In the branch relation of Figure below, we see that branch-name→ assets is
satisfied, as is assets→ branch-name. We want to require that branch-name→ assets
hold on branch-schema. However we do not wish to require that assets→ branch-name
hold since it is possible to have several branches that have the same asset value.
The functional dependencies we expect to hold on the remaining schemas of the
banking example are:
• On account-schema:
account-number → branch-name
account-number → balance
• On depositor-schema:
No functional dependencies

1.7.2.2 Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are other functional dependencies
that are logically implied by F. The set of all functional dependencies logically implied
by F is called the closure of F, denoted by F+.
Axioms or rules of inference provide a simpler technique for reasoning about functional
dependencies. In the rules that follow, we use Greek letters for sets of attributes, and
uppercase Roman letters from the beginning of the alphabet for individual attributes. We use
αβ to denote α U β.
We can use the following three rules to find implied functional dependencies. By
applying these rules repeatedly, we can find all of F+, given F. This collection of rules is
called Armstrong’s axioms in honor of the person who first proposed it.
• Reflexivity rule. If α is a set of attributes and β ⊆ α, then
α → β holds.
• Augmentation rule. If α → β holds and γ is a set of attributes, then
γα → γβ holds.
• Transitivity rule. If α → β holds and β→ γ holds, then α → γ holds.
Armstrong’s axioms are sound, because they do not generate any incorrect
functional dependencies. They are complete, because for a given set F of functional
dependencies, they allow us to generate all F+.
Although Armstrong’s axioms are complete, it is tiresome to use them directly
for the computation of F+. To simplify matters further, we list additional rules. It is
possible to use Armstrong’s axioms to prove that these rules are correct.
• Union rule. If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule. If α → β holds and γβ→ δ holds, then αγ→ δ
holds.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A→ B, A→ C, CG→ H, CG→ I, B→ H}. We list several
members of F+ here.
• A→ H. Since A→ B and B→ H hold, we apply the transitivity rule. Observe
that it was much easier to use Armstrong’s axioms to show that A→ H
holds than it was to argue directly from the definitions, as we did earlier
in this section.
• CG→ HI. Since CG→ H and CG→ I, the union rule implies that CG→ HI.
• AG→ I. Since A→ C and CG→ I, the pseudotransitivity rule implies that
AG→ I holds.
Another way of finding that AG→ I holds is as follows. We use the augmentation
rule on A→ C to infer AG→ CG. Applying the transitivity rule to this dependency and
CG→ I, we infer AG→ I.
Figure below shows a procedure that demonstrates formally how to use
Armstrong’s axioms to compute F+. In this procedure, when a functional dependency is
added to F+, it may be already present, and in that case there is no change to F+. We
will also see an alternative way of computing F+ in the next section.
F+ := F
repeat
    for each functional dependency f in F+
        apply the reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
The left-hand and right-hand sides of a functional dependency are both subsets
of R. Since a set of size n has 2^n subsets, there are a total of 2^n × 2^n = 2^(2n) possible
functional dependencies, where n is the number of attributes in R. Each iteration of
the repeat loop of the procedure, except the last iteration, adds at least one functional
dependency to F+. Thus, the procedure is guaranteed to terminate.
1.7.2.3 Closure of Attribute Sets
To test whether a set α is a super-key, we must devise an algorithm for computing
the set of attributes functionally determined by α. One way of doing this is to compute F+,
take all functional dependencies with α as the left-hand side, and take the union of the
right-hand sides of all such dependencies. However doing so can be expensive, since F+ can
be large.
An efficient algorithm for computing the set of attributes functionally determined by
α is useful not only for testing whether α is a super-key, but also for several other tasks, as
we will see later in this section.
Let α be a set of attributes. We call the set of all attributes functionally
determined by α under a set F of functional dependencies the closure of α under F; we
denote it by α+. Figure below shows an algorithm written in pseudocode to compute α +.
The input is a set F of functional dependencies and the set α of attributes. The output
is stored in the variable result.
result := α
while (changes to result) do
    for each functional dependency β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the
functional dependencies defined in preceding section. We start with result = AG. The
first time that we execute the while loop to test functional dependency, we find that
• A → B causes us to include B in result. To see why, we observe that A → B is in F
and A ⊆ result (which is AG), so result := result ∪ B.
• A→ C causes result to become ABCG.
• CG→ H causes result to become ABCGH.
• CG→ I causes result to become ABCGHI.
The second time that we execute the while loop, no new attributes are added to
result, and the algorithm terminates.
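A minimal Python sketch of the attribute-closure algorithm, using the set F from the preceding section; attributes are single-character strings and each dependency is a (left side, right side) pair of sets:

F = [({"A"}, {"B"}), ({"A"}, {"C"}),
     ({"C", "G"}, {"H"}), ({"C", "G"}, {"I"}), ({"B"}, {"H"})]

def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:                      # repeat until no new attributes appear
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma         # beta -> gamma adds gamma to the closure
                changed = True
    return result

print(sorted(closure({"A", "G"}, F)))   # ['A', 'B', 'C', 'G', 'H', 'I']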
Let us see why the algorithm of the figure above is correct. The first step is correct, since α → α
always holds (by the reflexivity rule). We claim that, for any subset β of result, α → β holds. Since we
start the while loop with α → result being true, we can add γ to result only if β ⊆ result and β → γ
holds. But then result → β by the reflexivity rule, so α → β by transitivity. Another application of
transitivity shows that α → γ (using α → β and β → γ). The union rule implies that α → result ∪ γ,
so α functionally determines any new result generated in the while loop. Thus, any attribute
returned by the algorithm is in α+.
It is easy to see that the algorithm finds all of α+. If there is an attribute in α+ that is not
yet in result, then there must be a functional dependency β → γ for which β ⊆ result and at
least one attribute in γ is not in result.
It turns out that, in the worst case, this algorithm may take an amount of time
quadratic in the size of F.
There are several uses of the attribute closure algorithm:
• To test if α is a super-key, we compute α+, and check if α+ contains all
attributes of R.
• We can check whether a functional dependency α → β holds (or, in other words,
is in F+) by checking whether β ⊆ α+. That is, we compute α+ by using attribute
closure and then check whether it contains β. This test is particularly useful, as
we will see later in this chapter.
• It gives us an alternative way to compute F+: for each γ ⊆ R, we find the
closure γ+, and for each S ⊆ γ+, we output a functional dependency
γ → S.
1.7.3 Decomposition
A bad database design suggests that we should decompose a relation schema that has
many attributes into several schemas with fewer attributes. Careless decomposition, however,
may lead to another form of bad design.
Consider an alternative design in which we decompose Lending-schema into the
following two-schemas:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Using the lending relation described in the “Problems arising out of bad database
design” section (Figure W), which we will discuss later, we construct our new relations
branch-customer (on Branch-customer-schema) and customer-loan (on Customer-loan-schema):
branch-customer = Π branch-name, branch-city, assets, customer-name (lending)
customer-loan = Π customer-name, loan-number, amount (lending)
Figures X and Y respectively show the resulting branch-customer and customer-loan
relations.
Branch-name Branch-city Assets Customer-name
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Mianus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
Pownal Bennington 300000 Williams
North Town Rye 3700000 Hayes
Downtown Brooklyn 9000000 Johnson
Perryridge Horseneck 1700000 Glenn
Brighton Brooklyn 7100000 Brooks
Figure X: The relation branch-customer
Customer-name Loan-number Amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-14 1500
Jones L-93 500
Turner L-11 900
Williams L-29 1200
Hayes L-16 1300
Johnson L-18 2000
Glenn L-25 2500
Brooks L-10 2200
Figure Y: The relation customer-loan
Figure Z below shows the result of computing branch-customer ⋈ customer-loan.
Branch-name Branch-city Assets Customer-name Loan-number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Perryridge Horseneck 1700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-17 1000
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-15 1500
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
When we compare this relation and the lending relation with which we started
(Figure W), we notice a difference: although every tuple that appears in the lending
relation appears in branch-customer ⋈ customer-loan, there are tuples in branch-
customer ⋈ customer-loan that are not in lending. In our example, branch-customer
⋈ customer-loan has the following additional tuples:
(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)
Consider the query “Find all bank branches that have made a loan in an
amount less than $1000.” If we look back at Figure W, we see that the only branches
with loan amounts less than $1000 are Mianus and Round Hill. However, when we
apply the expression
Π branch-name (σ amount < 1000 (branch-customer ⋈ customer-loan))
we obtain three branch names: Mianus, Round Hill and Downtown.
A closer examination of this example shows why. If a customer happens to have
several loans from different branches, we cannot tell which loan belongs to which
branch. Thus, when we join branch-customer and customer-loan, we obtain not only the
tuples we had originally in lending, but also several additional tuples. Although we
have more tuples in branch-customer ⋈ customer-loan, we actually have less information.
We are no longer able, in general, to represent in the database information about which
customers are borrowers from which branch. Because of this loss of information, we
call the decomposition of Lending-schema into Branch-customer-schema and Customer-
loan-schema a lossy decomposition, or a lossy-join decomposition. A decomposition
that is not a lossy join decomposition is a lossless join decomposition. It should be
clear from our example that a lossy-join decomposition is, in general a bad database
design.
Why is the decomposition lossy? There is one attribute in common between
Branch-customer-schema and Customer-loan-schema:
Branch-customer-schema ∩ Customer-loan-schema = {customer-name}
The only way that we can represent a relationship between, for example, loan number
and branch-name is through customer-name. This representation is not adequate because a
customer may have several loans, yet these loans are not necessarily obtained from the same
branch.
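A minimal Python sketch of this lossy join, assuming a simplified lending relation that keeps only branch-name, customer-name and loan-number; joining the two projections back together produces exactly the kind of spurious tuples described above:

lending = {
    ("Downtown", "Jones", "L-17"),   # (branch-name, customer-name, loan-number)
    ("Mianus",   "Jones", "L-93"),
}
branch_customer = {(b, c) for b, c, l in lending}   # project out loan-number
customer_loan   = {(c, l) for b, c, l in lending}   # project out branch-name

# Natural join over the only common attribute, customer-name:
rejoined = {(b, c, l)
            for b, c in branch_customer
            for c2, l in customer_loan if c == c2}

print(rejoined - lending)
# {('Downtown', 'Jones', 'L-93'), ('Mianus', 'Jones', 'L-17')} -- spurious tuples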
Let us consider another alternative design, in which we decompose Lending-
schema into the following two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
There is one attribute in common between these two schemas:
Branch-schema ∩ Loan-info-schema = {branch-name}
Thus, the only way that we can represent a relationship between, for example,
customer-name and assets is through branch-name. The difference between this example
and the preceding one is that the assets of a branch are the same, regardless of the
customer to which we are referring, whereas the lending branch associated with a
certain loan amount does depend on the customer to which we are referring. For a
given branch-name, there is exactly one assets value and exactly one branch-city;
whereas a similar statement cannot be made for customer-name. That is the functional
dependency
Branch-name→ {assets, branch-city}
holds, but customer-name does not functionally determine loan-number.
The notion of lossless joins is central to much of relational database design.
Therefore, we restate the preceding examples more concisely and more formally. Let R
be a relation schema. A set of relation schemas {R1, R2, … , Rn} is a decomposition of
R if
R = R1 ∪ R2 ∪ … ∪ Rn
That is, {R1, R2, … , Rn} is a decomposition of R if, for i = 1, 2, …, n, each Ri is a subset
of R, and every attribute in R appears in at least one Ri.
Let r be a relation on schema R, and let ri = ΠRi (r) for i = 1, 2, …, n. That is, {r1, r2,
… , rn} is the database that results from decomposing r into {R1, R2, …, Rn}. It is
always the case that
r ⊆ r1 ⋈ r2 ⋈ … ⋈ rn
To see that this assertion is true, consider a tuple t in relation r. When we compute
the relations r1, r2, …, rn, the tuple t gives rise to one tuple ti in each ri, i = 1, 2, …, n. These n
tuples combine to regenerate t when we compute r1 ⋈ r2 ⋈ … ⋈ rn. The details are left for you
to complete as an exercise. Therefore, every tuple in r appears in r1 ⋈ r2 ⋈ … ⋈ rn.
In general, r ≠ r1 ⋈ r2 ⋈ … ⋈ rn. As an illustration, consider our earlier example, in
which
• n=2
• R = Lending-schema.
• R1 = Branch-customer-schema.
• R2 = customer-loan-schema.
• r = the relation shown in Figure W.
• r1 = the relation shown in Figure X.
• r2 = the relation shown in Figure Y.
• r1 ⋈ r2 = the relation shown in Figure Z.
Note that the relations in Figure W and Z are not the same.
To have a lossless-join decomposition, we need to impose constraints on the set
of possible relations. We found that the decomposition of Lending-schema into Branch-
schema and Loan-info-schema is lossless because the functional dependency
branch-name → {branch-city, assets}
holds on Branch-schema. We say that a relation is legal if it satisfies all rules, or
constraints, that we impose on our database.
Let C represent a set of constraints on the database and let R be a relation
schema. A decomposition {R1, R2…….Rn} of R is a lossless join decomposition if for all
relations r on schema R that are legal under C,
r = ΠR1 (r) ⋈ ΠR2 (r) ⋈ … ⋈ ΠRn (r)
Self-Check Exercise-II
Q3. What is transitive closure?
Ans…................................................................................................................
..…..................................................................................................................
…...................................................................................................................
Q4. Which algebraic notation is used to list tuples?
Ans…................................................................................................................
….....................................................................................................................
….....................................................................................................................
1.7.3.1 Desirable Properties of Decomposition
We can use a given set of functional dependencies in designing a relational
database in which most of the undesirable properties discussed in section 1.7.4 do not
occur. When we design such systems, it may become necessary to decompose a relation
into several relations.
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-
number, amount)
The set F of functional dependencies that we require to hold on Lending-schema
is:
branch-name → {branch-city, assets}
loan-number → {amount, branch-name}
Lending-schema is an example of a bad database design. Assume that we
decompose it to the following three relations:
Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
We claim that this decomposition has several desirable properties, which we
discuss next.
Lossless-join decomposition
When we decompose a relation into a number of smaller relations, it is crucial
that the decomposition be lossless. We must first present a criterion for determining
whether a decomposition is lossy.
Let R be a relation schema, and let F be a set of functional dependencies on R.
Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join
decomposition of R if at least one of the following functional dependencies is in F+.
o R1 ∩ R2 → R1
o R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a super-key of either R1 or R2, the decomposition
of R is a lossless-join decomposition. We can use attribute closure to efficiently test for
super-keys, as we have seen earlier.
We now demonstrate that our decomposition of Lending-schema is a lossless-
join decomposition. We first decompose Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since branch-name → {branch-city, assets}, the augmentation rule for
functional dependencies implies that
branch-name → {branch-name, branch-city, assets}
Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our
initial decomposition is a lossless-join decomposition.
Next we decompose loan-info-schema into
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This step results in a lossless-join decomposition, since loan-number is a
common attribute and loan-number → {amount, branch-name}.
For the general case of decomposition of a relation into multiple parts at once
the test for lossless join decomposition is more complicated.
While the test for binary decomposition is clearly a sufficient condition for
lossless join, it is a necessary condition only if all constraints are functional
dependencies.
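A minimal Python sketch of this binary lossless-join test, reusing the attribute-closure function sketched earlier; the schema and dependency names are those of the Lending-schema example:

def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma
                changed = True
    return result

def is_lossless(r1, r2, fds):
    # {R1, R2} is lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+,
    # i.e. the closure of the common attributes covers one of the schemas.
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c

branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
F = [({"branch-name"}, {"branch-city", "assets"}),
     ({"loan-number"}, {"amount", "branch-name"})]
print(is_lossless(branch, loan_info, F))  # True: branch-name is a key of Branch-schema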
1.7.3.2 Dependency Preservation
There is another goal in relational database design: dependency preservation.
When an update is made to the database, the system should be able to check that the
update will not create an illegal relation-that is, one that does not satisfy all the given
functional dependencies. If we are to check updates efficiently, we should design
relational-database schemas that allow update validation without the computation of
joins.
To decide whether joins must be computed to check an update, we need to determine
what functional dependencies can be tested by checking each relation individually. Let F be
a set of functional dependencies on a schema R and let R1, R2…..Rn be a decomposition of R.
The restriction of F to Ri is the set Fi of all functional dependencies in F+ that include only
attributes of Ri. Since all functional dependencies in a restriction involve satisfaction of only
one relation schema, it is possible to test such a dependency for satisfaction by checking
only one relation.
Note that the definition of restriction uses all dependencies in F+, not just those in
F. For instance, suppose F= {A→B, B→C} and we have a decomposition into AC and AB.
The restriction of F to AC is then A→C, since A→C is in F+ even though it is not in F.
The set of restrictions F1, F2, …, Fn is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the restrictions is sufficient. Let F' =
F1 ∪ F2 ∪ … ∪ Fn. F' is a set of functional dependencies on schema R, but, in general, F'
≠ F. However, even if F' ≠ F, it may be that F'+ = F+. If the latter is true, then every
dependency in F is logically implied by F', and, if we verify that F' is satisfied, we have
verified that F is satisfied. We say that a decomposition having the property F'+ = F+ is a
dependency-preserving decomposition.
Figure V shows an algorithm for testing dependency preservation. The input is a set
D = {R1, R2, …, Rn} of decomposed relation schemas and a set F of functional dependencies.
This algorithm is expensive since it requires computation of F+; we will describe another
algorithm that is more efficient after giving an example of testing for dependency preservation.
compute F+;
for each schema Ri in D do
begin
    Fi := the restriction of F+ to Ri;
end
F' := Φ
for each restriction Fi do
begin
    F' := F' ∪ Fi
end
compute F'+;
if (F'+ = F+) then return (true)
else return (false);
Figure V: Testing for dependency preservation
The following test, applied to each functional dependency α → β in F, checks
whether the dependency is preserved by the decomposition without computing F+:

result := α
repeat
    for each Ri in the decomposition
        t := (result ∩ Ri)+ ∩ Ri
        result := result ∪ t
until (result does not change)

If result contains all attributes in β, then α → β is preserved; the decomposition is
dependency preserving if and only if this test succeeds for every dependency in F.
Note that instead of precomputing the restriction of F on Ri and using it for
computing the attribute closure of result, we use attribute closure on (result ∩ Ri) with
respect to F, and then intersect it with Ri, to get an equivalent result. This procedure
takes polynomial time, instead of the exponential time required to compute F+.
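A minimal Python sketch of this polynomial-time test; the helper closure is the attribute-closure function sketched earlier, and the example dependencies F = {A→B, B→C} with the decomposition into AB and AC are the ones used in the restriction discussion above:

def closure(alpha, fds):
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if beta <= result and not gamma <= result:
                result |= gamma
                changed = True
    return result

def is_preserved(alpha, beta, decomposition, fds):
    # Grow result using attribute closure on (result ∩ Ri), intersected with
    # Ri, until it stabilizes; alpha -> beta is preserved iff beta ⊆ result.
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return beta <= result

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
D1 = [{"A", "B"}, {"B", "C"}]                          # decomposition of R = ABC
print(all(is_preserved(a, b, D1, F) for a, b in F))    # True: preserving
D2 = [{"A", "B"}, {"A", "C"}]
print(is_preserved({"B"}, {"C"}, D2, F))               # False: B -> C is lost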
1.7.3.3 Repetition of Information
The decomposition of Lending-schema does not suffer from the problem of
repetition of information that we discuss in the section on bad database design. In
Lending-schema, it was necessary to repeat the city and assets of a branch for each
loan. The decomposition separates branch and loan data into distinct relations,
thereby eliminating this redundancy. Similarly, observe that, if a single loan is made to
several customers, we must repeat the amount of the loan once for each customer (as
well as the city and assets of the branch) in Lending-schema. In the decomposition, the
relation on schema Borrower-schema contains the loan-number, customer-name
relationship, and no other schema does. Therefore, we have one tuple for each
customer of a loan in only the relation on Borrower-schema. In the other relations
involving loan-number (those on schemas Loan-schema and Borrower-schema), only one
tuple per loan needs to appear.
1.7.4 Problems arising out of bad database design (Pitfalls in Relational-Database
design)
Let us look at what can go wrong in a bad database design. Among the
undesirable properties that a bad design may have are:
• Repetition of information
• Inability to represent certain information.
We shall discuss these problems with the help of a modified database design for
our banking example: Suppose the information concerning loans is kept in one single
relation, lending which is defined over the relation schema
Lending-schema = (branch-name, branch-city, assets, customer-
name, loan-number, amount)
Figure below shows an instance of the relation lending (Lending-schema). A tuple t in
the lending relation has the following intuitive meaning:
• t[assets] is the asset figure for the branch named t[branch-name]
• t[branch-city] is the city which the branch named t[branch-name] is
located
• t[loan-number] is the number assigned to a loan made by the branch
named t[branch-name] to the customer named t[customer-name]
• t[amount] is the amount of the loan whose number is t[loan-number]
Suppose that we wish to add a new loan to our database. Say that the loan is
made by the Perryridge branch to Adams in the amount of $1500. Let the loan-number
be L-31. In our design, we need a tuple with values on all the attributes of Lending
schema. Thus we must repeat the asset and city data for the Perryridge branch, and
must add the tuple
(Perryridge, Horseneck, 1700000, Adams, L-31, 1500)
to the lending relation. In general, the asset and city data for a branch must appear
once for each loan made by that branch.
Branch-name Branch-city Assets Customer-name Loan-number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
Figure W: Sample lending relation
The repetition of information in our alternative design is undesirable. Repeating
information wastes space. Furthermore it complicates updating the database. Suppose
for example, that the assets of the Perryridge branch change from 1700000 to
1900000. Under our original design, one tuple of the branch relation needs to be
changed. Under our alternative design, many tuples of the lending relation need to be
changed. Thus updates are more costly under the alternative design than under the
original design. When we perform the update in the alternative design database, we
must ensure that every tuple pertaining to the Perryridge branch is updated, or else
our database will show two different asset values for the Perryridge branch.
That observation is central to understanding why the alternative design is bad.
We know that a bank branch has a unique value of assets, so given a branch name we
can uniquely identify the assets value. On the other hand, we know that a branch may
make many loans, so given a branch name, we cannot uniquely determine a loan
number. In other words, we say that the functional dependency.
branch-name → assets
Holds on Lending-schema, but we do not expect the functional dependency
branch-name → loan- number to hold. The fact that a branch has particular value of
assets and the fact that a branch makes a loan are independent, and, as we have seen,
these facts are best represented in separate relations. We shall see that we can use
functional dependencies to specify formally when a database design is good.
Another problem with the Lending-schema design is that we cannot represent
directly the information concerning a branch (branch-name, branch-city, assets)
unless there exists at least one loan at the branch. This is because tuples in the
lending relation require values for loan-number, amount and customer-name.
One solution to this problem is to introduce null values, as we did to handle updates
through views. However, null values are difficult to handle. If we are not willing to deal with null
values, then we can create the branch information only when the first loan application at that
branch is made. Worse, we would have to delete this information when all the loans have been
paid. Clearly, this situation is undesirable, since, under our original database design, the branch
information would be available regardless of whether or not loans are currently maintained at
the branch, and without resorting to null values.
1.7.5 Summary
Functional dependencies play a key role in differentiating good database designs
from bad ones. An attribute Y of a relation R is said to be functionally dependent
upon an attribute X of R if and only if each value of X in R has associated with it
exactly one value of Y in R at any given time. This is written X → Y, where X is
known as the determinant and Y as the dependent. Using the concept of functional
dependencies, we decompose relations. A bad database design suggests that we
should decompose a relation schema that has many attributes into several schemas
with fewer attributes. Careless decomposition, however, may lead to another form of
bad design. When we decompose a relation into a number of smaller relations, it is
crucial that the decomposition be lossless.
1.7.6 Keywords
Determinant: In a functional dependency X -> Y, X is referred to as the determinant.
The determinant uniquely determines the values of the dependent attribute Y.
Dependent: In a functional dependency X -> Y, Y is referred to as the dependent. The
values of Y depend on the values of the determinant X.
Transitivity: It refers to a property of functional dependencies. It describes how
dependencies can be inferred or propagated through a set of functional dependencies.
1.7.7 Short Answer Type Questions
Q1. Decompose relation employee(ID,name,street,Credit,street,city,salary) .
Q2. How can we combine the two functional dependencies:
a) A->BC b) A->B
Q3. How many superkeys are in the given relation R(A,B,C,D,E) with the
following functional dependencies:
a) ABC -> DE and b) D -> AB
1.7.8 Long Answer Type Questions
Q1. What do you mean by Functional Dependency? What is its importance in
Database design? Explain with example.
Q2. Why do we need decomposition? What is its purpose? What are the steps
involved in decomposing relations?
Q3. What are the various problems that arise due to bad database design?
1.7.9 Suggested Readings:
• Bipin C. Desai, An introduction to Database System, Galgotia Publication, New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database
Systems, Addison Wesley.
Relational Database Management Systems
NORMALIZATION
1.8.0 Objectives
1.8.1 Introduction
1.8.2 Normalization
1.8.3 First Normal Form
1.8.4 Second Normal Form
1.8.5 Third Normal Form
1.8.6 Boyce-Codd Normal Form
1.8.7 Multi-valued Dependency
1.8.8 Fourth Normal Form
1.8.9 Join Dependencies and Fifth Normal Form
1.8.10 Database Design Process
1.8.11 Summary
1.8.12 Keywords
1.8.13 Short Answer Type Questions
1.8.14 Long Answer Type Questions
1.8.15 Suggested Readings
1.8.0 Objectives
After completing this lesson, you will be able to:
• Define Normalization, its need and importance
• Various types of Normal Forms
• Define Multivalued Functional Dependency
• Understand database design process
1.8.1 Introduction:
In this lesson, we will discuss the normalization process and define the first
three normal forms for relation schemas. The definitions of second and third normal
form presented here are based on the functional dependencies and primary keys of a
relation schema. More general definitions of these normal forms, which take into
account all candidate keys of a relation rather than just the primary key, are also
presented. We also define Boyce-Codd Normal Form (BCNF), and further normal
forms that are based on other types of data dependencies. We first informally discuss
what normal forms are and what the motivation behind their development was. We
then present first normal form (1NF). Then we present definitions of second normal
form (2NF) and third normal form (3NF) respectively that are based on primary keys.
Then we will proceed for multivalued dependency and further the fourth and fifth
Normal Forms that are based on MVDs. In the last we will discuss about the
database design process.
1.8.2 Normalization
The normalization process as first proposed by Codd (1972) takes a relation
schema through a series of tests to “certify” whether or not it belongs to a certain
normal form. Initially Codd proposed three normal forms, which he called first,
second and third normal form. A stronger definition of 3NF was proposed later by
Boyce and Codd and is known as Boyce-Codd normal form. All these normal forms
are based on the functional dependencies among the attributes of a relation. Later
fourth normal form (4NF) and a fifth normal forms (5NF) were proposed, based on the
concepts of multi-valued dependencies and join dependencies, respectively.
Normalization of data can be looked on as a process during which unsatisfactory
relation schemas are decomposed by breaking up their attributes into smaller
relation schemas that possess desirable properties. The normalization process
provides database designers with:
• A formal framework for analyzing relation schemas based on their keys and on
the functional dependencies among their attributes.
• A series of tests that can be carried out on individual relation schemas
so that the relational database can be normalized to any degree. When
a test fails, the relation violating that test must be decomposed into
relations that individually meet the normalization tests.
• To free relations from undesirable insertion, deletion and update
anomalies.
Normal forms, when considered in isolation from other factors, do not guarantee a
good database design. It is generally not sufficient to check separately that each relation
schema in the database is, say, in BCNF or 3NF. Rather, the process of normalization
through decomposition must also confirm the existence of additional properties that the
relation schemas, taken together, should possess. Two of these properties are:
• The lossless-join or nonadditive-join property, which guarantees that
the spurious tuple problem does not occur
• The dependency preservation property, which ensures that all
functional dependencies are represented in some of the individual
resulting relations.
In this section we concentrate on an intuitive discussion of the normalization
process. Notice that the normal forms mentioned in this section are not the only possible
ones. Additional normal forms may be defined to meet other desirable criteria, based on
additional types of constraints. The normal forms up to BCNF are defined by considering
only the functional dependency and key constraints, whereas 4NF considers an additional
constraint called a multi-valued dependency and 5NF considers an additional constraint
called a join dependency. The practical utility of normal forms becomes questionable when
the constraints on which they are based are hard to understand or to detect by the
database designers and users who must discover these constraints.
Another point worth noting is that the database designers need not normalize
to the highest possible normal form. Relations may be left in lower normal forms for
performance reasons.
Before proceeding further, we recall the definitions of keys of a relation
schema. A super key of a relation schema R = {A1, A2, …, An} is a set of
attributes S ⊆ R with the property that no two tuples t1 and t2 in any
legal relation state r of R will have t1[S] = t2[S]. A key K is a super-key with the
additional property that removal of any attribute from K will cause K not to be a
super-key any more. The difference between a key and a super key is that a key has to
be “minimal”; that is, if we have a key K = {A1, A2, …, Ak}, then K – {Ai} is not a key for
1 <= i <= k. In the figure given below, {SSN} is a key for EMPLOYEE, whereas {SSN}, {SSN,
ENAME}, {SSN, ENAME, BDATE} etc. are all super keys.
EMPLOYEE
ENAME SSN BDATE ADDRESS DNUMBER
(SSN is the primary key; DNUMBER is a foreign key)
If a relation schema has more than one “minimal” key, each is called a
candidate key. One of the candidate keys is arbitrarily designated to be the primary
key. In the figure above, {SSN} is the only candidate key for EMPLOYEE, so it is also the
primary key.
An attribute of relation schema R is called a prime attribute of R if it is a
member of any key of R. An attribute is called nonprime if it is not a prime attribute-
that is, if it is not a member of any candidate key.
We now present the first three normal forms: 1NF, 2NF and 3NF. These were
proposed by Codd (1972) as a sequence to ultimately achieve the desirable state of
3NF relations by progressing through the intermediate states of 1NF and 2NF if
needed.
1.8.3 First Normal Form (1 NF)
First normal form is now considered to be part of the formal definition of a
relation; historically, it was defined to disallow multi-valued attributes, composite
attributes, and their combinations. It states that the domains of attributes must
include only atomic (simple, indivisible) values and that the value of any
attribute in a tuple must be a single value from the domain of that attribute.
Hence, 1NF disallows having a set of values, a tuple of values or a combination of
both as an attribute value for a single tuple. In other words, 1NF disallows “relations
within relations” or “relations as attributes of tuples”. The only attribute values
permitted by 1NF are single atomic (or indivisible) values.
Consider the DEPARTMENT relation schema shown in following figure, whose
primary key is DNUMBER, and suppose that we extend it by including the
DLOCATIONS attribute shown within dotted lines. We assume that each department
can have a number of locations. The DEPARTMENT schema and example extension
are shown in Figures that follow. As we can see, this is not in 1NF because
DLOCATIONS is not an atomic attribute, as illustrated by the first tuple in Figure (b).
There are two ways we can look at the DLOCATIONS attribute:
• The domain of DLOCATIONS contains atomic values, but some tuples
can have a set of these values. In this case, DNUMBER does not
functionally determine DLOCATIONS.
• The domain of DLOCATIONS contains sets of values and hence is
nonatomic. In this case, DNUMBER → DLOCATIONS, because each set is
considered a single member of the attribute domain. (In this case we can
consider the domain of DLOCATIONS to be the power set of the set of single
locations; that is, the domain is made up of all possible subsets of the set of single
locations.)
DEPARTMENT
DNAME DNUMBER DMGRSSN DLOCATIONS

Figure showing normalization into 1NF: (a) a relation schema that is not in 1NF;
(b) an example relation instance; (c) the 1NF relation with redundancy.
In either case, the DEPARTMENT relation of the figures above is not in 1NF; in
fact, it does not even qualify as a relation. We break up its attributes into the two
relations DEPARTMENT and DEPT_LOCATIONS shown here:
DEPARTMENT (DNAME, DNUMBER, DMGRSSN)
DEPT_LOCATIONS (DNUMBER, DLOCATION)
The idea is to remove the attribute DLOCATIONS that violates 1NF and place
it in a separate relation DEPT_LOCATIONS along with the primary key DNUMBER of
DEPARTMENT. The primary key of this relation is the combination {DNUMBER,
DLOCATION}, as shown in Figure above. A distinct tuple in DEPT_LOCATIONS exists
for each location of a department. The DLOCATIONS attribute is removed from the
DEPARTMENT relation of the figure showing the normalization into 1NF, decomposing
the non-1NF relation into the two 1NF relations DEPARTMENT and DEPT_LOCATIONS
of the figure above.
Notice that a second way to normalize into 1NF is to have a tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in Figure (c). In
this case, the primary key becomes the combination {DNUMBER, DLOCATION}, and
redundancy exists in the tuples. The first solution is superior because it does not suffer
from this redundancy problem. In fact, if we choose the second solution, it will be
decomposed further during subsequent normalization steps into the first solution.
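A minimal Python sketch of the first (preferred) 1NF normalization; the department data values below are illustrative only:

# Move the multi-valued DLOCATIONS attribute into a separate relation
# keyed by {DNUMBER, DLOCATION}.
department_non1nf = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},   # illustrative data
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": ["Stafford"]},
]

department = [{k: t[k] for k in ("DNAME", "DNUMBER", "DMGRSSN")}
              for t in department_non1nf]
dept_locations = [{"DNUMBER": t["DNUMBER"], "DLOCATION": loc}
                  for t in department_non1nf for loc in t["DLOCATIONS"]]

print(department)       # every attribute value is now atomic
print(dept_locations)   # one tuple per (department, location) pair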
The first normal form also disallows composite attributes that are themselves
multi-valued. These are called nested relations because each tuple can have a
relation within it. Figure A below shows how an EMP_PROJ relation can be shown if
nesting is allowed. Each tuple represents an employee entity, and a relation PROJS
(PNUMBER, HOURS} within each tuple represents the employee’s projects and the
hours per week that the employee works on each project. The schema of the
EMP_PROJ relation can be represented as follows:
EMP_PROJ (SSN, ENAME, {PROJS (PNUMBER, HOURS)})
Self-Check Exercise-I
Q1: What are the challenges in non-normalized data handling?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
Q2. Who proposed the concept of Normalization?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
The set braces {} identify the attribute PROJS as multi-valued, and we list the component
attributes that form PROJS between parentheses (). Interestingly, recent research into the
relational model is attempting to allow and formalize nested relations, which were
disallowed early on by 1NF.
Notice that SSN is the primary key of the EMP_PROJ relation in Figure A(a)
and (b), while PNUMBER is the partial primary key of each nested relation; that is,
within each tuple, the nested relation attributes into a new relation and propagate
the primary key into; the primary key of the new relation will combine the partial key
with the primary key of the original relation. Decomposition and primary key
propagation yield the schemas shown in Figure A(c).
Here is Figure A:
a) EMP_PROJ (SSN, ENAME, PROJS {PNUMBER, HOURS}) – the nested relation schema
b) EMP_PROJ – an example instance with nested PROJS tuples
c) EMP_PROJ1 (SSN, ENAME)
EMP_PROJ2 (SSN, PNUMBER, HOURS)
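Assuming simple column types, the decomposition in Figure A(c) could be declared as follows (a sketch):

create table EMP_PROJ1
(SSN char(9),
ENAME varchar(30),
primary key (SSN))

create table EMP_PROJ2
(SSN char(9),
PNUMBER integer,
HOURS numeric(4,1),
primary key (SSN, PNUMBER),
foreign key (SSN) references EMP_PROJ1)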
It should be noted that restricting relations to 1NF leads to the problems associated with multi-valued dependencies and 4NF, discussed later in this lesson.
1.8.4 Second Normal Form (2NF)
Second normal form is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X, (X − {A}) *→ Y (the dependency no longer holds). A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y. In Figure B below, {SSN, PNUMBER} → HOURS is a full dependency (neither SSN → HOURS nor PNUMBER → HOURS holds). However, the dependency {SSN, PNUMBER} → ENAME is partial because SSN → ENAME holds.
EMP_PROJ (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
fd1: {SSN, PNUMBER} → HOURS
fd2: SSN → ENAME
fd3: PNUMBER → {PNAME, PLOCATION}
Figure B: the EMP_PROJ relation and its functional dependencies.
A relation schema R is in 2NF if every non-key attribute of R is fully functionally dependent on the primary key of R. EMP_PROJ is not in 2NF, because ENAME, PNAME, and PLOCATION are only partially dependent on the primary key {SSN, PNUMBER}. 2NF normalization decomposes EMP_PROJ into three relations, in each of which the non-key attributes are fully dependent on the primary key:
EP1 (SSN, PNUMBER, HOURS) – fd1
EP2 (SSN, ENAME) – fd2
EP3 (PNUMBER, PNAME, PLOCATION) – fd3
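A sketch of this 2NF decomposition in SQL, with column types assumed:

create table EP2
(SSN char(9),
ENAME varchar(30),
primary key (SSN))

create table EP3
(PNUMBER integer,
PNAME varchar(20),
PLOCATION varchar(20),
primary key (PNUMBER))

create table EP1
(SSN char(9),
PNUMBER integer,
HOURS numeric(4,1),
primary key (SSN, PNUMBER),
foreign key (SSN) references EP2,
foreign key (PNUMBER) references EP3)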
1.8.5 Third Normal Form (3NF)
Third normal form is based on the concept of transitive dependency: a relation schema R is in 3NF if it is in 2NF and no non-key attribute of R is transitively dependent on the primary key. Consider the relation EMP_DEPT (ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN), in which SSN → DNUMBER and DNUMBER → {DNAME, DMGRSSN} hold, so DNAME and DMGRSSN are transitively dependent on SSN via DNUMBER.
3NF normalization decomposes EMP_DEPT into:
ED1 (ENAME, SSN, BDATE, ADDRESS, DNUMBER)
ED2 (DNUMBER, DNAME, DMGRSSN)
Intuitively, we see that ED1 and ED2 represent independent entity facts about employees and departments. A NATURAL JOIN operation on ED1 and ED2 will recover the original relation EMP_DEPT without generating spurious tuples.
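A sketch of the 3NF decomposition in SQL, with column types assumed:

create table ED2
(DNUMBER integer,
DNAME varchar(20),
DMGRSSN char(9),
primary key (DNUMBER))

create table ED1
(ENAME varchar(30),
SSN char(9),
BDATE date,
ADDRESS varchar(40),
DNUMBER integer,
primary key (SSN),
foreign key (DNUMBER) references ED2)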
1.8.6 Boyce-Codd Normal Form (BCNF)
Boyce-Codd normal form is stricter than 3NF, meaning that every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. Intuitively, we can see the need for a stronger normal form than 3NF by going back to the LOTS relation schema of the figure below, with its four functional dependencies fd1 through fd4.
LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)
fd1: PROPERTY_ID# → {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE}
fd2: {COUNTY_NAME, LOT#} → {PROPERTY_ID#, AREA, PRICE, TAX_RATE}
fd3: COUNTY_NAME → TAX_RATE
fd4: AREA → PRICE
Suppose that we have thousands of lots in the relation but the lots are from only two counties, Marion County and Liberty County. Suppose also that lot sizes in Marion County are only 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 acres, and lot sizes in Liberty County are restricted to 1.1, 1.2, …, 1.9, and 2.0 acres. In such a situation we
should have the additional functional dependency fd5: AREA → COUNTY_NAME. If we add this to the other dependencies, the relation schema LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA) is still in 3NF, because COUNTY_NAME is a prime attribute.
The area versus county relationship represented by fd5 can be represented by only 16 tuples in a separate relation R(AREA, COUNTY_NAME), since there are only 16 possible AREA values. This representation reduces the redundancy of repeating the same information in the thousands of LOTS1A tuples. BCNF is a stronger normal form that would disallow LOTS1A and suggest the need for decomposing it.
The definition of BCNF differs slightly from the definition of 3NF. A relation schema R is in BCNF if whenever a functional dependency X → A holds in R, then X is a super-key of R. The only difference between BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime if X is not a super-key, is absent from BCNF.
In our example, fd5 violates BCNF in LOTS1A because AREA is not a super-key of LOTS1A. Note that fd5 satisfies 3NF in LOTS1A because COUNTY_NAME is a prime attribute (condition (b)), but this condition does not exist in the definition of BCNF. We can decompose LOTS1A into two BCNF relations, LOTS1AX and LOTS1AY, shown in Figure C(a).
In practice, most relation schemas that are in 3NF are also in BCNF. Only if a dependency X → A exists in a relation schema R, with X not a super-key and A a prime attribute, will R be in 3NF but not in BCNF. The relation schema R shown in Figure C(b) illustrates the general case of such a relation.
It is best to have relation schemas in BCNF; if that is not possible, 3NF will do. However, 2NF and 1NF are not considered good relation schema designs. These normal forms were developed historically as stepping stones to 3NF and BCNF.
Here is Figure C:
(a) LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA)
fd1: PROPERTY_ID# → {COUNTY_NAME, LOT#, AREA}
fd2: {COUNTY_NAME, LOT#} → {PROPERTY_ID#, AREA}
fd5: AREA → COUNTY_NAME
BCNF normalization decomposes LOTS1A into:
LOTS1AX (PROPERTY_ID#, AREA, LOT#)
LOTS1AY (AREA, COUNTY_NAME)
(b) R (A, B, C)
fd1: {A, B} → C
fd2: C → B
R is in 3NF but not in BCNF, because C is not a super-key while B is a prime attribute.
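A sketch of the BCNF decomposition of LOTS1A in SQL; the # characters are dropped from the column names for valid SQL, and the column types are assumed:

create table LOTS1AY
(AREA numeric(3,1),
COUNTY_NAME varchar(20),
primary key (AREA))

create table LOTS1AX
(PROPERTY_ID char(10),
AREA numeric(3,1),
LOT integer,
primary key (PROPERTY_ID),
foreign key (AREA) references LOTS1AY)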
Self-Check Exercise-II
Q3. How do you identify which normal form a given relation is in?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
Q4. Does normalization reduce the number of tables?
Ans….............................................................................................................
…..................................................................................................................
…...................................................................................................................
Figure D:
(a) EMP (ENAME, PNAME, DNAME)
(b) EMP_PROJECTS (ENAME, PNAME) and EMP_DEPENDENTS (ENAME, DNAME)
(c) SUPPLY (SNAME, PARTNAME, PROJNAME)
(d) The three projections of SUPPLY:
R1 (SNAME, PARTNAME): (Smith, Bolt), (Smith, Nut), (Adamsky, Bolt), (Walton, Nut), (Adamsky, Nail)
R2 (SNAME, PROJNAME): (Smith, ProjX), (Smith, ProjY), (Adamsky, ProjY), (Walton, ProjZ), (Adamsky, ProjX)
R3 (PARTNAME, PROJNAME): (Bolt, ProjX), (Nut, ProjY), (Bolt, ProjY), (Nut, ProjZ), (Nail, ProjX)
1.8.7 Multi-valued Dependencies (MVDs)
A multi-valued dependency X →→ Y, specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation instance r of R: if two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties:
• t3[X] = t4[X] = t1[X] = t2[X]
• t3[Y] = t1[Y] and t4[Y] = t2[Y]
• t3[R − (XY)] = t2[R − (XY)] and t4[R − (XY)] = t1[R − (XY)]
Whenever X →→ Y holds, we say that X multi-determines Y. Because of the symmetry in the definition, whenever X →→ Y holds in R, so does X →→ (R − (XY)). Recall that R − (XY) is the same as R − (X ∪ Y); call it Z. Hence X →→ Y implies X →→ Z, and the pair is therefore sometimes written as X →→ Y/Z.
The formal definition specifies that, given a particular value of X, the set of values of Y determined by this value of X is completely determined by X alone and does not depend on the values of the remaining attributes Z of the relation schema R. Hence, whenever two tuples exist that have distinct values of Y but the same value of X, these values of Y must be repeated with every distinct value of Z that occurs with that same value of X. This informally corresponds to Y being a multi-valued attribute of the entities represented by the tuples in R.
In Figure D(a), the MVDs ENAME →→ PNAME and ENAME →→ DNAME, or ENAME →→ PNAME/DNAME, hold in the EMP relation. The employee with ENAME 'Smith' works on the projects with PNAME 'X' and 'Y' and has two dependents with DNAME 'John' and 'Anna'. If we store only the first two tuples in EMP (<'Smith', 'X', 'John'> and <'Smith', 'Y', 'Anna'>), we must also store the tuples <'Smith', 'X', 'Anna'> and <'Smith', 'Y', 'John'> to show that {'X', 'Y'} and {'John', 'Anna'} are associated only with 'Smith'; that is, there is no association between PNAME and DNAME.
An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R. For example, the relation EMP_PROJECTS in Figure D(b) has the trivial MVD ENAME →→ PNAME. An MVD that satisfies neither (a) nor (b) is called a nontrivial MVD. A trivial MVD will hold in any relation instance r of R; it is called trivial because it does not specify any constraint on R.
If we have a nontrivial MVD in a relation, we may have to repeat values redundantly in the tuples. In the EMP relation of Figure D(a), the values 'X' and 'Y' of PNAME are repeated with each value of DNAME (or, by symmetry, the values 'John' and 'Anna' of DNAME are repeated with each value of PNAME). This redundancy is clearly undesirable. However, the EMP schema is in BCNF because no functional dependencies hold in EMP. Therefore, we need to define a fourth normal form that is stronger than BCNF and disallows relation schemas such as EMP. We first discuss some of the properties of MVDs and consider how they are related to functional dependencies.
1.8.8 Fourth Normal Form (4NF)
We now present the definition of 4NF, which is violated when a relation has undesirable multi-valued dependencies and hence can be used to identify and decompose such relations. A relation schema R is in 4NF with respect to a set of dependencies F if, for every nontrivial multi-valued dependency X →→ Y in F+, X is a super-key for R.
The EMP relation of Figure D(a) is not in 4NF because in the nontrivial MVDs ENAME →→ PNAME and ENAME →→ DNAME, ENAME is not a super-key of EMP. We
decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, shown in Figure D(b). Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because ENAME →→ PNAME is a trivial MVD in EMP_PROJECTS and ENAME →→ DNAME is a trivial MVD in EMP_DEPENDENTS. In fact, no nontrivial MVDs hold in either EMP_PROJECTS or EMP_DEPENDENTS. No FDs hold in these relation schemas either.
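A sketch of the 4NF decomposition in SQL; both relations are all-key, and the column types are assumed:

create table EMP_PROJECTS
(ENAME varchar(30),
PNAME varchar(20),
primary key (ENAME, PNAME))

create table EMP_DEPENDENTS
(ENAME varchar(30),
DNAME varchar(20),
primary key (ENAME, DNAME))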
To illustrate why it is important to keep relations in 4NF, Figure E(a) shows the EMP relation with an additional employee, Brown, who has three dependents ('Jim', 'Joan', and 'Bob') and works on four different projects ('W', 'X', 'Y', and 'Z'). There are 16 tuples in EMP in Figure E(a). If we decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, as shown in Figure E(b), we need to store a total of only 11 tuples in the two relations. Moreover, these tuples are much smaller than the tuples in EMP. In addition, the update anomalies associated with multi-valued dependencies are avoided. For example, if Brown starts working on another project, we must insert three tuples in EMP – one for each dependent. If we forget to insert any one of those, the relation becomes inconsistent in that it incorrectly implies a relationship between projects and dependents. However, only a single tuple need be inserted in the 4NF relation EMP_PROJECTS. Similar problems occur with deletion and modification anomalies if a relation is not in 4NF.
The EMP relation in Figure D(a) is not in 4NF because it represents two independent 1:N relationships: one between employees and the projects they work on, and the other between employees and their dependents. We sometimes have a relationship among three entities that depends on all three participating entities, such as the SUPPLY relation shown in Figure D(c) (consider only the tuples in Figure D(c) above the dotted line for now). In this case a tuple represents a supplier supplying a specific part to a particular project, so there are no nontrivial MVDs. The SUPPLY relation is already in 4NF and should not be decomposed. Notice that relations containing nontrivial MVDs tend to be all-key relations; that is, their key is all their attributes taken together.
Figure E: (a) the EMP relation after adding employee Brown (16 tuples in all); (b) the 4NF relations EMP_PROJECTS (ENAME, PNAME) and EMP_DEPENDENTS (ENAME, DNAME), holding only 11 tuples in total.
1.8.9 Join Dependencies and Fifth Normal Form
We saw that LJ1 and LJ1' give the condition for a relation schema R to be decomposed into two schemas R1 and R2 such that the decomposition has the lossless join property. However, in some cases there may be no lossless join decomposition into two relation schemas, yet there may be a lossless join decomposition into more than two relation schemas. These cases are handled by the join dependency and fifth normal form. It is important to note that such cases occur very rarely and are difficult to detect in practice.
A join dependency (JD), denoted by JD(R1, R2, …, Rn) and specified on relation schema R, specifies a constraint on instances r of R. The constraint states that every legal instance r of R should have a lossless join decomposition into R1, R2, …, Rn; that is,
*(ΠR1(r), ΠR2(r), …, ΠRn(r)) = r
Notice that an MVD is a special case of a JD where n = 2. A join dependency JD(R1, R2, …, Rn) specified on relation schema R is trivial if one of the relation schemas Ri in JD(R1, R2, …, Rn) is equal to R. Such a dependency is called trivial because it has the lossless join property for any relation instance r of R and hence does not specify any constraint on R. We can now define fifth normal form, which is also called project-join normal form. A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multi-valued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a super-key of R.
For an example, consider once again the SUPPLY relation of Figure D(c). With no additional constraints, it does not have a lossless decomposition into any number of smaller tables. Suppose, however, that the following additional constraint always holds: whenever a supplier s supplies part p, and a project j uses part p, and the supplier s supplies at least one part to project j, then supplier s will also be supplying part p to project j. This constraint can be restated in other ways, and it specifies a join dependency JD(R1, R2, R3) among the three projections R1 (SNAME, PARTNAME), R2 (SNAME, PROJNAME), and R3 (PARTNAME, PROJNAME) of SUPPLY. If this constraint holds, the tuples below the dotted line in Figure D(c) must exist in any legal instance of the SUPPLY relation, and SUPPLY with the join dependency is decomposed into the three relations R1, R2, and R3, each of which is in 5NF. Notice that applying NATURAL JOIN to any two of these relations produces spurious tuples, but applying NATURAL JOIN to all three together does not. The reader should verify this on the example relation of Figure D(c) and its projections in Figure D(d). This is because only the JD exists but no MVDs are specified. Notice too that the JD(R1, R2, R3) is specified on all legal relation instances, not just on the one shown in Figure D(c).
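The three-way join can be checked in SQL; a sketch, assuming the projection tables R1, R2, and R3 have been populated as in Figure D(d):

select distinct R1.SNAME, R1.PARTNAME, R2.PROJNAME
from R1, R2, R3
where R1.SNAME = R2.SNAME
and R1.PARTNAME = R3.PARTNAME
and R2.PROJNAME = R3.PROJNAME

This query computes the natural join of all three projections and returns exactly the tuples of SUPPLY; joining any two of the tables alone would produce spurious tuples.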
Discovering JDs in practical databases with hundreds of attributes is difficult; hence, current practice of database design pays scant attention to them.
1.8.10 Overall Database Design Process
In normalization we have assumed that we have a schema R and proceeded to normalize it. There are several ways in which we could have come up with the schema R:
• R could have been generated when converting an E-R diagram to a set of tables.
• R could have been a single relation containing all the attributes that are of interest. The normalization process breaks up R into smaller relations.
• R could have been the result of some ad hoc design of relations, which we then test to verify that it satisfies a desired normal form.
Now we examine the implications of these approaches, and also the practical issues in database design, including de-normalization for performance and examples of bad database design not detected by normalization.
E-R Model and Normalization
If we carefully define an E-R diagram, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization. However, there can be functional dependencies between the attributes of an entity. For instance, suppose an employee entity had attributes department-number and department-address, and there is a functional dependency department-number → department-address. We would then need to normalize the relation generated from employee.
Most examples of such dependencies arise out of poor E-R diagram design. In the above example, if we had drawn the E-R diagram correctly, we would have created a department entity with attribute department-address and a relationship between employee and department. Similarly, a relationship involving more than two entities may not be in a desirable normal form; since most relationships are binary, such cases are relatively rare. (In fact, some E-R diagram variants actually make it difficult or impossible to specify non-binary relationships.)
Functional dependencies can help us detect poor E-R design. If the generated relations are not in the desired normal form, the problem can be fixed in the E-R diagram. That is, normalization can be done formally as part of data modeling. Alternatively, normalization can be left to the designer's intuition during E-R modeling and can be done formally on the relations generated from the E-R model.
De-normalization for Performance
Occasionally database designers choose a schema that has redundant information; that is, it is not normalized. They use the redundancy to improve performance for specific applications. The penalty paid for not using a normalized schema is the extra work (in terms of coding time and execution time) needed to keep redundant data consistent.
For instance, suppose that the name of an account holder has to be displayed
along with the account number and balance every time the account is accessed. In
our normalized schema, this requires a join of account with depositor.
One alternative to computing the join on the fly is to store a relation containing all the attributes of account and depositor. This makes displaying the account information faster. However, the balance information for an account is repeated for every person who owns the account, and all copies must be updated by the application whenever the account balance is updated. The process of taking a normalized schema and making it non-normalized is called de-normalization, and designers use it to tune the performance of systems to support time-critical operations.
A better alternative, supported by many database systems today, is to use the normalized schema and additionally store the join of account and depositor as a materialized view. (Recall that a materialized view is a view whose result is stored in the database and brought up to date when the relations used in the view are updated.) Like de-normalization, using materialized views does have space and time overheads; however, it has the advantage that keeping the view up to date is the job of the database system, not the application programmer.
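In the book's notation, such a view might be declared as follows; this is a sketch, and the exact materialized-view syntax varies from one database system to another:

create materialized view account-info as
select depositor.customer-name, account.account-number, account.balance
from depositor, account
where depositor.account-number = account.account-number

Queries can then read account-info directly, while the system keeps it consistent with account and depositor.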
Other Design Issues
There are some aspects of database design that are not addressed by normalization and can thus lead to bad database design. We give examples here; obviously, such designs should be avoided.
Consider a company database, where we want to store the earnings of companies in different years. A relation earnings (company-id, year, amount) could be used to store the earnings information. The only functional dependency on this relation is {company-id, year} → amount, and the relation is in BCNF.
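A sketch of this design in SQL, with assumed column types:

create table earnings
(company-id char(10),
year integer,
amount numeric(12,2),
primary key (company-id, year))

A new year requires no schema change; for example, the earnings of a (hypothetical) company 'C1' for 2001 are retrieved with:

select amount from earnings where company-id = 'C1' and year = 2001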
An alternative design is to use multiple relations, each storing the earnings for a different year. Let us say the years of interest are 2000, 2001, and 2002; we would then have relations of the form earnings-2000, earnings-2001, and earnings-2002, all of which are on the schema (company-id, earnings). The only functional dependency on each relation would be company-id → earnings, so these relations are also in BCNF.
However, this alternative design is clearly a bad idea – we would have to create a new relation every year, and would also have to write new queries every year to take each new relation into account. Queries would also be more complicated, since they may have to refer to many relations.
Yet another way of representing the same data is to have a single relation company-year (company-id, earnings-2000, earnings-2001, earnings-2002). Here the only functional dependencies are from company-id to the other attributes, and again the relation is in BCNF. This design is also a bad idea, since it has problems similar to the previous design – the schema and the queries have to be changed every year. Queries would also be more complicated, since they may have to refer to many attributes.
Representations such as the company-year relation, with one column for each value of an attribute, are called crosstabs; they are widely used in spreadsheets and reports and in data-analysis tools. While such representations are useful for display to users, for the reasons just given they are not desirable in a database design. SQL extensions have been proposed to convert data from a normal relational representation to a crosstab for display.
1.8.11 Summary
Normalization is a design technique that is widely used as a guide in designing relational databases. It is a process of decomposing a relation into relations with fewer attributes, minimizing the redundancy of data and minimizing insertion, deletion, and updation anomalies. It may be defined as a step-by-step reversible process of transforming an unnormalized relation into relations with progressively simpler structures. A relation is in first normal form if all the attribute values are atomic and non-decomposable. A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key. A relation is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key. A relation is in BCNF if and only if every determinant is a candidate key. A relation is in 4NF if it is in BCNF and it contains no nontrivial multi-valued dependencies. And finally, a relation is in 5NF, or project-join normal form, if it cannot be further decomposed losslessly into any number of smaller relations.
1.8.12 Keywords
Multi-valued dependency: a concept in relational database theory that extends the idea of functional dependencies. While functional dependencies express relationships between attributes within a single tuple (row), multi-valued dependencies describe relationships between sets of attributes across multiple tuples.
Lossless decomposition: a property in database normalization, specifically in the context of decomposing a relation (table) into multiple smaller relations while preserving the ability to reconstruct the original relation through a join operation.
1.8.13 Short Answer Type Questions
Q1. Does every relation having two attributes satisfy Boyce-Codd Normal Form? If yes, justify your answer, giving a suitable example.
Q2. What do you mean by Normalization? Why is there a need for normalization?
Q3. Define Join Dependency with an example.
Q4. Define Multi-valued Dependency, giving an example.
1.8.14 Long Answer Type Questions
Q1. Explain First, Second, and Third Normal Forms with the help of examples.
Q2. Explain Boyce-Codd Normal Form with an example. How is it different from Third Normal Form?
Q3. Explain Fourth Normal Form with an example.
Q4. Explain Fifth Normal Form and Join Dependency, using a suitable example.
1.8.15 Suggested Readings
• Bipin C. Desai, An introduction to Database System, Galgotia
Publication, New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database
Systems, Addison Wesley.
Relational Database Management Systems
1.9.0 Objectives
1.9.1 Introduction
1.9.2 Database Integrity
1.9.3 Domain Constraints
1.9.4 Referential Integrity
1.9.4.1 Referential Integrity and ER Model
1.9.4.2 Database Modification
1.9.4.3 Referential Integrity in SQL
1.9.5 Assertions
1.9.6 Database Recovery
1.9.7 ACID Properties
1.9.8 System Recovery
1.9.9 Summary
1.9.10 Keywords
1.9.11 Short Answer Type Questions
1.9.12 Long Answer Type Questions
1.9.13 Suggested Readings
1.9.0 Objectives
After completing this lesson, you will be able to:
• Understand database Integrity
• Understand database recovery
1.9.1 Introduction
After designing the database, we need to take measures to protect it. Protecting the database means taking care of database integrity, and in the coming sections we will study the various methods for maintaining the integrity of the database. Database protection also includes data recovery: if the database gets corrupted for some reason, such as a hard disk failure, how do we recover it?
1.9.2 Database Integrity
The term integrity refers to the correctness or accuracy of data in the database. Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. Thus integrity constraints guard against accidental damage to the database. We have already seen two forms of integrity constraints:
• Key declarations – the stipulation that certain attributes form a candidate key for a given entity set.
• Form of a relationship – many-to-many, one-to-many, one-to-one.
In general, an integrity constraint can be an arbitrary predicate pertaining to the database. However, arbitrary predicates may be costly to test, so we concentrate on integrity constraints that can be tested with minimal overhead. In addition to protecting against the accidental introduction of inconsistency, the data stored in the database needs to be protected from unauthorized access and malicious destruction or alteration.
1.9.3 Domain Constraints
Domain constraints are the most elementary form of integrity constraint: declaring an attribute to be of a particular domain constrains the values it can take. For instance, if amounts in different currencies are declared as distinct domains such as Dollars and Pounds, an attempt to compare or assign a Dollars value to a Pounds variable can be flagged by the system as a probable programmer error, where the programmer forgot about the differences in currency.
Declaring different domains for different currencies helps catch such errors.
Values of one domain can be cast (that is, converted) to another domain. If the attribute A in relation r is of type Dollars, we can convert it to Pounds by writing
cast r.A as Pounds
In a real application we would of course multiply r.A by a currency conversion factor before casting it to Pounds. SQL also provides drop domain and alter domain clauses to drop or modify domains that have been created earlier.
The check clause in SQL permits domains to be restricted in powerful ways that most programming-language type systems do not permit. Specifically, the check clause permits the schema designer to specify a predicate that must be satisfied by any value assigned to a variable whose type is the domain. For instance, a check clause can ensure that an hourly-wage domain allows only values at least as large as a specified value (such as the minimum wage):
create domain HourlyWage numeric(5,2)
constraint wage-value-test check (value >=4.00)
The domain HourlyWage has a constraint that ensures that the hourly wage is greater than or equal to 4.00. The clause constraint wage-value-test is optional and is used to give the name wage-value-test to the constraint. The name can be used to indicate which constraint an update violated.
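For example, with a hypothetical table W that has a column of type HourlyWage, an offending insert would be rejected and the violation reported under the constraint's name:

create table W (wage HourlyWage)
insert into W values (3.50) -- rejected: violates wage-value-test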
The check clause can also be used to restrict a domain so that it does not contain any null values:
create domain AccountNumber char(10)
constraint account-number-test check (value not null)
As another example, the domain can be restricted to contain only a specified set of values by using the in clause:
create domain AccountType char(10)
constraint account-type-test
check (value in ('Checking', 'Saving'))
The preceding check conditions can be tested quite easily when a tuple is inserted or modified. However, in general, check conditions can be more complex (and harder to check), since subqueries that refer to other relations are permitted in the check condition. For example, the following constraint could be specified on the relation deposit:
check (branch-name in (select branch-name from branch))
The check condition verifies that the branch-name in each tuple in the deposit relation is actually the name of a branch in the branch relation. Thus the condition has to be checked not only when a tuple is inserted or modified in deposit, but also when the relation branch changes (in this case, when a tuple is deleted or modified in relation branch).
The preceding constraint is actually an example of a class of constraints called
referential-integrity constraints.
Complex check conditions can be useful when we want to ensure integrity of
data but we should use them with care, since they may be costly to test.
1.9.4 Referential Integrity
Often, we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. This condition is called referential integrity.
Basic Concepts
Consider a pair of relations r(R) and s(S), and the natural join r ⋈ s. There may be a tuple tr in r that does not join with any tuple in s; that is, there is no ts in s such that tr[R ∩ S] = ts[R ∩ S]. Such tuples are called dangling tuples. Depending on the entity set or relationship set being modeled, dangling tuples may or may not be acceptable.
Suppose there is a tuple t1 in the account relation with t1[branch-name] = "Lunartown", but there is no tuple in the branch relation for the Lunartown branch. This situation would be undesirable. We expect the branch relation to list all bank branches; therefore, tuple t1 would refer to an account at a branch that does not exist. Clearly we would like to have an integrity constraint that prohibits dangling tuples of this sort.
Not all instances of dangling tuples are undesirable, however. Assume that there is a tuple t2 in the branch relation with t2[branch-name] = "Mokan", but there is no tuple in the account relation for the Mokan branch. In this case a branch exists that has no accounts. Although this situation is not common, it may arise when a branch is opened or is about to close. Thus we do not want to prohibit this situation.
The distinction between these two examples arises from two facts:
• The attribute branch-name in the Account schema is a foreign key referencing the primary key of the Branch schema.
• The attribute branch-name in the Branch schema is not a foreign key. (Recall that a foreign key is a set of attributes in a relation schema that forms a primary key for another schema.)
In the Lunartown example, tuple t1 in account has a value on the foreign key branch-
name that does not appear in branch. In the Mokan-branch example tuple t2 in branch has a
value on branch-name that does not appear in account, but branch-name is not a foreign key.
Thus the distinction between our two examples of dangling tuples is the presence of a foreign
key.
Let r1(R1) and r2(R2) be relations with primary keys K1 and K2, respectively. We say that a subset α of R2 is a foreign key referencing K1 in relation r1 if it is required that for every tuple t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[α]. Requirements of this form are called referential integrity constraints, or subset dependencies. The latter term arises because the preceding referential-integrity constraint can be written as Πα(r2) ⊆ ΠK1(r1). Note that for a referential-integrity constraint to make sense, either α must be equal to K1, or α and K1 must be compatible sets of attributes.
Self-Check Exercise-I
Q1. What are dangling tuples?
Ans…................................................................................................................
…......................................................................................................................
…......................................................................................................................
Q2. What is check clause?
Ans…................................................................................................................
…......................................................................................................................
…......................................................................................................................
1.9.4.1 Referential Integrity and ER Model
Figure: a relationship set R among entity sets E1, E2, …, En−1, En. The relation schema derived from R includes the primary-key attributes of each Ei, and each of these attribute sets forms a foreign key referencing the relation derived from the corresponding entity set.
1.9.4.2 Database Modification
Database modifications can cause violations of referential integrity. We list here the test that the system must perform for each type of modification, given a foreign key α in r2 referencing the primary key K1 of r1:
• Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K1] = t2[α]; that is,
t2[α] ∈ ΠK1(r1)
• Delete. If a tuple t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1:
σα = t1[K1](r2)
If this set is not empty, either the delete command is rejected as an error, or the tuples that reference t1 must themselves be deleted. The latter solution may lead to cascading deletions, since tuples may reference tuples that reference t1, and so on.
• Update. We must consider two cases for update: updates to the referencing relation (r2) and updates to the referenced relation (r1).
❑ If a tuple t2 is updated in relation r2 and the update modifies values of the foreign key α, then a test similar to the insert case is made. Let t2′ denote the new value of tuple t2. The system must ensure that
t2′[α] ∈ ΠK1(r1)
❑ If a tuple t1 is updated in r1 and the update modifies values of the primary key K1, then a test similar to the delete case is made. The system must compute
σα = t1[K1](r2)
using the old value of t1 (the value before the update is applied). If this set is not empty, either the update is rejected as an error, or the update is cascaded in a manner similar to delete.
1.9.4.3 Referential Integrity in SQL
Because of an on delete cascade clause associated with a foreign-key declaration, if a delete of a tuple in branch results in this referential-integrity constraint being violated, the system does not reject the delete. Instead, the delete "cascades" to the account relation, deleting the tuples that refer to the branch tuple that was deleted. Similarly, the system does not reject an update to a field referenced by the constraint even if it violates the constraint; instead, the system updates the field branch-name of the referencing tuples in account to the new value as well. SQL also allows the foreign key clause to specify actions other than cascade if the constraint is violated: the referencing field (here, branch-name) can be set to null (by using set null in place of cascade), or to the default value for the domain (by using set default).
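Such a cascading foreign key might be declared as follows; this is a sketch, a variant of the account definition given later in this section with cascading actions added:

create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
foreign key (branch-name) references branch
on delete cascade
on update cascade,
check (balance >= 0))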
If there is a chain of foreign-key dependencies across multiple relations, a deletion or update at one end of the chain can propagate across the entire chain. An interesting case, where the foreign-key constraint on a relation references the same relation, appears in the exercises. If a cascading update or delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the transaction. As a result, all the changes caused by the transaction and its cascading actions are undone.
create table customer
(customer-name char(20),
customer-street char(30),
customer-city char(30),
primary key (customer-name))

create table branch
(branch-name char(15),
branch-street char(30),
assets integer,
primary key (branch-name),
check (assets >= 0))

create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
foreign key (branch-name) references branch,
check (balance >= 0))

create table depositor
(customer-name char(20),
account-number char(10),
primary key (customer-name, account-number),
foreign key (customer-name) references customer,
foreign key (account-number) references account)
Null values complicate the semantics of referential-integrity constraints in SQL. Attributes of foreign keys are allowed to be null, provided that they have not otherwise been declared to be non-null. If all the columns of a foreign key are non-null in a given tuple, the usual definition of the foreign-key constraint is used for that tuple. If any of the foreign-key columns is null, the tuple is defined automatically to satisfy the constraint.
This definition may not always be the right choice, so SQL also provides constructs that allow you to change the behavior with null values; we do not discuss these constructs here. To avoid such complexity, it is best to ensure that all columns of a foreign-key specification are declared to be non-null.
Transactions may consist of several steps, and integrity may be violated temporarily after one step, but a later step may remove the violation. For instance, suppose we have a relation married-person with primary key name and an attribute spouse, and suppose that spouse is a foreign key on married-person. That is, the constraint says that the spouse attribute must contain a name that is present in the married-person table. Suppose we wish to note the fact that John and Mary are married to each other by inserting two tuples, one for John and one for Mary, into the above relation. The insertion of the first tuple would violate the foreign-key constraint, regardless of which of the two tuples is inserted first. After the second tuple is inserted, the foreign-key constraint would hold again.
To handle such situations, integrity constraints can be checked at the end of a transaction rather than at intermediate steps.
1.9.5 Assertions
An assertion is a predicate expressing a condition that we wish the database always to satisfy. Domain constraints and referential-integrity constraints are special forms of assertions. We have paid substantial attention to these forms of assertions because they are easily tested and apply to a wide range of database applications. However, there are many constraints that we cannot express by using only these special forms. Two examples of such constraints are:
• The sum of all loan amounts for each branch must be less than the sum of all account balances at the branch.
• Every loan has at least one customer who maintains an account with a minimum balance of $1000.00.
An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
Here is how the first of the two example constraints can be written. Since SQL does not provide a "for all X, P(X)" construct (where P is a predicate), we are forced to implement the constraint by the equivalent "not exists X such that not P(X)" construct, which can be written in SQL. We write:
create assertion sum-constraint check
(not exists (select * from branch
where (select sum (amount) from loan
where loan.branch-name = branch.branch-name)
>= (select sum (balance) from account
where account.branch-name = branch.branch-name)))
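The second constraint can be written along the same lines; a sketch, assuming a loan relation with key loan-number and a borrower relation linking customers to loans, as in the standard banking schema:

create assertion balance-constraint check
(not exists (select * from loan
where not exists (select * from borrower, depositor, account
where loan.loan-number = borrower.loan-number
and borrower.customer-name = depositor.customer-name
and depositor.account-number = account.account-number
and account.balance >= 1000)))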
1.9.6 Database Recovery
Recovery in a database system means, primarily, recovering the database itself: restoring the database to a state that is known to be correct (or, rather, consistent) after some failure has rendered the current state inconsistent. The fundamental unit of recovery is the transaction. Consider the following pseudocode, which adds a new shipment of 1000 units of part P1 by supplier S5 and updates the corresponding total quantity TOTQTY for that part:
BEGIN TRANSACTION ;
INSERT INTO SP
RELATION { TUPLE { S# S# ('S5'),
P# P# ('P1'),
QTY QTY (1000) } } ;
IF any error occurred THEN GO TO UNDO ; END IF ;
UPDATE P WHERE P# = P# ('P1')
{ TOTQTY := TOTQTY + 1000 } ;
IF any error occurred THEN GO TO UNDO ; END IF ;
COMMIT ;
GO TO FINISH ;
UNDO :
ROLLBACK ;
FINISH :
RETURN ;
The point of the example is that what is presumably intended to be a single atomic operation – "add a new shipment" – in fact involves two updates to the database: one INSERT operation and one UPDATE operation. What is more, the database is not even consistent between those two updates; it temporarily violates the constraint that the value of TOTQTY for part P1 is supposed to be equal to the sum of all QTY values for part P1. Thus a logical unit of work (i.e., a transaction) is not necessarily just a single database operation; rather, it is in general a sequence of several such operations that transforms a consistent state of the database into another consistent state, without necessarily preserving consistency at all intermediate points.
Now, it is clear that what must not be allowed to happen in the example is for one of the updates to be executed and the other not, because that would leave the database in an inconsistent state. Ideally, of course, we would like a cast-iron guarantee that both updates will be executed. Unfortunately, it is impossible to provide such a guarantee – there is always a chance that things will go wrong, and go wrong, moreover, at the worst possible moment. For example, a system crash might occur between the INSERT and the UPDATE, or an arithmetic overflow might occur on the UPDATE. But a system that supports transaction management does provide the next best thing to such a guarantee. Specifically, it guarantees that if the transaction executes some updates and then a failure occurs (for whatever reason) before the transaction reaches its planned termination, then those updates will be undone. Thus the transaction either executes in its entirety or is totally canceled, i.e., made as if it never executed at all. In this way, a sequence of operations that is fundamentally not atomic can be made to look as if it were atomic from an external point of view.
The system component that provides this atomicity – or semblance of atomicity – is known as the transaction manager (also known as the transaction processing monitor or TP monitor), and the COMMIT and ROLLBACK operations are the key to the way it works:
• The COMMIT operation signals successful end-of-transaction; it tells the transaction manager that a logical unit of work has been successfully completed, the database is (or should be) in a consistent state again, and all of the updates made by that unit of work can now be committed, or made permanent.
• By contrast, the ROLLBACK operation signals unsuccessful end-of-transaction; it tells the transaction manager that something has gone wrong, the database might be in an inconsistent state, and all of the updates made by the logical unit of work so far must be rolled back, or undone.
In the example, therefore, we issue a COMMIT if we get through the two updates successfully, which will commit the changes in the database and make them permanent. If anything goes wrong, however – i.e., if either of the updates raises an error condition – then we issue a ROLLBACK instead, to undo any changes made so far. Note: even if we were to issue a COMMIT instead, the system should in principle check the database integrity constraints, detect the fact that the database is inconsistent, and force a ROLLBACK anyway. However, we cannot assume that the system is aware of all pertinent constraints, and so the user-issued ROLLBACK is necessary. In any case, commercial DBMSs did not do very much COMMIT-time integrity checking at the time of writing.
Incidentally, we should point out that a realistic application will not only update the database (or attempt to) but will also send some kind of message back to the end user indicating what has happened. In the example, we might send the message "shipment added" if the COMMIT is reached, or the message "error – shipment not added" otherwise. Message handling, in turn, has additional implications for recovery.
Note: At this juncture you might be wondering how it is possible to undo an update. The answer, of course, is that the system maintains a log or journal, on tape or (more commonly) disk, on which details of all updates – in particular, before- and after-images of the updated objects – are recorded. Thus, if it becomes necessary to undo some particular update, the system can use the corresponding log entry to restore the updated object to its previous value.
(Actually, the foregoing paragraph is somewhat oversimplified. In practice the log will consist of two portions, an active or online portion and an archive or offline portion. The online portion is used during normal system operation to record details of updates as they are performed and is normally held on disk. When the online portion becomes full, its contents are transferred to the offline portion, which – because it is always processed sequentially – can be held on tape.)
One further point: the system must guarantee that individual statements are themselves atomic (all or nothing). This consideration becomes particularly significant in relational systems, where statements are set-level and typically operate on many tuples at a time; it must not be possible for such a statement to fail in the middle and leave the database in an inconsistent state (e.g., with some tuples updated and some not). In other words, if an error does occur in the middle of such a statement, then the database must remain totally unchanged.
• Transaction Recovery
A transaction begins with successful execution of a BEGIN TRANSACTION statement, and it ends with successful execution of either a COMMIT or a ROLLBACK statement. COMMIT establishes what is called a commit point (also known, especially in commercial products, as a synchpoint). A commit point thus corresponds to the end of a logical unit of work, and hence to a point at which the database is, or should be, in a consistent state. ROLLBACK, by contrast, rolls the database back to the state it was in at BEGIN TRANSACTION, which effectively means back to the previous commit point. (The phrase "the previous commit point" is still accurate even in the case of the first transaction in the program, if we agree to think of the first BEGIN TRANSACTION in the program as tacitly establishing an initial "commit point".)
Note: Throughout this section the term "database" really means just that portion of the database being accessed by the transaction under consideration. Other transactions might be executing in parallel with that transaction and making changes to their own portions, and so "the total database" might not be in a fully consistent state at a commit point. However, ignoring this possibility does not materially affect the issue at hand.
When a commit point is established:
1. All updates made by the executing program since the previous commit point are committed; that is, they are made permanent. Prior to the commit point, all such updates should be regarded as tentative only – tentative in the sense that they might subsequently be undone (i.e., rolled back). Once committed, an update is guaranteed never to be undone (this is the definition of "committed").
2. All database positioning is lost and all tuple locks are released. "Database positioning" here refers to the idea that at any given time an executing program will typically have addressability to certain tuples (e.g., via certain cursors in the case of SQL); this addressability is lost at a commit point. "Tuple locks" are explained in the next chapter. Note that some systems do provide an option by which the program might in fact be able to retain addressability to certain tuples (and therefore retain certain tuple locks) from one transaction to the next.
Point 2 here – excluding the remark about possibly retaining some addressability and hence possibly retaining certain tuple locks – also applies if a transaction terminates with ROLLBACK instead of COMMIT. Point 1, of course, does not.
Self-Check Exercise-II
Q3. What is transaction recovery?
Ans…..................................................................................................................
…........................................................................................................................
…........................................................................................................................
Q4. What is the difference between ROLLBACK and COMMIT?
Ans…..................................................................................................................
…........................................................................................................................
…........................................................................................................................
Figure: a single program execution as a sequence of transactions (1st, 2nd, 3rd, …), each bounded by BEGIN TRANSACTION and a terminating COMMIT or ROLLBACK.
Note carefully that COMMIT and ROLLBACK terminate the transaction, not the program. In general, a single program execution will consist of a sequence of several transactions running one after another, as illustrated in the figure above. Now let us return to the example of the previous section. In that example we included explicit tests for errors and issued an explicit ROLLBACK if any error was detected. But of course the system cannot assume that application programs will always include explicit tests for all possible errors. Therefore the system will issue an implicit ROLLBACK for any transaction that fails, for any reason, to reach its planned termination (where "planned termination" means either an explicit COMMIT or an explicit ROLLBACK).
We can now see, therefore, that transactions are not only the unit of work but also the unit of recovery. For if a transaction successfully commits, then the system will guarantee that its updates will be permanently installed in the database, even if the system crashes the very next moment. It is quite possible, for instance, that the
system might crash after the COMMIT has been honored but before the updates have been physically written to the database – they might still be waiting in a main-memory buffer and so be lost at the time of the crash. Even if that happens, the system's restart procedure will still install those updates in the database; it is able to discover the values to be written by examining the relevant entries in the log. (It follows that the log must be physically written before COMMIT processing can complete – the write-ahead log rule.) Thus the restart procedure will recover any transactions that completed successfully but did not manage to get their updates physically written prior to the crash; hence, as stated earlier, transactions are indeed the unit of recovery.
Note: As we will see in the next chapter, transactions are also the unit of concurrency. Further, since they are supposed to transform a consistent state of the database into another consistent state, they can also be regarded as a unit of integrity.
1.9.7 The ACID Properties
Transactions have four important properties: atomicity, consistency, isolation, and durability (referred to colloquially as "the ACID properties").
• Atomicity: Transactions are atomic (all or nothing).
• Consistency: Transactions preserve database consistency. That is, a transaction transforms a consistent state of the database into another consistent state, without necessarily preserving consistency at all intermediate points.
• Isolation: Transactions are isolated from one another. That is, even though in general there will be many transactions running concurrently, any given transaction's updates are concealed from all the rest until that transaction commits. Another way of saying the same thing is that, for any two distinct transactions T1 and T2, T1 might see T2's updates (after T2 has committed) or T2 might see T1's updates (after T1 has committed), but certainly not both.
• Durability: Once a transaction commits, its updates survive in the database, even if there is a subsequent system crash.
1.9.8 System Recovery
The system must be prepared to recover not only from purely local failures, such as the occurrence of an overflow condition within an individual transaction, but also from "global" failures, such as a power outage. A local failure, by definition, affects only the transaction in which the failure has actually occurred. A global failure, by contrast, affects all of the transactions in progress at the time of the failure and hence has significant system-wide implications. In this section and the next we briefly consider what is involved in recovering from a global failure. Such failures fall into two categories:
• System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
• Media failures (e.g., a head crash on a disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.
The key point regarding system failure is that the contents of main memory are lost (in particular, the database buffers are lost). The precise state of any transaction that was in progress at the time of the failure is therefore no longer known; such a transaction can never be successfully completed and so must be undone – i.e., rolled back – when the system restarts.
Furthermore, it might also be necessary to redo, at restart time, certain transactions that did successfully complete prior to the crash but did not manage to get their updates transferred from the database buffers to the physical database.
The obvious question therefore arises: how does the system know at restart time which transactions to undo and which to redo? The answer is as follows. At certain prescribed intervals – typically, whenever some prescribed number of entries has been written to the log – the system automatically takes a checkpoint. Taking a checkpoint involves (a) physically writing ("force writing") the contents of the database buffers out to the physical database, and (b) physically writing a special checkpoint record out to the physical log. The checkpoint record gives a list of all transactions that were in progress at the time the checkpoint was taken. To see how this information is used, consider the following figure, which is read as follows (note that time in the figure flows from left to right):
• A system failure has occurred at time tf.
• The most recent checkpoint prior to time tf was taken at time tc.
• Transactions of type T1 completed prior to time tc.
• Transactions of type T2 started prior to time tc and completed after time tc and before time tf.
• Transactions of type T3 also started prior to time tc but did not complete by time tf.
• Transactions of type T4 started after time tc and completed before time tf.
• Finally, transactions of type T5 also started after time tc but did not complete by time tf.
Figure: the five transaction categories T1 through T5 plotted against time, relative to the checkpoint at time tc and the system failure at time tf.
It should be clear that when the system is restarted, transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone. Note, however, that transactions of type T1 do not enter into the restart process at all, because their updates were forced to the database at time tc as part of the checkpoint process. Note too that transactions that terminated unsuccessfully (i.e., with a rollback) before time tf also do not enter into the restart process at all (why not?).
At restart time, therefore, the system first goes through the following procedure in order to identify all transactions of types T2 through T5:
1. Start with two lists of transactions, the UNDO list and the REDO list. Set the UNDO list equal to the list of all transactions given in the most recent checkpoint record; set the REDO list to empty.
2. Search forward through the log, starting from the checkpoint record.
3. If a BEGIN TRANSACTION log entry is found for transaction T, add T to the UNDO list.
4. If a COMMIT log entry is found for transaction T, move T from the UNDO list to the REDO list.
5. When the end of the log is reached, the UNDO and REDO lists identify, respectively, transactions of types T3 and T5 and transactions of types T2 and T4.
The system now works backward through the log, undoing the transactions in the UNDO list; then it works forward again, redoing the transactions in the REDO list. Note: Restoring the database to a consistent state by undoing work is sometimes called backward recovery. Similarly, restoring it to a consistent state by redoing work is sometimes called forward recovery.
Finally, when all such recovery activities are complete, then (and only then) the system is ready to accept new work.
• Media Recovery
A media failure is a failure, such as a disk head crash or a disk controller failure, in which some portion of the database has been physically destroyed. Recovery from such a failure basically involves reloading (or restoring) the database from a backup copy (or dump) and then using the log – both active and archive portions, in general – to redo all transactions that completed since that backup copy was taken. There is no need to undo transactions that were still in progress at the time of the failure, since by definition all updates of such transactions have been undone (actually lost) anyway.
The need to be able to perform media recovery implies the need for a dump/restore (or unload/reload) utility. The dump portion of that utility is used to make backup copies of the database on demand. (Such copies can be kept on tape or other archival storage; it is not necessary that they be on direct-access media.) After a media failure, the restore portion of the utility is used to recreate the database from a specified backup copy.
1.9.9 Summary
The term integrity refers to the correctness or accuracy of data in the database. Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency. In general, an integrity constraint can be an arbitrary predicate pertaining to the database. Domain constraints are the most elementary form of integrity constraint. Often, we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation; this condition is called referential integrity. Recovery in a database system means, primarily, recovering the database itself: that is, restoring the database to a state that is known to be correct (or rather, consistent) after some failure has rendered the current state inconsistent. Transactions have four important properties: atomicity, consistency, isolation, and durability. The system must be prepared to recover not only from purely local failures, such as the occurrence of an overflow condition within an individual transaction, but also from "global" failures, such as a power outage. A media failure is a failure, such as a disk head crash or a disk controller failure, in which some portion of the database has been physically destroyed.
1.9.10 Keywords
ACID: the transaction properties Atomicity, Consistency, Isolation and Durability.
Transaction: a logical unit of work that is performed on a database.
Domain: a universe of discourse which defines the data in the problem.
1.9.11 Short Answer Type Questions:
Q1. What do you understand by data integrity? Explain the various types of integrity constraints along with suitable examples.
Q2. What do you understand by database recovery? Explain the various types of recovery techniques.
Q3. What do you understand by a transaction? Explain the ACID properties of transactions.
1.9.12 Long Answer Type Questions:
Q1. What is referential integrity?
Q2. Consider the schema:
employee(employee-name, street, city)
works(employee-name, company-name, salary)
company(company-name, city)
manages(employee-name, manager-name)
Give an SQL DDL definition for the tables of this database. Identify the referential-integrity constraints that should hold, and include them in the DDL definition.
1.9.13 Suggested Readings:
• Bipin C. Desai, An introduction to Database System, Galgotia
Publication, New Delhi.
• C. J. Date, An introduction to database Systems, Sixth Edition, Addison
Wesley.
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database
Systems, Addison Wesley.